diff options
Diffstat (limited to 'site/docs/tutorials/cloud-capacity-planning.mdx')
| -rw-r--r-- | site/docs/tutorials/cloud-capacity-planning.mdx | 277 |
1 files changed, 0 insertions, 277 deletions
diff --git a/site/docs/tutorials/cloud-capacity-planning.mdx b/site/docs/tutorials/cloud-capacity-planning.mdx deleted file mode 100644 index df9cb566..00000000 --- a/site/docs/tutorials/cloud-capacity-planning.mdx +++ /dev/null @@ -1,277 +0,0 @@ ---- -sidebar_position: 1 -title: Cloud Capacity Planning -hide_title: true -sidebar_label: Cloud Capacity Planning -description: Cloud Capacity Planning ---- - -# Cloud Capacity Planning Tutorial - -Using OpenDC to plan and design cloud datacenters. - -:::info Learning goal - -By doing this assignment, you will learn more about the basic concepts of datacenters and cloud computing systems, as -well as how such systems are planned and designed. -::: - -## Preamble - -Datacenter infrastructure is important in today’s digital society. Stakeholders across industry, government, and -academia employ a vast and diverse array of cloud services hosted by datacenter infrastructure, and expect services to -be reliable, high speed, and low cost. In turn, datacenter operators must maintain efficient operation at unprecedented -scale. - -To keep up with growing demand and increasing complexity, architects of datacenters must address complex challenges in -distributed systems, software engineering and performance engineering. One of these challenges is efficient utilization -of resources in datacenters, which is only 6-12% industry-wide despite the fact that it is inconvenient for datacenter -operators to keep much of their infrastructure idle, due to resulting high energy consumption and thus unnecessary -costs. - -It is often quite difficult to implement optimizations or other changes in datacenters. Datacenter operators tend to be -conservative in adopting such changes in fear of failure or misbehaving systems. Furthermore, testing changes at the -scale of modern datacenter infrastructure in a real-world setting is prohibitively expensive and hard to reproduce, -notwithstanding environmental concerns. - -A more viable alternative is the use of datacenter simulators such as OpenDC or CloudSim. These tools model datacenter -infrastructure at a good accuracy and allow us to test changes in a controllable and repeatable environment. - -In this tutorial, we will use the OpenDC datacenter simulator to experiment with datacenters and demonstrate the -process of designing and optimizing datacenters using simulation. - -## What is OpenDC - -OpenDC is an open source platform for datacenter simulation developed by AtLarge Research. The purpose of OpenDC is -twofold: we aim to both enable cloud computing education and support research into datacenters. -An example of the former is this tutorial, and examples of the latter include the numerous BSc and MSc research projects -that are using OpenDC to run experiments and perform research. - -:::caution - -OpenDC is still an experimental tool. Your data may get lost, overwritten, or otherwise become unavailable. Sorry for -the inconvenience. - -::: - -If you are not familiar with the OpenDC web interface, please follow the [Getting Started](/docs/category/getting-started) -guide to get an understanding of the main concepts of OpenDC and how to design a datacenter. - -:::note Action - -Set up a project on the OpenDC website. - -::: - - -## Assignment - -Acme Inc. is a small datacenter operator in the Netherlands. They are currently in the process of acquiring a new client -and closing a deal where Acme will migrate all of the client’s business-critical workloads from internal machines to -Acme’s datacenters. With this deal, the client aims to outsource the maintenance of their digital infrastructure, but in -turn expects reliable and efficient operation from Acme. - -To demonstrate that Acme is capable of this task, it has started a pilot project with the client where Acme will migrate -already a small subset of the client’s workloads. You are an engineer at Acme. and have been tasked with the design and -procurement of the datacenter infrastructure required for this pilot project. - -To guide your design, the client has provided a workload trace of their business-critical workloads, which consist of -the historical runtime behavior of 50 virtual machines over time. These virtual machines differ in resource -requirements (e.g. number of vCPUs or memory) and in resource consumption over time. We can use OpenDC to simulate this -workload trace and validate your datacenter design. - -The assignment is divided into four parts: -1. Analyzing the requirements to estimate what resources are needed. -2. Building your design in OpenDC -3. Validating your design in OpenDC -4. Optimizing your design in OpenDC - -Make notes of your thoughts on the following assignments & questions and discuss with your partner(s). - -## Analyze the Requirements - -The first step of the assignment is to analyze the requirements of the client in order to come up with a reasonable -estimation of the datacenter infrastructure needed. This estimation will become our initial design which we will build -and validate in OpenDC. - -Since the client has provided a workload trace representative of the workload that will eventually be running in the -datacenter, we can use it to guide our design. In [Figure 1](#resource-distribution), the requested memory and vCPUs are -depicted for the virtual machines in the workload trace. - -:::note Action - -Determine the total amount of vCPUs and memory required in the trace. - -::: - -<figure className="figure" id="resource-distribution"> - <img src={require("./img/resource-distribution.png").default} alt="Resource requirements for the workload" /> - <figcaption> - Requested number of vCPUs and memory (in GB) by the - virtual machines in the workload. The left figure shows the number of virtual machines that have requested 1, 2, 4 or 8 - vCPUs. The right figure shows the amount of memory requested compared to the number of vCPUs in the virtual machine. - </figcaption> -</figure> - -Based on this information, we could choose to purchase a new machine for every virtual machine in the workload trace. -Such a design will most certainly be able to handle the workload. At the same time, it is much more expensive and -probably unnecessary. - -In [Figure 2](#cpu-usage), the CPU Usage (in MHz) of the virtual machines in the workload is depicted over time. Observe that the -median CPU usage of the virtual machines over the whole trace is approximately 100 MHz. This means that a 2-core -processor with a base clock 3500 MHz would have utilization of only 1.4% (`100 MHz / (3500 MHz x 2)`) for such a median -workload. - -<figure className="figure" id="cpu-usage"> - <img src={require("./img/cpu-usage.png").default} alt="CPU usage over time for the workload" /> - <figcaption>CPU Usage of the virtual machines in the workload over time.</figcaption> -</figure> - -Instead, we could try to fit multiple virtual machines onto a single machine. For instance, the 2-core processor -mentioned before is able to handle 70 virtual machines, each running at 100 MHz (`(3500 MHz x 2) / 100 MHz`), ignoring -virtualization overhead and memory requirements. - -:::note Action - -Make a rough estimate of the number of physical cores required to host the vCPUs in the workload trace. - -::: - -Now that we have an indication of the number of physical cores we need to have, we can start to compose the servers in -our datacenter. See **Table 1 and 2** for the equipment list you can choose from. Don’t forget to put enough memory in your -servers, or otherwise you risk that not all virtual machines will fit on the servers in your datacenter. - -| Processor | Intel® Xeon® E-2224G | Intel® Xeon® E-2244G | Intel® Xeon® E-2246G | -|----------------------------------|----------------------|----------------------|----------------------| -| Base clock (in MHz) | 3500 | 3800 | 3600 | -| Core count | 4 | 8 | 12 | -| Average power consumption (in W) | 71 | 71 | 80 | - -**Table 1:** Processor options for your datacenter - - -| Memory module | Crucial MTA9ASF2G72PZ-3G2E | Crucial MTA18ASF4G72PDZ-3G2E1 | -|----------------------------------|----------------------------|-------------------------------| -| Size (in GB) | 16 | 32 | -| Speed (in MHZ) | 3200 | 3200 | -**Table 2:** Memory options for your datacenter - - -:::note Action - -Create a plan detailing the servers you want to have in your datacenter and what resources (e.g. processor or memory) -they should contain. For instance, such a plan could look like: - -1. 8x Server (2x Intel® Xeon® E-2244G, 4x Crucial MTA18ASF4G72PDZ-3G2E1) - -::: - -:::tip Hint - -Budget more capacity than your initial estimates to prevent your datacenter from running at a very high -utilization. Think about how your datacenter would handle a machine failure, will you still have enough capacity left? - -::: - -## Build the datacenter - -Based on the plan we devised in the previous section, we will now construct a (virtual) datacenter in OpenDC. If you -have not yet used the OpenDC web interface to design a datacenter, please read [Getting Started](/docs/category/getting-started) -guide to get an understanding of the main concepts of OpenDC and how to design a datacenter. - -:::note Action - -Implement your plan in the OpenDC web interface. - -::: - -## Validate your design - -We are now at a stage where we can validate whether the datacenter we have just designed and built in OpenDC is suitable -for the workload of the client. We will use OpenDC to simulate the workload in our datacenter and keep track of several -metrics to ensure efficient and reliable operation. - -One of our concerns is that our datacenter does not have enough computing power to deal with the client’s -business-critical workload, leading to degraded performance and consequently an unhappy client. - -A metric that gives us an insight in performance degradation is the Overcommitted CPU Cycles, which represents the -number of CPU cycles that a virtual machine wanted to run, but could not due to the host machine not having enough -computing capacity at that moment. To keep track of this metric during simulation, we create a new portfolio by clicking -the ‘+’ next to “Portfolio” in the left sidebar and select the metrics of interest. - -:::note Action - -Add a new portfolio and select at least the following targets: -1. Overcommitted CPU Cycles -2. Granted CPU Cycles -3. Requested CPU Cycles -4. Maximum Number VMs Finished - -::: - -We will now try to simulate the client’s workload trace (called _Bitbrains (Sample)_ in OpenDC). By clicking on ‘New -Scenario’ below your created portfolio, we can create a base scenario which will represent our baseline datacenter -design which we will compare against future improvements. - -:::note Action - -Add a base scenario to your new portfolio and select as trace _Bitbrains (Sample)_. - -::: - -By creating a new scenario, you will schedule a simulation of your datacenter design that will run on one of the OpenDC -simulation servers. Press the Play button next to your portfolio to see the results of the simulations. If you have -chosen the _Bitbrains (Sample)_ trace, the results should usually appear within one minute or less depending on the queue -size. In case they do not appear within a reasonable timeframe, please contact the instructors. - -You can now see how your design has performed. Check whether all virtual machines have finished and whether the -_Overcommitted CPU Cycles_ metric is not too high. Try to aim for anything below 1 bn cycles. In the next section, we’ll -try to further optimize our design. For now, think of an explanation for the performance of your design. - -## Optimize your design - -Finally, let’s try to optimize your design so that it meets the requirements of the client and is beneficial for your -employer as well. In particular, your company is interested in the follow goals: - -1. Reducing _Overcommitted CPU Cycles_ to a minimum for reliability. -2. Reducing _Total Power Consumption_ to a minimum to save energy costs. - -:::note Action - -Add a new portfolio and select at least the following targets: -1. Overcommitted CPU Cycles -2. Granted CPU Cycles -3. Requested CPU Cycles -4. Total Power Consumption - -Then, add a base scenario to your new portfolio and select as trace _Bitbrains (Sample)_. - -::: - -Try to think of ways in which you can reduce both _Overcommitted CPU Cycles_ and _Total Power Consumption_. Create a new -topology based on your initial topology and apply the changes you have come up with. In this way, you can easily compare -the performance of different topologies in different scenarios. Note that the choice of scheduler might also influence -your results. - -:::tip Hint - -The choice of scheduler (and thus the placement of VMs) might also influence your results. - -::: - - -:::note Action - -1. Create a new topology based on your existing topology. -2. Add a new scenario to your created portfolio and select your newly created topology. -3. Compare the results against the base scenario. - -::: - -Repeat this approach until you are satisfied with your design. - -## Epilogue - -In this tutorial, you should have learned briefly about what datacenters are, and the process of designing and -optimizing a datacenter yourself. If you have any feedback (positive or negative) about your experience using OpenDC -during this tutorial, please let us know! |
