1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
|
---
sidebar_position: 1
title: Cloud Capacity Planning
hide_title: true
sidebar_label: Cloud Capacity Planning
---
# Cloud Capacity Planning Tutorial
Using OpenDC to plan and design cloud datacenters.
:::info Learning goal
By doing this assignment, you will learn more about the basic concepts of datacenters and cloud computing systems, as
well as how such systems are planned and designed.
:::
## Preamble
Datacenter infrastructure is important in today’s digital society. Stakeholders across industry, government, and
academia employ a vast and diverse array of cloud services hosted by datacenter infrastructure, and expect services to
be reliable, high speed, and low cost. In turn, datacenter operators must maintain efficient operation at unprecedented
scale.
To keep up with growing demand and increasing complexity, architects of datacenters must address complex challenges in
distributed systems, software engineering and performance engineering. One of these challenges is efficient utilization
of resources in datacenters, which is only 6-12% industry-wide despite the fact that it is inconvenient for datacenter
operators to keep much of their infrastructure idle, due to resulting high energy consumption and thus unnecessary
costs.
It is often quite difficult to implement optimizations or other changes in datacenters. Datacenter operators tend to be
conservative in adopting such changes in fear of failure or misbehaving systems. Furthermore, testing changes at the
scale of modern datacenter infrastructure in a real-world setting is prohibitively expensive and hard to reproduce,
notwithstanding environmental concerns.
A more viable alternative is the use of datacenter simulators such as OpenDC or CloudSim. These tools model datacenter
infrastructure at a good accuracy and allow us to test changes in a controllable and repeatable environment.
In this tutorial, we will use the OpenDC datacenter simulator to experiment with datacenters and demonstrate the
process of designing and optimizing datacenters using simulation.
## What is OpenDC
OpenDC is an open source platform for datacenter simulation developed by AtLarge Research. The purpose of OpenDC is
twofold: we aim to both enable cloud computing education and support research into datacenters.
An example of the former is this tutorial, and examples of the latter include the numerous BSc and MSc research projects
that are using OpenDC to run experiments and perform research.
:::caution
OpenDC is still an experimental tool. Your data may get lost, overwritten, or otherwise become unavailable. Sorry for
the inconvenience.
:::
If you are not familiar with the OpenDC web interface, please follow the [Getting Started](/docs/category/getting-started)
guide to get an understanding of the main concepts of OpenDC and how to design a datacenter.
:::note Action
Set up a project on the OpenDC website.
:::
## Assignment
Acme Inc. is a small datacenter operator in the Netherlands. They are currently in the process of acquiring a new client
and closing a deal where Acme will migrate all of the client’s business-critical workloads from internal machines to
Acme’s datacenters. With this deal, the client aims to outsource the maintenance of their digital infrastructure, but in
turn expects reliable and efficient operation from Acme.
To demonstrate that Acme is capable of this task, it has started a pilot project with the client where Acme will migrate
already a small subset of the client’s workloads. You are an engineer at Acme. and have been tasked with the design and
procurement of the datacenter infrastructure required for this pilot project.
To guide your design, the client has provided a workload trace of their business-critical workloads, which consist of
the historical runtime behavior of 50 virtual machines over time. These virtual machines differ in resource
requirements (e.g. number of vCPUs or memory) and in resource consumption over time. We can use OpenDC to simulate this
workload trace and validate your datacenter design.
The assignment is divided into four parts:
1. Analyzing the requirements to estimate what resources are needed.
2. Building your design in OpenDC
3. Validating your design in OpenDC
4. Optimizing your design in OpenDC
Make notes of your thoughts on the following assignments & questions and discuss with your partner(s).
## Analyze the Requirements
The first step of the assignment is to analyze the requirements of the client in order to come up with a reasonable
estimation of the datacenter infrastructure needed. This estimation will become our initial design which we will build
and validate in OpenDC.
Since the client has provided a workload trace representative of the workload that will eventually be running in the
datacenter, we can use it to guide our design. In [Figure 1](#resource-distribution), the requested memory and vCPUs are
depicted for the virtual machines in the workload trace.
:::note Action
Determine the total amount of vCPUs and memory required in the trace.
:::
<figure className="figure" id="resource-distribution">
<img src={require("./img/resource-distribution.png").default} alt="Resource requirements for the workload" />
<figcaption>
Requested number of vCPUs and memory (in GB) by the
virtual machines in the workload. The left figure shows the number of virtual machines that have requested 1, 2, 4 or 8
vCPUs. The right figure shows the amount of memory requested compared to the number of vCPUs in the virtual machine.
</figcaption>
</figure>
Based on this information, we could choose to purchase a new machine for every virtual machine in the workload trace.
Such a design will most certainly be able to handle the workload. At the same time, it is much more expensive and
probably unnecessary.
In [Figure 2](#cpu-usage), the CPU Usage (in MHz) of the virtual machines in the workload is depicted over time. Observe that the
median CPU usage of the virtual machines over the whole trace is approximately 100 MHz. This means that a 2-core
processor with a base clock 3500 MHz would have utilization of only 1.4% (`100 MHz / (3500 MHz x 2)`) for such a median
workload.
<figure className="figure" id="cpu-usage">
<img src={require("./img/cpu-usage.png").default} alt="CPU usage over time for the workload" />
<figcaption>CPU Usage of the virtual machines in the workload over time.</figcaption>
</figure>
Instead, we could try to fit multiple virtual machines onto a single machine. For instance, the 2-core processor
mentioned before is able to handle 70 virtual machines, each running at 100 MHz (`(3500 MHz x 2) / 100 MHz`), ignoring
virtualization overhead and memory requirements.
:::note Action
Make a rough estimate of the number of physical cores required to host the vCPUs in the workload trace.
:::
Now that we have an indication of the number of physical cores we need to have, we can start to compose the servers in
our datacenter. See **Table 1 and 2** for the equipment list you can choose from. Don’t forget to put enough memory in your
servers, or otherwise you risk that not all virtual machines will fit on the servers in your datacenter.
| Processor | Intel® Xeon® E-2224G | Intel® Xeon® E-2244G | Intel® Xeon® E-2246G |
|----------------------------------|----------------------|----------------------|----------------------|
| Base clock (in MHz) | 3500 | 3800 | 3600 |
| Core count | 4 | 8 | 12 |
| Average power consumption (in W) | 71 | 71 | 80 |
**Table 1:** Processor options for your datacenter
| Memory module | Crucial MTA9ASF2G72PZ-3G2E | Crucial MTA18ASF4G72PDZ-3G2E1 |
|----------------------------------|----------------------------|-------------------------------|
| Size (in GB) | 16 | 32 |
| Speed (in MHZ) | 3200 | 3200 |
**Table 2:** Memory options for your datacenter
:::note Action
Create a plan detailing the servers you want to have in your datacenter and what resources (e.g. processor or memory)
they should contain. For instance, such a plan could look like:
1. 8x Server (2x Intel® Xeon® E-2244G, 4x Crucial MTA18ASF4G72PDZ-3G2E1)
:::
:::tip Hint
Budget more capacity than your initial estimates to prevent your datacenter from running at a very high
utilization. Think about how your datacenter would handle a machine failure, will you still have enough capacity left?
:::
## Build the datacenter
Based on the plan we devised in the previous section, we will now construct a (virtual) datacenter in OpenDC. If you
have not yet used the OpenDC web interface to design a datacenter, please read [Getting Started](/docs/category/getting-started)
guide to get an understanding of the main concepts of OpenDC and how to design a datacenter.
:::note Action
Implement your plan in the OpenDC web interface.
:::
## Validate your design
We are now at a stage where we can validate whether the datacenter we have just designed and built in OpenDC is suitable
for the workload of the client. We will use OpenDC to simulate the workload in our datacenter and keep track of several
metrics to ensure efficient and reliable operation.
One of our concerns is that our datacenter does not have enough computing power to deal with the client’s
business-critical workload, leading to degraded performance and consequently an unhappy client.
A metric that gives us an insight in performance degradation is the Overcommitted CPU Cycles, which represents the
number of CPU cycles that a virtual machine wanted to run, but could not due to the host machine not having enough
computing capacity at that moment. To keep track of this metric during simulation, we create a new portfolio by clicking
the ‘+’ next to “Portfolio” in the left sidebar and select the metrics of interest.
:::note Action
Add a new portfolio and select at least the following targets:
1. Overcommitted CPU Cycles
2. Granted CPU Cycles
3. Requested CPU Cycles
4. Maximum Number VMs Finished
:::
We will now try to simulate the client’s workload trace (called _Bitbrains (Sample)_ in OpenDC). By clicking on ‘New
Scenario’ below your created portfolio, we can create a base scenario which will represent our baseline datacenter
design which we will compare against future improvements.
:::note Action
Add a base scenario to your new portfolio and select as trace _Bitbrains (Sample)_.
:::
By creating a new scenario, you will schedule a simulation of your datacenter design that will run on one of the OpenDC
simulation servers. Press the Play button next to your portfolio to see the results of the simulations. If you have
chosen the _Bitbrains (Sample)_ trace, the results should usually appear within one minute or less depending on the queue
size. In case they do not appear within a reasonable timeframe, please contact the instructors.
You can now see how your design has performed. Check whether all virtual machines have finished and whether the
_Overcommitted CPU Cycles_ metric is not too high. Try to aim for anything below 1 bn cycles. In the next section, we’ll
try to further optimize our design. For now, think of an explanation for the performance of your design.
## Optimize your design
Finally, let’s try to optimize your design so that it meets the requirements of the client and is beneficial for your
employer as well. In particular, your company is interested in the follow goals:
1. Reducing _Overcommitted CPU Cycles_ to a minimum for reliability.
2. Reducing _Total Power Consumption_ to a minimum to save energy costs.
:::note Action
Add a new portfolio and select at least the following targets:
1. Overcommitted CPU Cycles
2. Granted CPU Cycles
3. Requested CPU Cycles
4. Total Power Consumption
Then, add a base scenario to your new portfolio and select as trace _Bitbrains (Sample)_.
:::
Try to think of ways in which you can reduce both _Overcommitted CPU Cycles_ and _Total Power Consumption_. Create a new
topology based on your initial topology and apply the changes you have come up with. In this way, you can easily compare
the performance of different topologies in different scenarios. Note that the choice of scheduler might also influence
your results.
:::tip Hint
The choice of scheduler (and thus the placement of VMs) might also influence your results.
:::
:::note Action
1. Create a new topology based on your existing topology.
2. Add a new scenario to your created portfolio and select your newly created topology.
3. Compare the results against the base scenario.
:::
Repeat this approach until you are satisfied with your design.
## Epilogue
In this tutorial, you should have learned briefly about what datacenters are, and the process of designing and
optimizing a datacenter yourself. If you have any feedback (positive or negative) about your experience using OpenDC
during this tutorial, please let us know!
|