summaryrefslogtreecommitdiff
path: root/content/background.tex
blob: e65dc1730ad3cfb3c129c09644f812af94888099 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
\chapter{Background}\label{s:background}
Predictive modelling uses statistics to predict outcomes.
When deployed commercially, for example in datacenters, predictive modelling is often referred to as predictive analytics~\cite{Wikipedia:PredictiveModelling}.
Almost any statistical model can be used for prediction purposes, but nowadays predictive analysis is synonymous with machine learning.
A primary example of popular analysis type is linear regression.
A major limitation of predictive analytics is that history cannot always predict the future.
Using historical data to predict outcomes works only under the assumption that there are certain long lasting patterns in the system.
Additionally, no matter how extensive is the training data, there is always the possibility of new variables that have not been considered or even defined, yet are critical to the outcome of the prediction~\cite{Wikipedia:PredictiveModelling}.

%Here you have to cite Deisenroth, 2024, chapter 8.1.4.
An inference function is a machine learning model which uses probabilistic parameter estimation~\cite{}.
A prime example of using probability to find a good machine learning model is Bayesian inference.
% Stanford Encyclopedia of Philosophy, Douven 2017
The process of inference from data to provide the best explanation is called abduction.



A \gls{dt} is a digital model of an intended or actual real-world system that serves as a digital counterpart of it for purposes such as simulation, integration, testing, monitoring and maintenance %cite the Wikipedia page here!.
The system requires real-time synchronization with the actual system.
A closed loop of continuous feedback exists between the digital twin and physical object.

The digital twin replicates the physical system to predict failures and opportunities for changing, to prescribe real-time actions for optimizing and/or mitigating unexpected events, observing and evaluating the profile of the system.

A digital twin is often called a virtual twin.

The communication between a physical entity and the digital twin is referred to as a digital thread.

One key application is predictive maintenance, where the digital twin analyzes operational data (e.g., temperature, vibration) to predict when a component is likely to fail.

This allows maintenance to be scheduled proactively, reducing unplanned downtime and preventing catastrophic failures.

%Include something about data-preprocessing in the pipeline.
%See the article by Fei Tao

One of the key arguments that speak for a datacenter digital twin is that datacenters already connect hundreds of monitoring sensors and data coming from them.
Monitoring of server racks, VM's, CPU profiling and all that give us lots of data.

Data analytics, such as ODA can give actual meaningful insights into what we are doing.
Moreover, advanced technologies have made sensors, IoT give us much information.
ODA can predict failures, help maintain the equipment, save bills, cut costs.
But currently one of the key challenges is to somehow connect the physical and virtual spaces.
The answer to how to do this is a digital twin.

Since DT's are relatively a new concept, I think they require a short introduction to their history.
It's enough to mention that the first presentation was done by Grieves in 2003, from 2003 to 2018 we have seen a slow incline in numbers of papers (around 50) and now DT's are re-emerging.

You must include the DT white paper from 2014.

The concept of a \gls{dt} dates back to 2003, when Dr. Michael Grieves of Dassault Syst\'emes introduced the 3 core components of a \gls{dt}: the virtual entity, physical entity and the two-way connection (see Figure \ref{fig:five_dimensional_dt}).
Due to insufficient technological foundations, little work is available on \gls{dt}s between 2003 and 2018~\cite{DBLP:conf/cirp/TAO2018169}, and it is only with the rapid growth of cloud computing, \gls{iot} and big data analytics that \gls{dt}s have re-emerged.
Today, research is focused on bridging the gap between the long-established foundations of \gls{dt}s and new, novel applications in academia and industry, such as the \gls{dcdt}.
%[citation needed]

As of 2026, there is a lack of consensus of what is a digital twin.
By proxy, there is neither consensus on what is the definition of a datacenter digital twin.
A generic definition is needed.


Most of \gls{dt} usages are related to prognostics and health management.


One of the many applications of \gls{dt} is timely system maintenance.
In aerospace engineering, the \gls{dt} can reliably manage the health of the physical entity by detecting \eg fatigue cracks on aircraft wings or damage to the wind turbine blades~\cite{DBLP:conf/cirp/TAO2018169}.
A forecast of future maintenance and virtual health management are the prime purpose of many \gls{dt}s~\cite{DBLP:conf/AIAA/Teugel2012}.

Optimal datacenter management is characterized by high service availability and low downtime.
However, achieving this in a 21\textsuperscript{st} century datacenter requires revolutionary changes in the way datacenters are operated and maintained.
A concept that creates just such a revolutionary change is the \gls{dcdt}.
% This sentence is stolen from an article.
% Make sure to paraphrase it.

% This is stolen from the AIAA article.
% Make sure to paraphrase this.