author mjkwiatkowski <mati.rewa@gmail.com> 2026-06-06 14:17:56 +0200
committer mjkwiatkowski <mati.rewa@gmail.com> 2026-06-06 14:17:56 +0200
commit 06ef5701dec475df00270ddd871091a0c41d5d25
tree 2377bacedb1cb0cad317725d90518b455e2ab13d /content
parent 980c99a32e09273039b918f4d15692fc41140f1c
feat: changed the introduction figure, moved five_dimensional_dt -> background section
Diffstat (limited to 'content')
-rw-r--r-- content/background.tex 151
-rw-r--r-- content/intro.tex 11
2 files changed, 84 insertions, 78 deletions
diff --git a/content/background.tex b/content/background.tex
index ebbfa1e..7e89d77 100644
--- a/content/background.tex
+++ b/content/background.tex
@@ -2,18 +2,92 @@
\section{Overview}\label{s:background_overview}
-\section{Digital Twinning}\label{ss:digital-twinning}
+\section{Datacenters}\label{ss:datacenters}
+
+\subsection{Computing Infrastructure}\label{sss:failures}
+
+\subsection{Datacenter Simulation}\label{sss:simulation}
+
+Predictive modelling uses statistics to predict outcomes.
+When deployed commercially, for example in datacenters, predictive modelling is often referred to as predictive analytics~\cite{Wikipedia:PredictiveModelling}.
+Almost any statistical model can be used for prediction purposes, but nowadays predictive analysis is synonymous with machine learning.
+A primary example of a popular analysis type is linear regression.
+A major limitation of predictive analytics is that history cannot always predict the future.
+Using historical data to predict outcomes works only under the assumption that there are certain long-lasting patterns in the system.
+Additionally, no matter how extensive the training data is, there is always the possibility of new variables that have not been considered or even defined, yet are critical to the outcome of the prediction~\cite{Wikipedia:PredictiveModelling}.
+
+%Here you have to cite Deisenroth, 2024, chapter 8.1.4.
+An inference function is a machine learning model which uses probabilistic parameter estimation~\cite{}.
+A prime example of using probability to find a good machine learning model is Bayesian inference.
+% Stanford Encyclopedia of Philosophy, Douven 2017
+The process of inference from data to provide the best explanation is called abduction.
+
+\ipsum[1-2]
+
+\input{sources/simulator_comparison.tex}
+\section{Digital Twinning}\label{ss:digital-twinning}
% To fix: remove the \gls commands for ExaDigiT.
% This is getting silly.
\subsection{What is Digital Twinning?}\label{sss:what_is_digital_twinning}
+% Here talk a bit about different types of data analytics that are performed in a digital twin.
+``A \emph{digital twin} is a set of virtual information constructs that mimics the structure, context and behaviour of a natural, engineered or social system, is dynamically updated with data from its physical twin, has predictive capability, and informs decisions that realize value''~\cite{DBLP:usdoe/report/AP26894}.
+A crucial characteristic that differentiates digital twinning from simulation and statistical modelling is the \emph{digital thread}: a bi-directional channel that enables continuous interaction between the virtual and physical entities.
+An example \gls{dt} architecture, adapted from Tao \etal~\cite{DBLP:conf/cirp/TAO2018169}, is depicted in Figure~\ref{fig:five_dimensional_dt}.
+
+The longer the \gls{dt} operates, the more accurate its predictions become, because a holistic twin aggregates historical patterns together with up-to-date monitoring data.
+% Why has not anyone done this before?
+Digital twinning has only recently become feasible because of developments in \gls{hpc}.
+Between 2003 and 2011 the compute needed to run a digital twin was simply not available.
+As such, while the concept existed, the hardware had not yet caught up.
+However, in the last decade, multicore computing paradigms and the advent of GPU computing have finally enabled the computation needed to run digital twins.
+As a result, digital twins have become more relevant today than 10 years ago~\cite{DBLP:conf/cirp/TAO2018169}.
+\begin{figure}
+ \centering
+ \includegraphics[width=0.95\linewidth]{images/five_dimensional_dt.pdf}
+ \caption{A basic framework for the \gls{dt}. Four core elements of a \gls{dt} are defined: The physical entity \one and the simulated virtual twin \two. A service for out-of-band data analytics \three and a persistent storage of historical data \four are crucial to the \gls{dt} because they are necessary to gain meaningful monitoring insights. Adapted from Tao \etal ~\cite{DBLP:conf/cirp/TAO2018169}.}
+ %Fei Tao is a renowned figure with over 62k citations. He is a figure of authority on digital twins.%
+ \label{fig:five_dimensional_dt}
+\end{figure}
+% (3) in the original paper by Fei Tao is referenced to just `Services`.
+% Nonetheless I name them here as Data Analysis Services, because what Fei Tao lists (e.g., fault detection, fault determination, fault-tolerant management, maintenance) is inherently reliant on good data analytics.
\subsection{Digital Twins across Domains}\label{sss:digital_twins_across_domains}
\subsection{Digital Twins for Datacenters}\label{sss:digital_twins_for_datacenters}
-\gls{ed}~\cite{DBLP:conf/sc/BrewerMKWBHSGGW24} is an open-source framework for developing digital twins of supercomputers.
+
+One of the key arguments for a datacenter digital twin is that datacenters already host hundreds of monitoring sensors and collect the data they produce.
+Monitoring of server racks and VMs, together with CPU profiling, yields large volumes of data.
+
+Data analytics, such as ODA, can give meaningful insights into datacenter operation.
+Moreover, advances in sensor and IoT technology provide a wealth of information.
+ODA can predict failures, help maintain equipment, and cut operating costs.
+Currently, however, one of the key challenges is connecting the physical and virtual spaces.
+A digital twin addresses this challenge.
+
+%[citation needed]
+
+%Why predictive analytics? Why predictive behaviour?
+
+%What is below here is true, but nonetheless the argumentation should be slightly changed. And a citation is needed.
+However, there has been little effort made to integrate analytics that enable consistent and reliable prediction of datacenter behaviour into a holistic digital twin of a datacenter.
+Nor has the fidelity of failure modeling inside a datacenter simulation increased.
+The failure model is still a linear model.
+Since a datacenter simulator is quite different from a digital twin, we cannot use the same computation methods (not as they are right now, at least) -- we must adapt them.
+The prediction models used for the digital twin are the same as those used for the datacenter simulator.
+Since a digital twin is not a standalone simulator, a change to how we both predict and model failures is necessary.
+
+
+Because of judgement born out of experience, evolution of existing datacenters is fairly successful; however, the development of new, modern datacenters is fraught with unexpected problems that result in weight growth, schedule delays and cost overruns.
+Optimal datacenter management is characterized by high service availability and low downtime.
+Achieving this in a 21\textsuperscript{st} century datacenter requires revolutionary changes in the way datacenters are operated and maintained.
+A concept that creates just such a revolutionary change is the \gls{dcdt}.
+
+\input{sources/dt_features_comparison.tex}
+
+ExaDigiT~\cite{DBLP:conf/sc/BrewerMKWBHSGGW24} is an open-source framework for developing digital twins of supercomputers.
It consists of 3 modules:
\begin{enumerate*}[label=(\arabic*)]
\item resource allocator and power simulator
@@ -21,23 +95,23 @@ It consists of 3 modules:
\item augmented reality 3D model
\end{enumerate*}
of the supercomputer.
-\gls{ed} has been used at the Frontier supercomputer at the Oak Ridge National Laboratory in the USA, successfully predicting potential energy losses at the supercomputer.
+ExaDigiT has been used at the Frontier supercomputer at the Oak Ridge National Laboratory in the USA, successfully predicting potential energy losses at the supercomputer.
Brewer \etal include alongside the framework architecture an open-source artifact and a set of extensive verification and validation experiments.
-The authors differentiate between different digital twins within \gls{ed}, such as \begin{enumerate*}[label=(\arabic*)]
+The authors differentiate between different digital twins within ExaDigiT, such as \begin{enumerate*}[label=(\arabic*)]
\item descriptive twin
\item informative twin
\item predictive twin
\item comprehensive twin
\item autonomous twin
\end{enumerate*}
-that together form the \gls{ed}.
+that together form the system.
The \emph{predictive twin} leverages data driven operational analytics to create \gls{ml} models. Authors argue that alongside simulation, \gls{ml} models should also have a significant role for modeling system workloads in \eg application fingerprinting.
Within the \emph{autonomous twin} the authors use \gls{rl} to train agents that can be used to make control decisions in order to optimize different processes.
In order to model the cooling system the authors use the Modelica software, and to predict energy power draw they coded a Python script.
The authors provide an intuitive way to interact with the system using a visual dashboard, and an advanced augmented reality model.
The authors posit that the best way to address the 3V's of data (velocity, volume and variety) is to use augmented reality coupled with dashboards.
-SmarDC~\cite{DBLP:conf/noms/ZhangZLZWC22} is a digital twin solution for optimization of power consumption in datacenters.
+SmartDC~\cite{DBLP:conf/noms/ZhangZLZWC22} is a digital twin solution for optimization of power consumption in datacenters.
Specifically, Zhang \etal propose that using \gls{ai} enhanced modeling paired with digital twinning can help make dynamic adjustments to the datacenter cooling subsystem.
SmartDC has been shown to achieve an energy-saving rate of 41\% at a China Telecom datacenter.
However, the main purpose of SmartDC is not to continuously interact with the facility, but to provide additional training data for a more accurate, \gls{ml} solution.
@@ -53,19 +127,6 @@ DyTwin~\cite{DBLP:conf/sc/TaheriBPRHDEWPM24} is an adaptive digital twin with vi
% Documentation: https://learn.microsoft.com/en-us/azure/digital-twins/
% Moreover, NVIDIA is doing too as well https://www.nvidia.com/en-sg/omniverse/
-Predictive modelling uses statistics to predict outcomes.
-When deployed commercially, for example in datacenters, predictive modelling is often referred to as predictive analytics~\cite{Wikipedia:PredictiveModelling}.
-Almost any statistical model can be used for prediction purposes, but nowadays predictive analysis is synonymous with machine learning.
-A primary example of popular analysis type is linear regression.
-A major limitation of predictive analytics is that history cannot always predict the future.
-Using historical data to predict outcomes works only under the assumption that there are certain long lasting patterns in the system.
-Additionally, no matter how extensive is the training data, there is always the possibility of new variables that have not been considered or even defined, yet are critical to the outcome of the prediction~\cite{Wikipedia:PredictiveModelling}.
-
-%Here you have to cite Deisenroth, 2024, chapter 8.1.4.
-An inference function is a machine learning model which uses probabilistic parameter estimation~\cite{}.
-A prime example of using probability to find a good machine learning model is Bayesian inference.
-% Stanford Encyclopedia of Philosophy, Douven 2017
-The process of inference from data to provide the best explanation is called abduction.
%Include something about data-preprocessing in the pipeline.
%See the article by Fei Tao
@@ -78,55 +139,3 @@ The process of inference from data to provide the best explanation is called abd
\end{figure}
-\section{Datacenters}\label{ss:datacenters}
-\subsection{A Primer on Datacenter simulation}\label{sss:simulation}
-\input{sources/dt_features_comparison.tex}
-
-Explain the high risk phenomena that occur in datacenters, which includes failures.
-% Ask Jesse if you can have both of such tables in this section
-\input{sources/simulator_comparison.tex}
-
-One of the key arguments that speak for a datacenter digital twin is that datacenters already connect hundreds of monitoring sensors and data coming from them.
-Monitoring of server racks, VM's, CPU profiling and all that give us lots of data.
-
-Data analytics, such as ODA can give actual meaningful insights into what we are doing.
-Moreover, advanced technologies have made sensors, IoT give us much information.
-ODA can predict failures, help maintain the equipment, save bills, cut costs.
-But currently one of the key challenges is to somehow connect the physical and virtual spaces.
-The answer to how to do this is a digital twin.
-
-%[citation needed]
-
-As of 2026, there is a lack of consensus of what is a digital twin.
-By proxy, there is neither consensus on what is the definition of a datacenter digital twin.
-A generic definition is needed.
-
-
-
-\subsection{Failures}\label{sss:failures}
-%Why predictive analytics? Why predictive behaviour?
-
-%What is below here is true, but nonetheless the argumentation should be slightly changed. And a citation is needed.
-However, there has been little effor made to integrate analytics that enable consistent and relaible prediction of datacenter behaviour into a holistic digital twin of a datacenter.
-Nor has the fidelity of failure modeling inside a datacenter simulation increased.
-The failure model is still a linear model.
-% Since a datacenter simulator is quite different from a digital twin, we cannot use the same computation methods (not as they are right now, at least) -- we must adapt them.
-The prediciton models are the same ones for the digital twin as the ones used for the datacenter simulator.
-Since a digital twin is not a standalone simulator, a change to how we both predict and model failures is necessary.
-
-The longer the DT is working, the more accurate its predictions.
-All the results are aggregated.
-% Why has not anyone done this before?
-It is also the case that currently this is possible only and only because of the recent development in High Performance Computing.
-Between 2003 and 2011 the compute needed to run a Digital Twin was simply not there.
-As such, while the concept existed, the hardware did not catch up yet.
-However, in the last decade, multicore computing paradigms and the advent of GPU computing has finally enabled computation needed to run a Digital Twin.
-This is what has changed, so that today running a digital twin is relevant, much more relevant than it was 10 years ago.
-This is also why nobody has done a Digital Twin of a datacenter before.
-The current widespread availability of HPC makes this possible.
-
-
-Because of judgement born out of experience, evolution of existing datacenters is fairly successful; however the development of a new, modern datacenters is fraught with unexpected problems that results in weight growth, schedule delays and cost overruns.
-Optimal datacenter management is characterized by high service availability and low downtime.
-Achieving this in a 21\textsuperscript{st} century datacenter requires revolutionary changes in the way datacenters are operated and maintained.
-A concept that creates just such a revolutionary change is the \gls{dcdt}.
diff --git a/content/intro.tex b/content/intro.tex
index 35ff21e..2a077bc 100644
--- a/content/intro.tex
+++ b/content/intro.tex
@@ -28,13 +28,10 @@ To address this new problem a concept of a datacenter \gls{dt} was proposed~\cit
\begin{figure}
\centering
- \includegraphics[width=0.95\linewidth]{images/five_dimensional_dt.pdf}
- \caption{A basic framework for the \gls{dt}. Four core elements of a \gls{dt} are defined: The physical entity \one and the simulated virtual twin \two. A service for out-of-band data analytics \three and a persistent storage of historical data \four are crucial to the \gls{dt} because they are necessary to gain meaningful monitoring insights. Adapted from Tao \etal ~\cite{DBLP:conf/cirp/TAO2018169}.}
- %Fei Tao is a renowned figure with over 62k citations. He is a figure of authority on digital twins.%
- \label{fig:five_dimensional_dt}
+ \includegraphics[width=0.8\linewidth]{images/simple_dt.pdf}
+ \caption{Elements of the digital twin ecosystem~\cite{DBLP:modsim24/presentation/Iosup2024}.}
+ \label{fig:simple_dt}
\end{figure}
-% (3) in the original paper by Fei Tao is referenced to just `Services`.
-% Nonetheless I name them here as Data Analysis Services, because what Fei Tao lists (e.g., fault detection, fault determination, fault-tolerant management, maintenance) is inherently reliant on good data analytics.
\section{Context}\label{s:context}
@@ -49,7 +46,7 @@ The \gls{dt} can reliably manage the health of the physical entity by detecting
This allows maintenance to be scheduled proactively, reducing unplanned downtime and preventing catastrophic failures.
Forecasting future maintenance and managing the physical health of an object or facility are the prime purpose of many \gls{dt}s used in practice~\cite{DBLP:conf/AIAA/Teugel2012}.
-The first mention of a \gls{dt} dates back to 2003, when Dr. Michael Grieves of Dassault Syst\'emes introduced the 3 core components of a \gls{dt}: the virtual entity, physical entity and the two-way connection (see Figure \ref{fig:five_dimensional_dt}).
+The first mention of a \gls{dt} dates back to 2003, when Dr. Michael Grieves of Dassault Syst\'emes introduced the 3 core components of a \gls{dt}: the virtual entity, physical entity and the two-way connection (see Figure \ref{fig:simple_dt}).
Due to insufficient technological foundations, little work is available on \gls{dt}s between 2003 and 2018, and it is only with the rapid growth of cloud computing, \gls{iot} and Big Data analytics that \gls{dt}s have re-emerged.
Today, research is focused on bridging the gap between the long-established foundations of \gls{dt}s and new, novel applications in academia and industry, such as the \gls{dcdt}~\cite{DBLP:conf/cirp/TAO2018169, DBLP:journals/computer/AthavaleBBMMPS24}.