author mjkwiatkowski <mati.rewa@gmail.com> 2026-05-17 14:21:09 +0200
committer mjkwiatkowski <mati.rewa@gmail.com> 2026-05-17 14:21:09 +0200
commit a5a140c6286e8b113ca8d371f88e3ed54e731cea
tree cd648c36df09d30c217166865a81a0c4e523932b /content
parent a4102d0252236e85b2813160b4b11e3a19a00d62
feat: added lots of citations and slowly finishing the introduction
Diffstat (limited to 'content')
-rw-r--r-- content/background.tex | 78
-rw-r--r-- content/intro.tex | 76
2 files changed, 132 insertions, 22 deletions
diff --git a/content/background.tex b/content/background.tex
index 03c924d..e65dc17 100644
--- a/content/background.tex
+++ b/content/background.tex
@@ -1,10 +1,78 @@
\chapter{Background}\label{s:background}
+Predictive modelling uses statistics to predict outcomes.
+When deployed commercially, for example in datacenters, predictive modelling is often referred to as predictive analytics~\cite{Wikipedia:PredictiveModelling}.
+Almost any statistical model can be used for prediction purposes, but nowadays predictive analytics is often treated as synonymous with machine learning.
+A prime example of a popular predictive technique is linear regression.
+A major limitation of predictive analytics is that history cannot always predict the future.
+Using historical data to predict outcomes works only under the assumption that there are certain long-lasting patterns in the system.
+Additionally, no matter how extensive the training data is, there is always the possibility of new variables that have not been considered or even defined, yet are critical to the outcome of the prediction~\cite{Wikipedia:PredictiveModelling}.
+
+%Here you have to cite Deisenroth, 2024, chapter 8.1.4.
+An inference function is a machine learning model which uses probabilistic parameter estimation~\cite{}.
+A prime example of using probability to find a good machine learning model is Bayesian inference.
+% Stanford Encyclopedia of Philosophy, Douven 2017
+The process of inference from data to provide the best explanation is called abduction.
+
+
+
+A \gls{dt} is a digital model of an intended or actual real-world system that serves as its digital counterpart for purposes such as simulation, integration, testing, monitoring and maintenance. %cite the Wikipedia page here!
+The twin requires real-time synchronization with the actual system.
+A closed loop of continuous feedback exists between the digital twin and the physical object.
+
+The digital twin replicates the physical system to predict failures and opportunities for change, to prescribe real-time actions that optimize operation or mitigate unexpected events, and to observe and evaluate the profile of the system.
+
+A digital twin is often called a virtual twin.
+
+The communication between a physical entity and the digital twin is referred to as a digital thread.
+
+One key application is predictive maintenance, where the digital twin analyzes operational data (e.g., temperature, vibration) to predict when a component is likely to fail.
+
+This allows maintenance to be scheduled proactively, reducing unplanned downtime and preventing catastrophic failures.
+
+%Include something about data-preprocessing in the pipeline.
+%See the article by Fei Tao
+
+One of the key arguments for a datacenter digital twin is that datacenters already connect hundreds of monitoring sensors and collect the data they produce.
+Monitoring of server racks and VMs, together with CPU profiling, already yields large volumes of data.
+
+Data analytics, such as ODA, can turn this data into meaningful insights into system behaviour.
+Moreover, advances in sensor and IoT technology make much more information available.
+ODA can help predict failures, maintain equipment and cut operating costs.
+However, a key open challenge is connecting the physical and virtual spaces.
+A digital twin addresses this challenge.
+
+Since DTs are a relatively new concept, they warrant a short introduction to their history.
+It suffices to mention that Grieves first presented the concept in 2003, that the number of papers grew slowly between 2003 and 2018 (around 50), and that DTs are now re-emerging.
+
+%You must include the DT white paper from 2014.
+
+The concept of a \gls{dt} dates back to 2003, when Dr. Michael Grieves of Dassault Syst\`emes introduced the three core components of a \gls{dt}: the virtual entity, the physical entity and the two-way connection between them (see Figure~\ref{fig:five_dimensional_dt}).
+Due to insufficient technological foundations, little work is available on \gls{dt}s between 2003 and 2018~\cite{DBLP:conf/cirp/TAO2018169}, and it is only with the rapid growth of cloud computing, \gls{iot} and big data analytics that \gls{dt}s have re-emerged.
+Today, research is focused on bridging the gap between the long-established foundations of \gls{dt}s and novel applications in academia and industry, such as the \gls{dcdt}.
+%[citation needed]
+
+As of 2026, there is no consensus on what constitutes a digital twin.
+By extension, there is no consensus on the definition of a datacenter digital twin either.
+A generic definition is needed.
+
+
+Most \gls{dt} applications are related to prognostics and health management.
+
+
+One of the many applications of \gls{dt}s is timely system maintenance.
+In aerospace engineering, the \gls{dt} can reliably manage the health of the physical entity by detecting, \eg, fatigue cracks on aircraft wings or damage to wind turbine blades~\cite{DBLP:conf/cirp/TAO2018169}.
+Forecasting future maintenance and managing system health virtually are the prime purposes of many \gls{dt}s~\cite{DBLP:conf/AIAA/Teugel2012}.
+
+Optimal datacenter management is characterized by high service availability and low downtime.
+However, achieving this in a 21\textsuperscript{st} century datacenter requires revolutionary changes in the way datacenters are operated and maintained.
+A concept that creates just such a revolutionary change is the \gls{dcdt}.
+% This sentence is stolen from an article.
+% Make sure to paraphrase it.
+
+% This is stolen from the AIAA article.
+% Make sure to paraphrase this.
+
-\todo{
-This section provides the necessary context to help the reader understand the
-remainder of the thesis.
-}
-\lipsum[1-8]
diff --git a/content/intro.tex b/content/intro.tex
index 58a759a..246d8bf 100644
--- a/content/intro.tex
+++ b/content/intro.tex
@@ -1,39 +1,81 @@
\chapter{Introduction}\label{s:intro}
-Today's transportation systems, education and government largely depend on server-side services, which are hosted in datacentres~\cite{DBLP:journals/corr/IosupKLVG22}.
-To facilitate the rising demand managers expand datacenters with new components and more heterogenous architectures (e.g., GPUs and NPUs)~\cite{DBLP:conf/date/MilojicicFDR21}.
+Modern society is a technological society.
+Presently, computer and network ecosystems play a crucial part not only in the digital industry, but also in everyone's daily lives.
+Today, the transport, education and government sectors largely depend on server-side services, which are hosted in datacenters~\cite{DBLP:journals/corr/IosupKLVG22}.
+To address the recent rise in demand due to the \gls{ai} revolution, managers expand datacenters with new components and more heterogeneous architectures (e.g., GPUs and NPUs)~\cite{DBLP:conf/date/MilojicicFDR21}.
However, in return datacenter complexity increases significantly.
-To make better operational decisions despite the massive scale, new, promising technologies arise, such as datacenter Digital Twins.
+To support better operational decisions despite the massive scale, promising technologies such as the \gls{dcdt} are emerging.
\section{Context}\label{s:context}
-Datacenters are one of the most important components of the digital society.
-For example, over 25\% of professionals in the Netherlands depend on cloud services in their everyday work.
-Faced with growing demand, this fraction will exceed 35\% by 2025~\cite{DBLP:journals/corr/IosupKLVG22}.
-What is more, the surge of AI and Machine Learning workloads opens the need for versatile server architectures, pushing datacenter managers to meet customer expectations by adding more specialized hardware~\cite{DBLP:conf/date/MilojicicFDR21}.
-In return, operating a modern datacenter with thousands of diversified servers presents a yet unsolved, non-trivial challenge that requires fast and well-informed decisions from on-site engineers.
+% Why is it important?
+Datacenters house large numbers of computers for processing and storing data from various organizations and fields of activity.
+Worldwide, 76\% of large companies spend more than 5 million USD on hosted services each month, making datacenters one of the most important components of the digital society~\cite{DBLP:report/Flexera2026}.
+Additionally, in the Netherlands alone over 25\% of professionals depend on cloud services in their everyday work.
+Faced with growing demand, this fraction was projected to exceed 35\% by 2025~\cite{DBLP:journals/corr/IosupKLVG22}.
+% Why is this a problem now?
-To aid in datacenter management, operators turn to \gls{oda}, which is the process of analyzing monitoring data to gain insights into the system behavior.
-For example, OMNI at \gls{nersc} and Wintermute at \gls{lrz} employ descriptive analytics to optimize power usage effectivenes~\cite{DBLP:conf/icppw/BourassaJBCJVS19} and prescriptive analysis for energy efficient scheduling~\cite{DBLP:conf/hpdc/NettiMGOTO020}.
-Nonetheless, we observe a critical lack of predictive analysis capabilities~\cite{DBLP:conf/wosp/SumanCNTMI24} among the existing \gls{oda} frameworks.
-In result, datacenter operators are often confronted with operational decisions with limited time to react, which can lead to missed \gls{sla}.
+The increasing popularity of \gls{genai} and monthly releases of powerful \gls{llm}s have driven the demand for datacenter services over the past four years.
+In the \gls{ai} economy, datacenters need diverse and scalable server architectures, because inference-based workloads require more heterogeneous server components (GPUs, TPUs, NPUs, \etc) to perform well.
+As such, datacenter operators try to meet customer expectations by adding more specialized hardware~\cite{DBLP:conf/date/MilojicicFDR21}, at the cost of increased system complexity.
+As a result, operating a modern warehouse-scale datacenter with thousands of diversified servers presents a difficult challenge that requires fast and well-informed decisions from on-site engineers.
-``Lab-built, preproduction, or early hardware does \textit{not} work as defined, does \textit{not} work reliably and does \textit{not} stay the same from day to day'',
-according to Frederick P. Brooks.
-A solution is a dependable simulator of the system~\cite{DBLP:books/daglib/Brooks0080747}.
-A novel improvement on simulation is a datacenter \gls{dt}~\cite{DBLP:journals/computer/AthavaleBBMMPS24}.
+Quick and correct decision-making in a 21\textsuperscript{st}-century datacenter is a hard task.
+Unexpected events, such as service failures or hardware faults, often result in downtime that disrupts users and leads to violated \gls{sla}s.
+What is more, the rapid expansion of datacenters increases the incidence of failures across all cloud services~\cite{DBLP:conf/acsos/TalluriOVTI21}.
+Preventing service outages in advance could help datacenter operators substantially reduce operational costs, as over 20\% of all reported failure-caused outages cost more than 1 million USD~\cite{DBLP:report/AnnualOutageAnalysis2025}.
+However, predicting datacenter behaviour quickly and reliably is a non-trivial problem that remains insufficiently addressed~\cite{DBLP:conf/wosp/SumanCNTMI24}.
+\begin{figure}
+ \centering
+ \includegraphics[width=0.95\linewidth]{images/five_dimensional_dt.pdf}
+ \caption{A basic framework for the \gls{dt}. Four core elements of a \gls{dt} are defined: the physical entity \one, the simulated virtual twin \two, a service for out-of-band data analytics \three and a persistent storage of historical data \four. The latter two are crucial to the \gls{dt} because they are necessary to gain meaningful monitoring insights. Adapted from Tao \etal~\cite{DBLP:conf/cirp/TAO2018169}.}
+ %Fei Tao is a renowned figure with over 62k citations. He is a figure of authority on digital twins.%
+ \label{fig:five_dimensional_dt}
+\end{figure}
+% (3) in the original paper by Fei Tao is referenced to just `Services`.
+% Nonetheless I name them here as Data Analysis Services, because what Fei Tao lists (e.g., fault detection, fault determination, fault-tolerant management, maintenance) is inherently reliant on good data analytics.
+The expanding \gls{ai} economy and the end of Moore's law have resulted in the rise of more heterogeneous datacenter architectures~\cite{DBLP:conf/date/MilojicicFDR21}.
+This means that in modern datacenters there are more server racks and each rack may contain multiple different hardware architectures.
+These developments have created a need for:
+\begin{enumerate}
+ \item More careful datacenter management to tackle the unprecedented complexity
+ \item Greater availability of cloud services
+ \item Less downtime and lower electricity costs
+\end{enumerate}
+Specific goals that can help satisfy these needs are:
+\begin{enumerate}
+ \item Reducing the downtime of failure-caused outages
+ \item Maximizing the monitoring insights that support better-informed operational decisions
+ \item Minimizing the downtime caused by server maintenance and hardware inspections
+\end{enumerate}
+
+A \gls{dcdt} mirrors the structure, context and behaviour of a datacenter~\cite{DBLP:journals/computer/AthavaleBBMMPS24}.
+Crucial to \gls{dt} operation are predictive capabilities and the continuous interaction with the real-world datacenter.
+Digital twin deployments already exist.
+For example, ExaDigiT~\cite{DBLP:conf/sc/BrewerMKWBHSGGW24} is a framework for digital twin development of supercomputers.
+It has been demonstrated on the Frontier supercomputer and facilitates virtual prototyping and system optimization; however, it lacks core \gls{dt} functions, such as reliable predictive analytics.
\section{Problem statement}\label{s:problem-statement}
+In this work, we argue that current state-of-the-art ICT digital twins lack the predictive capabilities that are essential to real-time facility management.
+We propose that digital twinning can be enhanced by integrating \gls{oda} through predictive analytics.
+
\section{Research Questions}\label{s:research-questions}
+We divide the problem of enabling predictive analytics using digital twinning into three research questions:
+\begin{enumerate}[label=\textbf{RQ\arabic*.}, align=left]
+ \item \textbf{How to define 5 \gls{dcdt} use-cases and their functional and non-functional requirements?}
+ \item \textbf{How to design a \gls{dcdt} system model using discrete-event simulation and operational data analysis?}
+ \item \textbf{How to validate whether the \gls{dcdt} system meets the functional and non-functional requirements?}
+\end{enumerate}
\section{Research Methodology}\label{s:research-methodology}
\section{Thesis Contributions}\label{s:thesis-contributions}
\section{Plagiarism Declaration}\label{s:plagiarism-declaraion}
-I hereby declare that this thesis is my own independent work and writing.
+I hereby declare that this thesis is my own independent work and writing.
The thesis does not contain any material copied from other sources (person, Internet, or AI), and has not been submitted for assessment elsewhere.
\section{Societal Impact}\label{s:societal-impact}