Diffstat (limited to 'content')
-rw-r--r-- content/background.tex | 87
-rw-r--r-- content/intro.tex      |  9
2 files changed, 94 insertions, 2 deletions
diff --git a/content/background.tex b/content/background.tex
index 85f3c57..c88367f 100644
--- a/content/background.tex
+++ b/content/background.tex
@@ -2,10 +2,38 @@ \section{Datacenters}\label{ss:datacenters}
 Explain the high risk phenomena that occur in datacenters, which includes failures.


-\subsection{Failures}
+\subsection{Failures}\label{sss:failures}
+


 \section{Digital Twinning}\label{ss:digital-twinning}
+
+\gls{ed} is an open-source framework for developing digital twins of supercomputers.
+It consists of three modules:
+\begin{enumerate*}[label=(\arabic*)]
+    \item a resource allocator and power simulator,
+    \item a thermal cooling model, and
+    \item an augmented reality 3D model
+\end{enumerate*}
+of the supercomputer.
+\gls{ed} has been used at the Frontier supercomputer at Oak Ridge National Laboratory in the USA, where it successfully predicted potential energy losses.
+Alongside the framework architecture, Brewer \etal provide an open-source artifact and an extensive set of verification and validation experiments.
+The authors distinguish between several kinds of digital twin within \gls{ed}, namely the \begin{enumerate*}[label=(\arabic*)]
+    \item descriptive twin,
+    \item informative twin,
+    \item predictive twin,
+    \item comprehensive twin, and
+    \item autonomous twin,
+\end{enumerate*}
+which together form \gls{ed}.
+The \emph{predictive twin} leverages data-driven operational analytics to build \gls{ml} models.
+The authors argue that, alongside simulation, \gls{ml} models should play a significant role in modelling system workloads, \eg for application fingerprinting.
+Within the \emph{autonomous twin}, the authors use \gls{rl} to train agents that make control decisions to optimize various processes.
+The cooling system is modelled with the Modelica modelling language, while power draw is predicted with a custom Python script.
+Users can interact with the system through an intuitive visual dashboard and an advanced augmented reality model.
+The authors posit that augmented reality coupled with dashboards is the best way to address the three Vs of data: velocity, volume, and variety.
+
+
+
 Predictive modelling uses statistics to predict outcomes.
 When deployed commercially, for example in datacenters, predictive modelling is often referred to as predictive analytics~\cite{Wikipedia:PredictiveModelling}.
 Almost any statistical model can be used for prediction purposes, but nowadays predictive analysis is synonymous with machine learning.
@@ -23,6 +51,62 @@ The process of inference from data to provide the best explanation is called abd
 %Include something about data-preprocessing in the pipeline.
 %See the article by Fei Tao

+\subsection{Datacenter simulation}\label{sss:simulation}
+
+\begin{table}[h]
+    \centering
+    \renewcommand{\arraystretch}{1.4}
+    \begin{tabular}{m{0.7\linewidth}cc}
+        \toprule
+        Feature & \gls{ed} & \\
+        \midrule
+        Virtual prototyping & & \\
+        Scenario exploration & & \\
+        3D facility modelling & & \\
+        Predictive maintenance & & \\
+        Predictive energy modelling & & \\
+        Reliability and availability modelling & & \\
+        Cooling modelling & & \\
+        Network modelling & & \\
+        Predictive modelling & & \\
+        Power consumption modelling & & \\
+        Visual analytics dashboard & & \\
+        Forensic analysis and diagnostics & & \\
+        Failure detection & & \\
+        Operational optimization & & \\
+        Resource allocation & & \\
+        \bottomrule
+    \end{tabular}
+    \caption{Comparison of selected features of existing datacenter digital twins.}
+\end{table}
+
+
+\begin{table}[h]
+    \centering
+    \renewcommand{\arraystretch}{1.4}
+    \begin{tabular}{cccm{0.3\linewidth}c}
+        \toprule
+        Project & Environment & Stakeholders & Highlighted Features & GUI \\
+        \midrule
+
+        CloudSim & Cloud, Fog, Edge & Research & VC\textsuperscript{$\star$}, N, S, E, WF, FD, EXP, CM, PI & \ding{51}\textsuperscript{$\dagger$} \\
+        \midrule
+        SimGrid & Grid, P2P, Cloud & Research, Edu. & VC\textsuperscript{$\star$}, N\textsuperscript{$\star$}, S, E\textsuperscript{$\star$}, WF\textsuperscript{$\star$} & \ding{51}\textsuperscript{$\dagger$} \\
+        \midrule
+        DGSim & Grid & Research & WF, F, EXP & \ding{55} \\
+        \midrule
+        GroudSim & Grid, Cloud & Research & WF, CM, F & \ding{55} \\
+        \midrule
+        iCanCloud & Cloud & Research & VC, N\textsuperscript{$\star$}, S, CM & \ding{51}\textsuperscript{$\star$} \\
+        \midrule
+        \textbf{OpenDC} & Cloud & Research, Edu. & VC\textsuperscript{$\star$}, N, S, E\textsuperscript{$\star$}, CM, FS\textsuperscript{$\star$}, ML, WF, F\textsuperscript{$\star$}, PI, EXP\textsuperscript{$\star$} & \ding{51}\textsuperscript{$\star$} \\
+        \bottomrule
+    \end{tabular}
+    \caption{Comparison of selected datacenter simulators. \textbf{Models:} VC = VMs and containers, N = Network, S = Storage, E = Energy, CM = Cost models, FS = FaaS, ML = Machine learning, WF = Workflows, FD = Federation; \textbf{Phenomena:} F = Failures, PI = Performance interference; \textbf{Tools:} EXP = Experiment automation; \textbf{Support:} \ding{51} = Yes, \ding{55} = No; $\dagger$ = extension, not integrated; $\star$ = advanced, carefully calibrated feature. Adapted from Mastenbroek \etal.}
+\end{table}
+
+
+

 One of the key arguments that speak for a datacenter digital twin is that datacenters already connect hundreds of monitoring sensors and data coming from them.
 Monitoring of server racks, VM's, CPU profiling and all that give us lots of data.
@@ -32,7 +116,6 @@ ODA can predict failures, help maintain the equipment, save bills, cut costs.
 But currently one of the key challenges is to somehow connect the physical and virtual spaces.
 The answer to how to do this is a digital twin.

-
 %[citation needed]
 As of 2026, there is a lack of consensus of what is a digital twin.
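The background diff above describes predictive modelling as using statistics to predict outcomes, and notes that ExaDigiT predicts power draw with a Python script. As a minimal sketch of that idea, and explicitly not the ExaDigiT implementation, the Python below fits the common linear server power model P(u) = P_idle + (P_max - P_idle) * u to (utilization, power) telemetry by ordinary least squares and then predicts power at a new utilization. The linear model, the names `LinearPowerModel` and `fit_linear_power_model`, and the sample readings are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinearPowerModel:
    """Hypothetical linear server power model P(u) = p_idle + p_dynamic * u."""
    p_idle: float     # estimated power draw at 0 % utilization (W)
    p_dynamic: float  # estimated extra draw at 100 % utilization, i.e. P_max - P_idle (W)

    def predict(self, utilization: float) -> float:
        return self.p_idle + self.p_dynamic * utilization


def fit_linear_power_model(util: Sequence[float], power: Sequence[float]) -> LinearPowerModel:
    """Ordinary least-squares fit of power = p_idle + p_dynamic * util."""
    if len(util) != len(power) or len(util) < 2:
        raise ValueError("need at least two paired (utilization, power) samples")
    n = len(util)
    mean_u = sum(util) / n
    mean_p = sum(power) / n
    s_uu = sum((u - mean_u) ** 2 for u in util)
    if s_uu == 0:
        raise ValueError("utilization samples must not all be identical")
    s_up = sum((u - mean_u) * (p - mean_p) for u, p in zip(util, power))
    slope = s_up / s_uu                  # least-squares estimate of p_dynamic
    intercept = mean_p - slope * mean_u  # least-squares estimate of p_idle
    return LinearPowerModel(p_idle=intercept, p_dynamic=slope)


if __name__ == "__main__":
    # Hypothetical telemetry for a single node: CPU utilization in [0, 1], power in watts.
    util = [0.0, 0.25, 0.5, 0.75, 1.0]
    power = [100.0, 150.0, 200.0, 250.0, 300.0]
    model = fit_linear_power_model(util, power)
    print(model.p_idle, model.p_dynamic)  # 100.0 200.0
    print(model.predict(0.6))
```

On these noise-free sample readings the fit recovers an idle draw of 100 W and a dynamic range of 200 W. Real telemetry is noisy and rarely this linear, so a richer regressor (for example one of the ML models the predictive twin relies on) could replace the closed-form fit while keeping the same fit/predict interface.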
diff --git a/content/intro.tex b/content/intro.tex
index 0aa5b84..6b92521 100644
--- a/content/intro.tex
+++ b/content/intro.tex
@@ -65,6 +65,8 @@ Many \gls{dcdt} frameworks still lack critical data analysis components, fault d
 Such limitations gravely reduce the applicability of \gls{dcdt}'s in real world scenarios~\cite{DBLP:journals/corr/IosupKLVG22}.
 \gls{dcdt}'s are urgently needed, because datacenters exhibit hundreds unexpected events every day,such as \eg service failures or hardware faults.
 Downtime, which is the result of failures, disturbs the users and produces unfulfilled \gls{sla}~\cite{DBLP:conf/acsos/TalluriOVTI21}.
+% On the operational side, two main areas have been instrumental in improving datacenter efficiency: simulation and analysis of system telemetry. Further improvements require innovative tools that focus on end-to-end improvement, such as digital twins~\cite{DBLP:ExaDigiT}.
+% DTs merge simulation and telemetry into a holistic virtual representation of the system, bridging the physical and virtual worlds.
 However, predicting datacenter behaviour quickly and reliably is a non-trivial problem that remains insufficiently unaddressed in the existing \gls{dcdt} architectures ~\cite{DBLP:conf/wosp/SumanCNTMI24, DBLP:journals/computer/AthavaleBBMMPS24} and deployments~\cite{DBLP:conf/sc/BrewerMKWBHSGGW24}.
@@ -116,6 +118,13 @@ To answer the third research question, we will need to design comprehensive expe


 \section{Thesis Contributions}\label{s:thesis-contributions}
+
+\begin{enumerate}[label=\textbf{C\arabic*.}, align=left]
+    \item An open-source \gls{dcdt} prototype for predictive facility maintenance, with data analysis supported by in-band and out-of-band telemetry and discrete-event simulation.
+    \item An extensive set of evaluation and validation experiments for the system.
+    \item A demonstration of the \gls{dcdt} paired with a simulated datacenter.
+\end{enumerate}
+
 \section{Plagiarism Declaration}\label{s:plagiarism-declaraion}
 I hereby declare that this thesis is my own independent work and writing.
 The thesis does not contain any material copied from other sources (person, Internet, or AI), and has not been submitted for assessment elsewhere.
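Contribution C1 relies on discrete-event simulation, and the introduction links failures to downtime and unfulfilled SLAs. The Python below is a minimal, hypothetical sketch of that combination, not the thesis prototype: it simulates node failures and repairs by popping events from a priority queue (`heapq`) and reports the fraction of node-hours that were available, which can then be compared with an SLA target. Exponentially distributed time-to-failure and time-to-repair, the parameters `mtbf` and `mttr` (in hours), the SLA target, and the function name `simulate_availability` are all assumptions made for illustration.

```python
import heapq
import random


def simulate_availability(num_nodes: int, mtbf: float, mttr: float,
                          horizon: float, seed: int = 42) -> float:
    """Fraction of node-hours available over [0, horizon) in a failure/repair DES.

    Hypothetical model: uptimes ~ Exp(mean=mtbf), repair times ~ Exp(mean=mttr).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    # Event queue of (time, node_id, kind); heapq always pops the earliest event.
    events = [(rng.expovariate(1.0 / mtbf), node, "fail") for node in range(num_nodes)]
    heapq.heapify(events)
    down_since = {}  # node_id -> start time of its ongoing outage
    downtime = 0.0   # accumulated node-hours spent down

    while events:
        now, node, kind = heapq.heappop(events)
        if now >= horizon:
            break  # every remaining event also lies at or beyond the horizon
        if kind == "fail":
            down_since[node] = now
            # Schedule the matching repair.
            heapq.heappush(events, (now + rng.expovariate(1.0 / mttr), node, "repair"))
        else:
            downtime += now - down_since.pop(node)
            # After repair, the node runs until its next failure.
            heapq.heappush(events, (now + rng.expovariate(1.0 / mtbf), node, "fail"))

    # Outages still open at the horizon are counted only up to the horizon.
    downtime += sum(horizon - start for start in down_since.values())
    return 1.0 - downtime / (num_nodes * horizon)


if __name__ == "__main__":
    availability = simulate_availability(num_nodes=100, mtbf=1000.0, mttr=4.0, horizon=8760.0)
    sla_target = 0.999  # hypothetical SLA availability target
    print(f"availability={availability:.5f}, meets SLA: {availability >= sla_target}")
```

With these illustrative parameters the result should land close to the steady-state value MTBF/(MTBF+MTTR) = 1000/1004, roughly 0.996, which would miss a 99.9% target. Keeping events in a priority queue lets the simulation jump directly from one failure or repair to the next instead of stepping through every hour of the horizon.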
