This change updates the simulator implementation to flush any pending
progress when the hypervisor counters are accessed. Previously, if the
counters were accessed while the mux or consumer was in progress, their
values were not accurate.
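The issue can be illustrated with a minimal sketch in which a counter integrates progress lazily. The `LazyCounter` class and its methods are hypothetical and not part of OpenDC. Because progress is only added to the total when it is flushed, a read that skips the flush misses the work done since the last update.

```java
/**
 * Hypothetical counter that accumulates progress lazily (not OpenDC code).
 * Progress made at the current rate is only added to the total on flush,
 * so reads must flush first to be accurate while a flow is in progress.
 */
final class LazyCounter {
    private double total;    // progress accounted for so far
    private double rate;     // current progress rate (units per ms)
    private long lastUpdate; // timestamp (ms) of the last flush

    LazyCounter(long now) {
        this.lastUpdate = now;
    }

    /** Integrate the progress made since the last flush into the total. */
    void flush(long now) {
        total += rate * (now - lastUpdate);
        lastUpdate = now;
    }

    /** Change the rate; progress made at the old rate is flushed first. */
    void setRate(double newRate, long now) {
        flush(now);
        rate = newRate;
    }

    /** Accurate read: flushes pending progress before returning the total. */
    double read(long now) {
        flush(now);
        return total;
    }

    /** Stale read without a flush, i.e. the behavior before the fix. */
    double readStale() {
        return total;
    }
}
```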
|
|
This change fixes an issue in ComputeServiceHelper that allowed users
to register multiple SimHost objects with the same UID.
See this issue for more information:
https://github.com/atlarge-research/opendc/issues/51
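The actual ComputeServiceHelper code is not shown here. The following is a minimal sketch of the kind of guard involved, in which a registration with an already-used UID is rejected instead of being silently accepted. The `Host` and `HostRegistry` types are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical stand-in for a simulated host; only its UID matters here. */
record Host(UUID uid, String name) {}

/** Hypothetical registry that rejects a second host with an existing UID. */
final class HostRegistry {
    private final Map<UUID, Host> hosts = new HashMap<>();

    void register(Host host) {
        // putIfAbsent leaves the map unchanged and returns the existing host if the UID is taken.
        Host existing = hosts.putIfAbsent(host.uid(), host);
        if (existing != null) {
            throw new IllegalArgumentException("Host with UID " + host.uid() + " is already registered");
        }
    }

    int size() {
        return hosts.size();
    }
}
```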
|
|
This change removes the opendc-platform module from the project. This
module represented a Java platform that was previously used for sharing
a set of dependency versions between subprojects. However, since we
adopted Gradle's version catalog, we no longer use the platform.
|
|
This change adds a new module, opendc-common, that contains
functionality shared across OpenDC's modules.
It also moves the existing utils module into this new module.
|
|
This change updates the OpenDC codebase to use OpenTelemetry v1.11,
which stabilizes the metrics API. This stabilization brings quite a few
breaking changes, which require significant changes throughout the
OpenDC codebase.
|
|
This change updates the SimMachine interface to drop the coroutine
requirement for running a workload on a machine. Users can now start a
workload asynchronously and receive notifications via the workload
callbacks.
Users can still suspend while a workload executes by using the new
`runWorkload` method, which is implemented on top of the new
`startWorkload` primitive.
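The layering described above, a waiting API built on a callback-based start primitive, can be sketched as follows. This is a hypothetical Java analogue, not the SimMachine interface. OpenDC's `runWorkload` is described as suspending, and here a `CompletableFuture` stands in for the Kotlin coroutine. `Machine`, `WorkloadListener` and `ThreadMachine` are illustrative names.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical callback notified when a workload stops. */
interface WorkloadListener {
    void onStop(Throwable cause); // cause is null on normal completion
}

/** Hypothetical machine whose only primitive is an asynchronous start. */
interface Machine {
    void startWorkload(Runnable workload, WorkloadListener listener);

    /** Built on startWorkload: the future completes when the workload stops. */
    default CompletableFuture<Void> runWorkload(Runnable workload) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        startWorkload(workload, cause -> {
            if (cause == null) {
                done.complete(null);
            } else {
                done.completeExceptionally(cause);
            }
        });
        return done;
    }
}

/** Toy machine that runs each workload on its own thread. */
final class ThreadMachine implements Machine {
    @Override
    public void startWorkload(Runnable workload, WorkloadListener listener) {
        new Thread(() -> {
            Throwable failure = null;
            try {
                workload.run();
            } catch (Throwable t) {
                failure = t;
            }
            listener.onStop(failure); // notify exactly once
        }).start();
    }
}
```

Callers that do not want to wait use `startWorkload` directly. Callers that do want to wait use `runWorkload(...).join()`, and the Kotlin equivalent would suspend at that point.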
|
|
This change redesigns the ComputeMonitor interface to reduce the number
of memory allocations necessary during a collection cycle.
|
|
This change adds support for collecting the provisioning time of virtual
machines in addition to their boot time.
|
|
This change allows users to create servers with a smaller CPU capacity
than the host, by specifying the CPU capacity via metadata. This also
allows filtering hosts based on their available CPU capacity.
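Filtering on available CPU capacity can be sketched as a simple predicate over host state. `HostView` and `CpuCapacityFilter` are hypothetical. The commit says that the requested capacity comes from server metadata, but it does not name the metadata key, so the request is passed in directly here.

```java
import java.util.List;

/** Hypothetical view of a host's CPU state, in MHz. */
record HostView(String name, double cpuCapacity, double cpuAllocated) {
    double availableCpu() {
        return cpuCapacity - cpuAllocated;
    }
}

final class CpuCapacityFilter {
    /** Keep only the hosts that still have at least the requested CPU capacity free. */
    static List<HostView> filter(List<HostView> hosts, double requestedCpu) {
        return hosts.stream()
                .filter(h -> h.availableCpu() >= requestedCpu)
                .toList();
    }
}
```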
|
|
This change improves the performance of the SimTraceWorkload class by
changing the way trace fragments are read and processed by the CPU
consumers.
|
|
This change optimizes the telemetry collection in the SimHost class.
Previously, collecting the metrics of this class and its associated
classes incurred significant overhead due to a large `Attributes` object
that did not cache the result of `hashCode()`. We now wrap this object
and manually cache the hash code.
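The wrapping technique can be sketched as follows. In OpenDC the wrapped type is OpenTelemetry's `Attributes`. To keep the example self-contained, this hypothetical `CachedAttributes` wraps a plain immutable `Map` instead. Because the wrapped object never changes, its hash code can be computed once at construction and reused on every lookup.

```java
import java.util.Map;

/**
 * Hypothetical wrapper that caches the hash code of an immutable attribute set,
 * so that repeated use as a hash-map key does not rehash every entry.
 */
final class CachedAttributes {
    private final Map<String, String> attributes;
    private final int hash;

    CachedAttributes(Map<String, String> attributes) {
        this.attributes = Map.copyOf(attributes); // immutable copy, so the cached hash stays valid
        this.hash = this.attributes.hashCode();   // computed once
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        return o == this
                || (o instanceof CachedAttributes other
                    && hash == other.hash
                    && attributes.equals(other.attributes));
    }

    Map<String, String> attributes() {
        return attributes;
    }
}
```

The cheap `hash == other.hash` comparison also lets `equals` reject most non-matching keys before it compares the full maps.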
|
|
This change extends the SimHypervisor interface with a new interface
that exposes the CPU time counters directly. These counters are derived
from the flow counters and will be used by SimHost to expose CPU time
via telemetry.
|
|
This change renames the `opendc-simulator-resources` module to
`opendc-simulator-flow` to indicate that the core simulation model of
OpenDC is based around modelling and simulating flows.
Previously, the distinctions between resource consumer and provider, and
between input and output, caused some confusion. With a flow-based
model, this distinction is now clear (that is, water flows from a
source to a consumer or sink).
|
|
This change removes the distributor and aggregator interfaces in favour
of a single switch interface. Since the switch interface is as powerful
as both the distributor and aggregator, we don't need the latter two.
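A switch that multiplexes several consumers onto shared capacity needs a sharing policy. This log elsewhere names a `SimResourceSwitchMaxMin` class, which suggests max-min fairness. The following is a standalone sketch of max-min fair allocation (progressive filling), not OpenDC's implementation. No consumer receives more than it demands, and capacity left over by small consumers is redistributed to larger ones.

```java
import java.util.Arrays;

final class MaxMinFairShare {
    /**
     * Max-min fair allocation of `capacity` over the given demands:
     * serve consumers in order of increasing demand, giving each the smaller
     * of its demand and an equal split of the capacity still remaining.
     */
    static double[] allocate(double capacity, double[] demands) {
        int n = demands.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(demands[a], demands[b]));

        double[] alloc = new double[n];
        double remaining = capacity;
        for (int k = 0; k < n; k++) {
            int i = order[k];
            double fairShare = remaining / (n - k); // equal split among consumers not yet served
            alloc[i] = Math.min(demands[i], fairShare);
            remaining -= alloc[i];
        }
        return alloc;
    }
}
```

For example, a capacity of 10 shared over demands `{2, 8, 5}` yields `{2, 4, 4}`. The smallest consumer is fully served, and the other two split the rest evenly.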
|
|
This change drops the requirement for a clock parameter when
constructing a ComputeMetricExporter, since it will now derive the
timestamp from the recorded metrics.
|
|
This change updates the workload sampling implementation to be more
flexible in the way the workload is constructed. Users can now sample
multiple workloads at the same time using multiple samplers and use them
as a single workload to simulate.
|
|
This change adds an option for optimizing SimHost simulation by
combining all the CPUs of a machine into a single large CPU. For most
workloads, this does not significantly affect the simulation results,
but it significantly reduces simulation time.
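The following sketch shows where the approximation can diverge. It assumes a simplified per-core model, which may not match OpenDC's, in which vCPU `i` is pinned to core `i` and is capped by that core's capacity. The combined model merges the cores into one CPU whose capacity is their sum. The two models agree unless the demand is skewed across cores. The combined model is also cheaper because it simulates one flow instead of one per core. `CpuModels` is a hypothetical name.

```java
final class CpuModels {
    /** Assumed per-core model: vCPU i runs on core i and is capped by one core's capacity. */
    static double perCoreUsage(double[] vcpuDemands, double coreCapacity) {
        double usage = 0;
        for (double d : vcpuDemands) {
            usage += Math.min(d, coreCapacity);
        }
        return usage;
    }

    /** Combined model: one CPU with the summed capacity of all cores. */
    static double combinedUsage(double[] vcpuDemands, double coreCapacity) {
        double totalDemand = 0;
        for (double d : vcpuDemands) {
            totalDemand += d;
        }
        return Math.min(totalDemand, coreCapacity * vcpuDemands.length);
    }
}
```

With two 2000 MHz cores, demands of `{1000, 1500}` give 2500 MHz in both models. Skewed demands of `{3000, 500}` give 2500 MHz per core but 3500 MHz combined.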
|
|
This change standardizes the metrics emitted by SimHost instances and
their guests based on the OpenTelemetry semantic conventions. We now
also report CPU time instead of CPU work, since this metric is more
commonly used.
|
|
This change refactors the telemetry implementation by creating a
separate MeterProvider per service or host. Although this means we have
to keep track of multiple metric producers, it allows us to attach
resource information to each MeterProvider, as we would in a real-world
scenario.
|
|
This change simplifies the CoroutineMetricReader implementation by
removing the separation of reader and exporter jobs.
|
|
This change moves the fault injection logic directly into the
opendc-compute-simulator module, so that it can operate at a higher
level of abstraction. In the future, we might split the module again if
we can reuse some of its logic.
|
|
This change fixes an issue in SimHost where guests that were inactive
were also failed, causing an IllegalStateException.
|
|
This change fixes an issue where servers could not be scheduled because
the memory size of the host was computed incorrectly.
|
|
This change updates the SimHost implementation to track the up and
downtime of hypervisor guests.
|
|
This change enables hosts to overcommit their memory when testing
whether new servers can fit on the host.
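A fit check that permits memory overcommitment can be sketched as follows. The commit does not say how much overcommitment is allowed, so the `overcommitRatio` parameter and the `MemoryFit` class are hypothetical. A ratio of 1.0 corresponds to no overcommitment.

```java
final class MemoryFit {
    /**
     * Hypothetical fit check: a server fits if the memory already allocated
     * plus its request does not exceed the host memory times the overcommit ratio.
     */
    static boolean fits(long hostMemory, long allocated, long requested, double overcommitRatio) {
        return allocated + requested <= hostMemory * overcommitRatio;
    }
}
```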
|
|
This change adds new metrics for tracking the up and downtime of hosts
due to failures. In addition, this change adds a test to verify whether
the metrics are collected correctly.
|
|
This change updates the SimHost implementation to measure the power draw
of the machine without PSU overhead to make the results more realistic.
|
|
This change eliminates unnecessary double-to-long conversions in the
simulator. Previously, we used longs to denote the amount of work.
However, in the meantime, we have switched to doubles in the lower
layers of the stack.
|
|
This change upgrades the OpenTelemetry dependency to version 1.5, which
contains various breaking changes in the metrics API.
|
|
This change adds support for failures in the SimHost implementation.
Failing a host will now cause the virtual machines running on it to
enter an error state.
|
|
This change updates the Bitbrains trace tests with the updated trace
that does not hardcode the duration of the trace fragments.
|
|
This change refactors the trace workload in the OpenDC simulator to
execute each fragment based on the fragment's timestamp. This ensures
that the trace is replayed identically to the original execution.
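Timestamp-based replay can be sketched as a lookup of the fragment that is active at the current time. Replaying by accumulating fragment durations can drift from the original timeline when the trace contains gaps. The `Fragment` record and the `TraceReplay` class are hypothetical, and the sketch assumes that fragment timestamps are unique.

```java
import java.util.Arrays;

/** Hypothetical trace fragment: starts at `timestamp` (ms), lasts `duration` ms, demands `usage`. */
record Fragment(long timestamp, long duration, double usage) {}

final class TraceReplay {
    private final Fragment[] fragments; // sorted by timestamp
    private final long[] starts;        // fragment start times, for binary search

    TraceReplay(Fragment[] fragments) {
        this.fragments = fragments.clone();
        Arrays.sort(this.fragments, (a, b) -> Long.compare(a.timestamp(), b.timestamp()));
        this.starts = Arrays.stream(this.fragments).mapToLong(Fragment::timestamp).toArray();
    }

    /** Demand at time `now`, located by fragment timestamps; 0 in gaps or outside the trace. */
    double usageAt(long now) {
        int idx = Arrays.binarySearch(starts, now);
        if (idx < 0) {
            idx = -idx - 2; // last fragment that starts before `now`
        }
        if (idx < 0) {
            return 0.0; // before the first fragment
        }
        Fragment f = fragments[idx];
        return now < f.timestamp() + f.duration() ? f.usage() : 0.0;
    }
}
```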
|
|
This change reimplements the performance interference model to
work on top of the universal resource model in
`opendc-simulator-resources`. This enables us to model interference and
performance variability of other resources such as disk or network in
the future.
|
|
This change re-organizes the classes of the compute simulator module to
make a clearer distinction between the hardware, firmware and software
interfaces in this module.
|
|
This change removes the AutoCloseable interface from the
SimResourceProvider and removes the concept of a resource lifecycle.
Instead, resource providers are now either active (running a resource
consumer) or inactive (being idle), which simplifies the implementation.
|
|
This change enables the experiments to share the SimResourceInterpreter
across multiple hosts, which allows updates to be scheduled efficiently
for all machines at the same time. This is especially beneficial if the
machines operate on the same time slices.
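The benefit of a shared interpreter can be sketched with a single scheduler that all machines use. Updates due at the same timestamp are grouped and processed as one batch, instead of each machine advancing its own clock. `SharedScheduler` is hypothetical and is not the SimResourceInterpreter API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical shared scheduler: machines register update callbacks,
 * and all callbacks due at the same timestamp run together as one batch.
 */
final class SharedScheduler {
    private final TreeMap<Long, List<Runnable>> queue = new TreeMap<>();
    private long now = 0;
    private int batches = 0;

    void schedule(long timestamp, Runnable update) {
        queue.computeIfAbsent(timestamp, t -> new ArrayList<>()).add(update);
    }

    /** Process batches in timestamp order until no updates remain. */
    void run() {
        while (!queue.isEmpty()) {
            Map.Entry<Long, List<Runnable>> next = queue.pollFirstEntry();
            now = next.getKey();
            batches++;
            for (Runnable update : next.getValue()) {
                update.run();
            }
        }
    }

    long now() {
        return now;
    }

    int batches() {
        return batches;
    }
}
```

When three machines schedule updates at the same time slice, the scheduler handles them in a single batch. This is the case in which the commit expects sharing to help most.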
|
|
This change integrates the power subsystem of the simulator with the
compute subsystem by exposing a new field, `psu`, on SimBareMetalMachine.
This field provides access to the machine's PSU, which in turn can be
connected to a SimPowerOutlet.
|
|
This change splits up the functionality of the CPUFreq subsystem in the
compute simulation. Currently, the DVFS functionality is embedded in
SimBareMetalMachine. However, this functionality should not exist within
the firmware layer of a machine. Instead, the operating system should
perform this logic (in OpenDC, this should be the hypervisor).
Furthermore, this change moves the scaling driver into the power
package. The power driver is a machine- or firmware-specific
implementation that computes the power consumption of a machine.
|
|
This change adds a new interface to the resources library for accessing
metrics of resources such as work, demand and overcommitted work. With
this change, we no longer need an implementation-specific listener
interface in SimResourceSwitchMaxMin.
Another benefit of this approach is that updates will be scheduled more
efficiently and progress will only be reported once the system has
reached a steady state for that timestamp.
|
|
This change introduces the SimResourceInterpreter, which centralizes the
logic for scheduling and interpreting the communication between resource
consumer and provider.
This approach offers better performance because it avoids invalidating
the state of the resource context when not necessary. Benchmarks show a
5x performance improvement in the best case and a 2x improvement in the
worst case.
|
|
This change adds support for the Gradle version catalog feature in our
build configuration. This allows us to have a single file,
gradle/libs.versions.toml, which contains all the dependency versions
used in this project.
|
|
This change updates the build scripts to use type-safe project accessors
when specifying build dependencies between modules.
|
|
This change introduces the SimResourceScheduler interface, which is a
generic interface for scheduling the coordination and synchronization
between resource providers and resource consumers.
This interface replaces the need for users to manually specify the clock
and coroutine context per resource provider.
|
|
This change flattens the project structure.
Previously, the simulator, frontend and API each lived in their own
directory.
With this change, all modules of the project live in the top-level
directory of the repository. This should improve the discoverability of
the project's modules.