| Age | Commit message | Author |
|
This change updates the interface of `ComputeService` to provide access
to the instances (servers) that have been registered with the compute
service. This allows metric collectors to query the metrics of the
servers that are currently running.
|
|
This change updates the `ComputeServiceHelper` class to provide the
failure model via a parameter to the `run` method instead of a
constructor parameter. This separates the construction of the topology
from the simulation of the workload.
|
|
This change updates the virtual machine performance interference model
so that the interference domain can be constructed independently of the
interference profile. As a consequence, the construction of the topology
no longer depends on the interference profile.
|
|
This change moves the Random dependency outside the interference model,
to allow the interference model to be completely immutable and passable
between different simulations.
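
The separation described above can be sketched in Java as follows. This
is an illustration only, not the OpenDC code: the class name
`InterferenceModelSketch`, its `scores` field and the `sample` method are
hypothetical. The model keeps only immutable data, and each simulation
passes in its own `Random`.

```java
import java.util.List;
import java.util.Random;

// Hypothetical immutable model: it holds no Random and no mutable state,
// so a single instance can be shared between simulations.
final class InterferenceModelSketch {
    private final List<Double> scores;

    InterferenceModelSketch(List<Double> scores) {
        this.scores = List.copyOf(scores); // unmodifiable defensive copy
    }

    // The caller supplies the source of randomness, so two simulations
    // with differently seeded generators never influence each other.
    double sample(Random random) {
        return scores.get(random.nextInt(scores.size()));
    }
}
```

Two simulations seeded identically draw the same values from the shared
model, which keeps runs reproducible.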
|
|
This change introduces a new interface `JobManager` that is responsible
for communicating with the backend about the available jobs and updating
their status when the runner is simulating a job. This manager can be
injected into the `OpenDCRunner` class and allows users to provide
different sources for the jobs, not only the current REST API.
|
|
This change fixes an issue with the OpenDC web runner where it would
report NaN values for some of the metrics due to the topology being
empty. These NaN values in turn caused issues in the frontend.
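
The commit does not state how the fix was made. One common source of
such NaN values is averaging over zero hosts (0.0 / 0), and a guard
along the following lines avoids it. The class and method names are
hypothetical, and returning 0.0 for an empty topology is an assumption.

```java
final class MetricAggregationSketch {
    // Mean utilization across hosts. Returns 0.0 for an empty topology
    // instead of computing 0.0 / 0, which would yield NaN.
    static double meanUtilization(double[] hostUtilization) {
        if (hostUtilization.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double u : hostUtilization) {
            sum += u;
        }
        return sum / hostUtilization.length;
    }
}
```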
|
|
This change updates the web runner implementation to gracefully exit the
current thread when interrupted.
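
A minimal Java sketch of a worker loop that exits on interruption is
shown here. It is not the web runner code: `PollingWorkerSketch` and its
job queue are hypothetical stand-ins for whatever the runner polls.

```java
import java.util.concurrent.BlockingQueue;

final class PollingWorkerSketch implements Runnable {
    private final BlockingQueue<Runnable> jobs;

    PollingWorkerSketch(BlockingQueue<Runnable> jobs) {
        this.jobs = jobs;
    }

    @Override
    public void run() {
        try {
            // Stop polling as soon as the interrupt flag is set.
            while (!Thread.currentThread().isInterrupted()) {
                // take() blocks and throws InterruptedException on interrupt.
                jobs.take().run();
            }
        } catch (InterruptedException e) {
            // Restore the flag so callers can observe the interruption,
            // then fall through and let the thread exit normally.
            Thread.currentThread().interrupt();
        }
    }
}
```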
|
|
This change updates the OpenDC web runner implementation to use the
correct context ClassLoader for simulation jobs running inside a
ForkJoinPool. By default, the ForkJoinPool will use the system class
loader which does not have access to the services needed by the web
runner.
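
One way to give ForkJoinPool workers a specific context ClassLoader is a
custom worker thread factory, sketched below in Java. The class name
`ContextAwarePools` is hypothetical and the actual change may be
implemented differently.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;

final class ContextAwarePools {
    // Creates a pool whose worker threads use the given loader as their
    // context ClassLoader instead of the default one.
    static ForkJoinPool create(int parallelism, ClassLoader loader) {
        ForkJoinPool.ForkJoinWorkerThreadFactory factory = pool -> {
            ForkJoinWorkerThread worker =
                ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
            worker.setContextClassLoader(loader);
            return worker;
        };
        // No uncaught exception handler, LIFO (non-async) scheduling mode.
        return new ForkJoinPool(parallelism, factory, null, false);
    }
}
```

A caller would typically pass the context ClassLoader of the thread that
can already see the required services, for example
`Thread.currentThread().getContextClassLoader()`.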
|
|
This change splits the command line interface from the OpenDC web runner
into a separate configuration. We plan to re-use the runner code for a Quarkus
extension that integrates the runner in development mode.
|
|
This change removes the OpenTelemetry integration from the OpenDC
Compute modules. Previously, we chose to integrate OpenTelemetry to
provide a unified way to report metrics to the users.
Although this worked as expected, the overhead of OpenTelemetry when
collecting metrics during simulation was considerable and left few
opportunities for optimization (other than providing a separate API
implementation). Furthermore, since we were tied to OpenTelemetry's SDK
implementation, we experienced issues with throttling and registering
multiple instruments.
We will instead use another approach, where we expose the core metrics
in OpenDC via specialized interfaces (see the commits before) such that
access is fast and can be done without having to interface with
OpenTelemetry. In addition, we will provide an adapter that is able
to forward these metrics to OpenTelemetry implementations, so we can
still integrate with the wider ecosystem.
|
|
This change introduces a `ComputeMetricReader` class that can be used as
a replacement for the `CoroutineMetricReader` class when reading metrics
from the Compute service. This implementation operates directly on a
`ComputeService` instance, providing better performance.
|
|
This change updates the compute support library to load the VM
interference model via the OpenDC trace library, which provides a
generic interface for reading interference models associated with
workload traces.
|
|
This change contains a rewrite of the OpenDC web runner implementation,
which now supports terminating simulations when exceeding a deadline, as
well as executing multiple simulation jobs at the same time.
Furthermore, we have extracted the runner from the command line
interface, so that we can offer this functionality as a library in the
future.
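
Deadline-based termination can be sketched in Java with a `Future` and a
timeout. This is only an illustration of the technique: the class
`DeadlineRunnerSketch` and its method are hypothetical, and cancellation
relies on the job responding to interruption.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class DeadlineRunnerSketch {
    // Runs the job on the executor. Returns true if it finished within the
    // deadline, or false if it was cancelled for exceeding it.
    static boolean runWithDeadline(ExecutorService executor, Runnable job, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        Future<?> future = executor.submit(job);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Interrupts the worker thread that is running the job.
            future.cancel(true);
            return false;
        }
    }
}
```

Backing the executor with more than one thread lets several jobs run at
the same time, each with its own deadline.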
|
|
This change updates the web runner implementation to use the new API
client introduced in the previous commit.
|
|
This change adds support for custom audience values in the web runner.
Previously, if the audience used by the user differed from the default
value (https://api.opendc.org/v2/), the runner failed to obtain a valid
access token for the API.
|
|
This change updates the OpenDC codebase to use OpenTelemetry v1.11,
which stabilizes the metrics API. This stabilization brings quite a few
breaking changes, so significant changes are necessary inside the OpenDC
codebase.
|
|
This change adds a new module, opendc-workflow-workload, that contains
helper code for constructing workflow simulations using OpenDC.
|
|
This change redesigns the ComputeMonitor interface to reduce the number
of memory allocations necessary during a collection cycle.
|
|
This change redesigns the virtual machine interference algorithm to have
a fixed memory usage per `VmInterferenceModel` instance. Previously, for
every interference domain, a copy of the model would be created, leading
to OutOfMemory errors when running multiple experiments at the same
time.
|
|
This change renames the `opendc-simulator-resources` module to the
`opendc-simulator-flow` module to indicate that the core simulation
model of OpenDC is based around modelling and simulating flows.
Previously, the distinction between resource consumer and provider, and
input and output caused some confusion. By switching to a flow-based
model, this distinction is now clear (as in, the water flows from source
to consumer/sink).
|
|
This change drops the requirement for a clock parameter when
constructing a ComputeMetricExporter, since it will now derive the
timestamp from the recorded metrics.
|
|
This change adds a new API for writing traces in a trace format.
Currently, writing is only supported by the OpenDC VM format, but
support for writing the other formats will be added over time.
|
|
This change updates the workload sampling implementation to be more
flexible in the way the workload is constructed. Users can now sample
multiple workloads at the same time using multiple samplers and use them
as a single workload to simulate.
|
|
This change adds support for creating flexible topologies by creating a
TopologyFactory interface that is responsible for configuring the hosts
of a compute service.
|
|
This change creates a new module for doing simulations with virtual
machine workloads. We have found that a lot of code in the Capelin
experiments code is being re-used by non-experiment modules.
|
|
This change standardizes the metrics emitted by SimHost instances and
their guests based on the OpenTelemetry semantic conventions. We now
also report CPU time as opposed to CPU work as this metric is more
commonly used.
|
|
This change updates the OpenDC compute service implementation with
multiple meters that follow the OpenTelemetry conventions.
|
|
This change refactors the telemetry implementation by creating a
separate MeterProvider per service or host. This means we have to keep
track of multiple metric producers, but we can now attach resource
information to each of the MeterProviders as we would in a real-world
scenario.
|
|
This change moves the fault injection logic directly into the
opendc-compute-simulator module, so that it can operate at a higher
abstraction. In the future, we might again split the module if we can
re-use some of its logic.
|
|
|
|
This change moves the metric collection outside the Capelin codebase in
a separate module so other modules can also benefit from the compute
metric collection code.
|
|
|
|
This change removes the environment readers from the format library
since they are highly specific to the particular experiment. In the
future, we hope to have a single format to set up the entire datacenter
(perhaps similar to the format used by the web runner).
|
|
This change eliminates the unnecessary conversions from double to long
in the Capelin metric processing code.
|
|
This change upgrades the OpenTelemetry dependency to version 1.5, which
contains various breaking changes in the metrics API.
|
|
This change updates the FilterScheduler implementation to follow more
closely the scheduler implementation in OpenStack's Nova. We now
normalize the weights, support many of the filters and weights in
OpenStack and support overcommitting resources.
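
The filter-then-weigh approach with normalized weights can be sketched
in Java as below. This is an assumption-laden illustration rather than
the FilterScheduler code: `FilterSchedulerSketch` and its `select`
method are hypothetical, and min-max normalization to [0, 1] is used
here as one way to normalize.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

final class FilterSchedulerSketch {
    // Returns the index of the best host, or -1 if no host passes the filters.
    static <H> int select(List<H> hosts,
                          List<Predicate<H>> filters,
                          List<ToDoubleFunction<H>> weighers,
                          double[] multipliers) {
        // 1. Keep only the hosts that pass every filter.
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < hosts.size(); i++) {
            H host = hosts.get(i);
            if (filters.stream().allMatch(f -> f.test(host))) {
                candidates.add(i);
            }
        }
        if (candidates.isEmpty()) {
            return -1;
        }

        // 2. For each weigher, normalize its raw weights to [0, 1] and add
        //    them to the host scores, scaled by the weigher's multiplier.
        double[] scores = new double[candidates.size()];
        for (int w = 0; w < weighers.size(); w++) {
            double[] raw = new double[candidates.size()];
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (int c = 0; c < raw.length; c++) {
                raw[c] = weighers.get(w).applyAsDouble(hosts.get(candidates.get(c)));
                min = Math.min(min, raw[c]);
                max = Math.max(max, raw[c]);
            }
            double range = max - min;
            for (int c = 0; c < raw.length; c++) {
                double normalized = range == 0.0 ? 0.0 : (raw[c] - min) / range;
                scores[c] += multipliers[w] * normalized;
            }
        }

        // 3. Pick the candidate with the highest total score (first one on ties).
        int best = 0;
        for (int c = 1; c < scores.length; c++) {
            if (scores[c] > scores[best]) {
                best = c;
            }
        }
        return candidates.get(best);
    }
}
```

Normalizing before summing keeps a weigher with large raw values, such
as free memory in MB, from drowning out one with small values, such as
free CPU cores. The multipliers then decide the relative importance.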
|
|
This change fixes an issue where the topology generated by the frontend
was not accepted by the API server.
|
|
This change updates the web runner to not require direct database access
for scheduling simulation jobs. Instead, the runner polls the public
REST API for available jobs and reports its results through there.
|
|
This change reimplements the performance interference model to
work on top of the universal resource model in
`opendc-simulator-resources`. This enables us to model interference and
performance variability of other resources such as disk or network in
the future.
|
|
This change updates the trace reader implementations to remove their
dependency on the performance interference model. In a future commit, we
will instead pass the performance interference model via the
host/hypervisor.
|
|
This change re-organizes the classes of the compute simulator module to
make a clearer distinction between the hardware, firmware and software
interfaces in this module.
|
|
This change addresses the deprecations that were caused by the migration
to Kotlin 1.5.
|
|
This change flattens the project structure.
Previously, the simulator, frontend and API each lived in their own
directory.
With this change, all modules of the project live in the top-level
directory of the repository. This should improve discoverability of
modules of the project.
|