| Age | Commit message (Collapse) | Author |
|
This change adds a new module called opendc-experiments-base which will
provide a base for doing experiments with OpenDC. The initial feature we
introduce is the service registry which acts as DNS for services to
register during experimentation.
|
|
This change updates the interface of `ComputeService` to provide access
to the instances (servers) that have been registered with the compute
service. This allows metric collectors to query the metrics of the
servers that are currently running.
|
|
This change updates the `ComputeServiceHelper` class to provide the
failure model via a parameter to the `run` method instead of constructor
parameter. This separates the construction of the topology from the
simulation of the workload.
|
|
This change updates the virtual machine performance interference model
so that the interference domain can be constructed independently of the
interference profile. As a consequence, the construction of the topology
now does not depend anymore on the interference profile.
|
|
This change moves the Random dependency outside the interference model,
to allow the interference model to be completely immutable and passable
between different simulations.
|
|
This change moves the core of the VM interference model from the flow
module into the compute simulator. This logic can be contained in the
compute simulator and does not need to leak into the flow-level
simulator.
|
|
This change fixes an issue in the `SimBareMetalMachine` implementation
where the power usage was only updated after a non-zero duration. However,
this would mean that OpenDC would possibly report incorrect power usage
values when multiple convergence calls occured at the same timestamp.
|
|
This change updates the implementation of SimTFDevice to directly use
the metrics provided by the `SimBareMetalMachine` class, instead of
computing these metrics itself.
|
|
This change updates the Capelin experiments so it can be distributed and
executed independently of the main OpenDC distribution. We provide a new
command line interface for users to directly run the experiments.
Alternatively, the `CapelinRunner` class encapsulates the logic for
running the experiments and can be used programmatically.
|
|
This change removes the `TensorFlowExperiment` in favour of an
integration test that can be run during CI invocations. Given that the
experiment was not very sophisticated (in terms of data collection), we
believe it is better suited as an integration test.
|
|
This change fixes an issue with the `SimTFDevice` implementation where
very small amounts of FLOPs would cause the device to enter an infinite
loop. We now round the value up to ensure that the device always
consumes FLOPs.
|
|
This change adds a new module, opendc-faas-workload that contains
helper code for constructing simulations of FaaS-based workloads
using OpenDC. In addition, we add an integration test that demonstrates
the capabilities of the helper tool and the FaaS platform of OpenDC.
|
|
This change removes the OpenTelemetry integration from the OpenDC
Tensorflow 2020 experiments. Previously, we chose to integrate
OpenTelemetry to provide a unified way to report metrics to the users.
See the previous commit removing it from the "Compute" modules for the
reasoning behind this change.
|
|
This change removes the OpenTelemetry integration from the OpenDC
FaaS modules. Previously, we chose to integrate OpenTelemetry to
provide a unified way to report metrics to the users.
See the previous commit removing it from the "Compute" modules for the
reasoning behind this change.
|
|
This change removes the OpenTelemetry integration from the OpenDC
Compute modules. Previously, we chose to integrate OpenTelemetry to
provide a unified way to report metrics to the users.
Although this worked as expected, the overhead of the OpenTelemetry when
collecting metrics during simulation was considerable and lacked more
optimization opportunities (other than providing a separate API
implementation). Furthermore, since we were tied to OpenTelemetry's SDK
implementation, we experienced issues with throttling and registering
multiple instruments.
We will instead use another approach, where we expose the core metrics
in OpenDC via specialized interfaces (see the commits before) such that
access is fast and can be done without having to interface with
OpenTelemetry. In addition, we will provide an adapter to that is able
to forward these metrics to OpenTelemetry implementations, so we can
still integrate with the wider ecosystem.
|
|
This change updates the `TFDevice` interface to directly expose
statistics about the accelerator device to the user. Previously, the
user had to access these values through OpenTelemetry, which required
substantial extra work.
|
|
This change introduces a `ComputeMetricReader` class that can be used as
a replacement for the `CoroutineMetricReader` class when reading metrics
from the Compute service. This implementation operates directly on a
`ComputeService` instance, providing better performance.
|
|
This change updates the Gradle build configuration of the project to
publish the different type of modules (e.g., opendc-compute,
opendc-simulator) into their own groups.
|
|
This change updates the Gradle build configuration to ensure that all
library modules (that will be published) use testing and are included in
coverage reports. This should ensure the public modules remain well
tested.
|
|
This change updates the compute support library to load the VM
interference model via the OpenDC trace library, which provides a
generic interface for reading interference models associated with
workload traces.
|
|
This change removes the delta parameter from the callbacks of the flow
framework. This parameter was used to indicate the duration in time
between the last call and the current call. However, its usefulness was
limited since the actual delta values needed by implementors of this
method had to be bridged across different flow callbacks.
|
|
This change updates the simulator implementation to flush the active
progress when accessing the hypervisor counters. Previously, if the
counters were accessed, while the mux or consumer was in progress, its
counter values were not accurate.
|
|
This change removes the opendc-platform module from the project. This
module represented a Java platform which was previously used for sharing
a set of dependency versions between subprojects. However, with the
version catalogue that was added by Gradle, we currently do not use the
platform anymore.
|
|
This change updates the TimerScheduler implementation to directly use
the Delay object instead of running the timers inside a coroutine.
Constructing the coroutine is more expensive, so we prefer running in a
Runnable.
|
|
This change adds a new module, opendc-common, that contains
functionality that is shared across OpenDC's modules.
We move the existing utils module into this new module.
|
|
This change updates the OpenDC codebase to use OpenTelemetry v1.11,
which stabilizes the metrics API. This stabilization brings quite a few
breaking changes, so significant changes are necessary inside the OpenDC
codebase.
|
|
This change adds a new module, opendc-workflow-workload that contains
helper code for constructing workflow simulations using OpenDC.
|
|
This change updates the implementation of the trace converter and
SimTrace implementation to support cases where there is a gap between
samples in the trace data.
This change allows users to specify what to do in case samples are
missing in the trace. The available options are specified in
`SimTrace.FillMode`. Currently, we support either carrying the previous
value forward or set the usage to zero.
|
|
This change updates the SimMachine interface to drop the coroutine
requirement for running a workload on a machines. Users can now
asynchronously start a workload and receive notifications via the
workload callbacks.
Users still have the possibility to suspend execution during workload
execution by using the new `runWorkload` method, which is implemented on
top of the new `startWorkload` primitive.
|
|
This change redesigns the ComputeMonitor interface to reduce the number
of memory allocations necessary during a collection cycle.
|
|
This change redesigns the virtual machine interference algorithm to have
a fixed memory usage per `VmInterferenceModel` instance. Previously, for
every interference domain, a copy of the model would be created, leading
to OutOfMemory errors when running multiple experiments at the same
time.
|
|
This change addresses an issue where energy usage was not computed
correctly if the machine performed no work in between collection cycles.
|
|
This change updates the MaxMinFlowMultiplexer implementation to skip the
fair-share algorithm in case the total demand is lower than the
available capacity. In this case, no re-division of capacity is
necessary.
|
|
This change improves the performance of the SimTraceWorkload class by
changing the way trace fragments are read and processed by the CPU
consumers.
|
|
This change updates the MaxMinFlowMultiplexer implementation to
centrally manage the deadlines of the `FlowSource`s as opposed to each
source using its own timers. For large amounts of inputs, this is much
faster as the multiplexer already needs to traverse each input on a
timer expiration of an input.
|
|
|
|
This change adds a new interface to the SimHypervisor interface that
exposes the CPU time counters directly. These are derived from the flow
counters and will be used by SimHost to expose them via telemetry.
|
|
This change adds two new properties for controlling whether the
convergence callbacks of the source and consumer respectively should be
invoked. This saves a lot of unnecessary calls for stages that do not
have any implementation of the `onConvergence` method.
|
|
This change creates separate callbacks for the remaining events:
onStart, onStop and onConverge.
|
|
This change removes the Capacity entry from FlowEvent. Since the source
is always pulled on a capacity change, we do not need a separate event
for this.
|
|
This change separates the push and pull flags in
FlowConsumerContextImpl, meaning that sources can now push directly
without pulling and vice versa.
|
|
This change renames the `opendc-simulator-resources` module into the
`opendc-simulator-flow` module to indicate that the core simulation
model of OpenDC is based around modelling and simulating flows.
Previously, the distinction between resource consumer and provider, and
input and output caused some confusion. By switching to a flow-based
model, this distinction is now clear (as in, the water flows from source
to consumer/sink).
|
|
This change removes the distributor and aggregator interfaces in favour
of a single switch interface. Since the switch interface is as powerful
as both the distributor and aggregator, we don't need the latter two.
|
|
This change removes the `onUpdate` callback from the
`SimResourceProviderLogic` interface. Instead, users should now update
counters using either `onConsume` or `onConverge`.
|
|
This change updates the simulator implementation to always invoke the
`SimResourceConsumer.onNext` callback when the resource context is
invalidated. This allows users to update the resource counter or do some
other work if the context has changed.
|
|
This change removes unnecessary allocations in the SimResourceInterpreter
caused by the way timers were allocated for the resource context.
|
|
This change adds a new method to `SimResourceContext` called `push`
which allows users to change the requested flow rate directly without
having to interrupt the consumer.
|
|
This change removes the work and deadline properties from the
SimResourceCommand.Consume class and introduces a new property duration.
This property is now used in conjunction with the limit to compute the amount
of work processed by a resource provider.
Previously, we used both work and deadline to compute the duration and
the amount of remaining work at the end of a consumption. However, with
this change, we ensure that a resource consumption always runs at the
same speed once establishing, drastically simplifying the computation
for the amount of work processed during the consumption.
|
|
This change drops the requirement for a clock parameter when
constructing a ComputeMetricExporter, since it will now derive the
timestamp from the recorded metrics.
|
|
This change adds a new API for writing traces in a trace format.
Currently, writing is only supported by the OpenDC VM format, but over
time the other formats will also have support for writing added.
|