path: root/opendc-compute
Age | Commit message | Author
2022-06-23 | bug(compute/workload): Fix conversion from UUID to Binary | Fabian Mastenbroek
This change fixes an issue with the metric exporting code in OpenDC where a UUID is not converted correctly into a `Binary` object that is consumed by the Apache Parquet library.
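The commit message does not show the corrected conversion itself. As a minimal sketch of one common approach, assuming the UUID is stored as 16 big-endian bytes, the conversion can be written with the JDK alone. The class name `UuidBytes` is hypothetical, and wrapping the array in a Parquet `Binary` (for example with `Binary.fromConstantByteArray`) is left out because it needs the Parquet library on the classpath.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper, not taken from the OpenDC sources.
final class UuidBytes {
    private UuidBytes() {}

    /** Encodes a UUID as 16 bytes, most significant 64 bits first (big-endian). */
    static byte[] toBytes(UUID id) {
        ByteBuffer buffer = ByteBuffer.allocate(16); // ByteBuffer is big-endian by default
        buffer.putLong(id.getMostSignificantBits());
        buffer.putLong(id.getLeastSignificantBits());
        return buffer.array();
    }

    /** Decodes the 16-byte form back into a UUID, useful for round-trip checks. */
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("Expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long most = buffer.getLong();
        long least = buffer.getLong();
        return new UUID(most, least);
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        byte[] bytes = toBytes(id);
        System.out.println(bytes.length + " bytes, round trip ok: " + id.equals(fromBytes(bytes)));
    }
}
```

A round-trip check like the one in `main` is a cheap way to catch this class of bug, since a wrong byte order or a truncated buffer no longer decodes to the original UUID.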
2022-06-07 | refactor(trace/api): Introduce type system for trace API | Fabian Mastenbroek
This change updates the trace API by introducing a limited type system for the table columns. Previously, the table columns could have any possible type representable by the JVM. With this change, we limit the available types to a small type system.
2022-05-06 | refactor(telemetry): Remove dependency on OpenTelemetry SDK | Fabian Mastenbroek
This change removes the dependency on the OpenTelemetry SDK. Instead, metrics will in the future be exposed via the OpenTelemetry API only, through adapter classes.
2022-05-06 | refactor(compute/service): Remove OpenTelemetry from "compute" modules | Fabian Mastenbroek
This change removes the OpenTelemetry integration from the OpenDC Compute modules. Previously, we chose to integrate OpenTelemetry to provide a unified way to report metrics to the users. Although this worked as expected, the overhead of OpenTelemetry when collecting metrics during simulation was considerable and left few optimization opportunities (other than providing a separate API implementation). Furthermore, since we were tied to OpenTelemetry's SDK implementation, we experienced issues with throttling and registering multiple instruments. We will instead use another approach, where we expose the core metrics in OpenDC via specialized interfaces (see the preceding commits) such that access is fast and can be done without having to interface with OpenTelemetry. In addition, we will provide an adapter that is able to forward these metrics to OpenTelemetry implementations, so we can still integrate with the wider ecosystem.
2022-05-06 | refactor(telemetry/compute): Support direct metric access | Fabian Mastenbroek
This change introduces a `ComputeMetricReader` class that can be used as a replacement for the `CoroutineMetricReader` class when reading metrics from the Compute service. This implementation operates directly on a `ComputeService` instance, providing better performance.
2022-05-04 | refactor(compute): Directly expose scheduler stats to user | Fabian Mastenbroek
This change updates the `ComputeService` interface to directly expose statistics about the scheduler to the user, such that they do not necessarily have to interact with OpenTelemetry to obtain these values.
2022-05-04 | feat(compute): Add support for looking up hosts | Fabian Mastenbroek
This change adds the ability for users to look up the `Host` on which a `Server` is hosted (if any). This allows the user to potentially interact with the `Host` directly, e.g., in order to obtain advanced metrics.
2022-05-03 | refactor(compute): Expose CPU and system stats via Host interface | Fabian Mastenbroek
This change updates the `Host` interface to directly expose CPU and system stats to be used by components that interface with the `Host` interface. Previously, this would require the user to interact with the OpenTelemetry SDK. Although that is still possible for more advanced use cases, users can use the following methods to easily access common host and guest statistics.
2022-05-02 | refactor(compute): Do not use Avro when exporting experiment data | Fabian Mastenbroek
This change updates the `ParquetDataWriter` class to not use the `parquet-avro` library for exporting experiment data, but instead to use the low-level APIs to directly write the data in Parquet format.
2022-05-01 | refactor(trace/parquet): Support custom ReadSupport implementations | Fabian Mastenbroek
This change updates the `LocalParquetReader` implementation to support custom `ReadSupport` implementations, so we do not necessarily have to rely on the Avro implementation.
2022-04-24 | build: Move modules into subgroups | Fabian Mastenbroek
This change updates the Gradle build configuration of the project to publish the different types of modules (e.g., opendc-compute, opendc-simulator) into their own groups.
2022-04-23 | build: Enable testing for all library modules | Fabian Mastenbroek
This change updates the Gradle build configuration to ensure that all library modules (that will be published) use testing and are included in coverage reports. This should ensure the public modules remain well tested.
2022-04-22 | refactor(trace/api): Move conventions into separate package | Fabian Mastenbroek
This change moves the trace conventions (such as table and column names) into a separate `conv` package, so that they are separated from the main API. This also allows for a potential move into a separate module in the future.
2022-04-22 | refactor(compute): Load interference model via trace library | Fabian Mastenbroek
This change updates the compute support library to load the VM interference model via the OpenDC trace library, which provides a generic interface for reading interference models associated with workload traces.
2022-02-18 | fix(simulator): Flush results before accessing counters | Fabian Mastenbroek
This change updates the simulator implementation to flush the active progress when accessing the hypervisor counters. Previously, if the counters were accessed while the mux or consumer was in progress, the counter values were not accurate.
2022-02-18 | fix(compute): Disallow duplicate UIDs for SimHost | Fabian Mastenbroek
This change fixes an issue with the ComputeServiceHelper where it allowed users to register multiple SimHost objects with the same UID. See this issue for more information: https://github.com/atlarge-research/opendc/issues/51
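The kind of guard this fix describes can be sketched as follows. This is an illustration only: `HostRegistry` and its methods are hypothetical names, hosts are represented by plain strings, and the actual check in `ComputeServiceHelper` may look different.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical registry, not the OpenDC ComputeServiceHelper.
final class HostRegistry {
    private final Map<UUID, String> hosts = new HashMap<>();

    /** Registers a host and rejects a UID that is already in use. */
    void register(UUID uid, String name) {
        // putIfAbsent leaves the existing entry untouched and returns it.
        String existing = hosts.putIfAbsent(uid, name);
        if (existing != null) {
            throw new IllegalArgumentException(
                "Host with UID " + uid + " is already registered as " + existing);
        }
    }

    int size() {
        return hosts.size();
    }

    public static void main(String[] args) {
        HostRegistry registry = new HostRegistry();
        UUID uid = new UUID(0L, 1L);
        registry.register(uid, "host-a");
        try {
            registry.register(uid, "host-b");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```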
2022-02-18 | build: Remove opendc-platform module | Fabian Mastenbroek
This change removes the opendc-platform module from the project. This module represented a Java platform which was previously used for sharing a set of dependency versions between subprojects. However, with the version catalogue that was added by Gradle, we currently do not use the platform anymore.
2022-02-18 | refactor(utils): Rename utils module to common module | Fabian Mastenbroek
This change adds a new module, opendc-common, that contains functionality that is shared across OpenDC's modules. We move the existing utils module into this new module.
2022-02-18 | feat(utils): Add Pacer to pace scheduling cycles | Fabian Mastenbroek
This change adds a new Pacer class that can pace the incoming scheduling requests into scheduling cycles by allowing the user to specify a scheduling quantum.
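One way to picture pacing with a scheduling quantum is sketched below. It assumes that cycles run on multiples of the quantum and that requests arriving while a cycle is pending simply join that cycle. `QuantumPacer` and its methods are hypothetical and do not reproduce the API of the `Pacer` class added by this commit.

```java
// Hypothetical sketch of quantum-based pacing; times are non-negative milliseconds.
final class QuantumPacer {
    private static final long NONE = -1L;

    private final long quantum;
    private long pending = NONE; // time of the cycle already scheduled, if any

    QuantumPacer(long quantumMillis) {
        if (quantumMillis <= 0) {
            throw new IllegalArgumentException("Quantum must be positive");
        }
        this.quantum = quantumMillis;
    }

    /** Returns the time of the cycle that will serve a request arriving at {@code now}. */
    long enqueue(long now) {
        if (pending == NONE) {
            // Align to the next quantum boundary strictly after now.
            pending = now - (now % quantum) + quantum;
        }
        return pending;
    }

    /** Marks the pending cycle as done so that the next request schedules a new one. */
    void cycleCompleted() {
        pending = NONE;
    }

    public static void main(String[] args) {
        QuantumPacer pacer = new QuantumPacer(1000);
        System.out.println("first request served at " + pacer.enqueue(1200));
        System.out.println("second request served at " + pacer.enqueue(1700));
    }
}
```

Batching requests this way trades a bounded delay (at most one quantum) for fewer scheduling cycles.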
2022-02-15 | refactor: Update OpenTelemetry to version 1.11 | Fabian Mastenbroek
This change updates the OpenDC codebase to use OpenTelemetry v1.11, which stabilizes the metrics API. This stabilization brings quite a few breaking changes, so significant changes are necessary inside the OpenDC codebase.
2021-11-16 | feat(workflow): Add helper tools for workflow simulations | Fabian Mastenbroek
This change adds a new module, opendc-workflow-workload, that contains helper code for constructing workflow simulations using OpenDC.
2021-11-02 | refactor(trace): Support gaps in trace data | Fabian Mastenbroek
This change updates the trace converter and the SimTrace implementation to support cases where there is a gap between samples in the trace data. It allows users to specify what to do in case samples are missing in the trace. The available options are specified in `SimTrace.FillMode`. Currently, we support either carrying the previous value forward or setting the usage to zero.
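The two strategies named above can be illustrated with a small sketch. It assumes samples lie on a regular grid with a fixed step and that a gap is any grid slot without a sample. The `GapFiller` class and its `Mode` enum are hypothetical and are not the constants of `SimTrace.FillMode`.

```java
import java.util.Arrays;

// Hypothetical illustration of gap filling, not the SimTrace implementation.
final class GapFiller {
    enum Mode { PREVIOUS, ZERO }

    private GapFiller() {}

    /**
     * Expands sparse samples into one value per step between the first and last timestamp.
     * Timestamps must be ascending and lie on the grid first + k * step.
     */
    static double[] fill(long[] timestamps, double[] usage, long step, Mode mode) {
        if (timestamps.length == 0 || timestamps.length != usage.length || step <= 0) {
            throw new IllegalArgumentException("Invalid trace input");
        }
        long first = timestamps[0];
        int slots = (int) ((timestamps[timestamps.length - 1] - first) / step) + 1;
        double[] out = new double[slots];
        int next = 0;          // index of the next real sample
        double previous = 0.0; // last real value seen
        for (int slot = 0; slot < slots; slot++) {
            long t = first + slot * step;
            if (next < timestamps.length && timestamps[next] == t) {
                previous = usage[next++];
                out[slot] = previous;
            } else {
                // Gap: either carry the previous value forward or report zero usage.
                out[slot] = (mode == Mode.PREVIOUS) ? previous : 0.0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] ts = {0, 10, 40};
        double[] usage = {1.0, 2.0, 3.0};
        System.out.println(Arrays.toString(fill(ts, usage, 10, Mode.PREVIOUS)));
        System.out.println(Arrays.toString(fill(ts, usage, 10, Mode.ZERO)));
    }
}
```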
2021-10-25 | refactor(simulator): Support running workloads without coroutines | Fabian Mastenbroek
This change updates the SimMachine interface to drop the coroutine requirement for running a workload on a machine. Users can now asynchronously start a workload and receive notifications via the workload callbacks. Users still have the possibility to suspend execution during workload execution by using the new `runWorkload` method, which is implemented on top of the new `startWorkload` primitive.
2021-10-25 | perf(telemetry): Prevent allocations during collection cycle | Fabian Mastenbroek
This change redesigns the ComputeMonitor interface to reduce the number of memory allocations necessary during a collection cycle.
2021-10-25 | feat(telemetry): Report provisioning time of virtual machines | Fabian Mastenbroek
This change adds support for collecting the provisioning time of virtual machines in addition to their boot time.
2021-10-25 | perf(compute): Redesign VM interference algorithm | Fabian Mastenbroek
This change redesigns the virtual machine interference algorithm to have a fixed memory usage per `VmInterferenceModel` instance. Previously, for every interference domain, a copy of the model would be created, leading to OutOfMemory errors when running multiple experiments at the same time.
2021-10-25 | feat(compute): Support filtering hosts based on CPU capacity | Fabian Mastenbroek
This change allows users to create servers with a smaller CPU capacity than the host, by specifying the CPU capacity via metadata. This also allows filtering hosts based on their available CPU capacity.
2021-10-08 | perf(simulator): Optimize SimTraceWorkload | Fabian Mastenbroek
This change improves the performance of the SimTraceWorkload class by changing the way trace fragments are read and processed by the CPU consumers.
2021-10-03 | perf(compute): Optimize telemetry collection | Fabian Mastenbroek
This change optimizes the telemetry collection in the SimHost class. Previously, there was significant overhead in collecting the metrics of this and associated classes due to a large `Attributes` object that did not cache the result of `hashCode()`. We now wrap this object and manually cache the hash code.
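The wrapping technique can be sketched generically. The example below assumes the wrapped object is immutable, so the hash computed once at construction stays valid. `HashCached` is a hypothetical name, and the wrapper in OpenDC targets OpenTelemetry's `Attributes` specifically rather than an arbitrary type.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical generic wrapper; assumes the wrapped value never changes.
final class HashCached<T> {
    private final T value;
    private final int hash; // computed once, reused on every map lookup

    HashCached(T value) {
        this.value = Objects.requireNonNull(value);
        this.hash = value.hashCode();
    }

    T get() {
        return value;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof HashCached)) return false;
        HashCached<?> that = (HashCached<?>) other;
        // Cheap hash comparison first, full equality only on a match.
        return hash == that.hash && value.equals(that.value);
    }

    public static void main(String[] args) {
        Map<HashCached<List<String>>, Long> counters = new HashMap<>();
        HashCached<List<String>> key = new HashCached<>(Arrays.asList("host", "cpu"));
        counters.merge(key, 1L, Long::sum);
        counters.merge(key, 1L, Long::sum);
        System.out.println("count = " + counters.get(key));
    }
}
```

The saving comes from hash-based collections calling `hashCode()` on every lookup; with the wrapper, each call is a field read instead of a traversal of the wrapped object.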
2021-10-03 | feat(simulator): Expose CPU time counters directly on hypervisor | Fabian Mastenbroek
This change adds a new interface to the SimHypervisor interface that exposes the CPU time counters directly. These are derived from the flow counters and will be used by SimHost to expose them via telemetry.
2021-10-03 | refactor(simulator): Migrate to flow-based simulation | Fabian Mastenbroek
This change renames the `opendc-simulator-resources` module into the `opendc-simulator-flow` module to indicate that the core simulation model of OpenDC is based around modelling and simulating flows. Previously, the distinction between resource consumer and provider, and input and output caused some confusion. By switching to a flow-based model, this distinction is now clear (as in, the water flows from source to consumer/sink).
2021-10-03 | refactor(simulator): Merge distributor and aggregator into switch | Fabian Mastenbroek
This change removes the distributor and aggregator interfaces in favour of a single switch interface. Since the switch interface is as powerful as both the distributor and aggregator, we don't need the latter two.
2021-09-28 | fix(compute): Write null values explicitly in Parquet exporter | Fabian Mastenbroek
2021-09-28 | fix(compute): Do not recover guests in non-error state | Fabian Mastenbroek
2021-09-28 | refactor(telemetry): Do not require clock for ComputeMetricExporter | Fabian Mastenbroek
This change drops the requirement for a clock parameter when constructing a ComputeMetricExporter, since it will now derive the timestamp from the recorded metrics.
2021-09-21 | feat(trace): Add support for writing traces | Fabian Mastenbroek
This change adds a new API for writing traces in a trace format. Currently, writing is only supported by the OpenDC VM format, but over time the other formats will also have support for writing added.
2021-09-20 | refactor(trace): Simplify TraceFormat SPI interface | Fabian Mastenbroek
This change simplifies the TraceFormat SPI interface by reducing the number of interfaces that implementors need to implement to only TraceFormat.
2021-09-20 | perf(compute): Use index lookup in trace loader | Fabian Mastenbroek
This change updates the ComputeWorkloadLoader to use index column lookups in order to prevent having to look up the index for every row.
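The general idea is to resolve a column name to its position once, before the row loop, rather than repeating the name lookup for each row. The sketch below shows this on a plain in-memory table. `ColumnSum` and its table representation are hypothetical and are not the OpenDC trace API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory table, used only to illustrate hoisting the lookup.
final class ColumnSum {
    private ColumnSum() {}

    /** Sums one column, resolving its index a single time before iterating. */
    static double sum(List<String> header, List<double[]> rows, String column) {
        int index = header.indexOf(column); // resolved once, outside the loop
        if (index < 0) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        double total = 0.0;
        for (double[] row : rows) {
            total += row[index]; // constant-time access per row
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("id", "cpu_usage");
        List<double[]> rows = Arrays.asList(new double[] {1, 2.5}, new double[] {2, 3.5});
        System.out.println("total cpu_usage = " + sum(header, rows, "cpu_usage"));
    }
}
```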
2021-09-20 | refactor(trace): Unify columns of different tables | Fabian Mastenbroek
This change unifies columns of different tables used by trace formats. Concretely, instead of having columns specific to each table (e.g., RESOURCE_ID and RESOURCE_STATE_ID), these columns are now shared between the tables with a single definition (RESOURCE_ID).
2021-09-19 | feat(trace): Add tool for converting workload traces | Fabian Mastenbroek
This change adds an initial implementation to the trace library for converting between workload trace formats. Currently the tool supports only converting to the OpenDC VM trace format. However, in the future, we will add support for converting between other formats as well.
2021-09-19 | feat(trace): Update OpenDC VM trace format | Fabian Mastenbroek
This change optimizes the OpenDC VM trace format by removing unnecessary columns as well as optimizing the writer settings. The new implementation still supports reading the old trace format in case users run OpenDC with older workload traces.
2021-09-19 | feat(trace): Add support for internal OpenDC VM trace format | Fabian Mastenbroek
This change adds official support to the trace library for the internal VM trace format used by OpenDC for its experiments. This is a compact format that uses Parquet to store the virtual machine trace data in two Parquet files.
2021-09-19 | feat(trace): Add support for Azure VM trace format | Fabian Mastenbroek
This change adds support in the trace library for the Azure VM trace format.
2021-09-19 | feat(trace): Add support for extended Bitbrains trace format | Fabian Mastenbroek
This change adds support in the trace library for the extended Bitbrains format. This format differs slightly from the CSV format used by the original Bitbrains traces and contains more fields.
2021-09-19 | refactor(capelin): Make workload sampling model extensible | Fabian Mastenbroek
This change updates the workload sampling implementation to be more flexible in the way the workload is constructed. Users can now sample multiple workloads at the same time using multiple samplers and use them as a single workload to simulate.
2021-09-19 | feat(capelin): Support creating CPU-optimized topology | Fabian Mastenbroek
This change adds support for creating a topology that is CPU-optimized for simulation. This means that all the CPU resources of a machine are merged into a single large CPU in order to reduce simulation time.
2021-09-19 | perf(compute): Add option for optimizing SimHost simulation | Fabian Mastenbroek
This change adds an option for optimizing SimHost simulation by combining all the CPUs of a machine into a single large CPU. For most workloads, this does not significantly affect the simulation results, but it does reduce the simulation time considerably.
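As a rough illustration of what combining the CPUs could mean, the sketch below replaces a list of per-core capacities with a single value equal to their sum. That the merged CPU's capacity is the plain sum is an assumption made here, and `CpuMerge` is a hypothetical name that does not appear in OpenDC.

```java
// Hypothetical sketch: one merged CPU whose capacity is the sum of all cores.
final class CpuMerge {
    private CpuMerge() {}

    /** Returns the capacity in MHz of a single CPU standing in for all given cores. */
    static double mergedCapacity(double[] coreCapacitiesMHz) {
        double total = 0.0;
        for (double capacity : coreCapacitiesMHz) {
            total += capacity;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] cores = {2400.0, 2400.0, 2400.0, 2400.0};
        System.out.println("simulating 1 CPU with capacity " + mergedCapacity(cores) + " MHz");
    }
}
```

With a single merged CPU, the simulator tracks one resource per machine instead of one per core, which is where the reduction in simulation time would come from.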
2021-09-19 | refactor(capelin): Support flexible topology creation | Fabian Mastenbroek
This change adds support for creating flexible topologies by creating a TopologyFactory interface that is responsible for configuring the hosts of a compute service.
2021-09-19 | refactor(capelin): Extract common code out of Capelin experiments | Fabian Mastenbroek
This change creates a new module for doing simulations with virtual machine workloads. We have found that a lot of code in the Capelin experiments code is being re-used by non-experiment modules.
2021-09-17 | refactor(telemetry): Standardize SimHost metrics | Fabian Mastenbroek
This change standardizes the metrics emitted by SimHost instances and their guests based on the OpenTelemetry semantic conventions. We now also report CPU time as opposed to CPU work as this metric is more commonly used.