path: root/opendc-experiments/opendc-experiments-tf20
Age | Commit message | Author
2022-10-05 | refactor(sim/core): Use SimulationScheduler in coroutine dispatcher | Fabian Mastenbroek
This change updates the implementation of `SimulationDispatcher` to use a (possibly user-provided) `SimulationScheduler` for managing the execution of the simulation and future tasks.
2022-06-15 | fix(sim/compute): Always recompute power usage | Fabian Mastenbroek
This change fixes an issue in the `SimBareMetalMachine` implementation where the power usage was only updated after a non-zero duration. As a result, OpenDC could report incorrect power usage values when multiple convergence calls occurred at the same timestamp.
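The fix described above can be illustrated with a minimal sketch. `PowerTracker`, its linear power model, and the millisecond timestamps are assumptions made for illustration; they are not taken from the `SimBareMetalMachine` source. The point is only that energy accounting depends on the elapsed duration, while the instantaneous power reading is refreshed on every call.

```kotlin
// Hypothetical illustration; class and property names are not OpenDC's.
class PowerTracker(private val idlePower: Double, private val maxPower: Double) {
    private var lastUpdate = 0L

    /** Instantaneous power draw in watts. */
    var powerUsage = idlePower
        private set

    /** Accumulated energy in joules. */
    var energyUsage = 0.0
        private set

    /** Called on every convergence; `now` is a timestamp in milliseconds. */
    fun onConverge(now: Long, utilization: Double) {
        val duration = now - lastUpdate
        if (duration > 0) {
            // Energy is charged at the power level that held before this call.
            energyUsage += powerUsage * duration / 1000.0
            lastUpdate = now
        }
        // Always recompute the power draw, even when duration == 0, so that a
        // second convergence at the same timestamp is not ignored.
        powerUsage = idlePower + (maxPower - idlePower) * utilization
    }
}
```

If the last assignment were moved inside the `if` block, a second call at the same timestamp would leave a stale `powerUsage` in place, which is the kind of error the commit message describes.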
2022-06-15 | fix(exp/tf20): Derive device statistics directly from SimMachine | Fabian Mastenbroek
This change updates the implementation of SimTFDevice to directly use the metrics provided by the `SimBareMetalMachine` class, instead of computing these metrics itself.
2022-05-06 | refactor(exp/tf20): Convert experiment into integration test | Fabian Mastenbroek
This change removes the `TensorFlowExperiment` in favour of an integration test that can be run during CI invocations. Given that the experiment was not very sophisticated (in terms of data collection), we believe it is better suited as an integration test.
2022-05-06 | fix(exp/tf20): Fix infinite loop due to invalid rounding | Fabian Mastenbroek
This change fixes an issue with the `SimTFDevice` implementation where very small amounts of FLOPs would cause the device to enter an infinite loop. We now round the value up to ensure that the device always consumes FLOPs.
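A minimal sketch of why rounding direction matters here. The commit message does not say which value is rounded, so this example assumes it is a duration in milliseconds derived from a FLOP count; `durationMillis` and `flopsPerMs` are hypothetical names, not part of `SimTFDevice`.

```kotlin
import kotlin.math.ceil

// Hypothetical helper: milliseconds needed to process `flops` at `flopsPerMs`.
// Rounding up guarantees a result of at least 1 ms for any positive amount of
// work, whereas truncation could yield 0 ms and leave the caller looping
// without ever making progress.
fun durationMillis(flops: Double, flopsPerMs: Double): Long {
    require(flops > 0.0 && flopsPerMs > 0.0) { "inputs must be positive" }
    return ceil(flops / flopsPerMs).toLong()
}
```

With truncation, `(1.0 / 1e6).toLong()` evaluates to 0, so a device that waits for that duration and then re-checks its remaining work would never advance.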
2022-05-06 | refactor(exp/tf20): Remove OpenTelemetry from TF20 experiment | Fabian Mastenbroek
This change removes the OpenTelemetry integration from the OpenDC TensorFlow 2020 experiments. Previously, we chose to integrate OpenTelemetry to provide a unified way to report metrics to the users. See the previous commit removing it from the "Compute" modules for the reasoning behind this change.
2022-05-06 | refactor(exp/tf20): Directly expose device stats to user | Fabian Mastenbroek
This change updates the `TFDevice` interface to directly expose statistics about the accelerator device to the user. Previously, the user had to access these values through OpenTelemetry, which required substantial extra work.
2022-02-18 | refactor(simulator): Remove delta parameter from flow callbacks | Fabian Mastenbroek
This change removes the delta parameter from the callbacks of the flow framework. This parameter was used to indicate the duration in time between the last call and the current call. However, its usefulness was limited since the actual delta values needed by implementors of this method had to be bridged across different flow callbacks.
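Without a delta parameter, an implementor that needs elapsed time can derive it from the timestamps it receives. The sketch below assumes a callback that is passed the current time in milliseconds; `WorkCounter` and `onPull` are hypothetical and do not reproduce the flow framework's actual signatures.

```kotlin
// Hypothetical consumer that tracks its own elapsed time between callbacks.
class WorkCounter(private val rate: Double) {
    private var lastPull = 0L

    /** Total work processed so far, in rate units times milliseconds. */
    var processed = 0.0
        private set

    fun onPull(now: Long) {
        // The delta is computed locally from the previous timestamp, so it
        // always spans exactly the interval this object cares about.
        val delta = now - lastPull
        processed += rate * delta
        lastPull = now
    }
}
```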
2022-02-18 | build: Remove opendc-platform module | Fabian Mastenbroek
This change removes the opendc-platform module from the project. This module represented a Java platform which was previously used for sharing a set of dependency versions between subprojects. However, since adopting Gradle's version catalog feature, we no longer use the platform.
2022-02-18 | perf(common): Optimize TimerScheduler | Fabian Mastenbroek
This change updates the TimerScheduler implementation to directly use the Delay object instead of running the timers inside a coroutine. Constructing the coroutine is more expensive, so we prefer running in a Runnable.
2022-02-18 | refactor(utils): Rename utils module to common module | Fabian Mastenbroek
This change adds a new module, opendc-common, that contains functionality that is shared across OpenDC's modules. We move the existing utils module into this new module.
2022-02-15 | refactor: Update OpenTelemetry to version 1.11 | Fabian Mastenbroek
This change updates the OpenDC codebase to use OpenTelemetry v1.11, which stabilizes the metrics API. This stabilization brings quite a few breaking changes, so significant changes are necessary inside the OpenDC codebase.
2021-10-25 | refactor(simulator): Support running workloads without coroutines | Fabian Mastenbroek
This change updates the SimMachine interface to drop the coroutine requirement for running a workload on a machine. Users can now asynchronously start a workload and receive notifications via the workload callbacks. Users can still suspend during workload execution by using the new `runWorkload` method, which is implemented on top of the new `startWorkload` primitive.
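The layering of a suspending call on top of a callback-based primitive can be sketched as follows. The `Machine` interface and the callback shape are assumptions made for illustration; only the method names `startWorkload` and `runWorkload` come from the commit message, and the real signatures may differ. The sketch uses only the Kotlin standard library and does not handle cancellation.

```kotlin
import kotlin.coroutines.resumeWithException
import kotlin.coroutines.resume
import kotlin.coroutines.suspendCoroutine

// Hypothetical interface: the primitive starts the workload and reports
// completion (or failure) through a callback.
interface Machine {
    fun startWorkload(onDone: (Result<Unit>) -> Unit)
}

// Suspending convenience wrapper built on the callback primitive.
suspend fun Machine.runWorkload(): Unit = suspendCoroutine { cont ->
    startWorkload { result ->
        result.fold(
            onSuccess = { cont.resume(Unit) },
            onFailure = { cont.resumeWithException(it) }
        )
    }
}
```

Callers that do not use coroutines can call `startWorkload` directly, while coroutine-based callers keep a sequential style through `runWorkload`.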
2021-10-03 | perf(simulator): Make convergence callback optional | Fabian Mastenbroek
This change adds two new properties for controlling whether the convergence callbacks of the source and consumer respectively should be invoked. This saves a lot of unnecessary calls for stages that do not have any implementation of the `onConvergence` method.
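A minimal sketch of the opt-in pattern, assuming a single hypothetical `Stage` abstraction with a `shouldConverge` flag. The commit message mentions separate properties for source and consumer; their actual names are not given, so the ones below are illustrative only.

```kotlin
// Hypothetical stage abstraction; the property name is illustrative.
interface Stage {
    /** Stages that implement onConverge opt in by overriding this flag. */
    val shouldConverge: Boolean get() = false

    fun onConverge(now: Long) {}
}

// Invokes the callback only on stages that asked for it and returns the
// number of callbacks that were actually made.
fun notifyConvergence(stages: List<Stage>, now: Long): Int {
    var invoked = 0
    for (stage in stages) {
        if (!stage.shouldConverge) continue // skip the no-op call entirely
        stage.onConverge(now)
        invoked++
    }
    return invoked
}
```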
2021-10-03 | refactor(simulator): Create separate callbacks for remaining events | Fabian Mastenbroek
This change creates separate callbacks for the remaining events: onStart, onStop and onConverge.
2021-10-03 | refactor(simulator): Remove capacity event | Fabian Mastenbroek
This change removes the Capacity entry from FlowEvent. Since the source is always pulled on a capacity change, we do not need a separate event for this.
2021-10-03 | refactor(simulator): Migrate to flow-based simulation | Fabian Mastenbroek
This change renames the `opendc-simulator-resources` module to `opendc-simulator-flow` to indicate that the core simulation model of OpenDC is based around modelling and simulating flows. Previously, the distinction between resource consumer and provider, and between input and output, caused some confusion. By switching to a flow-based model, this distinction is now clear (as in, the water flows from source to consumer/sink).
2021-10-03 | refactor(simulator): Add support for pushing flow from context | Fabian Mastenbroek
This change adds a new method to `SimResourceContext` called `push` which allows users to change the requested flow rate directly without having to interrupt the consumer.
2021-10-03 | refactor(simulator): Combine work and deadline to duration | Fabian Mastenbroek
This change removes the work and deadline properties from the SimResourceCommand.Consume class and introduces a new property, duration. This property is now used in conjunction with the limit to compute the amount of work processed by a resource provider. Previously, we used both work and deadline to compute the duration and the amount of remaining work at the end of a consumption. With this change, a resource consumption always runs at the same speed once established, which drastically simplifies computing the amount of work processed during the consumption.
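Under a constant rate, the processed work reduces to a single multiplication. The sketch below assumes the limit is a rate in work units per second and the duration is in milliseconds; both units and the function name are assumptions, since the commit message does not specify them.

```kotlin
// Hypothetical helper: work processed when consuming at a constant rate
// `limit` (units per second) for `durationMs` milliseconds.
fun workProcessed(limit: Double, durationMs: Long): Double =
    limit * durationMs / 1000.0
```

For example, a limit of 2000 units per second sustained for 500 ms corresponds to 1000 units of work under these assumptions.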
2021-09-02 | refactor(format): Remove environment reader from format library | Fabian Mastenbroek
This change removes the environment reader from the format library since it is highly specific to the particular experiment. In the future, we hope to have a single format to set up the entire datacenter (perhaps similar to the format used by the web runner).
2021-08-25 | build: Upgrade to OpenTelemetry 1.5 | Fabian Mastenbroek
This change upgrades the OpenTelemetry dependency to version 1.5, which contains various breaking changes in the metrics API.
2021-06-21 | simulator: Re-organize compute simulator module | Fabian Mastenbroek
This change re-organizes the classes of the compute simulator module to make a clearer distinction between the hardware, firmware and software interfaces in this module.
2021-06-11 | simulator: Integrate power subsystem with compute subsystem | Fabian Mastenbroek
This change integrates the power subsystem of the simulator with the compute subsystem by exposing a new field on a SimBareMetalMachine, psu, which provides access to the machine's PSU, which in turn can be connected to a SimPowerOutlet.
2021-06-09 | build: Eliminate most Hadoop dependencies | Fabian Mastenbroek
This change eliminates all Hadoop dependencies that are not necessary for Parquet to work correctly. As a result, the number of dependencies should now be greatly reduced, which in turn means fewer artifacts need to be retrieved at build time.
2021-06-03 | simulator: Split CPUFreq subsystem in compute simulator | Fabian Mastenbroek
This change splits the functionality present in the CPUFreq subsystem of the compute simulation. Currently, the DVFS functionality is embedded in SimBareMetalMachine. However, this functionality should not exist within the firmware layer of a machine. Instead, the operating system should perform this logic (in OpenDC this should be the hypervisor). Furthermore, this change moves the scaling driver into the power package. The power driver is a machine/firmware specific implementation that computes the power consumption of a machine.
2021-06-02 | simulator: Start consumers directly from workload | Fabian Mastenbroek
This change updates the SimWorkload interfaces to allow implementations to start consumers for the machine resource providers directly.
2021-06-01 | simulator: Centralize resource logic in SimResourceInterpreter | Fabian Mastenbroek
This change introduces the SimResourceInterpreter which centralizes the logic for scheduling and interpreting the communication between resource consumer and provider. This approach offers better performance due to avoiding invalidating the state of the resource context when not necessary. Benchmarks show in the best case a 5x performance improvement and at worst a 2x improvement.
2021-05-18 | chore: Address deprecations due to Kotlin 1.5 | Fabian Mastenbroek
This change addresses the deprecations that were caused by the migration to Kotlin 1.5.
2021-05-09 | exp: Add explanation about experimental nature of module | Fabian Mastenbroek
2021-05-09 | exp: Implement TensorFlow distribution strategies | Wenchen Lai
2021-05-09 | exp: Model TensorFlow compute devices | Wenchen Lai
2021-05-09 | exp: Add simple network model for TensorFlow experiments | Wenchen Lai
2021-05-09 | exp: Add environments for TensorFlow experiments | Wenchen Lai
This change adds the experimental environments that are being used for the TensorFlow on OpenDC experiments.
2021-05-09 | exp: Add environment reader for TensorFlow experiments | Wenchen Lai
This change adds a reader for the environment/topology format that is used for the TensorFlow experiments in OpenDC.
2021-05-09 | exp: Add model of TensorFlow Keras API | Wenchen Lai
This change adds a model of TensorFlow's Keras API to OpenDC.
2021-05-09 | exp: Add TensorFlow experiment setup | Wenchen Lai
This change adds the initial experiment setup for the TensorFlow on OpenDC experiments.