| Age | Commit message | Author |
|
* Updated the memory usage of Tasks. Still in Progress.
* Merged Task and ServiceTask -> Currently not fully working!!!
* Fixed bugs that made the merger between Task and ServiceTask not work well.
* Updated jdk version for Dockerfile
* Removed ServiceFlavor.java and Task.kt
|
|
Updated FilterScheduler.kt for performance using a constantly sorted Array
|
|
FilterScheduler.kt (#372)
|
|
* Updated FilterScheduler for performance
|
|
* Updated output format to reduce size
* using sum of gpu capacities instead of single max
* passing provisioned GPU cores to host view
* fix supply update trigger
* fixing floating point error, leading to negative demand
* fixing double mismatch, due to floating-point imprecision
* adding additional check if demand can be satisfied in the simple way
* adds workload invalidation if remaining duration for all resources is 0
* invalidating flow distributors after demand update
* spotless apply
* updating tests
* exporting power consumption of compute resources directly from gpu instead of PSU
* using big decimal to avoid floating-point imprecision
* rolls back to pass-through version of PSU, before GPU implementation
* places flowdistributor between PSU and compute resources
* adds check to avoid null exception if supply is pushed without demand
* fixing task id type
* Adds memorizing GPU scheduler
* adds boundary for negative remaining work
* implemented tests for GPU scheduler filter
* Revert "Updated output format to reduce size"
This reverts commit 7171de8e0512a863df4962f64560ac7bad1fb48d.
* spotless apply
---------
Co-authored-by: DanteNiewenhuis <d.niewenhuis@hotmail.com>
|
|
|
|
(#342)
* renamed performance counter to distinguish different resource types
* added GPU, modelled similar to CPU
* added GPUs to machine model
* list of GPUs instead of single instance
* renamed memory speed to bandwidth
* enabled parsing of GPU resources
* split powermodel into cpu and GPU powermodel
* added gpu parsing tests
* added idea of host level scheduling
* added tests for multi gpu parsing
* renamed powermodel to cpupowermodel
* clarified naming of cpu and gpu components
* added resource type to flow supplier and edge
* added resourcetype
* added GPU components and resource type to fragments
* added GPU to workload and updated resource usage retrieval
* implemented first version of multi resource
* added name to workload
* renamed performance counters
* removed commented out code
* removed deprecated comments
* included demand and supply into calculations
* resolving rebase mismatches
* moved resource type from flowedge class to common package
* added available resources to machines
* cleaner separation of whether a workload is started on a simmachine or a vm
* Replaced exception with dedicated enum
* Only looping over resources that are actually used
* using hashmaps to handle resourcetype instead of arrays for readability
* fixed condition
* tracking finished workloads per resource type
* removed resource type from flowedge
* made supply and demand distribution resource specific
* added power model for GPU
* removed unused test setup
* removed deprecated comments
* removed unused parameter
* added ID for GPU
* added GPUs and GPU performance counters (naively)
* implemented capturing of GPU statistics
* added reminders for future implementations
* renamed properties for better identification
* added capturing GPU statistics
* implemented first tests for GPUs
* unified access to performance counters
* added interface for general compute resource handling
* implemented multi resource support in simmachine
* added individual edge to VM per resource
* extended compute resource interface
* implemented multi-resource support in PSU
* implemented generic retrieval of computeresources
* implemented multi-resource support in vm
* made method use more resource specific
* implemented simple GPU tests
* rolled back frequency and demand use
* made naming independent of used resource
* using workloads resources instead of VMs to determine available resource
* implemented determination of used resources in workload
* removed logging statements
* implemented reading from workload
* fixed naming for host-level allocation
* fixed next deadline calculation
* fixed forwarding supply
* reduced memory footprint
* made GPU powermodel nullable
* made GPU powermodel configurable in topology
* implemented tests for basic gpu scheduler
* added gpu properties
* implemented weights, filter and simple cpu-gpu scheduler
* spotless apply
* spotless apply pt. 2
* fixed capitalization
* spotless kotlin run
* implemented column export
* todo update
* removed code comments
* Merged PerformanceCounter classes into one & removed interface
* removed GPU specific powermodel
* Rebase master: kept both versions of TopologyFactories
* renamed CpuPowermodel to resource independent Powermodel
Moved it from Cpu package to power package
* implemented default of getResourceType & removed overrides if possible
* split getResourceType into Consumer and Supplier
* added power as resource type
* reduced supply demand from arrayList to single value
* combining GPUs into one large GPU, until full multi-gpu support
* merged distribution policy enum with corresponding factory
* added comment
* post-rebase fixes
* aligned naming
* Added GPU metrics to task output
* Updates power resource type to uppercase.
Standardizes the `ResourceType.Power` enum to `ResourceType.POWER`
for consistency with other resource types and improved readability.
* Removes deprecated test assertions
Removes commented-out assertions in GPU tests.
These assertions are no longer needed and clutter the test code.
* Renames MaxMinFairnessStrategy to Policy
Renames MaxMinFairnessStrategy to MaxMinFairnessPolicy for
clarity and consistency with naming conventions. This change
affects the factory and distributor to use the updated name.
* applies spotless
* nulls GPUs as it is not used
|
|
* Separate timeshift into an interface and add it to memorizing
* Run spotless apply
* Remove random from memorizing sched test
* Record time on task termination
* spotless apply
|
|
* Remove task from scheduler bookkeeping after failure
* Support carbon forecasting in timeshift
* Register scheduler and carbonmodel in context
* Preliminary working task stopping; carbon intensity bug
* Working carbon based stop. Two timeshift thresholds
* Add a pause state task and guest
* Move task stopper to allocation spec
* Start tracking num pauses
|
|
* Start time shifting
* Existing experiments work with new columns
* Remove unused traces dir
* Update java to 21 LTS and jacoco to be compatible
* Minimal working timeshifting
* Timeshift scheduler linked as carbon receiver
* Add basic tests for timeshift scheduler
* Run spotless apply
* Modify trace format tests to support new fields
* Change all mentions of java 19 to 21
* Add a deferAll option to workload to make all tasks deferrable
* Run spotless apply
* Copy traces from resources in web dockerfile
|
|
* Change scheduler API to include task removal and add tests
* Check if memorizing scheduler works with the whole system
* Spotless apply
* Expand function name and improve documentation
|
|
* Removed unused components. Updated tests.
Improved checkpointing model
Improved model, started with SimPowerSource
implemented FailureModels and Checkpointing
First working version
midway commit
first update
All simulations are now run with a single CPU and single MemoryUnit. Multiple CPUs are combined into one. This is for performance and explainability.
* fixed merge conflicts
* Updated M3SA paths.
* Fixed small typo
|
|
CPUs are combined into one. This is for performance and explainability. (#255)
|
|
* Added a max failure for tasks. If tasks fail more times, they get cancelled
* Added maxNumFailures to the frontend
* Updated tests
|
|
* Started on reimplementing the SimTrace implementation
* updated trace format. Fragments now do not have a deadline, but a duration. The Fragments are executed in order.
|
|
* Updated SimTrace to use a single ArrayDeque instead of three separate lists for deadline, cpuUsage, and coreCount
* Renamed input files to tasks.parquet and fragments.parquet. Renamed server to task. OpenDC now exports tasks.parquet instead of server.parquet
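The trace changes above (fragments carry a duration rather than a deadline, run in order, and live in one `ArrayDeque`) can be illustrated with a minimal sketch. The `Fragment` record and the `totalDuration` helper are hypothetical names chosen for illustration and are not taken from the OpenDC sources.

```java
import java.util.ArrayDeque;

public class FragmentQueueExample {
    /** Hypothetical fragment: it has a duration instead of a deadline. */
    record Fragment(long durationMs, double cpuUsage, int coreCount) {}

    /**
     * Consumes the fragments in insertion order and returns the total
     * simulated time. A single deque replaces three parallel lists.
     */
    static long totalDuration(ArrayDeque<Fragment> fragments) {
        long now = 0;
        while (!fragments.isEmpty()) {
            Fragment next = fragments.poll(); // head of the queue runs first
            now += next.durationMs();         // the next fragment starts when this one ends
        }
        return now;
    }

    public static void main(String[] args) {
        ArrayDeque<Fragment> fragments = new ArrayDeque<>();
        fragments.add(new Fragment(1000, 0.5, 2));
        fragments.add(new Fragment(500, 1.0, 4));
        System.out.println(totalDuration(fragments)); // 1500
    }
}
```

Keeping the three values in one object per fragment means the lists cannot drift out of sync, and polling from the head matches the in-order execution described in the commit.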
|
|
* Fixed a problem which caused the CPU limit to be much lower than it should be.
AllocationPolicy is now properly exposed to the user
* Fixed tests
* spotless kotlin
|
|
|
|
* Updated the topology format to JSON. Updated TopologyReader.kt to handle JSON files. Added documentation for the new format.
* applied spotless kotlin
* small update
* Updated for spotless apply
* Updated for spotless apply
|
|
* Updated all package versions including kotlin. Updated all web-server tests to run.
* Changed the java version of the tests. OpenDC now only supports java 19.
* small update
* test update
* new update
* updated docker version to 19
* updated docker version to 19
|
|
* Updated metrics to consistently be ms
* Updated metrics to consistently be ms
* Updated metric documentation on the site
* Updated some tests to work with the updated metrics
|
|
* made sure all tests run
* fixed typo
* executed spotlessApply
* added back web-server tests
* updated SimTraceWorkloadTest
* commented CapelinRunner and GreenifierRunner tests
* commented one SimTraceWorkloadTest
* altered codecov execution
* changed codecov
|
|
This change updates the API interface of the OpenDC Compute service to
not suspend execution using Kotlin Coroutines.
The suspending modifiers were introduced in case the ComputeClient would
communicate with the service over a network connection. However, the main
use-case has been together with the ComputeService, where the suspending
modifiers only frustrate the user experience when writing experiments.
Furthermore, with the advent of Project Loom, it is not necessarily a
problem to block the (virtual) thread during network communications.
|
|
This change replaces the use of `CoroutineContext` for passing the
`SimulationDispatcher` across the different modules of OpenDC by the
lightweight `Dispatcher` interface of the OpenDC common module.
|
|
This change updates the `SimulationScheduler` class to implement the
`Dispatcher` interface from the OpenDC Common module, so that OpenDC
modules only need to depend on the common module for dispatching future
tasks (possibly in simulation).
|
|
This change updates the modules of OpenDC to always accept
the `InstantSource` interface as source of time. Previously we used
`java.time.Clock`, but this class is bound to a time zone which does not
make sense for our use-cases.
Since `java.time.Clock` implements `java.time.InstantSource`, it can be
used in places that require an `InstantSource` as parameter. Conversion
from `InstantSource` to `Clock` is also possible by invoking
`InstantSource#withZone`.
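The relationship between the two types can be shown with a small sketch using only the standard `java.time` API (Java 17 or later). The `elapsedMillis` helper is a hypothetical consumer written for this example and is not part of OpenDC.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.InstantSource;
import java.time.ZoneOffset;

public class InstantSourceExample {
    /** Hypothetical component that needs the current instant but no time zone. */
    static long elapsedMillis(InstantSource source, Instant start) {
        return Duration.between(start, source.instant()).toMillis();
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2022-01-01T00:00:00Z");

        // A fixed source is convenient for tests and simulations.
        InstantSource fixed = InstantSource.fixed(start.plusSeconds(5));
        System.out.println(elapsedMillis(fixed, start)); // 5000

        // Clock implements InstantSource, so it can be passed unchanged.
        Clock clock = Clock.systemUTC();
        System.out.println(elapsedMillis(clock, start) > 0);

        // Converting back to a Clock requires choosing a zone explicitly.
        Clock converted = fixed.withZone(ZoneOffset.UTC);
        System.out.println(converted.instant());
    }
}
```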
|
|
This change updates the modules of OpenDC to always accept
the `RandomGenerator` interface as source of randomness. This interface
is implemented by the slower `java.util.Random` class, but also by the
faster `java.util.SplittableRandom` class.
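As a sketch of what accepting the interface allows, the example below passes both implementations to the same method. It uses only the standard `java.util.random.RandomGenerator` API (Java 17 or later); the `sample` helper is hypothetical and not part of OpenDC.

```java
import java.util.Random;
import java.util.SplittableRandom;
import java.util.random.RandomGenerator;

public class RandomGeneratorExample {
    /** Hypothetical helper: draws a value in [0, bound) from any generator. */
    static int sample(RandomGenerator random, int bound) {
        return random.nextInt(bound);
    }

    public static void main(String[] args) {
        // Both classes implement RandomGenerator, so the caller picks the trade-off.
        RandomGenerator legacy = new Random(42);
        RandomGenerator fast = new SplittableRandom(42);

        System.out.println(sample(legacy, 10));
        System.out.println(sample(fast, 10));
    }
}
```

The two generators use different algorithms, so the same seed does not produce the same sequence in both.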
|
|
This change updates the `Host` interface to remove the suspend modifiers
to the start, stop, spawn, and delete methods of this interface. We now
assume that the host immediately launches the guest on invocation of
this method.
|
|
This change updates `SimHost` to support modeling the time and resource
consumption it takes to boot the host. The boot procedure is modeled as a
`SimWorkload`.
|
|
This change updates the implementation of `SimHost` to use workload
chaining for modelling boot delays. Previously, this was implemented by
sleeping 1 millisecond using Kotlin coroutines. With this change, we
remove the need for coroutines and instead use the `SimDurationWorkload`
to model the boot delay.
In the future, we envision a user-supplied stochastic boot model to
model the boot delay for VM instances.
|
|
This change re-implements the OpenDC compute simulator framework using
the new flow2 framework for modelling multi-edge flow networks. The
re-implementation is written in Java and focusses on performance and
clean API surface.
|
|
This change updates the build configuration to use Spotless for code
formatting of both Kotlin and Java.
|
|
This change updates the repository to remove the use of wildcard imports
everywhere. Wildcard imports are not allowed by default by Ktlint as
well as Google's Java style guide.
|
|
This change renames the method `runBlockingSimulation` to
`runSimulation` to put more emphasis on the simulation part of the
method. The blocking part is not that important, but this behavior is
still described in the method documentation.
|
|
This change updates the implementation of `SimulationDispatcher` to use
a (possibly user-provided) `SimulationScheduler` for managing the
execution of the simulation and future tasks.
|
|
This change simplifies the SimHypervisor class into a single
implementation. Previously, it was implemented as an abstract class with
multiple implementations for each multiplexer type. We now pass the
multiplexer type as parameter to the SimHypervisor constructor.
|
|
This change updates the virtual machine performance interference model
so that the interference domain can be constructed independently of the
interference profile. As a consequence, the construction of the topology
no longer depends on the interference profile.
|
|
This change updates the constructor of SimHost to receive a
`SimBareMetalMachine` and `SimHypervisor` directly instead of
constructing these objects itself. This ensures better testability and
also simplifies the constructor of this class, especially when future
changes to `SimBareMetalMachine` or `SimHypervisor` change their
constructors.
|
|
This change moves the Random dependency outside the interference model,
to allow the interference model to be completely immutable and passable
between different simulations.
|
|
This change removes the OpenTelemetry integration from the OpenDC
Compute modules. Previously, we chose to integrate OpenTelemetry to
provide a unified way to report metrics to the users.
Although this worked as expected, the overhead of OpenTelemetry when
collecting metrics during simulation was considerable and left few
optimization opportunities (other than providing a separate API
implementation). Furthermore, since we were tied to OpenTelemetry's SDK
implementation, we experienced issues with throttling and registering
multiple instruments.
We will instead use another approach, where we expose the core metrics
in OpenDC via specialized interfaces (see the commits before) such that
access is fast and can be done without having to interface with
OpenTelemetry. In addition, we will provide an adapter that is able
to forward these metrics to OpenTelemetry implementations, so we can
still integrate with the wider ecosystem.
|
|
This change updates the `ComputeService` interface to directly expose
statistics about the scheduler to the user, such that they do not
necessarily have to interact with OpenTelemetry to obtain these values.
|
|
This change updates the `Host` interface to directly expose CPU and
system stats to be used by components that interface with the `Host`
interface.
Previously, this would require the user to interact with the OpenTelemetry SDK.
Although that is still possible for more advanced use cases, users can
use the following methods to easily access common host and guest
statistics.
|
|
This change updates the simulator implementation to flush the active
progress when accessing the hypervisor counters. Previously, if the
counters were accessed while the mux or consumer was in progress, the
counter values were not accurate.
|
|
This change updates the OpenDC codebase to use OpenTelemetry v1.11,
which stabilizes the metrics API. This stabilization brings quite a few
breaking changes, so significant changes are necessary inside the OpenDC
codebase.
|
|
This change redesigns the ComputeMonitor interface to reduce the number
of memory allocations necessary during a collection cycle.
|
|
This change improves the performance of the SimTraceWorkload class by
changing the way trace fragments are read and processed by the CPU
consumers.
|
|
This change adds a new interface to the SimHypervisor interface that
exposes the CPU time counters directly. These are derived from the flow
counters and will be used by SimHost to expose them via telemetry.
|
|
This change renames the `opendc-simulator-resources` module into the
`opendc-simulator-flow` module to indicate that the core simulation
model of OpenDC is based around modelling and simulating flows.
Previously, the distinction between resource consumer and provider, and
input and output caused some confusion. By switching to a flow-based
model, this distinction is now clear (as in, the water flows from source
to consumer/sink).
|
|
This change drops the requirement for a clock parameter when
constructing a ComputeMetricExporter, since it will now derive the
timestamp from the recorded metrics.
|
|
This change standardizes the metrics emitted by SimHost instances and
their guests based on the OpenTelemetry semantic conventions. We now
also report CPU time as opposed to CPU work as this metric is more
commonly used.
|