| Age | Commit message (Collapse) | Author |
|
* Updated FilterScheduler for performance
|
|
(#342)
* renamed performance counter to distinguish different resource types
* added GPU, modelled similar to CPU
* added GPUs to machine model
* list of GPUs instead of single instance
* renamed memory speed to bandwidth
* enabled parsing of GPU resources
* split powermodel into cpu and GPU powermodel
* added gpu parsing tests
* added idea of host level scheduling
* added tests for multi gpu parsing
* renamed powermodel to cpupowermodel
* clarified naming of cpu and gpu components
* added resource type to flow suplier and edge
* added resourcetype
* added GPU components and resource type to fragments
* added GPU to workload and updated resource usage retrieval
* implemented first version of multi resource
* added name to workload
* renamed perfomance counters
* removed commented out code
* removed deprecated comments
* included demand and supply into calculations
* resolving rebase mismatches
* moved resource type from flowedge class to common package
* added available resources to machinees
* cleaner separation if workload is started of simmachine or vm
* Replaced exception with dedicated enum
* Only looping over resources that are actually used
* using hashmaps to handle resourcetype instead of arrays for readability
* fixed condition
* tracking finished workloads per resource type
* removed resource type from flowedge
* made supply and demand distribution resource specific
* added power model for GPU
* removed unused test setup
* removed depracated comments
* removed unused parameter
* added ID for GPU
* added GPUs and GPU performance counters (naively)
* implemented capturing of GPU statistics
* added reminders for future implementations
* renamed properties for better identification
* added capturing GPU statistics
* implemented first tests for GPUs
* unified access to performance counters
* added interface for general compute resource handling
* implemented multi resource support in simmachine
* added individual edge to VM per resource
* extended compute resource interface
* implemented multi-resource support in PSU
* implemented generic retrieval of computeresources
* implemented mult-resource suppport in vm
* made method use more resource specific
* implemented simple GPU tests
* rolled back frquency and demand use
* made naming independent of used resource
* using workloads resources instead of VMs to determine available resource
* implemented determination of used resources in workload
* removed logging statements
* implemented reading from workload
* fixed naming for host-level allocation
* fixed next deadline calculation
* fixed forwarding supply
* reduced memory footprint
* made GPU powermodel nullable
* maded Gpu powermodel configurable in topology
* implemented tests for basic gpu scheduler
* added gpu properties
* implemented weights, filter and simple cpu-gpu scheduler
* spotless apply
* spotless apply pt. 2
* fixed capitalization
* spotless kotlin run
* implemented coloumn export
* todo update
* removed code comments
* Merged PerformanceCounter classes into one & removed interface
* removed GPU specific powermodel
* Rebase master: kept both versions of TopologyFactories
* renamed CpuPowermodel to resource independent Powermodel
Moved it from Cpu package to power package
* implementated default of getResourceType & removed overrides if possible
* split getResourceType into Consumer and Supplier
* added power as resource type
* reduced supply demand from arrayList to single value
* combining GPUs into one large GPU, until full multi-gpu support
* merged distribution policy enum with corresponding factory
* added comment
* post-rebase fixes
* aligned naming
* Added GPU metrics to task output
* Updates power resource type to uppercase.
Standardizes the `ResourceType.Power` enum to `ResourceType.POWER`
for consistency with other resource types and improved readability.
* Removes deprecated test assertions
Removes commented-out assertions in GPU tests.
These assertions are no longer needed and clutter the test code.
* Renames MaxMinFairnessStrategy to Policy
Renames MaxMinFairnessStrategy to MaxMinFairnessPolicy for
clarity and consistency with naming conventions. This change
affects the factory and distributor to use the updated name.
* applies spotless
* nulls GPUs as it is not used
|
|
* Remove task from scheduler bookkeeping after failure
* Support carbon forecasting in timeshift
* Register scheduler and carbonmodel in context
* Preliminary working task stopping; carbon intensity bug
* Working carbon based stop. Two timeshift thresholds
* Add a pause state task and guest
* Move task stopper to allocation spec
* Start tracking num pauses
|
|
|
|
* Start time shifting
* Existing experiments work with new columns
* Remove unused traces dir
* Update java to 21 LTS and jacoco to be compatible
* Minimal working timeshifting
* Timeshift scheduler linked as carbon receiver
* Add basic tests for timeshift scheduler
* Run spotless apply
* Modify tarce format tests to support new fields
* Change all mentions of java 19 to 21
* Add a deferAll option to workload to make all tasks deferrable
* Run spotless apply
* Copy traces from resources in web dockerfile
|
|
|
|
* Updated logging
* removed DoubleThresholdBatteryPolicy.java
|
|
* Added sampleFraction and submissionTime to the workloadSpec
* Removed commented code
|
|
new workload types are added (#294)
|
|
|
|
* Added power sources to OpenDC.
In the current form each Cluster has a single power source that is connected to all hosts in that cluster
* Added power sources to OpenDC.
In the current form each Cluster has a single power source that is connected to all hosts in that cluster
* Ran spotless Kotlin and Java
|
|
* Updated tests
Changed all floats into doubles to have consistency over the whole framework
Made a small update to the multiplexer to better push through supply and demand
Fixed small typo
Updated M3SA paths.
fixed merge conflicts
Removed unused components. Updated tests.
Improved checkpointing model
Improved model, started with SimPowerSource
implemented FailureModels and Checkpointing
First working version
midway commit
first update
All simulation are now run with a single CPU and single MemoryUnit. multi CPUs are combined into one. This is for performance and explainability.
* Updated test memory
|
|
* Removed unused components. Updated tests.
Improved checkpointing model
Improved model, started with SimPowerSource
implemented FailureModels and Checkpointing
First working version
midway commit
first update
All simulation are now run with a single CPU and single MemoryUnit. multi CPUs are combined into one. This is for performance and explainability.
* fixed merge conflicts
* Updated M3SA paths.
* Fixed small typo
|
|
CPUs are combined into one. This is for performance and explainability. (#255)
|
|
* Updated SimTrace to use a single ArrayDeque instead of three separate lists for deadline, cpuUsage, and coreCount
* Renamed input files to tasks.parquet and fragments.parquet. Renamed server to task. OpenDC nows exports tasks.parquet instead of server.parquet
|
|
|
|
into objects when the scenario is being executed by ScenarioRunner.kt (#227)
|
|
* Initial commit
* Implemented a new systems of defining and running scenarios / portfolios. Scenarios and Portfolios can now be defined using JSON files similar to topologies. This allows user to define experiments without changing any KotLin code.
* Ran spotlessApply
|
|
* Updated the topology format to JSON. Updated TopologyReader.kt to handle JSON filed. Added documentation for the new format.
* applied spotless kotlin
* small update
* Updated for spotless apply
* Updated for spotless apply
|
|
* Updated all package versions including kotlin. Updated all web-server tests to run.
* Changed the java version of the tests. OpenDC now only supports java 19.
* small update
* test update
* new update
* updated docker version to 19
* updated docker version to 19
|
|
* Updated metrics and parquet output
* fixed typos
|
|
* removed experiment-compute and integrated all components into opendc-compute
* updated workflow gradle file
* removed unneeded code
|
|
This change replaces the use of `CoroutineContext` for passing the
`SimulationDispatcher` across the different modules of OpenDC by the
lightweight `Dispatcher` interface of the OpenDC common module.
|
|
This change updates the `SimulationScheduler` class to implement the
`Dispatcher` interface from the OpenDC Common module, so that OpenDC
modules only need to depend on the common module for dispatching future
task (possibly in simulation).
|
|
This change re-implements the OpenDC compute simulator framework using
the new flow2 framework for modelling multi-edge flow networks. The
re-implementation is written in Java and focusses on performance and
clean API surface.
|
|
This change fixes an issue with the OpenDC web runner where the default
job timeout was set to 10 ms instead of 10 minutes. For longer
simulations, this would cause the job to be terminated.
|
|
This change resolves an issue in the web runner where the finished VMs
would always be reported as zero.
|
|
This change updates the Quarkus-based web server to add support for
tracking and limiting the simulation minutes used by the user in order
to prevent misuse of shared resources.
|
|
This change updates the build configuration to use Spotless for code
formating of both Kotlin and Java.
|
|
This change updates the repository to remove the use of wildcard imports
everywhere. Wildcard imports are not allowed by default by Ktlint as
well as Google's Java style guide.
|
|
This change renames the method `runBlockingSimulation` to
`runSimulation` to put more emphasis on the simulation part of the
method. The blocking part is not that important, but this behavior is
still described in the method documentation.
|
|
This change updates the implementation of `SimulationDispatcher` to use
a (possibly user-provided) `SimulationScheduler` for managing the
execution of the simulation and future tasks.
|
|
This change removes the Topology interface from the
`opendc-experiments-compute` module, which was meant for provisioning
the experimental topology. Howerver, with the stateless `HostSpec`
class, it is not needed to resolve the topology everytime.
|
|
This change integrates the classes from the old
`opendc-compute-workload` module into the `opendc-experiments-compute`
module. This new module contains helper classes for setting up
experiments with the OpenDC compute service.
|
|
This change updates the OpenDC web runner to use the new
`opendc-experiments-base` module for setting up the experimental
environment and simulate the workload.
|
|
This change updates the interface of `ComputeService` to provide access
to the instances (servers) that have been registered with the compute
service. This allows metric collectors to query the metrics of the
servers that are currently running.
|
|
This change updates the `ComputeServiceHelper` class to provide the
failure model via a parameter to the `run` method instead of constructor
parameter. This separates the construction of the topology from the
simulation of the workload.
|
|
This change updates the virtual machine performance interference model
so that the interference domain can be constructed independently of the
interference profile. As a consequence, the construction of the topology
now does not depend anymore on the interference profile.
|
|
This change moves the Random dependency outside the interference model,
to allow the interference model to be completely immutable and passable
between different simulations.
|
|
This change introduces a new interface `JobManager` that is responsible
for communicating with the backend about the available jobs and updating
their status when the runner is simulating a job. This manager can be
injected into the `OpenDCRunner` class and allows users to provide
different sources for the jobs, not only the current REST API.
|
|
This change fixes an issue with the OpenDC web runner where it would
report NaN values for some of the metrics due to the topology being
empty. This in turn causes issues in the frontend.
|
|
This change updates the web runner implementation to gracefully exit the
current thread when interrupted.
|
|
This change updates the OpenDC web runner implementation to use the
correct context ClassLoader for simulation jobs running inside a
ForkJoinPool. By default, the ForkJoinPool will use the system class
loader which does not have access to the services needed by the web
runner.
|
|
This change splits the command line interface from the OpenDC web runner
into a separate configuration. We plan to re-use the runner code for a Quarkus
extension that integrates the runner in development mode.
|
|
This change removes the OpenTelemetry integration from the OpenDC
Compute modules. Previously, we chose to integrate OpenTelemetry to
provide a unified way to report metrics to the users.
Although this worked as expected, the overhead of the OpenTelemetry when
collecting metrics during simulation was considerable and lacked more
optimization opportunities (other than providing a separate API
implementation). Furthermore, since we were tied to OpenTelemetry's SDK
implementation, we experienced issues with throttling and registering
multiple instruments.
We will instead use another approach, where we expose the core metrics
in OpenDC via specialized interfaces (see the commits before) such that
access is fast and can be done without having to interface with
OpenTelemetry. In addition, we will provide an adapter to that is able
to forward these metrics to OpenTelemetry implementations, so we can
still integrate with the wider ecosystem.
|
|
This change introduces a `ComputeMetricReader` class that can be used as
a replacement for the `CoroutineMetricReader` class when reading metrics
from the Compute service. This implementation operates directly on a
`ComputeService` instance, providing better performance.
|
|
This change updates the compute support library to load the VM
interference model via the OpenDC trace library, which provides a
generic interface for reading interference models associated with
workload traces.
|
|
This change contains a rewrite of the OpenDC web runner implementation,
which now supports terminating simulations when exceeding a deadline, as
well as executing multiple simulation jobs at the same time.
Furthermore, we have extracted the runner from the command line
interface, so that we can offer this functionality as a library in the
future.
|
|
This change updates the web runner implementation to use the new API
client introduced in the previous commit.
|
|
This change adds support for custom audience values in the web runner.
If the audience used by the user is different from the default value
(https://api.opendc.org/v2/), then the runner fails to obtain a valid
access token for the API.
|