| Age | Commit message | Author |
|
(#342)
* renamed performance counter to distinguish different resource types
* added GPU, modelled similar to CPU
* added GPUs to machine model
* list of GPUs instead of single instance
* renamed memory speed to bandwidth
* enabled parsing of GPU resources
* split powermodel into cpu and GPU powermodel
* added gpu parsing tests
* added idea of host level scheduling
* added tests for multi gpu parsing
* renamed powermodel to cpupowermodel
* clarified naming of cpu and gpu components
* added resource type to flow supplier and edge
* added resourcetype
* added GPU components and resource type to fragments
* added GPU to workload and updated resource usage retrieval
* implemented first version of multi resource
* added name to workload
* renamed performance counters
* removed commented out code
* removed deprecated comments
* included demand and supply into calculations
* resolving rebase mismatches
* moved resource type from flowedge class to common package
* added available resources to machines
* cleaner separation of whether a workload is started on a SimMachine or a VM
* Replaced exception with dedicated enum
* Only looping over resources that are actually used
* using hashmaps keyed by resource type instead of arrays for readability (see the sketch after this list)
* fixed condition
* tracking finished workloads per resource type
* removed resource type from flowedge
* made supply and demand distribution resource specific
* added power model for GPU
* removed unused test setup
* removed deprecated comments
* removed unused parameter
* added ID for GPU
* added GPUs and GPU performance counters (naively)
* implemented capturing of GPU statistics
* added reminders for future implementations
* renamed properties for better identification
* added capturing GPU statistics
* implemented first tests for GPUs
* unified access to performance counters
* added interface for general compute resource handling
* implemented multi resource support in simmachine
* added individual edge to VM per resource
* extended compute resource interface
* implemented multi-resource support in PSU
* implemented generic retrieval of computeresources
* implemented multi-resource support in VM
* made method use more resource specific
* implemented simple GPU tests
* rolled back frequency and demand use
* made naming independent of used resource
* using the workload's resources instead of the VM's to determine available resources
* implemented determination of used resources in workload
* removed logging statements
* implemented reading from workload
* fixed naming for host-level allocation
* fixed next deadline calculation
* fixed forwarding supply
* reduced memory footprint
* made GPU powermodel nullable
* made GPU powermodel configurable in topology
* implemented tests for basic gpu scheduler
* added gpu properties
* implemented weights, filter and simple cpu-gpu scheduler
* spotless apply
* spotless apply pt. 2
* fixed capitalization
* spotless kotlin run
* implemented column export
* todo update
* removed code comments
* Merged PerformanceCounter classes into one & removed interface
* removed GPU specific powermodel
* Rebase master: kept both versions of TopologyFactories
* renamed CpuPowermodel to resource independent Powermodel
Moved it from Cpu package to power package
* implemented default of getResourceType & removed overrides where possible
* split getResourceType into Consumer and Supplier
* added power as resource type
* reduced supply demand from arrayList to single value
* combining GPUs into one large GPU, until full multi-gpu support
* merged distribution policy enum with corresponding factory
* added comment
* post-rebase fixes
* aligned naming
* Added GPU metrics to task output
* Updates power resource type to uppercase.
Standardizes the `ResourceType.Power` enum to `ResourceType.POWER`
for consistency with other resource types and improved readability.
* Removes deprecated test assertions
Removes commented-out assertions in GPU tests.
These assertions are no longer needed and clutter the test code.
* Renames MaxMinFairnessStrategy to MaxMinFairnessPolicy
Renames MaxMinFairnessStrategy to MaxMinFairnessPolicy for
clarity and consistency with naming conventions. This change
affects the factory and distributor to use the updated name.
* applies spotless
* nulls GPUs as they are not used
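For illustration, a minimal Kotlin sketch of the per-resource bookkeeping described above: resource types as an enum, supply and demand tracked in hashmaps keyed by resource type. The enum values and class shape are assumptions, not the actual OpenDC types.

```kotlin
// Hypothetical sketch: resource types as an enum, bookkeeping keyed by type.
enum class ResourceType { CPU, GPU, MEMORY, POWER }

class ResourceBookkeeping {
    // HashMaps keyed by resource type instead of positional arrays, for readability.
    private val supply = HashMap<ResourceType, Double>()
    private val demand = HashMap<ResourceType, Double>()

    fun setSupply(type: ResourceType, value: Double) { supply[type] = value }
    fun setDemand(type: ResourceType, value: Double) { demand[type] = value }

    // Only the resource types that are actually used show up here.
    fun usedResourceTypes(): Set<ResourceType> = demand.keys

    fun isSatisfied(type: ResourceType): Boolean =
        (supply[type] ?: 0.0) >= (demand[type] ?: 0.0)
}
```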
|
|
* Remove task from scheduler bookkeeping after failure
* Support carbon forecasting in timeshift
* Register scheduler and carbonmodel in context
* Preliminary working task stopping; carbon intensity bug
* Working carbon-based stop with two timeshift thresholds (see the sketch after this list)
* Add a pause state task and guest
* Move task stopper to allocation spec
* Start tracking num pauses
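A rough sketch of the two-threshold, carbon-based stop described above: pause above an upper carbon-intensity threshold, resume below a lower one. The class and parameter names are hypothetical.

```kotlin
// Hypothetical sketch: hysteresis between two carbon-intensity thresholds.
class CarbonStopPolicy(
    private val pauseAbove: Double,   // gCO2/kWh: pause running tasks above this
    private val resumeBelow: Double,  // gCO2/kWh: resume paused tasks below this
) {
    fun shouldBePaused(carbonIntensity: Double, currentlyPaused: Boolean): Boolean =
        if (currentlyPaused) carbonIntensity >= resumeBelow
        else carbonIntensity > pauseAbove
}
```

The gap between the two thresholds avoids rapid pause/resume flapping when the carbon intensity hovers around a single cut-off.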
|
|
|
|
* Start time shifting
* Existing experiments work with new columns
* Remove unused traces dir
* Update java to 21 LTS and jacoco to be compatible
* Minimal working timeshifting
* Timeshift scheduler linked as carbon receiver
* Add basic tests for timeshift scheduler
* Run spotless apply
* Modify trace format tests to support new fields
* Change all mentions of java 19 to 21
* Add a deferAll option to workload to make all tasks deferrable (see the sketch after this list)
* Run spotless apply
* Copy traces from resources in web dockerfile
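A sketch of what the deferAll option might do: a single workload-level flag marks every task as deferrable for the timeshift scheduler. Class and field names are illustrative only.

```kotlin
// Hypothetical sketch: a workload-level flag that makes every task deferrable.
data class TaskSpec(val name: String, val deferrable: Boolean = false)

data class WorkloadSketch(val tasks: List<TaskSpec>, val deferAll: Boolean = false) {
    fun effectiveTasks(): List<TaskSpec> =
        if (deferAll) tasks.map { it.copy(deferrable = true) } else tasks
}
```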
|
|
|
|
* Updated logging
* removed DoubleThresholdBatteryPolicy.java
|
|
* Added sampleFraction and submissionTime to the workloadSpec
* Removed commented code
|
|
new workload types are added (#294)
|
|
|
|
* Added power sources to OpenDC.
In the current form each Cluster has a single power source that is connected to all hosts in that cluster
* Ran spotless Kotlin and Java
|
|
* Updated tests
Changed all floats into doubles for consistency across the whole framework
Made a small update to the multiplexer to better push through supply and demand
Fixed small typo
Updated M3SA paths.
fixed merge conflicts
Removed unused components. Updated tests.
Improved checkpointing model
Improved model, started with SimPowerSource
implemented FailureModels and Checkpointing
First working version
midway commit
first update
All simulations are now run with a single CPU and a single MemoryUnit. Multiple CPUs are combined into one. This is for performance and explainability.
* Updated test memory
|
|
* Removed unused components. Updated tests.
Improved checkpointing model
Improved model, started with SimPowerSource
implemented FailureModels and Checkpointing
First working version
midway commit
first update
All simulations are now run with a single CPU and a single MemoryUnit. Multiple CPUs are combined into one. This is for performance and explainability.
* fixed merge conflicts
* Updated M3SA paths.
* Fixed small typo
|
|
CPUs are combined into one. This is for performance and explainability. (#255)
|
|
* Updated SimTrace to use a single ArrayDeque instead of three separate lists for deadline, cpuUsage, and coreCount (see the sketch after this list)
* Renamed input files to tasks.parquet and fragments.parquet. Renamed server to task. OpenDC now exports tasks.parquet instead of server.parquet
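An illustrative sketch of the single-deque idea: one queue of fragment objects replaces three parallel lists for deadline, cpuUsage, and coreCount. The class shape is an assumption.

```kotlin
import java.util.ArrayDeque

// Hypothetical sketch: one deque of fragments instead of three parallel lists.
data class TraceFragment(val deadline: Long, val cpuUsage: Double, val coreCount: Int)

class SimTraceSketch {
    private val fragments = ArrayDeque<TraceFragment>()

    fun add(deadline: Long, cpuUsage: Double, coreCount: Int) {
        fragments.add(TraceFragment(deadline, cpuUsage, coreCount))
    }

    // Deadline, usage, and core count stay together as fragments are consumed in order.
    fun next(): TraceFragment? = fragments.poll()
}
```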
|
|
|
|
into objects when the scenario is being executed by ScenarioRunner.kt (#227)
|
|
* Started with the carbon trace implementation
* Moved the carbon trace system to the proper folders
|
|
* Revamped the trace system. All TraceFormat files are now in the api module. This fixes some problems with certain types of traces not being usable
* applied spotless
|
|
* Initial commit
* Implemented a new system of defining and running scenarios / portfolios. Scenarios and Portfolios can now be defined using JSON files similar to topologies. This allows users to define experiments without changing any Kotlin code (see the sketch after this list).
* Ran spotlessApply
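As a hedged illustration of the JSON-defined scenarios, a scenario file could be mapped onto a small Kotlin model like the one below; the field names and the use of Jackson are assumptions, not the actual OpenDC schema or reader.

```kotlin
import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper
import com.fasterxml.jackson.module.kotlin.readValue
import java.io.File

// Hypothetical scenario model; the real OpenDC format likely differs.
data class ScenarioSpec(
    val name: String,
    val topology: String,      // path to a topology JSON file
    val workload: String,      // path to a workload trace
    val repetitions: Int = 1,
)

fun readScenario(path: String): ScenarioSpec =
    jacksonObjectMapper().readValue(File(path))
```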
|
|
* Updated the topology format to JSON. Updated TopologyReader.kt to handle JSON files. Added documentation for the new format.
* applied spotless kotlin
* small update
* Updated for spotless apply
* Updated for spotless apply
|
|
* Updated all package versions including kotlin. Updated all web-server tests to run.
* Changed the java version of the tests. OpenDC now only supports java 19.
* small update
* test update
* new update
* updated docker version to 19
* updated docker version to 19
|
|
* Updated metrics and parquet output
* fixed typos
|
|
* removed experiment-compute and integrated all components into opendc-compute
* updated workflow gradle file
* removed unneeded code
|
|
This change updates the CI pipeline so that Java 20 is being tested with
the latest Gradle RC, since Gradle 8.0 does not support it yet.
|
|
Docker Inc is sunsetting free team organizations for the Docker registry,
which our organization is one of. Instead, a paid subscription is now required
to maintain the organization.
Given our relatively small usage of the account, it makes more sense to start
publishing the container images on the GitHub Container Registry, since it is
free for open source projects and integrates well with GitHub Actions.
Fixes #141
|
|
This change replaces the use of `CoroutineContext` for passing the
`SimulationDispatcher` across the different modules of OpenDC by the
lightweight `Dispatcher` interface of the OpenDC common module.
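A minimal sketch of the pattern this describes: the dispatcher is passed explicitly as a small interface instead of being threaded through a CoroutineContext. The interface shown here is illustrative, not the actual OpenDC Dispatcher API.

```kotlin
// Hypothetical, simplified stand-in for a dispatcher abstraction.
interface SimpleDispatcher {
    val currentTime: Long
    fun schedule(delayMs: Long, action: () -> Unit)
}

// Modules receive the dispatcher as an explicit dependency rather than
// extracting it from a CoroutineContext.
class MetricsReporter(private val dispatcher: SimpleDispatcher) {
    fun start() {
        dispatcher.schedule(1_000) {
            println("tick at ${dispatcher.currentTime}")
        }
    }
}
```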
|
|
This change updates the `SimulationScheduler` class to implement the
`Dispatcher` interface from the OpenDC Common module, so that OpenDC
modules only need to depend on the common module for dispatching future
tasks (possibly in simulation).
|
|
This change adds the log4j-core dependency to various modules of OpenDC
using log4j2, to ensure logging keeps working. The upgrade to SLF4J 2.0 broke
the Log4j2 functionality, since the log4j-core artifact is not
automatically shipped with the SLF4J implementation.
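For illustration, the dependency declaration in a Gradle Kotlin DSL build file might look like this; the version numbers are placeholders, not the ones used by OpenDC.

```kotlin
// build.gradle.kts (sketch): keep log4j-core on the classpath explicitly,
// since the SLF4J 2.0 upgrade no longer brings it in transitively.
dependencies {
    runtimeOnly("org.apache.logging.log4j:log4j-core:2.20.0")        // placeholder version
    runtimeOnly("org.apache.logging.log4j:log4j-slf4j2-impl:2.20.0") // SLF4J 2 binding, placeholder
}
```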
|
|
This change re-implements the OpenDC compute simulator framework using
the new flow2 framework for modelling multi-edge flow networks. The
re-implementation is written in Java and focuses on performance and a
clean API surface.
|
|
This change fixes an issue with the OpenDC web runner where the default
job timeout was set to 10 ms instead of 10 minutes. For longer
simulations, this would cause the job to be terminated.
|
|
This change resolves an issue in the web runner where the finished VMs
would always be reported as zero.
|
|
This change updates the Quarkus-based web server to add support for
tracking and limiting the simulation minutes used by the user in order
to prevent misuse of shared resources.
|
|
This change updates the build configuration to use Spotless for code
formatting of both Kotlin and Java.
|
|
This change updates the repository to remove the use of wildcard imports
everywhere. Wildcard imports are not allowed by default by Ktlint as
well as Google's Java style guide.
|
|
This change renames the method `runBlockingSimulation` to
`runSimulation` to put more emphasis on the simulation part of the
method. The blocking part is not that important, but this behavior is
still described in the method documentation.
|
|
This change updates the implementation of `SimulationDispatcher` to use
a (possibly user-provided) `SimulationScheduler` for managing the
execution of the simulation and future tasks.
|
|
This change removes the Topology interface from the
`opendc-experiments-compute` module, which was meant for provisioning
the experimental topology. However, with the stateless `HostSpec`
class, it is no longer necessary to resolve the topology every time.
|
|
This change integrates the classes from the old
`opendc-compute-workload` module into the `opendc-experiments-compute`
module. This new module contains helper classes for setting up
experiments with the OpenDC compute service.
|
|
This change updates the OpenDC web runner to use the new
`opendc-experiments-base` module for setting up the experimental
environment and simulating the workload.
|
|
This change updates the interface of `ComputeService` to provide access
to the instances (servers) that have been registered with the compute
service. This allows metric collectors to query the metrics of the
servers that are currently running.
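A sketch of the kind of access this enables; the interface and property names are hypothetical, not the actual ComputeService API.

```kotlin
// Hypothetical sketch: the service exposes its registered instances so that
// metric collectors can poll them while the simulation runs.
interface ComputeServiceView {
    val instances: List<InstanceView>
}

interface InstanceView {
    val name: String
    val cpuUsage: Double
}

fun collectCpuUsage(service: ComputeServiceView): Map<String, Double> =
    service.instances.associate { it.name to it.cpuUsage }
```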
|
|
This change updates the `ComputeServiceHelper` class to provide the
failure model via a parameter to the `run` method instead of a
constructor parameter. This separates the construction of the topology from the
simulation of the workload.
|
|
This change updates the virtual machine performance interference model
so that the interference domain can be constructed independently of the
interference profile. As a consequence, the construction of the topology
now does not depend anymore on the interference profile.
|
|
This change moves the Random dependency outside the interference model,
to allow the interference model to be completely immutable and passable
between different simulations.
|
|
This change introduces a new interface `JobManager` that is responsible
for communicating with the backend about the available jobs and updating
their status when the runner is simulating a job. This manager can be
injected into the `OpenDCRunner` class and allows users to provide
different sources for the jobs, not only the current REST API.
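A hedged sketch of the shape such a manager could take; the method and type names below are illustrative, not the actual JobManager interface.

```kotlin
// Hypothetical sketch of a job-manager abstraction and how a runner might use it.
data class JobSketch(val id: String, val scenario: String)

interface JobManagerSketch {
    /** Claim the next available job, or null when there is nothing to simulate. */
    fun claimNext(): JobSketch?

    /** Report a status change for a job back to its source. */
    fun updateStatus(job: JobSketch, state: String)
}

// The runner depends only on the interface, so jobs can come from the REST API,
// a database, or an in-memory queue in tests.
class RunnerSketch(private val jobs: JobManagerSketch) {
    fun step() {
        val job = jobs.claimNext() ?: return
        jobs.updateStatus(job, "RUNNING")
        // ... simulate the job ...
        jobs.updateStatus(job, "FINISHED")
    }
}
```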
|
|
This change fixes an issue with the OpenDC web runner where it would
report NaN values for some of the metrics due to the topology being
empty. This in turn causes issues in the frontend.
|
|
This change updates the web runner implementation to gracefully exit the
current thread when interrupted.
|
|
This change updates the OpenDC web runner implementation to use the
correct context ClassLoader for simulation jobs running inside a
ForkJoinPool. By default, the ForkJoinPool will use the system class
loader which does not have access to the services needed by the web
runner.
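A sketch of the general technique involved: install the class loader that can see the runner's services on the worker thread for the duration of the job, then restore the previous one. The helper name is made up for illustration.

```kotlin
import java.util.concurrent.ForkJoinPool

// Sketch: run a job on the pool with a specific context class loader installed.
fun runWithClassLoader(pool: ForkJoinPool, loader: ClassLoader, job: () -> Unit) {
    pool.execute {
        val worker = Thread.currentThread()
        val previous = worker.contextClassLoader
        worker.contextClassLoader = loader
        try {
            job()
        } finally {
            worker.contextClassLoader = previous
        }
    }
}
```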
|
|
This change splits the command line interface from the OpenDC web runner
into a separate configuration. We plan to re-use the runner code for a Quarkus
extension that integrates the runner in development mode.
|
|
This change updates the Dockerfile for the web runner to reduce the
number of build steps necessary to build the web runner. Previously, the
build would also include/build the web API which is not used in the
image.
|
|
This change removes the OpenTelemetry integration from the OpenDC
Compute modules. Previously, we chose to integrate OpenTelemetry to
provide a unified way to report metrics to the users.
Although this worked as expected, the overhead of OpenTelemetry when
collecting metrics during simulation was considerable and left few
optimization opportunities (other than providing a separate API
implementation). Furthermore, since we were tied to OpenTelemetry's SDK
implementation, we experienced issues with throttling and registering
multiple instruments.
We will instead use another approach, where we expose the core metrics
in OpenDC via specialized interfaces (see the commits before) such that
access is fast and can be done without having to interface with
OpenTelemetry. In addition, we will provide an adapter that is able
to forward these metrics to OpenTelemetry implementations, so we can
still integrate with the wider ecosystem.
|