path: root/opendc-trace
Age | Commit message | Author
2024-09-05 | Sim trace update (#249) | Dante Niewenhuis
* Started on reimplementing the SimTrace implementation
* Updated the trace format. Fragments now do not have a deadline, but a duration. The fragments are executed in order.
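The move from deadlines to durations can be illustrated with a minimal sketch. The `Fragment` class and the `endTimes` helper are hypothetical names chosen for illustration, not taken from the OpenDC code base; the sketch only shows how absolute end times follow from durations once fragments are known to run in order.

```java
import java.util.List;

/** Illustrative only: derives absolute end times from in-order fragment durations. */
final class FragmentTimeline {
    /** Hypothetical fragment: carries a duration rather than an absolute deadline. */
    static final class Fragment {
        final long durationMs;
        final double cpuUsage;
        final int coreCount;

        Fragment(long durationMs, double cpuUsage, int coreCount) {
            this.durationMs = durationMs;
            this.cpuUsage = cpuUsage;
            this.coreCount = coreCount;
        }
    }

    /** Runs the fragments back to back from startMs and returns each fragment's end time. */
    static long[] endTimes(long startMs, List<Fragment> fragments) {
        long[] ends = new long[fragments.size()];
        long now = startMs;
        for (int i = 0; i < ends.length; i++) {
            now += fragments.get(i).durationMs; // position in the list fixes the schedule
            ends[i] = now;
        }
        return ends;
    }

    public static void main(String[] args) {
        List<Fragment> fragments =
            List.of(new Fragment(300, 1500.0, 2), new Fragment(200, 0.0, 1));
        for (long end : endTimes(1_000, fragments)) {
            System.out.println(end);
        }
    }
}
```

With durations, a fragment no longer needs to know where it sits on the timeline, so shifting a task's start time does not require rewriting its fragments.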
2024-08-27 | Renamed input files and internally server is changed to task (#246) | Dante Niewenhuis
* Updated SimTrace to use a single ArrayDeque instead of three separate lists for deadline, cpuUsage, and coreCount
* Renamed input files to tasks.parquet and fragments.parquet. Renamed server to task. OpenDC now exports tasks.parquet instead of server.parquet
2024-08-22 | Refactored exporters. Allows output column selection in scenario (#241) (#241) | Alessio Leonardo Tomei
2024-05-07 | Revamped failure models (#228) | Dante Niewenhuis
2024-04-22 | Merged scenario and portfolio (#220) | Radu Nicolae
* sync with the master branch
* rebase
* multimodel - simulation is currently run as many times as you can see a model
* factory method - handles models without given params
* removed redundant flags
* modelType
* flags removed
* implemented output into a folder
* multimodel ipynb setup - to be implemented and also ran as a python script, when the simulation occurs
* towards a multimodel python implementation - issue observed - the saved files have same data?
* json parsing handles now lists for topology, workloads, allocationPolicies, powerModels
* scenarioFile inputs lists, and creates multiple combinations of scenarios
* multi-model prediction repaired, now we predict using multiple models
* commit before removing powerModel from scenario
* commit after removing powerModel from scenario
* commit after removing powerModel from scenario (and actually running)
* powermodels now can output their name and full name (with min and max)
* now we can select where to output (seed or output folder)
* input files - clear naming + output naming improved
* minimal changes
* all tests passing + json files from tests updated to the new json format
* json files from topology now accept only one power model (instead of list)
* json files from topology now accept only one power model (instead of list)
* multi and single input from tests updated to match the format
* tests passed locally
* spotless applies
* demo folder removed
2024-04-17 | Added support for carbon traces (#218) | Dante Niewenhuis
* Started with the carbon trace implementation
* Moved the carbon trace system to the proper folders
2024-04-16 | Revamped the trace system. All TraceFormat files are now in the api m… (#216) | Dante Niewenhuis
* Revamped the trace system. All TraceFormat files are now in the api module. This fixes some problems with not being able to use types of traces
* applied spotless
2024-03-05 | Updated package versions, updated web server tests. (#207) | Dante Niewenhuis
* Updated all package versions including kotlin. Updated all web-server tests to run.
* Changed the java version of the tests. OpenDC now only supports java 19.
* small update
* test update
* new update
* updated docker version to 19
* updated docker version to 19
2022-12-14 | fix(trace/wtf): Disable Parquet strict typing | Fabian Mastenbroek
This change fixes an issue where some of the traces from the Workflow Trace Archive would fail to load with the trace format in OpenDC. This was caused by one of the fields being stored as a double, while the format expects it to be a long. Parquet does not support unioning primitive types. Therefore, we have to disable strict type checking when reading the file. Furthermore, we need to support double entries for storing the workflow ids.
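A minimal sketch of what accepting both representations could look like on the reading side. The `LenientId` class and its `asLong` helper are hypothetical and do not reflect the actual OpenDC reader; the sketch assumes the id arrives as a boxed number and that non-integral doubles should be rejected rather than truncated.

```java
/** Illustrative only: accepts an id stored either as a long or as a double. */
final class LenientId {
    /** Converts a boxed numeric value to a long, refusing values that would lose information. */
    static long asLong(Object value) {
        if (value instanceof Long || value instanceof Integer) {
            return ((Number) value).longValue();
        }
        if (value instanceof Double || value instanceof Float) {
            double d = ((Number) value).doubleValue();
            // Math.rint(NaN) is NaN, so the comparison below also rejects NaN.
            if (Double.isInfinite(d) || d != Math.rint(d)) {
                throw new IllegalArgumentException("id is not integral: " + d);
            }
            return (long) d;
        }
        throw new IllegalArgumentException("unsupported id type: " + value);
    }

    public static void main(String[] args) {
        System.out.println(asLong(Double.valueOf(42.0)));
        System.out.println(asLong(Long.valueOf(7L)));
    }
}
```

Rejecting fractional values is a design choice of this sketch; a reader could equally decide to round, as long as the behaviour is documented.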
2022-10-21 | fix: Add log4j-core dependency | Fabian Mastenbroek
This change adds the log4j-core dependency to various modules of OpenDC using log4j2, to ensure logging keeps working. The upgrade to SLF4J 2.0 broke the Log4j2 functionality, since the log4j-core artifact is not automatically shipped with the SLF4J implementation.
2022-10-06 | build: Switch to Spotless for formatting | Fabian Mastenbroek
This change updates the build configuration to use Spotless for code formatting of both Kotlin and Java.
2022-10-06 | style: Eliminate use of wildcard imports | Fabian Mastenbroek
This change updates the repository to remove the use of wildcard imports everywhere. Wildcard imports are not allowed by default by Ktlint as well as Google's Java style guide.
2022-07-29 | fix(trace/api): Do not cache trace formats | Fabian Mastenbroek
This change updates the TraceFormat lookup algorithm to prevent caching the available trace formats on first access. Since the result of ServiceLoader depends on the Thread's context ClassLoader, it may differ between threads. Furthermore, ServiceLoader maintains its own thread-local cache, so we can instead utilize that cache and always use the results returned by it.
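The lookup-per-call approach can be sketched as follows. The `Format` interface is a hypothetical stand-in for the TraceFormat SPI and `FormatLookup` is an illustrative name; only `java.util.ServiceLoader` is a real API here. The point is that nothing is stored in a static field, so each call sees the providers visible to the calling thread's context class loader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Illustrative only: resolves providers on every call instead of caching them statically. */
final class FormatLookup {
    /** Hypothetical stand-in for the TraceFormat SPI. */
    interface Format {
        String name();
    }

    /** Lists the providers visible to the current thread's context class loader. */
    static List<Format> available() {
        List<Format> formats = new ArrayList<>();
        // ServiceLoader.load(Class) resolves against the thread context class loader.
        for (Format format : ServiceLoader.load(Format.class)) {
            formats.add(format);
        }
        return formats;
    }

    /** Returns the provider with the given name, or null if none is registered. */
    static Format byName(String name) {
        for (Format format : available()) {
            if (format.name().equals(name)) {
                return format;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // No providers are registered in this sketch, so the list is empty.
        System.out.println(available().size());
    }
}
```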
2022-07-07 | build(trace/parquet): Ignore reload4j dependency | Fabian Mastenbroek
This change updates the build configuration to ignore the reload4j dependency that was recently added to the hadoop-common module. Reload4j replaces the old unmaintained log4j1 module. However, since we expose this module as a library, we do not want to include a logging implementation in the dependencies. Currently, there are already instances where this new dependency leads to duplicate logging implementations on the classpath.
2022-06-23 | build: Update simulator dependencies | Fabian Mastenbroek
This change updates the simulator dependencies to the latest available version where possible.
2022-06-08 | test(trace): Add conformance suite for OpenDC trace API | Fabian Mastenbroek
This change adds a re-usable test suite for the interface of the OpenDC trace API, so implementors can verify whether they match the specification of the interfaces.
2022-06-07 | perf(trace/azure): Add benchmarks for Azure trace format | Fabian Mastenbroek
This change adds JMH benchmarks for the parsing logic of the Azure VM trace format in order to catch performance regressions.
2022-06-07 | perf(trace/opendc): Add benchmarks for odcvm trace format | Fabian Mastenbroek
This change adds JMH benchmarks for the parsing logic of the OpenDC VM trace format in order to catch performance regressions.
2022-06-07 | refactor(trace/api): Introduce type system for trace API | Fabian Mastenbroek
This change updates the trace API by introducing a limited type system for the table columns. Previously, the table columns could have any possible type representable by the JVM. With this change, we limit the available types to a small type system.
2022-05-18 | refactor(web/runner): Move runner CLI into separate configuration | Fabian Mastenbroek
This change splits the command line interface from the OpenDC web runner into a separate configuration. We plan to re-use the runner code for a Quarkus extension that integrates the runner in development mode.
2022-05-06 | build(trace/parquet): Remove unnecessary dependencies | Fabian Mastenbroek
This change removes several dependencies from the `opendc-trace-parquet` helper module, which are part of Hadoop Common, but are not actually used by the Parquet project.
2022-05-02 | perf(trace/calcite): Add support for projections | Fabian Mastenbroek
This change adds support for projections in the Apache Calcite integration with OpenDC. This enables faster queries when only a subset of the table columns is selected.
2022-05-02 | feat(trace/api): Add support for projecting tables | Fabian Mastenbroek
This change adds support for projecting certain columns of a table. This enables faster reading for tables with a high number of columns. Currently, we support projection in the Parquet-based workload formats. Other formats are text-based and will probably not benefit much from projection.
2022-05-02 | refactor(trace/parquet): Drop dependency on Avro | Fabian Mastenbroek
This change updates the Parquet support library in OpenDC to not rely on Avro, but instead interface directly with Parquet's reading and writing functionality, providing less overhead.
2022-05-02 | refactor(trace/wtf): Do not use Avro when reading WTF trace | Fabian Mastenbroek
This change updates the Workflow Trace format implementation in OpenDC to not use the `parquet-avro` library for exporting experiment data, but instead to use the low-level APIs to directly read the data from Parquet. This reduces the amount of conversions necessary before reaching the OpenDC trace API.
2022-05-02 | perf(trace/opendc): Read records using low-level API | Fabian Mastenbroek
This change updates the OpenDC VM format reader implementation to use the low-level record reading APIs provided by the `parquet-mr` library for improved performance. Previously, we used the `parquet-avro` library to read/write Avro records in Parquet format, but that library carries considerable overhead.
2022-05-01 | refactor(trace/parquet): Support custom ReadSupport implementations | Fabian Mastenbroek
This change updates the `LocalParquetReader` implementation to support custom `ReadSupport` implementations, so we do not have to rely on the Avro implementation necessarily.
2022-04-30 | feat(trace/tools): Add support for querying traces using SQL | Fabian Mastenbroek
This change adds a command line interface for querying workload traces using SQL. We provide a new command for the trace tools that can query a workload trace.
2022-04-30 | feat(trace/calcite): Add support for writing via SQL | Fabian Mastenbroek
This change updates the Apache Calcite integration to support writing workload traces via SQL. This enables custom conversion scripts between different workload traces.
2022-04-30 | feat(trace/calcite): Add Calcite (SQL) integration | Fabian Mastenbroek
This change adds support for querying workload trace formats implemented using the OpenDC API through Apache Calcite. This allows users to write SQL queries to explore the workload traces.
2022-04-24 | build: Move modules into subgroups | Fabian Mastenbroek
This change updates the Gradle build configuration of the project to publish the different types of modules (e.g., opendc-compute, opendc-simulator) into their own groups.
2022-04-23 | build: Enable testing for all library modules | Fabian Mastenbroek
This change updates the Gradle build configuration to ensure that all library modules (that will be published) use testing and are included in coverage reports. This should ensure the public modules remain well tested.
2022-04-22 | refactor(trace/api): Move conventions into separate package | Fabian Mastenbroek
This change moves the trace conventions (such as table and column names) into a separate `conv` package, so that they are separated from the main API. This also allows for a potential move into a separate module in the future.
2022-04-22 | feat(trace/opendc): Incorporate interference model in trace format | Fabian Mastenbroek
This change updates the OpenDC VM trace format to incorporate the VM interference model in the trace format itself. This makes sense since the model is tightly coupled to the actual trace that is being simulated. The benefit of this approach is that we can directly load the interference model from the workload trace, without having to resolve the model separately (as we did before).
2022-02-18 | bug(trace): Adjust CPU capacity to number of vCPUs | Fabian Mastenbroek
This change fixes an issue where the number of vCPUs was not taken into account when converting from CPU Usage percentage to MHz.
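As a rough illustration of the kind of conversion involved, the sketch below scales a usage percentage by both the per-core frequency and the vCPU count. The formula, the `CpuUsage` class, and the parameter names are assumptions made for this example; the commit does not state the exact expression used in OpenDC.

```java
/** Illustrative only: converts a CPU usage percentage to MHz across all vCPUs. */
final class CpuUsage {
    /**
     * Assumed formula: usage = (percent / 100) * per-core frequency * number of vCPUs.
     * Leaving out the vCPU factor would under-report usage for multi-core machines.
     */
    static double toMhz(double usagePercent, double perCoreMhz, int vcpuCount) {
        if (vcpuCount <= 0) {
            throw new IllegalArgumentException("vcpuCount must be positive");
        }
        return usagePercent / 100.0 * perCoreMhz * vcpuCount;
    }

    public static void main(String[] args) {
        // A 4-vCPU machine at 2000 MHz per core, running at 50% utilisation.
        System.out.println(toMhz(50.0, 2000.0, 4));
    }
}
```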
2022-02-18 | build: Remove opendc-platform module | Fabian Mastenbroek
This change removes the opendc-platform module from the project. This module represented a Java platform which was previously used for sharing a set of dependency versions between subprojects. However, with the version catalogue that was added by Gradle, we currently do not use the platform anymore.
2021-12-12 | fix(trace): Read dependencies from .gwf trace file (#50) | Florian Gerlinghoff
Tasks from a .gwf trace file did not have dependencies because this property was not assigned after being read in the GwfTaskTableReader. I removed the conversion from String to Long in parseParents because it seems like other readers (the Parquet reader in particular) return Strings as well, which is why they are converted to Long in line 75 of TraceHelpers.kt. Co-authored-by: Fabian Mastenbroek <mail.fabianm@gmail.com>
2021-11-02 | refactor(trace): Support gaps in trace data | Fabian Mastenbroek
This change updates the implementation of the trace converter and SimTrace implementation to support cases where there is a gap between samples in the trace data. This change allows users to specify what to do in case samples are missing in the trace. The available options are specified in `SimTrace.FillMode`. Currently, we support either carrying the previous value forward or setting the usage to zero.
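The two fill strategies can be sketched on a plain array of samples. The enum constants `PREVIOUS` and `ZERO`, the `GapFill` class, and the use of `NaN` to mark a missing sample are all assumptions made for this example; the actual constants of `SimTrace.FillMode` are not listed in the commit message.

```java
import java.util.Arrays;

/** Illustrative only: fills missing samples by carrying forward or zeroing. */
final class GapFill {
    /** Hypothetical counterparts of the two options described above. */
    enum FillMode { PREVIOUS, ZERO }

    /** Returns a copy of usage in which NaN entries (missing samples) are filled per mode. */
    static double[] fill(double[] usage, FillMode mode) {
        double[] filled = new double[usage.length];
        double last = 0.0; // nothing to carry forward before the first real sample
        for (int i = 0; i < usage.length; i++) {
            if (Double.isNaN(usage[i])) {
                filled[i] = (mode == FillMode.PREVIOUS) ? last : 0.0;
            } else {
                filled[i] = usage[i];
                last = usage[i];
            }
        }
        return filled;
    }

    public static void main(String[] args) {
        double[] samples = {1.0, Double.NaN, Double.NaN, 3.0};
        System.out.println(Arrays.toString(fill(samples, FillMode.PREVIOUS)));
        System.out.println(Arrays.toString(fill(samples, FillMode.ZERO)));
    }
}
```

Carrying the previous value forward suits workloads that are merely under-sampled, while zeroing is the safer choice when a gap means the machine was idle or off.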
2021-10-25 | feat(trace): Support conversion from Azure trace format | Fabian Mastenbroek
This change adds support for converting the Azure VM traces into the OpenDC trace format.
2021-10-25 | feat(trace): Add column for CPU capacity in OpenDC format | Fabian Mastenbroek
This change adds a new column to the resource table of the OpenDC trace format for the CPU capacity provisioned for a virtual machine, so that this capacity can be assigned to the virtual machine during simulation.
2021-10-25 | fix(trace): Fix timestamp retrieval for Azure trace | Fabian Mastenbroek
This change addresses an issue where the timestamps in the Azure trace were not retrieved correctly from the files.
2021-10-25 | refactor(trace): Support GZIP files in Azure trace | Fabian Mastenbroek
This change updates the Azure VM trace format implementation to directly support loading a trace in GZIP format in order to prevent users having to decompress the trace files so they can be opened by OpenDC.
2021-09-21 | feat(trace): Add support for writing traces | Fabian Mastenbroek
This change adds a new API for writing traces in a trace format. Currently, writing is only supported by the OpenDC VM format, but over time the other formats will also have support for writing added.
2021-09-20 | refactor(trace): Simplify TraceFormat SPI interface | Fabian Mastenbroek
This change simplifies the TraceFormat SPI interface by reducing the number of interfaces that implementors need to implement to only TraceFormat.
2021-09-20 | feat(trace): Add property for describing partition keys | Fabian Mastenbroek
2021-09-20 | feat(trace): Support column lookup via index | Fabian Mastenbroek
This change adds support for looking up the column value through the column index. This enables faster lookup when processing very large traces.
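The benefit of index-based access can be sketched with a toy reader. `IndexedRows` and its methods are hypothetical and much simpler than the real trace API; the sketch only shows the pattern of resolving a column name once and then reading by index inside the loop.

```java
import java.util.Arrays;

/** Illustrative only: a toy row reader that supports lookup by column index. */
final class IndexedRows {
    private final String[] columns;
    private final long[][] rows;
    private int current = -1;

    IndexedRows(String[] columns, long[][] rows) {
        this.columns = columns;
        this.rows = rows;
    }

    /** Resolves a column name to its index; intended to be called once per scan. */
    int resolve(String name) {
        int index = Arrays.asList(columns).indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("unknown column: " + name);
        }
        return index;
    }

    /** Advances to the next row and reports whether one exists. */
    boolean nextRow() {
        return ++current < rows.length;
    }

    /** Reads the value at the given column index of the current row. */
    long getLong(int index) {
        return rows[current][index];
    }

    /** Sums one column, paying for the name lookup only once. */
    static long sum(IndexedRows reader, String column) {
        int index = reader.resolve(column);
        long total = 0;
        while (reader.nextRow()) {
            total += reader.getLong(index);
        }
        return total;
    }

    public static void main(String[] args) {
        IndexedRows reader = new IndexedRows(
            new String[] {"id", "duration"}, new long[][] {{1, 10}, {2, 32}});
        System.out.println(sum(reader, "duration"));
    }
}
```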
2021-09-20 | refactor(trace): Unify columns of different tables | Fabian Mastenbroek
This change unifies columns of different tables used by trace formats. Concretely, instead of having columns specific to each table (e.g., RESOURCE_ID and RESOURCE_STATE_ID), with this change these columns are shared between the tables with a single definition (RESOURCE_ID).
2021-09-19 | feat(trace): Add tool for converting workload traces | Fabian Mastenbroek
This change adds an initial implementation to the trace library for converting between workload trace formats. Currently the tool supports only converting to the OpenDC VM trace format. However, in the future, we will add support for converting between other formats as well.
2021-09-19 | feat(trace): Update OpenDC VM trace format | Fabian Mastenbroek
This change optimizes the OpenDC VM trace format by removing unnecessary columns as well as optimizing the writer settings. The new implementation still supports reading the old trace format in case users run OpenDC with older workload traces.
2021-09-19 | feat(trace): Add support for internal OpenDC VM trace format | Fabian Mastenbroek
This change adds official support to the trace library for the internal VM trace format used by OpenDC for its experiments. This is a compact format that uses Parquet to store the virtual machine trace data in two Parquet files.