| Age | Commit message | Author |
|
This change removes several dependencies from the `opendc-trace-parquet`
helper module. These dependencies are part of Hadoop Common, but are not
actually used by the Parquet project.
|
|
This change adds support for projections in the Apache Calcite
integration with OpenDC. This enables faster queries when only a subset
of the table columns is selected.
|
|
This change adds support for projecting certain columns of a table. This
enables faster reading for tables with a high number of columns.
Currently, we support projection in the Parquet-based workload formats.
Other formats are text-based and will probably not benefit much from
projection.
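As a sketch of what a projected read could look like (the interfaces and
the `newReader(projection)` signature below are assumptions for
illustration, not the verbatim API):

    // Hypothetical minimal interfaces, for illustration only.
    interface TableReader {
        fun nextRow(): Boolean
        fun getDouble(index: Int): Double
    }

    interface Table {
        // null reads every column; a non-null projection restricts the
        // read to the listed columns.
        fun newReader(projection: List<String>?): TableReader
    }

    fun averageCpuUsage(table: Table): Double {
        // Only one column is requested, so a Parquet-backed format can
        // skip all other column chunks on disk.
        val reader = table.newReader(listOf("cpu_usage"))
        var sum = 0.0
        var rows = 0
        while (reader.nextRow()) {
            sum += reader.getDouble(0) // index 0: the first (and only) projected column
            rows++
        }
        return if (rows == 0) 0.0 else sum / rows
    }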
|
|
This change updates the Parquet support library in OpenDC to not rely on
Avro, but instead interface directly with Parquet's reading and writing
functionality, which reduces overhead.
|
|
This change updates the Workflow Trace format implementation in OpenDC to
not use the `parquet-avro` library for exporting experiment data, but
instead to use the low-level APIs to directly read the data from Parquet.
This reduces the amount of conversions necessary before reaching the
OpenDC trace API.
|
|
This change updates the OpenDC VM format reader implementation to use
the low-level record reading APIs provided by the `parquet-mr` library
for improved performance. Previously, we used the `parquet-avro` library
to read/write Avro records in Parquet format, but that library carries
considerable overhead.
|
|
This change updates the `LocalParquetReader` implementation to support
custom `ReadSupport` implementations, so that we do not necessarily have
to rely on the Avro implementation.
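For reference, a minimal sketch of what a custom `ReadSupport` for
`parquet-mr` looks like. The record type and two-column layout are made
up for illustration; the real read support also handles projection and
more column types:

    import org.apache.hadoop.conf.Configuration
    import org.apache.parquet.hadoop.api.InitContext
    import org.apache.parquet.hadoop.api.ReadSupport
    import org.apache.parquet.io.api.Binary
    import org.apache.parquet.io.api.Converter
    import org.apache.parquet.io.api.GroupConverter
    import org.apache.parquet.io.api.PrimitiveConverter
    import org.apache.parquet.io.api.RecordMaterializer
    import org.apache.parquet.schema.MessageType

    // Illustrative record type; the real trace rows have more columns.
    data class ResourceRow(var id: String = "", var timestamp: Long = 0)

    class ResourceReadSupport : ReadSupport<ResourceRow>() {
        override fun init(context: InitContext): ReadSupport.ReadContext =
            // No projection in this sketch: request the full file schema.
            ReadSupport.ReadContext(context.fileSchema)

        override fun prepareForRead(
            configuration: Configuration,
            keyValueMetaData: Map<String, String>,
            fileSchema: MessageType,
            readContext: ReadSupport.ReadContext
        ): RecordMaterializer<ResourceRow> = object : RecordMaterializer<ResourceRow>() {
            // The same row instance is re-used for every record, which is
            // where the reduced overhead comes from: no per-record Avro
            // object graph is allocated.
            private val row = ResourceRow()

            private val root = object : GroupConverter() {
                // One converter per column, writing straight into the row;
                // assumes the schema is (id: binary, timestamp: int64).
                private val converters = arrayOf<Converter>(
                    object : PrimitiveConverter() {
                        override fun addBinary(value: Binary) {
                            row.id = value.toStringUsingUTF8()
                        }
                    },
                    object : PrimitiveConverter() {
                        override fun addLong(value: Long) {
                            row.timestamp = value
                        }
                    }
                )

                override fun getConverter(fieldIndex: Int): Converter = converters[fieldIndex]
                override fun start() {}
                override fun end() {}
            }

            override fun getCurrentRecord(): ResourceRow = row
            override fun getRootConverter(): GroupConverter = root
        }
    }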
|
|
This change adds a command line interface for querying workload traces
using SQL. We provide a new command for the trace tools that can query a
workload trace.
|
|
This change updates the Apache Calcite integration to support writing
workload traces via SQL. This enables custom conversion scripts between
different workload trace formats.
|
|
This change adds support for querying workload trace formats implemented
using the OpenDC API through Apache Calcite. This allows users to write
SQL queries to explore the workload traces.
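For illustration, querying through Calcite's standard JDBC driver might
look like the following. The model file and the table and column names
are assumptions, not the exact schema exposed by OpenDC:

    import java.sql.DriverManager

    fun main() {
        // A Calcite model file is assumed to map the trace to a schema;
        // the "resources" table and its columns are illustrative.
        DriverManager.getConnection("jdbc:calcite:model=trace-model.json").use { conn ->
            conn.createStatement().use { stmt ->
                // Selecting a subset of columns allows the integration to
                // push the projection down to the underlying reader.
                val rs = stmt.executeQuery("SELECT id, cpu_count FROM resources LIMIT 10")
                while (rs.next()) {
                    println("${rs.getString("id")}: ${rs.getInt("cpu_count")} vCPUs")
                }
            }
        }
    }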
|
|
This change updates the Gradle build configuration of the project to
publish the different types of modules (e.g., opendc-compute,
opendc-simulator) into their own groups.
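In the Gradle Kotlin DSL, the idea is roughly the following (the group
names and the mapping are illustrative, not the exact configuration):

    subprojects {
        // Publish each category of modules under its own group.
        group = when {
            name.startsWith("opendc-simulator") -> "org.opendc.simulator"
            name.startsWith("opendc-compute") -> "org.opendc.compute"
            else -> "org.opendc"
        }
    }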
|
|
This change updates the Gradle build configuration to ensure that all
library modules (that will be published) have tests enabled and are
included in coverage reports. This should ensure that the public modules
remain well tested.
|
|
This change moves the trace conventions (such as table and column names)
into a separate `conv` package, so that they are separated from the main API.
This also allows for a potential move into a separate module in the
future.
|
|
This change updates the OpenDC VM trace format to incorporate the VM
interference model in the trace format itself. This makes sense since
the model is tightly coupled to the actual trace that is being
simulated.
This approach has the benefit that we can load the interference model
directly from the workload trace, without having to resolve the model
separately (as we did before).
|
|
This change fixes an issue where the number of vCPUs was not taken into
account when converting from CPU usage percentage to MHz.
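A sketch of the corrected conversion (parameter names are illustrative):
the percentage is relative to the total capacity of the machine, so the
per-core speed must be multiplied by the number of vCPUs.

    fun usageToMhz(usagePct: Double, vcpus: Int, coreSpeedMhz: Double): Double =
        (usagePct / 100.0) * vcpus * coreSpeedMhz

    fun main() {
        // 50% usage of a 4-vCPU machine with 2600 MHz cores is 5200 MHz;
        // ignoring the vCPU count would incorrectly yield 1300 MHz.
        println(usageToMhz(50.0, 4, 2600.0))
    }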
|
|
This change removes the `opendc-platform` module from the project. This
module represented a Java platform which was previously used for sharing
a set of dependency versions between subprojects. However, with the
version catalog feature added by Gradle, we no longer use the platform.
|
|
Tasks from a `.gwf` trace file did not have dependencies because this
property was not assigned after being read in the `GwfTaskTableReader`.
I removed the conversion from String to Long in `parseParents` because it
seems like other readers (the Parquet reader in particular) return
Strings as well, which is why they are converted to Long in line 75 of
`TraceHelpers.kt`.
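A sketch of the corrected behaviour (the exact delimiter handling in the
GWF reader is assumed here): parent identifiers stay Strings rather than
being eagerly parsed into Longs.

    // Hypothetical sketch: split the dependency field of a GWF record
    // into parent task identifiers, leaving them as Strings.
    fun parseParents(field: String): Set<String> =
        field.split(',', ' ')
            .map { it.trim() }
            .filter { it.isNotEmpty() }
            .toSet()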
Co-authored-by: Fabian Mastenbroek <mail.fabianm@gmail.com>
|
|
This change updates the implementation of the trace converter and
SimTrace implementation to support cases where there is a gap between
samples in the trace data.
This change allows users to specify what to do in case samples are
missing in the trace. The available options are specified in
`SimTrace.FillMode`. Currently, we support either carrying the previous
value forward or setting the usage to zero.
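To illustrate the two modes (with a stand-in enum; the real logic lives
in `SimTrace` and its trace converter):

    enum class FillMode { PREVIOUS, ZERO }

    // Fill gaps in a fixed-interval series of (timestampMs, usage) samples.
    fun fillGaps(
        samples: List<Pair<Long, Double>>,
        intervalMs: Long,
        mode: FillMode
    ): List<Pair<Long, Double>> {
        val result = mutableListOf<Pair<Long, Double>>()
        for ((i, sample) in samples.withIndex()) {
            result += sample
            if (i + 1 == samples.size) break
            var t = sample.first + intervalMs
            while (t < samples[i + 1].first) {
                // Gap detected: carry the previous value forward, or
                // insert an explicit zero-usage sample.
                result += t to (if (mode == FillMode.PREVIOUS) sample.second else 0.0)
                t += intervalMs
            }
        }
        return result
    }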
|
|
This change adds support for converting the Azure VM traces into the
OpenDC trace format.
|
|
This change adds a new column to the resource table of the OpenDC trace format for
the CPU capacity provisioned for a virtual machine, so that this
capacity can be assigned to the virtual machine during simulation.
|
|
This change addresses an issue where the timestamps in the Azure trace
were not retrieved correctly from the files.
|
|
This change updates the Azure VM trace format implementation to directly
support loading a trace in GZIP format, in order to prevent users from
having to decompress the trace files before they can be opened by OpenDC.
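The general approach is to wrap the input stream transparently, along
these lines (the real reader may detect compression differently):

    import java.io.InputStream
    import java.nio.file.Files
    import java.nio.file.Path
    import java.util.zip.GZIPInputStream

    // Open a trace file, transparently decompressing it when it carries
    // a .gz extension.
    fun openTraceFile(path: Path): InputStream {
        val input = Files.newInputStream(path)
        return if (path.fileName.toString().endsWith(".gz")) GZIPInputStream(input) else input
    }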
|
|
This change adds a new API for writing traces in a trace format.
Currently, writing is only supported by the OpenDC VM format, but
support for writing will be added to the other formats over time.
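A sketch of what writing a row could look like; the interface below is a
hypothetical stand-in, not the verbatim API:

    // Hypothetical minimal writer interface for illustration.
    interface TableWriter : AutoCloseable {
        fun startRow()
        fun set(column: String, value: Any)
        fun endRow()
    }

    fun writeResource(writer: TableWriter) {
        writer.use {
            it.startRow()
            it.set("id", "vm-1021")
            it.set("cpu_count", 4)
            it.endRow()
        }
    }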
|
|
This change simplifies the `TraceFormat` SPI by reducing the number of
interfaces that implementors need to implement to only `TraceFormat`.
|
|
This change adds support for looking up a column value through its
column index. This enables faster lookups when processing very large
traces.
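The intended usage pattern, sketched with a hypothetical reader
interface: resolve the index once, then use it in the hot loop.

    // Hypothetical minimal reader interface for illustration.
    interface TableReader {
        fun nextRow(): Boolean
        fun resolve(name: String): Int
        fun getDouble(index: Int): Double
    }

    fun sumCpuUsage(reader: TableReader): Double {
        // Resolve the column index once, instead of performing a string
        // lookup for every row of a multi-gigabyte trace.
        val usageIdx = reader.resolve("cpu_usage")
        var total = 0.0
        while (reader.nextRow()) {
            total += reader.getDouble(usageIdx)
        }
        return total
    }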
|
|
This change unifies the columns of the different tables used by trace
formats. Concretely, this means that instead of having columns that are
specific to each table (e.g., `RESOURCE_ID` and `RESOURCE_STATE_ID`),
these columns are now shared between the tables with a single definition
(`RESOURCE_ID`).
|
|
This change adds an initial implementation to the trace library for
converting between workload trace formats. Currently, the tool supports
only converting to the OpenDC VM trace format. However, in the future,
we will add support for converting between other formats as well.
|
|
This change optimizes the OpenDC VM trace format by removing
unnecessary columns as well as optimizing the writer settings.
The new implementation still supports reading the old trace format in
case users run OpenDC with older workload traces.
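As an indication of the kind of writer tuning meant here (the concrete
settings and schema are illustrative, not the exact OpenDC configuration),
using the example writer shipped with `parquet-mr`:

    import org.apache.hadoop.fs.Path
    import org.apache.parquet.example.data.Group
    import org.apache.parquet.hadoop.ParquetWriter
    import org.apache.parquet.hadoop.example.ExampleParquetWriter
    import org.apache.parquet.hadoop.metadata.CompressionCodecName
    import org.apache.parquet.schema.MessageTypeParser

    fun createWriter(path: Path): ParquetWriter<Group> {
        // Illustrative schema; the actual OpenDC VM format differs.
        val schema = MessageTypeParser.parseMessageType(
            "message resource_state { required binary id (UTF8); " +
                "required int64 timestamp; required double cpu_usage; }"
        )
        return ExampleParquetWriter.builder(path)
            .withType(schema)
            // Dictionary encoding and compression pay off for the highly
            // repetitive id column in trace data.
            .withDictionaryEncoding(true)
            .withCompressionCodec(CompressionCodecName.SNAPPY)
            .build()
    }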
|
|
This change adds official support to the trace library for the internal
VM trace format used by OpenDC for its experiments. This is a compact
format that uses Parquet to store the virtual machine trace data in two
Parquet files.
|
|
This change adds support in the trace library for the Azure VM trace
format.
|
|
This change adds support in the trace library for the extended Bitbrains
format. This format is slightly different from the CSV format used by
the original Bitbrains traces and contains more fields.
|
|
This change creates a new module for doing simulations with virtual
machine workloads. We have found that a lot of code from the Capelin
experiments is re-used by non-experiment modules.
|
|
This change enables users to open traces of various trace formats by
dynamically specifying the format name. The trace API will use the
service loader to resolve the available trace formats on the classpath.
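Resolution happens through the standard JDK mechanism, roughly as
follows (the `name` property on the SPI is an assumption for
illustration):

    import java.util.ServiceLoader

    // Hypothetical minimal SPI for illustration; the real TraceFormat
    // interface carries more members.
    interface TraceFormat {
        val name: String
    }

    // Scan META-INF/services entries on the classpath for implementations
    // and pick the one matching the requested format name.
    fun findFormat(format: String): TraceFormat? =
        ServiceLoader.load(TraceFormat::class.java).firstOrNull { it.name == format }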
|
|
This change adds a synthetic resource table for the Bitbrains format,
which can be used to list the available partitions in the trace.
|
|
This change adds a new API to the `Table` interface for accessing the
columns that the table supports. This does not necessarily mean that the
column will have a value for every row, but that the table format has
defined this particular column.
|
|
This change adds support for reading WfCommons workflow traces in
OpenDC. This functionality is available in the new
`opendc-trace-wfformat` module.
|
|
This change removes the external class that holds the state of the
reader and instead puts the state in the reader implementation.
Maintaining a separate class for the state increases the complexity and
has worse performance characteristics due to the bytecode produced by
Kotlin for property accesses.
|
|
This change adds support for the Materna traces from the Grid Workload
Trace Archive (GWA). These traces are very similar to the Bitbrains
traces, so they share the same base implementation.
|
|
This change updates the trace reading classes in the Capelin experiment
to use the new trace API, in order to re-use much of the existing
trace-reading code.
|
|
This change updates the WTF trace reader to support the new streaming
trace API.
|
|
This change updates the SWF trace reader to support the new streaming
trace API.
|
|
This change moves Bitbrains trace support into a separate module and
adds support for the new trace API.
|
|
This change extracts the Parquet helpers out of the format module into a
new module, in order to improve the re-usability of these helpers.
|
|
This change starts the process of moving the different trace formats into
separate modules. This change in particular moves the GWF trace format
into a new module, `opendc-trace-gwf`.
Furthermore, this change also implements the trace API for the GWF
module.
|
|
This change introduces a new OpenDC API for reading various trace
formats in a streaming manner.
|