| Age | Commit message | Author |
|
This change removes several dependencies from the `opendc-trace-parquet`
helper module. These dependencies are part of Hadoop Common, but are not
actually used by the Parquet project.
|
|
This change adds support for projections in the Apache Calcite
integration with OpenDC. This enables faster queries when only a subset
of the table columns is selected.
|
|
This change adds support for projecting certain columns of a table. This
enables faster reading for tables with a high number of columns.
Currently, we support projection in the Parquet-based workload formats.
Other formats are text-based and will probably not benefit much from
projection.
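As a sketch of what a projected read could look like (the interfaces and
the `newReader(projection)` signature below are assumptions for
illustration, not the verbatim API):

    // Hypothetical minimal interfaces, for illustration only.
    interface TableReader {
        fun nextRow(): Boolean
        fun getDouble(index: Int): Double
    }

    interface Table {
        // null reads every column; a non-null projection restricts the
        // read to the listed columns.
        fun newReader(projection: List<String>?): TableReader
    }

    fun averageCpuUsage(table: Table): Double {
        // Only one column is requested, so a Parquet-backed format can
        // skip all other column chunks on disk.
        val reader = table.newReader(listOf("cpu_usage"))
        var sum = 0.0
        var rows = 0
        while (reader.nextRow()) {
            sum += reader.getDouble(0) // index 0: the first (and only) projected column
            rows++
        }
        return if (rows == 0) 0.0 else sum / rows
    }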
|
|
This change updates the Parquet support library in OpenDC to not rely on
Avro, but instead interface directly with Parquet's reading and writing
functionality, which reduces overhead.
|
|
This change updates the Workflow Trace format implementation in OpenDC to
not use the `parquet-avro` library for exporting experiment data, but
instead to use the low-level APIs to directly read the data from Parquet.
This reduces the amount of conversions necessary before reaching the
OpenDC trace API.
|
|
This change updates the OpenDC VM format reader implementation to use
the low-level record reading APIs provided by the `parquet-mr` library
for improved performance. Previously, we used the `parquet-avro` library
to read/write Avro records in Parquet format, but that library carries
considerable overhead.
|
|
This change updates the `LocalParquetReader` implementation to support
custom `ReadSupport` implementations, so that we do not necessarily have
to rely on the Avro implementation.
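For reference, a minimal sketch of what a custom `ReadSupport` for
`parquet-mr` looks like. The record type and two-column layout are made
up for illustration; the real read support also handles projection and
more column types:

    import org.apache.hadoop.conf.Configuration
    import org.apache.parquet.hadoop.api.InitContext
    import org.apache.parquet.hadoop.api.ReadSupport
    import org.apache.parquet.io.api.Binary
    import org.apache.parquet.io.api.Converter
    import org.apache.parquet.io.api.GroupConverter
    import org.apache.parquet.io.api.PrimitiveConverter
    import org.apache.parquet.io.api.RecordMaterializer
    import org.apache.parquet.schema.MessageType

    // Illustrative record type; the real trace rows have more columns.
    data class ResourceRow(var id: String = "", var timestamp: Long = 0)

    class ResourceReadSupport : ReadSupport<ResourceRow>() {
        override fun init(context: InitContext): ReadSupport.ReadContext =
            // No projection in this sketch: request the full file schema.
            ReadSupport.ReadContext(context.fileSchema)

        override fun prepareForRead(
            configuration: Configuration,
            keyValueMetaData: Map<String, String>,
            fileSchema: MessageType,
            readContext: ReadSupport.ReadContext
        ): RecordMaterializer<ResourceRow> = object : RecordMaterializer<ResourceRow>() {
            // The same row instance is re-used for every record, which is
            // where the reduced overhead comes from: no per-record Avro
            // object graph is allocated.
            private val row = ResourceRow()

            private val root = object : GroupConverter() {
                // One converter per column, writing straight into the row;
                // assumes the schema is (id: binary, timestamp: int64).
                private val converters = arrayOf<Converter>(
                    object : PrimitiveConverter() {
                        override fun addBinary(value: Binary) {
                            row.id = value.toStringUsingUTF8()
                        }
                    },
                    object : PrimitiveConverter() {
                        override fun addLong(value: Long) {
                            row.timestamp = value
                        }
                    }
                )

                override fun getConverter(fieldIndex: Int): Converter = converters[fieldIndex]
                override fun start() {}
                override fun end() {}
            }

            override fun getCurrentRecord(): ResourceRow = row
            override fun getRootConverter(): GroupConverter = root
        }
    }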
|
|
This change adds a command line interface for querying workload traces
using SQL. We provide a new command for the trace tools that can query a
workload trace.
|
|
This change updates the Apache Calcite integration to support writing
workload traces via SQL. This enables custom conversion scripts between
different workload trace formats.
|
|
This change adds support for querying workload trace formats implemented
using the OpenDC API through Apache Calcite. This allows users to write
SQL queries to explore the workload traces.
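For illustration, querying through Calcite's standard JDBC driver might
look like the following. The model file and the table and column names
are assumptions, not the exact schema exposed by OpenDC:

    import java.sql.DriverManager

    fun main() {
        // A Calcite model file is assumed to map the trace to a schema;
        // the "resources" table and its columns are illustrative.
        DriverManager.getConnection("jdbc:calcite:model=trace-model.json").use { conn ->
            conn.createStatement().use { stmt ->
                // Selecting a subset of columns allows the integration to
                // push the projection down to the underlying reader.
                val rs = stmt.executeQuery("SELECT id, cpu_count FROM resources LIMIT 10")
                while (rs.next()) {
                    println("${rs.getString("id")}: ${rs.getInt("cpu_count")} vCPUs")
                }
            }
        }
    }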
|
|
This change updates the Gradle build configuration of the project to
publish the different types of modules (e.g., opendc-compute,
opendc-simulator) into their own groups.
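In the Gradle Kotlin DSL, the idea is roughly the following (the group
names and the mapping are illustrative, not the exact configuration):

    subprojects {
        // Publish each category of modules under its own group.
        group = when {
            name.startsWith("opendc-simulator") -> "org.opendc.simulator"
            name.startsWith("opendc-compute") -> "org.opendc.compute"
            else -> "org.opendc"
        }
    }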
|
|
This change updates the Gradle build configuration to ensure that all
library modules (that will be published) have tests enabled and are
included in coverage reports. This should ensure that the public modules
remain well tested.
|
|
This change moves the trace conventions (such as table and column names)
into a separate `conv` package, so that they are separated from the main API.
This also allows for a potential move into a separate module in the
future.
|
|
This change updates the OpenDC VM trace format to incorporate the VM
interference model in the trace format itself. This makes sense since
the model is tightly coupled to the actual trace that is being
simulated.
This approach has the benefit that we can load the interference model
directly from the workload trace, without having to resolve the model
separately (as we did before).
|
|
This change fixes an issue where the number of vCPUs was not taken into
account when converting from CPU usage percentage to MHz.
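A sketch of the corrected conversion (parameter names are illustrative):
the percentage is relative to the total capacity of the machine, so the
per-core speed must be multiplied by the number of vCPUs.

    fun usageToMhz(usagePct: Double, vcpus: Int, coreSpeedMhz: Double): Double =
        (usagePct / 100.0) * vcpus * coreSpeedMhz

    fun main() {
        // 50% usage of a 4-vCPU machine with 2600 MHz cores is 5200 MHz;
        // ignoring the vCPU count would incorrectly yield 1300 MHz.
        println(usageToMhz(50.0, 4, 2600.0))
    }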
|
|
This change removes the `opendc-platform` module from the project. This
module represented a Java platform which was previously used for sharing
a set of dependency versions between subprojects. However, with the
version catalog feature added by Gradle, we no longer use the platform.
|
|
Tasks from a `.gwf` trace file did not have dependencies because this
property was not assigned after being read in the `GwfTaskTableReader`.
I removed the conversion from String to Long in `parseParents` because it
seems like other readers (the Parquet reader in particular) return
Strings as well, which is why they are converted to Long in line 75 of
`TraceHelpers.kt`.
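A sketch of the corrected behaviour (the exact delimiter handling in the
GWF reader is assumed here): parent identifiers stay Strings rather than
being eagerly parsed into Longs.

    // Hypothetical sketch: split the dependency field of a GWF record
    // into parent task identifiers, leaving them as Strings.
    fun parseParents(field: String): Set<String> =
        field.split(',', ' ')
            .map { it.trim() }
            .filter { it.isNotEmpty() }
            .toSet()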
Co-authored-by: Fabian Mastenbroek <mail.fabianm@gmail.com>
|
|
This change updates the implementation of the trace converter and
SimTrace implementation to support cases where there is a gap between
samples in the trace data.
This change allows users to specify what to do in case samples are
missing in the trace. The available options are specified in
`SimTrace.FillMode`. Currently, we support either carrying the previous
value forward or setting the usage to zero.
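To illustrate the two modes (with a stand-in enum; the real logic lives
in `SimTrace` and its trace converter):

    enum class FillMode { PREVIOUS, ZERO }

    // Fill gaps in a fixed-interval series of (timestampMs, usage) samples.
    fun fillGaps(
        samples: List<Pair<Long, Double>>,
        intervalMs: Long,
        mode: FillMode
    ): List<Pair<Long, Double>> {
        val result = mutableListOf<Pair<Long, Double>>()
        for ((i, sample) in samples.withIndex()) {
            result += sample
            if (i + 1 == samples.size) break
            var t = sample.first + intervalMs
            while (t < samples[i + 1].first) {
                // Gap detected: carry the previous value forward, or
                // insert an explicit zero-usage sample.
                result += t to (if (mode == FillMode.PREVIOUS) sample.second else 0.0)
                t += intervalMs
            }
        }
        return result
    }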
|
|
This change adds support for converting the Azure VM traces into the
OpenDC trace format.
|
|
This change adds a new column to the resource table of the OpenDC trace format for
the CPU capacity provisioned for a virtual machine, so that this
capacity can be assigned to the virtual machine during simulation.
|
|
This change addresses an issue where the timestamps in the Azure trace
were not retrieved correctly from the files.
|
|
This change updates the Azure VM trace format implementation to directly
support loading a trace in GZIP format, in order to prevent users from
having to decompress the trace files before they can be opened by OpenDC.
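The general approach is to wrap the input stream transparently, along
these lines (the real reader may detect compression differently):

    import java.io.InputStream
    import java.nio.file.Files
    import java.nio.file.Path
    import java.util.zip.GZIPInputStream

    // Open a trace file, transparently decompressing it when it carries
    // a .gz extension.
    fun openTraceFile(path: Path): InputStream {
        val input = Files.newInputStream(path)
        return if (path.fileName.toString().endsWith(".gz")) GZIPInputStream(input) else input
    }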
|
|
This change adds a new API for writing traces in a trace format.
Currently, writing is only supported by the OpenDC VM format, but
support for writing will be added to the other formats over time.
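A sketch of what writing a row could look like; the interface below is a
hypothetical stand-in, not the verbatim API:

    // Hypothetical minimal writer interface for illustration.
    interface TableWriter : AutoCloseable {
        fun startRow()
        fun set(column: String, value: Any)
        fun endRow()
    }

    fun writeResource(writer: TableWriter) {
        writer.use {
            it.startRow()
            it.set("id", "vm-1021")
            it.set("cpu_count", 4)
            it.endRow()
        }
    }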
|
|
This change simplifies the `TraceFormat` SPI by reducing the number of
interfaces that implementors need to implement to only `TraceFormat`.
|
|
This change adds support for looking up a column value through its
column index. This enables faster lookups when processing very large
traces.
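The intended usage pattern, sketched with a hypothetical reader
interface: resolve the index once, then use it in the hot loop.

    // Hypothetical minimal reader interface for illustration.
    interface TableReader {
        fun nextRow(): Boolean
        fun resolve(name: String): Int
        fun getDouble(index: Int): Double
    }

    fun sumCpuUsage(reader: TableReader): Double {
        // Resolve the column index once, instead of performing a string
        // lookup for every row of a multi-gigabyte trace.
        val usageIdx = reader.resolve("cpu_usage")
        var total = 0.0
        while (reader.nextRow()) {
            total += reader.getDouble(usageIdx)
        }
        return total
    }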
|
|
This change unifies the columns of the different tables used by trace
formats. Concretely, this means that instead of having columns that are
specific to each table (e.g., `RESOURCE_ID` and `RESOURCE_STATE_ID`),
these columns are now shared between the tables with a single definition
(`RESOURCE_ID`).
|
|
This change adds an initial implementation to the trace library for
converting between workload trace formats. Currently, the tool supports
only converting to the OpenDC VM trace format. However, in the future,
we will add support for converting between other formats as well.
|
|
This change optimizes the OpenDC VM trace format by removing
unnecessary columns as well as optimizing the writer settings.
The new implementation still supports reading the old trace format in
case users run OpenDC with older workload traces.
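As an indication of the kind of writer tuning meant here (the concrete
settings and schema are illustrative, not the exact OpenDC configuration),
using the example writer shipped with `parquet-mr`:

    import org.apache.hadoop.fs.Path
    import org.apache.parquet.example.data.Group
    import org.apache.parquet.hadoop.ParquetWriter
    import org.apache.parquet.hadoop.example.ExampleParquetWriter
    import org.apache.parquet.hadoop.metadata.CompressionCodecName
    import org.apache.parquet.schema.MessageTypeParser

    fun createWriter(path: Path): ParquetWriter<Group> {
        // Illustrative schema; the actual OpenDC VM format differs.
        val schema = MessageTypeParser.parseMessageType(
            "message resource_state { required binary id (UTF8); " +
                "required int64 timestamp; required double cpu_usage; }"
        )
        return ExampleParquetWriter.builder(path)
            .withType(schema)
            // Dictionary encoding and compression pay off for the highly
            // repetitive id column in trace data.
            .withDictionaryEncoding(true)
            .withCompressionCodec(CompressionCodecName.SNAPPY)
            .build()
    }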
|
|
This change adds official support to the trace library for the internal
VM trace format used by OpenDC for its experiments. This is a compact
format that uses Parquet to store the virtual machine trace data in two
Parquet files.
|
|
This change adds support in the trace library for the Azure VM trace
format.
|
|
This change adds support in the trace library for the extended Bitbrains
format. This format is slightly different from the CSV format used by
the original Bitbrains traces and contains more fields.
|
|
This change creates a new module for doing simulations with virtual
machine workloads. We have found that a lot of code from the Capelin
experiments is re-used by non-experiment modules.
|
|
This change enables users to open traces of various trace formats by
dynamically specifying the format name. The trace API will use the
service loader to resolve the available trace formats on the classpath.
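Resolution happens through the standard JDK mechanism, roughly as
follows (the `name` property on the SPI is an assumption for
illustration):

    import java.util.ServiceLoader

    // Hypothetical minimal SPI for illustration; the real TraceFormat
    // interface carries more members.
    interface TraceFormat {
        val name: String
    }

    // Scan META-INF/services entries on the classpath for implementations
    // and pick the one matching the requested format name.
    fun findFormat(format: String): TraceFormat? =
        ServiceLoader.load(TraceFormat::class.java).firstOrNull { it.name == format }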
|
|
This change adds a synthetic resource table for the Bitbrains format,
which can be used to list the available partitions in the trace.
|
|
This change adds a new API to the `Table` interface for accessing the
columns that the table supports. This does not necessarily mean that the
column will have a value for every row, but that the table format has
defined this particular column.
|
|
This change adds support for reading WfCommons workflow traces in
OpenDC. This functionality is available in the new
`opendc-trace-wfformat` module.
|
|
This change removes the external class that holds the state of the
reader and instead puts the state in the reader implementation.
Maintaining a separate class for the state increases the complexity and
has worse performance characteristics due to the bytecode produced by
Kotlin for property accesses.
|
|
This change adds support for the Materna traces from the Grid Workload
Trace Archive (GWA). These traces are very similar to the Bitbrains
traces, so they share the same base implementation.
|
|
This change updates the trace reading classes in the Capelin experiment
to use the new trace API, in order to re-use much of the existing
trace-reading code.
|
|
This change updates the WTF trace reader to support the new streaming
trace API.
|
|
This change updates the SWF trace reader to support the new streaming
trace API.
|
|
This change moves Bitbrains trace support into a separate module and
adds support for the new trace API.
|
|
This change extracts the Parquet helpers out of the format module into a
new module, in order to improve the re-usability of these helpers.
|
|
This change starts the process of moving the different trace formats into
separate modules. This change in particular moves the GWF trace format
into a new module, `opendc-trace-gwf`.
Furthermore, this change also implements the trace API for the GWF
module.
|
|
This change introduces a new OpenDC API for reading various trace
formats in a streaming manner.
|