path: root/opendc-trace/opendc-trace-parquet
Age | Commit message | Author
2022-12-14 | fix(trace/wtf): Disable Parquet strict typing | Fabian Mastenbroek
This change fixes an issue where some of the traces from the Workflow Trace Archive would fail to load with the trace format in OpenDC. This was caused by one of the fields being stored as a double, while the format expects it to be a long. Parquet does not support unioning primitive types. Therefore, we have to disable strict type checking when reading the file. Furthermore, we need to support double entries for storing the workflow ids.
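To illustrate the last point, here is a minimal sketch of a converter that accepts a workflow id delivered either as a long or as a double. It is not the code from this commit: `LenientIdConverter` is a hypothetical, dependency-free stand-in whose `addLong`/`addDouble` callbacks only mirror the shape of Parquet's `PrimitiveConverter`, and rejecting fractional or out-of-range doubles is an assumption about the desired behaviour.

```java
/**
 * Hypothetical stand-in for a Parquet primitive converter (not OpenDC code).
 * It stores an id as a long, whichever primitive type the file used.
 */
final class LenientIdConverter {
    // Largest magnitude at which every integer is exactly representable as a double.
    private static final double MAX_EXACT = (double) (1L << 53);

    private long value;

    /** Called when the column is stored as a 64-bit integer. */
    void addLong(long v) {
        value = v;
    }

    /** Called when the column is stored as a double; only whole numbers are accepted. */
    void addDouble(double v) {
        // NaN and infinities fail the rint check or the range check below.
        if (Math.rint(v) != v || Math.abs(v) > MAX_EXACT) {
            throw new IllegalArgumentException("Not a valid integral id: " + v);
        }
        value = (long) v;
    }

    /** Returns the most recently converted id. */
    long value() {
        return value;
    }
}
```

With this sketch, calling `addLong(42L)` or `addDouble(42.0)` both leave `value()` at 42, while `addDouble(1.5)` throws an `IllegalArgumentException`.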
2022-10-06 | build: Switch to Spotless for formatting | Fabian Mastenbroek
This change updates the build configuration to use Spotless for code formatting of both Kotlin and Java.
2022-10-06 | style: Eliminate use of wildcard imports | Fabian Mastenbroek
This change updates the repository to remove the use of wildcard imports everywhere. Wildcard imports are disallowed by Ktlint by default, as well as by Google's Java style guide.
2022-07-07 | build(trace/parquet): Ignore reload4j dependency | Fabian Mastenbroek
This change updates the build configuration to ignore the reload4j dependency that was recently added to the hadoop-common module. Reload4j replaces the old unmaintained log4j1 module. However, since we expose this module as a library, we do not want to include a logging implementation in the dependencies. Currently, there are already instances where this new dependency leads to duplicate logging implementations on the classpath.
2022-06-08 | test(trace): Add conformance suite for OpenDC trace API | Fabian Mastenbroek
This change adds a re-usable test suite for the interface of the OpenDC trace API, so implementors can verify whether they match the specification of the interfaces.
2022-05-18 | refactor(web/runner): Move runner CLI into separate configuration | Fabian Mastenbroek
This change splits the command line interface from the OpenDC web runner into a separate configuration. We plan to re-use the runner code for a Quarkus extension that integrates the runner in development mode.
2022-05-06 | build(trace/parquet): Remove unnecessary dependencies | Fabian Mastenbroek
This change removes several dependencies from the `opendc-trace-parquet` helper module, which are part of Hadoop Common, but are not actually used by the Parquet project.
2022-05-02 | refactor(trace/parquet): Drop dependency on Avro | Fabian Mastenbroek
This change updates the Parquet support library in OpenDC to not rely on Avro, but instead interface directly with Parquet's reading and writing functionality, which reduces overhead.
2022-05-02 | perf(trace/opendc): Read records using low-level API | Fabian Mastenbroek
This change updates the OpenDC VM format reader implementation to use the low-level record reading APIs provided by the `parquet-mr` library for improved performance. Previously, we used the `parquet-avro` library to read/write Avro records in Parquet format, but that library carries considerable overhead.
2022-05-01 | refactor(trace/parquet): Support custom ReadSupport implementations | Fabian Mastenbroek
This change updates the `LocalParquetReader` implementation to support custom `ReadSupport` implementations, so we no longer have to rely on the Avro implementation.
2022-04-23 | build: Enable testing for all library modules | Fabian Mastenbroek
This change updates the Gradle build configuration to ensure that all library modules that will be published have testing enabled and are included in coverage reports. This should ensure the public modules remain well tested.
2022-02-18 | build: Remove opendc-platform module | Fabian Mastenbroek
This change removes the opendc-platform module from the project. This module represented a Java platform which was previously used for sharing a set of dependency versions between subprojects. However, now that Gradle provides version catalogs, we no longer use the platform.
2021-09-19 | refactor(capelin): Extract common code out of Capelin experiments | Fabian Mastenbroek
This change creates a new module for doing simulations with virtual machine workloads. We have found that a lot of code in the Capelin experiments code is being re-used by non-experiment modules.
2021-09-02 | refactor(trace): Extract Parquet helpers into separate module | Fabian Mastenbroek
This change extracts the Parquet helpers out of the format module into a new module, in order to improve the re-usability of these helpers.