| author | Niels Thiele <noleu66@posteo.net> | 2025-06-22 12:31:21 +0200 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-06-22 12:31:21 +0200 |
| commit | 0203254b709614fa732c114aa25916f61b8b3275 (patch) | |
| tree | 63232140a8e60e16e1668a51eb58954d8609fbdc /opendc-experiments/opendc-experiments-base | |
| parent | 8f846655347195bf6f22a4a102aa06f0ab127da1 (diff) | |
Implemented Single GPU Support & outline of host-level allocation policies (#342)
* renamed performance counter to distinguish different resource types
* added GPU, modelled similarly to CPU
* added GPUs to machine model
* list of GPUs instead of single instance
* renamed memory speed to bandwidth
* enabled parsing of GPU resources
* split powermodel into cpu and GPU powermodel
* added gpu parsing tests
* added idea of host-level scheduling
* added tests for multi gpu parsing
* renamed powermodel to cpupowermodel
* clarified naming of cpu and gpu components
* added resource type to flow supplier and edge
* added ResourceType
* added GPU components and resource type to fragments
* added GPU to workload and updated resource usage retrieval
* implemented first version of multi resource
* added name to workload
* renamed performance counters
* removed commented out code
* removed deprecated comments
* included demand and supply into calculations
* resolving rebase mismatches
* moved resource type from flowedge class to common package
* added available resources to machines
* cleaner separation of whether a workload is started on a SimMachine or a VM
* Replaced exception with dedicated enum
* Only looping over resources that are actually used
* using hashmaps to handle resourcetype instead of arrays for readability
* fixed condition
* tracking finished workloads per resource type
* removed resource type from flowedge
* made supply and demand distribution resource specific
* added power model for GPU
* removed unused test setup
* removed deprecated comments
* removed unused parameter
* added ID for GPU
* added GPUs and GPU performance counters (naively)
* implemented capturing of GPU statistics
* added reminders for future implementations
* renamed properties for better identification
* added capturing GPU statistics
* implemented first tests for GPUs
* unified access to performance counters
* added interface for general compute resource handling
* implemented multi resource support in simmachine
* added individual edge to VM per resource
* extended compute resource interface
* implemented multi-resource support in PSU
* implemented generic retrieval of computeresources
* implemented multi-resource support in vm
* made method use more resource specific
* implemented simple GPU tests
* rolled back frequency and demand use
* made naming independent of used resource
* using the workload's resources instead of the VM's to determine available resources
* implemented determination of used resources in workload
* removed logging statements
* implemented reading from workload
* fixed naming for host-level allocation
* fixed next deadline calculation
* fixed forwarding supply
* reduced memory footprint
* made GPU powermodel nullable
* made GPU powermodel configurable in topology
* implemented tests for basic gpu scheduler
* added gpu properties
* implemented weights, filters and a simple cpu-gpu scheduler (see the scheduler sketch after this list)
* spotless apply
* spotless apply pt. 2
* fixed capitalization
* spotless kotlin run
* implemented column export
* todo update
* removed code comments
* Merged PerformanceCounter classes into one & removed interface
* removed GPU specific powermodel
* Rebase master: kept both versions of TopologyFactories
* renamed CpuPowermodel to resource independent Powermodel
Moved it from Cpu package to power package
* implemented default of getResourceType & removed overrides where possible
* split getResourceType into Consumer and Supplier
* added power as a resource type (see the ResourceType sketch after this list)
* reduced supply demand from arrayList to single value
* combining GPUs into one large GPU until full multi-GPU support is implemented
* merged distribution policy enum with corresponding factory
* added comment
* post-rebase fixes
* aligned naming
* Added GPU metrics to task output
* Updates power resource type to uppercase.
Standardizes the `ResourceType.Power` enum to `ResourceType.POWER`
for consistency with other resource types and improved readability.
* Removes deprecated test assertions
Removes commented-out assertions in GPU tests.
These assertions are no longer needed and clutter the test code.
* Renames MaxMinFairnessStrategy to MaxMinFairnessPolicy
Renames MaxMinFairnessStrategy to MaxMinFairnessPolicy for
clarity and consistency with naming conventions. The factory
and distributor are updated to use the new name.
* applies spotless
* nulls GPUs as they are not used
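
For readers unfamiliar with the resource-type plumbing referenced above, the following is a minimal sketch of what the enum and the per-task resource derivation might look like. `ResourceType.CPU` and `ResourceType.GPU` appear in the `TestingUtils.kt` hunk below and `ResourceType.POWER` is named in the commit message; the `Fragment` stand-in and the `usedResources` helper are hypothetical simplifications of the real classes.

```kotlin
// Sketch only: the ResourceType values are taken from this change; Fragment is a
// hypothetical stand-in for org.opendc.simulator.compute.workload.trace.TraceFragment.
enum class ResourceType {
    CPU,
    GPU,
    POWER,
}

data class Fragment(val cpuUsage: Double, val gpuUsage: Double)

// Derive the resource types a workload actually uses, mirroring the logic added to
// createTestTask in TestingUtils.kt (fragments with zero usage contribute nothing).
fun usedResources(fragments: List<Fragment>): List<ResourceType> {
    val used = mutableListOf<ResourceType>()
    if (fragments.any { it.cpuUsage > 0.0 }) used += ResourceType.CPU
    if (fragments.any { it.gpuUsage > 0.0 }) used += ResourceType.GPU
    return used
}

fun main() {
    // A GPU-only trace, as in simulator test 5 below, uses only the GPU resource.
    println(usedResources(listOf(Fragment(cpuUsage = 0.0, gpuUsage = 1000.0)))) // [GPU]
}
```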
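
The GPU-aware scheduler mentioned in the list (and exercised in `SchedulerTest` below) is composed from filters and weighers. The sketch below only restates the construction visible in the test diff; the allocation ratios are the values used in that test, not library defaults, and the `buildGpuAwareScheduler` helper is illustrative.

```kotlin
import org.opendc.compute.simulator.scheduler.FilterScheduler
import org.opendc.compute.simulator.scheduler.filters.ComputeFilter
import org.opendc.compute.simulator.scheduler.filters.RamFilter
import org.opendc.compute.simulator.scheduler.filters.VCpuFilter
import org.opendc.compute.simulator.scheduler.filters.VGpuFilter
import org.opendc.compute.simulator.scheduler.weights.VCpuWeigher
import org.opendc.compute.simulator.scheduler.weights.VGpuWeigher

// Sketch of the GPU-aware FilterScheduler as configured in SchedulerTest below.
// Filters reject hosts that cannot satisfy the task's CPU, GPU, or RAM demand;
// weighers rank the remaining hosts.
fun buildGpuAwareScheduler(
    cpuAllocationRatio: Double = 1.0,
    gpuAllocationRatio: Double = 1.0,
    ramAllocationRatio: Double = 1.5,
    multiplier: Double = 1.0,
): FilterScheduler =
    FilterScheduler(
        filters =
            listOf(
                ComputeFilter(),
                VCpuFilter(cpuAllocationRatio),
                VGpuFilter(gpuAllocationRatio),
                RamFilter(ramAllocationRatio),
            ),
        weighers =
            listOf(
                VCpuWeigher(cpuAllocationRatio, multiplier = multiplier),
                VGpuWeigher(gpuAllocationRatio, multiplier = multiplier),
            ),
    )
```

With a positive multiplier the weighers prefer hosts with more free CPU/GPU capacity; the test's "inverted" variant passes -1.0 to prefer hosts with less.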
Diffstat (limited to 'opendc-experiments/opendc-experiments-base')
30 files changed, 1535 insertions, 46 deletions
diff --git a/opendc-experiments/opendc-experiments-base/src/main/kotlin/org/opendc/experiments/base/runner/ScenarioReplayer.kt b/opendc-experiments/opendc-experiments-base/src/main/kotlin/org/opendc/experiments/base/runner/ScenarioReplayer.kt index d56e4e4b..72042f3c 100644 --- a/opendc-experiments/opendc-experiments-base/src/main/kotlin/org/opendc/experiments/base/runner/ScenarioReplayer.kt +++ b/opendc-experiments/opendc-experiments-base/src/main/kotlin/org/opendc/experiments/base/runner/ScenarioReplayer.kt @@ -129,6 +129,15 @@ public suspend fun ComputeService.replay( TaskNature(false) } + val flavorMeta = mutableMapOf<String, Any>() + + if (entry.cpuCapacity > 0.0) { + flavorMeta["cpu-capacity"] = entry.cpuCapacity + } + if (entry.gpuCapacity > 0.0) { + flavorMeta["gpu-capacity"] = entry.gpuCapacity + } + launch { val task = client.newTask( @@ -140,7 +149,8 @@ public suspend fun ComputeService.replay( entry.name, entry.cpuCount, entry.memCapacity, - if (entry.cpuCapacity > 0.0) mapOf("cpu-capacity" to entry.cpuCapacity) else emptyMap(), + entry.gpuCount, + flavorMeta, ), workload, meta, diff --git a/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/ExperimentTest.kt b/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/ExperimentTest.kt index d4729350..582fdbee 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/ExperimentTest.kt +++ b/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/ExperimentTest.kt @@ -66,8 +66,8 @@ class ExperimentTest { assertAll( { assertEquals(10 * 60 * 1000, monitor.maxTimestamp) { "Total runtime incorrect" } }, - { assertEquals(((10 * 30000)).toLong(), monitor.hostIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, - { assertEquals((10 * 30000).toLong(), monitor.hostActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, + { assertEquals(((10 * 30000)).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, + { assertEquals((10 * 30000).toLong(), monitor.hostCpuActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, { assertEquals(9000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect host energy usage at timestamp 0" } }, { assertEquals(600 * 150.0, monitor.hostEnergyUsages["H01"]?.sum()) { "Incorrect host energy usage" } }, { assertEquals(600 * 150.0, monitor.energyUsages.sum()) { "Incorrect total energy usage" } }, @@ -117,8 +117,8 @@ class ExperimentTest { assertAll( { assertEquals(15 * 60 * 1000, monitor.maxTimestamp) { "Total runtime incorrect" } }, - { assertEquals(((10 * 30000)).toLong(), monitor.hostIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, - { assertEquals(((10 * 30000) + (5 * 60000)).toLong(), monitor.hostActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, + { assertEquals(((10 * 30000)).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, + { assertEquals(((10 * 30000) + (5 * 60000)).toLong(), monitor.hostCpuActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, { assertEquals(9000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect energy usage" } }, { assertEquals((600 * 150.0) + (300 * 200.0), monitor.hostEnergyUsages["H01"]?.sum()) { "Incorrect energy usage" } }, { assertEquals((600 * 150.0) + (300 * 200.0), monitor.energyUsages.sum()) { "Incorrect energy usage" } }, @@ -160,8 +160,8 @@ class ExperimentTest { assertAll( { assertEquals(10 * 60 * 1000, monitor.maxTimestamp) { "Total runtime incorrect" } 
}, - { assertEquals(((10 * 30000)).toLong(), monitor.hostIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, - { assertEquals(((10 * 30000)).toLong(), monitor.hostActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, + { assertEquals(((10 * 30000)).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, + { assertEquals(((10 * 30000)).toLong(), monitor.hostCpuActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, { assertEquals(9000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect energy usage" } }, { assertEquals((600 * 150.0), monitor.hostEnergyUsages["H01"]?.sum()) { "Incorrect energy usage" } }, { assertEquals((600 * 150.0), monitor.energyUsages.sum()) { "Incorrect energy usage" } }, @@ -204,8 +204,8 @@ class ExperimentTest { assertAll( { assertEquals(25 * 60 * 1000, monitor.maxTimestamp) { "Total runtime incorrect" } }, - { assertEquals(((10 * 30000) + (10 * 60000)).toLong(), monitor.hostIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, - { assertEquals(((10 * 30000) + (5 * 60000)).toLong(), monitor.hostActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, + { assertEquals(((10 * 30000) + (10 * 60000)).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, + { assertEquals(((10 * 30000) + (5 * 60000)).toLong(), monitor.hostCpuActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, { assertEquals(9000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect energy usage" } }, { assertEquals( @@ -215,4 +215,284 @@ class ExperimentTest { }, ) } + + /** + * Simulator test 5: One Task purely running on GPU + * + * In this test, a single task is scheduled that takes 10 minutes to run. It solely uses the GPU. + */ + @Test + fun testSimulator5() { + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 0.0, 0, 1000.0, 1), + ), + ), + ) + + val topology = createTopology("Gpus/single_gpu_no_vendor_no_memory.json") + + val monitor = runTest(topology, workload) + + assertAll( + { assertEquals(10 * 60 * 1000, monitor.maxTimestamp) { "Total runtime incorrect" } }, + { assertEquals(((10 * 60 * 1000)).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "CPU Idle time incorrect" } }, + { assertEquals(0L, monitor.hostCpuActiveTimes["H01"]?.sum()) { "CPU Active time incorrect" } }, + { + assertEquals( + ((10 * 30000)).toLong(), + monitor.hostGpuIdleTimes["H01"]?.fold(0, { acc, iterator -> acc + iterator[0] }), + ) { "GPU Idle time incorrect" } + }, + { + assertEquals( + ((10 * 30000)).toLong(), + monitor.hostGpuActiveTimes["H01"]?.fold(0, { acc, iterator -> acc + iterator[0] }), + ) { "GPU Active time incorrect" } + }, + // double, as CPU and GPU both use power + // higher power usage, as default GPU power model is used range [200, 400] + { assertEquals(2 * 12000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect host energy usage at timestamp 0" } }, + { assertEquals((600 * 100.0) + (600 * 300.0), monitor.hostEnergyUsages["H01"]?.sum()) { "Incorrect host energy usage" } }, + { assertEquals((600 * 100.0) + (600 * 300.0), monitor.energyUsages.sum()) { "Incorrect total energy usage" } }, + ) + } + + /** + * Simulator test 6: One Task running on CPU & GPU + * + * In this test, a single task is scheduled that takes 10 minutes to run. CPU & GPU are used and have the same runtime. 
+ */ + @Test + fun testSimulator6() { + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 1000.0, 1), + ), + ), + ) + + val topology = createTopology("Gpus/single_gpu_no_vendor_no_memory.json") + + val monitor = runTest(topology, workload) + + assertAll( + { assertEquals(10 * 60 * 1000, monitor.maxTimestamp) { "Total runtime incorrect" } }, + { assertEquals(((10 * 30000)).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "CPU Idle time incorrect" } }, + { assertEquals(((10 * 30000)).toLong(), monitor.hostCpuActiveTimes["H01"]?.sum()) { "CPU Active time incorrect" } }, + { + assertEquals( + ((10 * 30000)).toLong(), + monitor.hostGpuIdleTimes["H01"]?.fold(0, { acc, iterator -> acc + iterator[0] }), + ) { "GPU Idle time incorrect" } + }, + { + assertEquals( + ((10 * 30000)).toLong(), + monitor.hostGpuActiveTimes["H01"]?.fold(0, { acc, iterator -> acc + iterator[0] }), + ) { "GPU Active time incorrect" } + }, + // double, as CPU and GPU both use power + { assertEquals(27000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect host energy usage at timestamp 0" } }, + { assertEquals((600 * 150.0) + (600 * 300.0), monitor.hostEnergyUsages["H01"]?.sum()) { "Incorrect host energy usage" } }, + { assertEquals((600 * 150.0) + (600 * 300.0), monitor.energyUsages.sum()) { "Incorrect total energy usage" } }, + ) + } + + /** + * Simulator test 7: One Task running on CPU & GPU + * + * In this test, a single task is scheduled that takes 10 minutes to run. CPU & GPU are used. CPU will finish way ahead of the GPU. + */ + @Test + fun testSimulator7() { + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 2000.0, 1), + ), + ), + ) + + val topology = createTopology("Gpus/single_gpu_no_vendor_no_memory.json") + + val monitor = runTest(topology, workload) + assertAll( + { assertEquals(10 * 60 * 1000, monitor.maxTimestamp) { "Total runtime incorrect" } }, + { assertEquals(((10 * 30000)).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "CPU Idle time incorrect" } }, + { assertEquals(((10 * 30000)).toLong(), monitor.hostCpuActiveTimes["H01"]?.sum()) { "CPU Active time incorrect" } }, + { + assertEquals( + 0L, + monitor.hostGpuIdleTimes["H01"]?.fold(0, { acc, iterator -> acc + iterator[0] }), + ) { "GPU Idle time incorrect" } + }, + { + assertEquals( + ((10 * 60000)).toLong(), + monitor.hostGpuActiveTimes["H01"]?.fold(0, { acc, iterator -> acc + iterator[0] }), + ) { "GPU Active time incorrect" } + }, + // double, as CPU and GPU both use power + { assertEquals(33000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect host energy usage at timestamp 0" } }, + { assertEquals((600 * 150.0) + (600 * 400.0), monitor.hostEnergyUsages["H01"]?.sum()) { "Incorrect host energy usage" } }, + { assertEquals((600 * 150.0) + (600 * 400.0), monitor.energyUsages.sum()) { "Incorrect total energy usage" } }, + ) + } + + /** + * Simulator test 8: One Task running on CPU & GPU + * + * In this test, a single task is scheduled that takes 10 minutes to run. CPU & GPU are used. GPU will finish way ahead of the CPU. 
+ */ + @Test + fun testSimulator8() { + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 2000.0, 1, 1000.0, 1), + ), + ), + ) + val topology = createTopology("Gpus/single_gpu_no_vendor_no_memory.json") + val monitor = runTest(topology, workload) + + assertAll( + { assertEquals(10 * 60 * 1000, monitor.maxTimestamp) { "Total runtime incorrect" } }, + { assertEquals(0L, monitor.hostCpuIdleTimes["H01"]?.sum()) { "CPU Idle time incorrect" } }, + { assertEquals(((10 * 60000)).toLong(), monitor.hostCpuActiveTimes["H01"]?.sum()) { "CPU Active time incorrect" } }, + { + assertEquals( + ((10 * 30000)).toLong(), + monitor.hostGpuIdleTimes["H01"]?.fold(0, { acc, iterator -> acc + iterator[0] }), + ) { "GPU Idle time incorrect" } + }, + { + assertEquals( + ((10 * 30000)).toLong(), + monitor.hostGpuActiveTimes["H01"]?.fold(0, { acc, iterator -> acc + iterator[0] }), + ) { "GPU Active time incorrect" } + }, + // double, as CPU and GPU both use power + { assertEquals(30000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect host energy usage at timestamp 0" } }, + { assertEquals((600 * 200.0) + (600 * 300.0), monitor.hostEnergyUsages["H01"]?.sum()) { "Incorrect host energy usage" } }, + { assertEquals((600 * 200.0) + (600 * 300.0), monitor.energyUsages.sum()) { "Incorrect total energy usage" } }, + ) + } + + /** + * Simulator test 9: Two tasks running on CPU & GPU + * + * In this test, two tasks are scheduled at the same time that takes 10 minutes to run. CPU & GPU are used. Both resources will finish at the same time. + */ + @Test + fun testSimulator9() { + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 1000.0, 1), + ), + ), + createTestTask( + name = "1", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 1000.0, 1), + ), + ), + ) + + val topology = createTopology("Gpus/single_gpu_no_vendor_no_memory.json") + val monitor = runTest(topology, workload) + + assertAll( + { assertEquals(2 * (10 * 60 * 1000), monitor.maxTimestamp) { "Total runtime incorrect" } }, + { assertEquals(((10 * 60000)).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "CPU Idle time incorrect" } }, + { assertEquals(((10 * 60000)).toLong(), monitor.hostCpuActiveTimes["H01"]?.sum()) { "CPU Active time incorrect" } }, + { + assertEquals( + ((10 * 60000)).toLong(), + monitor.hostGpuIdleTimes["H01"]?.fold(0, { acc, iterator -> acc + iterator[0] }), + ) { "GPU Idle time incorrect" } + }, + { + assertEquals( + ((10 * 60000)).toLong(), + monitor.hostGpuActiveTimes["H01"]?.fold(0, { acc, iterator -> acc + iterator[0] }), + ) { "GPU Active time incorrect" } + }, + // double, as CPU and GPU both use power + { assertEquals(27000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect host energy usage at timestamp 0" } }, + { assertEquals(2 * ((600 * 150.0) + (600 * 300.0)), monitor.hostEnergyUsages["H01"]?.sum()) { "Incorrect host energy usage" } }, + { assertEquals(2 * ((600 * 150.0) + (600 * 300.0)), monitor.energyUsages.sum()) { "Incorrect total energy usage" } }, + ) + } + + /** + * Simulator test 10: Two tasks running on CPU & GPU + * + * In this test, two tasks are scheduled at the same time that takes 10 minutes to run. One task purely uses CPU, one purely GPU. 
+ */ + @Test + fun testSimulator10() { + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 0.0, 0), + ), + ), + createTestTask( + name = "1", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 0.0, 0, 1000.0, 1), + ), + ), + ) + + val topology = createTopology("Gpus/single_gpu_no_vendor_no_memory.json") + val monitor = runTest(topology, workload) + + assertAll( + { assertEquals(10 * 60 * 1000, monitor.maxTimestamp) { "Total runtime incorrect" } }, + { assertEquals(((10 * 30000)).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "CPU Idle time incorrect" } }, + { assertEquals(((10 * 30000)).toLong(), monitor.hostCpuActiveTimes["H01"]?.sum()) { "CPU Active time incorrect" } }, + { + assertEquals( + ((10 * 30000)).toLong(), + monitor.hostGpuIdleTimes["H01"]?.fold(0, { acc, iterator -> acc + iterator[0] }), + ) { "GPU Idle time incorrect" } + }, + { + assertEquals( + ((10 * 30000)).toLong(), + monitor.hostGpuActiveTimes["H01"]?.fold(0, { acc, iterator -> acc + iterator[0] }), + ) { "GPU Active time incorrect" } + }, + // double, as CPU and GPU both use power + { assertEquals(27000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect host energy usage at timestamp 0" } }, + { assertEquals((600 * 150.0) + (600 * 300.0), monitor.hostEnergyUsages["H01"]?.sum()) { "Incorrect host energy usage" } }, + { assertEquals((600 * 150.0) + (600 * 300.0), monitor.energyUsages.sum()) { "Incorrect total energy usage" } }, + ) + } } diff --git a/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/FailuresAndCheckpointingTest.kt b/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/FailuresAndCheckpointingTest.kt index df3a3c88..4278ca41 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/FailuresAndCheckpointingTest.kt +++ b/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/FailuresAndCheckpointingTest.kt @@ -70,8 +70,8 @@ class FailuresAndCheckpointingTest { assertAll( { assertEquals(20 * 60 * 1000, monitor.maxTimestamp) { "Total runtime incorrect" } }, - { assertEquals(((15 * 30000) + (5 * 60000)).toLong(), monitor.hostIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, - { assertEquals((15 * 30000).toLong(), monitor.hostActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, + { assertEquals(((15 * 30000) + (5 * 60000)).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, + { assertEquals((15 * 30000).toLong(), monitor.hostCpuActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, { assertEquals(9000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect energy usage" } }, { assertEquals(6000.0, monitor.hostEnergyUsages["H01"]?.get(5)) { "Incorrect energy usage" } }, { assertEquals(9000.0, monitor.hostEnergyUsages["H01"]?.get(10)) { "Incorrect energy usage" } }, @@ -110,8 +110,8 @@ class FailuresAndCheckpointingTest { assertAll( { assertEquals(10 * 60 * 1000, monitor.maxTimestamp) { "Total runtime incorrect" } }, - { assertEquals((10 * 30000).toLong(), monitor.hostIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, - { assertEquals((10 * 30000).toLong(), monitor.hostActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, + { assertEquals((10 * 30000).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, + { assertEquals((10 * 30000).toLong(), 
monitor.hostCpuActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, { assertEquals(9000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect energy usage" } }, { assertEquals((600 * 150.0), monitor.hostEnergyUsages["H01"]?.sum()) { "Incorrect energy usage" } }, ) @@ -153,8 +153,8 @@ class FailuresAndCheckpointingTest { assertAll( { assertEquals(37 * 60 * 1000, monitor.maxTimestamp) { "Total runtime incorrect" } }, - { assertEquals(((22 * 30000) + (15 * 60000)).toLong(), monitor.hostIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, - { assertEquals((22 * 30000).toLong(), monitor.hostActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, + { assertEquals(((22 * 30000) + (15 * 60000)).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, + { assertEquals((22 * 30000).toLong(), monitor.hostCpuActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, { assertEquals(9000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect energy usage" } }, { assertEquals(6000.0, monitor.hostEnergyUsages["H01"]?.get(5)) { "Incorrect energy usage" } }, { assertEquals(9000.0, monitor.hostEnergyUsages["H01"]?.get(10)) { "Incorrect energy usage" } }, @@ -198,8 +198,8 @@ class FailuresAndCheckpointingTest { assertAll( { assertEquals(95 * 60000, monitor.maxTimestamp) { "Total runtime incorrect" } }, - { assertEquals(((50 * 60000) + (20 * 60000)).toLong(), monitor.hostIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, - { assertEquals((25 * 60000).toLong(), monitor.hostActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, + { assertEquals(((50 * 60000) + (20 * 60000)).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, + { assertEquals((25 * 60000).toLong(), monitor.hostCpuActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, { assertEquals(9000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect energy usage" } }, { assertEquals(6000.0, monitor.hostEnergyUsages["H01"]?.get(5)) { "Incorrect energy usage" } }, { assertEquals(9000.0, monitor.hostEnergyUsages["H01"]?.get(10)) { "Incorrect energy usage" } }, diff --git a/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/FlowDistributorTest.kt b/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/FlowDistributorTest.kt index 3d733360..7b7b23d2 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/FlowDistributorTest.kt +++ b/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/FlowDistributorTest.kt @@ -557,4 +557,328 @@ class FlowDistributorTest { { assertEquals(1000 * 10 * 60 * 1000, monitor.maxTimestamp) { "The expected runtime is exceeded" } }, ) } + + /** + * FlowDistributor test 14: A single fitting GPU task + * In this test, a single task is scheduled that should fit the FlowDistributor + * We check if both the host and the Task show the correct cpu and gpu usage and demand during the two fragments. 
+ */ + @Test + fun testFlowDistributor14() { + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 0.0, 0, 1000.0, 1), + TraceFragment(10 * 60 * 1000, 0.0, 0, 2000.0, 1), + ), + ), + ) + + val topology = createTopology("Gpus/single_gpu_no_vendor_no_memory.json") + + val monitor = runTest(topology, workload) + + assertAll( + // CPU + // task + { assertEquals(0.0, monitor.taskCpuDemands["0"]?.get(1)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuDemands["0"]?.get(10)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuSupplied["0"]?.get(1)) { "The cpu used by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuSupplied["0"]?.get(10)) { "The cpu used by task 0 is incorrect" } }, + // host + { assertEquals(0.0, monitor.hostCpuDemands["H01"]?.get(1)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostCpuDemands["H01"]?.get(10)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostCpuSupplied["H01"]?.get(1)) { "The cpu used by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostCpuSupplied["H01"]?.get(10)) { "The cpu used by the host is incorrect" } }, + // GPU + // task + { assertEquals(1000.0, monitor.taskGpuDemands["0"]?.get(1)?.get(0)) { "The gpu demanded by task 0 is incorrect" } }, + { assertEquals(2000.0, monitor.taskGpuDemands["0"]?.get(10)?.get(0)) { "The gpu demanded by task 0 is incorrect" } }, + { assertEquals(1000.0, monitor.taskGpuSupplied["0"]?.get(1)?.get(0)) { "The gpu used by task 0 is incorrect" } }, + { assertEquals(2000.0, monitor.taskGpuSupplied["0"]?.get(10)?.get(0)) { "The gpu used by task 0 is incorrect" } }, + // host + { assertEquals(1000.0, monitor.hostGpuDemands["H01"]?.get(1)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(2000.0, monitor.hostGpuDemands["H01"]?.get(10)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(1000.0, monitor.hostGpuSupplied["H01"]?.get(1)?.get(0)) { "The gpu used by the host is incorrect" } }, + { assertEquals(2000.0, monitor.hostGpuSupplied["H01"]?.get(10)?.get(0)) { "The gpu used by the host is incorrect" } }, + ) + } + + /** + * FlowDistributor test 15: One Task running on CPU & GPU + * + * In this test, a single task is scheduled that takes 10 minutes to run. CPU & GPU are used and have the same runtime. 
+ */ + @Test + fun testFlowDistributor15() { + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 1000.0, 1), + ), + ), + ) + + val topology = createTopology("Gpus/single_gpu_no_vendor_no_memory.json") + + val monitor = runTest(topology, workload) + + assertAll( + // CPU + // task + { assertEquals(1000.0, monitor.taskCpuDemands["0"]?.get(0)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuDemands["0"]?.get(9)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(1000.0, monitor.taskCpuSupplied["0"]?.get(0)) { "The cpu used by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuSupplied["0"]?.get(9)) { "The cpu used by task 0 is incorrect" } }, + // host + { assertEquals(1000.0, monitor.hostCpuDemands["H01"]?.get(1)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostCpuDemands["H01"]?.get(10)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(1000.0, monitor.hostCpuSupplied["H01"]?.get(1)) { "The cpu used by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostCpuSupplied["H01"]?.get(10)) { "The cpu used by the host is incorrect" } }, + // GPU + // task + { assertEquals(1000.0, monitor.taskGpuDemands["0"]?.get(0)?.get(0)) { "The gpu demanded by task 0 is incorrect" } }, + { assert(monitor.taskGpuDemands["0"]?.get(9)?.isEmpty() ?: false) { "The gpu demanded by task 0 is incorrect" } }, + { assertEquals(1000.0, monitor.taskGpuSupplied["0"]?.get(0)?.get(0)) { "The gpu used by task 0 is incorrect" } }, + { assert(monitor.taskGpuSupplied["0"]?.get(9)?.isEmpty() ?: false) { "The gpu used by task 0 is incorrect" } }, + // host + { assertEquals(1000.0, monitor.hostGpuDemands["H01"]?.get(1)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(1000.0, monitor.hostGpuSupplied["H01"]?.get(1)?.get(0)) { "The gpu used by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostGpuDemands["H01"]?.get(10)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostGpuSupplied["H01"]?.get(10)?.get(0)) { "The gpu used by the host is incorrect" } }, + ) + } + + /** + * FlowDistributor test 16: One Task running on CPU & GPU + * + * In this test, a single task is scheduled that takes 10 minutes to run. CPU & GPU are used. CPU will finish way ahead of the GPU. 
+ */ + @Test + fun testFlowDistributor16() { + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 2000.0, 1), + ), + ), + ) + + val topology = createTopology("Gpus/single_gpu_no_vendor_no_memory.json") + + val monitor = runTest(topology, workload) + + assertAll( + // CPU + // task + { assertEquals(1000.0, monitor.taskCpuDemands["0"]?.get(0)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuDemands["0"]?.get(9)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(1000.0, monitor.taskCpuSupplied["0"]?.get(0)) { "The cpu used by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuSupplied["0"]?.get(9)) { "The cpu used by task 0 is incorrect" } }, + // host + { assertEquals(1000.0, monitor.hostCpuDemands["H01"]?.get(1)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostCpuDemands["H01"]?.get(10)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(1000.0, monitor.hostCpuSupplied["H01"]?.get(1)) { "The cpu used by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostCpuSupplied["H01"]?.get(10)) { "The cpu used by the host is incorrect" } }, + // GPU + // task + { assertEquals(2000.0, monitor.taskGpuDemands["0"]?.get(0)?.get(0)) { "The gpu demanded by task 0 is incorrect" } }, + { assert(monitor.taskGpuDemands["0"]?.get(9)?.isEmpty() ?: false) { "The gpu demanded by task 0 is incorrect" } }, + { assertEquals(2000.0, monitor.taskGpuSupplied["0"]?.get(0)?.get(0)) { "The gpu used by task 0 is incorrect" } }, + { assert(monitor.taskGpuSupplied["0"]?.get(9)?.isEmpty() ?: false) { "The gpu used by task 0 is incorrect" } }, + // host + { assertEquals(2000.0, monitor.hostGpuDemands["H01"]?.get(1)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostGpuDemands["H01"]?.get(10)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(2000.0, monitor.hostGpuSupplied["H01"]?.get(1)?.get(0)) { "The gpu used by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostGpuSupplied["H01"]?.get(10)?.get(0)) { "The gpu used by the host is incorrect" } }, + ) + } + + /** + * FlowDistributor test 17: One Task running on CPU & GPU + * + * In this test, a single task is scheduled that takes 10 minutes to run. CPU & GPU are used. GPU will finish way ahead of the CPU. 
+ */ + @Test + fun testFlowDistributor17() { + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 2000.0, 1, 1000.0, 1), + ), + ), + ) + val topology = createTopology("Gpus/single_gpu_no_vendor_no_memory.json") + val monitor = runTest(topology, workload) + + assertAll( + // CPU + // task + { assertEquals(2000.0, monitor.taskCpuDemands["0"]?.get(0)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuDemands["0"]?.get(9)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(2000.0, monitor.taskCpuSupplied["0"]?.get(0)) { "The cpu used by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuSupplied["0"]?.get(9)) { "The cpu used by task 0 is incorrect" } }, + // host + { assertEquals(2000.0, monitor.hostCpuDemands["H01"]?.get(1)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostCpuDemands["H01"]?.get(10)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(2000.0, monitor.hostCpuSupplied["H01"]?.get(1)) { "The cpu used by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostCpuSupplied["H01"]?.get(10)) { "The cpu used by the host is incorrect" } }, + // GPU + // task + { assertEquals(1000.0, monitor.taskGpuDemands["0"]?.get(1)?.get(0)) { "The gpu demanded by task 0 is incorrect" } }, + { assert(monitor.taskGpuDemands["0"]?.get(9)?.isEmpty() ?: false) { "The gpu demanded by task 0 is incorrect" } }, + { assertEquals(1000.0, monitor.taskGpuSupplied["0"]?.get(1)?.get(0)) { "The gpu used by task 0 is incorrect" } }, + { assert(monitor.taskGpuSupplied["0"]?.get(9)?.isEmpty() ?: false) { "The gpu used by task 0 is incorrect" } }, + // host + { assertEquals(1000.0, monitor.hostGpuDemands["H01"]?.get(1)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostGpuDemands["H01"]?.get(10)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(1000.0, monitor.hostGpuSupplied["H01"]?.get(1)?.get(0)) { "The gpu used by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostGpuSupplied["H01"]?.get(10)?.get(0)) { "The gpu used by the host is incorrect" } }, + ) + } + + /** + * FlowDistributor test 18: Two tasks running on CPU & GPU + * + * In this test, two tasks are scheduled at the same time that takes 10 minutes to run. + * Only one can be scheduled due to resource constraints. + * CPU & GPU are used. Both resources will finish at the same time. 
+ */ + @Test + fun testFlowDistributor18() { + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 1000.0, 1), + ), + ), + createTestTask( + name = "1", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 1000.0, 1), + ), + ), + ) + + val topology = createTopology("Gpus/single_gpu_no_vendor_no_memory.json") + val monitor = runTest(topology, workload) + assertAll( + // CPU + // task 0 + { assertEquals(1000.0, monitor.taskCpuDemands["0"]?.get(0)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuDemands["0"]?.get(9)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(1000.0, monitor.taskCpuSupplied["0"]?.get(0)) { "The cpu used by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuSupplied["0"]?.get(9)) { "The cpu used by task 0 is incorrect" } }, + // task 1 + { assertEquals(0.0, monitor.taskCpuDemands["1"]?.get(1)) { "The cpu demanded by task 1 is incorrect" } }, + { assertEquals(1000.0, monitor.taskCpuDemands["1"]?.get(10)) { "The cpu demanded by task 1 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuDemands["1"]?.get(19)) { "The cpu demanded by task 1 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuSupplied["1"]?.get(1)) { "The cpu used by task 1 is incorrect" } }, + { assertEquals(1000.0, monitor.taskCpuSupplied["1"]?.get(10)) { "The cpu used by task 1 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuSupplied["1"]?.get(19)) { "The cpu used by task 1 is incorrect" } }, + // host + { assertEquals(1000.0, monitor.hostCpuDemands["H01"]?.get(1)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(1000.0, monitor.hostCpuDemands["H01"]?.get(10)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(1000.0, monitor.hostCpuSupplied["H01"]?.get(1)) { "The cpu used by the host is incorrect" } }, + { assertEquals(1000.0, monitor.hostCpuSupplied["H01"]?.get(10)) { "The cpu used by the host is incorrect" } }, + // GPU + // task 0 + { assertEquals(1000.0, monitor.taskGpuDemands["0"]?.get(0)?.get(0)) { "The gpu demanded by task 0 is incorrect" } }, + { assert(monitor.taskGpuDemands["0"]?.get(9)?.isEmpty() ?: false) { "The gpu demanded by task 0 is incorrect" } }, + { assertEquals(1000.0, monitor.taskGpuSupplied["0"]?.get(0)?.get(0)) { "The gpu used by task 0 is incorrect" } }, + { assert(monitor.taskGpuSupplied["0"]?.get(9)?.isEmpty() ?: false) { "The gpu used by task 0 is incorrect" } }, + // task 1 + { assert(monitor.taskGpuDemands["1"]?.get(0)?.isEmpty() ?: false) { "The gpu demanded by task 1 is incorrect" } }, + { assertEquals(1000.0, monitor.taskGpuDemands["1"]?.get(10)?.get(0)) { "The gpu demanded by task 1 is incorrect" } }, + { assert(monitor.taskGpuDemands["1"]?.get(19)?.isEmpty() ?: false) { "The gpu demanded by task 1 is incorrect" } }, + { assert(monitor.taskGpuSupplied["1"]?.get(0)?.isEmpty() ?: false) { "The gpu used by task 1 is incorrect" } }, + { assertEquals(1000.0, monitor.taskGpuSupplied["1"]?.get(10)?.get(0)) { "The gpu used by task 1 is incorrect" } }, + { assert(monitor.taskGpuSupplied["1"]?.get(19)?.isEmpty() ?: false) { "The gpu used by task 1 is incorrect" } }, + // host + { assertEquals(1000.0, monitor.hostGpuDemands["H01"]?.get(1)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(1000.0, monitor.hostGpuDemands["H01"]?.get(10)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(1000.0, 
monitor.hostGpuSupplied["H01"]?.get(1)?.get(0)) { "The gpu used by the host is incorrect" } }, + { assertEquals(1000.0, monitor.hostGpuSupplied["H01"]?.get(10)?.get(0)) { "The gpu used by the host is incorrect" } }, + ) + } + + /** + * FlowDistributor test 19: Two tasks running on CPU & GPU + * + * In this test, two tasks are scheduled at the same time that takes 10 minutes to run. One task purely uses CPU, one purely GPU. + */ + @Test + fun testFlowDistributor19() { + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 0.0, 0), + ), + ), + createTestTask( + name = "1", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 0.0, 0, 1000.0, 1), + ), + ), + ) + + val topology = createTopology("Gpus/single_gpu_no_vendor_no_memory.json") + val monitor = runTest(topology, workload) + + assertAll( + // CPU + // task 0 + { assertEquals(1000.0, monitor.taskCpuDemands["0"]?.get(0)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuDemands["0"]?.get(9)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(1000.0, monitor.taskCpuSupplied["0"]?.get(0)) { "The cpu used by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuSupplied["0"]?.get(9)) { "The cpu used by task 0 is incorrect" } }, + // task 1 + { assertEquals(0.0, monitor.taskCpuDemands["1"]?.get(0)) { "The cpu demanded by task 1 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuDemands["1"]?.get(9)) { "The cpu demanded by task 1 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuSupplied["1"]?.get(0)) { "The cpu used by task 1 is incorrect" } }, + { assertEquals(0.0, monitor.taskCpuSupplied["1"]?.get(9)) { "The cpu used by task 1 is incorrect" } }, + // host + { assertEquals(1000.0, monitor.hostCpuDemands["H01"]?.get(1)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostCpuDemands["H01"]?.get(10)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(1000.0, monitor.hostCpuSupplied["H01"]?.get(1)) { "The cpu used by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostCpuSupplied["H01"]?.get(10)) { "The cpu used by the host is incorrect" } }, + // GPU + // task 0 + { assertEquals(0.0, monitor.taskGpuDemands["0"]?.get(0)?.get(0)) { "The gpu demanded by task 0 is incorrect" } }, + { assert(monitor.taskGpuDemands["0"]?.get(9)?.isEmpty() ?: false) { "The gpu demanded by task 0 is incorrect" } }, + { assertEquals(0.0, monitor.taskGpuSupplied["0"]?.get(0)?.get(0)) { "The gpu used by task 0 is incorrect" } }, + { assert(monitor.taskGpuSupplied["0"]?.get(9)?.isEmpty() ?: false) { "The gpu used by task 0 is incorrect" } }, + // task 1 + { assertEquals(1000.0, monitor.taskGpuDemands["1"]?.get(0)?.get(0)) { "The gpu demanded by task 1 is incorrect" } }, + { assert(monitor.taskGpuDemands["1"]?.get(9)?.isEmpty() ?: false) { "The gpu demanded by task 1 is incorrect" } }, + { assertEquals(1000.0, monitor.taskGpuSupplied["1"]?.get(0)?.get(0)) { "The gpu used by task 1 is incorrect" } }, + { assert(monitor.taskGpuSupplied["1"]?.get(9)?.isEmpty() ?: false) { "The gpu used by task 1 is incorrect" } }, + // host + { assertEquals(1000.0, monitor.hostGpuDemands["H01"]?.get(1)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostGpuDemands["H01"]?.get(10)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(1000.0, monitor.hostGpuSupplied["H01"]?.get(1)?.get(0)) { "The 
gpu used by the host is incorrect" } }, + { assertEquals(0.0, monitor.hostGpuSupplied["H01"]?.get(10)?.get(0)) { "The gpu used by the host is incorrect" } }, + ) + } } diff --git a/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/GpuTest.kt b/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/GpuTest.kt new file mode 100644 index 00000000..6e5a6b5e --- /dev/null +++ b/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/GpuTest.kt @@ -0,0 +1,296 @@ +/* + * Copyright (c) 2020 AtLarge Research + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +package org.opendc.experiments.base + +import org.junit.jupiter.api.Assertions.assertEquals +import org.junit.jupiter.api.Test +import org.junit.jupiter.api.assertAll +import org.opendc.compute.topology.specs.ClusterSpec +import org.opendc.compute.workload.Task +import org.opendc.simulator.compute.workload.trace.TraceFragment +import java.util.ArrayList + +/** + * Testing suite containing tests that specifically test the FlowDistributor + */ +class GpuTest { + /** + * Test the creation of a GPU host with a single GPU, in minimal configuration + */ + @Test + fun testGpuHostCreationSingleMinimal() { + val topology = createTopology("Gpus/single_gpu_no_vendor_no_memory.json") + assertGpuConfiguration( + topology, + coreCount = 1, + coreSpeed = 2000.0, + memorySize = -1L, + memoryBandwidth = -1.0, + vendor = "unknown", + modelName = "unknown", + architecture = "unknown", + gpuCount = 1, + ) + } + + /** + * Test the creation of a GPU host with a single GPU with memory but no vendor + */ + @Test + fun testGpuHostCreationSingleWithMemoryNoVendor() { + val topology = createTopology("Gpus/single_gpu_no_vendor.json") + assertGpuConfiguration( + topology, + coreCount = 1, + coreSpeed = 2000.0, + memorySize = 4096L, + memoryBandwidth = 500.0, + vendor = "unknown", + modelName = "unknown", + architecture = "unknown", + gpuCount = 1, + ) + } + + /** + * Test the creation of a GPU host with a single GPU with no memory but with vendor + */ + @Test + fun testGpuHostCreationSingleNoMemoryWithVendor() { + val topology = createTopology("Gpus/single_gpu_no_memory.json") + assertGpuConfiguration( + topology, + coreCount = 1, + coreSpeed = 2000.0, + memorySize = -1L, + memoryBandwidth = -1.0, + vendor = "NVIDIA", + modelName = "Tesla V100", + architecture = "Volta", + gpuCount = 1, + ) + } + + /** + * Test the creation of a GPU host with a 
single GPU, in full configuration + */ + @Test + fun testGpuHostCreationSingleWithMemoryWithVendor() { + val topology = createTopology("Gpus/single_gpu_full.json") + assertGpuConfiguration( + topology, + // cuda cores + coreCount = 5120, +// coreCount = 640, // tensor cores + // fictional value + coreSpeed = 5000.0, + memorySize = 30517578125, + memoryBandwidth = 7031250000.0, + vendor = "NVIDIA", + modelName = "Tesla V100", + architecture = "Volta", + gpuCount = 1, + ) + } + + /** + * Test the creation of a GPU host with multiple GPU, in minimal configuration + */ + @Test + fun testGpuHostCreationMultiMinimal() { + val topology = createTopology("Gpus/multi_gpu_no_vendor_no_memory.json") + val count = 3 + assertGpuConfiguration( + topology, + coreCount = 1 * count, + coreSpeed = 2000.0, + memorySize = -1L * count, + memoryBandwidth = -1.0, + vendor = "unknown", + modelName = "unknown", + architecture = "unknown", + gpuCount = 1, + ) + } + + /** + * Test the creation of a GPU host with multiple GPU with memory but no vendor + */ + @Test + fun testGpuHostCreationMultiWithMemoryNoVendor() { + val topology = createTopology("Gpus/multi_gpu_no_vendor.json") + val count = 100 + + assertGpuConfiguration( + topology, + coreCount = 1 * count, + coreSpeed = 2000.0, + memorySize = 4096L * count, + memoryBandwidth = 500.0, + vendor = "unknown", + modelName = "unknown", + architecture = "unknown", + gpuCount = 1, + ) + } + + /** + * Test the creation of a GPU host with multiple GPU with no memory but with vendor + */ + @Test + fun testGpuHostCreationMultiNoMemoryWithVendor() { + val topology = createTopology("Gpus/multi_gpu_no_memory.json") + val count = 2 + assertGpuConfiguration( + topology, + coreCount = 1 * count, + coreSpeed = 2000.0, + memorySize = -1L * count, + memoryBandwidth = -1.0, + vendor = "NVIDIA", + modelName = "Tesla V100", + architecture = "Volta", + gpuCount = 1, + ) + } + + /** + * Test the creation of a GPU host with multiple GPU, in full configuration + */ + @Test + fun testGpuHostCreationMultiWithMemoryWithVendor() { + val topology = createTopology("Gpus/multi_gpu_full.json") + val count = 5 + assertGpuConfiguration( + topology, + // cuda cores + coreCount = 5120 * count, + // fictional value + coreSpeed = 5000.0, + memorySize = 30517578125 * count, + memoryBandwidth = 7031250000.0, + vendor = "NVIDIA", + modelName = "Tesla V100", + architecture = "Volta", + gpuCount = 1, + ) + } + + /** + * This test checks if the FlowDistributor can handle a workload that requires multiple GPUs. + * This test assumes that multiple GPUs are concatenated into on single larger GPU. 
+ */ + @Test + fun testMultiGpuConcation() { + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 2000.0, 1), + ), + ), + createTestTask( + name = "1", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 2000.0, 1), + ), + ), + ) + val topology = createTopology("Gpus/multi_gpu_host.json") + + val monitor = runTest(topology, workload) + + assertAll( + { assertEquals(10 * 60 * 1000, monitor.maxTimestamp) { "The expected runtime is exceeded" } }, + // CPU + // task 0 + { assertEquals(1000.0, monitor.taskCpuDemands["0"]?.get(1)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(1000.0, monitor.taskCpuDemands["0"]?.get(8)) { "The cpu demanded by task 0 is incorrect" } }, + { assertEquals(1000.0, monitor.taskCpuSupplied["0"]?.get(1)) { "The cpu used by task 0 is incorrect" } }, + { assertEquals(1000.0, monitor.taskCpuSupplied["0"]?.get(8)) { "The cpu used by task 0 is incorrect" } }, + // task 1 + { assertEquals(1000.0, monitor.taskCpuDemands["1"]?.get(1)) { "The cpu demanded by task 1 is incorrect" } }, + { assertEquals(1000.0, monitor.taskCpuDemands["1"]?.get(8)) { "The cpu demanded by task 1 is incorrect" } }, + { assertEquals(1000.0, monitor.taskCpuSupplied["1"]?.get(1)) { "The cpu used by task 1 is incorrect" } }, + { assertEquals(1000.0, monitor.taskCpuSupplied["1"]?.get(8)) { "The cpu used by task 1 is incorrect" } }, + // host + { assertEquals(2000.0, monitor.hostCpuDemands["DualGpuHost"]?.get(1)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(2000.0, monitor.hostCpuDemands["DualGpuHost"]?.get(9)) { "The cpu demanded by the host is incorrect" } }, + { assertEquals(2000.0, monitor.hostCpuSupplied["DualGpuHost"]?.get(1)) { "The cpu used by the host is incorrect" } }, + { assertEquals(2000.0, monitor.hostCpuSupplied["DualGpuHost"]?.get(9)) { "The cpu used by the host is incorrect" } }, + // GPU + // task 0 + { assertEquals(2000.0, monitor.taskGpuDemands["0"]?.get(1)?.get(0)) { "The gpu demanded by task 0 is incorrect" } }, + { assertEquals(2000.0, monitor.taskGpuDemands["0"]?.get(8)?.get(0)) { "The gpu demanded by task 0 is incorrect" } }, + { assertEquals(2000.0, monitor.taskGpuSupplied["0"]?.get(1)?.get(0)) { "The gpu used by task 0 is incorrect" } }, + { assertEquals(2000.0, monitor.taskGpuSupplied["0"]?.get(8)?.get(0)) { "The gpu used by task 0 is incorrect" } }, + // task 1 + { assertEquals(2000.0, monitor.taskGpuDemands["1"]?.get(1)?.get(0)) { "The gpu demanded by task 1 is incorrect" } }, + { assertEquals(2000.0, monitor.taskGpuDemands["1"]?.get(8)?.get(0)) { "The gpu demanded by task 1 is incorrect" } }, + { assertEquals(2000.0, monitor.taskGpuSupplied["1"]?.get(1)?.get(0)) { "The gpu used by task 1 is incorrect" } }, + { assertEquals(2000.0, monitor.taskGpuSupplied["1"]?.get(8)?.get(0)) { "The gpu used by task 1 is incorrect" } }, + // host + { assertEquals(4000.0, monitor.hostGpuDemands["DualGpuHost"]?.get(1)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(4000.0, monitor.hostGpuDemands["DualGpuHost"]?.get(9)?.get(0)) { "The gpu demanded by the host is incorrect" } }, + { assertEquals(4000.0, monitor.hostGpuSupplied["DualGpuHost"]?.get(1)?.get(0)) { "The gpu used by the host is incorrect" } }, + { assertEquals(4000.0, monitor.hostGpuSupplied["DualGpuHost"]?.get(9)?.get(0)) { "The gpu used by the host is incorrect" } }, + ) + } + + private fun assertGpuConfiguration( + topology: List<ClusterSpec>, + 
coreCount: Int, + coreSpeed: Double, + memorySize: Long, + memoryBandwidth: Double, + vendor: String, + modelName: String, + architecture: String, + gpuCount: Int, + ) { + for (cluster in topology) { + for (host in cluster.hostSpecs) { + assert(host.model.gpuModels.size == gpuCount) { "GPU count should be $gpuCount, but is ${host.model.gpuModels.size}" } + + for (gpuModel in host.model.gpuModels) { + assert(gpuModel.coreCount == coreCount) { "GPU Core count should be $coreCount, but is ${gpuModel.coreCount}" } + assert(gpuModel.coreSpeed == coreSpeed) { "GPU core speed should be $coreSpeed, but is ${gpuModel.coreSpeed}" } + assert(gpuModel.memorySize == memorySize) { "GPU memory size should be $memorySize, but is ${gpuModel.memorySize}" } + assert(gpuModel.memoryBandwidth == memoryBandwidth) { + "GPU memory bandwidth should be $memoryBandwidth, but is ${gpuModel.memoryBandwidth}" + } + assert(gpuModel.vendor.contentEquals(vendor)) { "GPU vendor should be $vendor, but is ${gpuModel.vendor}" } + assert( + gpuModel.modelName.contentEquals(modelName), + ) { "GPU model name should be $modelName, but is ${gpuModel.modelName}" } + assert( + gpuModel.architecture.contentEquals(architecture), + ) { "GPU architecture should be $architecture, but is ${gpuModel.architecture}" } + } + } + } + } +} diff --git a/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/SchedulerTest.kt b/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/SchedulerTest.kt index f9a20c68..8f71b7e7 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/SchedulerTest.kt +++ b/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/SchedulerTest.kt @@ -25,10 +25,14 @@ package org.opendc.experiments.base import org.junit.jupiter.api.Assertions.assertEquals import org.junit.jupiter.api.Test import org.junit.jupiter.api.assertAll +import org.opendc.compute.simulator.scheduler.FilterScheduler import org.opendc.compute.simulator.scheduler.MemorizingScheduler import org.opendc.compute.simulator.scheduler.filters.ComputeFilter import org.opendc.compute.simulator.scheduler.filters.RamFilter import org.opendc.compute.simulator.scheduler.filters.VCpuFilter +import org.opendc.compute.simulator.scheduler.filters.VGpuFilter +import org.opendc.compute.simulator.scheduler.weights.VCpuWeigher +import org.opendc.compute.simulator.scheduler.weights.VGpuWeigher import org.opendc.compute.workload.Task import org.opendc.simulator.compute.workload.trace.TraceFragment import java.util.ArrayList @@ -65,8 +69,8 @@ class SchedulerTest { assertAll( { assertEquals(25 * 60 * 1000, monitor.maxTimestamp) { "Total runtime incorrect" } }, - { assertEquals(((10 * 30000) + (10 * 60000)).toLong(), monitor.hostIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, - { assertEquals(((10 * 30000) + (5 * 60000)).toLong(), monitor.hostActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, + { assertEquals(((10 * 30000) + (10 * 60000)).toLong(), monitor.hostCpuIdleTimes["H01"]?.sum()) { "Idle time incorrect" } }, + { assertEquals(((10 * 30000) + (5 * 60000)).toLong(), monitor.hostCpuActiveTimes["H01"]?.sum()) { "Active time incorrect" } }, { assertEquals(9000.0, monitor.hostEnergyUsages["H01"]?.get(0)) { "Incorrect energy usage" } }, { assertEquals( @@ -76,4 +80,109 @@ class SchedulerTest { }, ) } + + /** + * This test verifies that the gpu only schedulers are working correctly. 
+ * The same workload is run 4 times, once with the normal gpu filter scheduler and once with the inverted gpu filter scheduler. + * Each scheduler is then run with a hardware configuration where the tasks fit onto one host, and one where multiple hosts are needed. + */ + @Test + fun testGpuAwareSchedulers() { + // Define workload with tasks requiring both CPU and GPU resources + val workload: ArrayList<Task> = + arrayListOf( + createTestTask( + name = "0", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 2000.0, 1), + ), + ), + createTestTask( + name = "1", + fragments = + arrayListOf( + TraceFragment(10 * 60 * 1000, 1000.0, 1, 2000.0, 1), + ), + submissionTime = "1970-01-01T00:20", + ), + ) + + // Topology with 1 host having 2 GPUs (both tasks can fit on one host) + val fittingTopology = createTopology("Gpus/dual_gpu_host.json") + + // Topology with 2 hosts each having 1 GPU (tasks must be distributed) + val nonFittingTopology = createTopology("Gpus/single_gpu_hosts.json") + + val cpuAllocationRatio = 1.0 + val ramAllocationRatio = 1.5 + val gpuAllocationRatio = 1.0 + + // Normal scheduler prioritizes hosts with more available resources + val normalScheduler = + FilterScheduler( + filters = + listOf( + ComputeFilter(), + VCpuFilter(cpuAllocationRatio), + VGpuFilter(gpuAllocationRatio), + RamFilter(ramAllocationRatio), + ), + weighers = listOf(VCpuWeigher(cpuAllocationRatio, multiplier = 1.0), VGpuWeigher(gpuAllocationRatio, multiplier = 1.0)), + ) + + // Inverted scheduler prioritizes hosts with fewer available resources + val invertedScheduler = + FilterScheduler( + filters = + listOf( + ComputeFilter(), + VCpuFilter(cpuAllocationRatio), + VGpuFilter(gpuAllocationRatio), + RamFilter(ramAllocationRatio), + ), + weighers = listOf(VCpuWeigher(cpuAllocationRatio, multiplier = -1.0), VGpuWeigher(gpuAllocationRatio, multiplier = -1.0)), + ) + + // Run the tests with both schedulers and both topologies + val normalFittingMonitor = runTest(fittingTopology, workload, computeScheduler = normalScheduler) + val normalNonFittingMonitor = runTest(nonFittingTopology, workload, computeScheduler = normalScheduler) + val invertedFittingMonitor = runTest(fittingTopology, workload, computeScheduler = invertedScheduler) + val invertedNonFittingMonitor = runTest(nonFittingTopology, workload, computeScheduler = invertedScheduler) + + assertAll( + // Normal scheduler with fitting topology should use just one host + { + assertEquals( + 1, + normalFittingMonitor.hostCpuSupplied.size, + ) { "Normal scheduler should place both tasks on a single host when possible" } + }, + // Normal scheduler with non-fitting topology must use two hosts + { + assertEquals( + 2, + normalNonFittingMonitor.hostCpuSupplied.size, + ) { "Normal scheduler should distribute tasks across hosts when needed" } + }, + // Inverted scheduler with fitting topology might still use one host or distribute depending on implementation + { + assert( + invertedFittingMonitor.hostCpuSupplied.isNotEmpty(), + ) { "Inverted scheduler should place tasks based on resource availability" } + }, + // Inverted scheduler with non-fitting topology must use two hosts + { + assertEquals( + 2, + invertedNonFittingMonitor.hostCpuSupplied.size, + ) { "Inverted scheduler should distribute tasks across hosts when needed" } + }, + // Verify GPU allocations - check that both tasks had their GPUs allocated + { assertEquals(2, normalFittingMonitor.taskGpuSupplied.size) { "Both tasks should have GPU allocations" } }, + { assertEquals(2, 
normalNonFittingMonitor.taskGpuSupplied.size) { "Both tasks should have GPU allocations" } }, + { assertEquals(2, invertedFittingMonitor.taskGpuSupplied.size) { "Both tasks should have GPU allocations" } }, + { assertEquals(2, invertedNonFittingMonitor.taskGpuSupplied.size) { "Both tasks should have GPU allocations" } }, + ) + } } diff --git a/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/TestingUtils.kt b/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/TestingUtils.kt index eadb74e7..59b8d070 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/TestingUtils.kt +++ b/opendc-experiments/opendc-experiments-base/src/test/kotlin/org/opendc/experiments/base/TestingUtils.kt @@ -22,6 +22,7 @@ package org.opendc.experiments.base +import org.opendc.common.ResourceType import org.opendc.compute.simulator.provisioner.Provisioner import org.opendc.compute.simulator.provisioner.registerComputeMonitor import org.opendc.compute.simulator.provisioner.setupComputeService @@ -53,6 +54,7 @@ import java.time.LocalDateTime import java.time.ZoneOffset import java.util.UUID import kotlin.collections.ArrayList +import kotlin.compareTo /** * Obtain the topology factory for the test. @@ -73,12 +75,23 @@ fun createTestTask( checkpointIntervalScaling: Double = 1.0, scalingPolicy: ScalingPolicy = NoDelayScaling(), ): Task { + var usedResources = arrayOf<ResourceType>() + if (fragments.any { it.cpuUsage > 0.0 }) { + usedResources += ResourceType.CPU + } + if (fragments.any { it.gpuUsage > 0.0 }) { + usedResources += ResourceType.GPU + } + return Task( UUID.nameUUIDFromBytes(name.toByteArray()), name, - fragments.maxOf { it.coreCount }, + fragments.maxOf { it.cpuCoreCount() }, fragments.maxOf { it.cpuUsage }, memCapacity, + gpuCount = fragments.maxOfOrNull { it.gpuCoreCount() } ?: 0, + gpuCapacity = fragments.maxOfOrNull { it.gpuUsage } ?: 0.0, + gpuMemCapacity = fragments.maxOfOrNull { it.gpuMemoryUsage } ?: 0L, 1800000.0, LocalDateTime.parse(submissionTime).toInstant(ZoneOffset.UTC).toEpochMilli(), duration, @@ -91,6 +104,7 @@ fun createTestTask( checkpointIntervalScaling, scalingPolicy, name, + usedResources, ), ) } @@ -134,6 +148,8 @@ fun runTest( class TestComputeMonitor : ComputeMonitor { var taskCpuDemands = mutableMapOf<String, ArrayList<Double>>() var taskCpuSupplied = mutableMapOf<String, ArrayList<Double>>() + var taskGpuDemands = mutableMapOf<String, ArrayList<DoubleArray?>>() + var taskGpuSupplied = mutableMapOf<String, ArrayList<DoubleArray?>>() override fun record(reader: TaskTableReader) { val taskName: String = reader.taskInfo.name @@ -145,6 +161,13 @@ class TestComputeMonitor : ComputeMonitor { taskCpuDemands[taskName] = arrayListOf(reader.cpuDemand) taskCpuSupplied[taskName] = arrayListOf(reader.cpuUsage) } + if (taskName in taskGpuDemands) { + taskGpuDemands[taskName]?.add(reader.gpuDemands) + taskGpuSupplied[taskName]?.add(reader.gpuUsages) + } else { + taskGpuDemands[taskName] = arrayListOf(reader.gpuDemands) + taskGpuSupplied[taskName] = arrayListOf(reader.gpuUsages) + } } var attemptsSuccess = 0 @@ -174,13 +197,20 @@ class TestComputeMonitor : ComputeMonitor { maxTimestamp = reader.timestamp.toEpochMilli() } - var hostIdleTimes = mutableMapOf<String, ArrayList<Long>>() - var hostActiveTimes = mutableMapOf<String, ArrayList<Long>>() - var hostStealTimes = mutableMapOf<String, ArrayList<Long>>() - var hostLostTimes = mutableMapOf<String, ArrayList<Long>>() - var 
hostCpuDemands = mutableMapOf<String, ArrayList<Double>>() var hostCpuSupplied = mutableMapOf<String, ArrayList<Double>>() + var hostCpuIdleTimes = mutableMapOf<String, ArrayList<Long>>() + var hostCpuActiveTimes = mutableMapOf<String, ArrayList<Long>>() + var hostCpuStealTimes = mutableMapOf<String, ArrayList<Long>>() + var hostCpuLostTimes = mutableMapOf<String, ArrayList<Long>>() + + var hostGpuDemands = mutableMapOf<String, ArrayList<ArrayList<Double>>>() + var hostGpuSupplied = mutableMapOf<String, ArrayList<ArrayList<Double>>>() + var hostGpuIdleTimes = mutableMapOf<String, ArrayList<ArrayList<Long>>>() + var hostGpuActiveTimes = mutableMapOf<String, ArrayList<ArrayList<Long>>>() + var hostGpuStealTimes = mutableMapOf<String, ArrayList<ArrayList<Long>>>() + var hostGpuLostTimes = mutableMapOf<String, ArrayList<ArrayList<Long>>>() + var hostPowerDraws = mutableMapOf<String, ArrayList<Double>>() var hostEnergyUsages = mutableMapOf<String, ArrayList<Double>>() @@ -188,24 +218,39 @@ class TestComputeMonitor : ComputeMonitor { val hostName: String = reader.hostInfo.name if (!(hostName in hostCpuDemands)) { - hostIdleTimes[hostName] = ArrayList() - hostActiveTimes[hostName] = ArrayList() - hostStealTimes[hostName] = ArrayList() - hostLostTimes[hostName] = ArrayList() + hostCpuIdleTimes[hostName] = ArrayList() + hostCpuActiveTimes[hostName] = ArrayList() + hostCpuStealTimes[hostName] = ArrayList() + hostCpuLostTimes[hostName] = ArrayList() hostCpuDemands[hostName] = ArrayList() hostCpuSupplied[hostName] = ArrayList() hostPowerDraws[hostName] = ArrayList() hostEnergyUsages[hostName] = ArrayList() } - - hostIdleTimes[hostName]?.add(reader.cpuIdleTime) - hostActiveTimes[hostName]?.add(reader.cpuActiveTime) - hostStealTimes[hostName]?.add(reader.cpuStealTime) - hostLostTimes[hostName]?.add(reader.cpuLostTime) + if (hostName !in hostGpuDemands) { + hostGpuDemands[hostName] = ArrayList() + hostGpuSupplied[hostName] = ArrayList() + hostGpuIdleTimes[hostName] = ArrayList() + hostGpuActiveTimes[hostName] = ArrayList() + hostGpuStealTimes[hostName] = ArrayList() + hostGpuLostTimes[hostName] = ArrayList() + } hostCpuDemands[hostName]?.add(reader.cpuDemand) hostCpuSupplied[hostName]?.add(reader.cpuUsage) + hostCpuIdleTimes[hostName]?.add(reader.cpuIdleTime) + hostCpuActiveTimes[hostName]?.add(reader.cpuActiveTime) + hostCpuStealTimes[hostName]?.add(reader.cpuStealTime) + hostCpuLostTimes[hostName]?.add(reader.cpuLostTime) + + hostGpuDemands[hostName]?.add(reader.gpuDemands) + hostGpuSupplied[hostName]?.add(reader.gpuUsages) + hostGpuIdleTimes[hostName]?.add(reader.gpuIdleTimes) + hostGpuActiveTimes[hostName]?.add(reader.gpuActiveTimes) + hostGpuStealTimes[hostName]?.add(reader.gpuStealTimes) + hostGpuLostTimes[hostName]?.add(reader.gpuLostTimes) + hostPowerDraws[hostName]?.add(reader.powerDraw) hostEnergyUsages[hostName]?.add(reader.energyUsage) } diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/dual_gpu_host.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/dual_gpu_host.json new file mode 100644 index 00000000..c5271ff8 --- /dev/null +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/dual_gpu_host.json @@ -0,0 +1,35 @@ +{ + "clusters": [ + { + "name": "C01", + "hosts": [ + { + "name": "DualGpuHost", + "cpu": { + "coreCount": 4, + "coreSpeed": 2000 + }, + "memory": { + "memorySize": 140457600000 + }, + "cpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + 
"maxPower": 200.0 + }, + "gpu": { + "coreCount": 2, + "coreSpeed": 2000 + }, + "gpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + } + } + ] + } + ] +} diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_full.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_full.json new file mode 100644 index 00000000..334100fc --- /dev/null +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_full.json @@ -0,0 +1,39 @@ +{ + "clusters": + [ + { + "name": "C01", + "hosts" : + [ + { + "name": "H01", + "cpu": + { + "coreCount": 1, + "coreSpeed": 2000 + }, + "memory": { + "memorySize": 140457600000 + }, + "cpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + }, + "gpu": { + "count": 5, + "coreCount": 5120, + "coreSpeed": 5000, + "memorySize": 30517578125, + "memoryBandwidth": "900 GBps", + "vendor": "NVIDIA", + "modelName": "Tesla V100", + "architecture": "Volta" + } + } + ] + } + ] +} + diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_host.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_host.json new file mode 100644 index 00000000..719f0ab2 --- /dev/null +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_host.json @@ -0,0 +1,36 @@ +{ + "clusters": [ + { + "name": "C01", + "hosts": [ + { + "name": "DualGpuHost", + "cpu": { + "coreCount": 4, + "coreSpeed": 2000 + }, + "memory": { + "memorySize": 140457600000 + }, + "cpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + }, + "gpu": { + "count": 2, + "coreCount": 1, + "coreSpeed": 2000 + }, + "gpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + } + } + ] + } + ] +} diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_no_memory.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_no_memory.json new file mode 100644 index 00000000..3757e641 --- /dev/null +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_no_memory.json @@ -0,0 +1,36 @@ +{ + "clusters": + [ + { + "name": "C01", + "hosts" : + [ + { + "name": "H01", + "cpu": + { + "coreCount": 1, + "coreSpeed": 2000 + }, + "memory": { + "memorySize": 140457600000 + }, + "cpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + }, + "gpu": { + "count": 2, + "coreCount": 1, + "coreSpeed": 2000, + "vendor": "NVIDIA", + "modelName": "Tesla V100", + "architecture": "Volta" + } + } + ] + } + ] +} diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_no_vendor.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_no_vendor.json new file mode 100644 index 00000000..07aaac7c --- /dev/null +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_no_vendor.json @@ -0,0 +1,36 @@ +{ + "clusters": + [ + { + "name": "C01", + "hosts" : + [ + { + "name": "H01", + "cpu": + { + "coreCount": 1, + "coreSpeed": 2000 + }, + "memory": { + "memorySize": 140457600000 + }, + "cpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + 
"maxPower": 200.0 + }, + "gpu": { + "count": 100, + "coreCount": 1, + "coreSpeed": 2000, + "memorySize": 4096, + "memoryBandwidth": 500 + } + } + ] + } + ] +} + diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_no_vendor_no_memory.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_no_vendor_no_memory.json new file mode 100644 index 00000000..3d036eef --- /dev/null +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/multi_gpu_no_vendor_no_memory.json @@ -0,0 +1,34 @@ +{ + "clusters": + [ + { + "name": "C01", + "hosts" : + [ + { + "name": "H01", + "cpu": + { + "coreCount": 1, + "coreSpeed": 2000 + }, + "memory": { + "memorySize": 140457600000 + }, + "cpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + }, + "gpu": + { + "count": 3, + "coreCount": 1, + "coreSpeed": 2000 + } + } + ] + } + ] +} diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_full.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_full.json new file mode 100644 index 00000000..8e4c3546 --- /dev/null +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_full.json @@ -0,0 +1,44 @@ +{ + "clusters": + [ + { + "name": "C01", + "hosts" : + [ + { + "name": "H01", + "cpu": + { + "coreCount": 1, + "coreSpeed": 2000 + }, + "memory": { + "memorySize": 140457600000 + }, + "cpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + }, + "gpu": + { + "coreCount": 5120, + "coreSpeed": 5000, + "memorySize": 30517578125, + "memoryBandwidth": "900 GBps", + "vendor": "NVIDIA", + "modelName": "Tesla V100", + "architecture": "Volta" + }, + "gpuPowerModel": { + "modelType": "linear", + "power": 800.0, + "idlePower": 300.0, + "maxPower": 600.0 + } + } + ] + } + ] +} diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_hosts.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_hosts.json new file mode 100644 index 00000000..44b83ef7 --- /dev/null +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_hosts.json @@ -0,0 +1,61 @@ +{ + "clusters": [ + { + "name": "C01", + "hosts": [ + { + "name": "SingleGpuHost1", + "cpu": { + "coreCount": 2, + "coreSpeed": 2000 + }, + "memory": { + "memorySize": 140457600000 + }, + "cpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + }, + "gpu": { + "coreCount": 1, + "coreSpeed": 2000 + }, + "gpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + } + }, + { + "name": "SingleGpuHost2", + "cpu": { + "coreCount": 2, + "coreSpeed": 2000 + }, + "memory": { + "memorySize": 140457600000 + }, + "cpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + }, + "gpu": { + "coreCount": 1, + "coreSpeed": 2000 + }, + "gpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + } + } + ] + } + ] +} diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_no_memory.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_no_memory.json new file mode 100644 index 
00000000..85be1e6e --- /dev/null +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_no_memory.json @@ -0,0 +1,36 @@ +{ + "clusters": + [ + { + "name": "C01", + "hosts" : + [ + { + "name": "H01", + "cpu": + { + "coreCount": 1, + "coreSpeed": 2000 + }, + "memory": { + "memorySize": 140457600000 + }, + "cpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + }, + "gpu": + { + "coreCount": 1, + "coreSpeed": 2000, + "vendor": "NVIDIA", + "modelName": "Tesla V100", + "architecture": "Volta" + } + } + ] + } + ] +} diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_no_vendor.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_no_vendor.json new file mode 100644 index 00000000..b54fab75 --- /dev/null +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_no_vendor.json @@ -0,0 +1,35 @@ +{ + "clusters": + [ + { + "name": "C01", + "hosts" : + [ + { + "name": "H01", + "cpu": + { + "coreCount": 1, + "coreSpeed": 2000 + }, + "memory": { + "memorySize": 140457600000 + }, + "cpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + }, + "gpu": + { + "coreCount": 1, + "coreSpeed": 2000, + "memorySize": 4096, + "memoryBandwidth": 500 + } + } + ] + } + ] +} diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_no_vendor_no_memory.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_no_vendor_no_memory.json new file mode 100644 index 00000000..ed01cf46 --- /dev/null +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/Gpus/single_gpu_no_vendor_no_memory.json @@ -0,0 +1,33 @@ +{ + "clusters": + [ + { + "name": "C01", + "hosts" : + [ + { + "name": "H01", + "cpu": + { + "coreCount": 1, + "coreSpeed": 2000 + }, + "memory": { + "memorySize": 140457600000 + }, + "cpuPowerModel": { + "modelType": "linear", + "power": 400.0, + "idlePower": 100.0, + "maxPower": 200.0 + }, + "gpu": + { + "coreCount": 1, + "coreSpeed": 2000 + } + } + ] + } + ] +} diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment1.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment1.json index 8835faeb..ad12a3e5 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment1.json +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment1.json @@ -15,7 +15,7 @@ "memory": { "memorySize": 140457600000 }, - "powerModel": { + "cpuPowerModel": { "modelType": "linear", "power": 400.0, "idlePower": 100.0, diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment2.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment2.json index 8882af09..cbddf7f8 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment2.json +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment2.json @@ -15,7 +15,7 @@ "memory": { "memorySize": 140457600000 }, - "powerModel": { + "cpuPowerModel": { "modelType": "linear", "power": 400.0, "idlePower": 100.0, diff --git 
a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment3.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment3.json index d78626f1..06a2163c 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment3.json +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment3.json @@ -15,7 +15,7 @@ "memory": { "memorySize": 140457600000 }, - "powerModel": { + "cpuPowerModel": { "modelType": "linear", "power": 400.0, "idlePower": 100.0, diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment4.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment4.json index cb0ef4e5..c6e67b6b 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment4.json +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/batteries/experiment4.json @@ -15,7 +15,7 @@ "memory": { "memorySize": 140457600000 }, - "powerModel": { + "cpuPowerModel": { "modelType": "linear", "power": 400.0, "idlePower": 100.0, diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000.json index ac9a3082..36a1efd7 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000.json +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000.json @@ -15,7 +15,7 @@ "memory": { "memorySize": 140457600000 }, - "powerModel": { + "cpuPowerModel": { "modelType": "linear", "power": 400.0, "idlePower": 100.0, diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_BE.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_BE.json index 3a04b275..1eb20867 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_BE.json +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_BE.json @@ -15,7 +15,7 @@ "memory": { "memorySize": 140457600000 }, - "powerModel": { + "cpuPowerModel": { "modelType": "linear", "power": 400.0, "idlePower": 100.0, diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_DE.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_DE.json index 651e8b54..d11ecc2f 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_DE.json +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_DE.json @@ -15,7 +15,7 @@ "memory": { "memorySize": 140457600000 }, - "powerModel": { + "cpuPowerModel": { "modelType": "linear", "power": 400.0, "idlePower": 100.0, diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_FR.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_FR.json index fed097e9..ebec67e5 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_FR.json +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_FR.json @@ -15,7 +15,7 @@ "memory": { "memorySize": 140457600000 }, - "powerModel": { + "cpuPowerModel": { "modelType": 
"linear", "power": 400.0, "idlePower": 100.0, diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_NL.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_NL.json index 05805c88..8f5ba1c6 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_NL.json +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_1_2000_NL.json @@ -15,7 +15,7 @@ "memory": { "memorySize": 140457600000 }, - "powerModel": { + "cpuPowerModel": { "modelType": "linear", "power": 400.0, "idlePower": 100.0, diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_2_2000.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_2_2000.json index 24ab0bcd..e34e0256 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_2_2000.json +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_2_2000.json @@ -15,7 +15,7 @@ "memory": { "memorySize": 140457600000 }, - "powerModel": { + "cpuPowerModel": { "modelType": "linear", "power": 400.0, "idlePower": 100.0, diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_50_big.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_50_big.json index 676d4f3d..47c633c9 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_50_big.json +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_50_big.json @@ -15,7 +15,7 @@ "memory": { "memorySize": 140457600000 }, - "powerModel": { + "cpuPowerModel": { "modelType": "linear", "power": 400.0, "idlePower": 100.0, diff --git a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_50_big_BE.json b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_50_big_BE.json index d2c19861..fe4e4813 100644 --- a/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_50_big_BE.json +++ b/opendc-experiments/opendc-experiments-base/src/test/resources/topologies/single_50_big_BE.json @@ -15,7 +15,7 @@ "memory": { "memorySize": 140457600000 }, - "powerModel": { + "cpuPowerModel": { "modelType": "linear", "power": 400.0, "idlePower": 100.0, |
