| field | value | date |
|---|---|---|
| author | Niels Thiele <noleu66@posteo.net> | 2025-06-22 12:31:21 +0200 |
| committer | GitHub <noreply@github.com> | 2025-06-22 12:31:21 +0200 |
| commit | 0203254b709614fa732c114aa25916f61b8b3275 | |
| tree | 63232140a8e60e16e1668a51eb58954d8609fbdc /opendc-compute/opendc-compute-simulator/src/main/java/org | |
| parent | 8f846655347195bf6f22a4a102aa06f0ab127da1 | |
Implemented Single GPU Support & outline of host-level allocation policies (#342)
* renamed performance counter to distinguish different resource types
* added GPU, modelled similar to CPU
* added GPUs to machine model
* list of GPUs instead of single instance
* renamed memory speed to bandwidth
* enabled parsing of GPU resources
* split powermodel into cpu and GPU powermodel
* added gpu parsing tests
* added idea of host level scheduling
* added tests for multi gpu parsing
* renamed powermodel to cpupowermodel
* clarified naming of cpu and gpu components
* added resource type to flow supplier and edge
* added resourcetype
* added GPU components and resource type to fragments
* added GPU to workload and updated resource usage retrieval
* implemented first version of multi resource
* added name to workload
* renamed performance counters
* removed commented out code
* removed deprecated comments
* included demand and supply into calculations
* resolving rebase mismatches
* moved resource type from flowedge class to common package
* added available resources to machines
* cleaner separation of whether a workload is started on a SimMachine or a VM
* Replaced exception with dedicated enum
* Only looping over resources that are actually used
* using hashmaps to handle resourcetype instead of arrays for readability
* fixed condition
* tracking finished workloads per resource type
* removed resource type from flowedge
* made supply and demand distribution resource specific
* added power model for GPU
* removed unused test setup
* removed deprecated comments
* removed unused parameter
* added ID for GPU
* added GPUs and GPU performance counters (naively)
* implemented capturing of GPU statistics
* added reminders for future implementations
* renamed properties for better identification
* added capturing GPU statistics
* implemented first tests for GPUs
* unified access to performance counters
* added interface for general compute resource handling
* implemented multi resource support in simmachine
* added individual edge to VM per resource
* extended compute resource interface
* implemented multi-resource support in PSU
* implemented generic retrieval of computeresources
* implemented multi-resource support in VM
* made method use more resource specific
* implemented simple GPU tests
* rolled back frequency and demand use
* made naming independent of used resource
* using the workload's resources instead of the VM's to determine available resources
* implemented determination of used resources in workload
* removed logging statements
* implemented reading from workload
* fixed naming for host-level allocation
* fixed next deadline calculation
* fixed forwarding supply
* reduced memory footprint
* made GPU powermodel nullable
* made GPU powermodel configurable in topology
* implemented tests for basic gpu scheduler
* added gpu properties
* implemented weights, filter and simple cpu-gpu scheduler
* spotless apply
* spotless apply pt. 2
* fixed capitalization
* spotless kotlin run
* implemented column export
* todo update
* removed code comments
* Merged PerformanceCounter classes into one & removed interface
* removed GPU specific powermodel
* Rebase master: kept both versions of TopologyFactories
* renamed CpuPowermodel to resource independent Powermodel
Moved it from Cpu package to power package
* implemented default of getResourceType & removed overrides where possible
* split getResourceType into Consumer and Supplier
* added power as resource type
* reduced supply demand from arrayList to single value
* combining GPUs into one large GPU, until full multi-gpu support
* merged distribution policy enum with corresponding factory
* added comment
* post-rebase fixes
* aligned naming
* Added GPU metrics to task output
* Updates power resource type to uppercase.
Standardizes the `ResourceType.Power` enum to `ResourceType.POWER`
for consistency with other resource types and improved readability.
* Removes deprecated test assertions
Removes commented-out assertions in GPU tests.
These assertions are no longer needed and clutter the test code.
* Renames MaxMinFairnessStrategy to Policy
Renames MaxMinFairnessStrategy to MaxMinFairnessPolicy for
clarity and consistency with naming conventions. This change
affects the factory and distributor to use the updated name.
* applies spotless
* nulls GPUs as it is not used
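The commit renames `MaxMinFairnessStrategy` to `MaxMinFairnessPolicy`, and another change reduces supply and demand to a single value per resource. To illustrate what a max-min fair split does, the sketch below implements the textbook water-filling algorithm over plain `double` values. `MaxMinFairnessSketch` and `distribute` are hypothetical names; this is not the OpenDC implementation, whose code is not part of this diff.

```java
import java.util.Arrays;

/** Hypothetical sketch of max-min fair distribution (water-filling); not OpenDC's policy class. */
final class MaxMinFairnessSketch {
    private MaxMinFairnessSketch() {}

    /**
     * Splits {@code supply} over {@code demands}: nobody receives more than it asks for,
     * and the smallest allocation is made as large as possible.
     */
    static double[] distribute(double supply, double[] demands) {
        int n = demands.length;
        double[] allocation = new double[n];

        // Visit consumers in order of increasing demand.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(demands[a], demands[b]));

        double remaining = supply;
        for (int k = 0; k < n; k++) {
            int idx = order[k];
            // Equal share of what is left among the consumers not yet served.
            double fairShare = remaining / (n - k);
            double granted = Math.min(demands[idx], fairShare);
            allocation[idx] = granted;
            remaining -= granted;
        }
        return allocation;
    }
}
```

With a supply of 10 and demands {2, 8, 4}, the smallest demand is met in full and the remaining 8 units are split evenly between the other two, giving {2, 4, 4}. Serving consumers in increasing order of demand lets any share a small consumer does not need flow on to the larger ones.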
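One bullet mentions weights, a filter and a simple CPU-GPU scheduler. The common filter-then-weigh structure can be sketched as follows: drop the hosts that cannot satisfy the request, then pick the best remaining host by a score. All names here (`HostSnapshot`, `ResourceRequest`, `FilterWeighSketch`) are hypothetical, and the best-fit weight on leftover GPU cores is only one possible choice, not necessarily the one OpenDC uses.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

/** Hypothetical snapshot of a host's free capacity (illustrative names, not OpenDC's). */
record HostSnapshot(String name, int freeCpuCores, int freeGpuCores) {}

/** Hypothetical request, analogous to a flavor's CPU and GPU core counts. */
record ResourceRequest(int cpuCores, int gpuCores) {}

final class FilterWeighSketch {
    private FilterWeighSketch() {}

    /** Keeps the hosts that pass every filter, then returns the one with the highest weight. */
    static Optional<HostSnapshot> select(
            List<HostSnapshot> hosts, List<Predicate<HostSnapshot>> filters, ToDoubleFunction<HostSnapshot> weigher) {
        return hosts.stream()
                .filter(h -> filters.stream().allMatch(f -> f.test(h)))
                .max(Comparator.comparingDouble(weigher));
    }

    /** CPU-GPU policy: require enough free cores of both kinds, prefer the tightest GPU fit. */
    static Optional<HostSnapshot> selectCpuGpu(List<HostSnapshot> hosts, ResourceRequest req) {
        List<Predicate<HostSnapshot>> filters = List.of(
                h -> h.freeCpuCores() >= req.cpuCores(),
                h -> h.freeGpuCores() >= req.gpuCores());
        // Fewer leftover GPU cores after placement means a higher weight (best fit).
        ToDoubleFunction<HostSnapshot> weigher = h -> -(h.freeGpuCores() - req.gpuCores());
        return select(hosts, filters, weigher);
    }
}
```

Keeping filters and the weigher as separate values makes it easy to swap in, for example, a worst-fit weigher that spreads GPU load instead of packing it.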
Diffstat (limited to 'opendc-compute/opendc-compute-simulator/src/main/java/org')
7 files changed, 171 insertions, 16 deletions
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/GpuHostModel.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/GpuHostModel.java
new file mode 100644
index 00000000..97aaa820
--- /dev/null
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/GpuHostModel.java
@@ -0,0 +1,33 @@
+/*
+ * Copyright (c) 2022 AtLarge Research
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all
+ * copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+package org.opendc.compute.simulator.host;
+
+/**
+ * A model for a GPU in a host.
+ *
+ * @param gpuCoreCapacity The capacity of the GPU cores hz.
+ * @param gpuCoreCount The number of GPU cores.
+ * @param GpuMemoryCapacity The capacity of the GPU memory in GB.
+ * @param GpuMemorySpeed The speed of the GPU memory in GB/s.
+ */
+public record GpuHostModel(double gpuCoreCapacity, int gpuCoreCount, long GpuMemoryCapacity, double GpuMemorySpeed) {}
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/HostModel.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/HostModel.java
index 1ea73ea6..6464a56c 100644
--- a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/HostModel.java
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/HostModel.java
@@ -22,11 +22,24 @@
 
 package org.opendc.compute.simulator.host;
 
+import java.util.List;
+
 /**
  * Record describing the static machine properties of the host.
  *
- * @param cpuCapacity The total CPU capacity of the host in MHz.
- * @param coreCount The number of logical processing cores available for this host.
+ * @param cpuCapacity    The total CPU capacity of the host in MHz.
+ * @param coreCount      The number of logical processing cores available for this host.
  * @param memoryCapacity The amount of memory available for this host in MB.
  */
-public record HostModel(double cpuCapacity, int coreCount, long memoryCapacity) {}
+public record HostModel(double cpuCapacity, int coreCount, long memoryCapacity, List<GpuHostModel> gpuHostModels) {
+    /**
+     * Create a new host model.
+     *
+     * @param cpuCapacity    The total CPU capacity of the host in MHz.
+     * @param coreCount      The number of logical processing cores available for this host.
+     * @param memoryCapacity The amount of memory available for this host in MB.
+     */
+    public HostModel(double cpuCapacity, int coreCount, long memoryCapacity) {
+        this(cpuCapacity, coreCount, memoryCapacity, null);
+    }
+}
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ComputeService.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ComputeService.java
index 2b4306af..835c7186 100644
--- a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ComputeService.java
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ComputeService.java
@@ -198,7 +198,7 @@ public final class ComputeService implements AutoCloseable, CarbonReceiver {
         HostView hv = hostToView.get(host);
         final ServiceFlavor flavor = task.getFlavor();
         if (hv != null) {
-            hv.provisionedCores -= flavor.getCoreCount();
+            hv.provisionedCpuCores -= flavor.getCpuCoreCount();
             hv.instanceCount--;
             hv.availableMemory += flavor.getMemorySize();
         } else {
@@ -496,7 +496,7 @@ public final class ComputeService implements AutoCloseable, CarbonReceiver {
             if (result.getResultType() == SchedulingResultType.FAILURE) {
                 LOGGER.trace("Task {} selected for scheduling but no capacity available for it at the moment", task);
 
-                if (flavor.getMemorySize() > maxMemory || flavor.getCoreCount() > maxCores) {
+                if (flavor.getMemorySize() > maxMemory || flavor.getCpuCoreCount() > maxCores) {
                     // Remove the incoming image
                     taskQueue.remove(req);
                     tasksPending--;
@@ -531,7 +531,7 @@ public final class ComputeService implements AutoCloseable, CarbonReceiver {
 
             attemptsSuccess++;
             hv.instanceCount++;
-            hv.provisionedCores += flavor.getCoreCount();
+            hv.provisionedCpuCores += flavor.getCpuCoreCount();
             hv.availableMemory -= flavor.getMemorySize();
 
             activeTasks.put(task, host);
@@ -612,12 +612,12 @@ public final class ComputeService implements AutoCloseable, CarbonReceiver {
 
         @NotNull
         public ServiceFlavor newFlavor(
-                @NotNull String name, int cpuCount, long memorySize, @NotNull Map<String, ?> meta) {
+                @NotNull String name, int cpuCount, long memorySize, int gpuCoreCount, @NotNull Map<String, ?> meta) {
             checkOpen();
 
             final ComputeService service = this.service;
             UUID uid = new UUID(service.clock.millis(), service.random.nextLong());
-            ServiceFlavor flavor = new ServiceFlavor(service, uid, name, cpuCount, memorySize, meta);
+            ServiceFlavor flavor = new ServiceFlavor(service, uid, name, cpuCount, memorySize, gpuCoreCount, meta);
 
             // service.flavorById.put(uid, flavor);
             // service.flavors.add(flavor);
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/HostView.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/HostView.java
index 7c548add..c07f58c7 100644
--- a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/HostView.java
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/HostView.java
@@ -31,7 +31,8 @@ public class HostView {
     private final SimHost host;
     int instanceCount;
     long availableMemory;
-    int provisionedCores;
+    int provisionedCpuCores;
+    int provisionedGpuCores;
 
     /**
      * Scheduler bookkeeping
@@ -83,8 +84,12 @@ public class HostView {
     /**
      * Return the provisioned cores on the host.
      */
-    public int getProvisionedCores() {
-        return provisionedCores;
+    public int getProvisionedCpuCores() {
+        return provisionedCpuCores;
+    }
+
+    public int getProvisionedGpuCores() {
+        return provisionedGpuCores;
     }
 
     @Override
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ServiceFlavor.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ServiceFlavor.java
index eddde87e..8a4359b4 100644
--- a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ServiceFlavor.java
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ServiceFlavor.java
@@ -36,22 +36,31 @@ public final class ServiceFlavor implements Flavor {
     private final ComputeService service;
     private final UUID uid;
     private final String name;
-    private final int coreCount;
+    private final int cpuCoreCount;
     private final long memorySize;
+    private final int gpuCoreCount;
     private final Map<String, ?> meta;
 
-    ServiceFlavor(ComputeService service, UUID uid, String name, int coreCount, long memorySize, Map<String, ?> meta) {
+    ServiceFlavor(
+            ComputeService service,
+            UUID uid,
+            String name,
+            int cpuCoreCount,
+            long memorySize,
+            int gpuCoreCount,
+            Map<String, ?> meta) {
         this.service = service;
         this.uid = uid;
         this.name = name;
-        this.coreCount = coreCount;
+        this.cpuCoreCount = cpuCoreCount;
         this.memorySize = memorySize;
+        this.gpuCoreCount = gpuCoreCount;
         this.meta = meta;
     }
 
     @Override
-    public int getCoreCount() {
-        return coreCount;
+    public int getCpuCoreCount() {
+        return cpuCoreCount;
     }
 
     @Override
@@ -59,6 +68,11 @@ public final class ServiceFlavor implements Flavor {
         return memorySize;
     }
 
+    @Override
+    public int getGpuCoreCount() {
+        return gpuCoreCount;
+    }
+
     @NotNull
     @Override
     public UUID getUid() {
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/GuestGpuStats.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/GuestGpuStats.java
new file mode 100644
index 00000000..1aba13e3
--- /dev/null
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/GuestGpuStats.java
@@ -0,0 +1,44 @@
+/*
+ * Copyright (c) 2022 AtLarge Research
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all
+ * copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+package org.opendc.compute.simulator.telemetry;
+
+/**
+ * Statistics about the GPUs of a guest.
+ *
+ * @param activeTime The cumulative time (in seconds) that the GPUs of the guest were actively running.
+ * @param idleTime The cumulative time (in seconds) the GPUs of the guest were idle.
+ * @param stealTime The cumulative GPU time (in seconds) that the guest was ready to run, but not granted time by the host.
+ * @param lostTime The cumulative GPU time (in seconds) that was lost due to interference with other machines.
+ * @param capacity The available GPU capacity of the guest (in MHz).
+ * @param usage Amount of GPU resources (in MHz) actually used by the guest.
+ * @param utilization The utilization of the GPU resources (in %) relative to the total GPU capacity.
+ */
+public record GuestGpuStats(
+        long activeTime,
+        long idleTime,
+        long stealTime,
+        long lostTime,
+        double capacity,
+        double usage,
+        double demand,
+        double utilization) {}
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/HostGpuStats.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/HostGpuStats.java
new file mode 100644
index 00000000..e42d7704
--- /dev/null
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/HostGpuStats.java
@@ -0,0 +1,46 @@
+/*
+ * Copyright (c) 2022 AtLarge Research
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all
+ * copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+package org.opendc.compute.simulator.telemetry;
+
+/**
+ * Statistics about the GPUs of a host.
+ *
+ * @param activeTime The cumulative time (in seconds) that the GPUs of the host were actively running.
+ * @param idleTime The cumulative time (in seconds) the GPUs of the host were idle.
+ * @param stealTime The cumulative GPU time (in seconds) that virtual machines were ready to run, but were not able to.
+ * @param lostTime The cumulative GPU time (in seconds) that was lost due to interference between virtual machines.
+ * @param capacity The available GPU capacity of the host (in MHz).
+ * @param demand Amount of GPU resources (in MHz) the guests would use if there were no GPU contention or GPU
+ *                 limits.
+ * @param usage Amount of GPU resources (in MHz) actually used by the host.
+ * @param utilization The utilization of the GPU resources (in %) relative to the total GPU capacity.
+ */
+public record HostGpuStats(
+        long activeTime,
+        long idleTime,
+        long stealTime,
+        long lostTime,
+        double capacity,
+        double demand,
+        double usage,
+        double utilization) {}
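`HostModel` now carries a list of `GpuHostModel`s, but the three-argument constructor kept for CPU-only hosts passes `null` for that list, so any code that walks a host's GPUs has to guard against it. The sketch below copies the two record signatures from the diff so it compiles on its own; the `GpuHostModels` helper class and its methods are hypothetical. The commit notes that GPUs are combined into one large GPU until full multi-GPU support lands; summing core counts and memory as below is one plausible view of that aggregate, but how OpenDC actually merges them is not shown in this diff.

```java
import java.util.List;

// Copies of the records added in this commit, repeated here only so the sketch is self-contained.
record GpuHostModel(double gpuCoreCapacity, int gpuCoreCount, long GpuMemoryCapacity, double GpuMemorySpeed) {}

record HostModel(double cpuCapacity, int coreCount, long memoryCapacity, List<GpuHostModel> gpuHostModels) {
    // CPU-only hosts: as in the commit, the GPU list is left null.
    HostModel(double cpuCapacity, int coreCount, long memoryCapacity) {
        this(cpuCapacity, coreCount, memoryCapacity, null);
    }
}

/** Hypothetical helpers, not part of OpenDC. */
final class GpuHostModels {
    private GpuHostModels() {}

    /** Treats a null GPU list (CPU-only host) as "no GPUs". */
    static List<GpuHostModel> gpusOf(HostModel host) {
        return host.gpuHostModels() == null ? List.of() : host.gpuHostModels();
    }

    /** Total number of GPU cores over all GPUs of the host. */
    static int totalGpuCores(HostModel host) {
        return gpusOf(host).stream().mapToInt(GpuHostModel::gpuCoreCount).sum();
    }

    /** Total GPU memory over all GPUs of the host, in the unit of GpuMemoryCapacity. */
    static long totalGpuMemory(HostModel host) {
        return gpusOf(host).stream().mapToLong(GpuHostModel::GpuMemoryCapacity).sum();
    }
}
```

Normalizing `null` to an empty list in one place keeps the CPU-only path free of scattered null checks.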
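`HostView` gains a `provisionedGpuCores` counter next to the renamed `provisionedCpuCores`, but in this diff `ComputeService` only updates the CPU counter: it adds the flavor's cores when a task is placed and subtracts them when the task is released. The sketch below mirrors that bookkeeping with a hypothetical `HostLedger` class and a hypothetical `FlavorSize` record, and updates the GPU counter symmetrically; that symmetry is an assumption of the sketch, not something this commit does.

```java
/** Hypothetical stand-in for a flavor: only the sizes the bookkeeping needs. */
record FlavorSize(int cpuCoreCount, long memorySize, int gpuCoreCount) {}

/**
 * Hypothetical mirror of HostView's counters. In this commit ComputeService updates only
 * the CPU counter; updating the GPU counter the same way is an assumption.
 */
final class HostLedger {
    int instanceCount;
    long availableMemory;
    int provisionedCpuCores;
    int provisionedGpuCores;

    HostLedger(long memoryCapacity) {
        this.availableMemory = memoryCapacity;
    }

    /** Called when a task is placed on the host. */
    void onTaskPlaced(FlavorSize flavor) {
        instanceCount++;
        provisionedCpuCores += flavor.cpuCoreCount();
        provisionedGpuCores += flavor.gpuCoreCount();
        availableMemory -= flavor.memorySize();
    }

    /** Called when a task leaves the host; exactly undoes onTaskPlaced. */
    void onTaskRemoved(FlavorSize flavor) {
        instanceCount--;
        provisionedCpuCores -= flavor.cpuCoreCount();
        provisionedGpuCores -= flavor.gpuCoreCount();
        availableMemory += flavor.memorySize();
    }
}
```

Keeping placement and release as exact inverses is what lets a scheduler trust these counters over a long simulation.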
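When a scheduling attempt fails, `ComputeService` removes the task from the queue outright if its flavor asks for more memory than `maxMemory` or more CPU cores than `maxCores`, because no host could ever run it; otherwise the task stays queued for a later attempt. Now that flavors expose `getGpuCoreCount()`, the same test could also cover GPU cores. A minimal sketch, in which `ClusterLimits`, `FeasibilityCheck` and the GPU clause are hypothetical additions:

```java
/** Hypothetical: the largest memory, CPU core and GPU core counts offered by any single host. */
record ClusterLimits(long maxMemory, int maxCpuCores, int maxGpuCores) {}

final class FeasibilityCheck {
    private FeasibilityCheck() {}

    /**
     * True if no host could ever run the request, so retrying is pointless.
     * The diff checks memory and CPU cores only; the GPU clause is an assumption.
     */
    static boolean neverFits(long memorySize, int cpuCoreCount, int gpuCoreCount, ClusterLimits limits) {
        return memorySize > limits.maxMemory()
                || cpuCoreCount > limits.maxCpuCores()
                || gpuCoreCount > limits.maxGpuCores();
    }
}
```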
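The new `GuestGpuStats` and `HostGpuStats` records report `capacity`, `demand`, `usage` and `utilization` for GPUs. The two records order `usage` and `demand` differently, so positional construction must follow each record's own component order. The hypothetical helper below sketches how these fields usually relate: utilization as usage over capacity, and unmet demand as the part of the demand that was not served. The javadoc describes utilization as "in %"; whether OpenDC stores a 0 to 1 fraction or a 0 to 100 percentage is not visible in this diff, so the sketch simply returns a fraction.

```java
/** Hypothetical helpers for interpreting the GPU stats fields; not part of OpenDC. */
final class GpuStatsMath {
    private GpuStatsMath() {}

    /** Usage relative to capacity as a fraction capped at 1; 0 when the GPU capacity is 0. */
    static double utilization(double usage, double capacity) {
        return capacity > 0.0 ? Math.min(1.0, usage / capacity) : 0.0;
    }

    /** Demand that was not served (for example because of contention); never negative. */
    static double unmetDemand(double demand, double usage) {
        return Math.max(0.0, demand - usage);
    }
}
```

For example, a host with 1000 MHz of GPU capacity, 1200 MHz of demand and 1000 MHz of usage is fully utilized and leaves 200 MHz of demand unmet.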
