summaryrefslogtreecommitdiff
path: root/opendc-compute/opendc-compute-simulator/src/main/java/org
diff options
context:
space:
mode:
authorNiels Thiele <noleu66@posteo.net>2025-06-22 12:31:21 +0200
committerGitHub <noreply@github.com>2025-06-22 12:31:21 +0200
commit0203254b709614fa732c114aa25916f61b8b3275 (patch)
tree63232140a8e60e16e1668a51eb58954d8609fbdc /opendc-compute/opendc-compute-simulator/src/main/java/org
parent8f846655347195bf6f22a4a102aa06f0ab127da1 (diff)
Implemented Single GPU Support & outline of host-level allocation policies (#342)
* renamed performance counter to distinguish different resource types * added GPU, modelled similar to CPU * added GPUs to machine model * list of GPUs instead of single instance * renamed memory speed to bandwidth * enabled parsing of GPU resources * split powermodel into cpu and GPU powermodel * added gpu parsing tests * added idea of host level scheduling * added tests for multi gpu parsing * renamed powermodel to cpupowermodel * clarified naming of cpu and gpu components * added resource type to flow suplier and edge * added resourcetype * added GPU components and resource type to fragments * added GPU to workload and updated resource usage retrieval * implemented first version of multi resource * added name to workload * renamed perfomance counters * removed commented out code * removed deprecated comments * included demand and supply into calculations * resolving rebase mismatches * moved resource type from flowedge class to common package * added available resources to machinees * cleaner separation if workload is started of simmachine or vm * Replaced exception with dedicated enum * Only looping over resources that are actually used * using hashmaps to handle resourcetype instead of arrays for readability * fixed condition * tracking finished workloads per resource type * removed resource type from flowedge * made supply and demand distribution resource specific * added power model for GPU * removed unused test setup * removed depracated comments * removed unused parameter * added ID for GPU * added GPUs and GPU performance counters (naively) * implemented capturing of GPU statistics * added reminders for future implementations * renamed properties for better identification * added capturing GPU statistics * implemented first tests for GPUs * unified access to performance counters * added interface for general compute resource handling * implemented multi resource support in simmachine * added individual edge to VM per resource * extended compute resource interface * implemented multi-resource support in PSU * implemented generic retrieval of computeresources * implemented mult-resource suppport in vm * made method use more resource specific * implemented simple GPU tests * rolled back frquency and demand use * made naming independent of used resource * using workloads resources instead of VMs to determine available resource * implemented determination of used resources in workload * removed logging statements * implemented reading from workload * fixed naming for host-level allocation * fixed next deadline calculation * fixed forwarding supply * reduced memory footprint * made GPU powermodel nullable * maded Gpu powermodel configurable in topology * implemented tests for basic gpu scheduler * added gpu properties * implemented weights, filter and simple cpu-gpu scheduler * spotless apply * spotless apply pt. 2 * fixed capitalization * spotless kotlin run * implemented coloumn export * todo update * removed code comments * Merged PerformanceCounter classes into one & removed interface * removed GPU specific powermodel * Rebase master: kept both versions of TopologyFactories * renamed CpuPowermodel to resource independent Powermodel Moved it from Cpu package to power package * implementated default of getResourceType & removed overrides if possible * split getResourceType into Consumer and Supplier * added power as resource type * reduced supply demand from arrayList to single value * combining GPUs into one large GPU, until full multi-gpu support * merged distribution policy enum with corresponding factory * added comment * post-rebase fixes * aligned naming * Added GPU metrics to task output * Updates power resource type to uppercase. Standardizes the `ResourceType.Power` enum to `ResourceType.POWER` for consistency with other resource types and improved readability. * Removes deprecated test assertions Removes commented-out assertions in GPU tests. These assertions are no longer needed and clutter the test code. * Renames MaxMinFairnessStrategy to Policy Renames MaxMinFairnessStrategy to MaxMinFairnessPolicy for clarity and consistency with naming conventions. This change affects the factory and distributor to use the updated name. * applies spotless * nulls GPUs as it is not used
Diffstat (limited to 'opendc-compute/opendc-compute-simulator/src/main/java/org')
-rw-r--r--opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/GpuHostModel.java33
-rw-r--r--opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/HostModel.java19
-rw-r--r--opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ComputeService.java10
-rw-r--r--opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/HostView.java11
-rw-r--r--opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ServiceFlavor.java24
-rw-r--r--opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/GuestGpuStats.java44
-rw-r--r--opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/HostGpuStats.java46
7 files changed, 171 insertions, 16 deletions
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/GpuHostModel.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/GpuHostModel.java
new file mode 100644
index 00000000..97aaa820
--- /dev/null
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/GpuHostModel.java
@@ -0,0 +1,33 @@
+/*
+ * Copyright (c) 2022 AtLarge Research
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all
+ * copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+package org.opendc.compute.simulator.host;
+
+/**
+ * A model for a GPU in a host.
+ *
+ * @param gpuCoreCapacity The capacity of the GPU cores hz.
+ * @param gpuCoreCount The number of GPU cores.
+ * @param GpuMemoryCapacity The capacity of the GPU memory in GB.
+ * @param GpuMemorySpeed The speed of the GPU memory in GB/s.
+ */
+public record GpuHostModel(double gpuCoreCapacity, int gpuCoreCount, long GpuMemoryCapacity, double GpuMemorySpeed) {}
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/HostModel.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/HostModel.java
index 1ea73ea6..6464a56c 100644
--- a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/HostModel.java
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/host/HostModel.java
@@ -22,11 +22,24 @@
package org.opendc.compute.simulator.host;
+import java.util.List;
+
/**
* Record describing the static machine properties of the host.
*
- * @param cpuCapacity The total CPU capacity of the host in MHz.
- * @param coreCount The number of logical processing cores available for this host.
+ * @param cpuCapacity The total CPU capacity of the host in MHz.
+ * @param coreCount The number of logical processing cores available for this host.
* @param memoryCapacity The amount of memory available for this host in MB.
*/
-public record HostModel(double cpuCapacity, int coreCount, long memoryCapacity) {}
+public record HostModel(double cpuCapacity, int coreCount, long memoryCapacity, List<GpuHostModel> gpuHostModels) {
+ /**
+ * Create a new host model.
+ *
+ * @param cpuCapacity The total CPU capacity of the host in MHz.
+ * @param coreCount The number of logical processing cores available for this host.
+ * @param memoryCapacity The amount of memory available for this host in MB.
+ */
+ public HostModel(double cpuCapacity, int coreCount, long memoryCapacity) {
+ this(cpuCapacity, coreCount, memoryCapacity, null);
+ }
+}
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ComputeService.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ComputeService.java
index 2b4306af..835c7186 100644
--- a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ComputeService.java
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ComputeService.java
@@ -198,7 +198,7 @@ public final class ComputeService implements AutoCloseable, CarbonReceiver {
HostView hv = hostToView.get(host);
final ServiceFlavor flavor = task.getFlavor();
if (hv != null) {
- hv.provisionedCores -= flavor.getCoreCount();
+ hv.provisionedCpuCores -= flavor.getCpuCoreCount();
hv.instanceCount--;
hv.availableMemory += flavor.getMemorySize();
} else {
@@ -496,7 +496,7 @@ public final class ComputeService implements AutoCloseable, CarbonReceiver {
if (result.getResultType() == SchedulingResultType.FAILURE) {
LOGGER.trace("Task {} selected for scheduling but no capacity available for it at the moment", task);
- if (flavor.getMemorySize() > maxMemory || flavor.getCoreCount() > maxCores) {
+ if (flavor.getMemorySize() > maxMemory || flavor.getCpuCoreCount() > maxCores) {
// Remove the incoming image
taskQueue.remove(req);
tasksPending--;
@@ -531,7 +531,7 @@ public final class ComputeService implements AutoCloseable, CarbonReceiver {
attemptsSuccess++;
hv.instanceCount++;
- hv.provisionedCores += flavor.getCoreCount();
+ hv.provisionedCpuCores += flavor.getCpuCoreCount();
hv.availableMemory -= flavor.getMemorySize();
activeTasks.put(task, host);
@@ -612,12 +612,12 @@ public final class ComputeService implements AutoCloseable, CarbonReceiver {
@NotNull
public ServiceFlavor newFlavor(
- @NotNull String name, int cpuCount, long memorySize, @NotNull Map<String, ?> meta) {
+ @NotNull String name, int cpuCount, long memorySize, int gpuCoreCount, @NotNull Map<String, ?> meta) {
checkOpen();
final ComputeService service = this.service;
UUID uid = new UUID(service.clock.millis(), service.random.nextLong());
- ServiceFlavor flavor = new ServiceFlavor(service, uid, name, cpuCount, memorySize, meta);
+ ServiceFlavor flavor = new ServiceFlavor(service, uid, name, cpuCount, memorySize, gpuCoreCount, meta);
// service.flavorById.put(uid, flavor);
// service.flavors.add(flavor);
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/HostView.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/HostView.java
index 7c548add..c07f58c7 100644
--- a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/HostView.java
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/HostView.java
@@ -31,7 +31,8 @@ public class HostView {
private final SimHost host;
int instanceCount;
long availableMemory;
- int provisionedCores;
+ int provisionedCpuCores;
+ int provisionedGpuCores;
/**
* Scheduler bookkeeping
@@ -83,8 +84,12 @@ public class HostView {
/**
* Return the provisioned cores on the host.
*/
- public int getProvisionedCores() {
- return provisionedCores;
+ public int getProvisionedCpuCores() {
+ return provisionedCpuCores;
+ }
+
+ public int getProvisionedGpuCores() {
+ return provisionedGpuCores;
}
@Override
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ServiceFlavor.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ServiceFlavor.java
index eddde87e..8a4359b4 100644
--- a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ServiceFlavor.java
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/service/ServiceFlavor.java
@@ -36,22 +36,31 @@ public final class ServiceFlavor implements Flavor {
private final ComputeService service;
private final UUID uid;
private final String name;
- private final int coreCount;
+ private final int cpuCoreCount;
private final long memorySize;
+ private final int gpuCoreCount;
private final Map<String, ?> meta;
- ServiceFlavor(ComputeService service, UUID uid, String name, int coreCount, long memorySize, Map<String, ?> meta) {
+ ServiceFlavor(
+ ComputeService service,
+ UUID uid,
+ String name,
+ int cpuCoreCount,
+ long memorySize,
+ int gpuCoreCount,
+ Map<String, ?> meta) {
this.service = service;
this.uid = uid;
this.name = name;
- this.coreCount = coreCount;
+ this.cpuCoreCount = cpuCoreCount;
this.memorySize = memorySize;
+ this.gpuCoreCount = gpuCoreCount;
this.meta = meta;
}
@Override
- public int getCoreCount() {
- return coreCount;
+ public int getCpuCoreCount() {
+ return cpuCoreCount;
}
@Override
@@ -59,6 +68,11 @@ public final class ServiceFlavor implements Flavor {
return memorySize;
}
+ @Override
+ public int getGpuCoreCount() {
+ return gpuCoreCount;
+ }
+
@NotNull
@Override
public UUID getUid() {
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/GuestGpuStats.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/GuestGpuStats.java
new file mode 100644
index 00000000..1aba13e3
--- /dev/null
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/GuestGpuStats.java
@@ -0,0 +1,44 @@
+/*
+ * Copyright (c) 2022 AtLarge Research
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all
+ * copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+package org.opendc.compute.simulator.telemetry;
+
+/**
+ * Statistics about the GPUs of a guest.
+ *
+ * @param activeTime The cumulative time (in seconds) that the GPUs of the guest were actively running.
+ * @param idleTime The cumulative time (in seconds) the GPUs of the guest were idle.
+ * @param stealTime The cumulative GPU time (in seconds) that the guest was ready to run, but not granted time by the host.
+ * @param lostTime The cumulative GPU time (in seconds) that was lost due to interference with other machines.
+ * @param capacity The available GPU capacity of the guest (in MHz).
+ * @param usage Amount of GPU resources (in MHz) actually used by the guest.
+ * @param utilization The utilization of the GPU resources (in %) relative to the total GPU capacity.
+ */
+public record GuestGpuStats(
+ long activeTime,
+ long idleTime,
+ long stealTime,
+ long lostTime,
+ double capacity,
+ double usage,
+ double demand,
+ double utilization) {}
diff --git a/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/HostGpuStats.java b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/HostGpuStats.java
new file mode 100644
index 00000000..e42d7704
--- /dev/null
+++ b/opendc-compute/opendc-compute-simulator/src/main/java/org/opendc/compute/simulator/telemetry/HostGpuStats.java
@@ -0,0 +1,46 @@
+/*
+ * Copyright (c) 2022 AtLarge Research
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all
+ * copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+package org.opendc.compute.simulator.telemetry;
+
+/**
+ * Statistics about the GPUs of a host.
+ *
+ * @param activeTime The cumulative time (in seconds) that the GPUs of the host were actively running.
+ * @param idleTime The cumulative time (in seconds) the GPUs of the host were idle.
+ * @param stealTime The cumulative GPU time (in seconds) that virtual machines were ready to run, but were not able to.
+ * @param lostTime The cumulative GPU time (in seconds) that was lost due to interference between virtual machines.
+ * @param capacity The available GPU capacity of the host (in MHz).
+ * @param demand Amount of GPU resources (in MHz) the guests would use if there were no GPU contention or GPU
+ * limits.
+ * @param usage Amount of GPU resources (in MHz) actually used by the host.
+ * @param utilization The utilization of the GPU resources (in %) relative to the total GPU capacity.
+ */
+public record HostGpuStats(
+ long activeTime,
+ long idleTime,
+ long stealTime,
+ long lostTime,
+ double capacity,
+ double demand,
+ double usage,
+ double utilization) {}