OpenDC provides three types of failure models: [Trace-based](#trace-based-failure-models), [Sample-based](#sample-based-failure-models), 
and [Prefab](#prefab-failure-models). 

All failure models share the same structure, which is defined by three parameters:

1. The _interval_ determines the time between two failures.
2. The _duration_ determines how long a single failure lasts.
3. The _intensity_ determines how many hosts are affected by a failure.
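
The way these three values combine into a failure schedule can be sketched in Python. This is a minimal illustration, not OpenDC's implementation: it assumes that each interval is counted from the end of the previous failure, that the intensity is a ratio of hosts, and that the number of affected hosts is rounded up. `schedule_failures` is a hypothetical helper name.

```python
import math

def schedule_failures(failures, num_hosts, start_ms=0):
    """Turn (interval_ms, duration_ms, intensity) triples into a timeline.

    Assumption: each interval starts when the previous failure ends.
    """
    timeline = []
    now = start_ms
    for interval_ms, duration_ms, intensity in failures:
        begin = now + interval_ms   # wait for the interval to pass
        end = begin + duration_ms   # the failure lasts for its duration
        # Assumed rounding: any fraction of a host counts as a whole host.
        hosts_down = min(num_hosts, math.ceil(intensity * num_hosts))
        timeline.append((begin, end, hosts_down))
        now = end
    return timeline

timeline = schedule_failures([(1000, 200, 0.5), (3000, 500, 0.25)], num_hosts=8)
for begin, end, hosts_down in timeline:
    print(f"{begin} ms to {end} ms: {hosts_down} hosts down")
```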

# Trace-based failure models
Trace-based failure models are defined by a Parquet file. This file defines the interval, duration, and intensity of 
several failures. By default, the failures defined in the file are looped. A valid failure model file follows the format defined below:

| Metric            | Datatype   | Unit          | Summary                                    |
|-------------------|------------|---------------|--------------------------------------------|
| failure_interval  | int64      | milliseconds  | The time since the previous failure        |
| failure_duration  | int64      | milliseconds  | The duration of the failure                |
| failure_intensity | float64    | ratio         | The ratio of hosts affected by the failure |
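
A trace in this format could be produced with a short Python script. The sketch below is an example under stated assumptions rather than an official tool: `build_trace_columns` and `write_trace` are hypothetical helper names, the range checks are this sketch's own choices, and writing the file relies on the third-party `pyarrow` package being installed.

```python
def build_trace_columns(failures):
    """Validate (interval_ms, duration_ms, intensity) rows and pivot them into columns."""
    columns = {"failure_interval": [], "failure_duration": [], "failure_intensity": []}
    for interval_ms, duration_ms, intensity in failures:
        if interval_ms < 0 or duration_ms < 0:
            raise ValueError("interval and duration must be non-negative milliseconds")
        if not 0.0 <= intensity <= 1.0:
            raise ValueError("intensity is a ratio of hosts and must lie in [0, 1]")
        columns["failure_interval"].append(int(interval_ms))
        columns["failure_duration"].append(int(duration_ms))
        columns["failure_intensity"].append(float(intensity))
    return columns

def write_trace(path, columns):
    """Write the columns to a Parquet file using the int64/float64 types from the table."""
    # Imported lazily so that validation also works without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    schema = pa.schema([
        ("failure_interval", pa.int64()),
        ("failure_duration", pa.int64()),
        ("failure_intensity", pa.float64()),
    ])
    pq.write_table(pa.table(columns, schema=schema), path)

# Two made-up failures: one hour apart for one minute, then two hours apart for two minutes.
columns = build_trace_columns([(3_600_000, 60_000, 0.1), (7_200_000, 120_000, 0.5)])
```

Calling `write_trace("failure_trace.parquet", columns)` would then write the file; the file name is only an example.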

## Schema
A trace-based failure model is specified by setting "type" to "trace-based".
The path to the failure trace is then set using "pathToFile":
```json
{
    "type": "trace-based",
    "pathToFile": "path/to/your/failure_trace.parquet"
}
```

The "repeat" value can be set to false if the user does not want the failures to loop:
```json
{
    "type": "trace-based",
    "pathToFile": "path/to/your/failure_trace.parquet",
    "repeat": false
}
```
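
The effect of "repeat" can be pictured with a small Python sketch. It only illustrates looping versus a single pass over the trace; `iter_failures` is a hypothetical helper and the rows are made-up values.

```python
from itertools import cycle, islice

def iter_failures(rows, repeat=True):
    """Return an iterator that loops over the rows forever, or passes over them once."""
    return cycle(rows) if repeat else iter(rows)

rows = [(1000, 200, 0.5), (3000, 500, 0.25)]

# With repeat enabled the two rows keep coming back, so we cut the stream off at five.
looped = list(islice(iter_failures(rows, repeat=True), 5))
# With repeat disabled the iterator is exhausted after the last row.
single = list(iter_failures(rows, repeat=False))

print(len(looped), len(single))
```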

# Sample-based failure models
Sample-based failure models sample from three distributions to get the _interval_, _duration_, and _intensity_ of 
each failure. Sample-based failure models are affected by randomness and will thus produce different results depending 
on the provided seed. 
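
The role of the seed can be illustrated in Python. This sketch uses Python's `random` module in place of OpenDC's own random number generator, so the numbers will not match OpenDC. It only shows that a fixed seed reproduces the same failures, while a different seed produces different ones. `sample_failures` and the chosen distributions are assumptions made for the example.

```python
import random

def sample_failures(seed, count=3):
    """Draw `count` (interval, duration, intensity) triples from a seeded generator."""
    rng = random.Random(seed)
    failures = []
    for _ in range(count):
        interval = rng.expovariate(1 / 1.5)  # exponential with mean 1.5
        duration = 10.0                      # constant
        intensity = rng.uniform(0.0, 1.0)    # uniform ratio of hosts
        failures.append((interval, duration, intensity))
    return failures

same_seed_matches = sample_failures(seed=42) == sample_failures(seed=42)
other_seed_differs = sample_failures(seed=42) != sample_failures(seed=43)
print(same_seed_matches, other_seed_differs)
```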

## Distributions
OpenDC supports eight different distributions based on the [RealDistribution](https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/distribution/RealDistribution.html) implementations in Apache Commons Math.
Because the distributions require different parameters, each one has to be specified with its own "type".

#### [ConstantRealDistribution](https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/distribution/ConstantRealDistribution.html)
A distribution that always returns the same value. 

```json
{
    "type": "constant",
    "value": 10.0
}
```

#### [ExponentialDistribution](https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/distribution/ExponentialDistribution.html)
```json
{
    "type": "exponential",
    "mean": 1.5
}
```

#### [GammaDistribution](https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/distribution/GammaDistribution.html)
```json
{
    "type": "gamma",
    "shape": 1.0,
    "scale": 0.5
}
```
 
#### [LogNormalDistribution](https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/distribution/LogNormalDistribution.html)
```json
{
    "type": "log-normal",
    "scale": 1.0,
    "shape": 0.5
}
```

#### [NormalDistribution](https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/distribution/NormalDistribution.html)
```json
{
    "type": "normal",
    "mean": 1.0,
    "std": 0.5
}
```

#### [ParetoDistribution](https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/distribution/ParetoDistribution.html)
```json
{
    "type": "pareto",
    "scale": 1.0,
    "shape": 0.6
}
```

#### [UniformRealDistribution](https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/distribution/UniformRealDistribution.html)
```json
{
    "type": "uniform",
    "lower": 5.0,
    "upper": 10.0
}
```

#### [WeibullDistribution](https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/distribution/WeibullDistribution.html)
```json
{
    "type": "weibull",
    "alpha": 0.5,
    "beta": 1.2
}
```
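
To get a feel for what each specification produces, the parameters can be mapped onto Python's `random` module. This is a rough sketch and not how OpenDC samples: it assumes the parameters are passed unchanged to the linked Commons Math constructors, and it uses "pareto", "uniform", and "weibull" as assumed type names for the last three distributions. `make_sampler` is a hypothetical helper.

```python
import random

def make_sampler(spec, rng):
    """Return a zero-argument function that draws one value for a distribution spec."""
    kind = spec["type"]
    if kind == "constant":
        return lambda: spec["value"]
    if kind == "exponential":
        # expovariate takes the rate, which is 1 / mean.
        return lambda: rng.expovariate(1.0 / spec["mean"])
    if kind == "gamma":
        return lambda: rng.gammavariate(spec["shape"], spec["scale"])
    if kind == "log-normal":
        # In Commons Math, scale and shape are the mean and standard deviation of the logarithm.
        return lambda: rng.lognormvariate(spec["scale"], spec["shape"])
    if kind == "normal":
        return lambda: rng.gauss(spec["mean"], spec["std"])
    if kind == "pareto":
        # paretovariate has a minimum of 1, so multiply by the scale.
        return lambda: spec["scale"] * rng.paretovariate(spec["shape"])
    if kind == "uniform":
        return lambda: rng.uniform(spec["lower"], spec["upper"])
    if kind == "weibull":
        # Commons Math uses alpha = shape and beta = scale; Python expects (scale, shape).
        return lambda: rng.weibullvariate(spec["beta"], spec["alpha"])
    raise ValueError(f"unknown distribution type: {kind}")

rng = random.Random(0)
sampler = make_sampler({"type": "weibull", "alpha": 0.5, "beta": 1.2}, rng)
print(sampler())
```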

## Schema
A sample-based failure model is defined using three distributions for _interval_, _duration_, and _intensity_, which are set 
with "iatSampler", "durationSampler", and "nohSampler" respectively.
Distributions can be mixed freely. Note that values for _interval_ and _duration_ are clamped to be positive. 
The _intensity_ is clamped to the range [0.0, 1.0).
To specify a sample-based failure model, the "type" needs to be set to "custom".

Example:
```json
{
    "type": "custom",
    "iatSampler": {
        "type": "exponential",
        "mean": 1.5
    },
    "durationSampler": {
        "type": "weibull",
        "alpha": 0.5,
        "beta": 1.2
    },
    "nohSampler": {
        "type": "constant",
        "value": 0.5
    }
}
```
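
The clamping mentioned above matters for distributions such as the normal distribution, which can return negative values. A minimal Python sketch of the idea follows. The exact bounds OpenDC applies are not stated here, so the lower bound of 0.0 for the sampled times and the use of the largest float below 1.0 for the intensity are assumptions; `clamp_failure` is a hypothetical helper.

```python
import math

# Largest float below 1.0, used to keep the intensity inside [0.0, 1.0).
# math.nextafter requires Python 3.9 or newer.
MAX_INTENSITY = math.nextafter(1.0, 0.0)

def clamp_failure(interval, duration, intensity):
    """Clamp sampled values; the bounds are this sketch's assumptions, not OpenDC's."""
    interval = max(interval, 0.0)   # assumed lower bound for the sampled times
    duration = max(duration, 0.0)
    intensity = min(max(intensity, 0.0), MAX_INTENSITY)
    return interval, duration, intensity

clamped = clamp_failure(-2.0, 5.0, 1.3)
print(clamped)
```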

# Prefab failure models
The final type of failure model is the prefab model. Prefab models are predefined in OpenDC and are based on 
research. Currently, OpenDC has 9 prefab models based on [The Failure Trace Archive: Enabling the comparison of failure measurements and models of distributed systems](https://www-sciencedirect-com.vu-nl.idm.oclc.org/science/article/pii/S0743731513000634). 
The figure below shows the values used to define the failure models.
![img.png](img.png)

Each failure model is defined four times, once for each of the four distributions (exponential, Weibull, log-normal, and gamma). 
The final list of available prefabs is thus:

    G5k06Exp
    G5k06Wbl
    G5k06LogN
    G5k06Gam
    Lanl05Exp
    Lanl05Wbl
    Lanl05LogN
    Lanl05Gam
    Ldns04Exp
    Ldns04Wbl
    Ldns04LogN
    Ldns04Gam
    Microsoft99Exp
    Microsoft99Wbl
    Microsoft99LogN
    Microsoft99Gam
    Nd07cpuExp
    Nd07cpuWbl
    Nd07cpuLogN
    Nd07cpuGam
    Overnet03Exp
    Overnet03Wbl
    Overnet03LogN
    Overnet03Gam
    Pl05Exp
    Pl05Wbl
    Pl05LogN
    Pl05Gam
    Skype06Exp
    Skype06Wbl
    Skype06LogN
    Skype06Gam
    Websites02Exp
    Websites02Wbl
    Websites02LogN
    Websites02Gam

## Schema
To specify a prefab model, the "type" needs to be set to "prefab".
The prefab is then selected using "prefabName":

```json
{
    "type": "prefab",
    "prefabName": "G5k06Exp"
}
```
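
The prefab names follow a regular pattern: one of nine model prefixes followed by one of four distribution suffixes. The Python sketch below rebuilds the list from that pattern and checks a name before emitting the JSON shown above. `PREFIXES`, `SUFFIXES`, and `prefab_spec` are hypothetical names used for illustration and are not part of OpenDC.

```python
import json
from itertools import product

PREFIXES = ["G5k06", "Lanl05", "Ldns04", "Microsoft99", "Nd07cpu",
            "Overnet03", "Pl05", "Skype06", "Websites02"]
SUFFIXES = ["Exp", "Wbl", "LogN", "Gam"]

# Every combination of prefix and suffix, in the same order as the list above.
PREFAB_NAMES = [prefix + suffix for prefix, suffix in product(PREFIXES, SUFFIXES)]

def prefab_spec(name):
    """Build the failure model specification for a known prefab name."""
    if name not in PREFAB_NAMES:
        raise ValueError(f"unknown prefab: {name}")
    return {"type": "prefab", "prefabName": name}

print(len(PREFAB_NAMES))
print(json.dumps(prefab_spec("G5k06Exp"), indent=4))
```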