Checkpointing is a technique to reduce the impact of machine failures.
When checkpointing is used, tasks periodically take snapshots of their state.
If a task fails, it can be restarted from its last snapshot instead of from the beginning.
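To make the benefit concrete, the following minimal Python sketch computes how much work a failed task has to redo. The function `work_lost_on_failure` is hypothetical and not part of this project's API. It assumes a snapshot can only be restored once it has finished, and it ignores the time spent creating snapshots.

```python
def work_lost_on_failure(failure_time_ms: int, snapshot_times_ms: list[int]) -> int:
    """Return the ms of work a task must redo if it fails at failure_time_ms.

    Hypothetical helper: snapshot_times_ms holds the times at which snapshots
    finished. Snapshot overhead is ignored in this simplified model.
    """
    # Only snapshots that completed before the failure can be restored.
    completed = [t for t in snapshot_times_ms if t <= failure_time_ms]
    # Without any snapshot, the task restarts from the beginning (time 0).
    last_snapshot = max(completed, default=0)
    return failure_time_ms - last_snapshot


print(work_lost_on_failure(5_000_000, [3_600_000]))  # 1400000
print(work_lost_on_failure(5_000_000, []))           # 5000000
```

With one snapshot at the one-hour mark, a failure at 5,000,000 ms costs 1,400,000 ms of work instead of all 5,000,000 ms.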

A user can define a checkpoint model using the following parameters:

| Variable                  | Type   | Required? | Default | Description                                                                                                          |
|---------------------------|--------|-----------|---------|----------------------------------------------------------------------------------------------------------------------|
| checkpointInterval        | Int64  | no        | 3600000 | The time between checkpoints in ms                                                                                   |
| checkpointDuration        | Int64  | no        | 300000  | The time to create a snapshot in ms                                                                                  |
| checkpointIntervalScaling | Double | no        | 1.0     | The factor by which checkpointInterval is multiplied after each successful checkpoint. The default of 1.0 keeps the interval constant. |
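Since every parameter is optional, a consumer of this configuration has to fall back to the defaults listed in the table. The sketch below shows one way to do that in Python; the `CheckpointModel` dataclass and its `from_json` method are hypothetical illustrations, not the project's actual loader.

```python
import json
from dataclasses import dataclass


@dataclass
class CheckpointModel:
    """Hypothetical Python mirror of the checkpoint parameters (not a real API)."""

    checkpoint_interval: int = 3_600_000      # ms between checkpoints
    checkpoint_duration: int = 300_000        # ms needed to create one snapshot
    checkpoint_interval_scaling: float = 1.0  # multiplier applied after each successful checkpoint

    @classmethod
    def from_json(cls, text: str) -> "CheckpointModel":
        data = json.loads(text)
        defaults = cls()
        # Every field is optional; a missing key falls back to the documented default.
        return cls(
            checkpoint_interval=int(data.get("checkpointInterval", defaults.checkpoint_interval)),
            checkpoint_duration=int(data.get("checkpointDuration", defaults.checkpoint_duration)),
            checkpoint_interval_scaling=float(
                data.get("checkpointIntervalScaling", defaults.checkpoint_interval_scaling)
            ),
        )


model = CheckpointModel.from_json('{"checkpointIntervalScaling": 1.5}')
print(model)
# CheckpointModel(checkpoint_interval=3600000, checkpoint_duration=300000, checkpoint_interval_scaling=1.5)
```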

### Example

```json
{
    "checkpointInterval": 3600000,
    "checkpointDuration": 300000,
    "checkpointIntervalScaling": 1.5
}
```

In this example, the initial interval between snapshots is one hour, and creating each snapshot takes 5 minutes.
Because checkpointIntervalScaling is set to 1.5, the interval between checkpoints grows by 50% after each successful checkpoint:
from 1 hour to 1.5 hours, then to 2.25 hours, and so on.
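The effect of the scaling factor can be sketched by computing the snapshot times of a task that never fails. This is a minimal illustration under an assumption the documentation does not state: each interval is measured from the end of the previous snapshot (or from the task start for the first one). The function `checkpoint_schedule` is hypothetical, and the real simulator may time checkpoints differently.

```python
def checkpoint_schedule(
    interval_ms: int, duration_ms: int, scaling: float, count: int
) -> list[tuple[float, float]]:
    """Return (start_ms, end_ms) of the first `count` snapshots of a task that never fails.

    Assumption: each interval is counted from the end of the previous snapshot,
    and every snapshot succeeds, so the interval is scaled after each of them.
    """
    schedule = []
    clock = 0.0
    current_interval = float(interval_ms)
    for _ in range(count):
        start = clock + current_interval
        end = start + duration_ms
        schedule.append((start, end))
        clock = end
        # The interval only grows after a successful checkpoint.
        current_interval *= scaling
    return schedule


print(checkpoint_schedule(3_600_000, 300_000, 1.5, 3))
# [(3600000.0, 3900000.0), (9300000.0, 9600000.0), (17700000.0, 18000000.0)]
```

With the example values, the gaps between the end of one snapshot and the start of the next are 3,600,000 ms, 5,400,000 ms, and 8,100,000 ms, i.e. 1, 1.5, and 2.25 hours.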