Exposed metrics
Metrics
Supported Salt event tags
Each Salt event whose tag matches one of the following patterns updates the metrics:
salt/job/<jid>/new
salt/job/<jid>/ret/<*>
salt/run/<jid>/new
salt/run/<jid>/ret/<*>
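As an illustration, the four tag formats above can be matched with regular expressions. The helper classify_tag and its category names are hypothetical and not part of the exporter. <jid> is the Salt job ID. For job returns, the suffix after ret/ is the ID of the responding minion.

```python
import re

# Hypothetical patterns mirroring the four supported tag formats.
# <jid> is the Salt job ID; for job returns the suffix is the minion ID.
TAG_PATTERNS = {
    "job_new": re.compile(r"^salt/job/(?P<jid>[^/]+)/new$"),
    "job_ret": re.compile(r"^salt/job/(?P<jid>[^/]+)/ret/(?P<suffix>.+)$"),
    "run_new": re.compile(r"^salt/run/(?P<jid>[^/]+)/new$"),
    "run_ret": re.compile(r"^salt/run/(?P<jid>[^/]+)/ret/(?P<suffix>.+)$"),
}


def classify_tag(tag: str):
    """Return (kind, captured parts) for a supported tag, or None otherwise."""
    for kind, pattern in TAG_PATTERNS.items():
        match = pattern.match(tag)
        if match:
            return kind, match.groupdict()
    return None  # any other tag does not update the job metrics


print(classify_tag("salt/job/20240101120000123456/ret/node1"))
print(classify_tag("salt/beacon/node1/status"))
```

The first call yields a job_ret classification with the job ID and minion ID. The beacon tag is not in this list, so the second call returns None.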
Metric | Labels | Description |
---|---|---|
salt_new_job_total | function, state | Total number of new jobs |
salt_expected_responses_total | function, state | Counter incremented by the number of targeted minions for each new job |
salt_function_responses_total | function, state, success (optional: minion) | Total number of job responses by function, state and success |
salt_scheduled_job_return_total | function, state, success (optional: minion) | Counter incremented each time a minion sends a scheduled job result |
salt_responses_total | minion, success | Total number of job responses, including scheduled job responses |
salt_function_status | function, state, minion | Last status of a job execution* |
salt_health_last_heartbeat | minion | Timestamp (UNIX time) of the last heartbeat received from the minion |
salt_health_minions_total | | Total number of registered minions |
* More details in the Function status section below.
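The following sketch models how the first three counters relate. A new job increments salt_new_job_total once and salt_expected_responses_total by the number of targeted minions, and each return increments salt_function_responses_total. The in-memory Counter objects and the on_new_job/on_job_return handlers are hypothetical stand-ins, not the exporter's implementation.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for three counters, keyed by label values.
new_job_total = Counter()             # labels: (function, state)
expected_responses_total = Counter()  # labels: (function, state)
function_responses_total = Counter()  # labels: (function, state, success)


def on_new_job(function: str, state: str, targeted_minions: list) -> None:
    new_job_total[(function, state)] += 1
    # One response is expected from every targeted minion.
    expected_responses_total[(function, state)] += len(targeted_minions)


def on_job_return(function: str, state: str, success: bool) -> None:
    function_responses_total[(function, state, "true" if success else "false")] += 1


# A test.ping targeting two minions, of which only one has answered so far.
on_new_job("test.ping", "", ["local", "node1"])
on_job_return("test.ping", "", True)

# Expected minus received responses (successful or not) for this function.
missing = expected_responses_total[("test.ping", "")] - sum(
    count
    for (fun, state, _), count in function_responses_total.items()
    if (fun, state) == ("test.ping", "")
)
print(missing)
```

The difference between the two counters is what the "How to estimate missing responses" section builds on.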
Label details
The exporter exposes these labels for both classic jobs and runners.
Prometheus label | Salt information |
---|---|
function | execution module function (e.g. test.ping, state.sls) |
state | state name or state module function (e.g. highstate, test.nop) |
minion | minion sending the response |
success | job status (true or false) |
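The examples at the end of this page suggest how the function and state labels are filled. The mapping below is inferred from those examples only. derive_labels is a hypothetical helper, and the exporter's real rules may differ, for instance for keyword arguments or several SLS names.

```python
def derive_labels(function: str, args: list) -> dict:
    """Hypothetical mapping inferred from this page's examples only."""
    if function == "state.highstate" or (function == "state.apply" and not args):
        state = "highstate"
    elif function in ("state.apply", "state.sls", "state.single") and args:
        # First positional argument: the SLS name for state.sls/state.apply,
        # or the state function (e.g. "test.nop") for state.single.
        state = str(args[0])
    else:
        state = ""  # plain execution modules such as cmd.run or test.ping
    return {"function": function, "state": state}


print(derive_labels("state.sls", ["test"]))
print(derive_labels("cmd.run", ["uptime"]))
```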
Function status
By default, a Salt highstate generates the following metric:
salt_function_status{minion="node1",function="state.highstate",state="highstate"} 1
The value can be:
1: the last function/state execution was successful
0: the last function/state execution failed
You can find an example of Prometheus alerts that could be used here.
See the configuration page if you want to watch other functions/states, or if you want to disable this metric.
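Unlike the counters, salt_function_status is a gauge that keeps only the latest result per function, state and minion. Below is a minimal sketch. It assumes a hypothetical WATCHED set that stands in for the exporter's configuration and contains only the highstate case.

```python
# Hypothetical stand-in for the exporter configuration: which
# (function, state) pairs produce salt_function_status.
WATCHED = {("state.highstate", "highstate")}

# Gauge values keyed by (function, state, minion).
function_status = {}


def on_return(function: str, state: str, minion: str, success: bool) -> None:
    if (function, state) in WATCHED:
        # Overwrite, do not accumulate: only the last execution matters.
        function_status[(function, state, minion)] = 1 if success else 0


on_return("state.highstate", "highstate", "node1", True)
on_return("state.highstate", "highstate", "node1", False)  # latest result wins
on_return("cmd.run", "", "node1", True)  # not watched, ignored
print(function_status)
```

Because the value is overwritten rather than incremented, an alert on this metric reflects the most recent run only.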
Minions health
The exporter supports heartbeat detection for minions, which can be used to monitor unresponsive or dead minions. Under the hood, it relies on Salt beacons.
To ensure that all required minions are reported (even if no heartbeat has been received from them yet), the exporter needs access to the PKI directory of the Salt master (by default /etc/salt/pki/master), where it watches the public keys of accepted minions (located under /etc/salt/pki/master/minions).
On startup, all currently accepted minions are added with their last heartbeat set to the current time. From then on, the exporter uses fsnotify to detect added or removed minions. This ensures that a newly added minion is monitored for heartbeats, and that its metric is removed once the minion is deleted from the Salt master.
To use this functionality, add the status beacon to each minion. It does not matter which functions the beacon returns or at what interval it runs: the exporter only detects the resulting events (tagged salt/beacon/<minion id>/status) and records their timestamp as the minion's last heartbeat.
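The heartbeat bookkeeping described above can be modeled as follows. This is a sketch under assumptions. fsnotify is a Go file-watching library, whereas here key additions and removals are explicit method calls. HeartbeatTracker and accepted_minions are hypothetical, as is the choice to ignore beacons from minions without an accepted key.

```python
import os
import re
import time

# Beacon events are tagged salt/beacon/<minion id>/status.
BEACON_TAG = re.compile(r"^salt/beacon/(?P<minion>[^/]+)/status$")


def accepted_minions(pki_dir: str = "/etc/salt/pki/master") -> list:
    # Each accepted minion has one public key file named after its ID.
    return sorted(os.listdir(os.path.join(pki_dir, "minions")))


class HeartbeatTracker:
    """Hypothetical model of the salt_health_last_heartbeat bookkeeping."""

    def __init__(self, minions, now=None):
        # Startup: every accepted minion gets "now" as its last heartbeat.
        now = time.time() if now is None else now
        self.last_heartbeat = {minion: now for minion in minions}

    def on_key_added(self, minion, now=None):
        # Assumption: a newly accepted minion also starts at the current time.
        self.last_heartbeat.setdefault(minion, time.time() if now is None else now)

    def on_key_removed(self, minion):
        # Deleted from the master: stop exposing its metric.
        self.last_heartbeat.pop(minion, None)

    def on_event(self, tag, timestamp):
        match = BEACON_TAG.match(tag)
        # Assumption: beacons from minions without an accepted key are ignored.
        if match and match["minion"] in self.last_heartbeat:
            self.last_heartbeat[match["minion"]] = timestamp


# On a real master: tracker = HeartbeatTracker(accepted_minions())
tracker = HeartbeatTracker(["local", "node1"], now=1_700_000_000)
tracker.on_event("salt/beacon/node1/status", 1_700_000_060)
print(tracker.last_heartbeat)
```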
Detecting dead minions
The simplest way (e.g. no heartbeat in the last hour) is:
time() - salt_health_last_heartbeat > 3600
NOTE: The above assumes that the beacon interval is set to less than 3600 seconds.
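The "no heartbeat in the last hour" check can be expressed in Python for illustration. A minion counts as dead when more than 3600 seconds separate the current time from its last heartbeat. This also shows why the beacon interval must stay below that threshold: otherwise healthy minions would be flagged between two beacons. dead_minions is a hypothetical helper.

```python
def dead_minions(last_heartbeat: dict, now: float, threshold: float = 3600) -> list:
    """Minions whose last heartbeat (UNIX time) is more than `threshold` s old."""
    return sorted(
        minion for minion, seen in last_heartbeat.items() if now - seen > threshold
    )


now = 1_700_010_000
heartbeats = {
    "node1": now - 600,   # 10 minutes ago: healthy
    "node2": now - 7200,  # 2 hours ago: considered dead
}
print(dead_minions(heartbeats, now))  # ['node2']
```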
How to estimate missing responses
Simple way:
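More advanced, as a Grafana query ($function and $state are dashboard variables, $__rate_interval is a Grafana built-in variable):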
sum by (instance, function, state) (
increase(salt_expected_responses_total{function=~"$function", state=~"$state"}[$__rate_interval])
)
- sum by (instance, function, state) (
increase(salt_function_responses_total{function=~"$function", state=~"$state"}[$__rate_interval])
)
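To show what the advanced query computes, the sketch below applies a simplified increase() to raw samples of both counters and subtracts the results. It handles counter resets the way Prometheus does. Unlike Prometheus, it does not extrapolate to the window boundaries, so results can differ slightly. simple_increase and missing_responses are hypothetical helpers.

```python
def simple_increase(samples: list) -> float:
    """Increase of a counter across consecutive samples in a window.

    Handles counter resets, but does NOT extrapolate to the window
    boundaries as Prometheus' increase() does.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. exporter restart), so the
        # new value itself is the increase since the reset.
        total += cur - prev if cur >= prev else cur
    return total


def missing_responses(expected: list, received: list) -> float:
    return simple_increase(expected) - simple_increase(received)


# Over the window: 6 more expected responses, 4 more received -> 2 missing.
print(missing_responses([10, 13, 16], [8, 10, 12]))
```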
Examples
Execution modules
# HELP salt_expected_responses_total Total number of expected minions responses
# TYPE salt_expected_responses_total counter
salt_expected_responses_total{function="cmd.run",state=""} 6
salt_expected_responses_total{function="test.ping",state=""} 6
# HELP salt_function_responses_total Total number of responses per function processed
# TYPE salt_function_responses_total counter
salt_function_responses_total{function="cmd.run",state="",success="true"} 6
salt_function_responses_total{function="test.ping",state="",success="true"} 6
# HELP salt_new_job_total Total number of new jobs processed
# TYPE salt_new_job_total counter
salt_new_job_total{function="cmd.run",state=""} 3
salt_new_job_total{function="test.ping",state=""} 3
# HELP salt_responses_total Total number of responses
# TYPE salt_responses_total counter
salt_responses_total{minion="local",success="true"} 6
salt_responses_total{minion="node1",success="true"} 6
# HELP salt_scheduled_job_return_total Total number of scheduled job responses
# TYPE salt_scheduled_job_return_total counter
salt_scheduled_job_return_total{function="cmd.run",minion="local",state="",success="true"} 2
States and state modules
States (state.sls, state.apply, state.highstate) and state modules (state.single):
salt_expected_responses_total{function="state.apply",state="highstate"} 1
salt_expected_responses_total{function="state.highstate",state="highstate"} 2
salt_expected_responses_total{function="state.sls",state="test"} 1
salt_expected_responses_total{function="state.single",state="test.nop"} 3
salt_function_responses_total{function="state.apply",state="highstate",success="true"} 1
salt_function_responses_total{function="state.highstate",state="highstate",success="true"} 2
salt_function_responses_total{function="state.sls",state="test",success="true"} 1
salt_function_responses_total{function="state.single",state="test.nop",success="true"} 3
salt_function_status{minion="node1",function="state.highstate",state="highstate"} 1
salt_new_job_total{function="state.apply",state="highstate"} 1
salt_new_job_total{function="state.highstate",state="highstate"} 2
salt_new_job_total{function="state.sls",state="test"} 1
salt_new_job_total{function="state.single",state="test.nop"} 3
salt_scheduled_job_return_total{function="state.sls",minion="local",state="test",success="true"} 3
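The sample lines above can also be checked programmatically. This minimal parser handles only simple labelled sample lines (no escaped quotes, timestamps or exemplars). It compares raw counter totals, so it only reflects responses since the exporter started. parse and missing_by_function are hypothetical helpers, not part of any Prometheus client library.

```python
import re
from collections import defaultdict

# Simple sample lines only: name{label="value",...} value
SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse(text: str):
    """Yield (metric name, labels dict, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE.match(line)
        if match:
            labels = dict(LABEL.findall(match["labels"]))
            yield match["name"], labels, float(match["value"])


def missing_by_function(text: str) -> dict:
    """Expected minus received responses per (function, state), from raw totals."""
    diff = defaultdict(float)
    for name, labels, value in parse(text):
        key = (labels.get("function", ""), labels.get("state", ""))
        if name == "salt_expected_responses_total":
            diff[key] += value
        elif name == "salt_function_responses_total":
            diff[key] -= value  # counts both success="true" and "false"
    return dict(diff)


EXAMPLE = """
# TYPE salt_expected_responses_total counter
salt_expected_responses_total{function="cmd.run",state=""} 6
salt_function_responses_total{function="cmd.run",state="",success="true"} 6
"""
print(missing_by_function(EXAMPLE))
```

For the execution module example above, every expected response has arrived, so the difference is 0 for cmd.run.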