What is a Probe

Cloudprober runs probes, but what is a probe? A probe runs an operation, usually against a set of targets (e.g., your API servers), and checks for an expected outcome. Probes typically access your systems the same way your customers do, so they verify your systems' availability and performance from the consumer's point of view. For example, an HTTP probe sends an HTTP request to a web server to verify that the server is available. Cloudprober runs each probe repeatedly at a configured interval and exports the results as a set of metrics.

Example of an HTTP probe checking frontend and API availability:
 _____________                   _______________
|             |   HTTP Probe    |               |
| Cloudprober |  ------------>  |  Website/APIs |
|_____________|                 |_______________|

Here are some of the options used to configure a probe:

Field           Description
type            Probe type, for example HTTP, PING, or UDP.
name            Probe name. Each probe should have a unique name.
interval_msec   How often to run the probe, in milliseconds.
timeout_msec    Probe timeout, in milliseconds.
targets         Targets to run the probe against.
validator       Probe validators, described in the validators documentation.
<type>_probe    Probe-type-specific configuration, e.g. http_probe.
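As a sketch of how these options fit together, the Go snippet below mirrors a few of them in a hypothetical probeConfig struct (this is not Cloudprober's generated ProbeDef type) and checks the unique-name rule from the table. The timeout check is an added assumption for illustration: it requires timeout_msec to be positive and no larger than interval_msec, so that a probe finishes before its next run is due.

```go
package main

import "fmt"

// probeConfig is a hypothetical Go mirror of a few probe options;
// it is not Cloudprober's ProbeDef protobuf type.
type probeConfig struct {
	Name         string
	Type         string // e.g. "HTTP", "PING", "UDP"
	IntervalMsec int
	TimeoutMsec  int
	Targets      []string
}

// validate checks that names are unique and non-empty, that the interval is
// positive, and (our assumption) that 0 < timeout_msec <= interval_msec.
func validate(cfgs []probeConfig) error {
	seen := make(map[string]bool)
	for _, c := range cfgs {
		if c.Name == "" {
			return fmt.Errorf("probe of type %q has no name", c.Type)
		}
		if seen[c.Name] {
			return fmt.Errorf("duplicate probe name %q", c.Name)
		}
		seen[c.Name] = true
		if c.IntervalMsec <= 0 {
			return fmt.Errorf("probe %q: interval_msec must be positive", c.Name)
		}
		if c.TimeoutMsec <= 0 || c.TimeoutMsec > c.IntervalMsec {
			return fmt.Errorf("probe %q: timeout_msec must be in (0, interval_msec]", c.Name)
		}
	}
	return nil
}

func main() {
	cfgs := []probeConfig{
		{Name: "web", Type: "HTTP", IntervalMsec: 5000, TimeoutMsec: 1000, Targets: []string{"www.example.com"}},
		{Name: "web", Type: "PING", IntervalMsec: 2000, TimeoutMsec: 500, Targets: []string{"10.0.0.1"}},
	}
	fmt.Println(validate(cfgs)) // duplicate probe name "web"
}
```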

See the ProbeDef protobuf for details on all fields and options. All probe types export at least the following metrics:

Metric    Description
total     Total number of probes run so far.
success   Number of successful probes. The difference between total and success is the number of failures.
latency   Cumulative probe latency (in microseconds by default). You can get more insight into latency by using distributions.

Note that by default all metrics are cumulative, i.e., Cloudprober exports the sum of all values so far. Cumulative metrics have a useful property: you don't lose historical information if you miss a metrics read cycle. However, they also make certain calculations slightly more complicated (see the example below). To give users a choice, Cloudprober can also export metrics as gauge values; see the documentation on modifying metrics for details.
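To see why a missed read cycle costs nothing with cumulative metrics, consider this small Go sketch with made-up counts. The helper names are hypothetical. The point is that the increase between any two cumulative reads covers every interval in between, while a gauge reader that misses a read never sees that interval's count.

```go
package main

import "fmt"

// runningTotals turns per-interval counts into the values a cumulative
// exporter would report at each read: the sum of everything so far.
func runningTotals(perInterval []int) []int {
	out := make([]int, len(perInterval))
	sum := 0
	for i, v := range perInterval {
		sum += v
		out[i] = sum
	}
	return out
}

// gaugeWindow adds up gauge reads for intervals from+1..to. A gauge read
// carries only its own interval's count, so a missed read is simply lost.
func gaugeWindow(perInterval []int, from, to int, missed map[int]bool) int {
	total := 0
	for i := from + 1; i <= to; i++ {
		if !missed[i] {
			total += perInterval[i]
		}
	}
	return total
}

// cumulativeWindow computes the same quantity from just two cumulative
// reads; reads in between are not needed, so missing them loses nothing.
func cumulativeWindow(cum []int, from, to int) int {
	return cum[to] - cum[from]
}

func main() {
	// Made-up successful-probe counts for five consecutive intervals.
	perInterval := []int{10, 12, 9, 11, 10}
	missed := map[int]bool{2: true} // the reader misses the read at index 2

	fmt.Println(gaugeWindow(perInterval, 1, 3, missed))             // 11
	fmt.Println(cumulativeWindow(runningTotals(perInterval), 1, 3)) // 20
}
```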

Example: In Prometheus, you would compute the success ratio and average latency from cumulative metrics with expressions like the following:

# success_ratio_1m
increase(success[1m]) / increase(total[1m])

# average_latency_1m
increase(latency[1m]) / increase(success[1m])
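The same arithmetic, done by hand on two reads of the cumulative metrics, looks like this in Go. It is a sketch: snapshot and rates are hypothetical names, and unlike Prometheus's increase() it does no extrapolation and does not handle counter resets.

```go
package main

import "fmt"

// snapshot is one read of the cumulative metrics for a single probe and target.
type snapshot struct {
	Total     float64
	Success   float64
	LatencyUs float64 // cumulative latency in microseconds
}

// rates mirrors the two expressions above: the success ratio is the increase
// in success over the increase in total, and the average latency is the
// increase in latency over the increase in success. Empty windows return 0.
func rates(prev, cur snapshot) (successRatio, avgLatencyUs float64) {
	dTotal := cur.Total - prev.Total
	dSuccess := cur.Success - prev.Success
	dLatency := cur.LatencyUs - prev.LatencyUs
	if dTotal > 0 {
		successRatio = dSuccess / dTotal
	}
	if dSuccess > 0 {
		avgLatencyUs = dLatency / dSuccess
	}
	return successRatio, avgLatencyUs
}

func main() {
	prev := snapshot{Total: 100, Success: 98, LatencyUs: 490000}
	cur := snapshot{Total: 112, Success: 108, LatencyUs: 550000}
	ratio, avg := rates(prev, cur)
	fmt.Printf("success ratio %.3f, average latency %.0f us\n", ratio, avg)
	// success ratio 0.833, average latency 6000 us
}
```

Between the two reads, 12 probes ran and 10 succeeded, so the ratio is 10/12; those 10 successes added 60000 us of latency, giving 6000 us on average.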

Probe Types

Cloudprober has built-in support for several probe types, including the following:

  • HTTP
  • External
  • Ping
  • DNS
  • UDP
  • TCP

More probe types can be added through Cloudprober extensions.

HTTP

The HTTP probe sends HTTP(S) requests to a target and verifies that a response is received. Apart from the core probe metrics (total, success, and latency), HTTP probes also export a map of response code counts (resp_code). By default, a request is marked successful as long as a response is received, regardless of the HTTP response code. You can change this behavior with validators: for example, you can add a validator that requires the status code to be in a certain range, or the response body to match a regex.

  • SSL Certificate Expiry: If the target serves an SSL certificate, Cloudprober walks the certificate chain and exports the earliest expiry time, in seconds, as a metric named ssl_earliest_cert_expiry_sec. This metric is only exported when the value is a positive number.

External

The external probe type runs arbitrary programs for probing, which is useful for running complex checks through Cloudprober. External probes are documented in more detail on the external probe page.

Ping

The ping probe type implements a fast, native ICMP ping prober that can probe hundreds of targets in parallel. Probe results are reported as the number of packets sent (total), packets received (success), and round-trip time (latency). It supports both privileged and unprivileged pings (the latter use ICMP datagram sockets).

Note that ICMP datagram sockets are not enabled by default on most Linux systems. You can enable them for users in groups 0 through 5000 by running the following command:

    sudo sysctl -w net.ipv4.ping_group_range="0 5000"
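A native ICMP prober builds its own echo requests. The Go sketch below shows how an ICMPv4 echo request (type 8, code 0) and its RFC 1071 Internet checksum are constructed; it is not Cloudprober's code, and checksum and echoRequest are hypothetical names. Actually sending the packet needs a raw socket or the ICMP datagram socket described above; each request sent would count toward total, and each matching echo reply toward success.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// checksum computes the RFC 1071 Internet checksum: the ones' complement
// of the ones' complement sum of 16-bit big-endian words.
func checksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(b[i])<<8 | uint32(b[i+1])
	}
	if len(b)%2 == 1 {
		sum += uint32(b[len(b)-1]) << 8 // pad an odd trailing byte with zero
	}
	for sum>>16 != 0 {
		sum = sum&0xffff + sum>>16 // fold carries back into the low 16 bits
	}
	return ^uint16(sum)
}

// echoRequest builds an ICMPv4 echo request with the given identifier,
// sequence number, and payload, then fills in the checksum field.
func echoRequest(id, seq uint16, payload []byte) []byte {
	msg := make([]byte, 8+len(payload))
	msg[0] = 8 // type: echo request
	msg[1] = 0 // code
	binary.BigEndian.PutUint16(msg[4:], id)
	binary.BigEndian.PutUint16(msg[6:], seq)
	copy(msg[8:], payload)
	binary.BigEndian.PutUint16(msg[2:], checksum(msg)) // checksum field was zero
	return msg
}

func main() {
	pkt := echoRequest(0x1234, 1, []byte("cloudprober"))
	fmt.Printf("% x\n", pkt)
	fmt.Println(checksum(pkt) == 0) // true: a valid packet checksums to zero
}
```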

DNS

As the name suggests, the DNS probe sends a DNS request to the target. This is useful for verifying that your DNS server, typically a critical infrastructure component (e.g., kube-dns), is working as expected.

UDP

The UDP probe sends a UDP packet to the configured targets. UDP probes (and all other probe types that use ports) provide more coverage of the network elements on the data path: most packet-forwarding elements use 5-tuple hashing, so using a new source port for each probe makes it likely that successive probes traverse different network elements.

TCP

The TCP probe verifies that a TCP connection can be established to the given target and port.