Getting Started

Installation

If you have Go installed, you can install cloudprober from source with the following command:

go install github.com/cloudprober/cloudprober/cmd/cloudprober@latest

Other methods:

Method              Instructions                                                        Platform
Brew                brew install cloudprober                                            MacOS, Linux
Docker image        docker run ghcr.io/cloudprober/cloudprober (other docker versions)  Docker
Helm chart          See here for instructions                                           Kubernetes
Pre-built binaries  Download from the releases page.                                    MacOS, Linux, Windows

See this page for how to access unreleased binaries.

Configuration

Without any config, cloudprober will run only the “sysvars” module (no probes) and write metrics to stdout in cloudprober’s line protocol format (to be documented). It will also start a Prometheus exporter at: http://localhost:9313 (you can change the default port through the environment variable CLOUDPROBER_PORT and the default listening address through the environment variable CLOUDPROBER_HOST).

Since the sysvars variables are not very interesting by themselves, let’s add a simple config that probes Google’s homepage:

# Write config to a file in /tmp
cat > /tmp/cloudprober.cfg <<EOF
probe {
  name: "google_homepage"
  type: HTTP
  targets {
    host_names: "www.google.com"
  }
  interval_msec: 5000  # 5s
  timeout_msec: 1000   # 1s
}
EOF

This config adds an HTTP probe that accesses the homepage of the target “www.google.com” every 5s with a timeout of 1s. Cloudprober configuration is specified in the text protobuf format, with config schema described by the proto file: config.proto.

Assuming that you saved this file at /tmp/cloudprober.cfg (following the command above), you can run cloudprober with this config file using the following command:

./cloudprober --config_file /tmp/cloudprober.cfg

To have the standard Docker image use this config, mount the file into the container with the following command:

docker run -v /tmp/cloudprober.cfg:/etc/cloudprober.cfg \
    ghcr.io/cloudprober/cloudprober

Note: When running on GCE, the cloudprober config can also be provided through a custom metadata attribute: cloudprober_config.

Verification

One quick way to verify that cloudprober got the correct config is to access the URL http://localhost:9313/config (with cURL or in a browser). It returns the config that cloudprober is using. You can also look at its current status at http://localhost:9313/status (replace localhost with the actual hostname if cloudprober is not running locally).

You should be able to see the generated metrics at http://localhost:9313/metrics (Prometheus format) and on stdout (cloudprober format):

cloudprober 15.. 1500590520 labels=ptype=http,probe=google-http,dst=.. total=17 success=17 latency=180835
cloudprober 15.. 1500590530 labels=ptype=sysvars,probe=sysvars hostname="manugarg-ws" uptime=100
cloudprober 15.. 1500590530 labels=ptype=http,probe=google-http,dst=.. total=19 success=19 latency=211644
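
For quick debugging it can help to pull the numbers out of these stdout lines. The line format is not formally documented, so the following is only a sketch inferred from the sample lines above. It assumes fields are separated by whitespace, the third field is a Unix timestamp, the labels= field holds comma-separated key=value pairs, every remaining field is a metric, and values contain no spaces. The function name parse_line is hypothetical and not part of cloudprober.

```python
def parse_line(line):
    """Parse one stdout metrics line into (timestamp, labels, metrics).

    Assumed layout (inferred from sample output, not from a spec):
    name, an id-like field, Unix timestamp, labels=..., then metrics.
    """
    fields = line.split()
    timestamp = int(fields[2])
    labels, metrics = {}, {}
    for field in fields[3:]:
        key, _, value = field.partition("=")
        if key == "labels":
            # labels=ptype=http,probe=... -> {"ptype": "http", ...}
            for pair in value.split(","):
                label, _, label_value = pair.partition("=")
                labels[label] = label_value
        else:
            # Metric values are kept as strings; quotes are stripped.
            metrics[key] = value.strip('"')
    return timestamp, labels, metrics


# First sample line from above, with its elided parts left as-is.
line = ("cloudprober 15.. 1500590520 "
        "labels=ptype=http,probe=google-http,dst=.. "
        "total=17 success=17 latency=180835")
timestamp, labels, metrics = parse_line(line)
print(timestamp, labels["probe"], metrics["total"])
```

Values are left as strings because the sample output mixes numbers with quoted strings such as the hostname; convert them with int() where a counter is expected.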

This information is good for debugging monitoring issues, but to really make sense of it, you’ll need to feed it to another monitoring system like Prometheus or StackDriver (see Surfacers for more details). Let’s set up a Prometheus and Grafana stack to make pretty graphs for us.

Running Prometheus

Download the Prometheus binary from its release page. You can use a config like the following to scrape a cloudprober instance running on the same host.

# Write config to a file in /tmp
cat > /tmp/prometheus.yml <<EOF
scrape_configs:
  - job_name: 'cloudprober'
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9313']
EOF

# Start prometheus:
./prometheus --config.file=/tmp/prometheus.yml

Prometheus provides a web interface at http://localhost:9090. You can explore probe metrics and build useful graphs through this interface. All probes in cloudprober export at least 3 counters:

  • total: Total number of probes.
  • success: Number of successful probes. The difference between total and success is the number of failures.
  • latency: Total (cumulative) probe latency.

Using these counters, probe failure ratio and average latency can be calculated as:

failure_ratio = (rate(total) - rate(success)) / rate(total)
avg_latency   = rate(latency) / rate(success)
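
As a worked example of these formulas, the sketch below applies them to the two google-http sample lines from the stdout output shown earlier. When every rate is taken over the same time window, the window length cancels out, so plain counter differences give the same ratios. The sketch assumes both sample lines refer to the same target (the dst label is elided in the sample), and the function names are illustrative only.

```python
def failure_ratio(total_delta, success_delta):
    """Fraction of probes that failed between two samples."""
    if total_delta == 0:
        return None  # no probes ran in the window
    return (total_delta - success_delta) / total_delta


def avg_latency(latency_delta, success_delta):
    """Mean latency per successful probe between two samples."""
    if success_delta == 0:
        return None  # no successful probes to average over
    return latency_delta / success_delta


# Counter values copied from the two google-http sample lines.
first = {"total": 17, "success": 17, "latency": 180835}
second = {"total": 19, "success": 19, "latency": 211644}
delta = {name: second[name] - first[name] for name in first}

print(failure_ratio(delta["total"], delta["success"]))  # 0.0
print(avg_latency(delta["latency"], delta["success"]))  # 15404.5
```

The latency result is in whatever unit the latency counter uses. Both functions return None when the denominator is zero, where the formulas are undefined.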

Assuming that Prometheus is running at localhost:9090, graphs depicting failure ratio and latency over time can be accessed in Prometheus at: this url. Even though Prometheus provides a graphing interface, Grafana provides a much richer interface and has excellent support for Prometheus.
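
The same expressions can also be evaluated through Prometheus's HTTP query API (/api/v1/query). The sketch below only builds the query URLs. It assumes the counters are exported under the names total, success, and latency, and it uses a 1m rate window, which is an arbitrary choice here; adjust both to match your setup. The helper name query_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed Prometheus address, matching the local setup described above.
PROMETHEUS = "http://localhost:9090"

# PromQL forms of the two formulas; metric names and the 1m window
# are assumptions, not values taken from the cloudprober docs.
FAILURE_RATIO = "(rate(total[1m]) - rate(success[1m])) / rate(total[1m])"
AVG_LATENCY = "rate(latency[1m]) / rate(success[1m])"


def query_url(expr, base=PROMETHEUS):
    """Build an instant-query URL for the given PromQL expression."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"


print(query_url(FAILURE_RATIO))
print(query_url(AVG_LATENCY))
```

Opening either URL with cURL or urllib.request.urlopen returns a JSON document containing the current value of the expression for each label set.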

Grafana

Grafana is a popular tool for building monitoring dashboards. It has native support for Prometheus, and thanks to Cloudprober’s own Prometheus support, it’s a breeze to build Grafana dashboards from Cloudprober’s probe results.

To get started with Grafana, follow the Grafana-Prometheus integration guide.