We overhauled Snowflake's monitoring at the network level and built our new monitoring stack on Cloudprober, which has since been adopted company-wide.
Catch outages before your users do.
Open-source active monitoring. Built at Google. Runs anywhere — from a Raspberry Pi to global fleets.
In production at Google, Tesla, Snowflake, Apple, DoorDash, Uber, Cloudflare, Walmart, Robinhood, Okta, Goldman Sachs, JPMorgan Chase, Hostinger, DigitalOcean, Disney+, Yahoo Japan, and more.
Built-in Probes
Probe every layer of your stack with built-ins. HTTP, gRPC, Browser, and Starlark Script probes test the surface your users actually touch — including full headless-browser checks that drive multi-step user flows. DNS, PING, TCP, and UDP probes verify the network underneath — name resolution, packet loss, connection refused, byte-level transport issues.
All built-ins share the same scheduling, target discovery, validators, and surfacers. Mix them in one config to monitor an application end-to-end — from DNS resolution, through the TCP handshake, to the HTTP response body.
Custom Probes & Extensibility
When the built-ins don't fit, three escape hatches at increasing depth: shell out to any binary (External), run inline logic in Starlark Script, or compile a Go Extension into a private build.
All three flow through the same scheduling, target discovery, validators, and surfacers — first-class citizens, not bolted on.
Observability, Alerts & Status
Every probe result has two destinations. Surfacers ship metrics where you already look — Prometheus, Datadog, CloudWatch, Google Cloud Monitoring, PostgreSQL, Pub/Sub, OpenTelemetry, plain stdout. No sidecars, no translation layers. Alert rules fire on failures, SLO breaches, or custom thresholds with delivery through PagerDuty, OpsGenie, Slack, email, or webhook.
And a built-in status dashboard shows live probe state per target — useful for incident response when your main observability stack may itself be down. No external dependencies; cloudprober becomes the source of truth for your synthetic checks.
Dynamic Templated Configs
Stop hand-writing thousands of probe definitions. Targets are discovered automatically — from Kubernetes, GCE, file-based lists, or any service you plug in — and probes are generated from templated configs.
Add a new region, a new service, or a new endpoint, and cloudprober picks it up on its next reload. Templates use Go text/template, with target metadata as variables, so a single probe block can adapt across hundreds of endpoints.
One binary. Thousands of probes.
Cloudprober runs as a single Go process. Each probe is a goroutine — thousands run concurrently with sub-millisecond scheduling overhead, sharing one in-memory metrics pipeline.
Stop running an agent per region, per cluster, or per service. One cloudprober instance handles probing for an entire fleet on a single CPU core. Less ops surface. Less infrastructure to maintain. Less to break.
1 process: single Go binary, no agent fleet
10k+ probes/sec per core (typical workload)
<1 ms scheduling overhead per probe
Goroutine per probe. One in-memory pipeline. No JVM, no Python, no sidecar.
Bring your own probe. First-class, not bolted on.
Three escape hatches, at increasing depth — all integrated into the same pipeline.
External shells out to any binary or script. Script runs Python-like Starlark inline in your config — multi-step, conditional, no rebuild step. Extension compiles a Go module into a private cloudprober binary for the deepest integration.
However you write it, your custom probe gets the same scheduling, target discovery, validators, and surfacers as anything built in.
probe {
  name: "checkout_flow"
  type: SCRIPT
  script_probe {
    starlark: """
      # Multi-step probe, no rebuild required
      def probe(target):
          r = http.get("https://%s/login" % target)
          assert.http_status(r, 200)
          r = http.post(
              "https://%s/login" % target,
              data = {"user": "test"},
          )
          cookies = r.cookies
          r = http.get(
              "https://%s/checkout" % target,
              cookies = cookies,
          )
          assert.contains(r.body, "Your Cart")
    """
  }
}
One binary, the whole stack.
Probes, target discovery, surfacers, alert rules, and a live status dashboard all live in the same process — driven by the same config. No alertmanager. No sidecar. No glue.
Add Prometheus, Grafana, PagerDuty, or any other tool you're already running, or skip them if you don't need them. The same setup grows from one VM to multi-region production without swapping in a "real" tool later.
Stories from production
Cloudprober does an excellent job running probes to determine the health of a system.
Cloudprober only does one thing — launches and measures probes. The workflow is designed to be simple and lightweight to keep resource usage low.
Hostinger Engineering
Cloudprober Explained: The Way We Use It · Sep 2021 · ~1.8M sites from a single instance
Also referenced in: Cloudflare, "Scaling with safety" (May 2025)
Start detecting failures before your users do.