# Script Probe (Starlark)
Cloudprober’s script probe (proto type STARLARK) lets you write
multi-step checks as a small script. The script is written in
Starlark – a sandboxed, Python-like
language with if/for/dicts/lists/string formatting – and runs in-process,
once per resolved target each interval. Wall time of the call becomes the
probe’s latency; a clean return is success, any unhandled error (including
assertion failures) is failure.
Think of it as “curl + jq + assertions, in a real scripting language”, without the deployment baggage of shipping a binary alongside Cloudprober. A typical use case is a chained API flow: get a token, list resources, fetch a detail page, validate the response, emit a custom latency metric for the critical step.
Why Starlark? It’s the sandboxed Python-like language behind Bazel and Buck. It gives you most of Python’s ergonomics (string formatting, dicts, comprehensions) with strong guarantees: no filesystem, no
exec, no module system, deterministic semantics, and cheap to embed – perfect for short scripts that ship inside a config file.
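As a taste of that Python-like core, the snippet below (hosts and URLs are made up for illustration) is valid in both Starlark and Python:

```python
# Dict comprehension + %-formatting: the bread and butter of probe scripts.
hosts = ["alpha", "beta"]
urls = {h: "https://%s.example.com/health" % h for h in hosts}
print(urls["alpha"])  # https://alpha.example.com/health
```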
## When to use it
| If you want to… | Use |
|---|---|
| Probe a single URL | HTTP probe |
| Run an arbitrary external binary, in any language | External probe |
| Drive a real browser through user flows | Browser probe |
| Chain a few HTTP requests, assert on responses, emit custom metrics | Script probe (this page) |
The script probe sits in the gap between the HTTP probe (which is one request, no logic) and the external probe (which is a whole binary you have to build, ship, and version). For most “test an API end-to-end” checks, that gap is exactly where you want to be.
## Quick Start
A small token-auth API check (full example in examples/starlark/):
```python
# trading_api.star
def probe(target):
    base = "http://%s:%d" % (target.name, target.port)

    # 1. Auth.
    r = http.post(
        url = base + "/api-token-auth/",
        json = {"username": "demo", "password": "demo"},
    )
    assert.http_status(r, 200)
    token = r.json()["token"]
    auth = {"Authorization": "Token " + token}

    # 2. Accounts.
    r = http.get(url = base + "/accounts/", headers = auth)
    assert.http_status(r, 200)
    accounts = r.json()["results"]
    if len(accounts) == 0:
        fail("no accounts returned")

    # 3. Portfolio for first account.
    account = accounts[0]["account_number"]
    r = http.get(url = base + "/portfolios/%s/" % account, headers = auth)
    assert.http_status(r, 200)
    print("equity for %s: %s" % (account, r.json()["equity"]))
```
```
# cloudprober.cfg
probe {
  name: "trading_api_flow"
  type: STARLARK
  targets {
    host_names: "127.0.0.1:8080"
  }
  interval_msec: 5000
  timeout_msec: 3000
  starlark_probe {
    source_file: "trading_api.star"
  }
}
```
Cloudprober calls `probe(target)` every 5 seconds. Each `assert.http_status` raises
on mismatch, ending the run as a failure. Without any extra plumbing, the
`total`, `success`, and `latency` metrics flow to your surfacers exactly like
any other probe's.
## The target argument
The entry point receives a single argument exposing fields from the resolved target endpoint:
| Attribute | Type | Notes |
|---|---|---|
| `target.name` | string | Host name or IP from discovery. |
| `target.port` | int | 0 if no port. |
| `target.ip` | string | Resolved IP (`""` if not yet resolved). |
| `target.labels` | dict[str,str] | Frozen; lookups via `target.labels.get("env", "prod")`. |
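A minimal sketch of using those fields together. The `target` here is a stand-in record (the real one is passed in by Cloudprober), and the choice of scheme by `env` label is purely illustrative:

```python
from types import SimpleNamespace

# Stand-in for the target object Cloudprober passes to the entry point.
target = SimpleNamespace(
    name="api.example.com", port=8443, ip="10.0.0.5",
    labels={"env": "staging"},
)

def base_url(target):
    # Hypothetical convention: non-prod targets get plain HTTP.
    scheme = "https" if target.labels.get("env", "prod") == "prod" else "http"
    if target.port:
        return "%s://%s:%d" % (scheme, target.name, target.port)
    return "%s://%s" % (scheme, target.name)

print(base_url(target))  # http://api.example.com:8443
```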
## Available builtins

All scripts get a small, fixed set of builtins. There is no filesystem access,
no network beyond `http`, and no exec – the probe is sandboxed by Starlark.
| Builtin | Purpose |
|---|---|
| `http.get(url, headers=None)` | HTTP GET, returns a Response. |
| `http.post(url, headers=None, body=None, json=None)` | HTTP POST. Pass `json=` for an auto-encoded JSON body (sets Content-Type), or `body=` for a raw string/bytes. |
| `assert.http_status(response, expected)` | Fails the probe if `response.status != expected`. |
| `vars.get(name, default=None)` | Read values from the probe's `vars` config map (see below). |
| `state.get(key, default=None)` / `state.set(key, value)` | Per-target key-value store that persists across runs (see below). |
| `log.info(msg)` / `log.warn(msg)` / `log.error(msg)` / `log.debug(msg)` | Route a message to Cloudprober's logger with the probe's target attribute attached. |
| `print_metric(line)` | Emit a custom metric line (see below). |
| `print(...)` | Standard Starlark print; routed to the logger at INFO. |
| `fail(msg)` | Standard Starlark; ends the run as a failure. |
### Response

```python
r = http.get(url = "https://example.com/")
r.status   # int, e.g. 200
r.headers  # dict[str,str]; multi-valued headers are ", "-joined
r.body     # bytes
r.json()   # parsed JSON (raises on parse error)
```
### vars
vars.get reads from the probe’s static config map – useful for passing
non-secret config (an environment name, an API base URL, a feature flag)
without rebuilding the script:
```
probe {
  type: STARLARK
  starlark_probe {
    source_file: "checkout.star"
    vars { key: "api_base" value: "https://api.staging.example.com" }
    vars { key: "feature_flag" value: "fast_checkout" }
  }
}
```
```python
def probe(target):
    base = vars.get("api_base", "https://api.example.com")
    ...
```
For host environment values, use Cloudprober’s config-loading template layer
(e.g. `vars { key: "api_key" value: "{{ envVar \"API_KEY\" }}" }`).
### state
state is a per-(probe, target) dictionary that survives across runs. Use
it for things like remembering the last-seen ETag, a paging cursor, or a
rate-limit countdown:
```python
def probe(target):
    last_id = state.get("last_id", 0)
    r = http.get(url = "http://%s/events?since=%d" % (target.name, last_id))
    assert.http_status(r, 200)
    events = r.json()["events"]
    if events:
        state.set("last_id", events[-1]["id"])
```
The bucket lives only as long as the target does: when a target disappears from discovery, its state is dropped. Each bucket holds up to 1024 keys.
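The rate-limit countdown mentioned above can be sketched like this. A plain dict stands in for the `state` builtin so the logic can be exercised anywhere, and the 3-run backoff is an arbitrary choice for illustration:

```python
state = {}  # stand-in for the per-target `state` builtin

def handle_status(status):
    # Still backing off from an earlier 429: skip this run's expensive work.
    remaining = state.get("backoff", 0)
    if remaining > 0:
        state["backoff"] = remaining - 1
        return "skipped"
    if status == 429:
        # Rate limited: sit out the next 3 runs.
        state["backoff"] = 3
        return "rate_limited"
    return "checked"
```

A 429 response sets the counter; the next three runs return `"skipped"` before normal checking resumes.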
## Custom metrics with `print_metric`
`print_metric(line)` accepts the same payload-format strings as the external
probe’s stdout protocol. The simplest form is `name value`:

```python
print_metric("items_in_cart 5")
print_metric('checkout_latency_ms{flow="guest"} 234.5')
```
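Composing those lines inline works fine; for probes that emit several labeled metrics, a tiny helper (our own, not a builtin) keeps the formatting consistent:

```python
def metric_line(name, value, labels=None):
    # Build a `name value` or `name{k="v",...} value` payload line.
    if not labels:
        return "%s %s" % (name, value)
    pairs = ",".join(['%s="%s"' % (k, v) for k, v in sorted(labels.items())])
    return "%s{%s} %s" % (name, pairs, value)

print(metric_line("items_in_cart", 5))
# items_in_cart 5
print(metric_line("checkout_latency_ms", 234.5, {"flow": "guest"}))
# checkout_latency_ms{flow="guest"} 234.5
```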
Configure how those lines are interpreted – gauge vs. cumulative,
distribution buckets, in-Cloudprober aggregation – with
`output_metrics_options` on the probe:
```
probe {
  type: STARLARK
  starlark_probe {
    source_file: "checkout.star"
    output_metrics_options {
      aggregate_in_cloudprober: true
      dist_metric {
        key: "checkout_latency_ms"
        value {
          explicit_buckets: "10,50,100,250,500,1000,5000"
        }
      }
    }
  }
}
```
The configuration knobs (kind, additional labels, JSON/header metrics, distribution buckets) are exactly those of the external probe's `output_metrics_options`; see that section for a worked example.
Metrics are dispatched as each line is emitted – if a later assertion
fails, every `print_metric` line before it still surfaces.
## Configuration reference
See the generated config reference for the full schema. The most-used fields:
| Field | Default | Description |
|---|---|---|
| `source` / `source_file` | – | Exactly one of these. Inline source or path to a `.star` file. |
| `entry_point` | `"probe"` | Function to call each run. Must take one argument (`target`). |
| `vars` | – | `map<string,string>` exposed to the script via `vars.get`. |
| `output_metrics_options` | – | How `print_metric` lines are parsed – kind, distributions, aggregation, etc. |
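For a sense of how `source` and `entry_point` combine, here is a hypothetical minimal probe with an inline script (the name and URL path are illustrative):

```
probe {
  name: "inline_check"
  type: STARLARK
  targets { host_names: "example.com" }
  starlark_probe {
    entry_point: "check"
    source: "def check(target):\n  r = http.get(url = \"https://%s/\" % target.name)\n  assert.http_status(r, 200)\n"
  }
}
```

Inline `source` is handy for one-liners; anything longer is easier to maintain in a `.star` file.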
## Lifecycle and concurrency
- Module-level code (top-level `def` statements and assignments) runs once at probe load. Helper functions and constants defined there are reused on every call.
- The entry-point function is called once per target per interval. Each call runs on its own Starlark thread; module globals are frozen after load, so you cannot accumulate cross-run state in a module-level dict – use `state` instead.
- The probe timeout (`timeout_msec`) bounds each call: an in-flight `http.get` is cancelled, and pure-Starlark loops are interrupted at the next function call or branch.