# Script Probe (Starlark)

Cloudprober’s script probe (proto type STARLARK) lets you write multi-step checks as a small script. The script is written in Starlark – a sandboxed, Python-like language with if/for/dicts/lists/string formatting – and runs in-process, once per resolved target each interval. Wall time of the call becomes the probe’s latency; a clean return is success, any unhandled error (including assertion failures) is failure.

Think of it as “curl + jq + assertions, in a real scripting language”, without the deployment baggage of shipping a binary alongside Cloudprober. A typical use case is a chained API flow: get a token, list resources, fetch a detail page, validate the response, emit a custom latency metric for the critical step.

Why Starlark? It’s the sandboxed Python-like language behind Bazel and Buck. It gives you most of Python’s ergonomics (string formatting, dicts, comprehensions) with strong guarantees: no filesystem, no exec, no module system, deterministic semantics, and cheap to embed – perfect for short scripts that ship inside a config file.

## When to use it

| If you want to… | Use |
|---|---|
| Probe a single URL | HTTP probe |
| Run an arbitrary external binary, in any language | External probe |
| Drive a real browser through user flows | Browser probe |
| Chain a few HTTP requests, assert on responses, emit custom metrics | Script probe (this page) |

The script probe sits in the gap between the HTTP probe (which is one request, no logic) and the external probe (which is a whole binary you have to build, ship, and version). For most “test an API end-to-end” checks, that gap is exactly where you want to be.

## Quick Start

A small token-auth API check (full example in examples/starlark/):

```python
# trading_api.star
def probe(target):
    base = "http://%s:%d" % (target.name, target.port)

    # 1. Auth.
    r = http.post(
        url = base + "/api-token-auth/",
        json = {"username": "demo", "password": "demo"},
    )
    assert.http_status(r, 200)
    token = r.json()["token"]
    auth = {"Authorization": "Token " + token}

    # 2. Accounts.
    r = http.get(url = base + "/accounts/", headers = auth)
    assert.http_status(r, 200)
    accounts = r.json()["results"]
    if len(accounts) == 0:
        fail("no accounts returned")

    # 3. Portfolio for first account.
    account = accounts[0]["account_number"]
    r = http.get(url = base + "/portfolios/%s/" % account, headers = auth)
    assert.http_status(r, 200)
    print("equity for %s: %s" % (account, r.json()["equity"]))
```

```textproto
# cloudprober.cfg
probe {
  name: "trading_api_flow"
  type: STARLARK
  targets {
    host_names: "127.0.0.1:8080"
  }
  interval_msec: 5000
  timeout_msec: 3000
  starlark_probe {
    source_file: "trading_api.star"
  }
}
```

Cloudprober calls `probe(target)` every 5 seconds. Each `assert.http_status` raises on mismatch, ending the run as a failure – without any extra plumbing, `total`, `success`, and `latency` metrics flow to your surfacers exactly like any other probe.

## The target argument

The entry point receives a single argument exposing fields from the resolved target endpoint:

| Attribute | Type | Notes |
|---|---|---|
| `target.name` | string | Host name or IP from discovery. |
| `target.port` | int | 0 if no port. |
| `target.ip` | string | Resolved IP (`""` if not yet resolved). |
| `target.labels` | dict[str,str] | Frozen; lookups via `target.labels.get("env", "prod")`. |
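These fields compose like any other Starlark values. As an illustration, a sketch of a URL helper built on them – `base_url` and the `scheme` label convention are hypothetical, not part of the probe API:

```python
# Sketch only: a helper built on the target fields documented above.
# base_url and the "scheme" label are hypothetical conventions.
def base_url(target):
    port = target.port if target.port != 0 else 80  # 0 means "no port"
    scheme = target.labels.get("scheme", "http")
    return "%s://%s:%d" % (scheme, target.name, port)
```

The same code runs unchanged as Python, which makes helpers like this easy to unit-test outside Cloudprober.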

## Available builtins

All scripts get a small, fixed set of builtins. There is no filesystem access, no network beyond `http`, and no exec – the probe is sandboxed by Starlark.

| Builtin | Purpose |
|---|---|
| `http.get(url, headers=None)` | HTTP GET, returns a Response. |
| `http.post(url, headers=None, body=None, json=None)` | HTTP POST. Pass `json=` for an auto-encoded JSON body (sets `Content-Type`), or `body=` for a raw string/bytes. |
| `assert.http_status(response, expected)` | Fails the probe if `response.status != expected`. |
| `vars.get(name, default=None)` | Read values from the probe's `vars` config map (see below). |
| `state.get(key, default=None)` / `state.set(key, value)` | Per-target key-value store that persists across runs (see below). |
| `log.info(msg)` / `log.warn(msg)` / `log.error(msg)` / `log.debug(msg)` | Route a message to Cloudprober's logger with the probe's target attribute attached. |
| `print_metric(line)` | Emit a custom metric line (see below). |
| `print(...)` | Standard Starlark print; routed to the logger at INFO. |
| `fail(msg)` | Standard Starlark; ends the run as a failure. |

### Response

```python
r = http.get(url = "https://example.com/")
r.status     # int, e.g. 200
r.headers    # dict[str,str]; multi-valued headers are ", "-joined
r.body       # bytes
r.json()     # parsed JSON (raises on parse error)
```
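Since `r.json()` returns ordinary Starlark dicts and lists, response validation is plain code. For example, a sketch of a field checker – `require_fields` is a hypothetical helper, not a builtin; `fail` is the builtin documented above:

```python
# Sketch: validate that a parsed response body has the fields we need.
# require_fields is a hypothetical helper, not a builtin.
def require_fields(data, keys):
    missing = [k for k in keys if k not in data]
    if missing:
        fail("response missing fields: %s" % ", ".join(missing))
    return data
```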

### vars

vars.get reads from the probe’s static config map – useful for passing non-secret config (an environment name, an API base URL, a feature flag) without rebuilding the script:

```textproto
probe {
  type: STARLARK
  starlark_probe {
    source_file: "checkout.star"
    vars { key: "api_base" value: "https://api.staging.example.com" }
    vars { key: "feature_flag" value: "fast_checkout" }
  }
}
```

```python
def probe(target):
    base = vars.get("api_base", "https://api.example.com")
    ...
```

For host environment values, use Cloudprober's config-loading template layer (e.g. `vars { key: "api_key" value: "{{ envVar \"API_KEY\" }}" }`).

### state

state is a per-(probe, target) dictionary that survives across runs. Use it for things like remembering the last-seen ETag, a paging cursor, or a rate-limit countdown:

```python
def probe(target):
    last_id = state.get("last_id", 0)
    r = http.get(url = "http://%s/events?since=%d" % (target.name, last_id))
    assert.http_status(r, 200)
    events = r.json()["events"]
    if events:
        state.set("last_id", events[-1]["id"])
```

The bucket lives only as long as the target does: when a target disappears from discovery, its state is dropped. Each bucket holds up to 1024 keys.
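Read-modify-write patterns like the paging cursor are easiest to keep correct when the update logic is a pure function. A sketch – `advance_cursor` is a hypothetical helper, not part of the API:

```python
# Sketch: compute the next cursor value from an events payload.
# advance_cursor is a hypothetical helper, not a builtin.
def advance_cursor(last_id, events):
    if not events:
        return last_id  # nothing new; keep the old cursor
    return events[-1]["id"]
```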

## Custom metrics with `print_metric`

`print_metric(line)` accepts the same payload-format strings as the external probe's stdout protocol. The simplest form is `name value`:

```python
print_metric("items_in_cart 5")
print_metric('checkout_latency_ms{flow="guest"} 234.5')
```
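Label sets can also be composed programmatically before emitting. A sketch of a line formatter in the same payload format – `format_metric` is a hypothetical helper, not a builtin:

```python
# Sketch: build a payload-format metric line with sorted labels.
# format_metric is a hypothetical helper, not a builtin.
def format_metric(name, value, labels = None):
    if not labels:
        return "%s %s" % (name, value)
    pairs = ",".join(['%s="%s"' % (k, labels[k]) for k in sorted(labels)])
    return "%s{%s} %s" % (name, pairs, value)
```

Sorting the labels keeps the emitted line stable across runs, which makes metric streams easier to diff and aggregate.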

Configure how those lines are interpreted – gauge vs. cumulative, distribution buckets, in-Cloudprober aggregation – with output_metrics_options on the probe:

```textproto
probe {
  type: STARLARK
  starlark_probe {
    source_file: "checkout.star"
    output_metrics_options {
      aggregate_in_cloudprober: true
      dist_metric {
        key: "checkout_latency_ms"
        value {
          explicit_buckets: "10,50,100,250,500,1000,5000"
        }
      }
    }
  }
}
```

The configuration knobs (kind, additional labels, JSON / header metrics, distribution buckets) are exactly those used by the external probe output_metrics_options; see that section for a worked example.

Metrics are dispatched as the line is emitted – if a later assertion fails, every print_metric line before it still surfaces.

## Configuration reference

See the generated config reference for the full schema. The most-used fields:

| Field | Default | Description |
|---|---|---|
| `source` / `source_file` | – | Exactly one of these. Inline source or path to a `.star` file. |
| `entry_point` | `"probe"` | Function to call each run. Must take one argument (`target`). |
| `vars` | – | `map<string,string>` exposed to the script via `vars.get`. |
| `output_metrics_options` | – | How `print_metric` lines are parsed – kind, distributions, aggregation, etc. |
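For very short checks, `source` lets the script live inline in the config instead of a separate file. A sketch, with illustrative values:

```textproto
probe {
  name: "inline_hello"
  type: STARLARK
  targets { host_names: "example.com" }
  interval_msec: 10000
  starlark_probe {
    # Inline Starlark source; mutually exclusive with source_file.
    source: "def probe(target):\n    log.info('hello from %s' % target.name)\n"
  }
}
```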

## Lifecycle and concurrency

  • Module-level code (top-level `def`s and assignments) runs once at probe load. Helper functions and constants defined there are reused on every call.
  • The entry-point function is called once per target per interval. Each call runs on its own Starlark thread; module globals are frozen after load, so you cannot accumulate cross-run state in a module-level dict – use `state` instead.
  • The probe timeout (`timeout_msec`) bounds each call: an in-flight `http.get` is cancelled, and pure-Starlark loops are interrupted at the next function call or branch.