API Monitoring Guide
Get alerted when the Cloudflare API has issues before your deploys fail, DNS changes silently stall, or your edge starts returning 5xx. Know the Cloudflare API status in real time, not 15 minutes later.
Cloudflare sits in front of roughly a fifth of the web — DNS, CDN, WAF, Workers, and Access all run on it. When the control-plane API at api.cloudflare.com/client/v4 degrades, your Terraform and CI/CD pipelines fail, DNS automations stall, and certificate or rule changes don't apply. When the edge has a regional incident, your visitors see errors directly.
This guide covers everything you need to monitor Cloudflare API status: which endpoints to track, how the 1,200-per-5-minutes rate limit works, how to tell a Cloudflare problem apart from an origin problem, and what to do when something breaks during a deploy.
Cloudflare maintains a detailed status page at cloudflarestatus.com, broken down by product and by data center (colo). It's one of the better status pages in the industry — but relying on it as your only source of truth still has gaps.
External synthetic monitoring catches Cloudflare API and edge issues in 1-2 minutes, not 10-15. That head start lets you pause a deploy pipeline, fail over DNS, or roll back a Worker before the impact spreads.
"Cloudflare API" can mean several different things, and they fail independently. Knowing which surface you depend on tells you what to monitor.
Prefer scoped API tokens (sent as Authorization: Bearer) over the legacy Global API Key. For monitoring, mint a read-only token scoped to "Read" on Zone and User, and use the /user/tokens/verify endpoint — it confirms both API health and that the token is still valid.
Cloudflare API responses are JSON wrapped in a standard envelope with success, errors, and result fields. A 200 with "success": false is still a failure — validate the body, not just the status code.
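A minimal sketch of the token-verify check described above, using only the Python standard library. The helper names (build_verify_request, token_is_active, check_token) are hypothetical, and the sketch assumes the result object carries the "status": "active" field mentioned in this guide.

```python
import json
import urllib.request

VERIFY_URL = "https://api.cloudflare.com/client/v4/user/tokens/verify"


def build_verify_request(token: str) -> urllib.request.Request:
    """Build the GET request with the scoped token sent as a Bearer header."""
    return urllib.request.Request(
        VERIFY_URL,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def token_is_active(body: str) -> bool:
    """Validate the envelope, not just the status code."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # an HTML error page or truncated body is a failure
    if not isinstance(payload, dict) or payload.get("success") is not True:
        return False
    result = payload.get("result")
    return isinstance(result, dict) and result.get("status") == "active"


def check_token(token: str, timeout: float = 10.0) -> bool:
    """Live check: any HTTP error, network error, or bad body counts as unhealthy."""
    try:
        request = build_verify_request(token)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return token_is_active(response.read().decode("utf-8"))
    except (OSError, ValueError):  # URLError/HTTPError/timeouts are OSError subclasses
        return False
```

Calling check_token("YOUR_TOKEN") performs the real request; the other two helpers can be exercised offline against saved responses.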
Each endpoint below tests a different layer. Monitoring several of them lets you pinpoint whether a problem is auth, the control plane, the edge, or Workers.
GET https://api.cloudflare.com/client/v4/user/tokens/verify
Token verification. The cheapest, safest health check: it confirms the API is up and that your token is active and not revoked. Returns "status": "active" in the result. Make this your primary Cloudflare health monitor.
GET https://api.cloudflare.com/client/v4/zones?per_page=1
Zone list. Confirms read access to your zones works — the dependency behind most DNS, rule, and certificate automation. Use ?per_page=1 to keep the response tiny and stay well under the rate limit.
GET https://your-app.example.com/health
A proxied hostname behind Cloudflare. This is what your visitors actually hit, so it tests the CDN edge and your origin together. Watch for Cloudflare-branded 5xx errors (520-524), which signal edge or origin problems even when the API is fine.
GET https://your-worker.your-subdomain.workers.dev/
A deployed Worker route. Validate the response body to catch 1101 (Worker threw an exception) and 1102 (CPU/memory limit) errors, which return as Cloudflare error pages rather than clean status codes.
GET https://cloudflare-dns.com/dns-query?name=example.com&type=A
DNS over HTTPS resolver (requires the Accept: application/dns-json header). Optional, but useful if you depend on 1.1.1.1 / DoH for resolution. Tests Cloudflare's public resolver independently of your zones.
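For the DoH check, body validation might look like the sketch below. It assumes the application/dns-json response carries a numeric "Status" field (0 meaning no error) and an "Answer" list whose entries have "type" and "data" fields; confirm this against a live response before relying on it. The function name doh_answers is hypothetical.

```python
import json
from typing import List


def doh_answers(body: str, record_type: int = 1) -> List[str]:
    """Return record data for the given type (1 = A); empty list means failure or no records."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict) or payload.get("Status") != 0:
        return []  # non-zero status means the resolver reported an error
    answers = payload.get("Answer") or []
    return [
        entry["data"]
        for entry in answers
        if isinstance(entry, dict) and entry.get("type") == record_type and "data" in entry
    ]
```

An empty list for a name that should resolve is the condition to alert on.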
Rate limit note: The global API limit is 1,200 requests per 5-minute window per user, shared across the dashboard, keys, and tokens. A monitor running every minute against three endpoints uses ~15 requests per window — a tiny fraction. Keep automation and monitoring on separate tokens so a runaway script can't starve your health checks.
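The arithmetic behind that estimate, as a small sketch. The constants come from the limit quoted above; the function names are illustrative.

```python
import math

WINDOW_SECONDS = 300   # five-minute window
GLOBAL_LIMIT = 1200    # requests per window, per user


def requests_per_window(endpoints: int, interval_seconds: int) -> int:
    """Worst-case number of monitoring requests that land in one window."""
    checks_per_endpoint = math.ceil(WINDOW_SECONDS / interval_seconds)
    return endpoints * checks_per_endpoint


def budget_share(endpoints: int, interval_seconds: int) -> float:
    """Fraction of the global limit consumed by monitoring alone."""
    return requests_per_window(endpoints, interval_seconds) / GLOBAL_LIMIT
```

Three endpoints checked every 60 seconds come to 15 requests per window, or 1.25% of the budget.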
Cloudflare's rate limiting can make a healthy API look like an outage from your application's side. Understanding it is essential for both robust automation and accurate alerting.
The core limit is 1,200 requests per five-minute window, per user, counted cumulatively across every method of access — dashboard clicks, API keys, and API tokens all draw from the same bucket. Exceed it and Cloudflare returns HTTP 429 and blocks all your API calls until the window resets.
The Ratelimit header shows remaining quota and seconds until reset.
Some endpoints layer on stricter limits beyond the global cap — for example, certain analytics, GraphQL, and Logpull endpoints. Check the Ratelimit-Policy header to see which policy applied to a given response.
A handful of lightweight GET checks per minute is negligible against a 1,200/5-min budget. The real risk is a Terraform plan, bulk DNS import, or buggy script burning the budget — which then 429s your monitors too. Isolating monitoring on its own token keeps your health signal clean.
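Automation that shares the budget can read the rate-limit headers and slow down before it hits a 429. The sketch below assumes the header value looks like "default";r=50;t=30 (a quoted policy name followed by key=value pairs, with r as remaining requests and t as seconds until reset). That format is an assumption here, so check it against a real response; parse_ratelimit and should_slow_down are hypothetical names.

```python
from typing import Dict, Union


def parse_ratelimit(value: str) -> Dict[str, Union[str, int]]:
    """Split a header such as '"default";r=50;t=30' into a policy name and integer fields."""
    parts = [part.strip() for part in value.split(";")]
    parsed: Dict[str, Union[str, int]] = {"policy": parts[0].strip('"')}
    for part in parts[1:]:
        key, separator, raw = part.partition("=")
        if separator and raw.strip().isdigit():
            parsed[key.strip()] = int(raw)
    return parsed


def should_slow_down(header_value: str, floor: int = 100) -> bool:
    """True when the remaining quota (r) has dropped below the chosen floor."""
    remaining = parse_ratelimit(header_value).get("r")
    return isinstance(remaining, int) and remaining < floor
```

The floor of 100 is an arbitrary safety margin; pick one that leaves room for your health checks.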
An HTTP 429 means you are being throttled, not that Cloudflare is down, so respect the Retry-After header. A 503 or sustained timeouts point to an actual API or edge incident. UptimeSignal records both status codes and response times so you can tell throttling apart from a real outage at a glance.
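A sketch of how a client might honor Retry-After on a 429 while treating other failures differently. It handles only the numeric-seconds form of the header and falls back to a default otherwise; retry_delay is a hypothetical helper.

```python
from typing import Mapping, Optional


def retry_delay(status_code: int, headers: Mapping[str, str], default: float = 60.0) -> Optional[float]:
    """Return seconds to wait when throttled, or None when the response is not a 429."""
    if status_code != 429:
        return None  # not throttling; handle as success or as a real failure
    # Header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("retry-after", "").strip()
    if raw.isdigit():
        return float(raw)
    return default  # header missing or in a non-numeric form (for example an HTTP date)
```

A caller would sleep for the returned delay before retrying, and raise an alert only when the value is None and the status indicates a failure.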
Your infrastructure runs on Cloudflare. Monitor it yourself.
Get alerted when the client/v4 API, your edge, or your Workers start failing — before a deploy stalls or visitors see 5xx. Free for 25 endpoints, checks every 5 minutes.
Knowing the common failure patterns helps you configure monitoring that catches real problems and avoids false alerts.
Tokens can be deleted, hit an expiry date, or lose permissions when roles change. Your monitor catches this as a 401 or 403 with "success": false, letting you rotate the token before automation goes silent.
521 (origin down), 522 (connection timed out), 523 (origin unreachable), and 524 (origin timeout) mean Cloudflare is up but your origin isn't responding. Monitoring a proxied hostname surfaces these so you can fix the origin, not chase a phantom Cloudflare incident.
A Worker that throws (1101) or exceeds CPU/memory limits (1102) returns a Cloudflare error page, normally with a 5xx status. Body validation on your Worker route also covers the case a status-only check can miss: an error page served with a 200.
A bulk operation or runaway script can burn the 1,200/5-min budget and 429 every subsequent call, including your CI/CD. Watching for 429s on the API endpoints tells you it's a quota problem, not an outage.
Cloudflare's edge is hundreds of colos; a regional incident can raise latency or errors for some visitors while global status stays green. Track response times on your proxied hostnames and alert on sustained latency spikes, not just hard failures.
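One way to alert on sustained latency rather than a single slow response is to require several consecutive checks above a threshold. This is a minimal sketch; the class name, the threshold, and the window of three checks are illustrative choices, not values from any particular tool.

```python
from collections import deque


class LatencyMonitor:
    """Fires only when the last N response times all exceed the threshold."""

    def __init__(self, threshold_ms: float, sustained_checks: int = 3) -> None:
        self.threshold_ms = threshold_ms
        self.recent = deque(maxlen=sustained_checks)  # keeps only the newest N samples

    def record(self, latency_ms: float) -> bool:
        """Store a sample and return True when the spike is sustained."""
        self.recent.append(latency_ms)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(sample > self.threshold_ms for sample in self.recent)
```

A single fast response resets the streak, which filters out one-off blips while still catching a regional slowdown within a few checks.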
In the Cloudflare dashboard, go to My Profile → API Tokens → Create Token. Use a custom token with read-only permissions: User → User Details → Read and Zone → Zone → Read. Copy the token once — it isn't shown again.
Security tip: Never use the Global API Key for monitoring. A dedicated read-only token can't change anything if it leaks.
Sign up at app.uptimesignal.io and add a new HTTP monitor:
Body validation on "active" catches a 200 envelope with "success": false.
Repeat for the zone-list endpoint (control plane), a proxied hostname (edge + origin), and a Worker route if you run Workers. Monitoring each layer separately tells you exactly what's failing.
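For the Worker route, body validation is most robust when it looks for something your Worker is known to return, rather than trying to match the text of an error page. The sketch below assumes your Worker includes a known marker string in a healthy response; both the marker and the function name are hypothetical.

```python
def worker_check(status_code: int, body: str, expected_marker: str) -> str:
    """Classify a Worker response as 'ok', 'bad_status', or 'unexpected_body'."""
    if status_code != 200:
        return "bad_status"
    if expected_marker not in body:
        # A 200 that lacks the marker may be an error page or a broken deploy.
        return "unexpected_body"
    return "ok"
```

For example, if the Worker's health route returns {"worker": "ok"}, use "ok" wrapped in that JSON key as the marker so that an HTML error page can never match.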
Route alerts to the people who can act:
For release windows and DNS migrations, use 1-minute checks (Pro plan) so you catch a bad change or edge regression within a minute instead of five.
Configure your monitors to alert on these conditions:
Cloudflare wraps responses in {"success": ..., "errors": [...], "result": ...}. A 200 with "success": false is a failure. Validate for "success":true or the "active" status on token verify so you catch envelope-level errors a status check would miss.
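A sketch of envelope validation that also surfaces the error details for the alert message. It assumes each entry in "errors" is an object with "code" and "message" fields; the function name and the sample values in the usage note are illustrative.

```python
import json
from typing import List


def envelope_errors(status_code: int, body: str) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    if not 200 <= status_code < 300:
        return [f"HTTP {status_code}"]
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return ["body is not JSON"]
    if not isinstance(payload, dict):
        return ["body is not a JSON object"]
    if payload.get("success") is True:
        return []
    errors = payload.get("errors") or []
    details = [
        f"{entry.get('code')}: {entry.get('message')}"
        for entry in errors
        if isinstance(entry, dict)
    ]
    return details or ["success is not true"]
```

A 200 response whose body is {"success": false, "errors": [{"code": 1000, "message": "example"}]} yields ["1000: example"], which can go straight into the alert text.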
When monitoring alerts you to a Cloudflare problem, here's how to respond and limit the blast radius.
Check cloudflarestatus.com by component and region. Determine whether it's auth (401/403), rate limit (429), a control-plane API incident, or an edge/origin error (520-524). Each needs a different response.
If the control-plane API is degraded, pause Terraform runs, DNS sync jobs, and CI/CD steps that call the API. Half-applied changes during an incident are worse than no changes — let it stabilize before retrying.
If visitors are seeing 5xx but the API is fine, the issue is the edge or your origin. Confirm your origin is healthy, and if it's a localized edge incident, decide whether to temporarily grey-cloud the DNS record (switch it to DNS-only) to bypass the proxy.
If the problem started right after a Worker deploy or rule change, roll it back first. Many "Cloudflare outages" are actually a Worker exception or a misconfigured rule you just shipped.
Update your status page and notify dependent teams. If it's an upstream Cloudflare incident, link the official status post so support tickets point to the right place.
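The triage categories from the first response step (auth, rate limit, control-plane incident, edge/origin error) can be sketched as a simple status-code classifier. The category labels are illustrative, and None stands in for a timeout or connection failure.

```python
from typing import Optional


def classify(status_code: Optional[int]) -> str:
    """Map a check result to the kind of response it calls for."""
    if status_code is None:
        return "timeout_or_network"   # sustained timeouts suggest a real incident
    if status_code in (401, 403):
        return "auth"                 # rotate or re-scope the token
    if status_code == 429:
        return "rate_limit"           # throttled, not down
    if 520 <= status_code <= 524:
        return "edge_or_origin"       # Cloudflare-branded errors on proxied hostnames
    if status_code >= 500:
        return "api_incident"         # for example a 503 from the control plane
    if 200 <= status_code < 300:
        return "ok"                   # still validate the body separately
    return "other"
```

Routing each category to a different runbook keeps a quota problem or expired token from being escalated as a Cloudflare outage.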
To check the API manually, run curl -H "Authorization: Bearer YOUR_TOKEN" https://api.cloudflare.com/client/v4/user/tokens/verify. For continuous monitoring, set up UptimeSignal to poll every 1-5 minutes and alert you instantly when the status changes.
Start with /client/v4/user/tokens/verify as a lightweight auth + API health check, add /client/v4/zones?per_page=1 for control-plane reads, and monitor a real proxied hostname for the edge. If you run Workers, add a deployed route or workers.dev URL and validate the body to catch 1101/1102 exceptions.
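The global limit is 1,200 requests per five-minute window per user; track your usage with the Ratelimit, Ratelimit-Policy, and Retry-After headers. Monitoring at 1-5 minute intervals uses a negligible share.
Query https://cloudflare-dns.com/dns-query with the Accept: application/dns-json header.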
UptimeSignal checks your endpoints from outside your network and catches errors before users do.
25 monitors free, unlimited for $10/month.