December 2025

Building an API Health Monitoring Agent with x402 Micropayments

Introduction

I built an API health monitoring agent using the Daydreams Lucid-Agents framework and deployed it on port402.com. It supports x402 micropayments - other agents can pay per health check instead of dealing with API keys or monthly subscriptions.

The problem: when running multiple agent endpoints on port402, I needed a way to monitor their health without setting up DataDog or New Relic for each one. The agent uses ERC-8004 for on-chain identity and runs multiple probes to catch intermittent failures.


1. Agent-to-Agent APIs

Traditional APIs use API keys and monthly subscriptions. If you want to call an API, you create an account, get a key, set up billing, and manage quotas. This works fine when humans are in the loop, but it breaks down when agents need to call other agents autonomously.

The x402 protocol (based on HTTP 402 Payment Required) lets agents pay per-request instead. An agent can call another agent's API, pay a small amount of ETH, and get a response - no API keys, no subscriptions, no human intervention.

Here's the cost difference:

$$C_{\text{traditional}} = C_{\text{subscription}} + C_{\text{unused}} + C_{\text{management}}$$

With traditional APIs, you pay a fixed monthly cost, waste money on unused quota (usually 40-60% of your subscription), and spend developer time managing keys and billing.

With x402:

$$C_{\text{x402}} = \sum_{i=1}^{n} p_i \cdot r_i$$

You pay exactly for what you use - \(p_i\) per request to endpoint \(i\), times \(r_i\) requests. No fixed costs, no wasted quota, no management overhead.
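Here's a rough sketch of what that exchange looks like from the calling agent's side. This is illustrative rather than the Lucid-Agents client API: signPayment is a hypothetical stand-in for a wallet-backed x402 client library, and a real client would let that library construct and sign the payment payload.

// Illustrative x402 flow from the caller's side. signPayment is a
// hypothetical stand-in for a wallet-backed x402 client library.
declare function signPayment(paymentRequirements: unknown): Promise<string>;

async function callPaidEndpoint(url: string, body: unknown) {
  const first = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });

  // Anything other than 402 means no payment was required (or a real error).
  if (first.status !== 402) return first.json();

  // The 402 body advertises what the server accepts: price, asset, network.
  const requirements = await first.json();
  const payment = await signPayment(requirements);

  // Retry the same request with the signed payment attached.
  const paid = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-PAYMENT": payment, // payment header per the x402 spec
    },
    body: JSON.stringify(body),
  });
  return paid.json();
}

The payment itself is the authorization; the calling agent never issues or stores an API key.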


2. The Problem

When running multiple agent endpoints on port402.com, I needed a way to monitor their health. The usual options (DataDog, New Relic, CloudWatch) have problems:

Setup overhead: You have to install agents in containers, configure dashboards, set up alerts, and manage API keys. This doesn't scale when you're running dozens of agents.

Not built for x402: Traditional monitoring tools don't know how to verify x402 payments or check ERC-8004 identity registration. They can't send alerts to other agents with micropayments.

Wrong pricing model: They charge $15-100/host/month. For a platform where agents pay per-request, fixed monthly costs don't make sense.

Trust: In an agent economy, who monitors the monitors? I wanted cryptographic proof that health checks actually happened, with an on-chain audit trail.


3. What It Does

The agent is a REST endpoint that monitors other APIs. You send it a URL and some parameters, and it probes the endpoint 4 times to check if it's healthy.

Request Format

POST request with the target API details:

interface HealthCheckRequest {
  url: string;              // Target API endpoint to check
  expectedStatus: number;   // Expected HTTP status code (e.g., 200)
  maxLatencyMs: number;     // Maximum acceptable latency in milliseconds
  method: "HEAD" | "GET";   // HTTP method to probe with (default: HEAD)
  alertWebhookUrl?: string; // Optional webhook to notify on failure
}
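Calling it looks like a normal POST. The entrypoint path below is assumed for illustration, and the x402 payment step is omitted for brevity:

// Example request. The path is illustrative; in practice the first attempt
// returns 402 and is retried with an x402 payment attached.
const response = await fetch(
  "https://health-dev.port402.com/entrypoints/api-health-check",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      url: "https://api.example.com/status",
      expectedStatus: 200,
      maxLatencyMs: 1000,
      method: "HEAD",
    }),
  }
);

const result = await response.json();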

Response Format

Healthy response

All 4 probes passed:

{
  "health": {
    "url": "https://api.example.com/status",
    "method": "HEAD",
    "checkedAt": "2024-12-22T10:30:00.000Z",
    "status": 200,
    "expectedStatus": 200,
    "ok": true,
    "expectedStatusMet": true,
    "latencyMs": 245,
    "withinLatencyBudget": true
  },
  "context": {
    "runId": "550e8400-e29b-41d4-a716-446655440000",
    "agentVersion": "0.0.2"
  }
}

Unhealthy response

One or more probes failed:

{
  "health": {
    "url": "https://api.example.com/status",
    "method": "GET",
    "checkedAt": "2024-12-22T10:35:00.000Z",
    "status": 503,
    "expectedStatus": 200,
    "ok": false,
    "expectedStatusMet": false,
    "latencyMs": 1850,
    "withinLatencyBudget": false,
    "error": "Service unavailable - status 503, latency 1850ms exceeds budget of 1000ms"
  },
  "context": {
    "runId": "661f9511-f39c-52e5-b827-557766551111",
    "agentVersion": "0.0.2"
  }
}
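On the caller's side, the aggregate flags make the failure mode easy to act on. A minimal handling sketch, with a response type that mirrors the JSON shown above:

// Minimal client-side handling of the response shape shown above.
interface HealthCheckResponse {
  health: {
    url: string;
    ok: boolean;
    status: number;
    latencyMs: number;
    withinLatencyBudget: boolean;
    expectedStatusMet: boolean;
    errorMessage?: string;
  };
  context: { runId: string; agentVersion: string };
}

function summarize(result: HealthCheckResponse): string {
  const { health, context } = result;
  if (health.ok) {
    return `HEALTHY ${health.url} (${health.latencyMs}ms, run ${context.runId})`;
  }
  // Distinguish "wrong status" from "too slow" when reporting.
  const reason = !health.expectedStatusMet
    ? `unexpected status ${health.status}`
    : `latency ${health.latencyMs}ms over budget`;
  return `UNHEALTHY ${health.url}: ${health.errorMessage ?? reason}`;
}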

4. Lucid-Agents Framework

I used Daydreams Lucid-Agents, a TypeScript framework that includes x402 payments, ERC-8004 identity, and type-safe schemas out of the box.

Why use it?

x402 payments are built in

With a normal framework, you have to manually verify payments, check credits, deduct balances, etc. With Lucid-Agents, you just configure the payment settings:

import { createAgentApp } from "@lucid-agents/agent-kit-hono";

const { app, addEntrypoint } = createAgentApp(
  { name: "API Health Checker", version: "0.0.2" },
  {
    config: {
      payments: {
        facilitatorUrl: "https://facilitator.daydreams.systems",
        payTo: process.env.PAY_TO as `0x${string}`,
        network: "base-sepolia",
        defaultPrice: "0.001", // 0.001 ETH per request
      }
    }
  }
);

The framework handles payment verification, wallet debiting, transaction logging, and failure rollback automatically.

On-chain identity with ERC-8004

Agents register themselves on-chain so other agents can verify who they are:

import { createAgentIdentity, generateAgentMetadata } from "@lucid-agents/agent-kit-identity";

const identity = await createAgentIdentity({
  domain: process.env.AGENT_DOMAIN, // e.g., "health-dev.port402.com"
  autoRegister: true,
  env: process.env
});

if (identity.didRegister) {
  console.log(`Agent registered! TX: ${identity.transactionHash}`);
  console.log(`Metadata: https://${identity.domain}/.well-known/agent-metadata.json`);
}

This creates an immutable record on-chain that the agent exists and is controlled by a specific Ethereum address. Other agents can verify the identity by fetching the metadata:

// Any agent can verify another agent's identity
const metadata = await fetch('https://health-dev.port402.com/.well-known/agent-metadata.json');
const agentInfo = await metadata.json();

console.log(agentInfo.agentId);        // On-chain ID
console.log(agentInfo.owner);          // Ethereum address of operator
console.log(agentInfo.capabilities);   // What this agent can do

Type safety with Zod

You define your API contract with Zod schemas, and the framework validates inputs/outputs at runtime:

import { z } from "zod";

const healthCheckInputSchema = z.object({
  url: z.string().url(),
  method: z.enum(["HEAD", "GET"]).default("HEAD"),
  expectedStatus: z.coerce.number().int().min(100).max(599).default(200),
  maxLatencyMs: z.coerce.number().int().positive().default(1_000),
  alertWebhookUrl: z.string().url().optional(),
});

const healthCheckOutputSchema = z.object({
  health: z.object({
    url: z.string(),
    status: z.number(),
    ok: z.boolean(),
    latencyMs: z.number(),
    withinLatencyBudget: z.boolean(),
    expectedStatusMet: z.boolean(),
    errorMessage: z.string().optional(),
  }),
  alert: z.object({
    dispatched: z.boolean(),
    message: z.string(),
  }).optional(),
  context: z.object({
    runId: z.string(),
    agentVersion: z.string(),
  }),
});

export const apiHealthCheck = {
  key: "api-health-check",
  description: "Monitor HTTP(S) endpoints with latency budgets and alerting",
  input: healthCheckInputSchema,
  output: healthCheckOutputSchema,
  async handler(ctx: AgentContext) {
    // TypeScript knows the exact shape of ctx.input!
    const { url, expectedStatus, maxLatencyMs } = ctx.input;
    // ...
  }
};

Invalid requests get rejected automatically, the framework generates OpenAPI docs, and TypeScript knows the exact shape of your inputs/outputs.
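For example, running the input schema directly shows what the framework rejects and what it fills in. This is standard Zod behavior; exact error messages vary by Zod version:

// A malformed request never reaches the handler.
const bad = healthCheckInputSchema.safeParse({
  url: "not-a-url",
  maxLatencyMs: -5,
});

if (!bad.success) {
  // e.g. "url: Invalid url", "maxLatencyMs: Number must be greater than 0"
  console.log(bad.error.issues.map((i) => `${i.path.join(".")}: ${i.message}`));
}

// A valid request is parsed, coerced, and defaulted.
const good = healthCheckInputSchema.parse({
  url: "https://api.example.com/status",
  expectedStatus: "200",  // coerced to the number 200
});
console.log(good.method);       // "HEAD" (default)
console.log(good.maxLatencyMs); // 1000 (default)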


5. Implementation

Multi-probe strategy

The agent runs 4 probes spaced 250ms apart to catch intermittent failures:

┌─────────────────────────────────────────────────────────────┐
│                     Health Check Flow                       │
└─────────────────────────────────────────────────────────────┘

Client Agent
    │
    │ x402 Payment + Request
    ▼
┌──────────────────────┐
│  Health Check Agent  │
│  (port402.com)       │
└──────────────────────┘
    │
    │ Probe 1 (t=0ms)
    ├────────────► Target API ─────► Status: 200, Latency: 245ms ✓
    │
    │ Wait 250ms
    │
    │ Probe 2 (t=250ms)
    ├────────────► Target API ─────► Status: 200, Latency: 189ms ✓
    │
    │ Wait 250ms
    │
    │ Probe 3 (t=500ms)
    ├────────────► Target API ─────► Status: 503, Latency: 102ms ✗
    │
    │ Wait 250ms
    │
    │ Probe 4 (t=750ms)
    ├────────────► Target API ─────► Status: 200, Latency: 220ms ✓
    │
    │ Aggregate Results
    │ Overall: UNHEALTHY (3/4 passed)
    │
    │ Dispatch Alert (if configured)
    ├────────────► Webhook URL
    │
    │ Return Results
    ▼
Client Agent

Health determination

An endpoint is healthy only if all 4 probes pass:

$$\text{Health} = \begin{cases} \text{HEALTHY} & \text{if } \bigwedge_{i=1}^{4} (S_i = S_{\text{expected}} \land L_i \leq L_{\text{max}}) \\ \text{UNHEALTHY} & \text{otherwise} \end{cases}$$

Where:

  • \(S_i\) = HTTP status code from probe \(i\)
  • \(S_{\text{expected}}\) = Expected status code (e.g., 200)
  • \(L_i\) = Latency of probe \(i\) (in milliseconds)
  • \(L_{\text{max}}\) = Maximum acceptable latency

The reported latency is the worst case across all probes:

$$L_{\text{reported}} = \max_{i=1}^{4} L_i$$

If your SLA requires sub-1000ms response time, this guarantees it across all probes, not just the average.

Code

const HEALTH_CHECK_ATTEMPTS = 4;
const HEALTH_CHECK_INTERVAL_MS = 250;

async function monitorEndpoint(
  url: URL,
  options: { method: HttpMethod; expectedStatus: number; maxLatencyMs: number }
): Promise<HealthCheckResult> {
  const attempts: HealthCheckResult[] = [];

  // Execute 4 sequential probes
  for (let attempt = 0; attempt < HEALTH_CHECK_ATTEMPTS; attempt++) {
    const result = await probeEndpointOnce(url, options);
    attempts.push(result);

    // Wait 250ms between probes (except after the last one)
    if (attempt < HEALTH_CHECK_ATTEMPTS - 1) {
      await new Promise((resolve) => setTimeout(resolve, HEALTH_CHECK_INTERVAL_MS));
    }
  }

  // Strict consensus: ALL must pass
  const allAttemptsOk = attempts.every((attempt) => attempt.ok);
  const withinLatencyBudget = attempts.every((attempt) => attempt.withinLatencyBudget);
  const expectedStatusMet = attempts.every((attempt) => attempt.expectedStatusMet);

  // Report worst-case latency
  const longestLatency = attempts.reduce(
    (max, attempt) => Math.max(max, attempt.latencyMs),
    0
  );

  const lastAttempt = attempts[attempts.length - 1];
  const firstFailure = attempts.find((attempt) => !attempt.ok);

  return {
    status: lastAttempt?.status ?? 0,
    ok: allAttemptsOk,
    latencyMs: longestLatency,  // Conservative: worst-case
    method: lastAttempt?.method ?? options.method,
    timestamp: lastAttempt?.timestamp ?? Date.now(),
    withinLatencyBudget,
    expectedStatusMet,
    errorMessage: allAttemptsOk
      ? undefined
      : firstFailure?.errorMessage ?? "One or more health checks failed.",
  };
}

I chose sequential probes (not parallel) to detect intermittent issues that only show up under sustained load. The 250ms spacing gives 750ms total delay, which is fast enough for monitoring without overwhelming the target. Even one failure marks the endpoint as unhealthy - conservative, but better for catching problems early.
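The code above delegates each request to probeEndpointOnce, which isn't shown. Here's a minimal sketch of what a single probe could look like, assuming the result shape used by the aggregation code; the timeout choice (twice the latency budget, via AbortSignal.timeout) is my assumption, not necessarily what the agent does.

type HttpMethod = "HEAD" | "GET";

interface HealthCheckResult {
  status: number;
  ok: boolean;
  latencyMs: number;
  method: HttpMethod;
  timestamp: number;
  withinLatencyBudget: boolean;
  expectedStatusMet: boolean;
  errorMessage?: string;
}

// Sketch of a single probe: one timed request, compared against the
// expected status and the latency budget.
async function probeEndpointOnce(
  url: URL,
  options: { method: HttpMethod; expectedStatus: number; maxLatencyMs: number }
): Promise<HealthCheckResult> {
  const started = Date.now();
  try {
    const response = await fetch(url, {
      method: options.method,
      // Don't wait forever on a dead endpoint; cap at twice the latency budget.
      signal: AbortSignal.timeout(options.maxLatencyMs * 2),
    });
    const latencyMs = Date.now() - started;
    const expectedStatusMet = response.status === options.expectedStatus;
    const withinLatencyBudget = latencyMs <= options.maxLatencyMs;
    const ok = expectedStatusMet && withinLatencyBudget;
    return {
      status: response.status,
      ok,
      latencyMs,
      method: options.method,
      timestamp: started,
      withinLatencyBudget,
      expectedStatusMet,
      errorMessage: ok
        ? undefined
        : `status ${response.status}, latency ${latencyMs}ms (budget ${options.maxLatencyMs}ms)`,
    };
  } catch (error) {
    // Network failure or timeout: report as unhealthy with no usable status.
    return {
      status: 0,
      ok: false,
      latencyMs: Date.now() - started,
      method: options.method,
      timestamp: started,
      withinLatencyBudget: false,
      expectedStatusMet: false,
      errorMessage: error instanceof Error ? error.message : String(error),
    };
  }
}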


6. Deployment

I deployed the agent on AWS using Terraform. The stack is Route 53 DNS → Application Load Balancer → ECS Fargate → Docker container running Bun.

                                    ┌─────────────────────┐
                                    │   Route 53 DNS      │
                                    │ health.port402.com  │
                                    └──────────┬──────────┘
                                               │
                                               ▼
                              ┌────────────────────────────────┐
                              │  Application Load Balancer     │
                              │  - TLS 1.3 (ELBSecurityPolicy) │
                              │  - Health Check: 200-299       │
                              │  - Idle Timeout: 30s           │
                              └────────────────┬───────────────┘
                                               │
                              ┌────────────────┴────────────────┐
                              │    Target Group (HTTP)          │
                               │  Health: /.well-known/agent.json│
                              └────────────────┬────────────────┘
                                               │
                                               ▼
                              ┌─────────────────────────────────┐
                              │    ECS Fargate Service          │
                              │    - Task CPU: 256              │
                              │    - Task Memory: 512 MB        │
                              │    - Desired Count: 1           │
                              └────────────────┬────────────────┘
                                               │
                              ┌────────────────┴────────────────┐
                              │    Docker Container             │
                              │    - Bun Runtime                │
                              │    - Port: 8787                 │
                              │    - Env: Production            │
                              └─────────────────────────────────┘
                                               │
                                               ▼
                              ┌─────────────────────────────────┐
                              │    CloudWatch Logs              │
                              │    Retention: 14 days           │
                              └─────────────────────────────────┘

Latency breakdown

End-to-end latency for a health check:

$$L_{\text{total}} = L_{\text{TLS}} + L_{\text{ALB}} + L_{\text{app}} + L_{\text{probes}}$$

Where:

  1. TLS Handshake (\(L_{\text{TLS}}\)): ~40ms (TLS 1.3) vs ~80ms (TLS 1.2)
  2. ALB Processing (\(L_{\text{ALB}}\)): ~2ms typical, ~10ms at p99
  3. Application Overhead (\(L_{\text{app}}\)): ~8ms total (Bun startup + Zod validation + handler logic)
  4. Probe Execution (\(L_{\text{probes}}\)): 4 sequential probes at the target's latency, plus 3 × 250ms of spacing between them

For a target with 200ms latency:

$$L_{\text{probes}} = 4 \times 200\text{ms} + 3 \times 250\text{ms} = 800 + 750 = 1550\text{ms}$$

Total latency with a warm connection and TLS 1.3:

$$L_{\text{total}} = 40 + 2 + 8 + 1550 = 1600\text{ms}$$

This is fine for health monitoring - accuracy matters more than speed.
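The same arithmetic as a quick sanity check, using the constants from the probe code:

// Sanity check of the latency math above.
const PROBES = 4;
const SPACING_MS = 250;

function probeDuration(targetLatencyMs: number): number {
  // 4 sequential probes, with 3 gaps of 250ms between them.
  return PROBES * targetLatencyMs + (PROBES - 1) * SPACING_MS;
}

function totalLatency(targetLatencyMs: number, tlsMs = 40, albMs = 2, appMs = 8): number {
  return tlsMs + albMs + appMs + probeDuration(targetLatencyMs);
}

console.log(probeDuration(200)); // 1550
console.log(totalLatency(200));  // 1600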


7. What I learned

Having x402 and identity built into Lucid-Agents saved about 2 weeks of work. Zod schemas caught several bugs during development. Terraform made deploying to dev/prod easy. Bun has faster cold starts than Node.js (40ms vs 120ms).

Next steps: SSL/TLS certificate expiration checks, DNS validation, multi-region probing, and storing historical health data on-chain.


8. Conclusion

The pieces exist now: x402 for micropayments, ERC-8004 for identity, Lucid-Agents for execution, Base Sepolia for settlement. This health monitoring agent shows how agents can pay each other per-request without API keys or subscriptions.


Resources