LLMs, Agents, and RAG as Language Features · The Book

Most production AI systems live in a different world than the one they were prototyped in. The notebook becomes a Flask app, the Flask app becomes a queue worker, the queue worker accumulates a tangle of prompt strings, retry handlers, JSON shape adapters, ad-hoc telemetry, and a slowly drifting copy of the upstream tool catalog. Once a real user is on the other end, what looked like a single agent is suddenly a small distributed system whose contract is whatever the last person to touch it remembered. The reason is straightforward: agents have always been libraries, never language.

Carrier closes that gap by treating the agent as a first-class declaration. A Carrier Structured Agent — CSA — is the smallest unit that turns an LLM call into a contract. It has a typed input, an optional typed output, an explicit LLM target, a bounded set of tools, a guards block, a resilience policy, a telemetry shape, and a durability story. The compiler enforces the input/output contract; the runtime records the run; the manifest publishes the metadata for tooling, MCP servers, and downstream services. Nothing about that surface is novel — it is what enterprise integrations have always required. The difference is that in Carrier you write it once.

What a CSA actually is

A CSA is a typed runtime surface that wraps an LLM client, narrows that client's tool set to a subset you reviewed, validates structured output against a Carrier type, and records the run as durable workflow state. Read that sentence twice. Each clause replaces work you would otherwise do by hand: the type narrows what the model can be asked, the tool subset narrows what the model can do, the output schema narrows what the model can return, and the workflow row narrows what the system can lose.

In Carrier source the declaration looks small enough to fit on a screen, which is the point.

excerpt

agent SupportTriage {
  input:  SupportMessage
  output: TicketDecision
  llm:    RoutedSupportAgent
  prompt: input.user_prompt
 
  tools {
    action create_ticket
    fn     search_help_docs
  }
 
  guards {
    require_auth:            true
    tenant_scope:            current_user.tenant_id
    max_tool_calls:          6
    deny_tools_after_output: true
    output_must_match:       TicketDecision
  }
 
  resilience {
    timeout_ms: 12_000
    retry attempts: 2 backoff_ms: 250
    fallback_output: {
      action:          "escalate"
      summary:         "Agent unavailable"
      confidence:      0.0
      suggested_reply: null
    }
  }
}

Read the block top to bottom and notice what is missing. There is no JSON Schema written by hand for the model to interpret — the tool schemas come from the underlying Carrier type signatures. There is no retry decorator wrapped around an opaque coroutine — resilience lives in the same source as the agent it protects. There is no logging middleware to register — telemetry is part of the declaration. There is no separate process manager keeping the run alive — the runtime files the agent under the same workflow table that powers your sagas. There is no orchestration library deciding when to call the model, because the compiler already knows: the route handler calls SupportTriage.run(input) the way it would call any other action.

The compile-time guards are the product

Almost every CSA's value can be traced to its guards block. require_auth refuses to start a run without caller context. tenant_scope binds every tool call to the caller's tenant claim, which is the line between "a model that returns helpful answers" and "a model that returns helpful answers about other people's records." max_tool_calls bounds the loop; deny_tools_after_output prevents a model from continuing to call tools after it has already produced a structured answer. output_must_match instructs the runtime to validate the model's structured output against a Carrier type before completing the run.

These are the lines you would otherwise write a thousand times — once per agent — in adapter code that no reviewer has time to read. In a CSA they are declarative, compiler-checked, and visible in the manifest. A reviewer can answer the question "what can this agent do, on whose behalf, with what evidence" by reading one declaration.
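To make the guard semantics concrete, here is a minimal Python sketch of the checks described above. It is not Carrier's runtime: GuardFailure, Guards, and run_guarded are hypothetical names, the model is replaced by a scripted list of steps, and output_must_match is approximated by a required-keys check rather than real type validation.

```python
from dataclasses import dataclass


class GuardFailure(Exception):
    """Raised when a run violates one of its declared guards."""


@dataclass(frozen=True)
class Guards:
    require_auth: bool = True
    max_tool_calls: int = 6
    deny_tools_after_output: bool = True
    # Stand-in for output_must_match: the keys a valid output must carry.
    required_output_keys: frozenset = frozenset()


def run_guarded(steps, tools, guards, caller=None):
    """Replay scripted model steps while enforcing the guards.

    Each step is ("tool", name, kwargs) or ("output", value).
    """
    # require_auth: refuse to start without caller context.
    if guards.require_auth and caller is None:
        raise GuardFailure("require_auth: no caller context")
    tenant_id = caller["tenant_id"] if caller else None

    calls, output = 0, None
    for step in steps:
        if step[0] == "output":
            output = step[1]
            continue
        _, name, kwargs = step
        # deny_tools_after_output: no tool calls once an answer exists.
        if output is not None and guards.deny_tools_after_output:
            raise GuardFailure("deny_tools_after_output: tool call after output")
        # max_tool_calls: bound the loop.
        calls += 1
        if calls > guards.max_tool_calls:
            raise GuardFailure("max_tool_calls: limit exceeded")
        # tenant_scope: every tool call is bound to the caller's tenant.
        tools[name](tenant_id=tenant_id, **kwargs)

    # output_must_match: validate the structured output before completing.
    if not isinstance(output, dict) or not guards.required_output_keys.issubset(output):
        raise GuardFailure("output_must_match: output does not match the declared type")
    return output
```

Written by hand, this loop has to be repeated and reviewed for every agent; the point of the guards block is that the same checks are stated once, as data, next to the agent they constrain.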

Resilience is the contract

The resilience block carries three small fields that decide everything about how the system behaves on a bad day. timeout_ms places a hard ceiling on the run, so a stuck model never holds open a request thread. retry attempts maps to the underlying workflow retry policy with explicit backoff, so the run is durable across crashes and provider blips. fallback_output is the part most teams would skip; it is the typed value Carrier returns when the run fails. Because Carrier checks at compile time that the fallback satisfies the agent's declared output, callers never have to special-case agent-specific error paths. The contract — the type — is the same whether the model answered or the runtime gave up.

This is a small thing that turns into a large thing in practice. The route handler calling the agent does not need an exception ladder; it does not need a try/catch wrapped around a JSON.parse; it does not need a metrics decorator counting failure modes. It calls the agent, receives the typed value, and continues. The downstream consumer of the typed value never has to know whether this particular run was answered by the model or by the fallback. That is what allows the rest of the system (workflows, audit, UIs) to depend on the agent without depending on the model.
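The resilience contract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Carrier's implementation: run_with_resilience is a hypothetical helper, the field types on TicketDecision are guessed from the fallback_output in the excerpt, retry_attempts is read as "retries after the first try", and the model call is expected to honor the remaining time budget it is handed.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TicketDecision:
    # Field names come from the excerpt; the types are assumptions.
    action: str
    summary: str
    confidence: float
    suggested_reply: Optional[str]


# Same shape as a successful answer, so callers need no special error path.
FALLBACK = TicketDecision("escalate", "Agent unavailable", 0.0, None)


def run_with_resilience(
    call: Callable[[float], TicketDecision],
    *,
    timeout_ms: int,
    retry_attempts: int,
    backoff_ms: int,
    fallback: TicketDecision,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> TicketDecision:
    """Try the call, retry with backoff, and return the typed fallback on failure."""
    deadline = clock() + timeout_ms / 1000
    for attempt in range(retry_attempts + 1):
        remaining = deadline - clock()
        if remaining <= 0:
            break  # hard ceiling reached: stop trying
        try:
            # The call receives the remaining budget in seconds.
            return call(remaining)
        except Exception:
            if attempt < retry_attempts:
                # Never sleep past the deadline.
                sleep(min(backoff_ms / 1000, max(deadline - clock(), 0.0)))
    return fallback
```

A caller would write something like decision = run_with_resilience(ask_model, timeout_ms=12_000, retry_attempts=2, backoff_ms=250, fallback=FALLBACK), where ask_model is whatever function reaches the provider, and then use decision.action without checking which path produced it.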

Tools are bounded by construction

An agent can only call the fn or action declarations listed in its tools block, and Carrier verifies that those tools already exist on the LLM client referenced by llm. There is no mode in which a CSA exposes a broader tool surface than what you wrote in source. The schema for each tool is derived from the Carrier type signatures — there is no handwritten JSON Schema to drift, and no string-matching at runtime to dispatch. When a tool is an action, the action's existing policy and tenant rules apply during the call. The agent cannot route around a policy block by phrasing a request as a chat message.

That last sentence matters more than the rest of the section. It is the one promise that distinguishes a CSA from a free-form chat completion exposed as a feature. Policies are still policies, even when the caller is the model.
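Two ideas from this section, deriving tool schemas from type signatures and refusing any tool outside the declared set, can be illustrated with a short Python sketch. The names tool_schema and ToolBox are hypothetical, the schema shape is a simplified JSON-Schema-like dictionary, and Python type hints stand in for Carrier type signatures; delete_account is an invented example of a tool the client has but the agent did not declare.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python types to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn):
    """Derive a tool schema from a function's type hints (no handwritten schema)."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "parameters": {
            "type": "object",
            "properties": {n: {"type": JSON_TYPES[t]} for n, t in hints.items()},
            "required": [n for n, p in params.items()
                         if p.default is inspect.Parameter.empty],
        },
    }


class ToolBox:
    """Expose only the declared subset of the client's tools."""

    def __init__(self, client_tools, declared):
        # Declared tools must already exist on the client.
        missing = [n for n in declared if n not in client_tools]
        if missing:
            raise ValueError(f"tools not on client: {missing}")
        self._tools = {n: client_tools[n] for n in declared}

    def schemas(self):
        return [tool_schema(fn) for fn in self._tools.values()]

    def dispatch(self, name, **kwargs):
        # Anything outside the declared subset is refused, not looked up.
        if name not in self._tools:
            raise PermissionError(f"tool not declared: {name}")
        return self._tools[name](**kwargs)


def search_help_docs(query: str, limit: int = 5) -> list:
    return [f"doc about {query}"][:limit]


def delete_account(user_id: str) -> bool:
    return True
```

With ToolBox({"search_help_docs": search_help_docs, "delete_account": delete_account}, ["search_help_docs"]), the model sees one schema and a request for delete_account fails before any code runs. The sketch does not model the policy and tenant checks that apply when the tool is an action.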

Durability and observability come for free

Every CSA run is a row in carrier_workflow_state. That row records the input, the steps, the tool calls, the output (or fallback), the duration, the tokens, and the trace context. Carrier emits OTLP spans for the agent boundary, every tool call, and every guard failure. The audit log records each tool invocation with its input shape, output shape, duration, and tokens. Every artifact you would otherwise spend a quarter assembling is already in the system, in a shape you can query with SQL, browse in your tracing backend, or hand to an auditor.

If the agent loses the model partway through and falls back, the row records that. If a tool call returned a value the policy denied, the row records that. If the run timed out and rolled back, the row records that. The agent's behavior on the fifth Tuesday is as inspectable as the agent's behavior on the first.
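As an illustration of what "query with SQL" can mean here, the sketch below builds an in-memory SQLite table and summarizes the runs of one agent. Only the table name carrier_workflow_state comes from the text; the column names, the outcome values, and the sample rows are invented for the example, and the real table may be shaped quite differently.

```python
import sqlite3


def make_demo_db():
    """Create a toy carrier_workflow_state table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE carrier_workflow_state (
               run_id      TEXT PRIMARY KEY,
               agent       TEXT NOT NULL,
               outcome     TEXT NOT NULL,   -- assumed: 'completed' or 'fallback'
               duration_ms INTEGER NOT NULL,
               tokens      INTEGER NOT NULL
           )"""
    )
    conn.executemany(
        "INSERT INTO carrier_workflow_state VALUES (?, ?, ?, ?, ?)",
        [
            ("r1", "SupportTriage", "completed", 840, 512),
            ("r2", "SupportTriage", "completed", 910, 498),
            ("r3", "SupportTriage", "fallback", 12000, 0),
            ("r4", "SupportTriage", "completed", 760, 530),
        ],
    )
    return conn


def agent_summary(conn, agent):
    """Return run count, fallback rate, and total tokens for one agent."""
    runs, fallback_rate, total_tokens = conn.execute(
        """SELECT COUNT(*), AVG(outcome = 'fallback'), SUM(tokens)
           FROM carrier_workflow_state
           WHERE agent = ?""",
        (agent,),
    ).fetchone()
    return {"runs": runs, "fallback_rate": fallback_rate, "total_tokens": total_tokens}
```

Calling agent_summary(make_demo_db(), "SupportTriage") answers "how often did this agent fall back, and what did it cost" with one query and no log scraping.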

RAG is just the other side of the same construct

Most agents you ship will have a retrieval step in front of them. Carrier exposes that as the rag declaration: a typed retriever (Model.similar_with_scores), an embedder (a Carrier fn or native fn returning Vector(N)), and an LLM target (an llm client or, increasingly, a CSA). The runtime composes the standard pieces — retrieve scored matches, optionally rerank by score threshold, fit serialized model context into the configured token budget, then dispatch the LLM call. The tools and guards on the CSA still apply. RAG is not a different kind of system; it is a particular shape of input.

The practical implication is that the same governance you wrote once for SupportTriage applies whether the prompt is the user's raw message or a retrieval-augmented version of it. You do not write a second agent for the augmented case; you compose a rag block in front of the agent you already have.
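The composition step described above (keep matches over a score threshold, fit them into a token budget, then build the prompt) can be sketched as follows. Match, build_context, and rag_prompt are hypothetical names; the matches are assumed to be already retrieved, standing in for the output of Model.similar_with_scores; tokens are approximated by whitespace-separated words; and skipping chunks that do not fit is one possible budget policy, not necessarily Carrier's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    text: str
    score: float


def build_context(matches, *, min_score, token_budget,
                  count_tokens=lambda s: len(s.split())):
    """Keep high-scoring matches, best first, until the budget is used up."""
    kept = sorted((m for m in matches if m.score >= min_score),
                  key=lambda m: m.score, reverse=True)
    chunks, used = [], 0
    for m in kept:
        cost = count_tokens(m.text)
        if used + cost > token_budget:
            continue  # this chunk does not fit; a smaller one still might
        chunks.append(m.text)
        used += cost
    return chunks


def rag_prompt(question, matches, *, min_score, token_budget):
    """Assemble the augmented prompt that is handed to the agent."""
    context = "\n".join(
        build_context(matches, min_score=min_score, token_budget=token_budget))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The string returned by rag_prompt is only a different input to the same agent, which is why the tools and guards apply to it unchanged.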

What it compiles into

The Rust target produces a real, native runtime for executable CSA runs: typed structured-output parsing, tool dispatch under the caller's auth and policy context, durability into the workflow tables, OTLP spans, and the manifest entries that downstream tools consume. Java and Node accept agent declarations as metadata today and fail closed on executable runs with explicit unsupported-feature diagnostics; the metadata path is enough for documentation, MCP exposure, and review, while the production runtime stays in Rust. This split is intentional — it lets teams adopt CSAs at the architecture and review layer without committing to a runtime they have not deployed yet.

Why you write a CSA instead of an agent framework

An agent framework gives you a Python class hierarchy, a decorator vocabulary, and a runtime that you adopt as a dependency. A CSA gives you a typed declaration that compiles into your service. The framework lives in your service's process; the declaration lives in your service's source tree. The framework's contract is a docstring; the declaration's contract is a type. The framework's failure mode is an unhandled exception; the declaration's failure mode is a fallback value that satisfies the same type as a successful run.

These are not arguments against frameworks. Frameworks are excellent prototyping environments. The argument is that an enterprise system, by the time it reaches its third quarter of life, ends up reinventing the small list of contracts a CSA already encodes. Carrier just wrote the contract down.

How to read a CSA in review

Does the input type narrow what the agent can be asked, or does it accept Json?
Is the output type concrete enough that downstream consumers can rely on it?
Does the tools block list only the actions the agent should be allowed to call?
Do require_auth and tenant_scope match the caller boundary the route already enforces?
Is the fallback_output a coherent answer, or is it a placeholder no one will read?
Does the agent target a routed LLM client, so that budget pressure and outages have a fallback?
Are emit_tokens / emit_tool_calls / emit_guard_failures on, so runs leave evidence?

If a CSA's guards, tools, output, and fallback all read clearly, the rest of the agent is a prompt — and prompts are tunable. If the guards, tools, output, or fallback are missing, the rest of the agent is irrelevant. CSAs make the review focus on the parts that matter at production scale, which is the only scale that matters once a real user is involved.