Carrier™ vs Node.js, Python, Go, .NET, and Spring Boot.
Six small API services — one per language/framework — exposing the same three endpoints. An aiohttp client bombards each with 64 in-flight requests for 30 seconds, records latency and throughput, and samples the server process's resident-set memory every 200 ms. All stacks run natively on the same host; no containers, no networking asymmetry. The CPU kernel is frozen as sha256_chain_v1 so published comparisons stay apples-to-apples across runs.
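
The closed-loop load pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's bench.py: the names run_load, percentile, and bench_url are hypothetical, the warm-up phase is left out, and the nearest-rank percentile is an assumption about how p50/p95/p99 might be computed.

```python
import asyncio
import math
import time


async def run_load(do_request, concurrency=64, duration_s=30.0):
    """Keep `concurrency` requests in flight for `duration_s` seconds.

    `do_request` is any zero-argument coroutine function that performs one
    request. Returns (list of latencies in ms, requests per second).
    """
    latencies_ms = []
    start = time.perf_counter()
    deadline = start + duration_s

    async def worker():
        # Each worker issues requests back to back, so the number of
        # workers equals the number of in-flight requests.
        while time.perf_counter() < deadline:
            t0 = time.perf_counter()
            await do_request()
            latencies_ms.append((time.perf_counter() - t0) * 1000.0)

    await asyncio.gather(*(worker() for _ in range(concurrency)))
    elapsed = time.perf_counter() - start
    return latencies_ms, len(latencies_ms) / elapsed


def percentile(values, pct):
    """Nearest-rank percentile (assumed method; other tools interpolate)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


async def bench_url(url, concurrency=64, duration_s=30.0):
    """Hypothetical aiohttp wiring; needs the third-party aiohttp package."""
    import aiohttp  # imported lazily so the helpers above stay stdlib-only

    connector = aiohttp.TCPConnector(limit=concurrency)
    async with aiohttp.ClientSession(connector=connector) as session:

        async def one_get():
            async with session.get(url) as resp:
                await resp.read()  # drain the body so timing covers the full response

        return await run_load(one_get, concurrency, duration_s)
```

Because every worker waits for its response before sending the next request, a slow server lowers the offered load as well as the measured throughput, which is why latency and requests per second move together in the tables that follow.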
Compiled-language throughput, a fraction of the memory.
Across all three endpoints, Carrier lands in the top tier for throughput and tail latency. Its peak memory is smaller than that of every other stack here, Go included, on all three endpoints.
Three endpoints, three different questions
Each endpoint isolates a different axis so the final picture isn't a single number.
| Stack | Language / runtime | Framework | DB driver |
|---|---|---|---|
| Carrier™ | Carrier™ → Rust | built-in | sqlx |
| Go | Go 1.22+ | Gin | pgx/v5 |
| .NET | C# / .NET 10 | ASP.NET Core Minimal | Npgsql |
| Spring | Java 21 | Spring Boot 3.3 | JDBC + HikariCP |
| Node | Node.js 22 | Fastify | pg |
| Python | Python 3.12 | FastAPI + Uvicorn | asyncpg |
Requests per second across three endpoints.
Higher is better. Carrier clusters with the compiled-language tier (Go, .NET, Spring) on /health and /items, and pulls well ahead on the CPU-bound /compute path: 24,980 requests per second against 16,293 for Spring, the next-fastest stack.

| Stack | GET /health | GET /compute | GET /items |
|---|---|---|---|
| Carrier™ | 23,254 | 24,980 | 19,808 |
| Go | 23,634 | 15,798 | 21,259 |
| .NET | 21,027 | 14,062 | 18,377 |
| Spring | 21,921 | 16,293 | 13,559 |
| Node | 22,923 | 2,532 | 10,863 |
| Python | 18,361 | 2,627 | 2,805 |
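
The /compute column exercises the sha256_chain_v1 kernel. Per the methodology notes, the chain starts from sha256_hex("carrier") and replaces the hex with sha256_hex(hex) for n=1000 iterations. A minimal Python sketch follows, assuming each step hashes the UTF-8 bytes of the previous hex string and that the seed hash is not counted as an iteration; the function name sha256_chain is hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex digest of the UTF-8 encoding of `text` (encoding is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sha256_chain(seed: str = "carrier", n: int = 1000) -> str:
    """Hash the seed once, then re-hash the hex string n more times."""
    digest = sha256_hex(seed)
    for _ in range(n):
        digest = sha256_hex(digest)
    return digest


if __name__ == "__main__":
    print(sha256_chain())
```

Because the output of one step is the input of the next, the work cannot be parallelized or skipped, and comparing the final hex across implementations is a cheap correctness check.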
p50 / p95 / p99 latency per endpoint.
Lower is better. Carrier's 99th-percentile latency on /items is 6.3 ms, level with Go (6.25 ms) for the tightest tail of any stack, because the compiled Axum handler has no async framework layer between it and sqlx.

GET /health (latency in ms)

| Stack | p50 | p95 | p99 | max |
|---|---|---|---|---|
| Carrier™ | 2.71 | 2.79 | 5.50 | 13.71 |
| Go | 2.65 | 2.86 | 5.72 | 14.65 |
| .NET | 2.89 | 3.93 | 6.50 | 14.79 |
| Spring | 2.83 | 3.03 | 6.22 | 15.58 |
| Node | 2.75 | 2.84 | 5.70 | 10.76 |
| Python | 3.42 | 3.61 | 6.20 | 120.34 |

GET /compute (latency in ms)

| Stack | p50 | p95 | p99 | max |
|---|---|---|---|---|
| Carrier™ | 2.49 | 2.85 | 5.28 | 22.00 |
| Go | 4.20 | 9.83 | 14.53 | 123.70 |
| .NET | 3.78 | 7.28 | 9.73 | 28.49 |
| Spring | 3.58 | 5.37 | 9.30 | 36.00 |
| Node | 24.76 | 30.99 | 49.79 | 74.61 |
| Python | 24.36 | 25.09 | 26.69 | 47.06 |

GET /items (latency in ms)

| Stack | p50 | p95 | p99 | max |
|---|---|---|---|---|
| Carrier™ | 3.15 | 3.62 | 6.29 | 18.31 |
| Go | 2.87 | 3.70 | 6.25 | 30.02 |
| .NET | 3.33 | 4.19 | 7.00 | 24.67 |
| Spring | 4.14 | 7.95 | 11.87 | 46.60 |
| Node | 5.83 | 6.39 | 8.31 | 19.38 |
| Python | 22.67 | 41.72 | 76.57 | 290.85 |
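
The throughput and latency tables can be cross-checked against each other. With a closed loop of 64 in-flight requests, Little's law gives throughput ≈ concurrency / latency. The sketch below uses p50 as a stand-in for mean latency, which is only an approximation, and takes the first latency table as /health and the second as /compute; the helper name estimated_rps is hypothetical.

```python
CONCURRENCY = 64  # in-flight requests used by the bench client


def estimated_rps(p50_ms, concurrency=CONCURRENCY):
    """Little's law for a closed loop: throughput ~= concurrency / latency."""
    return concurrency / (p50_ms / 1000.0)


# (stack, endpoint, p50 in ms, measured requests/s) copied from the tables
samples = [
    ("Carrier", "/health", 2.71, 23254),
    ("Go", "/compute", 4.20, 15798),
    ("Node", "/compute", 24.76, 2532),
]

for stack, endpoint, p50_ms, measured in samples:
    est = estimated_rps(p50_ms)
    print(f"{stack:8s} {endpoint:9s} estimated {est:8.0f}  measured {measured:6d}")
```

For these three rows the estimate lands within roughly 4 percent of the measured figure, which suggests the two sets of tables describe the same runs.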
Peak resident-set size sampled every 200 ms.
Lower is better. The compiled Rust binary's footprint stays under 16 MB at peak. Managed-runtime stacks pay a fixed overhead before they serve a single request.

| Stack | Idle (MB) | Peak @ /health | Peak @ /compute | Peak @ /items |
|---|---|---|---|---|
| Carrier™ | 10 | 13 | 14 | 15 |
| Go | 19 | 28 | 37 | 38 |
| .NET | 99 | 209 | 215 | 221 |
| Spring | 218 | 356 | 621 | 334 |
| Node | 73 | 107 | 237 | 277 |
| Python | 52 | 52 | 52 | 54 |

Measured with ps -o rss= against each stack's PID.
How the numbers above were produced.
No containers, no special tuning.
All six APIs run as native host processes on the same machine and talk to the same native Postgres. The bench client discovers each server PID via its listening socket and samples memory directly.
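
PID discovery and memory sampling of this kind can be sketched with two subprocess calls. This is an illustration under stated assumptions, not the bench client's code: the function names are hypothetical, the lsof and ps invocations mirror the ones named in this section, and ps is assumed to report RSS in kilobytes, as it does on Linux and macOS.

```python
import subprocess
import time


def find_listening_pid(port):
    """PID of the process listening on `port`, via lsof."""
    out = subprocess.run(
        ["lsof", "-ti", f":{port}", "-sTCP:LISTEN"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.split()[0])  # first PID if several are listed


def parse_rss_mb(ps_output):
    """Convert the kilobyte figure printed by `ps -o rss=` to megabytes."""
    return int(ps_output.strip()) / 1024.0


def sample_rss_mb(pid):
    """One resident-set reading for `pid`, in MB."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_rss_mb(out)


def peak_rss_mb(pid, duration_s, interval_s=0.2, sample=sample_rss_mb):
    """Poll every `interval_s` seconds and keep the largest reading."""
    peak = 0.0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        peak = max(peak, sample(pid))
        time.sleep(interval_s)
    return peak
```

A call such as peak_rss_mb(find_listening_pid(8080), 30) would cover one measurement window; the port number here is only an example. Polling at 200 ms can miss allocation spikes shorter than the interval, so the peak is a lower bound.
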
- aiohttp driver with 64 in-flight requests, 8 s warm-up discarded, 30 s measurement window per (API, endpoint).
- Memory is sampled with ps -o rss=. The bench finds the PID with lsof -ti :{port} -sTCP:LISTEN, so it measures the actual server, not a Docker userspace proxy.
- Postgres runs natively on 127.0.0.1. Each stack uses a pool of 10, so six × 10 = 60 connections fit under the default max_connections=100.
- No vpnkit, no userspace proxy. Every stack takes the same loopback path to Postgres.
- The CPU kernel is sha256_chain_v1. The chain starts from sha256_hex("carrier") and iterates n=1000 times, replacing the hex with sha256_hex(hex). All six APIs produce the same final hex, so correctness is verified.
- Fastify over node:http, FastAPI over Flask, Minimal APIs over MVC. Otherwise the comparison rewards micro-routers rather than what production actually runs.

What the numbers don't say
A single-host benchmark is always relative, never absolute. These caveats are copied verbatim from speed-benchmark/README.md.
- items has 10k rows. Production scale would shift the /items numbers: DB cost rises for everyone, but index and planner effects don't show up at this size.
- No node cluster, no multi-process Uvicorn, no multiple Spring instances. Horizontal scaling would move these ratios.
- … /compute request dominate. That's the real bottleneck, not hash speed.

Every step in one place
The full benchmark directory is committed — the APIs, the client, the raw JSON, and the rendered charts.

    $ cd speed-benchmark
    $ psql -h 127.0.0.1 -U postgres -c "CREATE DATABASE speed_bench;"
    $ psql -h 127.0.0.1 -U postgres -d speed_bench -f db/init.sql
    $ (cd carrier-api && carrier build)
    $ (cd go-api && go build -o go-api main.go)
    $ (cd dotnet-api && dotnet build -c Release --nologo)
    $ (cd spring-api && mvn -q -DskipTests package)
    $ (cd node-api && npm install --omit=dev)
    $ (cd python-api && python3 -m venv .venv && .venv/bin/pip install -r requirements.txt)
    # start each API on its own port (see README)
    $ ./carrier-api/.carrier/build/carrier-bench &
    $ node node-api/server.js &
    $ ./go-api/go-api &
    $ …

    $ cd client && python3 -m venv .venv && source .venv/bin/activate
    $ pip install -r requirements.txt
    $ python bench.py --warmup 8
    $ python render.py  # charts + BENCHMARK.md

Raw JSON results land in client/results/. Charts are PNGs in client/results/charts/. The markdown report BENCHMARK.md is generated by render.py.
The source lives in speed-benchmark/ in the carrier repository. Every API is a minimal, idiomatic implementation of the same three endpoints.

Numbers are nice. Seeing the code is better.
The /items endpoint above is seven lines of Carrier. No router boilerplate, no serializer setup, no DB glue — the compiler lowers it into the Rust service binary the benchmark actually measured.