
fast-telemetry: A High-Performance Rust Telemetry Library Built for Hot Paths

April 6, 2026 · Eden Team


If you tell people you are building AI infrastructure, most of them assume the hard part is the model layer. They picture prompts, orchestration, agents, evals, and flashy demos. In our experience, the hard part often lives much lower in the stack: data infrastructure, authentication, observability, permissions, latency, and the very unglamorous question of what happens when one more shared atomic, one more label lookup, or one more exporter flush turns into a real production bottleneck.

That is where fast-telemetry came from. More concretely, it came from a requirement inside Eden that was hard to negotiate away: we wanted to build a proxy layer that could sit on critical paths, handle millions of operations per second, stay in the microsecond-latency range, and still track telemetry for every operation. That is a harsh combination of requirements. A proxy like that is not just forwarding bytes. It is often parsing protocols, checking identity, enforcing policy, routing requests, and producing the operational record that tells you what just happened. We did not want to choose between performance and visibility, because for infrastructure products that tradeoff quickly becomes a false choice. If the proxy is where authentication, policy, routing, observability, and governance come together, then "turn telemetry down until it is cheap enough" is not a satisfying answer.

We needed telemetry to be present on every operation, but we could not let telemetry itself become the overhead that set the ceiling for the system. That became the design constraint that shaped everything else.

Today we're open-sourcing fast-telemetry, a Rust telemetry library we built for systems where instrumentation overhead is not theoretical. It shows up in profiles. It shows up in throughput limits. And if you are not careful, it starts changing the behavior of the very system you are trying to measure.

The core idea is simple: make the write path cheap. fast-telemetry uses thread-sharded, cache-line-aware data structures so hot recording paths avoid cross-core contention, then aggregates on read when it is time to export.

In our local benchmark run, that approach was often materially faster than comparable OpenTelemetry SDK paths, with the biggest gains showing up in labeled and dynamic metric workloads. If you want the raw data, see the benchmark results page, the Criterion report, and the threaded harness report.

A few representative data points from that same harness run:

| Harness case | Speedup | fast ops/sec | OTel ops/sec |
| --- | --- | --- | --- |
| labeled_counter:churn | 2,364x | 8,798,611,227 | 3,721,890 |
| dynamic_counter:uniform | 1,743x | 6,075,946,906 | 3,485,690 |
| pipeline | 324x | 59,388,005 | 183,055 |
| root | 253x | 267,326,071 | 1,056,078 |
| dynamic_gauge_i64:uniform | 9x | 32,229,521 | 3,570,375 |

But the benchmarks are only part of the story. The reason this crate exists has everything to do with what it feels like to build infrastructure products in the AI era.

Building Eden at the Intersection of Data and AI

When we started Eden in early 2024, we were not trying to create a telemetry crate. We were building infrastructure, starting in database and data infrastructure because that was where we saw the biggest pain for enterprises trying to modernize. A lot of companies are still operating on a stack designed for an earlier phase of computing. Their systems are fragmented, their data lives in too many places, and their infrastructure is difficult to migrate, difficult to observe, and difficult to govern consistently. We started from a simple belief: if you want to help organizations move into the future, you need to help them deal with the past and the present first.

That is how we think about the platform:

  • the past is modernization, migration, and portability
  • the present is observability, security, authentication, and analytics
  • the future is AI, automation, and the ability to stay on the edge of new systems as they emerge

That framing turned into three core Eden products:

  • Exodus, our migration product, which helps make data portable so organizations can modernize more easily
  • Eve, our infrastructure layer, which sits between applications and systems to provide a central point for observability, access, governance, and authentication
  • Adam, our AI product, which lets users and models interact with the underlying infrastructure through secure, policy-aware interfaces

Those products look different from the outside, but under the hood they share a very important requirement: we need to understand what is happening inside a system at a very fine level of detail, and we need to do that without slowing the system down. As AI workloads became more central, that requirement only got sharper. More requests were flowing through shared infrastructure layers. More decisions needed to be visible and attributable. More customers expected us to explain not just whether a system worked, but exactly what it did under load and under policy.

Why AI Infrastructure Changes the Bar

One of the core lessons we have learned is that products for humans and products for models are not built the same way. Human software is heavily shaped by context limits. People get overwhelmed. Good product design for humans is often about hiding detail, reducing noise, pacing information, and presenting only the right abstraction at the right moment. Great UI does a lot of synthesis for the user.

Model-facing software is different. Models generally do not want the same abstractions humans want. They want raw structure, direct interfaces, the real API, the real schema, the real permissions, and the real output. The challenge is not that they need more hand-holding. The challenge is that they need clean access to a lot of information without bloating context or breaking security boundaries. That pushed our thinking in a very specific direction. We did not want to build a business out of stacking translation layers on top of translation layers, because in our experience that kind of architecture adds token bloat, latency, ambiguity around permissions, and more opportunities for the system to drift away from the truth of the underlying infrastructure.

Instead, we wanted to expose systems as directly as possible while preserving the things enterprises actually care about:

  • authentication
  • authorization
  • observability
  • auditability
  • performance

That sounds obvious, but it has deep consequences. If you believe the winning AI infrastructure products will expose lower-level, truer system interfaces, then your instrumentation cannot just be an afterthought. Every request matters. Every tool call matters. Every routed query matters. Every authorization decision matters. Every outbound action matters. You do not get to treat telemetry as a background convenience anymore. It becomes part of the product contract, because without it the proxy stops being explainable at exactly the moment it becomes most valuable.

The Moment Telemetry Became the Bottleneck

This is the point where fast-telemetry stops being an optimization project and starts being a product story. At Eden, we wanted to observe everything. Not in a vague, sampled, maybe-we-have-enough-signals sense. We wanted the option to trace, log, and emit metrics for every single request flowing through the system, even in very high-throughput paths. That is a much higher standard than "do we have a dashboard." When you are building infrastructure that sits in the path between applications, databases, APIs, and now AI systems, the value of the platform depends on visibility. If a customer wants to know who touched a system, when it happened, what request path it took, which credentials were used, what failed, what slowed down, or what changed after a migration, the answer cannot be "we sampled that away because the telemetry layer could not keep up."

That is where we hit the wall. The immediate problem was not abstract. We were trying to build a proxy that could remain useful on extremely hot paths, and if a request path needs to stay in the microsecond range while sustaining millions of operations per second, there is not much room for "small" telemetry costs. A few extra atomics, a few extra lookups, or one more piece of shared coordination on every operation can be enough to move the whole system in the wrong direction. In production-style workloads inside Eden, telemetry itself started showing up in profiles. Shared atomics were contending across cores. Per-record attribute handling was adding cost. Export-related work was no longer invisible. The act of measuring the system was beginning to tax the system.

Internally, one way we described the problem was this: we needed to go from handling roughly hundreds of millions of metric events to tens of billions on the same class of system budget, because anything less meant observability itself became the limiting factor. In one representative internal framing, that meant moving from about 200 million metrics to about 20 billion metrics on the same system footprint. That was not a nice-to-have improvement. It was the difference between "full-fidelity observability is feasible" and "observability will throttle the product," and for us that was unacceptable.

Our identity as a company is tied to infrastructure performance. If we are going to sit in critical paths, if we are going to claim to provide enterprise-grade infrastructure, and if we are going to help companies adopt AI-native systems safely, then we cannot be casual about the cost of our own instrumentation.

Once telemetry starts distorting the workload it is supposed to measure, you do not just have a metrics problem. You have a product problem.

Why the Existing Approach Was Not Enough

To be clear, this is not an argument that the OpenTelemetry ecosystem is bad. It is not.

OpenTelemetry is broad, standardized, widely adopted, and often the right answer. If you need ecosystem compatibility, portability, and a general-purpose instrumentation stack, it is extremely valuable.

But those strengths come with tradeoffs.

The moment you are recording on very hot write paths under heavy concurrency, generality starts costing real CPU time. Shared state becomes expensive. Labels become expensive. Indirection becomes expensive. Export coordination becomes expensive.

The worst offender for us was cross-core contention.

If multiple threads are frequently updating the same shared atomic, the cache line holding that value keeps bouncing between cores. That creates coherence traffic and serializes work that ought to be parallel. On low-frequency code paths, you may never notice. On million-ops-per-second infrastructure paths, you absolutely do.

The same thing happens conceptually with label handling. If you are constructing or resolving dimensional series on the hot path over and over again, the overhead compounds quickly. Dynamic series are incredibly useful, but if you treat every write like a fresh general-purpose lookup, you can pay for that flexibility every single time.
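One way to stop paying that lookup on every write is to resolve a label set to a handle once and record through the handle afterward. Here is a rough, std-only sketch of that pattern; the type and method names are invented for illustration and are not fast-telemetry's actual API:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

// Illustrative sketch only: a dynamic counter where the hash-map lookup
// happens once per distinct label set, and every subsequent record is a
// plain indexed atomic add.
struct DynamicCounter {
    slots: Vec<AtomicU64>,
    index: Mutex<HashMap<String, usize>>,
}

#[derive(Clone, Copy)]
struct Handle(usize);

impl DynamicCounter {
    fn with_capacity(cap: usize) -> Self {
        Self {
            slots: (0..cap).map(|_| AtomicU64::new(0)).collect(),
            index: Mutex::new(HashMap::new()),
        }
    }

    // Cold path: hash + lock once per distinct label set.
    fn handle(&self, labels: &str) -> Handle {
        let mut index = self.index.lock().unwrap();
        let next = index.len();
        let slot = *index.entry(labels.to_string()).or_insert(next);
        // A real library would handle series overflow; the sketch just caps it.
        assert!(slot < self.slots.len(), "sketch has fixed capacity");
        Handle(slot)
    }

    // Hot path: no hashing, no locking, just an indexed add.
    fn add(&self, h: Handle, v: u64) {
        self.slots[h.0].fetch_add(v, Ordering::Relaxed);
    }

    fn get(&self, h: Handle) -> u64 {
        self.slots[h.0].load(Ordering::Relaxed)
    }
}

fn handle_demo() -> u64 {
    let c = DynamicCounter::with_capacity(8);
    let h = c.handle("route=/users,method=GET"); // resolve once
    for _ in 0..1_000 {
        c.add(h, 1); // reuse on the hot path
    }
    c.get(h)
}

fn main() {
    println!("series value = {}", handle_demo()); // 1000
}
```

The flexibility of dynamic series stays available; you just stop re-buying it on every record.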

So the question became:

What would telemetry look like if we optimized for the write path first, assumed contention was real, and narrowed the design to the workloads we actually cared about?

That question became fast-telemetry.

What We Built

fast-telemetry covers the pieces we needed most often:

  • counters
  • gauges
  • histograms and distributions
  • runtime-labeled dynamic metrics
  • enum-based labeled metrics
  • lightweight spans
  • derive macros
  • exporters for Prometheus, DogStatsD, and OTLP

The design is intentionally narrow in spirit, even if the feature set is broad enough to be useful across a lot of services.

This is not an attempt to recreate the full OpenTelemetry ecosystem under a different name. It is not trying to be the universal answer for every telemetry use case. It is a performance-first library for systems where the write path matters enough that telemetry cost becomes visible in real workloads.

The Core Design Choice: Optimize Writes, Pay on Read

The main idea behind fast-telemetry is that synchronization should not happen on every record if you can avoid it.

Instead of routing every write through one shared hot counter, we shard metrics by thread and keep those shards cache-line aware. The common path becomes close to thread-local, which dramatically reduces cross-core interference. When it is time to export, we aggregate across shards on read.

That tradeoff matched our workloads well.

The write path is hot. Export is comparatively infrequent. So we decided to make writes extremely cheap and let reads do the aggregation work.
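As a rough sketch of that shape (illustrative types only, not fast-telemetry's actual API), a thread-sharded counter with cache-line-aligned shards might look like this:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Align each shard to its own 64-byte cache line so two threads updating
// two different shards never contend on the same line (false sharing).
#[repr(align(64))]
struct Shard(AtomicU64);

struct ShardedCounter {
    shards: Vec<Shard>,
}

impl ShardedCounter {
    fn new(threads: usize) -> Self {
        Self { shards: (0..threads).map(|_| Shard(AtomicU64::new(0))).collect() }
    }

    // Hot path: each thread writes only to its own shard, so the cache
    // line stays core-private. Relaxed ordering is enough for a counter
    // that is only summed later.
    fn add(&self, thread_idx: usize, v: u64) {
        self.shards[thread_idx].0.fetch_add(v, Ordering::Relaxed);
    }

    // Cold path: aggregation happens once per export, not once per record.
    fn sum(&self) -> u64 {
        self.shards.iter().map(|s| s.0.load(Ordering::Relaxed)).sum()
    }
}

fn sharded_demo() -> u64 {
    let counter = ShardedCounter::new(4);
    thread::scope(|s| {
        for idx in 0..4 {
            let counter = &counter;
            s.spawn(move || {
                for _ in 0..1_000 {
                    counter.add(idx, 1);
                }
            });
        }
    });
    counter.sum()
}

fn main() {
    println!("total = {}", sharded_demo()); // total = 4000
}
```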

That shows up in a few concrete design choices:

  • Thread-sharded metrics instead of a single shared hot value
  • Cache-line padding to reduce false sharing
  • Aggregate on read instead of synchronizing on every write
  • Reusable handles for dynamic label sets so repeated hot-path writes do not have to rediscover the same series every time
  • Enum-based labeled metrics so known label sets can use direct indexing rather than more general lookup structures
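The enum-based idea, in particular, reduces recording to a plain array index. Here is a hand-rolled sketch of that pattern; the names are invented for illustration, and this is not the output of the crate's derive macros:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// A closed label set expressed as an enum: the label *is* the index,
// so recording never touches a hash map or string comparison.
#[derive(Clone, Copy)]
#[allow(dead_code)]
enum Outcome {
    Ok = 0,
    ClientError = 1,
    ServerError = 2,
}

const OUTCOMES: usize = 3;

struct OutcomeCounter {
    slots: [AtomicU64; OUTCOMES],
}

impl OutcomeCounter {
    fn new() -> Self {
        Self {
            slots: [AtomicU64::new(0), AtomicU64::new(0), AtomicU64::new(0)],
        }
    }

    // Direct indexing: one atomic add at a fixed offset.
    fn add(&self, label: Outcome, v: u64) {
        self.slots[label as usize].fetch_add(v, Ordering::Relaxed);
    }

    fn get(&self, label: Outcome) -> u64 {
        self.slots[label as usize].load(Ordering::Relaxed)
    }
}

fn outcome_demo() -> u64 {
    let c = OutcomeCounter::new();
    c.add(Outcome::Ok, 3);
    c.add(Outcome::ServerError, 1);
    c.get(Outcome::Ok)
}

fn main() {
    println!("ok = {}", outcome_demo()); // ok = 3
}
```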

The same philosophy applies to spans. We wanted lightweight span collection that could work well in high-throughput paths without dragging a large amount of overhead into the request lifecycle.

The result is a crate that is deliberately boring in the best possible way: record locally, minimize coordination, export later.

How We Benchmarked It

We used two benchmark families:

  • Criterion microbenchmarks for focused recording, export, overlap, and span-serialization paths
  • A threaded harness for contention-heavy cache and span scenarios under sustained multi-threaded load
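For flavor, here is a stripped-down, std-only version of the kind of measurement the threaded harness takes. This is not our actual harness code; it just shows the shape: several threads hammer the same record path for a fixed number of iterations, and we report aggregate throughput (here against a deliberately shared atomic, the worst-case baseline):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;
use std::time::Instant;

// Minimal contention-harness sketch: every thread increments one shared
// atomic, so the cache line holding it bounces between cores, and the
// resulting ops/sec number reflects that coherence cost.
fn measure_ops_per_sec(threads: usize, iters_per_thread: u64) -> f64 {
    let counter = AtomicU64::new(0); // deliberately shared: worst case
    let start = Instant::now();
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..iters_per_thread {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    let elapsed = start.elapsed().as_secs_f64();
    (threads as u64 * iters_per_thread) as f64 / elapsed
}

fn main() {
    let rate = measure_ops_per_sec(4, 1_000_000);
    println!("shared-atomic baseline: {:.0} ops/sec", rate);
}
```

Swapping the shared atomic for per-thread shards in a harness like this is the quickest way to see the contention gap on your own hardware.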

The local run linked above was captured on a 10-core Apple M1 Pro machine.

The Criterion suite is useful when you want to isolate individual categories of cost:

  • single-thread hot-path recording
  • exporter cost for Prometheus, DogStatsD, and OTLP
  • dynamic-cardinality export scaling
  • first-touch dynamic series insertion and overflow behavior
  • concurrent write-plus-export overlap
  • span OTLP drain, build, encode, and gzip work

The threaded harness is where the story gets especially interesting for us, because it stresses exactly the behavior we cared about most: what happens when the telemetry layer sits on a contested multi-threaded path and has to keep up.

What Stood Out in the Results

The broad pattern was consistent: the more the comparison stressed shared write paths, label-heavy workflows, and contention, the larger the gap became. That mattered to us because those are exactly the conditions where proxy-layer telemetry gets dangerous. If the overhead only disappeared in toy cases, the design would not have solved the actual problem.

Representative results from the harness run:

  • labeled_counter:churn: about 2364x faster
  • dynamic_counter:uniform: about 1743x faster
  • pipeline span scenario: about 324x faster
  • root span scenario: about 253x faster
  • dynamic_gauge_i64:uniform: about 9x faster

Those numbers are large, but the important takeaway is not just the headline multiplier. The important takeaway is that the shape of the results matched the problem we were trying to solve. Counter-heavy and label-heavy paths benefited the most, because those were exactly the places where contention and hot-path overhead were hurting us. The dynamic and labeled families were especially strong because that is where repeated resolution and shared write coordination can become expensive very quickly in generic stacks.

At the same time, not every path was equally dramatic, and that is a good thing. dynamic_gauge_i64 improved, but by much less than the counter-heavy cases. Histogram-style work also narrows the gap because both sides are doing more real work per record. That is actually reassuring. It is a sign that the benchmark is not just shouting "everything is infinitely faster." Different telemetry paths have different cost profiles. Also, to keep our marketing department humble, the "worst" representative result was only about 9x faster. If your disappointing case is that you merely improved something by nine times, the data is probably still telling you that you found the right bottleneck.

That is why we think the right question is not:

"Is fast-telemetry always faster in every case?"

The right question is:

"Does telemetry overhead matter in your hot path?"

If the answer is no, the broader OpenTelemetry ecosystem may still be the better choice.

If the answer is yes, especially in high-throughput, contention-heavy, label-heavy Rust services, then a narrower design can make a very big difference.

Where We Think It Fits

We think fast-telemetry is a strong fit for:

  • teams running high-volume Rust services
  • systems where metrics or spans already show up in CPU profiles
  • engineers who want cheaper instrumentation on critical paths
  • workloads with known label sets or reusable dynamic series handles
  • infrastructure products that need fine-grained observability without paying for it on every request

We do not think it replaces the broader OpenTelemetry ecosystem in every system.

If ecosystem breadth, standardized integrations, and vendor portability matter more than raw write-path cost, the OpenTelemetry toolchain is still a great answer. In some systems, the right answer may be a mix: use broader OpenTelemetry tooling where flexibility matters, and use fast-telemetry where the hot path is too expensive to ignore.

Why We Open-Sourced It

We built fast-telemetry because we needed a telemetry library that matched the performance constraints we were actually seeing, not the ones we wished we had.

Once the implementation matured, it started to feel like something that should exist outside Eden.

The underlying problem is not unique to us. More teams are building infrastructure for AI workloads. More teams are putting policy, routing, observability, and access control in the path of very high-throughput systems. More teams want richer telemetry without turning telemetry into the dominant cost center on their write path.

We think that trend will continue.

As more software gets built not just for humans but also for models, the underlying infrastructure requirements get stricter, not looser. You need better security, better provenance, better observability, and better raw performance all at once. The systems that win will not just have the nicest interface. They will have the strongest low-level foundations.

fast-telemetry is one small piece of that foundation.

We are especially interested in feedback from teams dealing with:

  • high label cardinality
  • exporter-heavy environments
  • real-world span pipelines
  • cases where OpenTelemetry is already fast enough
  • cases where full-fidelity telemetry still feels too expensive

Try It

fast-telemetry is our first open-source release from Eden in this part of the stack, and it is built around one idea: telemetry should measure your system, not become one of its bottlenecks.

You can find it here:

If you are profiling Rust services and seeing telemetry overhead, we would love to compare notes.