
Why It Is Fast

Introduction

The framework's internal message transport layer combines Aeron with SBE: true zero-copy, zero-loopback, zero-reflection, zero-GC, zero runtime parsing, and near-zero encoding/decoding overhead. Through a lock-free design and a zero-GC strategy, it can achieve sub-microsecond, even nanosecond-level, end-to-end latency. Over IPC in particular, performance can approach the hardware limit.

Shared Memory (Aeron IPC)

Aeron's unique IPC mechanism is based on shared memory and lock-free design, fundamentally eliminating copy and context-switch overhead during inter-process transfer.

Difference between Aeron IPC and Netty Zero-Copy

| Feature | Netty system-level zero-copy | Aeron application-level zero-copy (IPC) |
| --- | --- | --- |
| Transport medium | Kernel buffers, network protocol stack | Shared-memory files (memory-mapped files) |
| Data flow | Reduces copies between user space and kernel space, but still traverses the kernel network stack | Completely bypasses the kernel, the network stack, and copies between user spaces |
| Zero-copy definition | Fewer CPU copies during system calls | Cross-process data copying is fully eliminated |

Aeron allocates a memory region (the Log Buffer) and maps it into multiple processes, so they all point at the same physical memory. Data transfer between processes therefore requires no copying through the kernel, the network stack, or between user spaces. This is a more complete, higher-level zero-copy model.
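The shared-mapping idea can be demonstrated with nothing but the JDK. The sketch below (class name `SharedMapDemo` is made up for illustration; this is not Aeron's log-buffer code) maps the same file twice with `java.nio`, playing the roles of two processes: a write through one mapping is immediately visible through the other, with no socket, no kernel network stack, and no copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedMapDemo {
    /** Maps the same file twice and checks that both views see one write. */
    static long roundTrip() {
        try {
            Path file = Files.createTempFile("log-buffer", ".buf");
            try (FileChannel chA = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE);
                 FileChannel chB = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Two independent mappings of one file: stand-ins for two processes.
                MappedByteBuffer writer = chA.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                MappedByteBuffer reader = chB.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                writer.putLong(0, 42L);   // the "publisher" writes in place...
                return reader.getLong(0); // ...and the "subscriber" reads it, no copy
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // prints 42
    }
}
```

In real Aeron, the mapped file is the log buffer under the driver's directory, and coordination between writer and reader is lock-free; the mapping itself is the same OS primitive shown here.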

The diagram below shows two processes pointing to the same memory. This memory is a reusable ring buffer (composed of Term Buffers) to avoid GC. Memory reuse is a key reason for Aeron's high performance.

(Figure: performance_zero, two processes mapping the same ring buffer)

The ring buffer completely avoids frequent memory allocation/release during data transfer, greatly reducing JVM GC pressure and pauses, and ensuring very low latency.
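A toy version of this reuse pattern, assuming fixed-size slots over one preallocated direct buffer (`ToyRing` is an illustrative name, not an Aeron class), shows why steady-state operation produces no garbage: the buffer is allocated once and every slot is overwritten in place.

```java
import java.nio.ByteBuffer;

/** Toy single-producer/single-consumer ring over one preallocated buffer. */
final class ToyRing {
    private static final int SLOT = 8;  // fixed-size slots, one long each
    private final ByteBuffer buf;       // allocated once, reused forever: no per-message GC
    private final int slots;
    private long head, tail;            // monotonically increasing positions

    ToyRing(int slots) {
        this.slots = slots;
        this.buf = ByteBuffer.allocateDirect(slots * SLOT);
    }

    boolean offer(long v) {
        if (tail - head == slots) return false;       // full: backpressure, never allocate
        buf.putLong((int) (tail % slots) * SLOT, v);  // overwrite a recycled slot
        tail++;
        return true;
    }

    Long poll() {
        if (head == tail) return null;                // empty
        long v = buf.getLong((int) (head % slots) * SLOT);
        head++;
        return v;
    }

    public static void main(String[] args) {
        ToyRing ring = new ToyRing(4);
        for (long i = 0; i < 10; i++) {               // 10 messages, zero allocations
            ring.offer(i);
            System.out.println(ring.poll());
        }
    }
}
```

Aeron's actual term buffers add lock-free multi-producer claims and term rotation, but the principle is the same: a bounded region of reused memory instead of per-message allocation.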

Advantages of this design:

  1. Latency: extremely low. It avoids network stack and context-switch overhead, usually at the nanosecond/microsecond level.
  2. Transfer model: because multiple processes map the same memory, data transfer between processes achieves zero-copy transmission.
  3. Overhead: fundamentally eliminates data-copy and context-switch overhead in inter-process transfer.

SBE (Simple Binary Encoding)

Aeron and SBE are often used together as a "best combination" to provide an end-to-end optimal solution for high-throughput, low-latency applications.

SBE is a message encoding/decoding standard for high-performance financial and trading applications. Its core goal is to achieve extreme CPU efficiency while keeping very low latency. SBE is designed entirely for machine efficiency, not human readability. It discards the unnecessary overhead common in XML, JSON, Google Protobuf, Thrift, and similar formats.


Key characteristics:

  1. Extreme CPU efficiency: SBE encoding/decoding carries almost no CPU overhead, because fields are read and written directly in their wire representation ("on-the-wire encoding/decoding") with no intermediate parse step.
  2. Fixed memory layout: field sizes are fixed, and message structure plus field offsets are known at compile time, so encoding/decoding overhead is nearly zero.
  3. Zero GC: minimal memory overhead, no temporary objects, and no new Java object creation during encode/decode.
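These three properties are easiest to see in the flyweight pattern that SBE's generated codecs use. The sketch below is a hand-written stand-in (the `TradeFlyweight` class and its fields are invented for illustration; real SBE codecs are generated from an XML schema): field offsets are compile-time constants, and `wrap` re-points one reusable object at a buffer instead of allocating per message.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hand-written flyweight in the style of an SBE-generated codec (illustrative only). */
final class TradeFlyweight {
    // Offsets fixed at compile time, as they would be in an SBE schema.
    static final int PRICE_OFFSET = 0;   // int64
    static final int QTY_OFFSET   = 8;   // int32
    static final int LENGTH       = 12;

    private ByteBuffer buffer;
    private int offset;

    /** Re-point at a message in place: no parsing, no allocation. */
    TradeFlyweight wrap(ByteBuffer buffer, int offset) {
        this.buffer = buffer;
        this.offset = offset;
        return this;
    }

    void price(long v) { buffer.putLong(offset + PRICE_OFFSET, v); }
    long price()       { return buffer.getLong(offset + PRICE_OFFSET); }
    void qty(int v)    { buffer.putInt(offset + QTY_OFFSET, v); }
    int qty()          { return buffer.getInt(offset + QTY_OFFSET); }

    public static void main(String[] args) {
        ByteBuffer wire = ByteBuffer.allocateDirect(LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        TradeFlyweight t = new TradeFlyweight().wrap(wire, 0);
        t.price(101_250L);                             // encode: direct writes at fixed offsets
        t.qty(500);
        System.out.println(t.price() + " " + t.qty()); // decode: direct reads, zero parsing
    }
}
```

Contrast this with Protobuf or JSON, where decoding walks the payload at runtime and materializes new objects; here "decoding" is just an offset read into memory that is already in wire format.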

SBE is extremely fast, with encode/decode latency at the nanosecond level. Thanks to fixed memory layout, performance is often several times faster than runtime-parsing frameworks such as Google Protobuf. On a single core, SBE can easily process tens of millions of messages per second, demonstrating extreme CPU efficiency.

Time Conversion and Latency Metrics

To measure performance clearly, first review time units.

| Unit | Relation | Description | Conversion |
| --- | --- | --- | --- |
| Nanosecond (ns) | 10^-9 second | One billionth of a second | 1 second (s) = 1,000,000,000 nanoseconds (ns) |
| Microsecond (us) | 10^-6 second | One millionth of a second | 1 microsecond (us) = 1,000 nanoseconds (ns) |
| Millisecond (ms) | 10^-3 second | One thousandth of a second | 1 millisecond (ms) = 1,000 microseconds (us) |
| Second (s) | 1 second | | 1 second (s) = 1,000 milliseconds (ms) |
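For the record, the JDK's `java.util.concurrent.TimeUnit` encodes exactly these conversions, which is a convenient way to keep benchmark code free of magic constants:

```java
import java.util.concurrent.TimeUnit;

public class TimeUnits {
    public static void main(String[] args) {
        System.out.println(TimeUnit.SECONDS.toNanos(1));       // 1000000000
        System.out.println(TimeUnit.MICROSECONDS.toNanos(1));  // 1000
        System.out.println(TimeUnit.MILLISECONDS.toMicros(1)); // 1000
    }
}
```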

Aeron latency metrics

  1. Intra-process/inter-process communication (IPC):

    • Latency: nanosecond-level latency (~100 ns level).
    • Mechanism: lock-free, shared-memory ring buffers; fully avoids kernel mode switches and lock contention, which is key to ultra-low latency.
  2. LAN communication (UDP):

    • Latency: microsecond-level latency (~10 - 30 us level).
    • Mechanism: application-layer optimized UDP protocol with reliability and ordering guarantees, significantly better than traditional kernel TCP/IP stack.

For comparison, the baseline latency of traditional TCP/IP communication typically sits at the millisecond (ms) level, so Aeron's LAN latency advantage is an order-of-magnitude difference.

Key Differences Between Aeron UDP and Traditional TCP/IP

The design philosophies of Aeron UDP and traditional kernel TCP are fundamentally different.

  • Traditional TCP/IP aims to provide a general-purpose, reliable, fair WAN transport layer, but sacrifices latency stability and predictability.
  • Aeron is focused on low latency, high throughput, and predictable latency message-stream transport. It moves complex transport logic to the application layer, and bypasses kernel TCP/IP bottlenecks through shared memory (IPC) and optimized UDP (application-layer reliability).
  • With Aeron, developers gain fine-grained control over latency and transport behavior, enabling extremely low and highly predictable communication latency.
| Feature | Aeron (UDP / application layer) | Traditional TCP/IP (kernel / transport layer) |
| --- | --- | --- |
| Reliability mechanism | NACK: retransmit only when the receiver detects missing data; efficient, low extra traffic | ACK: the sender waits for acknowledgments, adding latency and network traffic |
| Ordering guarantee | Handled in the application layer: fragments may arrive out of order and are reassembled in order in the receiver's log | Guaranteed in the kernel: all packets must be delivered in order |
| Head-of-line blocking | Eliminated: after a packet loss, later data is still received and buffered, so pipeline processing continues | Present: a lost packet stalls all later packets in the kernel buffer until retransmission completes, significantly increasing P99 latency |
| Backpressure / flow control | Application-layer flow control (e.g., term offset/position): fast, fine-grained, responsive | Kernel sliding-window flow control, limited by OS scheduling; slower to respond |
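The NACK and head-of-line rows in the table reduce to one idea: buffer out-of-order fragments, deliver the contiguous prefix, and ask only for the gap. The toy receiver below (`ToyReceiver` is an invented illustration, not Aeron's protocol code) sketches that behavior; a real NACK would be a control message back to the sender.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy NACK-style receiver: buffer out-of-order fragments, deliver the contiguous prefix. */
final class ToyReceiver {
    private final TreeMap<Integer, String> pending = new TreeMap<>(); // seq -> payload
    private int nextExpected = 0;

    /** Returns payloads that became deliverable; a gap is reported, never a stall. */
    List<String> onFragment(int seq, String payload) {
        pending.put(seq, payload);
        List<String> delivered = new ArrayList<>();
        while (pending.containsKey(nextExpected)) {        // no head-of-line blocking:
            delivered.add(pending.remove(nextExpected++)); // later fragments stay buffered
        }
        return delivered;
    }

    /** The sequence number the receiver would NACK (ask the sender to retransmit). */
    Integer missing() { return pending.isEmpty() ? null : nextExpected; }

    public static void main(String[] args) {
        ToyReceiver r = new ToyReceiver();
        System.out.println(r.onFragment(0, "a")); // [a]
        System.out.println(r.onFragment(2, "c")); // []     seq 1 lost, but seq 2 is kept
        System.out.println(r.missing());          // 1      -> send a NACK for seq 1
        System.out.println(r.onFragment(1, "b")); // [b, c] retransmission fills the gap
    }
}
```

Note how fragment 2 is accepted and held even while fragment 1 is missing; under TCP the kernel would withhold everything after the gap from the application until retransmission completes.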

Summary

Two key points in network programming performance are data transfer and encoding/decoding. With Aeron + SBE, both bottlenecks are addressed:

  1. Aeron handles efficient message transport: ultra-low transfer latency, zero-copy in IPC, and network transport superior to traditional TCP/UDP.
  2. SBE handles efficient encoding/decoding: ultra-low CPU usage, with zero-copy and zero-parse encode/decode process.

Final benefits

  • SBE = extreme CPU efficiency, pushing message processing speed to hardware limits.
  • Aeron + SBE = end-to-end performance guarantee.
  • Zero GC / Zero Copy: SBE operates directly on buffers, Aeron transmits in shared memory, jointly achieving the high-performance goal of zero memory copy and zero garbage collection.