Why It Is Fast
Introduction
The framework's internal message transport layer uses the Aeron + SBE combination: true zero-copy, no loopback, no reflection, zero GC, no runtime parsing, and near-zero encoding/decoding overhead. Through a lock-free design and a zero-GC strategy, it achieves sub-microsecond, even nanosecond-level, end-to-end latency; over IPC in particular, performance can approach the hardware limit.
Shared Memory (Aeron IPC)
Aeron's unique IPC mechanism is based on shared memory and lock-free design, fundamentally eliminating copy and context-switch overhead during inter-process transfer.
Difference between Aeron IPC and Netty Zero-Copy
| Feature | Netty system-level zero-copy | Aeron application-level zero-copy (IPC) |
|---|---|---|
| Transport medium | Kernel buffers, network protocol stack | Shared-memory files (mapped files) |
| Data flow | Reduces copies between user space and kernel space, but still involves the kernel network stack | Completely bypasses kernel, network stack, and copies between user spaces |
| Zero-copy definition | Reduces CPU copies during system calls | Cross-process, fully eliminates data copying |
Aeron allocates a memory region (Log Buffer) at the underlying layer so different processes can point to the same memory. By mapping the same memory into multiple processes, data transfer between processes does not require copying through kernel, network stack, or user space. This is a more complete, higher-level zero-copy model.
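The mechanism can be illustrated with nothing but the JDK: Aeron's Log Buffers are memory-mapped files, and any process that maps the same file sees the same physical pages. A minimal stdlib-only sketch of that idea (this is an illustration of the principle, not Aeron's actual implementation) uses two independent `MappedByteBuffer` mappings standing in for two processes:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedMappingSketch {
    /** Writes through one mapping and reads through another; returns what the reader saw. */
    public static long roundTrip() {
        try {
            Path file = Files.createTempFile("log-buffer", ".dat");
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Two independent mappings of the same region, as two processes would hold.
                MappedByteBuffer writer = ch.map(FileChannel.MapMode.READ_WRITE, 0, 64);
                MappedByteBuffer reader = ch.map(FileChannel.MapMode.READ_WRITE, 0, 64);

                writer.putLong(0, 42L);   // "publisher" writes in place
                return reader.getLong(0); // "subscriber" reads the same memory: no copy,
                                          // no socket, no kernel network stack involved
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // 42
    }
}
```

The write becomes visible to the reader through the shared page cache; no bytes cross a socket or are copied between user spaces, which is the essence of the application-level zero-copy model described above.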
The diagram below shows two processes pointing to the same memory. This memory is a reusable ring buffer (composed of Term Buffers) to avoid GC. Memory reuse is a key reason for Aeron's high performance.
The ring buffer completely avoids frequent memory allocation/release during data transfer, greatly reducing JVM GC pressure and pauses, and ensuring very low latency.
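The reuse principle can be sketched with a tiny single-producer/single-consumer ring buffer over one preallocated array. This is a deliberately simplified illustration (far simpler than Aeron's rotating Term Buffers): slots are written and read in place, so steady-state operation allocates nothing and generates no garbage.

```java
public class RingSketch {
    private final long[] slots;  // preallocated once, reused forever
    private final int mask;
    private long head, tail;     // single-producer / single-consumer indices

    /** Capacity must be a power of two so index wrapping is a cheap mask. */
    public RingSketch(int capacityPow2) {
        slots = new long[capacityPow2];
        mask = capacityPow2 - 1;
    }

    public boolean offer(long value) {
        if (tail - head == slots.length) return false;  // full: back-pressure
        slots[(int) (tail & mask)] = value;             // write in place, no allocation
        tail++;
        return true;
    }

    /** Returns the next value, or Long.MIN_VALUE as a sentinel if the ring is empty. */
    public long poll() {
        if (head == tail) return Long.MIN_VALUE;
        long v = slots[(int) (head & mask)];
        head++;
        return v;
    }
}
```

Note the long-based sentinel instead of a boxed `Long` return: avoiding autoboxing on the hot path is the same zero-GC discipline the text describes.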
Advantages of this design:
- Latency: extremely low. It avoids network stack and context-switch overhead, usually at the nanosecond/microsecond level.
- Transfer model: because multiple processes map the same memory, data transfer between processes achieves zero-copy transmission.
- Overhead: fundamentally eliminates data-copy and context-switch overhead in inter-process transfer.
SBE (Simple Binary Encoding)
Aeron and SBE are often used together as a "best combination" to provide an end-to-end optimal solution for high-throughput, low-latency applications.
SBE is a message encoding/decoding standard for high-performance financial and trading applications. Its core goal is to achieve extreme CPU efficiency while keeping very low latency. SBE is designed entirely for machine efficiency, not human readability. It discards the unnecessary overhead common in XML, JSON, Google Protobuf, Thrift, and similar formats.
Key characteristics:
- Extreme CPU efficiency: SBE encoding/decoding incurs near-zero CPU overhead, using a direct "on-the-wire" encoding/decoding mechanism.
- Fixed memory layout: field sizes are fixed, and the message structure and field offsets are known at compile time, so no runtime parsing is needed.
- Zero GC: minimal memory overhead; encoding/decoding creates no temporary objects and allocates no new Java objects.
SBE is extremely fast, with encode/decode latency at the nanosecond level. Thanks to fixed memory layout, performance is often several times faster than runtime-parsing frameworks such as Google Protobuf. On a single core, SBE can easily process tens of millions of messages per second, demonstrating extreme CPU efficiency.
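The fixed-layout idea can be shown with a hand-rolled flyweight over a `ByteBuffer`. The message schema and field names here are hypothetical (real SBE generates such flyweights from an XML schema), but the principle is the same: every offset is a compile-time constant, so encode/decode is plain indexed reads and writes with no parsing step and no object allocation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical fixed-layout trade message: price(8 bytes) | quantity(4) | side(1). */
public final class TradeFlyweight {
    private static final int PRICE_OFFSET = 0;  // int64
    private static final int QTY_OFFSET   = 8;  // int32
    private static final int SIDE_OFFSET  = 12; // int8
    public static final int ENCODED_LENGTH = 13;

    private ByteBuffer buffer;

    /** Points the flyweight at a buffer; no data is copied. */
    public TradeFlyweight wrap(ByteBuffer buffer) {
        this.buffer = buffer.order(ByteOrder.LITTLE_ENDIAN);
        return this;
    }

    // Setters write straight into the wrapped buffer: no intermediate objects.
    public void price(long v)   { buffer.putLong(PRICE_OFFSET, v); }
    public void quantity(int v) { buffer.putInt(QTY_OFFSET, v); }
    public void side(byte v)    { buffer.put(SIDE_OFFSET, v); }

    // Getters read directly at known offsets: no runtime parsing, no allocation.
    public long price()   { return buffer.getLong(PRICE_OFFSET); }
    public int quantity() { return buffer.getInt(QTY_OFFSET); }
    public byte side()    { return buffer.get(SIDE_OFFSET); }
}
```

Contrast with Protobuf-style formats, which must walk tag/length fields at runtime and materialize message objects; here "decoding" is just a `getLong` at a constant offset.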
Time Conversion and Latency Metrics
To measure performance clearly, first review time units.
| Unit | Relation | Description | Conversion |
|---|---|---|---|
| Nanosecond (ns) | 10⁻⁹ s | One billionth of a second | 1 second (s) = 1,000,000,000 nanoseconds (ns) |
| Microsecond (us) | 10⁻⁶ s | One millionth of a second | 1 microsecond (us) = 1,000 nanoseconds (ns) |
| Millisecond (ms) | 10⁻³ s | One thousandth of a second | 1 millisecond (ms) = 1,000 microseconds (us) |
| Second (s) | 10⁰ s | Base unit | 1 second (s) = 1,000 milliseconds (ms) |
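These conversions map directly onto the JDK's `java.util.concurrent.TimeUnit`, which is also what latency-measuring code typically uses:

```java
import java.util.concurrent.TimeUnit;

public class TimeUnits {
    public static void main(String[] args) {
        System.out.println(TimeUnit.SECONDS.toNanos(1));       // 1000000000
        System.out.println(TimeUnit.MICROSECONDS.toNanos(1));  // 1000
        System.out.println(TimeUnit.MILLISECONDS.toMicros(1)); // 1000
        System.out.println(TimeUnit.SECONDS.toMillis(1));      // 1000
    }
}
```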
Aeron latency metrics
- Intra-process / inter-process communication (IPC):
  - Latency: nanosecond level (~100 ns).
  - Mechanism: lock-free, shared-memory ring buffers; fully avoids kernel-mode switches and lock contention, which is key to ultra-low latency.
- LAN communication (UDP):
  - Latency: microsecond level (~10-30 us).
  - Mechanism: an application-layer optimized UDP protocol with reliability and ordering guarantees, significantly better than the traditional kernel TCP/IP stack.
Comparison: round-trip latency over the traditional kernel TCP/IP stack typically sits in the hundreds-of-microseconds to millisecond range, so Aeron's LAN latency advantage amounts to roughly an order of magnitude.
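Numbers like these are obtained by timestamping each message with `System.nanoTime()` and reporting percentiles rather than averages. A minimal sketch of that measurement pattern follows (a real benchmark would use a library such as HdrHistogram and guard against coordinated omission; the percentile method here is a naive nearest-rank implementation):

```java
import java.util.Arrays;

public class LatencySketch {
    /** Nearest-rank p-th percentile (0 < p <= 100) of nanosecond samples. */
    public static long percentile(long[] samplesNs, double p) {
        long[] sorted = samplesNs.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        long[] samples = new long[10_000];
        for (int i = 0; i < samples.length; i++) {
            long start = System.nanoTime();
            // ... a send + receive round trip would go here ...
            samples[i] = System.nanoTime() - start;
        }
        // p99 is the figure that exposes head-of-line blocking and GC pauses.
        System.out.println("p50 ns: " + percentile(samples, 50));
        System.out.println("p99 ns: " + percentile(samples, 99));
    }
}
```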
Key Differences Between Aeron UDP and Traditional TCP/IP
The design philosophies of Aeron UDP and traditional kernel TCP are fundamentally different.
- Traditional TCP/IP aims to provide a general-purpose, reliable, fair WAN transport layer, but sacrifices latency stability and predictability.
- Aeron is focused on low latency, high throughput, and predictable latency message-stream transport. It moves complex transport logic to the application layer, and bypasses kernel TCP/IP bottlenecks through shared memory (IPC) and optimized UDP (application-layer reliability).
- With Aeron, developers gain fine-grained control over latency and transport behavior, enabling extremely low and highly predictable communication latency.
| Feature | Aeron (UDP/Application Layer) | Traditional TCP/IP (Kernel/Transport Layer) |
|---|---|---|
| Reliability mechanism | NACK mechanism: retransmit only when receiver detects missing data. High efficiency, low traffic. | ACK mechanism: sender waits for acknowledgments, increasing latency and network traffic. |
| Ordering guarantee | Handled in application layer: out-of-order receive, ordered reassembly at receiver log. | Guaranteed in kernel layer: all packets must be delivered in order. |
| HoL blocking | Eliminated (No Head-of-Line Blocking). Even with packet loss, subsequent data can still be received and buffered, without breaking pipeline processing. | Exists (TCP). Packet loss stalls subsequent packets in kernel buffer until retransmission completes, significantly increasing P99 latency. |
| Backpressure/flow control | Application-layer flow control (e.g., Term Offset/Position), fast, fine-grained, responsive. | Kernel sliding-window/flow control limited by OS scheduling, relatively slower response. |
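The NACK and head-of-line-blocking rows can be sketched together: a receiver tracks the next expected sequence number, buffers out-of-order arrivals instead of stalling on them, and requests retransmission only for the missing range. This is an illustration of the principle, not Aeron's actual wire protocol:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class NackReceiverSketch {
    private long nextExpected = 0;
    private final TreeMap<Long, String> outOfOrder = new TreeMap<>();
    public final List<Long> nackedSeqs = new ArrayList<>();   // retransmit requests sent
    public final List<String> delivered = new ArrayList<>();  // in-order output

    public void onPacket(long seq, String payload) {
        if (seq > nextExpected) {
            // Gap detected: NACK only the missing range, buffer this packet.
            // Later data is neither discarded nor blocked -> no HoL blocking.
            for (long missing = nextExpected; missing < seq; missing++) {
                if (!nackedSeqs.contains(missing)) nackedSeqs.add(missing);
            }
            outOfOrder.put(seq, payload);
            return;
        }
        if (seq < nextExpected) return;  // duplicate retransmission, ignore
        deliver(payload);
        // Drain any buffered packets that are now contiguous.
        while (outOfOrder.containsKey(nextExpected)) {
            deliver(outOfOrder.remove(nextExpected));
        }
    }

    private void deliver(String payload) {
        delivered.add(payload);
        nextExpected++;
    }
}
```

Under TCP, the arrival of sequence 2 before sequence 1 would stall delivery in the kernel buffer until the retransmission of 1 completes; here sequence 2 is already buffered at the application layer, so delivery resumes the instant the gap is filled.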
Summary
Two key points in network programming performance are data transfer and encoding/decoding. With Aeron + SBE, both bottlenecks are addressed:
- Aeron handles efficient message transport: ultra-low transfer latency, zero-copy in IPC, and network transport superior to traditional TCP/UDP.
- SBE handles efficient encoding/decoding: ultra-low CPU usage, with a zero-copy, zero-parse encode/decode path.
Final benefits
- SBE = extreme CPU efficiency, pushing message processing speed to hardware limits.
- Aeron + SBE = end-to-end performance guarantee.
- Zero GC / Zero Copy: SBE operates directly on buffers, Aeron transmits in shared memory, jointly achieving the high-performance goal of zero memory copy and zero garbage collection.