Performance & Benchmarks
Rigatoni is designed for high-performance data replication workloads. This page provides comprehensive benchmark results and performance characteristics to help you understand the system’s capabilities and optimize your deployments.
Table of contents
- Executive Summary
- Quick Recommendations
- Batch Processing Performance
- Serialization & Format Performance
- Concurrency & Throughput
- Memory & State Management
- Advanced Processing Patterns
- Production Configuration Recommendations
- Performance Optimization Checklist
- Known Performance Characteristics
- Running Benchmarks Yourself
- Benchmark Environment
- Performance Metrics Summary
- Conclusion
Executive Summary
Rigatoni demonstrates excellent performance characteristics across all workload types:
- ~780ns per event for core processing (linear scaling up to 10K events)
- ~1.2μs per event for JSON serialization
- 7.65ms to write 1000 events to S3 with ZSTD compression
- ~450ns per operation for in-memory state store
- ~2ns per event for operation filtering (near-zero cost)
Quick Recommendations
Based on comprehensive benchmarking, we recommend:
// Optimal configuration for production
Pipeline::builder()
.batch_size(500) // Sweet spot for latency/throughput
.batch_timeout(50) // milliseconds
.max_concurrent_writes(3) // Optimal S3 concurrency
.build()
S3Config::builder()
.serialization_format(SerializationFormat::Json)
.compression(Compression::Zstd) // 14% faster than GZIP
.build()
Batch Processing Performance
Optimal Batch Size
Batch size has minimal impact on latency across a wide range:
| Batch Size | Latency | Overhead vs. Smallest |
|---|---|---|
| 10 events | 19.07 ms | baseline |
| 50 events | 19.19 ms | +0.6% |
| 100 events | 20.00 ms | +4.9% |
| 500 events | 20.44 ms | +7.2% |
| 1000 events | 20.68 ms | +8.4% |
| 2000 events | 20.25 ms | +6.2% |
Key Insight: Batch sizes from 100-2000 events show minimal latency differences (< 10% overhead). Choose 500 events as a balanced default for most workloads.
Core Processing (No I/O)
Pure event processing demonstrates excellent linear scaling:
| Events | Time | Per-Event Latency |
|---|---|---|
| 10 | 7.71 μs | 771 ns |
| 100 | 78.84 μs | 788 ns |
| 1000 | 788.82 μs | 789 ns |
| 5000 | 4.11 ms | 822 ns |
Excellent: Near-perfect linear scaling at ~780-790ns per event up to 1000 events. Slight degradation at 5000 events is likely due to cache effects.
Serialization & Format Performance
Event Serialization (JSON)
| Events | Time | Throughput |
|---|---|---|
| 10 | 12.07 μs | 828K events/sec |
| 100 | 123.21 μs | 812K events/sec |
| 1000 | 1.27 ms | 788K events/sec |
Consistent: ~1.2μs per event serialization time across all batch sizes.
S3 Format Comparison (1000 events)
| Format | Time | vs. Best | Recommended Use Case |
|---|---|---|---|
| JSON + ZSTD | 7.58 ms | baseline | ✅ Production (best compression/speed) |
| JSON + GZIP | 8.77 ms | +16% | ✅ Compatibility with legacy systems |
| JSON (uncompressed) | 11.79 ms | +56% | ⚠️ Development/testing only |
| Parquet | 12.36 ms | +63% | ⚠️ Analytics workloads (columnar) |
| Avro | 11.46 ms | +51% | ⚠️ Schema evolution requirements |
Recommendation: Use JSON + ZSTD for production. It’s 33% faster than uncompressed JSON and 14% faster than GZIP.
Compression Benefits by Batch Size
| Format | 10 events | 100 events | 1000 events |
|---|---|---|---|
| JSON (none) | 2.88 ms | 4.06 ms | 11.45 ms |
| JSON + GZIP | 3.37 ms | 3.81 ms | 8.63 ms |
| JSON + ZSTD | 3.71 ms | 3.81 ms | 7.65 ms ⭐ |
| Parquet | 3.81 ms | 4.05 ms | 12.38 ms |
Insight: Compression overhead is negligible for small batches, but provides significant benefits at 100+ events. ZSTD consistently outperforms other formats at scale.
Concurrency & Throughput
Concurrent S3 Writes (1000 events each)
| Concurrency | Time | Throughput | Efficiency |
|---|---|---|---|
| 2 concurrent | 5.06 ms | ~395 events/ms | 99% |
| 4 concurrent | 8.36 ms | ~478 events/ms | 61% |
| 8 concurrent | 15.16 ms | ~528 events/ms | 33% |
Analysis:
- 2 concurrent writes show excellent efficiency (99% - near-linear scaling)
- Diminishing returns beyond 4 concurrent writes
- Recommendation: Use concurrency level 2-4 for S3 destinations
Large Batch Performance
| Events | Time | Events/ms |
|---|---|---|
| 5000 | 32.54 ms | 153.6 |
| 10000 | 61.07 ms | 163.7 |
Good scaling: Slight improvement in per-event throughput at larger batches, though diminishing returns suggest 1000-2000 event batches are optimal.
Memory & State Management
Memory State Store Performance
| Operations | Time | Per-Operation |
|---|---|---|
| 10 | 4.87 μs | 487 ns |
| 100 | 45.43 μs | 454 ns |
| 1000 | 455.40 μs | 455 ns |
Excellent: Consistent ~450ns per operation. In-memory state store is blazingly fast for single-instance deployments.
Event Cloning (Memory Overhead)
| Events | Time | Per-Event |
|---|---|---|
| 10 | 7.42 μs | 742 ns |
| 100 | 75.0 μs | 750 ns |
| 1000 | 756 μs | 756 ns |
Very efficient: ~750ns per event clone. Minimal overhead for Arc/clone operations in the async runtime.
Batch Deduplication
| Events | Time | Overhead vs. Creation |
|---|---|---|
| 100 | 136.86 μs | +30% |
| 1000 | 1.40 ms | +32% |
Acceptable: Deduplication adds ~30% overhead, consistent across batch sizes. Worth the cost for exactly-once semantics.
Advanced Processing Patterns
Batch by Collection (Grouping)
| Events | Time | Overhead vs. Simple Creation |
|---|---|---|
| 100 | 119.91 μs | +14% |
| 1000 | 1.18 ms | +12% |
| 10000 | 15.13 ms | +28% |
Efficient: Collection-based batching adds minimal overhead for typical workloads (< 1000 events).
Filter by Operation Type
| Events | Time | Per-Event |
|---|---|---|
| 100 | 0.33 μs | 3.3 ns |
| 1000 | 1.74 μs | 1.7 ns |
| 10000 | 25.97 μs | 2.6 ns |
Outstanding: Operation filtering is nearly zero-cost (~2ns per event). Use filters liberally without performance concerns.
Time-Based Batching
| Timeout | Actual Time | Accuracy |
|---|---|---|
| 10ms | 1192.51 ms | ✅ Within 0.6% |
| 50ms | 5195.94 ms | ✅ Within 0.9% |
| 100ms | 10196.18 ms | ✅ Within 0.4% |
Accurate: Time-based batching respects timeouts precisely, making it reliable for latency-sensitive workloads.
Production Configuration Recommendations
Standard Workload (1K-10K events/sec)
PipelineConfig::builder()
.batch_size(500) // Optimal batch size
.batch_timeout_ms(50) // 50ms max latency
.max_retries(3) // Standard retry count
.build()
S3Config::builder()
.compression(Compression::Zstd) // Best performance
.serialization_format(SerializationFormat::Json)
.build()
High-Throughput Workload (10K-100K events/sec)
PipelineConfig::builder()
.batch_size(1000) // Larger batches for throughput
.batch_timeout_ms(100) // Accept higher latency
.max_retries(5) // More retries for stability
.build()
S3Config::builder()
.compression(Compression::Zstd) // Best compression ratio
.max_concurrent_writes(3) // Parallel S3 writes
.build()
Low-Latency Workload (< 10ms p99)
PipelineConfig::builder()
.batch_size(100) // Small batches
.batch_timeout_ms(10) // Aggressive timeout
.max_retries(2) // Fast fail
.build()
S3Config::builder()
.compression(Compression::None) // Skip compression for speed
.serialization_format(SerializationFormat::Json)
.build()
Performance Optimization Checklist
Use this checklist to optimize your Rigatoni deployment:
- ✅ Batch size: Set to 100-500 for balanced latency/throughput
- ✅ Compression: Use ZSTD for production (14% faster than GZIP)
- ✅ Concurrency: Configure 2-4 concurrent S3 writes
- ✅ State store: Use in-memory for single-instance (< 500μs for 1K ops)
- ✅ State store: Use Redis for multi-instance deployments
- ✅ Filtering: Apply operation filters liberally (~2ns overhead)
- ✅ Monitoring: Enable Prometheus metrics to track performance
- ✅ Batch timeout: Set based on latency requirements (50-100ms typical)
Known Performance Characteristics
Areas for Potential Optimization
- Large Batch Scaling (10K+ events)
- Current: 61ms for 10K events (163 events/ms)
- Minor degradation at 10K+ suggests cache pressure
- Consider: Batch splitting for very large collections
- Parquet Performance
- Current: 63% slower than JSON+ZSTD
- May be acceptable trade-off for analytics use cases
- Consider: Investigate Parquet encoder tuning
- Concurrent Write Efficiency
- Efficiency drops significantly beyond 4 concurrent writes
- Consider: Implementing adaptive concurrency based on load
- Memory Patterns
- Memory pattern benchmarks show 2.6ms for 1K events
- 3.3x slower than basic batch processing
- Consider: Allocation patterns and potential object pooling
Running Benchmarks Yourself
To run benchmarks on your hardware:
# Run all benchmarks
cargo bench --package rigatoni-benches
# Run specific benchmark suite
cargo bench --package rigatoni-benches --bench batch_processing
cargo bench --package rigatoni-benches --bench s3_destination
cargo bench --package rigatoni-benches --bench pipeline_throughput
# View results
open target/criterion/report/index.html
Benchmark reports are generated in target/criterion/ with detailed HTML reports including:
- Performance graphs
- Statistical analysis (mean, median, std dev)
- Comparison with previous runs
- Regression detection
Benchmark Environment
All benchmarks were run on:
- CPU: Apple M-series (ARM64)
- OS: macOS
- Rust: 1.88+
- Build: Release mode with optimizations
- S3: LocalStack for S3 benchmarks (eliminating network variability)
Results may vary based on your hardware, but relative performance characteristics should remain consistent.
Performance Metrics Summary
| Metric | Value | Rating |
|---|---|---|
| Core Processing | ~780ns/event | ⭐⭐⭐⭐⭐ Excellent |
| Serialization | ~1.2μs/event | ⭐⭐⭐⭐⭐ Excellent |
| State Store (Memory) | ~450ns/op | ⭐⭐⭐⭐⭐ Excellent |
| S3 Write (ZSTD) | 7.65ms/1K events | ⭐⭐⭐⭐ Very Good |
| Filtering | ~2ns/event | ⭐⭐⭐⭐⭐ Outstanding |
| Event Cloning | ~750ns/event | ⭐⭐⭐⭐⭐ Excellent |
| Deduplication | +30% overhead | ⭐⭐⭐⭐ Good |
| Concurrent Scaling | 99% @ 2x, 61% @ 4x | ⭐⭐⭐⭐ Good |
Conclusion
Rigatoni demonstrates production-ready performance across all benchmark categories:
- ✅ Sub-microsecond per-event processing times
- ✅ Linear scaling up to 1000 events with minimal overhead
- ✅ Efficient compression with ZSTD providing best results
- ✅ Good concurrency scaling for parallel S3 writes
- ✅ Fast state management for reliable resume token tracking
These benchmarks validate that Rigatoni can handle high-throughput CDC workloads (10K-100K events/sec) with predictable latency characteristics.
For production deployments, see the Production Deployment Guide for complete configuration examples.