Splitter and Aggregator: Handling Bulk Messages in Enterprise Integration

The Splitter and Aggregator patterns are the backbone of any integration that processes batches, fan-outs, or parallel calls. Learn how they work together and how Apache Camel implements them.

The Problem: One Message, Many Parts

Many real-world integration scenarios involve messages that contain multiple items — a batch invoice with 200 line items, a purchase order with multiple products, or a logistics manifest covering 50 shipments. Processing these as a single unit is inflexible. You want to process each item independently, potentially in parallel, and then collect the results back into a single response.

This is exactly what the Splitter and Aggregator patterns solve — and they almost always appear together.

The Splitter Pattern

The Splitter takes a single message and breaks it into a sequence of smaller messages, each sent independently downstream. The original message is typically a collection (a JSON array, an XML element with repeated children, a CSV file).

from("file:input?noop=true")
  .unmarshal().csv()           // parse CSV into List<List<String>>
  .split(body())               // one exchange per row
    .parallelProcessing()      // process rows concurrently
    .to("bean:rowProcessor");

Apache Camel's split DSL supports splitting by:

Collection or array body
XPath expression (for XML)
JSONPath expression (for JSON)
Tokenizer (for newline-delimited records)
Custom Expression implementation

The Aggregator Pattern

The Aggregator collects the results of the split messages and combines them back into a single message. It needs to know two things: which messages belong together (the correlation expression), and when to stop collecting (the completion condition).

from("direct:split-results")
  .aggregate(header("batchId"), new GroupedBodyAggregationStrategy())
    .completionSize(200)          // complete when 200 messages received
    .completionTimeout(30000)     // or after 30s, whichever comes first
  .to("bean:batchResultHandler");

Completion conditions can be:

Size-based — complete after N messages
Timeout-based — complete after T milliseconds of inactivity
Predicate-based — complete when a message with a specific header arrives (e.g. a "last" flag)
Combined — first condition to fire wins

The Scatter-Gather Pattern

A related pattern is Scatter-Gather: send the same message to multiple recipients in parallel and aggregate the responses. This is common in pricing engines (get quotes from multiple suppliers simultaneously) and data enrichment (call multiple APIs to build a complete record).

from("direct:get-product-price")
  .recipientList(constant("direct:supplier-a,direct:supplier-b,direct:supplier-c"))
    .parallelProcessing()
    .aggregationStrategy(new LowestPriceAggregationStrategy())
    .timeout(5000);              // fail fast if suppliers are slow

Persistent Aggregation State

For long-running aggregations (batches that span minutes or hours), the aggregator must persist its state so that a broker restart does not lose partially collected results. Camel supports pluggable AggregationRepository implementations backed by JDBC, Hazelcast, Infinispan, or Redis.

JdbcAggregationRepository repo = new JdbcAggregationRepository(
  dataSource, "ORDER_AGGREGATION"
);

.aggregate(header("orderId"), new MyStrategy())
  .completionSize(50)
  .aggregationRepository(repo);

This combination — Splitter for fan-out, Aggregator with persistent state for fan-in — is the backbone of most high-throughput enterprise batch processing pipelines.