Tiered Storage is a mechanism that keeps hot data on the Kafka broker's local disk while offloading cold data to external object storage (S3/GCS/Azure Blob, etc.). This lets you scale compute (brokers) and storage independently, opening up new options for disaster recovery and cost optimization.
It is offered as Tiered Storage on Confluent Platform / Confluent Cloud, and is increasingly available in Apache Kafka itself through KIP-405, whose Remote Log Manager (RLM) delegates to pluggable remote storage implementations. On the CCAAK exam, you need to be able to explain that the basic produce/fetch semantics are unchanged, explain the fundamentals of retention and segments, and accurately describe the operational benefits and constraints.
The primary goal of decoupling is to scale compute resources (CPU/memory/network) and storage capacity independently. Local disks are optimized for serving the hot set, while long-term retention is delegated to object storage. This eliminates the inefficiency of adding broker nodes just to expand disk capacity.
Tiered Storage does not change the write path or ISR/acks semantics. Producers continue to append to the broker's local log as before; after a segment rolls, it is progressively offloaded to external storage. When a consumer reads an older offset, the required portion is fetched back from external storage.
The core components are the broker's local log (the active segment and recent closed segments), an offloader that copies segments to external storage, the external object store itself, and on-demand remote reads at fetch time. Offload runs asynchronously once a segment has been closed (rolled); the local copy is deleted only after the upload has succeeded and the segment falls outside the local retention window.
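The offload eligibility rule can be sketched in Python. This is a simplified model, not any product's offloader: `LocalSegment` and `offload_candidates` are hypothetical names, the segment list is assumed to be in offset order with the active segment last, and `upper_bound_offset` stands in for a committed bound such as the high watermark or last stable offset (an assumption of this sketch).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalSegment:
    """Hypothetical record of one local log segment."""
    base_offset: int
    end_offset: int        # last offset contained in the segment (inclusive)
    uploaded: bool = False


def offload_candidates(segments: List[LocalSegment], upper_bound_offset: int) -> List[LocalSegment]:
    """Closed, not-yet-uploaded segments that lie entirely below `upper_bound_offset`.

    `segments` is assumed to be sorted by offset, with the active (still open)
    segment last; the active segment is never offloaded.
    """
    closed = segments[:-1]
    # Preserve offset order so uploads proceed oldest-first.
    return [s for s in closed if not s.uploaded and s.end_offset < upper_bound_offset]


segs = [
    LocalSegment(0, 99, uploaded=True),
    LocalSegment(100, 199),
    LocalSegment(200, 299),
    LocalSegment(300, 350),  # active segment
]
print([s.base_offset for s in offload_candidates(segs, upper_bound_offset=250)])
```

Filtering on a committed bound keeps uncommitted data out of object storage, so whatever is uploaded is data consumers could already read locally.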
Reads are served from local storage by default. When the requested range is not local, the broker fetches the relevant portion of the target segment from external storage and continues serving the read. This strikes a balance between low latency for the hot working set and reachability of historical data, even when it is cold.
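A minimal Python sketch of the read-path decision, assuming a partition exposes three boundaries: the log start offset (oldest data retained in either tier), a local log start offset (oldest data still on local disk), and the log end offset. `PartitionView` and `route_fetch` are hypothetical names for illustration; real brokers track these boundaries internally.

```python
from dataclasses import dataclass


@dataclass
class PartitionView:
    """Hypothetical, simplified view of one partition's log boundaries."""
    log_start_offset: int        # earliest offset retained anywhere (global retention)
    local_log_start_offset: int  # earliest offset still on the broker's local disk
    log_end_offset: int          # next offset to be written


def route_fetch(p: PartitionView, fetch_offset: int) -> str:
    """Decide where a fetch starting at `fetch_offset` would be served from."""
    if fetch_offset < p.log_start_offset or fetch_offset > p.log_end_offset:
        return "out_of_range"    # outside both retention tiers
    if fetch_offset >= p.local_log_start_offset:
        return "local"           # hot set: served from local segments / page cache
    return "remote"              # cold: broker reads the segment range from object storage


p = PartitionView(log_start_offset=0, local_log_start_offset=900_000, log_end_offset=1_000_000)
print(route_fetch(p, 950_000))
print(route_fetch(p, 10))
```

Note that the client sees the same fetch API in both branches; only where the broker finds the bytes differs.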
Conceptual diagram of Kafka Tiered Storage
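Segment size directly affects throughput, offload frequency, and recovery time. If segments are too large, each offload and remote fetch unit becomes heavy, and a low-traffic topic can take a long time to roll a segment so that it becomes eligible for offload; if they are too small, the number of remote objects and the associated metadata grows sharply. A starting point between a few hundred MB and 1 GB (Kafka's default log.segment.bytes is 1 GiB) is reasonable.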
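The trade-off is easy to quantify with back-of-the-envelope arithmetic. This Python sketch assumes a steady per-partition write rate and treats each closed segment as roughly one remote object; the rates and retention period are illustrative inputs, not recommendations.

```python
import math

KIB, MIB, GIB = 1024, 1024 ** 2, 1024 ** 3
SECONDS_PER_DAY = 86_400


def segment_count(write_bytes_per_sec: float, retention_days: float, segment_bytes: int) -> int:
    """Segments (roughly, remote objects) one partition accumulates over the retention window."""
    total_bytes = write_bytes_per_sec * retention_days * SECONDS_PER_DAY
    return math.ceil(total_bytes / segment_bytes)


def hours_until_roll(write_bytes_per_sec: float, segment_bytes: int) -> float:
    """Hours a size-based roll takes; a segment can only be offloaded after it rolls."""
    return segment_bytes / write_bytes_per_sec / 3600


for seg in (100 * MIB, 1 * GIB):
    print(f"{seg // MIB:>5} MiB segments: "
          f"{segment_count(5 * MIB, 30, seg):>7} segments per 30 days at 5 MiB/s, "
          f"{hours_until_roll(10 * KIB, seg):6.1f} h to roll at 10 KiB/s")
```

With these inputs, 100 MiB segments produce ten times as many objects as 1 GiB segments, while on a 10 KiB/s topic a 1 GiB segment takes roughly 29 hours to fill. A time-based roll (log.roll.hours, 168 by default) caps that wait regardless of size.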
Treat local retention and global retention as two separate concepts. Local retention (the hot set) is a capacity plan for serving the most recent N hours/days of reads from local disk; global retention (object storage) covers long-term storage for audit and analytics. Configuration keys vary by vendor and RLM implementation (in Apache Kafka, for example, the topic-level local.retention.ms/local.retention.bytes bound the local copy while retention.ms/retention.bytes bound total retention), but foundational keys such as log.segment.bytes and log.retention.ms/bytes remain important everywhere.
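The two-layer model can be expressed as a per-segment decision. This is a simplified Python sketch, assuming time-based retention measured from the newest record timestamp in each segment; `Segment` and `retention_action` are hypothetical names, and real implementations add size-based limits and other checks.

```python
from dataclasses import dataclass

HOUR_MS = 3_600_000
DAY_MS = 24 * HOUR_MS


@dataclass
class Segment:
    """Hypothetical closed-segment record; field names are illustrative."""
    base_offset: int
    max_timestamp_ms: int  # newest record timestamp in the segment
    uploaded: bool         # already copied to object storage?


def retention_action(seg: Segment, now_ms: int, local_retention_ms: int, total_retention_ms: int) -> str:
    """Cleanup action a two-tier (local + global) retention policy would take for one segment."""
    age_ms = now_ms - seg.max_timestamp_ms
    if age_ms > total_retention_ms:
        return "delete_everywhere"  # past global retention: remove from both tiers
    if age_ms > local_retention_ms:
        # Outside the hot set, but the local copy goes only once the upload has succeeded.
        return "delete_local_copy" if seg.uploaded else "keep_until_uploaded"
    return "keep_local"             # still inside the hot set


now = 40 * DAY_MS
for seg in (Segment(0, now - 2 * HOUR_MS, True),
            Segment(1, now - 3 * DAY_MS, True),
            Segment(2, now - 3 * DAY_MS, False),
            Segment(3, now - 31 * DAY_MS, True)):
    print(seg.base_offset, retention_action(seg, now, 1 * DAY_MS, 30 * DAY_MS))
```

Here local retention is 1 day and global retention is 30 days, so the same segment can be "expired" locally while still fully readable from object storage.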
Internal topics (such as __consumer_offsets and __transaction_state) are typically excluded from tiering for low-latency and availability reasons. Limit tiered topics to business data, and set local retention periods based on your SLA.
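One way to encode this policy is a small planning helper. This Python sketch is hypothetical: `TopicSpec`, `plan_local_retention_ms`, the `replay_sla_hours` field, and the 1.5x headroom factor are illustrative assumptions, and the compacted-topic exclusion reflects the conservative choice of keeping implementation-dependent cases local.

```python
from dataclasses import dataclass
from typing import Optional

HOUR_MS = 3_600_000


@dataclass
class TopicSpec:
    """Hypothetical description of a topic for tiering planning."""
    name: str
    cleanup_policy: str      # the topic's cleanup.policy value, e.g. "delete" or "compact"
    replay_sla_hours: float  # how far back consumers must read at local-disk latency


def plan_local_retention_ms(topic: TopicSpec, headroom: float = 1.5) -> Optional[int]:
    """Suggested local (hot-set) retention in ms, or None if the topic should stay local-only."""
    if topic.name.startswith("__"):
        return None  # internal topics such as __consumer_offsets and __transaction_state
    if "compact" in topic.cleanup_policy:
        return None  # compaction + tiering support is implementation-dependent; keep local
    # Keep the SLA window on local disk, plus headroom for consumer lag spikes.
    return int(topic.replay_sla_hours * headroom * HOUR_MS)


for t in (TopicSpec("orders", "delete", 24),
          TopicSpec("__consumer_offsets", "compact", 1),
          TopicSpec("customer-profiles", "compact", 24)):
    print(t.name, plan_local_retention_ms(t))
```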
Example server.properties (stable keys only; tiering enablement is vendor-specific)
```properties
# Foundational segment and retention settings (stable Kafka broker keys).
# .properties files do not support trailing comments, so each comment sits on its own line.

# 1 GiB segments (also Kafka's default)
log.segment.bytes=1073741824
# Time-based roll (broker key log.roll.hours / log.roll.ms; topic-level segment.ms).
# Only closed segments are offloaded, so this bounds how long data on a quiet topic waits.
log.roll.hours=168
# 7 days (example)
log.retention.ms=604800000
# -1 = no capacity cap (example)
log.retention.bytes=-1

# Internal topics: keep local-first (operational policy: exclude from tiering).
# offsets.retention.minutes controls how long committed offsets of empty consumer groups
# are kept (default 10080 = 7 days); it is not a tiering setting.
offsets.retention.minutes=10080

# Tiered storage enablement and external storage settings vary by product.
# Check the official documentation for Confluent Platform/Cloud or your Apache Kafka RLM implementation.
# Pseudo-example (hypothetical key names):
# tiered.storage.enable=true
# tiered.storage.bucket=<your-bucket>
# tiered.storage.region=<region>
# tiered.storage.credentials.provider=<provider>
```

From a cost perspective, the standard approach is to keep the local SSD capacity that supports the hot set lean and push long-term retention to object storage. Cost per GB drops significantly for infrequently accessed data, but you need to budget for request charges and bandwidth during restores and first-fetch scenarios.
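A rough monthly at-rest cost comparison, sketched in Python. All prices and volumes below are placeholders, not quotes from any provider. The model assumes, as in Apache Kafka's implementation, that only the partition leader uploads each closed segment, so remote data is stored once rather than once per replica; request charges, restore bandwidth, and egress are deliberately left out.

```python
def monthly_storage_cost(retained_gib: float, hot_set_gib: float, replication_factor: int,
                         local_price: float, object_price: float, tiered: bool) -> float:
    """At-rest storage cost per month; prices are per GiB-month (placeholders, not quotes)."""
    if not tiered:
        # Local-only: every retained byte lives on local disk on every replica.
        return retained_gib * replication_factor * local_price
    # Tiered: only the hot set is replicated on local disk; closed segments are stored
    # once in object storage, which provides its own durability.
    return hot_set_gib * replication_factor * local_price + retained_gib * object_price


# Placeholder inputs for illustration only; substitute your own volumes and price sheet.
LOCAL_PRICE = 0.10   # per GiB-month, hypothetical
OBJECT_PRICE = 0.02  # per GiB-month, hypothetical
args = dict(retained_gib=30_000, hot_set_gib=1_000, replication_factor=3,
            local_price=LOCAL_PRICE, object_price=OBJECT_PRICE)

print(f"local-only: {monthly_storage_cost(**args, tiered=False):,.0f}")
print(f"tiered:     {monthly_storage_cost(**args, tiered=True):,.0f}")
# Not modeled: request (API) charges, restore/first-fetch bandwidth, and egress.
```

With these placeholder inputs the tiered layout comes out at about a tenth of the local-only figure; the actual ratio depends entirely on your hot-set size, replication factor, and price sheet.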
Disaster recovery and scale-out become faster. When you add or replace a broker, most historical segments already live in external storage, so you only need to sync the minimum required active data and metadata. This also shortens partition reassignment time (though the exact gain depends on implementation and data volume).
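The reassignment gain can be estimated with simple arithmetic. This Python sketch assumes a new replica under tiering only needs to copy the local hot set, copies proceed at a fixed replication throttle, and metadata sync time is negligible; all sizes and rates are hypothetical inputs.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def reassignment_hours(partition_bytes: float, throttle_bytes_per_sec: float) -> float:
    """Hours to copy `partition_bytes` to a new replica at a fixed replication throttle."""
    return partition_bytes / throttle_bytes_per_sec / 3600


retained = 2048 * GIB  # everything the partition retains (hypothetical)
hot_set = 50 * GIB     # portion still on local disk under tiering (hypothetical)
throttle = 100 * MIB   # replication throttle in bytes per second (hypothetical)

print(f"local-only: {reassignment_hours(retained, throttle):.1f} h")
print(f"tiered:     {reassignment_hours(hot_set, throttle):.2f} h")
```

Under these assumptions the copy drops from roughly 5.8 hours to under 10 minutes; real gains depend on the implementation and how data is distributed across partitions.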
Monitoring focuses on offload backlog, remote read latency, local cache hit rate, and external storage error rates / IAM errors. Establish a baseline for normal operation, then define SLOs that account for peaks during overnight batch loads and similar events.
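A minimal Python sketch of baseline-driven alerting, assuming you already collect samples of the signals listed above. The metric names (`offload_backlog_bytes`, `remote_read_latency_ms`, `local_cache_hit_rate`) and the 2x factor are hypothetical placeholders, not names any Kafka distribution is guaranteed to expose. Collect the baseline over a window that includes regular overnight batch loads so that expected peaks do not trip alerts.

```python
import statistics
from typing import Dict, List

# Hypothetical metric names; map them to whatever your implementation actually exports.
LOWER_IS_WORSE = {"local_cache_hit_rate"}


def slo_breaches(baseline: Dict[str, List[float]], current: Dict[str, float],
                 factor: float = 2.0) -> List[str]:
    """Names of metrics whose current value is far outside their normal-operation baseline."""
    breached = []
    for name, value in current.items():
        samples = baseline.get(name)
        if not samples or len(samples) < 2:
            continue  # no usable baseline yet
        cuts = statistics.quantiles(samples, n=20)  # 19 cut points: p5 ... p95
        if name in LOWER_IS_WORSE:
            if value < cuts[0] / factor:  # e.g. local cache hit rate collapsing
                breached.append(name)
        elif value > cuts[-1] * factor:   # e.g. offload backlog, remote latency, errors
            breached.append(name)
    return sorted(breached)


baseline = {
    "offload_backlog_bytes": [1.0e8, 1.2e8, 0.9e8, 1.1e8, 1.3e8],
    "remote_read_latency_ms": [40, 42, 45, 50, 55, 60, 48, 52, 47, 44],
    "local_cache_hit_rate": [0.95, 0.96, 0.97, 0.94, 0.95, 0.96, 0.97, 0.95, 0.96, 0.94],
}
current = {"offload_backlog_bytes": 1.4e8, "remote_read_latency_ms": 300, "local_cache_hit_rate": 0.30}
print(slo_breaches(baseline, current))
```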
Typical failures include external storage access errors due to insufficient permissions, latency degradation (slow first-fetch), and growing offload backlog. First verify credentials, bucket policies, and network paths, then inspect the broker's offloader threads/queues and timeout settings.
For CCAAK, make sure you can articulate that Tiered Storage is purely a storage tiering mechanism and does not alter the basic semantics of produce/fetch/ISR/acks/transactions. Be ready to pair the benefits (decoupled scaling, faster recovery, cost optimization) with the trade-offs (first-time remote read latency, dependence on external storage SLA/IAM).
Interactions with compaction and topic cleanup are implementation-dependent (Apache Kafka's implementation, for example, does not support tiering compacted topics). On the exam, treat the support matrix as whatever each product's documentation defines, and be sure to correctly distinguish design decisions such as excluding internal topics from tiering and separating local retention from global retention.
| Aspect | Local-Only | Tiered Storage | Caveats |
|---|---|---|---|
| Scaling strategy | Capacity tied to node count | Independent scaling of compute and capacity | Hot-set sizing is critical |
| Recovery / Reassignment | Heavy resync of all data | Past segments live externally, so it is lightweight | Depends on implementation and data distribution |
| Retention cost | Expensive (long-term retention on SSD/HDD) | Cheap (leverages object storage) | Account for request charges and bandwidth |
| Latency | Consistently low (local) | First-time remote reads may be higher latency | Mitigate with caching and prefetching |
| Semantics | Standard Kafka | Standard Kafka (unchanged) | acks/ISR/EOS are preserved |
| Internal topics | Stable when local | Typically excluded | Prioritize SLA and availability |
CCAAK
Question 1
Which of the following is the most accurate statement about Kafka Tiered Storage (Confluent or RLM implementations)?
Correct answer: A
Tiered Storage does not change write semantics (acks/ISR/EOS); it offloads closed segments to external storage. Reads are normally served locally, with the broker fetching from external storage as needed. Tiering internal topics is not standard practice, and the feature is in fact well suited to long-term retention.
How does the concept of retention change with Tiered Storage?
You design retention as two layers: local retention (the hot set) and global retention (external storage). Size local retention based on your SLA, and configure longer global retention to meet compliance and analytics requirements. Foundational keys such as log.retention.ms/bytes remain in effect.
How does it affect latency and throughput?
Regular writes and reads of recent data are handled locally, so the impact is limited. However, the first read of older data may incur additional latency due to a round trip to external storage. You mitigate this with hot-set capacity sizing, caching, and prefetching.
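Caching and prefetching can be illustrated with a small read-through LRU cache. This Python sketch is a conceptual model, not how any broker implements it: `RemoteChunkCache` and the `fetch_remote` callable are hypothetical, chunks are keyed by (segment id, chunk index), read-ahead is done synchronously for simplicity, and `fetch_remote` is assumed to tolerate a request one chunk past the end of a segment.

```python
from collections import OrderedDict
from typing import Callable, Tuple

ChunkKey = Tuple[str, int]  # (remote segment id, chunk index): an illustrative key shape


class RemoteChunkCache:
    """Read-through LRU cache for remote segment chunks, with one-chunk read-ahead."""

    def __init__(self, fetch_remote: Callable[[str, int], bytes], capacity: int = 128) -> None:
        self._fetch_remote = fetch_remote  # hypothetical callable that hits object storage
        self._capacity = capacity
        self._chunks: "OrderedDict[ChunkKey, bytes]" = OrderedDict()
        self.remote_calls = 0              # how many reads actually reached object storage

    def _get_or_fetch(self, key: ChunkKey) -> bytes:
        if key in self._chunks:
            self._chunks.move_to_end(key)  # mark as most recently used
            return self._chunks[key]
        self.remote_calls += 1
        data = self._fetch_remote(*key)
        self._chunks[key] = data
        if len(self._chunks) > self._capacity:
            self._chunks.popitem(last=False)  # evict the least recently used chunk
        return data

    def read(self, segment_id: str, chunk_index: int) -> bytes:
        data = self._get_or_fetch((segment_id, chunk_index))
        # Read-ahead: historical replays usually scan forward, so warm the next chunk.
        # Done synchronously here; a real system would prefetch asynchronously.
        self._get_or_fetch((segment_id, chunk_index + 1))
        return data


def fake_fetch(segment_id: str, chunk_index: int) -> bytes:
    return f"{segment_id}:{chunk_index}".encode()


cache = RemoteChunkCache(fake_fetch, capacity=4)
cache.read("seg-1", 0)  # first fetch: two remote calls (chunk 0 plus read-ahead of chunk 1)
cache.read("seg-1", 1)  # chunk 1 is already warm; only chunk 2 is fetched
print(cache.remote_calls)
```

Only the very first read pays the full remote round trip; a consumer scanning forward then mostly hits chunks that were warmed ahead of it.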
Is it compatible with compaction and transactions?
Core semantics (acks/ISR/EOS) are preserved. The exact behavior of compaction and cleanup is implementation-dependent, so check the support matrix in the documentation for your product (Confluent, Apache RLM, etc.). It is common practice to keep internal topics local.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.