Audit Log Design in Confluent: CCAAK Practical and Exam Essentials

2026-04-19
NicheeLab Editorial Team

Audit logs consistently record "who did what, where, when, and with what result." In a Kafka environment, the scope includes API calls (Produce/Fetch/Metadata), permission changes, schema operations, and connection information.

Confluent implements and delivers audit logs differently in Cloud and Platform. This article explains the design decisions and implementation pitfalls based on those differences.

Purpose and Scope of Audit Logs

Audit logs serve a different purpose than observability logs or metrics: their primary goal is to retain a consistent event history for after-the-fact verification and compliance evidence. In a Kafka environment, it is especially important to record three things comprehensively: authorization decisions (allow/deny), administrative operations (Topic/ACL/RBAC/Connector/ksqlDB/Schema Registry), and client identity (principal, client ID, authentication method, source IP).

In Confluent Cloud, audit logs are delivered to external sinks via a managed delivery mechanism. In Confluent Platform (self-managed), brokers and surrounding components publish audit events to Kafka topics, which are then forwarded externally. Both use schematized events, and both require least-privilege access and tamper-resistant storage.

  • Target actions: authentication/authorization, administrative operations on Topic/ACL/RBAC/Schema/Connect/ksqlDB, and data plane actions like Produce/Fetch
  • Required attributes: actor (executing principal), action (operation), resource (target), outcome (allowed/denied), timestamp, and request metadata (client.id, etc.)
  • Separate audit logs from operational logs, and prioritize tamper-resistant WORM (write once, read many) storage
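
The required attributes listed above can be checked mechanically before an event is accepted into audit storage. The following is a minimal sketch in Python, assuming events arrive as dictionaries keyed by the simplified attribute names used in this article (not Confluent's actual wire format); `missing_attributes` and `is_auditable` are hypothetical helper names.

```python
# Attribute names follow this article's simplified list; they are an assumption,
# not the field names of a real Confluent audit event.
REQUIRED_ATTRIBUTES = ("actor", "action", "resource", "outcome", "timestamp")
VALID_OUTCOMES = {"ALLOWED", "DENIED"}


def missing_attributes(event: dict) -> list:
    """Return the required attributes that are absent or empty in the event."""
    return [name for name in REQUIRED_ATTRIBUTES if not event.get(name)]


def is_auditable(event: dict) -> bool:
    """True when every required attribute is present and the outcome is recognized."""
    return not missing_attributes(event) and event["outcome"] in VALID_OUTCOMES


if __name__ == "__main__":
    incomplete = {"actor": "User:alice", "action": "CREATE_TOPIC", "outcome": "ALLOWED"}
    print(missing_attributes(incomplete))
```

Rejecting or quarantining incomplete events at ingestion keeps gaps visible instead of letting them surface during an audit.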

Confluent Cloud Audit Logs: Design Basics

Confluent Cloud collects audit logs as a managed service and delivers them in near real time to your chosen sinks (for example S3, GCS, Azure Blob, Datadog, Splunk). Users do not subscribe directly from a Kafka topic; storage and analysis happen at the delivery destination.

Events include organization/environment/cluster identifiers, resource type, action, principal, outcome, and timestamp. The basic design assumes retention is managed by the sink's policy; do not expect long-term retention on the Cloud side.

  • Simple to adopt: Cloud takes on the management burden, which makes delivery reliability and schema consistency easier to guarantee
  • Access control: operations on audit log settings are restricted to organization-level admin roles
  • Design point: when regions or accounts are separated, separate the sinks as well and aggregate across them in the SIEM

| Aspect | Confluent Cloud | Confluent Platform (self-managed) |
|---|---|---|
| Delivery method | Managed delivery (direct to external sinks) | Published to internal topic → forwarded to any sink |
| Retention responsibility | Sink side (user retains the data) | Both internal topic and sink must be designed |
| Event scope | Covers control/data plane | Depends on enabled scope and target components |
| Adoption/maintenance | Small (Cloud administrator is the configuration owner) | Medium to large (requires broker/Connect/sink operations) |
| Schema evolution | Managed by Cloud; backward compatibility considered | Compatibility verification is the user's responsibility |

Confluent Platform Audit Logs: Design Basics

In Confluent Platform (including Confluent Server), audit events are published to internal Kafka topics (by default, confluent-audit-log-events). The scope covers RBAC/ACL authorization decisions, administrative APIs, and major data plane operations. Protect audit topics with high availability (sufficient replicas and partitions) and strict access control (read access limited to audit personnel).

External systems (SIEM, object storage) are reached through Kafka Connect sinks (S3, GCS, Azure, Splunk, etc.). The forwarding pipeline itself must be observable, and its retry design must provide at-least-once delivery or better.

  • Mirroring the internal topics to a separate cluster (audit-dedicated) is also an effective design
  • Denial events and administrative operations are top monitoring priorities; aggregate or thin out successful events according to requirements
  • Strictly synchronize time (NTP) across all nodes to ensure timeline consistency

Design Patterns and Architecture

For Cloud, use managed delivery as-is and standardize storage and analysis at the sink. For Platform, evaluate redundancy and capacity sizing (including growth during high throughput) in advance for the internal topic → Connect → sink path.

Critical events (denials, permission changes, data definition changes) should be routed to separate topics/storage tiers and given longer retention. This makes it easier to meet audit requirements while controlling operational cost.

  • Cloud: split sinks by environment/organization and use tags for cost allocation
  • Platform: separate the audit-dedicated Connect worker to isolate backpressure from production traffic
  • Common to all approaches: explicitly define a dead-letter queue (DLQ) as an isolation buffer, plus a retry policy for write failures
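
The routing rule for critical events can be sketched as a small classifier. This is an illustrative Python example, not a Confluent feature: the action names come from the examples in this article, while the tier names and destination topics (`audit-critical`, `audit-standard`) are hypothetical.

```python
# Actions treated as critical. ALTER_ACL stands in for permission changes and
# CREATE_TOPIC for data definition changes; extend the set for your environment.
CRITICAL_ACTIONS = {"ALTER_ACL", "CREATE_TOPIC"}

# Hypothetical destination topics, one per storage tier.
TIER_TOPICS = {"critical": "audit-critical", "standard": "audit-standard"}


def storage_tier(event: dict) -> str:
    """Classify an event: denials and critical actions get the long-retention tier."""
    if event.get("outcome") == "DENIED":
        return "critical"
    if event.get("action") in CRITICAL_ACTIONS:
        return "critical"
    return "standard"


def destination_topic(event: dict) -> str:
    """Map an event to the topic that backs its storage tier."""
    return TIER_TOPICS[storage_tier(event)]


if __name__ == "__main__":
    print(destination_topic({"action": "PRODUCE", "outcome": "DENIED"}))
```

Keeping the rule in one function makes it easy to review with auditors and to version alongside retention settings.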

Standard Cloud/Platform audit log paths (overview)

[Figure: two paths. Cloud: Client/API → Confluent Cloud (Data/Control Plane) → Managed Export → Sink → SIEM/S3. Platform: Producers/Consumers → RBAC/ACL Decisions → Confluent Platform (Audit Topic) → Audit Connect → SIEM/S3.]

Audit Event Schema and Filtering

Audit events generally consist of the following fields: actor (principal: user/service account), action (operation: CREATE_TOPIC/ALTER_ACL/PRODUCE, etc.), resource (type, name, scope), outcome (ALLOWED/DENIED), reason (denial reason), network/auth information, client.id, correlation.id, and timestamp. There are representation differences between Cloud and Platform, but this skeleton is stable.

High-volume allowed events drive up long-term storage costs. If your requirements permit, prioritize storing denied events and administrative operations, and aggregate successful events (by sampling or daily roll-up). Mask or tokenize confidential data (token fragments, IP segments, user attributes) on the sink side.

  • Inventory fields that may contain PII/sensitive information before collection
  • To prepare for schema evolution, do not drop unknown fields; pass them through transparently
  • Normalize time to a single reference (UTC) and make it explicit at ingestion
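
The three points above (tokenize sensitive fields, pass unknown fields through, normalize to UTC) can be combined in one sink-side step. The sketch below is a minimal Python illustration under stated assumptions: events use the simplified layout from this article with a `request.ip` field and a `ts` timestamp, the inventory of sensitive fields is hypothetical, and the HMAC key would come from a secret store in practice.

```python
import hashlib
import hmac
from datetime import datetime, timezone

# Hypothetical inventory of sensitive keys inside the "request" object.
SENSITIVE_REQUEST_FIELDS = ("ip",)


def tokenize(value: str, key: str) -> str:
    """Replace a value with a keyed, deterministic token (HMAC-SHA256, truncated)."""
    digest = hmac.new(key.encode("utf-8"), value.encode("utf-8"), hashlib.sha256)
    return "tok_" + digest.hexdigest()[:16]


def to_utc(ts: str) -> str:
    """Normalize an ISO-8601 timestamp with an offset to UTC with a Z suffix."""
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError("timestamp has no UTC offset: " + ts)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def sanitize(event: dict, key: str) -> dict:
    """Return a copy with sensitive fields tokenized and time in UTC.

    Fields this function does not know about are copied through unchanged.
    """
    cleaned = dict(event)
    if "request" in event:
        request = dict(event["request"])
        for field in SENSITIVE_REQUEST_FIELDS:
            if field in request:
                request[field] = tokenize(request[field], key)
        cleaned["request"] = request
    if "ts" in cleaned:
        cleaned["ts"] = to_utc(cleaned["ts"])
    return cleaned
```

Deterministic tokens keep correlation possible in the SIEM (the same source IP always maps to the same token) without storing the raw value.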

Example: Long-term storage of an audit topic with Kafka Connect (S3 Sink)

{
  "name": "audit-s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "2",
    "topics": "confluent-audit-log-events",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "s3.bucket.name": "org-audit-logs",
    "s3.part.size": "5242880",
    "flush.size": "1000",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "partition.duration.ms": "3600000",
    "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
    "locale": "en_US",
    "timezone": "UTC",
    "timestamp.extractor": "Record",
    "behavior.on.null.values": "ignore",
    "rotate.schedule.interval.ms": "900000",
    "schema.compatibility": "BACKWARD"
  }
}
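
With a time-based partitioner set to UTC and an hourly path format, as in the configuration above, each record is filed under a directory derived from its timestamp. The Python sketch below shows that mapping so you can predict where a given event will land when querying the bucket. It is an illustration only: `hourly_partition_dir` is a hypothetical helper, it assumes the record timestamp is used (`timestamp.extractor` set to `Record`), and the full object key contains further components that are not modeled here.

```python
from datetime import datetime, timezone


def hourly_partition_dir(record_ts_ms: int) -> str:
    """Return the year=/month=/day=/hour= directory for a record timestamp.

    record_ts_ms is the Kafka record timestamp in milliseconds since the epoch.
    The directory is always computed in UTC, matching "timezone": "UTC".
    """
    ts = datetime.fromtimestamp(record_ts_ms // 1000, tz=timezone.utc)
    return ts.strftime("year=%Y/month=%m/day=%d/hour=%H")


if __name__ == "__main__":
    sample = datetime(2026, 4, 18, 4, 21, 34, tzinfo=timezone.utc)
    print(hourly_partition_dir(int(sample.timestamp() * 1000)))
```

Partitioning by event time in UTC keeps directory boundaries stable regardless of where the Connect workers run.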

Operations, Protection, and CCAAK Exam Essentials

In operations, audit changes to the audit pipeline itself (who disabled or modified it) through a separate path. Keep access permissions on the audit topic minimal; on the Platform side, apply encryption (in transit and at rest) and WORM settings to log storage destinations. On the Cloud side, strictly manage the delivery destination's IAM, storage class, and lifecycle.

The CCAAK exam commonly tests three areas: where audit logging is configured and where it outputs in Cloud versus Platform, how RBAC and ACLs differ and how that affects audit events, and where the responsibility boundary lies when integrating with a SIEM. A frequent pitfall is treating broker standard logs or OS logs as substitutes for audit logs.

  • Cloud: audit logs use managed delivery. Direct consumption by a Kafka consumer is not assumed
  • Platform: durability/protection of the internal topic is the user's responsibility. Clearly define the sink's retry design
  • Performance: the overhead of enabling auditing is small but non-zero. Plan internal topic partitions/replicas accordingly
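
Planning the internal topic usually starts with a rough storage estimate. The following Python sketch shows the arithmetic only; every input is a hypothetical figure you would replace with measurements, and it ignores compression, index files, and headroom.

```python
SECONDS_PER_DAY = 86_400


def audit_topic_storage_bytes(
    events_per_sec: int,
    avg_event_bytes: int,
    retention_days: int,
    replication_factor: int,
) -> int:
    """Estimate raw disk usage of an audit topic across all replicas.

    bytes = event rate x event size x retention period x replication factor
    """
    retained_seconds = retention_days * SECONDS_PER_DAY
    return events_per_sec * avg_event_bytes * retained_seconds * replication_factor


if __name__ == "__main__":
    total = audit_topic_storage_bytes(500, 1024, 7, 3)
    print(f"{total / 2**30:.1f} GiB")
```

With the sample figures (500 events per second, 1 KiB per event, 7 days of retention, replication factor 3), the estimate is roughly 865 GiB, which shows how quickly data plane auditing can dominate capacity planning.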

Example: Audit event (excerpt)

{
  "actor": {"type": "User", "name": "alice"},
  "action": "CREATE_TOPIC",
  "resource": {"type": "Topic", "name": "payments", "scope": {"cluster_id": "lkc-xxxx"}},
  "outcome": "ALLOWED",
  "request": {"client_id": "producer-1", "ip": "203.0.113.10"},
  "auth": {"protocol": "SASL_SSL", "mechanism": "PLAIN", "principal": "User:alice"},
  "correlation_id": "c-7b3d",
  "ts": "2026-04-18T04:21:34Z"
}

Practice Question

CCAAK

Question 1

In a Confluent Cloud environment, you want to aggregate audit logs to an external SIEM with minimal operational cost. Which is the most appropriate design?

  A. Use Cloud's audit log delivery feature to create a SIEM-compatible sink (e.g., Splunk/Datadog) and aggregate from there
  B. Build Kafka Connect outside the cloud and poll the internal topic for audit logs
  C. Collect OS audit logs from each broker and reconstruct the equivalent events
  D. Implement custom logging of every API call in the application and bypass Kafka

Correct answer: A

Confluent Cloud audit logs are designed to be delivered to external sinks via managed delivery. Direct subscription to a Kafka topic is not assumed, so A is the most appropriate, minimal-operational-cost approach. B is a Platform-style method that does not fit Cloud. C and D cannot guarantee audit completeness or coverage.

Frequently Asked Questions

How do audit logs differ from regular broker logs and metrics?

Audit logs record authentication/authorization and administrative operations using a consistent schema of actor, action, resource, outcome, and timestamp. Regular operational logs are aimed at understanding runtime behavior and do not necessarily satisfy audit requirements such as completeness and tamper resistance.

How should audit log storage be designed in Confluent Platform?

A layered design is practical: internal audit topics for high availability and short-to-medium retention, external storage (S3 / object storage) for long-term retention and tamper resistance, and a SIEM optimized for search and correlation analysis.

Is it acceptable to reduce successful (ALLOWED) events?

It depends on your requirements. Always retain denials and administrative changes, and consider aggregating or sampling successful events when regulations allow. Even when reducing volume, preserve schema compatibility and reconstruction capability (explicitly document aggregation logic).
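
One way to document the aggregation logic explicitly is to keep it as code. The sketch below is a hypothetical Python roll-up, assuming the simplified event layout used in this article with UTC timestamps in `ts`: denials and administrative actions are kept verbatim, and remaining ALLOWED events are collapsed into daily counts per actor, action, and resource.

```python
from collections import Counter

# Administrative actions that must never be aggregated (illustrative set).
ADMIN_ACTIONS = {"CREATE_TOPIC", "ALTER_ACL"}


def roll_up(events: list) -> tuple:
    """Split events into (kept verbatim, daily summaries of ALLOWED events)."""
    kept = []
    counts = Counter()
    for event in events:
        if event["outcome"] != "ALLOWED" or event["action"] in ADMIN_ACTIONS:
            kept.append(event)
            continue
        day = event["ts"][:10]  # "YYYY-MM-DD"; assumes ts is already UTC ISO-8601
        key = (day, event["actor"]["name"], event["action"], event["resource"]["name"])
        counts[key] += 1
    summaries = [
        {"day": day, "actor": actor, "action": action,
         "resource": resource, "outcome": "ALLOWED", "count": count}
        for (day, actor, action, resource), count in sorted(counts.items())
    ]
    return kept, summaries
```

Because the grouping key is spelled out, a reviewer can see exactly which detail is lost (individual timestamps and request metadata) and which is preserved.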
