Kafka

AWS MSK vs Confluent: Operations, Features, and Cost from an Exam and Real-World View

2026-04-19
NicheeLab Editorial Team

When teams adopt managed Kafka, many of them get stuck choosing between AWS MSK and Confluent. The two run the same Kafka under the hood, but they differ significantly in operational responsibility, security features, ecosystem services, and billing model.

This article focuses on the differences that matter in day-to-day operations and on the topics the CCAAK (Confluent Certified Administrator for Apache Kafka) exam tends to cover (RBAC/ACL, connectivity, replication, auditing and governance), based on stable, officially documented behavior.

Big Picture and Selection Axes

MSK is a managed Kafka broker service running inside your AWS VPC. Its strength is strong Kafka compatibility — existing Kafka tooling generally works as-is. However, peripheral stacks like Schema Registry and ksqlDB are not included and must be provided separately (self-managed or via another service).

Confluent ships Kafka together with Schema Registry, Kafka Connect, ksqlDB, RBAC, auditing, and cross-cluster replication (Cluster Linking and related features) as an integrated package. The Cloud edition is fully managed across multiple clouds; the Platform edition is self-managed.

  • The five major selection axes are: boundary of operational responsibility, security/governance requirements, how much of the ecosystem is bundled, network constraints, and cost model.
  • From a CCAAK perspective, frequent topics include the difference between RBAC and ACL, Schema Registry and compatibility modes, replication options (MM2 vs Cluster Linking), and how private connectivity is handled.

| Aspect | AWS MSK | Confluent Cloud/Platform | CCAAK Exam Focus |
| --- | --- | --- | --- |
| Operational responsibility | AWS runs the brokers. Peripherals are mostly yours to run (MSK Connect is a separate offering). | Kafka plus peripherals (SR, Connect, ksqlDB, RBAC, audit) provided together. | Watch for Confluent-specific terminology and APIs around RBAC, audit logs, and organization-scope settings. |
| Identity | TLS / SCRAM / IAM (MSK IAM auth) + Kafka ACL. | API keys / OAuth / mTLS + RBAC + audit. | Distinguish ACL vs RBAC and the granularity of principal, resource, and action. |
| Connectivity / network | VPC-native. Private connectivity by default (optional public exposure). | Public endpoint by default. Private Link, peering, and TGW integration are available. | Understand DNS, certificates, and endpoint resolution under private connectivity. |
| Scaling | Tune broker count and storage (Serverless automates more of this). | Scale by cluster size / CKU or usage. Surrounding services scale together. | Understand partition and throughput limits and the impact of reassignment. |
| Replication | MirrorMaker 2 (self-operated or in containers, etc.). | Managed options like Cluster Linking and Replicator. | Difference between in-cluster replicas and cross-cluster replication. |
| Observability | CloudWatch metrics and logs. | Cloud / Control Center / Metrics API / audit logs. | Key indicators: latency, throughput, consumer lag, and reject rate. |

Operating Model Differences (Scaling and Upgrades)

MSK (provisioned) requires you to explicitly manage broker count and EBS capacity. Scaling means planning rebalances and partition reassignments. Serverless abstracts part of the capacity planning, but Kafka-specific constraints like partition layout and throughput limits still need design attention.

Confluent Cloud scales by cluster size or usage, and Schema Registry, Connect, and ksqlDB scale alongside it on the same platform. Rolling upgrades and version consistency are handled by the provider, but client compatibility (producer/consumer APIs, serialization, compression settings) remains the user's responsibility.

  • On MSK, adding brokers means planning networking, security groups, and subnets as well.
  • On Confluent, verify service-side constraints like partition limits, quotas, and CKU in advance.
  • On either platform, adding partitions changes the key-to-partition mapping for keyed topics, and rebalancing partitions across brokers (reassignment) moves data. Avoid both during peak traffic windows.
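
One reason partition-count changes deserve planning is that keyed records can land on a different partition afterwards, which breaks per-key ordering across the change. The sketch below illustrates the effect with a plain hash-modulo mapping. It assumes `zlib.crc32` as a stand-in for the murmur2 hash that Kafka's Java default partitioner uses, so the concrete partition numbers will not match a real producer; `partition_for` and `moved_keys` are hypothetical helper names.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition as hash(key) mod N.

    Assumption: crc32 stands in for Kafka's murmur2 hash, so only the
    modulo behaviour is representative, not the exact partition numbers.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_count: int, new_count: int) -> list:
    """Return the keys whose partition changes when the count changes."""
    return [
        k for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    ]


if __name__ == "__main__":
    sample = [f"order-{i}" for i in range(1000)]
    moved = moved_keys(sample, old_count=6, new_count=8)
    print(f"{len(moved)} of {len(sample)} keys map to a different partition")
```

Running it on a sample of your own keys gives a rough feel for how much of the keyspace is affected before you change a production topic.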

Security and Identity (ACL vs RBAC, Authentication)

MSK supports TLS encryption, SASL/SCRAM, and AWS-specific IAM authentication (SASL/IAM). Authorization is handled with Kafka ACLs, granting resource permissions on topics, groups, etc. to principals (users/roles).

Confluent offers API keys, OAuth/OIDC, and mTLS, with authorization done through RBAC (resource owners/roles). Combined with audit logs and Stream Governance (schema compatibility/tagging), you can design organizational governance as a single piece.

  • Exam tip: ACLs are evaluated by action (Read/Write/Create) x resource (Topic/Group/Cluster). RBAC, in contrast, bundles permissions into roles (DeveloperRead, Operator, etc.).
  • Schema evolution compatibility modes (BACKWARD/FORWARD/FULL) show up often with Confluent Schema Registry. With MSK, you assume a separately operated registry.
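
The ACL vs RBAC distinction can be made concrete with a toy model. This is a simplified sketch, not how either product evaluates permissions: real Kafka ACLs also carry allow/deny, host, and pattern-type fields, and the operations attached to each role in `ROLES` are illustrative assumptions rather than Confluent's actual role definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    principal: str      # e.g. "User:orders-app"
    resource_type: str  # "Topic", "Group", or "Cluster"
    resource_name: str
    operation: str      # "Read", "Write", "Create", ...


def acl_allows(acls, principal, resource_type, resource_name, operation) -> bool:
    """ACL model: one explicit entry per principal x resource x operation."""
    return Acl(principal, resource_type, resource_name, operation) in acls


# Illustrative only: a role bundles several operations under one name.
ROLES = {
    "DeveloperRead": {"Read", "Describe"},
    "DeveloperWrite": {"Write", "Describe"},
}


def rbac_allows(bindings, principal, resource_type, resource_name, operation) -> bool:
    """RBAC model: a binding ties a principal to a role on one resource."""
    return any(
        p == principal
        and rtype == resource_type
        and rname == resource_name
        and operation in ROLES[role]
        for (p, role, rtype, rname) in bindings
    )
```

With ACLs, granting Read and Describe takes two entries; with RBAC, a single binding to `DeveloperRead` covers both, which is the bundling the exam tip refers to.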

Example Kafka client configurations (MSK IAM vs Confluent Cloud)

# Example client properties for MSK (SASL/IAM)
bootstrap.servers=b-1.msk.example.amazonaws.com:9098,b-2.msk.example.amazonaws.com:9098
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler

# Example client properties for Confluent Cloud (SASL/PLAIN)
bootstrap.servers=pkc-xxxxx.us-central1.gcp.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";
client.dns.lookup=use_all_dns_ips
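
Clients built on librdkafka take the same Confluent Cloud settings under slightly different keys than the Java properties above. The sketch below assumes the `confluent-kafka` Python client; `confluent_cloud_config` is a hypothetical helper that only builds the configuration dictionary and does not open a connection.

```python
def confluent_cloud_config(bootstrap: str, api_key: str, api_secret: str) -> dict:
    """Build a librdkafka-style config for SASL/PLAIN over TLS.

    Differences from the Java properties: the key is `sasl.mechanisms`
    (plural), and credentials go in `sasl.username` / `sasl.password`
    instead of a JAAS string.
    """
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": api_key,
        "sasl.password": api_secret,
    }


if __name__ == "__main__":
    cfg = confluent_cloud_config(
        "pkc-xxxxx.us-central1.gcp.confluent.cloud:9092",
        "<API_KEY>",
        "<API_SECRET>",
    )
    # Print the keys only, so secrets never end up in logs.
    print(sorted(cfg))
```

The resulting dictionary would typically be passed to `confluent_kafka.Producer(cfg)` or, with a `group.id` added, to `confluent_kafka.Consumer(cfg)`.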

Ecosystem Features (Schema Registry / Connect / ksqlDB)

MSK on its own does not include Schema Registry or ksqlDB. If you need them, you deploy Confluent Schema Registry self-managed, or combine MSK with another service such as AWS Glue Schema Registry. Connect is offered as a managed runtime under MSK Connect, but check which connectors are available, how they are licensed, and which operational features each vendor supports.

Confluent provides Schema Registry, managed Connect, and ksqlDB as one integrated package, letting you handle compatibility modes, schema evolution, connector version management, and stream-processing operations through unified UI/APIs.

  • Exam tip: expect questions on choosing schema compatibility modes and detecting breaking changes, connector error handling (DLQ/retry), and ksqlDB persistent queries and their resource consumption.
  • In practice: when you integrate with many heterogeneous DBs/SaaS, the richness of managed Connect and connector catalogs strongly affects total cost of ownership.
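
The compatibility modes mentioned above reduce to a simple question about who reads whose data. The following is a deliberately simplified sketch that covers only adding and removing record fields, with schemas modelled as plain dictionaries; it ignores type promotion, aliases, and the transitive variants, and it is not Schema Registry's implementation.

```python
def missing_without_default(reader_fields: dict, writer_fields: dict) -> list:
    """Fields the reader expects that the writer never wrote and that have no default."""
    return [
        name for name, spec in reader_fields.items()
        if name not in writer_fields and "default" not in spec
    ]


def is_backward(new: dict, old: dict) -> bool:
    """BACKWARD: consumers on the new schema can read data written with the old one."""
    return not missing_without_default(reader_fields=new, writer_fields=old)


def is_forward(new: dict, old: dict) -> bool:
    """FORWARD: consumers on the old schema can read data written with the new one."""
    return not missing_without_default(reader_fields=old, writer_fields=new)


def is_full(new: dict, old: dict) -> bool:
    """FULL: both directions hold."""
    return is_backward(new, old) and is_forward(new, old)


if __name__ == "__main__":
    old = {"id": {"type": "long"}}
    with_default = {"id": {"type": "long"}, "note": {"type": "string", "default": ""}}
    without_default = {"id": {"type": "long"}, "region": {"type": "string"}}
    print(is_full(with_default, old))
    print(is_backward(without_default, old))
```

Adding a field without a default is the classic breaking change under BACKWARD, because a consumer on the new schema has nothing to fill in when it reads older records.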

Observability, DR, and Networking

MSK integrates with CloudWatch metrics and logs by default. Multi-AZ deployment is the default, and cross-region DR is built with MirrorMaker 2 or similar. Networking is VPC-internal, so private reachability design (routing, DNS, certificates) must be done carefully.

Confluent provides a Metrics API, UI, and audit logs (Cloud/Platform), with DR options including Cluster Linking. Networking exposes a public endpoint over the internet by default; you can isolate it with Private Link, peering, or TGW as needed.
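
When a cluster is supposed to be reachable only over private connectivity, a quick sanity check is whether the bootstrap hostname resolves to private addresses from the client's network. The sketch below uses only the Python standard library; `resolve_ips` and `all_private` are hypothetical helper names, and the check says nothing about routing or certificates, which still need to be verified separately.

```python
import ipaddress
import socket


def resolve_ips(hostname: str, port: int) -> list:
    """Resolve a hostname to the sorted set of IP addresses it maps to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(ips) -> bool:
    """True only if there is at least one address and every one is private."""
    ips = list(ips)
    return bool(ips) and all(ipaddress.ip_address(ip).is_private for ip in ips)


if __name__ == "__main__":
    # Literal addresses are used here so the demo needs no network access.
    print(all_private(["10.0.1.5", "172.16.0.9"]))
    print(all_private(["10.0.1.5", "8.8.8.8"]))
```

In practice you would call `all_private(resolve_ips(host, port))` from inside the VPC or peered network for each bootstrap host.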

  • Exam tip: map symptoms to causes — consumer lag, throttling (quota exceeded / bandwidth limited), and rejects (RecordTooLarge / RequestQueueFull).
  • When designing DR, be able to articulate the separate purposes of intra-cluster replicas and cross-cluster replication (RPO/RTO).
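
Consumer lag, one of the indicators listed above, is simply the gap between the newest offset in a partition and the offset the consumer group has committed. A minimal sketch, assuming both sets of offsets have already been fetched into dictionaries keyed by partition number (`consumer_lag` is a hypothetical helper, and treating a missing commit as offset 0 is an assumption; real behaviour depends on `auto.offset.reset`):

```python
def consumer_lag(end_offsets: dict, committed: dict) -> dict:
    """Per-partition lag = log end offset - committed offset (never negative)."""
    return {
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


if __name__ == "__main__":
    lag = consumer_lag({0: 100, 1: 250}, {0: 90, 1: 250})
    print(lag, "total:", sum(lag.values()))
```

Lag that keeps growing on a single partition usually points at a hot key or a slow consumer instance, while lag growing evenly across partitions suggests the group as a whole is under-provisioned.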

Conceptual diagram: MSK (inside VPC) and Confluent Cloud (with Private Link)

[Figure: an application tier (ASG/EKS, producer/consumer) connected to an MSK cluster (AZ-a/b/c, NLB/private) inside the VPC, and via Private Link / peering to Confluent Cloud (public/private endpoint with SR/Connect/ksqlDB).]

Cost Model and Sizing Tips

For MSK (provisioned), the main cost components are brokers (instance type x count x hours), storage (GB/month), data transfer, and MSK Connect (if used). Serverless is mostly billed against usage (ingress/egress, partitions, storage, etc.). Exact unit prices vary by region and time, so always estimate from the official pricing page.
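
The provisioned cost components can be combined into a rough monthly estimate. This is a sketch only: the function name is hypothetical, every rate passed in is a placeholder rather than a real AWS price, and MSK Connect is left out.

```python
HOURS_PER_MONTH = 730  # common approximation: 8760 hours / 12 months


def msk_provisioned_monthly_cost(
    broker_count: int,
    broker_hourly_usd: float,
    storage_gb_per_broker: float,
    storage_usd_per_gb_month: float,
    transfer_usd: float = 0.0,
) -> float:
    """Brokers (type x count x hours) + storage (GB/month) + data transfer."""
    broker_cost = broker_count * broker_hourly_usd * HOURS_PER_MONTH
    storage_cost = broker_count * storage_gb_per_broker * storage_usd_per_gb_month
    return broker_cost + storage_cost + transfer_usd


if __name__ == "__main__":
    # Placeholder rates for illustration; take real values from the pricing page.
    estimate = msk_provisioned_monthly_cost(
        broker_count=3,
        broker_hourly_usd=0.20,
        storage_gb_per_broker=1000,
        storage_usd_per_gb_month=0.10,
    )
    print(f"~${estimate:,.2f} per month")
```

Keeping the rates as parameters makes it easy to re-run the estimate per region or after a price change.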

Confluent Cloud pricing combines usage-based charges (GB ingress/egress, storage, partitions, etc.) with the plan and cluster size (Basic / Standard / Dedicated, CKU, etc.). Factor in add-ons such as Schema Registry, ksqlDB, and Private Link as well.

  • On both platforms, inter-AZ, inter-region, and internet-bound data transfer can be billed separately, so always estimate it when designing DR or external integrations.
  • Partition count affects not only performance but also cost (metadata and management overhead). Start with the minimum required and grow incrementally.
  • Storage cost is tied to compaction and retention (retention.ms/bytes). Tune them per topic to match the requirement.
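
The link between retention and storage can be estimated with simple arithmetic. The sketch below assumes time-based retention only (`cleanup.policy=delete`), a steady ingress rate measured after compression, and a uniform replication factor; it ignores index files and segment rollover slack, and `retained_storage_gb` is a hypothetical helper name.

```python
def retained_storage_gb(
    ingress_mb_per_s: float,
    retention_hours: float,
    replication_factor: int = 3,
) -> float:
    """Approximate data on disk across all replicas, in GB (1 GB = 1024 MB)."""
    seconds = retention_hours * 3600
    return ingress_mb_per_s * seconds * replication_factor / 1024


if __name__ == "__main__":
    # 5 MB/s retained for 24 hours with 3 replicas.
    print(f"{retained_storage_gb(5, 24):.1f} GB")
```

Halving `retention.ms` on a high-volume topic halves this figure, which is why per-topic tuning tends to pay off more than a cluster-wide default.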

Check with a Question

CCAAK

Question 1

You require strict audit logging and role-based access control (RBAC), and you want Schema Registry and ksqlDB as fully managed services. Connectivity must be private, and cross-cluster replication should be easy to set up. Which choice is most appropriate?

  A. A Confluent Cloud Dedicated cluster with Private Link plus Schema Registry/RBAC
  B. AWS MSK (provisioned) plus MSK Connect, with Schema Registry self-operated on EC2
  C. Meet the requirements with AWS MSK Serverless and Glue integration only
  D. Build OSS Kafka / Schema Registry / Connect entirely from scratch on EC2

Correct answer: A

Only Confluent Cloud delivers RBAC, audit logs, Schema Registry, and ksqlDB as an integrated managed offering, with Private Link and Cluster Linking available. B increases operational responsibility, C struggles to meet the RBAC, ksqlDB, and audit requirements, and D falls outside the managed-service requirement.

Frequently Asked Questions

Which is cheaper, MSK or Confluent Cloud?

It depends on the workload and requirements. MSK works well for sustained, high-throughput operation on a fixed broker fleet, but you pay extra to run the surrounding ecosystem yourself. Confluent Cloud is consumption-based and easier to start small with, and it bundles Schema Registry, Connect, ksqlDB, RBAC, and auditing into the managed offering. Compare total cost of ownership including data transfer (AZ, region, and internet).

Is MSK Serverless robust to traffic spikes?

Part of capacity planning is abstracted away, so it handles steady-state and moderate fluctuations well. However, Kafka-specific limits — per-partition throughput and latency, connection counts, and so on — still apply. For sharp spikes or very high throughput, plan partition layout carefully and verify quotas and limits in advance.

Can I use Confluent Schema Registry with MSK?

Yes. Schema Registry is a separate component from Kafka itself, so you can run it self-managed inside your VPC (or in a separate environment) as long as your clients can reach it. You are responsible for designing and operating compatibility modes, authentication, and availability (redundancy and scale).
