What is Apache Kafka?
Apache Kafka is an open-source distributed event streaming platform originally developed at LinkedIn and later donated to the Apache Software Foundation. It's designed to handle real-time data feeds with high throughput, fault tolerance, and scalability, letting applications publish and subscribe to streams of records.
At its core, Kafka is a distributed commit log: data is stored in topics that are partitioned and replicated across multiple servers. This architecture lets Kafka handle millions of messages per second while maintaining durability and fault tolerance.
Core Components
Producer
Applications that publish (write) data to Kafka topics. Producers send records to a specific topic and can either pick a partition explicitly or let Kafka choose one; by default, records with the same key always land on the same partition.
Consumer
Applications that subscribe to (read) data from Kafka topics. Consumers in the same consumer group split a topic's partitions among themselves, giving load balancing and fault tolerance.
Broker
Kafka servers that store data and serve client requests. A Kafka cluster consists of multiple brokers for scalability and fault tolerance.
Topic
Categories or feed names to which records are published. Topics are partitioned and replicated across brokers.
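To make partitions and replication concrete, Kafka's Admin API can create a topic with an explicit partition count and replication factor. Below is a minimal sketch; the topic name, partition count, and replication factor are illustrative, and a replication factor of 1 only suits a single-broker development setup.
// Java Admin Example: create a partitioned, replicated topic (sketch)
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

try (Admin admin = Admin.create(props)) {
    // 3 partitions spread load; replication factor 1 fits a single-broker dev cluster
    NewTopic topic = new NewTopic("user-events", 3, (short) 1);
    admin.createTopics(Collections.singletonList(topic)).all().get();
}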
Basic Kafka Producer Example
// Java Producer Example
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);

// Records with the same key (here "user123") always go to the same partition,
// which preserves per-user ordering.
ProducerRecord<String, String> record =
    new ProducerRecord<>("user-events", "user123", "login");

producer.send(record);   // asynchronous; close() flushes any pending sends
producer.close();
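Note that send() is asynchronous and, as written, effectively fire-and-forget. A common variant passes a callback so delivery failures surface; here is a minimal sketch reusing the producer and record from the example (placed before producer.close()):
// Send with a delivery callback (sketch)
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace();   // delivery failed after client-side retries
    } else {
        System.out.printf("Delivered to %s-%d at offset %d%n",
            metadata.topic(), metadata.partition(), metadata.offset());
    }
});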
Consumer Example
// Java Consumer Example
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "user-analytics");   // consumers sharing a group.id split the topic's partitions
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("user-events"));

// Poll in a loop; each call returns a (possibly empty) batch of records.
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("User: %s, Event: %s%n", record.key(), record.value());
    }
}
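The consumer above relies on automatic offset commits (enable.auto.commit defaults to true). For at-least-once processing, a common variant disables auto-commit and commits only after a batch has been handled; a minimal sketch of the same loop:
// Manual-commit variant (sketch): set before creating the consumer
props.put("enable.auto.commit", "false");

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("User: %s, Event: %s%n", record.key(), record.value());
    }
    consumer.commitSync();   // commit offsets only after the batch is processed
}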
Use Cases
- Real-time Analytics: Processing streaming data for immediate insights
- Event Sourcing: Storing all changes as a sequence of events
- Log Aggregation: Collecting logs from multiple services
- Stream Processing: Real-time data transformation and enrichment (see the Kafka Streams sketch after this list)
- Microservices Communication: Asynchronous messaging between services
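For the stream-processing use case, Kafka ships the Kafka Streams library. Here is a minimal sketch that reads user-events, applies an illustrative transformation, and writes the result to a second topic; the application id, the output topic name, and the uppercase transform are assumptions, not part of the examples above.
// Java Kafka Streams Example (sketch)
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "event-enricher");   // hypothetical app id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> events = builder.stream("user-events");
events.mapValues(value -> value.toUpperCase())   // stand-in for real enrichment logic
      .to("user-events-enriched");               // hypothetical output topic

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();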
Career Impact
Kafka expertise is highly valued in the job market, especially for roles involving:
- Data Engineering ($120K - $180K annually)
- Backend Engineering ($110K - $160K annually)
- DevOps Engineering ($115K - $170K annually)
- Solutions Architecture ($140K - $200K annually)
Learning Path
- Understand distributed systems concepts
- Learn Kafka architecture and components
- Practice with Kafka APIs (Producer/Consumer)
- Explore Kafka Streams for stream processing
- Study Kafka Connect for data integration
- Learn monitoring and operations (a small offset-monitoring sketch follows below)
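As a first step into monitoring, the Admin API can read a consumer group's committed offsets, the raw material for lag dashboards. A minimal sketch, assuming the user-analytics group from the consumer example:
// Java Admin Example: committed offsets for a consumer group (sketch)
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

try (Admin admin = Admin.create(props)) {
    Map<TopicPartition, OffsetAndMetadata> offsets =
        admin.listConsumerGroupOffsets("user-analytics")
             .partitionsToOffsetAndMetadata()
             .get();   // blocks until the broker responds
    offsets.forEach((tp, om) ->
        System.out.printf("%s -> committed offset %d%n", tp, om.offset()));
}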