What is Apache Kafka?

Apache Kafka is an open-source distributed streaming platform originally developed at LinkedIn and later donated to the Apache Software Foundation. It is designed to handle real-time data feeds with high throughput, fault tolerance, and scalability. Kafka acts as a distributed commit log, letting applications publish and subscribe to streams of records.

At the heart of this commit-log design, data is stored in topics that are partitioned and replicated across multiple servers. This architecture enables Kafka to handle millions of messages per second while maintaining durability and fault tolerance.
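The commit-log idea can be sketched in a few lines of Java: an append-only list where every record receives a stable offset that readers can revisit at any time. This is a deliberately simplified, in-memory illustration, not Kafka's actual storage engine.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of Kafka's core abstraction: an append-only log.
// Records are never removed; each one gets a monotonically increasing
// offset, and readers can consume from any offset, any number of times.
class CommitLogSketch {
    private final List<String> records = new ArrayList<>();

    // Append a record and return the offset it was written at.
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Read the record stored at a given offset (reads do not consume).
    String read(long offset) {
        return records.get((int) offset);
    }

    // Offset that the next appended record will receive.
    long endOffset() {
        return records.size();
    }
}
```

Because reads are just offset lookups, many independent readers can traverse the same log at their own pace, which is exactly what makes consumer groups possible.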

  • 1M+ messages/sec throughput
  • $140K average Kafka engineer salary
  • 80% of Fortune 500 companies use Kafka

Core Components

Producer

Applications that publish (write) data to Kafka topics. Producers send records to specific topics and can choose which partition to send data to.

Consumer

Applications that subscribe to (read) data from Kafka topics. Consumers can be part of consumer groups for load balancing and fault tolerance.

Broker

Kafka servers that store data and serve client requests. A Kafka cluster consists of multiple brokers for scalability and fault tolerance.

Topic

Categories or feed names to which records are published. Topics are partitioned and replicated across brokers.
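For keyed records, the producer's default partitioner hashes the key (Kafka uses a murmur2 hash of the key bytes) and takes the result modulo the partition count, so all records with the same key land on the same partition and stay ordered relative to each other. A simplified sketch of that routing, substituting String.hashCode() for murmur2 to stay self-contained:

```java
// Simplified sketch of keyed partition routing. Kafka's real default
// partitioner hashes the serialized key bytes with murmur2; hashCode()
// stands in here so the example runs without the Kafka libraries.
class PartitionRouter {
    static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so the result is a valid partition index.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```

The important property is determinism: the same key always maps to the same partition, which is what gives per-key ordering.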

Basic Kafka Producer Example

// Java producer example (requires the kafka-clients dependency)
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// try-with-resources flushes buffered records and closes the producer
try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    ProducerRecord<String, String> record =
        new ProducerRecord<>("user-events", "user123", "login");
    producer.send(record); // send() is asynchronous; close() waits for delivery
}

Consumer Example

// Java consumer example (requires the kafka-clients dependency)
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "user-analytics");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("user-events"));

// Poll in a loop; each poll returns whatever records arrived since the last one
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("User: %s, Event: %s%n", record.key(), record.value());
    }
}

Use Cases

  • Real-time Analytics: Processing streaming data for immediate insights
  • Event Sourcing: Storing all changes as a sequence of events
  • Log Aggregation: Collecting logs from multiple services
  • Stream Processing: Real-time data transformation and enrichment
  • Microservices Communication: Asynchronous messaging between services
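The event-sourcing use case above can be illustrated with a toy replay: current state is never stored directly, but rebuilt by reading an ordered event stream from the beginning, just as a Kafka topic can be re-read from offset zero. The LoginCounter class and the "userId:event" record format are hypothetical, chosen only for this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal event-sourcing sketch: derive state by replaying events in order.
// Records use a hypothetical "userId:event" format for illustration.
class LoginCounter {
    static Map<String, Integer> replay(List<String> events) {
        Map<String, Integer> logins = new HashMap<>();
        for (String e : events) {
            String[] parts = e.split(":");
            if (parts[1].equals("login")) {
                // Count one login for this user.
                logins.merge(parts[0], 1, Integer::sum);
            }
        }
        return logins;
    }
}
```

Because the log is the source of truth, a bug fix or a new view of the data only requires replaying the same events with different logic.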

Career Impact

Kafka expertise is highly valued in the job market, especially for roles involving:

  • Data Engineering ($120K - $180K annually)
  • Backend Engineering ($110K - $160K annually)
  • DevOps Engineering ($115K - $170K annually)
  • Solutions Architecture ($140K - $200K annually)

Learning Path

  1. Understand distributed systems concepts
  2. Learn Kafka architecture and components
  3. Practice with Kafka APIs (Producer/Consumer)
  4. Explore Kafka Streams for stream processing
  5. Study Kafka Connect for data integration
  6. Learn monitoring and operations