Introduction to Kafka

Ramin Mammadzada
4 min read · Mar 19, 2023

Kafka is a distributed streaming platform that allows you to build real-time streaming applications and microservices. It is designed to handle high-throughput and low-latency data streams, making it an ideal choice for building large-scale, real-time data processing systems. In this article, we will explore the main components of Kafka and their usage in Java projects.

Kafka Consumer

A Kafka consumer is a client application that reads data from Kafka topics. It subscribes to one or more topics and consumes the messages produced by the Kafka producers. A consumer group is a set of consumers that work together to consume data from the partitions of a topic. Each partition is assigned to exactly one consumer within the group at a time, so the group processes each message once while spreading the work across its members.

The purpose of a Kafka consumer in a Java project is to process and analyze data in real-time. For example, a consumer application can be used to process user actions on a website, such as clicks and purchases, and perform real-time analytics on that data. Here’s an example code snippet that shows how to create a Kafka consumer in Java:

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("my-topic"));

while (true) {
    // poll() blocks for up to 100 ms waiting for new records
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s%n",
                record.offset(), record.key(), record.value());
    }
}

This code creates a Kafka consumer that subscribes to the “my-topic” topic and reads messages from it, using the StringDeserializer class to deserialize the key and value of each message. Each call to poll() blocks for up to 100 milliseconds waiting for new records; the loop then processes whatever batch of records was returned and polls again.

Kafka Producer

A Kafka producer is a client application that writes data to Kafka topics. It publishes messages to one or more topics, which are then consumed by the Kafka consumers. The producer can either send messages synchronously or asynchronously, depending on the requirements of the application.

The purpose of a Kafka producer in a Java project is to send data to Kafka topics in real-time. For example, a producer application can be used to send log data from a web server to Kafka, which can then be consumed by a consumer application for real-time analysis. Here’s an example code snippet that shows how to create a Kafka producer in Java:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 100; i++) {
    // key and value are both the loop index, serialized as strings
    producer.send(new ProducerRecord<>("my-topic", Integer.toString(i), Integer.toString(i)));
}

// close() flushes any buffered records before shutting down
producer.close();

This code creates a Kafka producer that sends 100 messages to the “my-topic” topic, using the StringSerializer class to serialize the key and value of each message. Note that send() is asynchronous: it buffers the record and returns a Future immediately rather than blocking. The acks=all setting means the broker acknowledges a record only after all in-sync replicas have received it, which maximizes durability at some cost in latency.
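The synchronous and asynchronous sending styles mentioned earlier can be sketched as follows. This is a minimal illustration that reuses the producer configured above; the topic name and record contents are placeholders, and in real code the checked exceptions from get() would need handling:

```java
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Synchronous: block on the returned Future until the broker acknowledges
RecordMetadata meta = producer
        .send(new ProducerRecord<>("my-topic", "key", "value"))
        .get();   // throws if the send ultimately failed
System.out.printf("written to partition %d at offset %d%n",
        meta.partition(), meta.offset());

// Asynchronous: register a callback instead of blocking
producer.send(new ProducerRecord<>("my-topic", "key", "value"),
        (metadata, exception) -> {
            if (exception != null) {
                exception.printStackTrace();   // delivery failed
            } else {
                System.out.printf("acked at offset %d%n", metadata.offset());
            }
        });
```

The synchronous style is simpler to reason about but limits throughput to one in-flight request at a time; the callback style keeps the producer pipeline full while still surfacing per-record failures.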

Kafka Offsets

A Kafka offset is a sequential identifier assigned to each message within a partition of a topic. Offsets are used to track a consumer's progress through each partition: as the consumer reads messages, it commits the offset up to which it has consumed. The next time the consumer reads from that partition, it starts from the last committed offset, ensuring that no messages are missed.

The purpose of Kafka offsets in a Java project is to let a consumer resume from where it left off after a failure or restart, without reprocessing messages whose offsets have already been committed. (Committing after processing gives at-least-once delivery: messages processed but not yet committed may be redelivered.) Here’s an example code snippet that shows how to use Kafka offsets in a Java consumer:

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-group");
// disable auto-commit so that commitSync() below controls the committed offsets
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("my-topic"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s%n",
                record.offset(), record.key(), record.value());
    }
    // commit the offsets of everything returned by the last poll()
    consumer.commitSync();
}

This code disables automatic offset commits and calls commitSync() after processing each batch, committing the offsets of the messages returned by the last poll(). The next time the consumer starts, it will resume reading from the last committed offset. Because offsets are committed only after processing, a crash between processing and committing can cause some messages to be redelivered.
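Beyond committed offsets, the consumer API also lets you reposition a consumer manually, which is useful for replaying data. A minimal sketch, assuming the topic from the examples above; the partition number and target offset are illustrative:

```java
import java.util.Collections;
import org.apache.kafka.common.TopicPartition;

TopicPartition partition = new TopicPartition("my-topic", 0);
// assign() takes partitions directly, bypassing group management
consumer.assign(Collections.singletonList(partition));

// replay the partition from the start...
consumer.seekToBeginning(Collections.singletonList(partition));

// ...or jump to a specific offset before the next poll()
consumer.seek(partition, 42L);
```

Note that a given consumer instance uses either subscribe() (group-managed assignment) or assign() (manual assignment), not both.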

Kafka Batches

A Kafka batch is a group of messages that the producer sends to the broker in a single request. Batching improves the throughput and efficiency of Kafka producers because it amortizes the per-request overhead, such as network round trips and protocol headers, across many messages.

The purpose of Kafka batches in a Java project is to optimize the performance of Kafka producers when sending large volumes of messages. Here’s an example code snippet that shows how to use Kafka batches in a Java producer:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("batch.size", 16384);   // max batch size in bytes, per partition
props.put("linger.ms", 1);        // wait up to 1 ms for more records before sending

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 100; i++) {
    producer.send(new ProducerRecord<>("my-topic", Integer.toString(i), Integer.toString(i)));
}

producer.close();

This code uses the batch.size and linger.ms properties to control how the producer batches messages. The batch.size property sets the maximum size of a batch in bytes (per partition), while linger.ms sets how long the producer waits for additional records before sending a partially filled batch; a full batch is sent immediately regardless of linger.ms. These properties can be tuned to trade a small amount of latency for higher throughput in specific use cases.

Conclusion

Kafka is a powerful distributed streaming platform that enables real-time processing and analysis of large-scale data streams. In this article, we covered the main components of Kafka and their usage in Java projects. We learned about Kafka consumers and producers, Kafka offsets, and Kafka batches, and how they can be used to build efficient and scalable real-time streaming applications. With the help of Kafka, you can easily build high-performance data processing systems that can handle large volumes of data in real-time.
