Kafka

 

spring.kafka.bootstrap-servers=localhost:9092

🧠 Meaning:

The address of the Kafka broker (the server that stores and processes messages).

✅ Why:

This tells Spring Boot where to send/receive Kafka messages.

🛠 If you’re running Kafka locally, use localhost:9092.
In production, you may have multiple brokers: kafka1:9092,kafka2:9092,...

spring.kafka.consumer.group-id=my-group

🧠 Meaning:

Assigns the consumer group ID to your consumer.

✅ Why:

Kafka uses this to track which consumer read which message. If two consumers share the same group ID, Kafka will load-balance messages between them.

🔄 Helps in parallel processing and message tracking


spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer

spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer

🧠 Meaning:

Kafka transmits and stores raw bytes. These settings tell the consumer how to convert those bytes back into Java Strings (deserialization).


✅ Why:

If your producer sends String, the consumer should also receive it as a String. Otherwise, it will throw deserialization errors.


spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer

spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer

🧠 Meaning:

The producer sends String keys and values. These settings tell Kafka to convert Java strings into binary for transmission.

✅ Why:

Kafka can only transmit data in bytes. This setting makes sure that String values (like "order created") are sent properly.



Property                                | Purpose                           | Needed For
spring.kafka.bootstrap-servers          | Where Kafka is running            | Both Producer and Consumer
spring.kafka.consumer.group-id          | Group name for load balancing     | Consumer only
spring.kafka.consumer.auto-offset-reset | From where to start reading       | Consumer only
key-deserializer / value-deserializer   | How to read data (bytes → String) | Consumer
key-serializer / value-serializer       | How to send data (String → bytes) | Producer

 


spring.kafka.consumer.enable-auto-commit=true

This means:

"Kafka will automatically save the offset (the position of the last message read) for me — I don't need to do it manually."


spring.kafka.consumer.auto-commit-interval=1000

This means:

"Kafka will auto-save the offset every 1000 milliseconds (i.e., every 1 second) while consuming messages."



🧠 Summary: Production-Ready Kafka Consumer

            

Step                                       | Action
✅ Set enable-auto-commit=false             | Prevent auto-ack before you're done
✅ Manually acknowledge (ack.acknowledge()) | After successful processing
❌ Do not acknowledge if processing fails   | Message will be retried
✅ Use Retry + DLQ for stability            | Avoid message loss or infinite retry loops
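
A minimal sketch of such a consumer, assuming spring.kafka.consumer.enable-auto-commit=false and spring.kafka.listener.ack-mode=manual are set in application.properties; the topic name, group ID, and process() method are placeholders:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class OrderConsumer {

    @KafkaListener(topics = "orders", groupId = "my-group")
    public void listen(String message, Acknowledgment ack) {
        process(message);   // if this throws, ack is skipped and the container's error handling takes over
        ack.acknowledge();  // commit the offset only after successful processing
    }

    private void process(String message) {
        // business logic goes here (placeholder)
    }
}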


🔍 A Common Doubt:

"Are you saying we must manually set/commit the offset in Kafka? Suppose I am handling a UPI-like system with billions of transactions — then I have to provide acknowledgment (ack) for all?"

✅ YES, and here's why:

🔥 WHY Manual Acknowledgment is Needed (in critical systems like UPI)

In high-stakes systems (like UPI, payments, billing), this is 100% required because:

🔄 Kafka Message          | 🏦 Real-Life Processing
Kafka gives you a message | You debit/credit money, update DB
You succeed               | You call ack.acknowledge() manually
You fail (e.g., DB down)  | ❌ Don't ack, so Kafka retries automatically


🎯 Real-World Rule

Offset should be committed only after your business logic is successful.

This guarantees:

  • ✅ No data loss

  • ✅ No double-debit

  • ✅ Safe retries


💡 But Don’t Worry:

You don't write ack logic for each individual message by hand.
With Spring Kafka:

  • You use batching

  • You use concurrency

  • You use auto/manual config depending on use case

We'll add this step-by-step as your project grows.


 

🤖 How it Works Together

Let's say:

  • You are consuming 5 messages from Kafka.

  • You set enable-auto-commit=true and auto-commit-interval=1000.

Here’s what happens:

Time    | Action
0 sec   | Your consumer starts consuming
0.2 sec | Kafka delivers message 1
0.4 sec | Kafka delivers message 2
0.9 sec | Kafka delivers message 3
1.0 sec | The consumer auto-commits offset = 3 ✅
1.2 sec | Kafka delivers message 4
1.5 sec | Kafka delivers message 5
2.0 sec | The consumer auto-commits offset = 5 ✅

 

⚠️ Risk (Important)

Let’s say the app crashes at 0.9 sec before offset commit at 1.0 sec:

  • Kafka had sent message 1, 2, 3.

  • But offset wasn't committed yet.

  • So, when you restart the app, those messages will be re-read.

💡 This is why committing too late may cause duplicates, and committing too early may lose messages if processing fails.


✅ Safe Defaults (for beginners)

Setting              | Safe value
enable-auto-commit   | true
auto-commit-interval | 1000 ms (1 sec) is fine for dev/testing




🔹 Case 1: Business failure (VALID failure)

Examples:

  • Insufficient balance

  • Invalid UPI ID

  • Limit exceeded

These are expected failures — nothing is broken, just logic conditions failed.


➡️ In such cases:

  • ✅ Save txn to DB as "failed"

  • ✅ Still call ack.acknowledge()

  • No retry needed — because it was processed logically even if result = "fail"

💡 So YES — in these cases, you should mark offset and move on.

🔸 Case 2: System failure (UNEXPECTED failure)

Examples:

  • Database is down

  • Network timeout

  • Kafka broker crash

  • Code throws exception

➡️ In these cases:

  • ❌ You were not able to save/update the txn

  • ❌ It’s unsafe to ack — because you don’t know if processing succeeded

  • ✅ Kafka should retry later

✅ Real-World Strategy:

Failure Type          | Example                         | Save to DB? | Retry Needed? | Acknowledge?
✅ Business logic fail | Insufficient balance, wrong PIN | ✅ Yes       | ❌ No          | ✅ Yes
❌ System fail         | DB down, exception thrown       | ❌ No        | ✅ Yes         | ❌ No
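
A minimal sketch of this strategy using manual acknowledgment (as described above); TransactionService and BusinessException are hypothetical names, and BusinessException is assumed to be an unchecked exception thrown for expected failures:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class UpiTransactionConsumer {

    private final TransactionService transactionService; // hypothetical service with process() / markFailed()

    public UpiTransactionConsumer(TransactionService transactionService) {
        this.transactionService = transactionService;
    }

    @KafkaListener(topics = "upi-transactions", groupId = "payment-service")
    public void onTransaction(String message, Acknowledgment ack) {
        try {
            transactionService.process(message);                     // debit/credit, update DB
            ack.acknowledge();                                       // success → commit offset
        } catch (BusinessException e) {                              // expected: insufficient balance, invalid UPI ID, ...
            transactionService.markFailed(message, e.getMessage());  // record the txn as "failed"
            ack.acknowledge();                                       // logically processed → still commit the offset
        }
        // Any other (unexpected) exception propagates: no ack, the offset stays uncommitted,
        // and the container's error handling can retry or route the record to a DLQ.
    }
}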


🔁 So, You Control Retry Based on What Failed

🧠 Summary

You Ask                        | Kafka Answer
"Txn failed, can I still ack?" | ✅ Yes, if it failed due to business logic and you logged it
"When do I NOT ack?"           | ❌ When the system crashes or the DB fails before saving anything
"Do I need retries?"           | ✅ Only for unexpected (system) failures




🧪 Bonus Tip for Testing:

You can set auto-offset-reset=earliest so that a consumer with no committed offset (e.g., a brand-new group ID) starts reading from the beginning of the topic. This is great for beginners who want to see replay behavior; note that once offsets are committed, a restarted consumer resumes from the committed offset rather than from the beginning.


# ✅ Kafka Broker Address

spring.kafka.bootstrap-servers=localhost:9092


# ✅ Producer Config

spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer

spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer


# ✅ Consumer Config

spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer

spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer


# 🔑 Unique Consumer Group ID (for this service)

spring.kafka.consumer.group-id=my-consumer-group


# 🧠 Offset Management

# Whether to automatically commit the offset of consumed messages

spring.kafka.consumer.enable-auto-commit=true


# When auto commit is enabled, this sets how often offsets are committed (in ms)

spring.kafka.consumer.auto-commit-interval=1000


# ⚙️ Where to start reading messages when no offset is found:

# earliest → start from beginning

# latest → read only new messages

spring.kafka.consumer.auto-offset-reset=earliest



🔴 Problem with auto.create.topics.enable=true

When this is enabled, Kafka will:

  • Automatically create a topic if it doesn't exist whenever a producer sends data.

  • But it won’t check config (partition, replication) — defaults are used.

  • A slight name mismatch ("my-topic" vs "my_topic") silently creates a second, unintended topic.

  • Developers might accidentally trigger topic creation, leading to garbage topics.

  • In CI/CD pipelines, topics can get recreated unintentionally if their existence isn't checked, especially when retention is short.


✅ What to Do in Production

1. Disable auto-creation

Set in server.properties:

auto.create.topics.enable=false


This ensures that:

  • Topics must be created intentionally (by admin script or via code).

  • Avoids unexpected Kafka load or misconfiguration.

2. Create Topics Programmatically — Safe Way

In your Spring Boot application, instead of blindly creating topics every time, do:

✅ Use KafkaAdmin + NewTopic in config

@Configuration
public class KafkaTopicConfig {

    @Bean
    public KafkaAdmin kafkaAdmin() {
        Map<String, Object> configs = new HashMap<>();
        configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        return new KafkaAdmin(configs);
    }

    @Bean
    public NewTopic topic1() {
        return TopicBuilder.name("myTopic")
                .partitions(1)
                .replicas(1)
                .build();
    }
}

☑️ If the topic already exists, it won't be recreated and no error is thrown; creation is silently skipped.


🛡 Best Practice Summary


Setting / Practice                 | Recommendation
auto.create.topics.enable          | ❌ Disable in prod
Topic creation                     | ✅ Explicit via code or admin
Topic config (partitions/replicas) | ✅ Set manually
Topic naming                       | ✅ Use constants/config
Monitoring topic creation          | ✅ Enable audit/alerts


🔧 What does KafkaAdmin do in Spring Boot?

KafkaAdmin is a Spring Kafka utility that allows you to manage Kafka topics programmatically during application startup.

✅ Why use it?

When your Spring Boot app starts, you can use KafkaAdmin to:

  • Automatically create topics

  • Configure topic properties (like partitions, replication, etc.)

  • Interact with the Kafka AdminClient API

🧠 Why do we pass a Map<String, Object>?

The map contains configuration properties needed to connect to the Kafka cluster.

Map<String, Object> configs = new HashMap<>();

configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");


This tells Kafka:

"Hey, connect to the Kafka cluster at localhost:9092."

Then we pass that Map into the KafkaAdmin constructor:


return new KafkaAdmin(configs);


📦 Full Example of KafkaAdmin + Topic Creation



@Configuration
public class KafkaTopicConfig {

    @Bean
    public KafkaAdmin kafkaAdmin() {
        Map<String, Object> configs = new HashMap<>();
        configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        return new KafkaAdmin(configs);
    }

    @Bean
    public NewTopic createTopic() {
        return TopicBuilder.name("my-topic")
                .partitions(3)
                .replicas(1)
                .build();
    }
}

🔁 What happens at runtime?

  1. Spring Boot sees KafkaAdmin

  2. It uses the map to connect to Kafka

  3. It sees the NewTopic bean

  4. It checks if my-topic exists — if not, it creates it.

⚠️ NOTE:

If spring.kafka.admin.auto-create=true is set (it is true by default), then it automatically tries to create the topic.


🧪 So when should you use KafkaAdmin + NewTopic?

✅ Ideal for:

  • Local development

  • CI/CD testing

  • Staging environments

  • Prototyping



🏭 In Production, You Should:

  • ❗️Manually create topics via kafka-topics.sh CLI or Kafka Admin tools

  • ✅ Use Infrastructure as Code (Terraform, Ansible, etc.) for topic provisioning

  • ✅ Your Spring Boot app should only consume from or produce to pre-created topics

🔧 Optional: Disable auto topic creation

In production, set this on Kafka Broker:

auto.create.topics.enable=false


This ensures topics must be created beforehand, or you'll get an error — safer for critical systems.


🔍 What is @Bean in Spring?

@Bean is used to manually define an object (bean) that you want Spring to manage in the Spring Application Context.


Using @Bean, you're saying:

“Hey Spring, create and manage this object (like KafkaAdmin or NewTopic) for me as a bean.”


💬 When should you use @Bean?

Use @Bean when...                                                               | Example
You need to register a custom Java object in the Spring context manually       | KafkaAdmin, NewTopic, ObjectMapper
You're configuring external libraries or custom initialization                 | Kafka, RabbitMQ, etc.
You can't annotate the class with @Component (it's from a third-party library) | Kafka's NewTopic, KafkaAdmin

 

✅ Summary for Interview

The @Bean annotation tells Spring to create and manage the return value of a method as a Spring Bean. It's commonly used inside @Configuration classes for manually defining beans that are not automatically detected through @Component.

🔸 Case 1: Your own classes (custom classes like Employee)

If it's your own class, you can do either:

  • ✅ Use @Component, @Service, @Repository, etc.

  • ✅ Or use @Configuration + @Bean to manually define it.

Example:

@Component
public class Employee {
    // Spring will auto-detect and register this class as a bean
}

-----------------------or---------------------

@Configuration
public class MyConfig {

    @Bean
    public Employee employee() {
        return new Employee();
    }
}

🔸 Case 2: Third-party or inbuilt classes (like KafkaAdmin, KafkaTemplate, Java SDK classes)

You cannot modify those classes (you can't add @Component to them), so:

👉 You must create their object manually using @Bean.

Example with Kafka:


@Bean
public KafkaAdmin kafkaAdmin() {
    Map<String, Object> config = new HashMap<>();
    config.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    return new KafkaAdmin(config);
}

✅ Summary Table:

Class Type          | Can Use @Component? | When to Use @Bean
Your own class      | ✅ Yes               | Optional (for customization)
Third-party/inbuilt | ❌ No                | ✅ Required

 

✅ 1. What is @Component?

  • Marks a class as a Spring-managed bean.

  • Spring auto-detects it during component scanning (@ComponentScan).

  • Mostly used for business logic classes.

➡️ Spring will automatically create a single object (singleton) of this class and manage it.

✅ 2. What is @Configuration?

  • It is a special type of @Component.

  • Used to manually define beans using the @Bean annotation.

  • Typically used for third-party library classes, configuration settings, Kafka topics, etc.

➡️ This class itself becomes a Spring bean, and each @Bean method inside it tells Spring to register the method's return value as a bean.


🆚 Difference Between @Component and @Configuration


Feature                   | @Component                                             | @Configuration
Purpose                   | Marks a simple bean class                              | Defines bean creation methods
Can contain @Bean methods | ✅ Yes, but only in "lite mode" (no proxying)           | ✅ Yes (full mode)
Usage                     | Business logic, services, repos                        | Custom object creation/configuration
Auto-scanned?             | ✅ Yes (via @ComponentScan)                             | ✅ Yes (it's a specialized component)
Proxy behavior            | Not proxied; each @Bean method call runs as plain Java | CGLIB-proxied; @Bean method calls return the same singleton




✅ When to Use What?

Situation                                                           | Use
You create your own service/repository/controller                  | @Component, @Service, etc.
You need to use a third-party class (e.g. KafkaAdmin, ObjectMapper) | @Configuration + @Bean
You want full control over object creation                          | @Bean method inside @Configuration


==========================================================================================



If you're creating Kafka topics programmatically using KafkaAdmin and NewTopic beans in Spring, here’s how Spring Boot internally handles the "topic already exists" situation:


What Happens Behind the Scenes?

When your application starts, Spring Boot (through KafkaAdmin) looks for all @Beans of type NewTopic and tries to create them using the AdminClient API.



What if the topic already exists?

Kafka does not allow duplicate topics. If a topic already exists, it will NOT throw an exception, it will simply skip creation silently.


@Bean
public NewTopic myTopic() {
    return TopicBuilder.name("employee-events")
            .partitions(3)
            .replicas(1)
            .build();
}



✅ If topic "employee-events" already exists, Spring just skips creation.
❌ It will not throw an error or create a duplicate.


🔍 How does it "check"?

Internally, Spring calls:

adminClient.createTopics(...)


And Kafka AdminClient queries the cluster metadata to check if the topic already exists.

  

📌 Agenda: Kafka Producer in Spring Boot

🎯 Goal:

Send a message from a Spring Boot application to a Kafka topic.


🏗️ Step-by-Step Plan

Step 1: Setup Application

  • Ensure Kafka and ZooKeeper (or KRaft, if you use a newer Kafka version) are running.

  • Add required dependencies (spring-kafka, spring-boot-starter-web) in pom.xml.

👉 Purpose:
Spring Boot needs Kafka libraries to connect and produce messages.


Step 2: Configure Kafka Producer

  • Create a configuration class for ProducerFactory and KafkaTemplate.

👉 Purpose:

  • ProducerFactory → Defines how a Kafka Producer is created.

  • KafkaTemplate → Helper class that we use in our code to send messages easily.
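
A minimal sketch of such a configuration class. Note that Spring Boot can auto-configure these beans from application.properties, so this class is optional; it is shown here just to make the beans explicit, and the bootstrap address is an assumption:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaProducerConfig {

    // Defines how Kafka Producer instances are created (connection + serializers)
    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(props);
    }

    // Helper class used in our code to send messages easily
    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}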



Step 3: Create Producer Service

  • A Spring @Service that uses KafkaTemplate to send messages.

👉 Purpose:
Encapsulates producer logic in one class for reusability.
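
A minimal sketch of such a service; the topic name is a placeholder:

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducerService {

    private static final String TOPIC = "my-topic"; // assumed topic name

    private final KafkaTemplate<String, String> kafkaTemplate;

    public KafkaProducerService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void sendMessage(String message) {
        kafkaTemplate.send(TOPIC, message); // publish the message to the Kafka topic
    }
}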


Step 4: Create REST Controller

  • An API endpoint (/publish) that accepts a message and calls the service to send it.

👉 Purpose:
So we can easily test the producer via a browser or Postman.
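
A minimal sketch of such a controller, matching the /kafka/publish endpoint used in the test step below:

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/kafka")
public class KafkaProducerController {

    private final KafkaProducerService producerService;

    public KafkaProducerController(KafkaProducerService producerService) {
        this.producerService = producerService;
    }

    // GET http://localhost:8080/kafka/publish?message=HelloKafka
    @GetMapping("/publish")
    public ResponseEntity<String> publish(@RequestParam String message) {
        producerService.sendMessage(message);
        return ResponseEntity.ok("Message sent to Kafka");
    }
}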


Step 5: Test

  • Call the endpoint via Postman or a browser:

    http://localhost:8080/kafka/publish?message=HelloKafka

  • Verify the message in a Kafka console consumer or via your consumer app.


🔁 Flow Recap:

  1. Client → calls REST endpoint

  2. Controller → receives request, passes message

  3. Service → uses KafkaTemplate to send message

  4. Kafka Producer → publishes message to topic

  5. Kafka Broker → stores it for consumers



🔹 1. Core Concept: Serialization & Deserialization

👉 Problem:
Kafka can only send and store bytes.
But in Java / Spring Boot, we usually deal with objects (like POJOs).

So we need a way to convert:

  • Serialization → Convert Java Object ➝ Bytes (to send into Kafka)

  • Deserialization → Convert Bytes ➝ Java Object (to consume from Kafka)

👉 In JSON messaging:

  • On Producer side → convert POJO → JSON String → Bytes

  • On Consumer side → convert Bytes → JSON String → POJO

Spring Kafka already provides support for this using JsonSerializer and JsonDeserializer.


🔹 2. Tools Used

  • Producer: Uses KafkaTemplate with JsonSerializer

  • Consumer: Uses @KafkaListener with JsonDeserializer

  • POJO: Our custom class (e.g. Employee)

  • Controller: REST API to send messages
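
A minimal sketch of JSON messaging under these assumptions: the value serializer/deserializer are switched to JsonSerializer/JsonDeserializer in application.properties (with spring.json.trusted.packages configured), the auto-configured KafkaTemplate is used, and Employee, the topic name, and the group ID are placeholder names:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

// Placeholder POJO sent as JSON
class Employee {
    private Long id;
    private String name;

    public Employee() { }                 // Jackson needs a no-args constructor
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

@Service
class EmployeeProducer {

    private final KafkaTemplate<String, Employee> kafkaTemplate;

    EmployeeProducer(KafkaTemplate<String, Employee> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void send(Employee employee) {
        // JsonSerializer turns the POJO into JSON bytes before sending
        kafkaTemplate.send("employee-events", employee);
    }
}

@Service
class EmployeeConsumer {

    // JsonDeserializer turns the JSON bytes back into an Employee object
    @KafkaListener(topics = "employee-events", groupId = "employee-group")
    public void consume(Employee employee) {
        System.out.println("Received employee: " + employee.getName());
    }
}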



📌 Project: Wikimedia Stream → Kafka → Database


🏗️ High-Level Architecture

Wikimedia Stream API  --->  Producer Service  --->  Kafka Topic  
                                                    |  
                                                    v  
                                             Consumer Service  
                                                    |  
                                                    v  
                                                  Database



Wikimedia Stream API → Producer → Kafka Topic → Consumer 
          ↓
      JSON Event
          ↓
Convert JSON → POJO (WikimediaEvent)
          ↓
Validate & Save into MySQL



Flow

  1. Call http://localhost:8080/start-stream

  2. Producer starts streaming Wikimedia edits → sends to Kafka topic wikimedia-stream.

  3. Consumer reads from topic → saves into MySQL via WikimediaDataRepository.


✅ Flow

  1. OkHttp EventSource connects to Wikimedia SSE.

  2. WikimediaEventHandler receives each JSON message.

  3. Message is logged and published to Kafka topic wikimedia_recentchange.

  4. You can later consume from Kafka and store into your MySQL database.


Wikimedia (SSE Stream)
        |
        v
Spring Boot Producer
        |
        v
Kafka Topic
        |
        v
Spring Boot Consumer
        |
        v
MySQL DB


This library provides classes like:

  • EventSource → Opens a connection to an SSE endpoint.

  • EventHandler → An interface you implement to handle events from the stream.


🔄 How It Works

  1. EventSource starts a connection to Wikimedia API.

  2. EventHandler eventHandler = new WikimediaEventHandler(kafkaTemplate, "wikimedia_recentchange");

    EventSource eventSource = new EventSource.Builder(eventHandler, URI.create("https://stream.wikimedia.org/v2/stream/recentchange")).build();

    eventSource.start();

  3. Every time a new event comes in (e.g., someone edits a Wikipedia page),
    onMessage() of your WikimediaEventHandler is called.

  4. Inside onMessage, you send the event data to Kafka.


🔑 In Short

    • EventHandler is not Spring's; it comes from the okhttp-eventsource library.

    • It's like a listener that reacts whenever Wikimedia pushes a new event.

    • You implement it to define what to do when data arrives → e.g., publish it to Kafka.
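
A minimal sketch of such a handler, assuming the EventHandler interface from the okhttp-eventsource (com.launchdarkly) library with its onOpen/onClosed/onMessage/onComment/onError methods; the constructor arguments mirror the snippet above:

import com.launchdarkly.eventsource.EventHandler;
import com.launchdarkly.eventsource.MessageEvent;
import org.springframework.kafka.core.KafkaTemplate;

public class WikimediaEventHandler implements EventHandler {

    private final KafkaTemplate<String, String> kafkaTemplate;
    private final String topic;

    public WikimediaEventHandler(KafkaTemplate<String, String> kafkaTemplate, String topic) {
        this.kafkaTemplate = kafkaTemplate;
        this.topic = topic;
    }

    @Override
    public void onOpen() {
        System.out.println("Connection to Wikimedia stream opened");
    }

    @Override
    public void onClosed() {
        System.out.println("Connection to Wikimedia stream closed");
    }

    @Override
    public void onMessage(String event, MessageEvent messageEvent) {
        // Each SSE message carries one Wikimedia change as raw JSON
        kafkaTemplate.send(topic, messageEvent.getData());
    }

    @Override
    public void onComment(String comment) {
        // SSE keep-alive comments; nothing to do
    }

    @Override
    public void onError(Throwable t) {
        System.err.println("Stream error: " + t.getMessage());
    }
}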


@Override
public void onMessage(String event, MessageEvent messageEvent) {
    System.out.println("Event name: " + event);  
    String jsonPayload = messageEvent.getData();

    // Send raw JSON to Kafka
    kafkaTemplate.send("wikimedia_recentchange", jsonPayload);
}


  • event: "message" or null

  • messageEvent.getData(): the whole JSON about the Wikipedia edit


  • The sequence diagram below shows how the flow works (EventSource → EventHandler → KafkaTemplate → Kafka → Consumer → DB).



  • 🔑 In Simple Words

    • String event → Label/name of the event (like "edit"). Often not used here.

    • MessageEvent messageEvent → The actual Wikimedia data (JSON string + metadata).




  • ✅ 2. Sequence Diagram (Conceptual Flow)

    Here’s how your Wikimedia Stream → Kafka → DB flow works:


    [Wikimedia Stream]
            |
            v
    EventSource (OkHttp EventSource)
            |
            v
    EventHandler (receives JSON messages)
            |
            v
    KafkaTemplate.send("wikimedia-topic", eventJson)
            |
            v
    Kafka Broker (Topic: wikimedia-topic)
            |
            v
    KafkaConsumer (Spring Boot Consumer)
            |
            v
    JPA Repository.save(WikimediaData)
            |
            v
    MySQL Database (wikimedia_data table)





    📌 Explanation in simple terms

    • EventSource: Opens a live connection to Wikimedia’s EventStreams API.

    • EventHandler: Listens to events and receives raw JSON data.

    • KafkaTemplate: Produces (publishes) that JSON event into Kafka.

    • Kafka Broker: Acts as a middleman (message queue).

    • Consumer: Spring Boot service that listens to Kafka topic.

    • Repository: Converts JSON into an Entity and saves it in MySQL.


    Wikimedia Stream → EventHandler → Producer → Kafka Topic
                                            ↓
                                    Database (raw JSON)
                                            ↓
                                     Kafka Consumer
                                            ↓
                                 Database (structured data)




    Core Components & Definitions

    1. EventHandler

    • Definition: A listener class that handles real-time events from an external source.

    • In Project: WikimediaEventHandler implements EventHandler to receive new Wikimedia changes.

    • Analogy: Like a waiter who brings food orders from the kitchen.


    @Override
    public void onMessage(String event, MessageEvent messageEvent) {
        producer.sendMessage(messageEvent.getData());
    }


    2. EventSource

    • Definition: Client that opens a connection to a streaming server (Wikimedia) and continuously receives Server-Sent Events (SSE).

    • In Project: Connects to Wikimedia URL and pushes events into the EventHandler.

    • Example:

    • EventSource.Builder builder = new EventSource.Builder(eventHandler, URI.create(url));

    3. Kafka

    • Definition: A distributed messaging platform for high-throughput, fault-tolerant real-time data streaming.

    • In Project: Used to publish Wikimedia events (Producer) and consume them (Consumer).

    • Key Concepts:

      • Topic: Named channel for messages (e.g., wikimedia_events).

      • Partition: Split of a topic for parallelism and scalability.

      • Producer: Publishes messages to Kafka.

      • Consumer: Subscribes and reads messages from Kafka.

      • Group ID: A logical group of consumers; Kafka ensures each message in a topic partition is delivered to one consumer in the group.

      • Zookeeper: Coordinates Kafka brokers (older versions; in new Kafka versions, replaced by KRaft mode).

      • Offset: Position of a consumer in a partition.

    4. Kafka Producer

    • Definition: Publishes data to a Kafka topic.

    • In Project: WikimediaProducer saves raw JSON in MySQL and sends to Kafka.

    • Example: kafkaTemplate.send("wikimedia_events", message);

    5. Kafka Consumer

    • Definition: Subscribes to a Kafka topic and reads messages from it.

    • In Project: A @KafkaListener-based consumer reads from wikimedia_events and saves the parsed data into MySQL (a combined consumer sketch follows this list).

    • Example:

    @KafkaListener(topics = "wikimedia_events", groupId = "wikimedia_group")
    public void consume(String message) { ... }

    6. ObjectMapper

    • Definition: Jackson library class to map JSON ↔ Java Objects.

    • In Project: Used in Consumer to parse raw JSON into fields.

    • Example: JsonNode node = objectMapper.readTree(message);

    7. DTO (Data Transfer Object)

    • Definition: A plain Java object used to transfer data between layers.

    • In Project: WikimediaEventDTO stores eventId, user, title, timestamp before converting to Entity.

    • Benefit: Adds validation and transformation before persisting to DB.

    8. Entity & Repository

    • Entity: Java class mapped to a DB table using JPA annotations.

    • Repository: Interface extending JpaRepository for CRUD operations.

    9. Zookeeper

    • Definition: Centralized service for maintaining Kafka cluster metadata, leader election, and configuration.

    • Note: Kafka 3+ can run without Zookeeper using KRaft mode.

    • In Interview: You can mention you used a Zookeeper-backed Kafka cluster.
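
    Putting items 5-8 together, a minimal sketch of the consumer side; the entity fields and setter names are assumptions based on the project description:

    import com.fasterxml.jackson.core.JsonProcessingException;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Service;

    @Service
    public class WikimediaConsumer {

        private final WikimediaDataRepository repository;             // JpaRepository (item 8)
        private final ObjectMapper objectMapper = new ObjectMapper(); // JSON → Java (item 6)

        public WikimediaConsumer(WikimediaDataRepository repository) {
            this.repository = repository;
        }

        @KafkaListener(topics = "wikimedia_events", groupId = "wikimedia_group")
        public void consume(String message) throws JsonProcessingException {
            // Parse the raw JSON event and extract the fields we care about
            JsonNode node = objectMapper.readTree(message);

            WikimediaData data = new WikimediaData();                 // JPA entity (assumed fields)
            data.setTitle(node.path("title").asText());
            data.setUser(node.path("user").asText());
            data.setTimestamp(node.path("timestamp").asLong());

            repository.save(data);                                    // persist structured data into MySQL
        }
    }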

      🔹 Interview Questions & Answers

      • Q1. Explain the flow of your project.
        A1. Wikimedia events stream → EventHandler → Producer saves raw JSON to DB and Kafka → Kafka Consumer reads messages → parses JSON into structured Entity → saves in MySQL.


        Q2. Why did you use Kafka here?
        A2. Kafka allows us to handle high-throughput, real-time events asynchronously. It decouples the Producer from the Consumer so each can scale independently.


        Q3. What’s the difference between Entity and DTO?
        A3. Entity maps directly to a DB table, DTO is a lightweight object used for transferring and validating data between layers.


        Q4. What is the role of EventHandler and EventSource?
        A4. EventSource opens a connection to Wikimedia’s SSE endpoint, and EventHandler receives the events and forwards them to the Producer.


        Q5. What is a Kafka Topic and Group ID?
        A5. A Topic is like a channel for messages. Group ID ensures that in a consumer group, each message from a partition is delivered to only one consumer instance for parallel processing.


        Q6. How does ObjectMapper help in your Consumer?
        A6. ObjectMapper from Jackson parses the JSON string into a Java object (DTO) or extracts fields directly, making JSON processing easy and reliable.


        Q7. Why do we need Zookeeper in Kafka?
        A7. Zookeeper manages metadata, leader election, and broker coordination. (Note: Kafka 3+ can use KRaft mode instead of Zookeeper.)


        Q8. How would you handle null fields in Wikimedia events?
        A8. Either make DB columns nullable and provide default values in Consumer, or filter only edit type events in EventHandler to ensure all required fields are present.


      🔹 Key Terms in One Line

        • EventHandler → Receives and processes incoming events.

        • EventSource → Connects to external SSE stream.

        • ObjectMapper → JSON ↔ Java converter.

        • DTO → Transfer object, not tied to DB.

        • Entity → DB table representation.

        • Repository → Data access layer for DB.

        • Topic → Channel in Kafka for messages.

        • Group ID → Consumer group for load balancing.

        • Producer → Publishes messages to Kafka.

        • Consumer → Reads messages from Kafka.

        • Zookeeper → Manages Kafka cluster coordination.

        • Offset → Position marker in Kafka partition.