Kafka

 

spring.kafka.bootstrap-servers=localhost:9092

🧠 Meaning:

The address of the Kafka broker (the server that stores and processes messages).

✅ Why:

This tells Spring Boot where to send/receive Kafka messages.

🛠 If you’re running Kafka locally, use localhost:9092.
In production, you may have multiple brokers: kafka1:9092,kafka2:9092,...

spring.kafka.consumer.group-id=my-group

🧠 Meaning:

Assigns the consumer group ID to your consumer.

✅ Why:

Kafka uses this to track which consumer read which message. If two consumers share the same group ID, Kafka will load-balance messages between them.

🔄 Helps in parallel processing and message tracking


spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer

spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer

🧠 Meaning:

Kafka transmits and stores raw bytes. These settings tell the consumer how to convert those bytes back into Java Strings (deserialization).


✅ Why:

If your producer sends String, the consumer should also receive it as a String. Otherwise, it will throw deserialization errors.


spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer

spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer

🧠 Meaning:

The producer sends String keys and values. These settings tell Kafka to convert Java strings into binary for transmission.

✅ Why:

Kafka can only transmit data in bytes. This setting makes sure that String values (like "order created") are sent properly.



Property                                | Purpose                           | Needed For
spring.kafka.bootstrap-servers          | Where Kafka is running            | Both Producer and Consumer
spring.kafka.consumer.group-id          | Group name for load balancing     | Consumer only
spring.kafka.consumer.auto-offset-reset | From where to start reading       | Consumer only
key-deserializer / value-deserializer   | How to read data (bytes → String) | Consumer
key-serializer / value-serializer       | How to send data (String → bytes) | Producer

 


spring.kafka.consumer.enable-auto-commit=true

This means:

"Kafka will automatically save the offset (the position of the last message read) for me — I don't need to do it manually."


spring.kafka.consumer.auto-commit-interval=1000

This means:

"Kafka will auto-save the offset every 1000 milliseconds (i.e., every 1 second) while consuming messages."



🧠 Summary: Production-Ready Kafka Consumer

            

Step                                       | Action
✅ Set enable-auto-commit=false             | Prevent auto-ack before you're done
✅ Manually acknowledge (ack.acknowledge()) | After successful processing
❌ Do not acknowledge if processing fails   | Message will be retried
✅ Use Retry + DLQ for stability            | Avoid message loss or infinite retry loops
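
A minimal sketch of such a consumer, assuming spring.kafka.consumer.enable-auto-commit=false and spring.kafka.listener.ack-mode=manual are set in application.properties; the topic name, group ID, and process() method are placeholders:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class OrderConsumer {

    @KafkaListener(topics = "orders", groupId = "my-group")
    public void listen(String message, Acknowledgment ack) {
        process(message);   // if this throws, ack is skipped and the container's error handling takes over
        ack.acknowledge();  // commit the offset only after successful processing
    }

    private void process(String message) {
        // business logic goes here (placeholder)
    }
}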


🔍 A Common Doubt:

"Are you saying we must manually set/commit the offset in Kafka? Suppose I am handling a UPI-like system with billions of transactions — then I have to provide acknowledgment (ack) for all?"

✅ YES, and here's why:

🔥 WHY Manual Acknowledgment is Needed (in critical systems like UPI)

In high-stakes systems (like UPI, payments, billing), this is 100% required because:

🔄 Kafka Message          | 🏦 Real-Life Processing
Kafka gives you a message | You debit/credit money, update DB
You succeed               | You call ack.acknowledge() manually
You fail (e.g., DB down)  | ❌ Don't ack, so Kafka retries automatically


🎯 Real-World Rule

Offset should be committed only after your business logic is successful.

This guarantees:

  • ✅ No data loss

  • ✅ No double-debit

  • ✅ Safe retries


💡 But Don’t Worry:

You don't write ack logic for each individual message by hand.
With Spring Kafka:

  • You use batching

  • You use concurrency

  • You use auto/manual config depending on use case

We'll add this step-by-step as your project grows.


 

🤖 How it Works Together

Let's say:

  • You are consuming 5 messages from Kafka.

  • You set enable-auto-commit=true and auto-commit-interval=1000.

Here’s what happens:

Time    | Action
0 sec   | Your consumer starts consuming
0.2 sec | Kafka delivers message 1
0.4 sec | Kafka delivers message 2
0.9 sec | Kafka delivers message 3
1.0 sec | The consumer auto-commits offset = 3 ✅
1.2 sec | Kafka delivers message 4
1.5 sec | Kafka delivers message 5
2.0 sec | The consumer auto-commits offset = 5 ✅

 

⚠️ Risk (Important)

Let’s say the app crashes at 0.9 sec before offset commit at 1.0 sec:

  • Kafka had sent message 1, 2, 3.

  • But offset wasn't committed yet.

  • So, when you restart the app, those messages will be re-read.

💡 This is why committing too late may cause duplicates, and committing too early may lose messages if processing fails.


✅ Safe Defaults (for beginners)

Setting              | Safe value
enable-auto-commit   | true
auto-commit-interval | 1000 ms (1 sec) is fine for dev/testing




🔹 Case 1: Business failure (VALID failure)

Examples:

  • Insufficient balance

  • Invalid UPI ID

  • Limit exceeded

These are expected failures — nothing is broken, just logic conditions failed.


➡️ In such cases:

  • ✅ Save txn to DB as "failed"

  • ✅ Still call ack.acknowledge()

  • No retry needed — because it was processed logically even if result = "fail"

💡 So YES — in these cases, you should mark offset and move on.

🔸 Case 2: System failure (UNEXPECTED failure)

Examples:

  • Database is down

  • Network timeout

  • Kafka broker crash

  • Code throws exception

➡️ In these cases:

  • ❌ You were not able to save/update the txn

  • ❌ It’s unsafe to ack — because you don’t know if processing succeeded

  • ✅ Kafka should retry later

✅ Real-World Strategy:

Failure Type          | Example                         | Save to DB? | Retry Needed? | Acknowledge?
✅ Business logic fail | Insufficient balance, wrong PIN | ✅ Yes       | ❌ No          | ✅ Yes
❌ System fail         | DB down, exception thrown       | ❌ No        | ✅ Yes         | ❌ No
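
A minimal sketch of this strategy using manual acknowledgment (as described above); TransactionService and BusinessException are hypothetical names, and BusinessException is assumed to be an unchecked exception thrown for expected failures:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class UpiTransactionConsumer {

    private final TransactionService transactionService; // hypothetical service with process() / markFailed()

    public UpiTransactionConsumer(TransactionService transactionService) {
        this.transactionService = transactionService;
    }

    @KafkaListener(topics = "upi-transactions", groupId = "payment-service")
    public void onTransaction(String message, Acknowledgment ack) {
        try {
            transactionService.process(message);                     // debit/credit, update DB
            ack.acknowledge();                                       // success → commit offset
        } catch (BusinessException e) {                              // expected: insufficient balance, invalid UPI ID, ...
            transactionService.markFailed(message, e.getMessage());  // record the txn as "failed"
            ack.acknowledge();                                       // logically processed → still commit the offset
        }
        // Any other (unexpected) exception propagates: no ack, the offset stays uncommitted,
        // and the container's error handling can retry or route the record to a DLQ.
    }
}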


🔁 So, You Control Retry Based on What Failed

🧠 Summary

You Ask                        | Kafka Answer
"Txn failed, can I still ack?" | ✅ Yes, if it failed due to business logic and you logged it
"When do I NOT ack?"           | ❌ When the system crashes or the DB fails before saving anything
"Do I need retries?"           | ✅ Only for unexpected (system) failures




🧪 Bonus Tip for Testing:

You can set auto-offset-reset=earliest so that a consumer with no committed offset (e.g., a brand-new group ID) starts reading from the beginning of the topic. This is great for beginners who want to see replay behavior; note that once offsets are committed, a restarted consumer resumes from the committed offset rather than from the beginning.


# ✅ Kafka Broker Address

spring.kafka.bootstrap-servers=localhost:9092


# ✅ Producer Config

spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer

spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer


# ✅ Consumer Config

spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer

spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer


# 🔑 Unique Consumer Group ID (for this service)

spring.kafka.consumer.group-id=my-consumer-group


# 🧠 Offset Management

# Whether to automatically commit the offset of consumed messages

spring.kafka.consumer.enable-auto-commit=true


# When auto commit is enabled, this sets how often offsets are committed (in ms)

spring.kafka.consumer.auto-commit-interval=1000


# ⚙️ Where to start reading messages when no offset is found:

# earliest → start from beginning

# latest → read only new messages

spring.kafka.consumer.auto-offset-reset=earliest



🔴 Problem with auto.create.topics.enable=true

When this is enabled, Kafka will:

  • Automatically create a topic if it doesn't exist whenever a producer sends data.

  • But it won’t check config (partition, replication) — defaults are used.

  • A slight name mismatch ("my-topic" vs "my_topic") silently creates a second, unintended topic.

  • Developers might accidentally trigger topic creation, leading to garbage topics.

  • In CI/CD pipelines, topics can get recreated unintentionally if their existence isn't checked, especially when retention is short.


✅ What to Do in Production

1. Disable auto-creation

Set in server.properties:

auto.create.topics.enable=false


This ensures that:

  • Topics must be created intentionally (by admin script or via code).

  • Avoids unexpected Kafka load or misconfiguration.

2. Create Topics Programmatically — Safe Way

In your Spring Boot application, instead of blindly creating topics every time, do:

✅ Use KafkaAdmin + NewTopic in config

@Configuration
public class KafkaTopicConfig {

    @Bean
    public KafkaAdmin kafkaAdmin() {
        Map<String, Object> configs = new HashMap<>();
        configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        return new KafkaAdmin(configs);
    }

    @Bean
    public NewTopic topic1() {
        return TopicBuilder.name("myTopic")
                .partitions(1)
                .replicas(1)
                .build();
    }
}

☑️ If the topic already exists, it won't be recreated and no error is thrown; creation is silently skipped.


🛡 Best Practice Summary


Setting / Practice                 | Recommendation
auto.create.topics.enable          | ❌ Disable in prod
Topic creation                     | ✅ Explicit via code or admin
Topic config (partitions/replicas) | ✅ Set manually
Topic naming                       | ✅ Use constants/config
Monitoring topic creation          | ✅ Enable audit/alerts


🔧 What does KafkaAdmin do in Spring Boot?

KafkaAdmin is a Spring Kafka utility that allows you to manage Kafka topics programmatically during application startup.

✅ Why use it?

When your Spring Boot app starts, you can use KafkaAdmin to:

  • Automatically create topics

  • Configure topic properties (like partitions, replication, etc.)

  • Interact with the Kafka AdminClient API

🧠 Why do we pass a Map<String, Object>?

The map contains configuration properties needed to connect to the Kafka cluster.

Map<String, Object> configs = new HashMap<>();

configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");


This tells Kafka:

"Hey, connect to the Kafka cluster at localhost:9092."

Then we pass that Map into the KafkaAdmin constructor:


return new KafkaAdmin(configs);


📦 Full Example of KafkaAdmin + Topic Creation



@Configuration
public class KafkaTopicConfig {

    @Bean
    public KafkaAdmin kafkaAdmin() {
        Map<String, Object> configs = new HashMap<>();
        configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        return new KafkaAdmin(configs);
    }

    @Bean
    public NewTopic createTopic() {
        return TopicBuilder.name("my-topic")
                .partitions(3)
                .replicas(1)
                .build();
    }
}

🔁 What happens at runtime?

  1. Spring Boot sees KafkaAdmin

  2. It uses the map to connect to Kafka

  3. It sees the NewTopic bean

  4. It checks if my-topic exists — if not, it creates it.

⚠️ NOTE:

If spring.kafka.admin.auto-create=true is set (it is true by default), then it automatically tries to create the topic.


🧪 So when should you use KafkaAdmin + NewTopic?

✅ Ideal for:

  • Local development

  • CI/CD testing

  • Staging environments

  • Prototyping



🏭 In Production, You Should:

  • ❗️Manually create topics via kafka-topics.sh CLI or Kafka Admin tools

  • ✅ Use Infrastructure as Code (Terraform, Ansible, etc.) for topic provisioning

  • ✅ Your Spring Boot app should only consume from or produce to pre-created topics

🔧 Optional: Disable auto topic creation

In production, set this on Kafka Broker:

auto.create.topics.enable=false


This ensures topics must be created beforehand, or you'll get an error — safer for critical systems.


🔍 What is @Bean in Spring?

@Bean is used to manually define an object (bean) that you want Spring to manage in the Spring Application Context.


Using @Bean, you're saying:

“Hey Spring, create and manage this object (like KafkaAdmin or NewTopic) for me as a bean.”


💬 When should you use @Bean?

Use @Bean when...                                                               | Example
You need to register a custom Java object in the Spring context manually       | KafkaAdmin, NewTopic, ObjectMapper
You're configuring external libraries or custom initialization                 | Kafka, RabbitMQ, etc.
You can't annotate the class with @Component (it's from a third-party library) | Kafka's NewTopic, KafkaAdmin

 

✅ Summary for Interview

The @Bean annotation tells Spring to create and manage the return value of a method as a Spring Bean. It's commonly used inside @Configuration classes for manually defining beans that are not automatically detected through @Component.

🔸 Case 1: Your own classes (custom classes like Employee)

If it's your own class, you can do either:

  • ✅ Use @Component, @Service, @Repository, etc.

  • ✅ Or use @Configuration + @Bean to manually define it.

Example:

@Component
public class Employee {
    // Spring will auto-detect and register this class as a bean
}

-----------------------or---------------------

@Configuration
public class MyConfig {

    @Bean
    public Employee employee() {
        return new Employee();
    }
}

🔸 Case 2: Third-party or inbuilt classes (like KafkaAdmin, KafkaTemplate, Java SDK classes)

You cannot modify those classes (you can't add @Component to them), so:

👉 You must create their object manually using @Bean.

Example with Kafka:


@Bean
public KafkaAdmin kafkaAdmin() {
    Map<String, Object> config = new HashMap<>();
    config.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    return new KafkaAdmin(config);
}

✅ Summary Table:

Class Type          | Can Use @Component? | When to Use @Bean
Your own class      | ✅ Yes               | Optional (for customization)
Third-party/inbuilt | ❌ No                | ✅ Required

 

✅ 1. What is @Component?

  • Marks a class as a Spring-managed bean.

  • Spring auto-detects it during component scanning (@ComponentScan).

  • Mostly used for business logic classes.

➡️ Spring will automatically create a single object (singleton) of this class and manage it.

✅ 2. What is @Configuration?

  • It is a special type of @Component.

  • Used to manually define beans using the @Bean annotation.

  • Typically used for third-party library classes, configuration settings, Kafka topics, etc.

➡️ This class itself becomes a Spring bean, and each @Bean method inside it tells Spring to register the method's return value as a bean.


🆚 Difference Between @Component and @Configuration


Feature                   | @Component                                             | @Configuration
Purpose                   | Marks a simple bean class                              | Defines bean creation methods
Can contain @Bean methods | ✅ Yes, but only in "lite mode" (no proxying)           | ✅ Yes (full mode)
Usage                     | Business logic, services, repos                        | Custom object creation/configuration
Auto-scanned?             | ✅ Yes (via @ComponentScan)                             | ✅ Yes (it's a specialized component)
Proxy behavior            | Not proxied; each @Bean method call runs as plain Java | CGLIB-proxied; @Bean method calls return the same singleton




✅ When to Use What?

Situation                                                           | Use
You create your own service/repository/controller                  | @Component, @Service, etc.
You need to use a third-party class (e.g. KafkaAdmin, ObjectMapper) | @Configuration + @Bean
You want full control over object creation                          | @Bean method inside @Configuration


==========================================================================================



If you're creating Kafka topics programmatically using KafkaAdmin and NewTopic beans in Spring, here’s how Spring Boot internally handles the "topic already exists" situation:


What Happens Behind the Scenes?

When your application starts, Spring Boot (through KafkaAdmin) looks for all @Beans of type NewTopic and tries to create them using the AdminClient API.



What if the topic already exists?

Kafka does not allow duplicate topics. If a topic already exists, it will NOT throw an exception, it will simply skip creation silently.


@Bean
public NewTopic myTopic() {
    return TopicBuilder.name("employee-events")
            .partitions(3)
            .replicas(1)
            .build();
}



✅ If topic "employee-events" already exists, Spring just skips creation.
❌ It will not throw an error or create a duplicate.


🔍 How does it "check"?

Internally, Spring calls:

adminClient.createTopics(...)


And Kafka AdminClient queries the cluster metadata to check if the topic already exists.

  

📌 Agenda: Kafka Producer in Spring Boot

🎯 Goal:

Send a message from a Spring Boot application to a Kafka topic.


🏗️ Step-by-Step Plan

Step 1: Setup Application

  • Ensure Kafka and ZooKeeper (or KRaft, if you use a newer Kafka version) are running.

  • Add required dependencies (spring-kafka, spring-boot-starter-web) in pom.xml.

👉 Purpose:
Spring Boot needs Kafka libraries to connect and produce messages.


Step 2: Configure Kafka Producer

  • Create a configuration class for ProducerFactory and KafkaTemplate.

👉 Purpose:

  • ProducerFactory → Defines how a Kafka Producer is created.

  • KafkaTemplate → Helper class that we use in our code to send messages easily.
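
A minimal sketch of such a configuration class. Note that Spring Boot can auto-configure these beans from application.properties, so this class is optional; it is shown here just to make the beans explicit, and the bootstrap address is an assumption:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaProducerConfig {

    // Defines how Kafka Producer instances are created (connection + serializers)
    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(props);
    }

    // Helper class used in our code to send messages easily
    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}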



Step 3: Create Producer Service

  • A Spring @Service that uses KafkaTemplate to send messages.

👉 Purpose:
Encapsulates producer logic in one class for reusability.
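
A minimal sketch of such a service; the topic name is a placeholder:

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducerService {

    private static final String TOPIC = "my-topic"; // assumed topic name

    private final KafkaTemplate<String, String> kafkaTemplate;

    public KafkaProducerService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void sendMessage(String message) {
        kafkaTemplate.send(TOPIC, message); // publish the message to the Kafka topic
    }
}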


Step 4: Create REST Controller

  • An API endpoint (/publish) that accepts a message and calls the service to send it.

👉 Purpose:
So we can easily test the producer via a browser or Postman.
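
A minimal sketch of such a controller, matching the /kafka/publish endpoint used in the test step below:

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/kafka")
public class KafkaProducerController {

    private final KafkaProducerService producerService;

    public KafkaProducerController(KafkaProducerService producerService) {
        this.producerService = producerService;
    }

    // GET http://localhost:8080/kafka/publish?message=HelloKafka
    @GetMapping("/publish")
    public ResponseEntity<String> publish(@RequestParam String message) {
        producerService.sendMessage(message);
        return ResponseEntity.ok("Message sent to Kafka");
    }
}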


Step 5: Test

  • Call the endpoint via Postman or a browser:

    http://localhost:8080/kafka/publish?message=HelloKafka

  • Verify the message in a Kafka console consumer or via your consumer app.


🔁 Flow Recap:

  1. Client → calls REST endpoint

  2. Controller → receives request, passes message

  3. Service → uses KafkaTemplate to send message

  4. Kafka Producer → publishes message to topic

  5. Kafka Broker → stores it for consumers



🔹 1. Core Concept: Serialization & Deserialization

👉 Problem:
Kafka can only send and store bytes.
But in Java / Spring Boot, we usually deal with objects (like POJOs).

So we need a way to convert:

  • Serialization → Convert Java Object ➝ Bytes (to send into Kafka)

  • Deserialization → Convert Bytes ➝ Java Object (to consume from Kafka)

👉 In JSON messaging:

  • On Producer side → convert POJO → JSON String → Bytes

  • On Consumer side → convert Bytes → JSON String → POJO

Spring Kafka already provides support for this using JsonSerializer and JsonDeserializer.


🔹 2. Tools Used

  • Producer: Uses KafkaTemplate with JsonSerializer

  • Consumer: Uses @KafkaListener with JsonDeserializer

  • POJO: Our custom class (e.g. Employee)

  • Controller: REST API to send messages
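
A minimal sketch of JSON messaging under these assumptions: the value serializer/deserializer are switched to JsonSerializer/JsonDeserializer in application.properties (with spring.json.trusted.packages configured), the auto-configured KafkaTemplate is used, and Employee, the topic name, and the group ID are placeholder names:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

// Placeholder POJO sent as JSON
class Employee {
    private Long id;
    private String name;

    public Employee() { }                 // Jackson needs a no-args constructor
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

@Service
class EmployeeProducer {

    private final KafkaTemplate<String, Employee> kafkaTemplate;

    EmployeeProducer(KafkaTemplate<String, Employee> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void send(Employee employee) {
        // JsonSerializer turns the POJO into JSON bytes before sending
        kafkaTemplate.send("employee-events", employee);
    }
}

@Service
class EmployeeConsumer {

    // JsonDeserializer turns the JSON bytes back into an Employee object
    @KafkaListener(topics = "employee-events", groupId = "employee-group")
    public void consume(Employee employee) {
        System.out.println("Received employee: " + employee.getName());
    }
}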



📌 Project: Wikimedia Stream → Kafka → Database


🏗️ High-Level Architecture

Wikimedia Stream API  --->  Producer Service  --->  Kafka Topic  
                                                    |  
                                                    v  
                                             Consumer Service  
                                                    |  
                                                    v  
                                                  Database



Wikimedia Stream API → Producer → Kafka Topic → Consumer 
          ↓
      JSON Event
          ↓
Convert JSON → POJO (WikimediaEvent)
          ↓
Validate & Save into MySQL



Flow

  1. Call http://localhost:8080/start-stream

  2. Producer starts streaming Wikimedia edits → sends to Kafka topic wikimedia-stream.

  3. Consumer reads from topic → saves into MySQL via WikimediaDataRepository.


✅ Flow

  1. OkHttp EventSource connects to Wikimedia SSE.

  2. WikimediaEventHandler receives each JSON message.

  3. Message is logged and published to Kafka topic wikimedia_recentchange.

  4. You can later consume from Kafka and store into your MySQL database.


Wikimedia (SSE Stream)
        |
        v
Spring Boot Producer
        |
        v
Kafka Topic
        |
        v
Spring Boot Consumer
        |
        v
MySQL DB


This library provides classes like:

  • EventSource → Opens a connection to an SSE endpoint.

  • EventHandler → An interface you implement to handle events from the stream.


🔄 How It Works

  1. EventSource starts a connection to Wikimedia API.

  2. EventHandler eventHandler = new WikimediaEventHandler(kafkaTemplate, "wikimedia_recentchange");

    EventSource eventSource = new EventSource.Builder(eventHandler, URI.create("https://stream.wikimedia.org/v2/stream/recentchange")).build();

    eventSource.start();

  3. Every time a new event comes in (e.g., someone edits a Wikipedia page),
    onMessage() of your WikimediaEventHandler is called.

  4. Inside onMessage, you send the event data to Kafka.


🔑 In Short

    • EventHandler is not Spring's; it comes from the okhttp-eventsource library.

    • It's like a listener that reacts whenever Wikimedia pushes a new event.

    • You implement it to define what to do when data arrives → e.g., publish it to Kafka.
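
A minimal sketch of such a handler, assuming the EventHandler interface from the okhttp-eventsource (com.launchdarkly) library with its onOpen/onClosed/onMessage/onComment/onError methods; the constructor arguments mirror the snippet above:

import com.launchdarkly.eventsource.EventHandler;
import com.launchdarkly.eventsource.MessageEvent;
import org.springframework.kafka.core.KafkaTemplate;

public class WikimediaEventHandler implements EventHandler {

    private final KafkaTemplate<String, String> kafkaTemplate;
    private final String topic;

    public WikimediaEventHandler(KafkaTemplate<String, String> kafkaTemplate, String topic) {
        this.kafkaTemplate = kafkaTemplate;
        this.topic = topic;
    }

    @Override
    public void onOpen() {
        System.out.println("Connection to Wikimedia stream opened");
    }

    @Override
    public void onClosed() {
        System.out.println("Connection to Wikimedia stream closed");
    }

    @Override
    public void onMessage(String event, MessageEvent messageEvent) {
        // Each SSE message carries one Wikimedia change as raw JSON
        kafkaTemplate.send(topic, messageEvent.getData());
    }

    @Override
    public void onComment(String comment) {
        // SSE keep-alive comments; nothing to do
    }

    @Override
    public void onError(Throwable t) {
        System.err.println("Stream error: " + t.getMessage());
    }
}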


@Override
public void onMessage(String event, MessageEvent messageEvent) {
    System.out.println("Event name: " + event);  
    String jsonPayload = messageEvent.getData();

    // Send raw JSON to Kafka
    kafkaTemplate.send("wikimedia_recentchange", jsonPayload);
}


  • event: "message" or null

  • messageEvent.getData(): the whole JSON about the Wikipedia edit


  • The sequence diagram below shows how the flow works (EventSource → EventHandler → KafkaTemplate → Kafka → Consumer → DB).



  • 🔑 In Simple Words

    • String event → Label/name of the event (like "edit"). Often not used here.

    • MessageEvent messageEvent → The actual Wikimedia data (JSON string + metadata).




  • ✅ 2. Sequence Diagram (Conceptual Flow)

    Here’s how your Wikimedia Stream → Kafka → DB flow works:


    [Wikimedia Stream]
            |
            v
    EventSource (OkHttp EventSource)
            |
            v
    EventHandler (receives JSON messages)
            |
            v
    KafkaTemplate.send("wikimedia-topic", eventJson)
            |
            v
    Kafka Broker (Topic: wikimedia-topic)
            |
            v
    KafkaConsumer (Spring Boot Consumer)
            |
            v
    JPA Repository.save(WikimediaData)
            |
            v
    MySQL Database (wikimedia_data table)





    📌 Explanation in simple terms

    • EventSource: Opens a live connection to Wikimedia’s EventStreams API.

    • EventHandler: Listens to events and receives raw JSON data.

    • KafkaTemplate: Produces (publishes) that JSON event into Kafka.

    • Kafka Broker: Acts as a middleman (message queue).

    • Consumer: Spring Boot service that listens to Kafka topic.

    • Repository: Converts JSON into an Entity and saves it in MySQL.


    Wikimedia Stream → EventHandler → Producer → Kafka Topic
                                            ↓
                                    Database (raw JSON)
                                            ↓
                                     Kafka Consumer
                                            ↓
                                 Database (structured data)




    Core Components & Definitions

    1. EventHandler

    • Definition: A listener class that handles real-time events from an external source.

    • In Project: WikimediaEventHandler implements EventHandler to receive new Wikimedia changes.

    • Analogy: Like a waiter who brings food orders from the kitchen.


    @Override
    public void onMessage(String event, MessageEvent messageEvent) {
        producer.sendMessage(messageEvent.getData());
    }


    2. EventSource

    • Definition: Client that opens a connection to a streaming server (Wikimedia) and continuously receives Server-Sent Events (SSE).

    • In Project: Connects to Wikimedia URL and pushes events into the EventHandler.

    • Example:

    • EventSource.Builder builder = new EventSource.Builder(eventHandler, URI.create(url));

    3. Kafka

    • Definition: A distributed messaging platform for high-throughput, fault-tolerant real-time data streaming.

    • In Project: Used to publish Wikimedia events (Producer) and consume them (Consumer).

    • Key Concepts:

      • Topic: Named channel for messages (e.g., wikimedia_events).

      • Partition: Split of a topic for parallelism and scalability.

      • Producer: Publishes messages to Kafka.

      • Consumer: Subscribes and reads messages from Kafka.

      • Group ID: A logical group of consumers; Kafka ensures each message in a topic partition is delivered to one consumer in the group.

      • Zookeeper: Coordinates Kafka brokers (older versions; in new Kafka versions, replaced by KRaft mode).

      • Offset: Position of a consumer in a partition.

    4. Kafka Producer

    • Definition: Publishes data to a Kafka topic.

    • In Project: WikimediaProducer saves raw JSON in MySQL and sends to Kafka.

    • Example: kafkaTemplate.send("wikimedia_events", message);

    5. Kafka Consumer

    • Definition: Subscribes to a Kafka topic and reads messages from it.

    • In Project: A @KafkaListener-based consumer reads from wikimedia_events and saves the parsed data into MySQL (a combined consumer sketch follows this list).

    • Example:

    @KafkaListener(topics = "wikimedia_events", groupId = "wikimedia_group")
    public void consume(String message) { ... }

    6. ObjectMapper

    • Definition: Jackson library class to map JSON ↔ Java Objects.

    • In Project: Used in Consumer to parse raw JSON into fields.

    • Example: JsonNode node = objectMapper.readTree(message);

    7. DTO (Data Transfer Object)

    • Definition: A plain Java object used to transfer data between layers.

    • In Project: WikimediaEventDTO stores eventId, user, title, timestamp before converting to Entity.

    • Benefit: Adds validation and transformation before persisting to DB.

    8. Entity & Repository

    • Entity: Java class mapped to a DB table using JPA annotations.

    • Repository: Interface extending JpaRepository for CRUD operations.

    9. Zookeeper

    • Definition: Centralized service for maintaining Kafka cluster metadata, leader election, and configuration.

    • Note: Kafka 3+ can run without Zookeeper using KRaft mode.

    • In Interview: You can mention you used a Zookeeper-backed Kafka cluster.
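
    Putting items 5-8 together, a minimal sketch of the consumer side; the entity fields and setter names are assumptions based on the project description:

    import com.fasterxml.jackson.core.JsonProcessingException;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Service;

    @Service
    public class WikimediaConsumer {

        private final WikimediaDataRepository repository;             // JpaRepository (item 8)
        private final ObjectMapper objectMapper = new ObjectMapper(); // JSON → Java (item 6)

        public WikimediaConsumer(WikimediaDataRepository repository) {
            this.repository = repository;
        }

        @KafkaListener(topics = "wikimedia_events", groupId = "wikimedia_group")
        public void consume(String message) throws JsonProcessingException {
            // Parse the raw JSON event and extract the fields we care about
            JsonNode node = objectMapper.readTree(message);

            WikimediaData data = new WikimediaData();                 // JPA entity (assumed fields)
            data.setTitle(node.path("title").asText());
            data.setUser(node.path("user").asText());
            data.setTimestamp(node.path("timestamp").asLong());

            repository.save(data);                                    // persist structured data into MySQL
        }
    }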

      🔹 Interview Questions & Answers

      • Q1. Explain the flow of your project.
        A1. Wikimedia events stream → EventHandler → Producer saves raw JSON to DB and Kafka → Kafka Consumer reads messages → parses JSON into structured Entity → saves in MySQL.


        Q2. Why did you use Kafka here?
        A2. Kafka allows us to handle high-throughput, real-time events asynchronously. It decouples the Producer from the Consumer so each can scale independently.


        Q3. What’s the difference between Entity and DTO?
        A3. Entity maps directly to a DB table, DTO is a lightweight object used for transferring and validating data between layers.


        Q4. What is the role of EventHandler and EventSource?
        A4. EventSource opens a connection to Wikimedia’s SSE endpoint, and EventHandler receives the events and forwards them to the Producer.


        Q5. What is a Kafka Topic and Group ID?
        A5. A Topic is like a channel for messages. Group ID ensures that in a consumer group, each message from a partition is delivered to only one consumer instance for parallel processing.


        Q6. How does ObjectMapper help in your Consumer?
        A6. ObjectMapper from Jackson parses the JSON string into a Java object (DTO) or extracts fields directly, making JSON processing easy and reliable.


        Q7. Why do we need Zookeeper in Kafka?
        A7. Zookeeper manages metadata, leader election, and broker coordination. (Note: Kafka 3+ can use KRaft mode instead of Zookeeper.)


        Q8. How would you handle null fields in Wikimedia events?
        A8. Either make DB columns nullable and provide default values in Consumer, or filter only edit type events in EventHandler to ensure all required fields are present.


      🔹 Key Terms in One Line

        • EventHandler → Receives and processes incoming events.

        • EventSource → Connects to external SSE stream.

        • ObjectMapper → JSON ↔ Java converter.

        • DTO → Transfer object, not tied to DB.

        • Entity → DB table representation.

        • Repository → Data access layer for DB.

        • Topic → Channel in Kafka for messages.

        • Group ID → Consumer group for load balancing.

        • Producer → Publishes messages to Kafka.

        • Consumer → Reads messages from Kafka.

        • Zookeeper → Manages Kafka cluster coordination.

        • Offset → Position marker in Kafka partition.