Kafka notes
- Kafka brokers and ZooKeeper write several log files:
  - server.log contains general broker output, including Kafka errors
  - zookeeper.out contains ZooKeeper errors
  - controller.log contains information on controller elections
  - state-change.log contains cluster state-change information
- Each partition has one leader replica and zero or more follower replicas
- Followers use fetch requests to retrieve messages from the partition leader
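- A follower that has not caught up with the leader for 10 s
(replica.lag.time.max.ms; newer releases default to a higher value) falls out
of sync. Each broker also keeps a ZooKeeper session alive; if the session
times out (6 s in these notes, zookeeper.session.timeout.ms), the broker's
replicas are considered out of sync.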
- Three main types of requests:
  - produce request
  - fetch request
  - metadata request
- The acceptor thread accepts new client connections to the broker
- Processor (network) threads read requests from client connections and place
them on the request queue
- IO (request handler) threads process the requests in the request queue
- Request queue contains the requests waiting to be processed
- The response queue holds responses waiting to be sent back to clients
- Produce and fetch requests must be sent to the partition leader
- Clients send metadata requests (to any broker) to learn the cluster
topology, including which broker leads each partition
- With acks=all, the produce request is held in purgatory until the messages
are replicated to all in-sync replicas
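- Reliability guarantees (producer acks setting):
  - acks=0: no delivery guarantee; the producer does not wait for a response
  - acks=1: acknowledged once written to the leader
  - acks=all: acknowledged once written to all in-sync replicas (committed)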
- Consumers commit offsets to tell Kafka they have processed all messages
before that offset (the committed offset is the next offset to read)
- Kafka consumers pull messages from brokers by calling poll() in a loop; the
poll timeout is set by the consumer.
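A rough Python model of the in-sync rule in the notes above. The names (Follower, last_caught_up_ms, zk_session_alive, in_sync_replicas) are hypothetical, and the 10 s limit is only the value from these notes; a real broker tracks this per partition and the controller handles broker failures.

```python
from dataclasses import dataclass

# Lag limit from these notes (replica.lag.time.max.ms); newer Kafka
# releases default to a higher value.
REPLICA_LAG_TIME_MAX_MS = 10_000


@dataclass
class Follower:
    broker_id: int
    last_caught_up_ms: int         # last time it had fetched up to the leader's log end
    zk_session_alive: bool = True  # False once its broker's ZooKeeper session expired


def in_sync_replicas(leader_id, followers, now_ms, lag_max_ms=REPLICA_LAG_TIME_MAX_MS):
    """Return the ISR: the leader plus every follower that is still caught up."""
    isr = {leader_id}
    for follower in followers:
        lagging = now_ms - follower.last_caught_up_ms > lag_max_ms
        if follower.zk_session_alive and not lagging:
            isr.add(follower.broker_id)
    return isr
```

With the leader on broker 1, a follower that caught up 5 s ago stays in the ISR, while one that is 20 s behind, or whose broker lost its ZooKeeper session, drops out.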
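The request path in the notes above (acceptor, processor threads, request queue, IO threads, response queue) can be illustrated with a toy, single-process Python sketch. It assumes connections have already been accepted and represents them as a dict keyed by hypothetical connection ids; there are no sockets, and the function name run_broker_pipeline is illustrative only. Responses are returned in request order using a per-connection sequence number, which is a simplification of this sketch.

```python
import queue
import threading


def run_broker_pipeline(connections, handler, num_io_threads=2):
    """Toy model of a broker's request path.

    connections: {connection_id: [request, ...]} for already-accepted clients
    (the acceptor thread's job). Returns {connection_id: [response, ...]}.
    """
    request_queue = queue.Queue()   # requests waiting to be processed
    response_queue = queue.Queue()  # responses waiting to go back to clients

    # Processor role: read each request off its connection and enqueue it,
    # tagged with a sequence number so responses can be returned in order.
    for conn_id, requests in connections.items():
        for seq, request in enumerate(requests):
            request_queue.put((conn_id, seq, request))

    # IO thread role: take requests off the shared queue, handle them, and
    # place the results on the response queue.
    def io_worker():
        while True:
            item = request_queue.get()
            if item is None:  # shutdown sentinel, one per worker
                return
            conn_id, seq, request = item
            response_queue.put((conn_id, seq, handler(request)))

    workers = [threading.Thread(target=io_worker) for _ in range(num_io_threads)]
    for worker in workers:
        worker.start()
    for _ in workers:
        request_queue.put(None)
    for worker in workers:
        worker.join()

    # Processor role again: drain the response queue and route each response
    # back to the connection it belongs to, in request order.
    collected = {conn_id: [] for conn_id in connections}
    while not response_queue.empty():
        conn_id, seq, response = response_queue.get()
        collected[conn_id].append((seq, response))
    return {conn_id: [resp for _, resp in sorted(items)] for conn_id, items in collected.items()}
```

For example, run_broker_pipeline({"client-1": ["a", "b"], "client-2": ["c"]}, str.upper) returns {"client-1": ["A", "B"], "client-2": ["C"]}. The shared request queue is what lets any IO thread pick up any client's request.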
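A simplified Python sketch of how the acks setting changes when a produce request is answered, including parking acks=all requests in purgatory until every in-sync follower has fetched the data. The class and method names are hypothetical, and the sketch deliberately ignores request timeouts, min.insync.replicas and ISR changes.

```python
class PartitionLeader:
    """Toy partition leader: a message log plus a purgatory for acks=all."""

    def __init__(self, follower_ids):
        self.log = []
        # Log end offset each in-sync follower has fetched up to.
        self.follower_end = {broker_id: 0 for broker_id in follower_ids}
        self.purgatory = []  # (required_end_offset, callback) pairs

    def high_watermark(self):
        # Messages below this offset are on the leader and every in-sync follower.
        return min([len(self.log), *self.follower_end.values()])

    def produce(self, message, acks, callback):
        self.log.append(message)
        end_offset = len(self.log)
        if acks == 0:
            return  # no response: the producer is not waiting for one
        if acks == 1:
            callback("written to leader")
        elif acks == "all":
            self.purgatory.append((end_offset, callback))
            self._complete_delayed()
        else:
            raise ValueError(f"unsupported acks value: {acks!r}")

    def follower_fetch(self, broker_id, fetched_end_offset):
        self.follower_end[broker_id] = fetched_end_offset
        self._complete_delayed()

    def _complete_delayed(self):
        # Answer every parked acks=all request whose data is now fully replicated.
        hw = self.high_watermark()
        waiting = []
        for end_offset, callback in self.purgatory:
            if hw >= end_offset:
                callback("committed on all in-sync replicas")
            else:
                waiting.append((end_offset, callback))
        self.purgatory = waiting
```

An acks=all request stays in purgatory after only one of two followers has fetched it, and is answered as soon as the second one catches up.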
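A minimal in-memory Python sketch of a consumer poll loop with offset commits, assuming a single partition held in a list. The function consume and its max_records parameter are illustrative only (max_records loosely mirrors the max.poll.records consumer setting); nothing here calls a real Kafka client.

```python
def consume(partition_log, committed_offset, process, max_records=2):
    """Poll batches starting at the committed offset until caught up.

    Returns the new committed offset. By convention it is the offset of the
    next message to read, i.e. last processed offset + 1.
    """
    offset = committed_offset
    while offset < len(partition_log):
        batch = partition_log[offset:offset + max_records]  # stands in for one poll()
        for message in batch:
            process(message)
        offset += len(batch)  # "commit" after the whole batch is processed
    return offset
```

Restarting with a committed offset of 3 resumes at the fourth message instead of reprocessing the whole partition, which is what committing offsets buys.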
Partition directory structure
- Each partition directory contains one or more files, for example:
  00000000000000000000.index
  00000000000000000000.log
  00000000000000000000.timeindex
  00000000000000099668.index
  00000000000000099668.log
  00000000000000099668.timeindex
  00000000000000199326.index
  00000000000000199326.log
  00000000000000199326.timeindex
  00000000000000298981.index
  00000000000000298981.log
  00000000000000298981.snapshot
  00000000000000298981.timeindex
  00000000000000398649.index
  00000000000000398649.log
  00000000000000398649.snapshot
  00000000000000398649.timeindex
  leader-epoch-checkpoint
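- Log (.log) files contain the actual messages
- Index (.index) files map offsets to positions inside the .log files; time
index (.timeindex) files map timestamps to offsets
- .snapshot files hold snapshots of producer state
- The leader-epoch-checkpoint file records leader epochs and their start
offsets; replicas use it to handle leader failures
- Each segment's files are named after its starting (base) offset, zero-padded
to 20 digits, e.g. 00000000000000199326.log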
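Because segment files are named after their base offset, the segment that holds a given offset can be found with a binary search over those names. A minimal Python sketch using file names from the listing above; base_offset and segment_for_offset are hypothetical helpers, and the sketch stops at choosing the segment (the broker would then consult that segment's .index file).

```python
import bisect
from pathlib import PurePath


def base_offset(filename):
    """Base offset encoded in a segment file name, e.g. 00000000000000199326.log -> 199326."""
    return int(PurePath(filename).stem)


def segment_for_offset(filenames, offset):
    """Name of the .log segment that would hold `offset`.

    Picks the segment with the largest base offset <= offset. Other files in
    the directory (.index, .timeindex, .snapshot, leader-epoch-checkpoint)
    are ignored.
    """
    bases = sorted(base_offset(name) for name in filenames if name.endswith(".log"))
    i = bisect.bisect_right(bases, offset) - 1
    if i < 0:
        raise ValueError(f"offset {offset} is before the first segment")
    return f"{bases[i]:020d}.log"
```

Offset 250000, for instance, falls between base offsets 199326 and 298981, so it lives in 00000000000000199326.log.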
Misc commands
- Find the Kafka controller (the broker responsible for electing partition
leaders) by opening the ZooKeeper shell and running get /controller:
zookeeper-shell.sh zoo01:2181/kafka
get /controller
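In ZooKeeper-based clusters the /controller znode holds a small JSON document naming the controller's broker id, for example {"version":1,"brokerid":2,"timestamp":"1700000000000"} (illustrative values; the exact fields vary between Kafka versions, so treat the format as an assumption). The output can be captured by passing the command on the command line, e.g. zookeeper-shell.sh zoo01:2181/kafka get /controller. A hypothetical Python helper that pulls the broker id out of that output, skipping the connection and watcher lines the shell also prints:

```python
import json


def controller_broker_id(shell_output):
    """Extract the controller's broker id from `get /controller` output.

    Assumes the znode data is a JSON object with a "brokerid" field; any
    other lines the shell prints are skipped.
    """
    for line in shell_output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and "brokerid" in data:
            return int(data["brokerid"])
    raise ValueError("no controller znode data found")
```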