Prefetch Technologies // Keeping your cache lines cozy

Kafka Notes

  • Kafka brokers maintain four types of logs:
      • server.log contains Kafka errors
      • zookeeper.out contains ZooKeeper errors
      • controller.log contains information on controller election
      • state-change.log contains cluster state change information
  • Each partition has a leader replica and one or more follower replicas
  • Kafka followers use fetch requests to retrieve messages from the partition leader
  • If a replica doesn't receive messages from the leader within 10 seconds, it becomes out of sync
  • A replica checks in with ZooKeeper every 6 seconds; if it can't check in, it's considered not in sync (both intervals are configurable broker settings)
  • Three main types of requests:
      • produce request
      • fetch request
      • metadata request
  • The acceptor thread accepts connections from clients to the broker
  • Processor threads take requests from clients and place them into a request queue
  • IO threads process the requests in the request queue
  • The request queue contains the requests waiting to be processed
  • The response queue contains the responses waiting to be sent back to the client
  • Produce and fetch requests have to go through the leader
  • Metadata requests are sent to find out cluster topology info
  • If acks=all, the produce request is held in purgatory until the message is committed to all in-sync replicas
  • Reliability guarantees:
      • acks=0 - no delivery guarantee
      • acks=1 - committed on the leader
      • acks=all - committed on all in-sync replicas
  • Committed offsets are sent to let Kafka know the consumer has processed messages up to that offset
  • A Kafka consumer polls a broker periodically; the poll interval is set in the consumer
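
The in-sync rule and the three acks levels above can be sketched as a toy model. This is not Kafka's implementation: the Replica class, its field names, and both helper functions are hypothetical, and the 10-second lag limit is simply the figure from these notes.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    broker_id: int
    log_end_offset: int       # offset the replica would write next
    last_caught_up_ms: int    # last time the replica was caught up with the leader


def in_sync_replicas(replicas, now_ms, max_lag_ms=10_000):
    """Replicas that were caught up within the last max_lag_ms milliseconds."""
    return [r for r in replicas if now_ms - r.last_caught_up_ms <= max_lag_ms]


def produce_acknowledged(acks, leader, isr, offset):
    """Would a produce request for the message at `offset` be answered yet?"""
    if acks == "0":
        return True                                    # producer never waits
    if acks == "1":
        return leader.log_end_offset > offset          # written on the leader
    if acks == "all":
        # Every in-sync replica must hold the message.
        return all(r.log_end_offset > offset for r in isr)
    raise ValueError(f"unknown acks value: {acks!r}")


# Made-up cluster state: broker 3 last caught up 15 seconds ago.
leader = Replica(1, log_end_offset=101, last_caught_up_ms=20_000)
follower_a = Replica(2, log_end_offset=100, last_caught_up_ms=19_000)
follower_b = Replica(3, log_end_offset=90, last_caught_up_ms=5_000)

isr = in_sync_replicas([leader, follower_a, follower_b], now_ms=20_000)
print([r.broker_id for r in isr])                      # [1, 2]
print(produce_acknowledged("1", leader, isr, 100))     # True
print(produce_acknowledged("all", leader, isr, 100))   # False, still in purgatory
```

Broker 3 has dropped out of the in-sync set, so acks=all only waits for broker 2 to fetch offset 100 before the request leaves purgatory.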

Partition directory structure

  • Each partition directory will contain one or more files:
00000000000000000000.index
00000000000000000000.log
00000000000000000000.timeindex
00000000000000099668.index
00000000000000099668.log
00000000000000099668.timeindex
00000000000000199326.index
00000000000000199326.log
00000000000000199326.timeindex
00000000000000298981.index
00000000000000298981.log
00000000000000298981.snapshot
00000000000000298981.timeindex
00000000000000398649.index
00000000000000398649.log
00000000000000398649.snapshot
00000000000000398649.timeindex
leader-epoch-checkpoint
  • Log files contain the actual messages
  • The checkpoint file is used to handle leader failures between replicas
  • Index files map offsets to locations inside the logs
  • The starting offset is part of the index and log file names, e.g. 00000000000000199326.log
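
Because each file name carries the segment's starting offset, finding the segment that holds a given offset is a search over the sorted base offsets. A minimal sketch, using the .log names from the listing above; the function names are hypothetical and this is not how the broker itself is coded:

```python
from bisect import bisect_right


def base_offset(filename: str) -> int:
    """Return the starting offset encoded in a segment file name."""
    stem, _, _ = filename.partition(".")
    return int(stem)


def segment_for(offset: int, log_files: list[str]) -> str:
    """Return the .log file whose segment contains `offset`."""
    bases = sorted(base_offset(f) for f in log_files)
    # Index of the last base offset that is <= the requested offset.
    i = bisect_right(bases, offset) - 1
    if i < 0:
        raise ValueError(f"offset {offset} is before the first segment")
    return f"{bases[i]:020d}.log"


LOG_FILES = [
    "00000000000000000000.log",
    "00000000000000099668.log",
    "00000000000000199326.log",
    "00000000000000298981.log",
    "00000000000000398649.log",
]

print(segment_for(250000, LOG_FILES))   # 00000000000000199326.log
```

The matching .index file would then be used to jump to the right position inside that segment.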

Misc commands

  • Get the Kafka controller, which is responsible for electing partition leaders:
zookeeper-shell.sh zoo01:2181/kafka
get /controller
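
As far as I know, on ZooKeeper-based clusters the /controller znode holds a small JSON document with a brokerid field. Assuming that format, the broker ID can be pulled out of the output like this. The sample payload is made up for illustration, and the function name is hypothetical.

```python
import json


def controller_broker_id(znode_payload: str) -> int:
    """Extract the controller's broker ID from the /controller znode data."""
    return int(json.loads(znode_payload)["brokerid"])


# Illustrative payload only; the values are invented, not real output.
sample = '{"version":1,"brokerid":2,"timestamp":"1526306139771"}'
print(controller_broker_id(sample))   # 2
```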