- Paperback: 322 pages
- Publisher: O'Reilly Media, Inc, USA; 1 edition (10 October 2017)
- Language: English
- ISBN-10: 1491936169
- ISBN-13: 978-1491936160
- ASIN: 1491936169
- Product Dimensions: 17.8 x 1.7 x 23.3 cm
- Boxed-product Weight: 458 g
- Amazon Bestsellers Rank: 37,460 in Books (See Top 100 in Books)
Kafka - The Definitive Guide Paperback – 10 Oct 2017
About the Author
Neha Narkhede is co-founder and CTO at Confluent, a company backing the popular Apache Kafka messaging system. Prior to founding Confluent, Neha led streams infrastructure at LinkedIn, where she was responsible for LinkedIn’s streaming infrastructure built on top of Apache Kafka and Apache Samza. She is one of the initial authors of Apache Kafka and a committer and PMC member on the project.
Gwen Shapira is a system architect at Confluent helping customers achieve success with their Apache Kafka implementation. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. She currently specializes in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle ACE Director, an author of "Hadoop Application Architectures", and a frequent presenter at data-driven conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
Todd Palino is a Staff Site Reliability Engineer at LinkedIn, tasked with keeping the largest deployment of Apache Kafka, Zookeeper, and Samza fed and watered. He is responsible for architecture, day-to-day operations, and tools development, including the creation of an advanced monitoring and notification system. Todd is the developer of the open source project Burrow, a Kafka consumer monitoring tool, and can be found sharing his experience on Apache Kafka at industry conferences and tech talks. Todd has spent over 20 years in the technology industry running infrastructure services, most recently as a Systems Engineer at Verisign, developing service management automation for DNS, networking, and hardware management, as well as managing hardware and software standards across the company.
From the Publisher
Replication of partitions in a cluster
From the Preface
Who Should Read This Book
Kafka: The Definitive Guide was written for software engineers who develop applications that use Kafka’s APIs and for production engineers (also called SREs, devops, or sysadmins) who install, configure, tune, and monitor Kafka in production. We also wrote the book with data architects and data engineers in mind: those responsible for designing and building an organization’s entire data infrastructure. Some of the chapters, especially chapters 3, 4, and 11, are geared toward Java developers. Those chapters assume that the reader is familiar with the basics of the Java programming language, including topics such as exception handling and concurrency. Other chapters, especially chapters 2, 8, 9, and 10, assume the reader has some experience running Linux and some familiarity with storage and network configuration in Linux. The rest of the book discusses Kafka and software architectures in more general terms and does not assume special knowledge.
Another category of people who may find this book interesting are the managers and architects who don’t work directly with Kafka but work with the people who do. It is just as important that they understand the guarantees that Kafka provides and the trade-offs that their employees and coworkers will need to make while building Kafka-based systems. The book can provide ammunition to managers who would like to get their staff trained in Apache Kafka or ensure that their teams know what they need to know.
Most helpful customer reviews on Amazon.com
It details many configuration parameters that affect clustering, replication, message delivery.
It also offers valuable material for system administrators who need to manage and monitor a running cluster.
It's not very good for programmers: Java API coverage is partial and inadequate.
The chapters are uncoordinated and poorly integrated with some repeated material.
Many errors in the code samples and the text.
Chapters 3, 4, and 11 discuss the programming API.
The first two deal with message producers and consumers, present the Java API for publishing and consuming messages, and discuss delivery semantics. They also detail configuration options that can be used to customize message producers and consumers. Security and access control are mentioned but never really discussed. There is not a single complete, runnable program, only several snippets full of errors.
Chapter 11 offers a tutorial introduction to stream processing: what it is and what problems it solves. Three code examples illustrate Kafka Streams, the stream-processing framework that comes with Kafka and provides a high-level abstraction for manipulating data streams. The chapter gives you a taste of what you can do with Kafka Streams but doesn't do much to teach how to use it.
Chapter 2 gives a tutorial on Kafka installation and discusses several configuration options that may help in tuning a Kafka cluster. Basic ZooKeeper knowledge helps in following along.
Chapter 5 delves into the internals of replication, partitions, request processing, and message storage on physical files.
Chapter 6 discusses data delivery guarantees. It revisits producer and consumer issues related to message delivery, and how to configure brokers and topics. It also explains how "at least once" delivery is easily achievable while "exactly once" delivery is not.
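To make the "at least once" point concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Kafka client API and not code from the book: `run_consumer`, `replay_with_crash`, and the crash point are hypothetical names chosen for illustration. It assumes a consumer that commits its offset only after processing a record, and shows that a crash between processing and committing causes a record to be processed twice on restart. The keyed write at the end is one common mitigation (an idempotent sink), added here as an assumption rather than something the review describes.

```python
def run_consumer(log, start_offset, write, crash_after_processing=None):
    """Process records from start_offset, committing only after each write.

    Returns the last committed offset. If crash_after_processing is set, the
    consumer stops right after writing that offset, before committing it.
    """
    committed = start_offset
    for offset in range(start_offset, len(log)):
        write(offset, log[offset])            # side effect, e.g. a database write
        if offset == crash_after_processing:
            return committed                  # simulated crash: commit never happens
        committed = offset + 1                # commit after successful processing
    return committed


def replay_with_crash(log, write):
    """Crash once after processing offset 1, then restart from the last commit."""
    committed = run_consumer(log, 0, write, crash_after_processing=1)
    return run_consumer(log, committed, write)


log = ["a", "b", "c", "d"]

# Append-only sink: the record at offset 1 is written twice (at least once).
appended = []
replay_with_crash(log, lambda offset, value: appended.append(value))

# Idempotent sink keyed by offset: the duplicate write overwrites itself.
keyed = {}
def keyed_write(offset, value):
    keyed[offset] = value
final_offset = replay_with_crash(log, keyed_write)

print(appended)
print([keyed[k] for k in sorted(keyed)])
```

Nothing is lost in either run, but only the keyed sink ends up with each record exactly once, which hints at why "exactly once" needs cooperation from the destination system rather than from the consumer alone.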
Chapter 7 briefly explores the Kafka Connect architecture: an alternative to the producer/consumer APIs for exchanging data between Kafka and another data storage system.
Chapters 8-10 have more sysadmin-oriented content.
Chapter 8 explores cross-cluster data mirroring: why you may need it, the alternative architectures and models available, and the issues of lost or duplicated data you may come across. It also introduces Kafka's own cluster mirroring tool, MirrorMaker, along with its configuration and tuning.
Chapter 9 covers command line tools to create and manage topics and partitions.
Chapter 10 is on monitoring a Kafka cluster and explores JMX metrics exposed by brokers, producers and consumers that can help in monitoring and detecting problems. Basic JMX knowledge is required to follow along.
Look for similar items by category
- Books > Computers & Internet > Databases & Big Data > Data Mining
- Books > Computers & Internet > Databases & Big Data > Data Modelling & Design
- Books > Computers & Internet > Databases & Big Data > Data Processing
- Books > Computers & Internet > Databases & Big Data > Introduction to Databases
- Books > Textbooks & Study Guides > Textbooks > Computer Science > Database Storage & Design