A technical book about popular space-efficient data structures and fast algorithms that are extremely useful in modern Big Data applications.
Probabilistic data structures is a common name for data structures based mostly on different hashing techniques. Unlike regular (or deterministic) data structures, they always provide approximate answers, but with reliable ways to estimate the possible error. Fortunately, the potential losses and errors are fully compensated for by extremely low memory requirements, constant query time, and scalability, three factors that become essential in Big Data applications.
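To make the trade-off concrete, here is a minimal sketch of a Bloom filter, a classic hashing-based structure for membership querying. It is only an illustration and is not taken from the book or from the PDSA library: the class name BloomFilter, the parameters num_bits and num_hashes, and the use of salted SHA-256 to derive bit positions are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a fixed-size bit array plus k salted hashes.

    Illustrative sketch only; names and hashing scheme are assumptions.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Memory use is fixed up front and does not grow with the data.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter(num_bits=1024, num_hashes=3)
for word in ("apple", "banana", "cherry"):
    bf.add(word)

print("banana" in bf)  # added items are always reported as present
print("durian" in bf)  # usually False, but a false positive is possible
```

The filter never stores the items themselves, only 1024 bits, and each query touches a fixed number of positions. The price is a small chance of false positives, which can be estimated in advance: for m bits, k hash functions, and n inserted items, the false positive probability is approximately (1 - e^(-kn/m))^k.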
About the book
The purpose of this book is to introduce technology practitioners, including software architects and developers, as well as technology decision makers, to probabilistic data structures and algorithms.
While it is impossible to cover all the existing amazing solutions, this book aims to highlight their common ideas and important areas of application, including membership querying, counting, stream mining, and similarity estimation.
This is not a book for scientists only, but to get the most out of it you will need basic mathematical knowledge and an understanding of the general theory of data structures and algorithms.
What you will learn
Reading the book, you will get a theoretical and practical understanding of probabilistic data structures and learn about their common uses.
- Learn how to solve practical issues of massive data handling
- Master the theoretical aspects of probabilistic data structures
- Identify the right data structures for your particular problems
This book consists of six chapters, each preceded by an introduction and followed by a brief summary and a bibliography for further reading related to that chapter. Every chapter is dedicated to one particular problem in Big Data applications: it starts with an in-depth explanation of the problem and then introduces data structures and algorithms that can be used to solve it efficiently.
This book on the Web
You can find errata, examples, and additional information at pdsa.gakhov.com. If you have a comment or a technical question about the book, would like to report an error you found, or have any other issue, send an email to email@example.com.
If you are also interested in a Cython implementation of many of the data structures and algorithms from this book, please check out our free and open-source Python library, PDSA, at https://github.com/gakhov/pdsa. Everybody is welcome to contribute at any time.