- Paperback: 320 pages
- Publisher: O'Reilly Media, Inc, USA; 1 edition (6 June 2017)
- Language: English
- ISBN-10: 1491952962
- ISBN-13: 978-1491952962
- Product Dimensions: 17.1 x 1.3 x 22.9 cm
- Boxed-product Weight: 522 g
- Average Customer Review: 1 customer review
- Amazon Bestsellers Rank: 5,154 in Books
Practical Statistics for Data Scientists Paperback – 6 Jun 2017
About the Author
Peter Bruce founded and grew the Institute for Statistics Education at Statistics.com, which now offers about 100 courses in statistics, roughly a third of which are aimed at the data scientist. In recruiting top authors as instructors and forging a marketing strategy to reach professional data scientists, Peter has developed both a broad view of the target market, and his own expertise to reach it.
Andrew Bruce has over 30 years of experience in statistics and data science in academia, government and business. He has a Ph.D. in statistics from the University of Washington and has published numerous papers in refereed journals. He has developed statistical-based solutions to a wide range of problems faced by a variety of industries, from established financial firms to internet startups, and offers a deep understanding of the practice of data science.
From the Publisher
About this Book
Data science is a fusion of multiple disciplines, including statistics, computer science, information technology and domain-specific fields. As a result, several different terms could be used to reference a given concept. Key terms and their synonyms will be highlighted throughout the book in a sidebar within the text.
This book is aimed at the data scientist with some familiarity with the R programming language, and with some prior (perhaps spotty or ephemeral) exposure to statistics. Both of us came to the world of data science from the world of statistics, and have some appreciation of the contribution that statistics can make to the art of data science. At the same time, we are well aware of the limitations of traditional statistics instruction: statistics as a discipline is a century and a half old, and most statistics textbooks and courses are laden with the momentum and inertia worthy of an ocean liner.
Two goals underlie this book:
- To lay out, in digestible, navigable and easily referenced form, key concepts from statistics that are relevant to data science.
- To explain which concepts are important and useful from a data science perspective, which are less so, and why.
Most helpful customer reviews on Amazon.com
* Decent review of core concepts
* Good coverage of importance of distinguishing between sample and population statistics
* Better discussion of bootstrapping than I've seen anywhere else
* Good ideas on dealing with non-normal data and avoiding the assumption that all data is normally distributed
* Assumes that you know R. Lots of code, no explanations of the code.
* Inconsistent level of detail and depth. Detailed coverage of mean, range, quartile, but rampant hand-waving when you get to bagging and boosting
* Many of the math explanations are unclear or incomplete. The authors make you do a lot of work to figure things out and you will need external resources
* The last part of the book is a thin and purely practical survey of ML models. You don't get much understanding of how or why things work.
It is true that the textbook does not provide in-depth coverage for all topics, but I don't think that was the intent of the authors. However, the text DOES provide an excellent introduction to topics relevant to students and data scientists. After reading the text and working through the examples, you will be equipped to further your knowledge in whichever topic you require for your data analysis task.
I found this book a very engaging read: it sets itself apart from other books on statistics by clearly stating which concepts are not so relevant to the modern, computerized, exploratory analysis toolset. Many concepts presented in classic books on the subject are rooted in the 1920s and 1930s, when computing power wasn't available and researchers resorted to various pre-calculated distributions and formulas to do their work. A modern data scientist's approach would eschew some of the old ways and instead rely on randomization, resampling and computing power.
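As a minimal sketch of the resampling idea (not taken from the book, whose examples are in R), a percentile bootstrap confidence interval for a mean can be computed in Python with only the standard library. The function name `bootstrap_ci`, its parameters, and the sample values are hypothetical, chosen purely for illustration.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Draw n values with replacement, recompute the statistic, and repeat.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Read the interval endpoints off the sorted resampled estimates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up measurements, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
low, high = bootstrap_ci(sample)
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

No distribution table or closed-form formula is involved here: the interval comes entirely from repeatedly resampling the observed data, which is the trade of computing power for derivation that the paragraph above describes.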
This book not only tells what something is, but also why it is that way and whether a concept is still relevant today.
I can recommend this book if your statistics knowledge is spotty or ephemeral; it serves its purpose well and doesn't bog down the reader with (sometimes) unnecessary mathematical concepts to demonstrate an idea.
Why the four stars:
1. Lack of examples in programming languages.
2. Complete lack of exercises (at least 1-2 exercises are necessary).
3. All scarce examples that are available are in R. No Python. :(
I dislike that the authors make a number of categorical statements of the form "Data Scientists do this" or "Data Scientists don't need that". I disagree with many of these assertions and I think they have taken a definition of "data science" which is narrower than the prevailing consensus in the industry.
This book has some errors (see, for example, the confusion matrix on page 196) but overall the accuracy is above average relative to recent norms.
As other reviewers have noted, the authors' GitHub repository for the book is currently empty. If that's important to you, check it under "andrewgbruce" on GitHub and make sure it's been updated before you buy the book.
Look for similar items by category
- Books > Computers & Internet > Databases & Big Data > Data Processing
- Books > Computers & Internet > Databases & Big Data > Data Warehousing
- Books > Computers & Internet > Databases & Big Data > Introduction to Databases
- Books > Computers & Internet > Programming > Software Design, Testing & Engineering > Testing
- Books > Computers & Internet > Software > Mathematical & Statistical
- Books > Science, Nature & Maths > Mathematics > Applied > Probability & Statistics
- Books > Textbooks & Study Guides > Textbooks > Computer Science > Database Storage & Design
- Books > Textbooks & Study Guides > Textbooks > Science & Mathematics > Mathematics > Statistics