Data Science from Scratch

by Joel Grus
464 Pages · 2015 · 7.1 MB · 1,527 Downloads · New!
Data-Intensive Text Processing with MapReduce
by Jimmy Lin and Chris Dyer
178 Pages · 2010 · 1.7 MB · 4,927 Downloads · New!
Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever.
Hadoop: The Definitive Guide, 2nd Edition
by Tom White
625 Pages · 2010 · 6.5 MB · 4,700 Downloads · New!
Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework – an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters.
Pro Hadoop
by Jason Venner
440 Pages · 2009 · 6.9 MB · 1,521 Downloads · New!
You’ve heard the hype about Hadoop: it runs petabyte-scale data mining tasks insanely fast, it runs gigantic tasks on clouds for absurdly cheap, it’s been heavily committed to by tech giants like IBM, Yahoo!, and the Apache Project, and it’s completely open-source. But what exactly is it, and more importantly, how do you even get a Hadoop cluster up and running?
Hadoop Operations
by Eric Sammer
298 Pages · 2012 · 3.5 MB · 2,696 Downloads · New!
If you’ve been asked to maintain large and complex Hadoop clusters, this book is a must. Demand for operations-specific material has skyrocketed now that Hadoop is becoming the de facto standard for truly large-scale data processing in the data center. Eric Sammer, Principal Solution Architect at Cloudera, shows you the particulars of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance.
Hadoop: The Definitive Guide, 3rd Edition
by Tom White
630 Pages · 2012 · 6.3 MB · 3,597 Downloads · New!
With this digital Early Release edition of Hadoop: The Definitive Guide, you get the entire book bundle in its earliest form – the author’s raw and unedited content – so you can take advantage of this content long before the book’s official release. You’ll also receive updates when significant changes are made. Ready to unleash the power of your massive dataset? With the latest edition of this comprehensive resource, you’ll learn how to use Apache Hadoop to build and maintain reliable, scalable, distributed systems. It’s ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.
MapReduce Design Patterns
by Donald Miner and Adam Shook
252 Pages · 2012 · 5.4 MB · 1,413 Downloads · New!
Until now, design patterns for the MapReduce framework have been scattered among various research papers, blogs, and books. This handy guide brings together a unique collection of valuable MapReduce patterns that will save you time and effort regardless of the domain, language, or development framework you’re using.