Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever.
Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework – an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters.
You’ve heard the hype about Hadoop: it runs petabyte – scale data mining tasks insanely fast, it runs gigantic tasks on clouds for absurdly cheap, it’s been heavily committed to by tech giants like IBM, Yahoo!, and the Apache Project, and it’s completely open-source. But what exactly is it, and more importantly, how do you even get a Hadoop cluster up and running?
If you’ve been asked to maintain large and complex Hadoop clusters, this book is a must. Demand for operations-specific material has skyrocketed now that Hadoop is becoming the de facto standard for truly large-scale data processing in the data center. Eric Sammer, Principal Solution Architect at Cloudera, shows you the particulars of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance.
With this digital Early Release edition of Hadoop: The Definitive Guide, you get the entire book bundle in its earliest form – the author’s raw and unedited content – so you can take advantage of this content long before the book’s official release. You’ll also receive updates when significant changes are made. Ready to unleash the power of your massive dataset? With the latest edition of this comprehensive resource, you’ll learn how to use Apache Hadoop to build and maintain reliable, scalable, distributed systems. It’s ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.
Until now, design patterns for the MapReduce framework have been scattered among various research papers, blogs, and books. This handy guide brings together a unique collection of valuable MapReduce patterns that will save you time and effort regardless of the domain, language, or development framework you’re using.