Apache Hadoop is an open-source framework for processing very large datasets with the MapReduce programming model. It distributes workloads across the nodes of a cluster for parallel processing, and it uses the Hadoop Distributed File System (HDFS) to provide high aggregate bandwidth across the machines in the cluster. In this series, Frank La Vigne takes a deep dive into Hadoop and its ecosystem and demonstrates how to run these tools locally or in Azure HDInsight clusters to make short work of big data.
Course Title | Author | Duration (hh:mm:ss) | Topic(s) |
---|---|---|---|
Introducing Hadoop | Frank La Vigne | 00:23:27 | Data Science, Hadoop, Azure, Big Data |
Processing Big Data with MapReduce | Frank La Vigne | 01:07:23 | Data Science, Azure, Hadoop, Big Data |
Using Hive to Query Hadoop | Frank La Vigne | 00:57:31 | Data Science, Azure, Hadoop, Big Data, Hive |
Using Pig with Hadoop | Frank La Vigne | 00:50:13 | Data Science, Pig, Hadoop, Big Data |
Using HBase | Frank La Vigne | 00:50:35 | Data Science, HBase, Hadoop, Big Data |
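
For readers new to the MapReduce programming model named in the introduction, the following is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to a word count. It is not Hadoop code and does not use any Hadoop API; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, chosen only for illustration. On a real cluster the framework would run the map and reduce steps in parallel across nodes and handle the shuffle itself.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, mimicking the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence in a single process."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; a real job would read its input from HDFS.
    sample = ["big data big clusters", "data in clusters"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own input line, and each call to `reduce_phase` depends only on its own key, both stages can be spread across many machines, which is the property the courses above build on.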