Apache Spark is a fast, in-memory data-processing engine with expressive development APIs that let data scientists run streaming workloads, build sophisticated machine-learning models, and perform other tasks common to extracting information from large datasets. Apache Spark for Azure HDInsight provides high-performance Spark clusters as a managed cloud service, and Azure's Data Science Virtual Machine is well suited to learning Spark and other tools such as Jupyter and Microsoft R Server. In this series, Microsoft data scientist Mark Tabladillo takes a deep dive into Apache Spark and the Spark ecosystem and demonstrates how to use the Spark support in Azure to handle big-data workloads and build sophisticated machine-learning models.
Course Title | Author | Duration (hh:mm:ss) | Topic(s) |
---|---|---|---|
Introduction to Spark | Mark Tabladillo | 00:58:49 | Data Science, Spark, Azure, Big Data |
Business Intelligence Tools and Spark | Mark Tabladillo | 01:02:01 | Data Science, Spark, Azure, Big Data |
Data Processing with Spark 2 | Mark Tabladillo | 00:48:21 | Data Science, Spark, Azure, Big Data |
Text Analytics with Spark ML | Mark Tabladillo | 01:05:57 | Data Science, Spark, Azure, Machine Learning, Big Data |
Regression with Spark ML | Mark Tabladillo | 01:06:39 | Data Science, Spark, Azure, Machine Learning, Big Data |
Classification with Spark ML | Mark Tabladillo | 01:24:13 | Data Science, Spark, Azure, Machine Learning, Big Data |
Clustering with Spark ML | Mark Tabladillo | 01:10:03 | Data Science, Spark, Azure, Machine Learning, Big Data |
Recommendation with Spark ML | Mark Tabladillo | 00:59:49 | Data Science, Spark, Azure, Machine Learning, Big Data |