LAST UPDATED: November 2020
Apache Spark is a next-generation batch and stream processing engine. It has been reported to run up to 100 times faster than Hadoop MapReduce for in-memory workloads, and it makes distributed big data applications much easier to develop. Its demand has skyrocketed in recent years, and having this technology on your resume is truly a game changer. Over 3,000 companies are using Spark in production right now and the list is growing very quickly! Some of the big names include Oracle, Hortonworks, Cisco, Verizon, Visa, Microsoft, and Amazon, as well as most of the big world banks and financial institutions!
In this course you’ll learn everything you need to know about using Apache Spark in your organization while using its latest and greatest Java Datasets API. Below are some of the things you’ll learn:
How to develop Spark Java applications using Spark SQL DataFrames
Understand how the Spark Standalone cluster works behind the scenes
How to use various transformations to slice and dice your data in Spark Java
How to marshal/unmarshal Java domain objects (POJOs) while working with Spark Datasets
How to master joins, filters, and aggregations, and ingest data of various sizes and file formats (TXT, CSV, JSON, etc.)
How to analyze over 18 million real-world comments on Reddit to find the top trending words
Specification: Master Apache Spark – Hands On!