PySpark helps you perform data analysis at scale and build more scalable analyses and pipelines. This course starts by introducing you to PySpark’s potential for performing effective analyses of large datasets. You’ll learn how to interact with Spark from Python and connect Jupyter to Spark to provide rich data visualizations. After that, you’ll delve into Spark’s various components and its architecture.
You’ll learn to work with Apache Spark and perform ML tasks more smoothly than before. You’ll gather and query data using Spark SQL to overcome the challenges involved in reading it. You’ll use the DataFrame API to work with Spark MLlib and learn about the Pipeline API. Finally, we provide tips and tricks for deploying your code and tuning its performance.
By the end of this course, you will not only be able to perform efficient data analytics but will also have learned to use PySpark to easily analyze large datasets at scale in your organization.
About the Author
Danny Meijer works as the Lead Data Engineer in the Netherlands for the Data and Analytics department of a leading sporting goods retailer. He is a Business Process Expert, a big data scientist, and a data engineer, which gives him a unique mix of skills, the foremost of which is his business-first approach to data science and data engineering.
Specification: Mastering Big Data Analytics with PySpark
Attribute | Value |
---|---|
Price | $9.99 |
Provider | |
Duration | 8 hours |
Year | 2020 |
Level | Intermediate |
Language | English |
Certificate | Yes |
Quizzes | Yes |