New! Updated for Spark 3, with more hands-on exercises and a stronger focus on DataFrames and Structured Streaming.
Big data analysis is a hot and highly valuable skill, and this course will teach you the hottest technology in big data: Apache Spark. Employers including Amazon, eBay, NASA JPL, and Yahoo all use Spark to quickly extract meaning from massive data sets across a fault-tolerant Hadoop cluster. You’ll learn those same techniques, using your own Windows system right at home. It’s easier than you might think.
Learn and master the art of framing data analysis problems as Spark problems through over 20 hands-on examples, and then scale them up to run on cloud computing services. You’ll be learning from an ex-engineer and senior manager from Amazon and IMDb.
Learn the concepts of Spark’s DataFrames and Resilient Distributed Datasets (RDDs)
Develop and run Spark jobs quickly using Python
Translate complex analysis problems into iterative or multi–stage Spark scripts
Scale up to larger data sets using Amazon’s Elastic MapReduce service
Understand how Hadoop YARN distributes Spark across computing clusters
Learn about other Spark technologies, like Spark SQL, Spark Streaming, and GraphX
By the end of this course, you’ll be running code that analyzes gigabytes worth of information in the cloud in a matter of minutes.
Instructor Details
Courses : 7
Specification: Taming Big Data with Apache Spark and Python – Hands On!
33 reviews for Taming Big Data with Apache Spark and Python – Hands On!
Matthew Price –
About three-quarters of the course had nothing to do with Spark 3 and was clearly running on Spark 1.x. Some of the older scripts (the ones that use the u.ITEM file) no longer run correctly, which is apparent because the recently tacked-on sections at the end read these files with additional parameters to address the issue. The course was good (particularly the one example on streaming), but because so much of it felt outdated, I cannot recommend it to my colleagues.
Jianyu You –
Explanation is clear.
Francisco Carlos de Lima Pereira –
Yes, so far…
Mihnea Stefan Tomos –
The curriculum is presented in a clear and concise manner and is comprehensive.
Uttam Kedia –
More practice problems required here
Keith –
Note that this only gets you started; it doesn’t teach you to an intermediate level. For that, you have to learn from experience or read a book in depth. To understand a book’s contents, you need some introductory framework to organize the knowledge, and this is where the course is useful. Frank gave a very good introduction to Spark, what its components are, how to set up a cluster, etc. Four-star rating because the final parts on Spark Streaming and MLlib felt very rushed. I would have appreciated a little more depth on these topics.
Sachin Sarathe –
I was expecting more assignments for us to work on. We can grasp the theory and concepts, all good, but it will all be in vain if we don’t get good hands-on practice by ourselves. I hope you can look into it. But you are a great teacher :):)
Derek Law –
There is some great information here, but the teaching style is mostly just telling. I would have liked a more experiential learning approach where the learner was more active and involved in the learning. I found myself just watching the videos to get a general idea of what Spark can do, and then going to other tutorials to apply it to my own work.
Bryna Zhao –
Great course, easy to understand and follow!
JC Ulat –
Not really macOS friendly; setup took a long time. For example, Python 3.7 is currently the latest version compatible with PySpark.
Eisha Budki –
Playback of videos was troublesome. Course content was good.
Qiuyu Gu –
If you want to learn Spark, it is a good course to help you get started.
Harsh Joshi –
Very nice introductory course on PySpark.
Marshall –
I would prefer more, smaller sized exercises throughout. But the content is good and delivered in a clear way.
Naveen Srinivasan –
Good way of explaining the concepts
Raul Gil De Sagredo Martin –
I feel that I have understood how Spark works quite deeply, and I feel confident working with the tools that were taught. On the improvement side, I’d say that working in a cluster needs more explanation and more examples that do not require paying. In any case, amazing course!
XIAOYA ZHANG –
It’s a very good crash course on Spark!!! I like the instructor’s way of explaining things.
Vladimir Ermilov –
It’s a very good course from a practical point of view, but it’s not about the basics of how Spark works. If you want theory before you start, read a book first for better understanding.
Khalid Hanif –
Very thorough instructor; I really like his way of teaching…
Sobhan Singh –
The video image was not clear.
Eric V –
First, please know that I completed the course, so this isn’t just a review of the first lecture or two. I did all 51 videos. OVERALL IMPRESSIONS: Very good course. Would recommend. Will definitely take more courses by Frank. IMPRESSION OF INSTRUCTOR: Clear, concise, patient, encouraging, careful, attentive. He has good theory of mind (meaning that he teaches with the student’s perspective in mind, which is surprisingly rare; not all instructors are good teachers, but Frank is). Especially in the beginning, he was attentive and deliberate and walked through the code step by step (some might consider it a little slow, but it was easy to skip forward). In most cases, Frank was good at explaining both the overall concepts and structure of the code as well as walking through each element of the code. In certain sections (see further below), things ramped up a little too fast or became a little hand-wavy (meaning that he didn’t walk through the code with care), but to Frank’s credit, he did indicate that the purpose of some sections was to grasp the overall concept of what Spark could do rather than to teach every component of a coding recipe. One problem I have with many (most?) instructors (but not with Frank) is that they tend to use lazy, sloppy, and non-specific naming conventions for various components, which ends up confusing the learner. (For example, they might choose to name a variable “value” when they already have a function called “values”, while employing a built-in method called “value”, while referencing arguments labeled “value”, and the whole time referring to various quantities as “value”.) I am so very thankful that Frank was deliberate, specific, and creative in his choice of nomenclature for various elements. It made learning a breeze. SUPPORT / ASSISTANCE: Excellent. My TA was Emad. He responded quickly, patiently, clearly, and in a friendly way. He really knew his stuff and even offered additional recommendations that were thoughtful and helpful.
I would not have been able to complete this course without his assistance. Thank you, Emad. OBSERVATIONS / ROOM FOR IMPROVEMENT: Again, I wholeheartedly recommend this course. That said, here are some things that could be improved. (1) In many Udemy courses I have taken, I have found that a lot of time is spent teaching the student processes and techniques that would probably not be used in a modern working environment. While this course wasn’t super bad about doing that, they sure did spend an awful lot of time talking about RDDs when they later confessed (near the end of the course) that the trend was toward Spark SQL-like structure and syntax. They even bemoaned how clumsy, verbose, and awkward the RDD approach was. It makes me a little resentful that they didn’t just lead with that. It feels like a decent chunk of the start of the course (dealing with RDDs) may not have been a great investment of my time and mana. (2) Portions of Sections 3 and 4 were a bit hand-wavy. To me, they felt more like demonstrations and less like instruction. In other words, I probably would not be able to reproduce (on my own) what was presented. That said, I was giddy and euphoric over Section 5, which dealt with SparkSQL, DataFrames and DataSets. It was easy for me to think of practical use cases for those methodologies, and I would be able to re-create those products on my own. Arguably my favorite section. (3) I believe Frank may have mentioned very early on that MLlib is not a part of Spark that is getting a lot of development attention (the implication being that there may be better tools out there for ML work). It would have been good for Frank to reiterate that in Section 7. That said, I found his presentation on Spark Streaming to be useful. I was not as keen on the presentation for GraphX.
Aritra Datta –
Great Learning Experience
Topler –
On a Mac, the resource file sometimes overrides the course when you open it. Not a real issue, but a tad tedious, as the lesson then starts from the beginning. I advise people to download the resource files first and then start the lesson.
Lucero Yanez –
Excellent course to learn and tame Spark with Python.
Carlos Eduardo Silva –
Good
Craig Woollett –
Pain setting up but worth it. The instructor has excellent diction
Nuha Alharbi –
I enjoyed this course a lot, and I’m totally new to this topic. It was so clear, and everything was explained very well.
Simeon Tsvetankov –
Learned a lot, the structure is great.
Abhijeet Sondkar –
good so far
Avijeet Dutta –
A very good course for all the Spark enthusiasts.
Anurag Mishra –
The content for DataFrame can be improved.
Tapasya Ghorpade –
The course content is very old. Most of the certifications are now based only on DataFrames and not on RDDs. 80% of the material covers RDDs only. There needs to be more SparkSQL content. Waste of money.
Joao Soares –
Needs to be updated to new installers and websites for JDK, Anaconda 3.8 and Spark 3.0.0