Apache Spark is the de facto standard for large-scale data processing. This is the first course in a series leading to the IBM Advanced Data Science Specialization. We strongly believe that learning a scalable data science platform early is crucial for success, since memory and CPU constraints are the most limiting factors when it comes to building advanced machine learning models. In this course we teach you the fundamentals of Apache Spark using Python and PySpark. We’ll introduce Apache Spark in the first two weeks and, in the last two weeks, learn how to apply it to basic exploratory and data pre-processing tasks. Through these exercises you’ll also be introduced to the most fundamental statistical measures and data visualization technologies. This gives you enough knowledge to take on the role of a data engineer in any modern environment, and it also gives you a basis for advancing your career towards data science. Please have a look at the full specialization curriculum: https://www.coursera.org/specializations/advanced-data-science-ibm If you choose to take this course and earn the Coursera course certificate, you will also earn an IBM digital badge. To find out more about IBM digital badges, follow the link ibm.biz/badging. After completing this …
Instructor Details
Courses : 5
Specification: Fundamentals of Scalable Data Science
55 reviews for Fundamentals of Scalable Data Science
FREE
Suyash –
There are a lot of glitches with the assignments; I hope they get fixed soon.
Tamer M –
Most of the videos’ subtitles need to be synced; it was hard to fully understand the Indian accent without subtitles.
Dhinson G D –
I love the course content. Simple but very informative and provides good practical exercises.
PV R K –
excellent experience
Eleni K –
I was really looking forward to this specialization, but from the very first course I am really disappointed. The videos refer to various pieces of outdated information, and then suddenly we are expected to do an assignment that was not explained at all in the course. I am not saying it is difficult or not achievable, but to be honest, until now (week 2) it feels mostly like a waste of time. Really sorry for this review.
Charles-Antoine d T –
very good
Octavio A T N –
This is a very good Data Science course. It helped me a lot to think in realistic application of Data Analysis. Impressive !!!
Saman S –
that’s wonderful
Prithvi S –
Great course, I really liked Romeo’s explanation and learned a lot.
Joshua A –
Gave excellent starting point for the course
Madan K –
Excellent course
Ivan J M –
There are a lot of outdated sections. Sometimes it confuses me, because in some videos he talks about how we will use Node-RED, but then we don’t use it.
Oritseweyinmi H A –
Strong introduction into parallel computing and big data processing. Romeo’s expertise on the subject matter, combined with his love for teaching was on show during this course. He did a great job explaining the theoretical aspects, and slowly but surely introducing us into the practical aspects as well, through the programming exercises. All in all, this has proved to be a high quality introduction into this space and I’m excited to take the next step, learn more and apply the fundamentals I have picked up here.
Tony H –
I felt that, for a course labelled as ‘Advanced’, there were too many trivial questions in the quizzes and too much hand holding in the programming assignments. That being said I did enjoy the course and learned quite a lot and look forward to the next one in the specialisation.
Ankur S –
One of the Best Courses available here on Coursera.
Gouri K –
Good overall, and the instructor was very good, but I feel more examples could be used, especially when explaining multidimensional vector spaces and similar basics of graphs.
praveen k –
First time I got the chance to work on cloud data (big data). Thanks to IBM.
Jorge S A M B –
Very good, up-to-date content and excellent teaching!
Muyanja S Z –
This has been an interesting and intellectually nourishing course
Muntakimur R –
Very Informative course, thanks IBM for this course
BAUDRY S –
The functions we need to complete look quite messy; it’s a little bit overwhelming, especially for people who are starting with Spark.
Bikash R –
PCA part was fun!
Tom V –
Nice course covering the basics. Not very difficult though.
Ahmad R J –
I liked the course because it introduced me to new topics, but it did not really go as far as expected from an advanced specialization. Maybe when I have finished the other courses, I will find that it prepared me well for the rest. However, please provide more sample datasets, similar questions, and generally more practice.
Gautham N –
It’s a very good course
Pierre-Matthieu P –
I’ve gained plenty of interesting information and valuable hands on experience. I had to work for it a little more than I should have, however. The lecturer has a strong accent, speaks very fast and the subtitles are mostly useless as they are wrong more often than not. If you take this course, be prepared to take plenty of notes and watch the videos several times.
Ankit M –
good
Lucas M –
It would be great if the video content were updated to reflect the current version of the system and of Python; in terms of theory, however, the content left nothing to be desired.
Francesco d C –
the assignments could have left more freedom to the student.
Leire A –
Low level
Marcos P L –
As an introductory course on data science and manipulation of large data sets, the course proved to be quite comprehensive and technically capable of leading the student to an understanding of all content.
Tee H L –
I really like this learning method from IBM especially the instant quiz just to make sure I understand the important points.
daniel b –
This class made me confident in using Apache Spark for data projects that I may need. I really enjoyed how simple and effective it was. Very practical, easy to follow, high level course. Cannot wait until the next course. You should probably have some experience with data frames and lambda expressions before coming into this class.
Giovani F M –
Great course to learn basic knowledge in spark!
Carlos F C d S e S –
This course changed my life!
Feng L –
too simple not advanced
Phuoc H L –
Love the content, simple and clear are the best.
Mike H –
Not well structured in my opinion. Difficulty of content not well balanced. Outdated presentations and content…
Xuan H N –
More coding please. One doesn’t learn much just by filling in a couple of words.
srinivasareddy c –
Simply put, the course has very different and amazing nuances of learning.
Jeffrey G D –
Some of the courses have out of date instructions, or the methods recommended are deprecated.
Adamya –
A very nice introduction to Apache Spark and its environment. As a bonus, it’s also a very nice refresher on your basic statistics!!! Great course!
George H –
Analytically very simple, and fails to explain much of the syntax needed for the assignments.
Arseniy T –
I want to put things into perspective: I recently completed a one-year data science course at Flatiron School which covered all aspects of data science: Python, SQL, data mining, statistics, probability, linear regression, classification, decision trees, deep neural networks and everything in between. You name it, I’ve studied it. If you want to learn data science, don’t take this course. A few videos about the central limit theorem plus several graphs in matplotlib won’t leave you confident enough about how to actually do analysis. Also, the assignments for this course were mostly about how to extract data with SQL, which is pretty easy if you know the basics. The entire course took me less than a day to complete, and I’m still confused about how Spark actually works under the hood. Some people complain about old videos and the thick accent of the teacher. For me that wasn’t a problem: the code ran smoothly and I understood everything the teacher said. My suggestion would be to give a more detailed explanation of cloud/parallel computing, how it’s structured, how to set up servers, etc.
Nicole Z K –
Outdated content, with corrections as annotations in the videos. Not very engaging and has just a little of spark content.
Jeremie B –
Nice introduction, not too difficult without being so easy that you learn nothing. The content is sometimes outdated, but I always found solutions quickly to make everything work. In fact, it is better to have realistic examples and to use up-to-date technologies, even if that is of course harder to maintain. Therefore my remark is not a complaint. Actually, Mr Kienzler does a good job of keeping things working and the learners informed.
Ulugbek D –
General intro to how to deal with large data using Apache Spark
Mohamed A T –
The course was great, the material and the assignments. IBM Watson platform was easy to use. But I can’t see how this course is included in the “advanced” data science specialization. Honestly I was expecting a more advanced course. But we’ll see with the next ones.
Nabin R P –
well explained with relevant examples
PRABAKARAN C –
Great course
Edi W –
Nicely arranged course. However, both assignments in week 4 should be rechecked to make sure that they can be run as exercises by students. Also, please make sure that the videos are up to date and contain fewer errors.
Kaiwalya –
The course content is amazing but the instructor’s accent is very difficult to understand and in some videos subtitles in English weren’t available.
Robert H –
Nice subjects; the notebooks could be more in-depth.