Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data.
The mission of The Johns Hopkins University is to educate its students and cultivate their capacity for life-long learning, to foster independent and original research, and to bring the benefits of discovery to the world.
Instructor Details
Courses : 8
Specification: Getting and Cleaning Data
53 reviews for Getting and Cleaning Data
FREE
Gustavo d P P –
Very good! I learned to use dplyr and now my life has become much easier!!!
Pablo d A S –
Very well thought out and more advanced than previous one.
Johnnery A –
The instructors should explain the topics in more detail. I think there is a significant gap between the course topics and the projects.
Luu T S –
Great !! Thank you
Heidi P –
Challenging!
Gustavo D S F –
excellent!
Ratanaporn –
Good job
Nishant R S –
Very deeply explained and very accurately organised.
Pitak P –
Good
Eduardo S B –
In my opinion the structure of the course is not the best. I mainly dislike the fact that some libraries, packages, etc. (e.g. MySQL) are not trivial to install. Still I learnt quite a lot, so I wouldn’t say it’s bad.
Thaer Z –
I am done with this course. Every week is the same thing. The lectures are a long list of references to other references. The quiz questions cannot be answered without spending hours troubleshooting RStudio or searching the forum for help and hints to find out why the loaded packages or functions are not found. The quiz recommends loading packages that don’t work or have dependencies that are no longer valid. I wanted to take this specialization to learn new data analysis techniques. If I wanted to spend my time searching the internet for answers, I could do that without paying monthly fees. Good luck, everyone. I am done. I will try a different course or field of interest.
Daniel P –
The course is good. I like the videos and the assignment. There is a certain redundancy of information. Much of the “new” information was already elaborated in the previous courses of the same specialization. Additionally, the grading system is based on other students whose knowledge may not go beyond the course scope, so submitting an innovative solution can mean not passing the course.
Aki T –
This course was excellent and fundamental in order to even start a data analysis. It sets the foundation for how to read and treat the data, which is, as the instructor mentioned, often overlooked. Thank you very much for taking the time to break the cleaning process into comprehensive pieces.
Amit S –
Very good content
Edward A S M –
More examples of code running in parallel with the course work will be helpful.
Jose C –
Great course!
Graciano P –
Great course for getting and cleaning data using R.
Fabien N –
The course was good but a bit too versatile for such hard quizzes.
Vivek G –
Challenging course for working professionals from time perspective but very educative and useful. I learned a lot.
Christian B –
No idea what they want for the project, and the discussion forum is clogged with people asking for peer reviews. The previous courses at least provided you with an understanding of what the final product should be. In this case it’s “make tidy data”, but with no idea of how that data should look.
Sean S –
Challenging but rewarding! Great community.
Patrick B –
Challenging assignment. Interesting material
Dev P –
Good introduction to getting and cleaning data and very useful learning about the principles of tidy data. Jeff Leek isn’t as good a tutor as Roger Peng and it was a bit frustrating following along at times as no hyperlinks are available for the data. The lessons are just recycled content from Jeff’s lectures. The course project was a good challenge!
Erin A –
This is my third course completed in the Data Science Specialization offered by Johns Hopkins. In all three, I feel the lectures, quizzes, and swirl exercises are easily accessible, and then the final project makes me feel like I am seeing R for the first time. One review of the course made a brilliant suggestion: go through the videos as quickly as you can, and then look at what will be asked of you in the final project. Then, go back through the videos and quizzes with a different set of eyes. I feel like there is just so much to learn with R that sometimes you need a lens to help you focus on a subset of things that you absolutely will need, while getting a “taste” for all that R has to offer. Overall, I am enjoying the courses, but the final projects are indeed a different kind of challenge.
Victor A d S P –
Great course, some background needed. If you are taking the Data Science specialization program, then this is a great catch.
Kunal P –
This was one of the best classes. I recommend more side reading material on data. swirl has a reading link, but the link is not provided anywhere else on the board. Also, it would be beneficial if the links could be made clickable in the lecture slides. Thanks.
Abdullah M –
Great content. Would have been better with more resources in addition to videos.
rahul g –
A very interesting and unique course, for the kind of things it helps to learn are often ignored. It brings the breadth of the theme forward.
James L J J –
The Value of the course is in the course projects. I found that working through them really accelerated the understanding.
Ricardo L –
The most important information I learned was the tidy data standard. Very useful and clear. It will make the analysis process easier.
Mathew K –
Pros: I learned a ton about cleaning data, the challenges involved, and how to tackle new problems. The quizzes and projects throw you into the deep end, asking you to import some data set and report some features of it, and you often need to figure out what package to use and how to work with it on your own. Cons: The videos in this course are basically useless. You get a superficial coverage of how to use some package without a lot of explanation of what each part does, and basically all of the examples are broken, because the data have been updated or the site has changed or no longer exists. The instructors very annoyingly bat away any responsibility in the forum by saying it would be too expensive to fix anything. Too expensive? This isn’t a Michael Bay movie, this is a guy talking over a PowerPoint.
Udom A –
Good
Manav S –
good
Prasad R B –
very nice
Bill J –
In weeks two and three, the course presents a list of data formats and how to read them into R. I would have preferred a better description of why tidy data sets are considered tidy, one that included some side-by-side comparisons and the downstream effects of untidy data. This would help me weigh the effort of tidying the data, and the risk of introducing errors while doing so, against the benefit of tidying it.
Sujeet S –
Too tough
Jehan H –
The course has useful information.
Amanyiraho R –
Best data cleaning technique.
Zhou c –
Well-organized course!
Sreemoyee M –
The challenging quizzes and assignments are a main reason why this course is so great! This course truly ensures that you truly understand and implement the videos.
Adam M –
The information in the lectures is very stale, which makes it extremely frustrating to learn from.
Nachiketas N –
Very useful and hands-on.
Willie C –
Not a great course. The lecture videos were dull and not very informative, and did not do a good job of preparing you for the quizzes at the end of each week. The lecture videos mentioned and linked to a number of external resources, but you couldn’t click on the links through the videos, so that wasn’t useful. The forums were much more helpful than the lecture videos when it came to teaching you what you needed to know. I understand why a course like this is essential to the Data Science specialization, but I feel like this content could’ve been covered in a much more engaging and instructive manner.
Mohamed D –
Very easy to follow.
uttam K –
Good course
Liam C –
Week 1 and 2 are completely worthless. They’re cursory 5–10m introductions to topics that show you HOW to start to do something, but don’t explain any commands or what is going on, it’s just instructions to follow. This leaves you completely unprepared to do any actual work. Then you get the assignments and you basically have to go learn everything independently. The course info is useless. I skipped these. When I want to do the type of work they cover, I’ll watch some tutorials and read documentation to actually learn it. They need to focus in on one or two topics (e.g. APIs, MySQL) and actually teach you the basics of them. The lecture videos even use weird syntax without explanation (e.g. using instead of <-. Using par(), etc.). Like the other courses in this specialization, you'll spend almost all of your time learning independently, and not using any of the materials provided. The discussion board is sometimes useful, but you can see how little work is done to improve the course there, as people point out errors and issues which are still outstanding months/years later.
Whitchurch S R M –
This was an awesome course. I really liked the final project, especially creating a codebook as well as tidying up the data. I feel I went too much in-depth into creating the codebook as well as the README file, but in hindsight it was totally worth it. My advice to future learners: push yourselves to the limit when doing the final project. You will definitely learn much, much more by putting 110% into these hard projects.
Yanying J –
It’s useful and practical! In particular, the swirl exercises offered not only the method, but also a clear example of data cleaning.
Neha P –
Good experience with excellent knowledge
Viktor K –
There is a big gap between class material and practical exercises.
Miguel C –
I really enjoyed this course and learned a lot in it. I feel a lot more comfortable with looking for and reading data. I learned how to clean data and get it ready for further analysis. I think the course project was particularly good for completely understanding the process of tidying data and all the aspects it involves, such as writing a code book and a README file to accompany it. Furthermore, I believe I further developed my R programming skills, by learning how to code new things, or things I already knew but in a more efficient way, by using new packages and techniques. Moreover, I found Professor Jeffrey Leek quite engaging and very easy to understand, and I had complete confidence in his knowledge of this subject. However, I believe the course is slightly outdated. I was often disheartened and frustrated by not being able to replicate what was being done in the lecture videos. For example, there were many links that did not work anymore and sometimes information that simply wasn’t correct anymore. I found the discussion forums and many mentors’ responses to be very helpful. I think this can easily be fixed by writing up an errata or updating the lecture videos.
Apurva G –
It’s extremely difficult to install the packages. Most of the time the videos are not clear about which packages to install. There should be a pre-read with links and instructions on which packages are needed to work on this course. Extremely frustrating, considering the majority of the time is wasted just trying to figure out how to install packages. If you are serious about the success of course takers, you have to make the course easier to understand, and the instructions have to be clear.
Rose G –
Everything was new for me in this course, so I loved learning so much