Course module: 201600038
Data analysis and visualisation
Course code: 201600038
ECTS Credits: 7.5
Category / Level: M (Master)
Course type: Course
Language of instruction: English
Offered by: Faculty of Social Sciences; M&S for Behavioural, Biomedical & Social Scien;
Contact person: dr. D.L. Oberski
E-mail: d.l.oberski@uu.nl
Lecturers
prof. dr. P.G.M. van der Heijden
dr. D.L. Oberski
Various teachers
Teaching period: 2 (13/11/2017 to 02/02/2018)
Teaching period in which the course begins: 2
Time slot: Not in use
Study mode: Full-time
Enrolment period: from 29/05/2017 up to and including 25/06/2017
Enrolling through OSIRIS: Yes
Enrolment open to students taking subsidiary courses: Yes
Pre-enrolment: No
Post-registration: Yes
Post-registration open: from 09/10/2017 up to and including 13/10/2017
Waiting list: No
Course goals
After successfully completing this course, you will be able to:
  • Understand and explain the different approaches to data analysis;
  • Given a practical data science problem, select appropriate techniques to tackle this problem;
  • Apply various data analysis techniques, including regression, trees, clustering, (categorical) PCA, association rule mining, etc. in R;
  • Implement generic data science tools such as train/validation/test sets, cross-validation, bagging, boosting, and error evaluation in R;
  • Interpret and evaluate the results of such analyses;
  • Explain these evaluations in layman's terms;
  • Understand and explain the basic principles of data visualization and the grammar of graphics;
  • Construct appropriate visualizations in connection with each of the data analysis techniques in R.
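
As a rough illustration of two of the generic tools named in the goals above (a held-out test set and k-fold cross-validation), the sketch below uses plain Python with only the standard library. The course itself works in R; the function names (`train_test_split`, `k_fold_indices`, `fit_line`, `cross_validated_mse`), the synthetic data, and the choice of a simple least-squares line as the model are all assumptions made for this example and are not taken from the course material.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds of range(n)."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, val


def fit_line(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def mse(model, points):
    """Mean squared prediction error of the fitted line on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in points) / len(points)


def cross_validated_mse(points, k=5):
    """Average validation MSE over k folds; each fold is held out once."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(points), k):
        model = fit_line([points[i] for i in train_idx])
        scores.append(mse(model, [points[i] for i in val_idx]))
    return sum(scores) / k


# Synthetic data (assumed for illustration): y = 2 + 3x plus Gaussian noise.
noise = random.Random(42)
data = [(i / 10, 2.0 + 3.0 * (i / 10) + noise.gauss(0, 1)) for i in range(100)]

# Hold out a test set first, estimate error by cross-validation on the
# training part only, then evaluate the final model once on the test set.
train, test = train_test_split(data)
cv_error = cross_validated_mse(train, k=5)
final_model = fit_line(train)
test_error = mse(final_model, test)
print(f"cross-validated MSE: {cv_error:.3f}, test MSE: {test_error:.3f}")
```

The test set is touched only once, at the end; the cross-validated error is computed purely from the training rows, so it can be used for model comparison without leaking information from the test set.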
 
Content
What puts former criminals on the right track? How can we prevent heart disease? Can Twitter predict election outcomes? What does a violent brain look like? How many social classes does 21st century society have? Are hospitals spending too much on health care, or too little? When is a series of spikes in hundreds of website logfiles an operational problem?

Data analysis is the art and science of tackling questions like these by looking at data. Just as cartographers make maps to see what a country looks like, data analysts explore the hidden structures of data by creating informative pictures and summarizing relationships among variables. And just as doctors diagnose sick patients and advise healthy ones on how to stay healthy, data analysts predict important events and variables so we can act on this knowledge. Methods from statistics, machine learning, and data mining play an important part in this process, as well as visualizations that allow the analyst and other humans to better understand what we can conclude from the available facts.

During this course, participants will actively learn how to apply the main statistical methods in data analysis and how to use machine learning algorithms and visualization techniques. The course has a strong practical focus: rather than concentrating on the mathematics and background of the techniques discussed, you will gain hands-on experience in applying them to real data and interpreting the results.
This course covers both classical and modern topics in data analysis and visualization:
  1. Exploratory data analysis (EDA);
  2. Supervised machine learning and statistical learning;
  3. Unsupervised learning and data mining techniques;
  4. Visualization (throughout the course).
Note that you need to register for this course via OSIRIS Student during the UU registration periods. This course is essential as a basis for each track of the Master of Applied Data Science. If you want to register for this course, please also register for the Applied Data Science profile via http://studyguidelifesciences.nl/profiles/applied-data-science
Entry requirements
Prerequisite knowledge
You should be familiar with the basic principles of applied statistics (up to regression). Some familiarity with a high-level programming language, preferably R or Python, is highly desirable.
Required materials
Literature
Excerpt from the freely available text: James, Witten, Hastie & Tibshirani (2015). An introduction to statistical learning with applications in R. New York: Springer. http://www-bcf.usc.edu/~gareth/ISL/
Literature
Excerpt from the freely available text: Wickham (2016). R for Data Science. O’Reilly. http://r4ds.had.co.nz/
Software
All software used (RStudio, R) is open source and freely available online, as are some of the books.
Recommended materials
Book
Zumel & Mount (2014). Practical data science with R. Shelter Island: Manning.
Literature
Additional literature and references are provided during the course.
Instructional formats (attendance required)
Computer practical (Required)

General remarks
There are two computer practicals every week. The exact programme is outlined in the course manual.

Class session preparation
Assigned literature must be read before the lectures, and assignments must be completed before the meetings.

Contribution to group work
Collaboration by students on homework assignments is allowed and encouraged.
Lecture: Copying or simply dividing up assignments among collaborating students is not allowed.

Small-group session (Required)

Tests
Final result
Test weight: 100
Minimum grade: 5.5

Deadlines
Will be announced in the course manual.

Aspects of student academic development
Academic thinking, working and acting
Material / data analysis and processing
