Course: UCACCMET2J
Making Sense of Data: Programming for Research
Course information
Course code: UCACCMET2J
Credits (EC): 5
Course objectives
After completing this course, students:
- can translate research problems into data analysis problems
- can write comprehensive research reports on data-driven projects
- can critically evaluate data analysis approaches and execution
- have strengthened their core academic skills (writing, presenting, project organization) in the context of data analysis
- have a basic understanding of how computers work
- have been exposed to multiple digital tools
- have basic proficiency in several programming languages and tools, including Python, R, git and LaTeX
- can, in an automated fashion, manipulate large quantities of digital information by being able to search, sort, move, copy and concatenate files
- can clean, aggregate, subset, mutate, and visualise data and results, in an automated fashion
- can employ algorithmic approaches to problem solving
- can work with real (big) datasets
- understand the concepts of unstructured and structured data, and can work with the commonly used CSV and JSON data-interchange formats
- can work with common tools to acquire publicly available data and share their own data in a structured way

Aims
Many real-world problems and questions can be answered using data and automated data analysis tools. Increasingly, research in all disciplines, from the natural sciences to social sciences and humanities, involves big data. The availability of vast amounts of textual, audio-visual and structured data from digital sources is revolutionising research in the humanities and social sciences. The most advanced scholarship in these areas, currently and in the foreseeable future, relies on the use of sophisticated tools for accessing, processing, analysing and presenting this data.

In this three-week module, students are introduced to these principles and tools in a way that enables them to bootstrap their own further learning. Using real-world data, students gain familiarity and experience with common approaches to handling large datasets. They learn how to think about and work with data sensibly, how to turn research problems into data problems, and how to communicate data problems and solutions effectively. The module also aims to demystify computers and programming, to give students the bootstraps to solve data problems using digital tools and programming languages, and to foster flexibility and self-directed learning. Students engage with a number of widely used programming languages and tools, culminating in a group project based on a real-world dataset, in which they extract relevant information in an automated fashion, perform a simple analysis, and present the results visually.
As part of a liberal arts curriculum, this module stimulates the kind of thinking that our college hopes to engender: the use of multiple paradigms to solve problems, drawing on reasoning, logic, analysis, hypothesis testing, and formal problem-solving methods.
 
Content
Registration
Registration is through the lab course coordinator: ucu.labcourses@uu.nl
HUM/SSC students: this course counts only as an elective. For SCI majors, it may replace one lab course, except for Physics majors.

Scope
The module is a hands-on introduction to the tools and concepts of automated data analysis. As such, it covers the concepts of: operationalizing research questions; exploratory vs. confirmatory data analysis; iterative software development; designing data visualizations; choosing the right data structures; understanding file and data types; and troubleshooting / debugging. It introduces students to industry-standard tools: R (including the tidyverse and ggplot libraries), Python, LaTeX, system shells, and git.
As an introductory course, it does not cover: databases; app development; web development; maths; statistics; simulations; machine learning; experiment design and implementation; or any domain-specific research approaches.

Format
This three-week module is full time, running from 09:00 until 17:00 each weekday. The first two weeks consist of interactive instruction. The first week focuses on conceptual aspects of data-driven research, and the second week on strengthening foundational programming skills in this context, though theory and practical programming / data analysis are mixed throughout the module. Each day involves some combination of lectures, in-class exercises, group work, and classroom discussions. On some days, the in-class work is also to be handed in at the end of the day. These submissions are graded and make up the portfolio grades for week 1 and week 2. In general, all work, especially the graded elements, is expected to be completed during class hours; little to no homework will be assigned.

The third week focuses on team-forming and project work. It includes work sessions, presentations, and evening programs related to the theme of the module. The class is divided into groups of 3-4 students, each working on a separate project. At the start of the week, groups write a proposal for a data-driven research project. Throughout the week, they carry out this proposal, guided by regular progress meetings with the instructors in a format modeled on software development industry standards. At the end of the week, the groups come together in a symposium where they present their findings to the whole class. These presentations are not graded; they serve to communicate results to the rest of the class and to obtain final feedback. After the presentations, groups finish writing a final academic-style report on their project, which, together with the project proposal and the code implementation, forms the graded element of week 3.

Materials, Tutorials and Reference Works
All coursework in this module is electronic, and students are asked to bring a laptop with several gigabytes of free disk space; if this is not possible, an alternative will be arranged. All software tools, as well as any additional materials used in the course, are open access or open source, and thus free of charge to students. Students will be pointed to a selection of tutorials and online courses, including the official tutorials from the makers of particular software, as well as simple user-friendly guides. Standard reference texts will be available during the module for students to consult for assignments. Where useful, lecture notes and programming cheat sheets will be provided. Detailed installation instructions will be provided at the start of the course.

Teachers
This course is coordinated and taught by Teun van Gils, Lucie Kattenbroek, Joska de Langen and Joris Vincent.

Administrative supervisor: Dr. Agnes Andeweg

Website
Further information and course materials will be provided on our course website: www.ucudata.nl