At the end of the course, students should be able to:
1) Build simple entity-relationship and relational data models
2) Query data with SQL
3) Retrieve information from a relational database via a programming language (Python)
4) Work with public datasets: transfer data from repositories or supplementary material and integrate them into a data model.
The course Introduction to Research Data Management gives scientists practical insights into data management. Basic knowledge of relational databases, entity-relationship models, relational models and SQL with MySQL is provided during the course. The programming language used to move data to and from the database is Python.
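As an illustration of the kind of workflow described above, the following minimal sketch creates two related tables, inserts a few rows and queries them with SQL from Python. It is not course material: the table names, column names and values are hypothetical, and the standard-library `sqlite3` module stands in for MySQL so that the example runs without a database server.

```python
import sqlite3

# Assumption: sqlite3 (standard library) replaces MySQL here so the sketch
# runs anywhere; all table names, column names and values are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two related tables: one sample can have many measurements.
cur.execute("""
    CREATE TABLE sample (
        sample_id INTEGER PRIMARY KEY,
        organism  TEXT NOT NULL
    )
""")
cur.execute("""
    CREATE TABLE measurement (
        measurement_id INTEGER PRIMARY KEY,
        sample_id      INTEGER NOT NULL REFERENCES sample(sample_id),
        value          REAL NOT NULL
    )
""")

# Parameterized inserts keep the data separate from the SQL text.
cur.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [(1, "E. coli"), (2, "S. cerevisiae")],
)
cur.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [(1, 1, 0.42), (2, 1, 0.58), (3, 2, 1.25)],
)
conn.commit()

# Join the two tables and aggregate the measurements per organism.
cur.execute("""
    SELECT s.organism, COUNT(*) AS n, AVG(m.value) AS mean_value
    FROM sample AS s
    JOIN measurement AS m ON m.sample_id = s.sample_id
    GROUP BY s.organism
    ORDER BY s.organism
""")
rows = cur.fetchall()
for organism, n, mean_value in rows:
    print(organism, n, round(mean_value, 2))

conn.close()
```

With a MySQL server the overall pattern (connect, execute, fetch) is expected to be the same, but the connection call and the parameter placeholder style depend on the connector library that is used.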
Proper management of research data is required by funding agencies, publishers and academic institutions. This course provides the technical keys to understanding how to model, structure and query data. The benefits of these skills are numerous: better insight into how to manage research data and comply with research data management policies, more efficient storage and reuse of important data for computational experiments, and awareness of the current techniques that make these tasks easier. The modeling part of the course focuses on communicating the important aspects of datasets to colleagues or an audience via simple models that can be included in posters or other types of publications.
The course is divided into six modules:
- Research Data Management and Databases
- Data and Models
- Starting with MySQL and Workbench
- Structuring and Querying Data
- Storing and Processing Data with Python
- Working with data repositories
Next, more practical insights are given, mainly about:
- Data modeling with E-R and relational schemas
- SQL (mainly DML)
- Working with MySQL and Workbench (modeling)
- Working with publicly available data by modeling, importing and integrating data into relational databases.
- Working with data schemas and public repositories
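The import-and-integrate step listed above can be sketched as follows. This is only an illustration under stated assumptions: the CSV content is an invented stand-in for a file downloaded from a public repository, the schema is hypothetical, and `sqlite3` replaces MySQL so the example is self-contained. The standard-library `csv` module is used for parsing here instead of pandas.

```python
import csv
import io
import sqlite3

# Hypothetical public dataset; in practice this text would come from a file
# downloaded from a repository or from supplementary material.
RAW_CSV = """gene,organism,expression
lacZ,E. coli,12.5
recA,E. coli,7.0
GAL4,S. cerevisiae,3.2
"""

conn = sqlite3.connect(":memory:")

# Hypothetical relational model: organisms are stored once and referenced
# by key instead of being repeated on every row.
conn.execute("""
    CREATE TABLE organism (
        organism_id INTEGER PRIMARY KEY,
        name        TEXT UNIQUE NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE gene_expression (
        gene        TEXT NOT NULL,
        organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
        expression  REAL NOT NULL
    )
""")

for record in csv.DictReader(io.StringIO(RAW_CSV)):
    # INSERT OR IGNORE is SQLite syntax; it skips organisms already stored.
    conn.execute(
        "INSERT OR IGNORE INTO organism (name) VALUES (?)",
        (record["organism"],),
    )
    (organism_id,) = conn.execute(
        "SELECT organism_id FROM organism WHERE name = ?",
        (record["organism"],),
    ).fetchone()
    conn.execute(
        "INSERT INTO gene_expression VALUES (?, ?, ?)",
        (record["gene"], organism_id, float(record["expression"])),
    )
conn.commit()

# Check that the flat file was split across the two tables as intended.
organism_count = conn.execute("SELECT COUNT(*) FROM organism").fetchone()[0]
expression_count = conn.execute(
    "SELECT COUNT(*) FROM gene_expression"
).fetchone()[0]
print(organism_count, expression_count)

conn.close()
```

Splitting the flat file into two tables is a design choice made for this sketch: it avoids repeating organism names and makes it possible to link further datasets to the same organism records.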
The final grade consists of:
- Online quizzes (10%), three attempts per quiz. Min. score is 6 per quiz.
- Two minor assignments (20%), no minimum score. There is one opportunity to resubmit one of the two minor assignments to improve its grade.
- A final assignment (70%), min. score is 5. There is one opportunity to resubmit the final assignment if the grade is less than 6.
Min. final grade to pass the course: 5.5
Literature/study material used:
Course content and material are hosted on https://elearning.ubc.uu.nl
A virtual machine (Ubuntu Linux) containing all the necessary software is available for students.
Alternatively, students may choose to install the required software on their own machine. In that case, they will need a computer environment with:
- Minimum: Python 2.7.9 or Python 3.4.x/3.5.x
- Jupyter (IPython) notebook
- MySQL 5.7.X branch
- MySQL Workbench CE 6.3.X
- Python pandas (http://pandas.pydata.org/)
- Windows users can install WinPython (http://winpython.github.io/) containing all the necessary modules by default
Python basics: types of variables (int, string…), data structures (tables, lists, dictionaries), functions, control statements (if, while…). Knowledge of another programming language at an equivalent level is sufficient.
Command line basics: the course requires some command line work. Some experience with working in a shell (bash, powershell…) will help.
Required material - Teaching formats (attendance mandatory)