R for Data Science

Module code: GY7702

This module focuses on the programming language R as an effective tool for data science. R is one of the most widely used programming languages, and it provides access to a vast repository of programming libraries covering all aspects of data science, from data wrangling and statistical analysis to machine learning and data visualisation. These include a variety of libraries for processing spatial data, performing geographic information analysis, and creating maps. As such, R is an extremely versatile, free and open-source tool in geographic information science, combining the capabilities of traditional GIS software with the advantages of a scripting language and an interface to a vast array of algorithms.

As part of the MSc programme, this module will provide you with the necessary skills in basic programming, data wrangling and reproducible research to tackle sophisticated, non-spatial data analyses. These skills will form the foundation for the methods and approaches discussed in the second semester, particularly in the Geospatial Data Analytics and the Geospatial Databases and Information Retrieval modules.

The first part of the module will focus on core programming techniques, data wrangling and practices for reproducible research. The second part of the module will focus on non-spatial data analysis approaches, including statistical analysis and machine learning.


  • 10 hours of lectures
  • 20 hours of practicals
  • 120 hours of independent study


  • Basic R programming exercise (30%)
  • Data science project (70%)