Big Data and Predictive Analytics

Module code: CO7093

As we rely more and more on online services in our daily routines, we leave behind a vast amount of information about ourselves. Commercial and public organisations can use this information to predict our behaviour.

In this module you will study the methods and tools used to identify variables of interest and the relationships between them in an existing data set, in order to develop a statistical model that can predict the values of those variables. This kind of analysis can give an insight into individual preferences and, most importantly, into what someone is likely to do in a given scenario. Applications include bank credit approval, marketing, stock price prediction, demand forecasting and political campaigning.

We will also study the importance of good-quality data and will rely on open-source libraries such as scikit-learn to implement basic models with much less programming effort. We will explore how to compare and contrast different models for the same data and objective. As predictive analysis does not necessarily demand a huge amount of data, we will also discuss the benefits and pitfalls of so-called big data and how to process such large amounts of data efficiently using a distributed framework such as Apache Spark.
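
To give a flavour of what comparing models on the same data and objective can look like, here is a minimal scikit-learn sketch. It is not taken from the module materials: the synthetic data set, the two candidate models and the choice of 5-fold cross-validation with the R^2 score are all assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (an assumption; stands in for a real data set).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Two candidate models to compare on the same data and the same objective.
candidates = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(max_depth=4, random_state=0),
}

# Score each model with 5-fold cross-validation, so that every model is
# evaluated on data it was not fitted on, using the same metric (R^2).
scores = {}
for name, model in candidates.items():
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    scores[name] = fold_scores.mean()

for name, mean_r2 in scores.items():
    print(f"{name}: mean R^2 = {mean_r2:.3f}")
```

Using the same folds and the same metric for every candidate keeps the comparison fair. Which model comes out ahead depends on the data, so the ranking seen on this synthetic example should not be read as a general result.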