Statistics for Data Science

Module code: MA7023

One of the most important methods in statistical analysis is the natural extension of the simple linear regression model to include several explanatory variables, giving the general linear model. With a judicious choice of variables, a variety of data analysis techniques can be covered within a single framework: multiple linear regression, polynomial regression, and analysis of variance (ANOVA). The two main goals of analysis with this model are to determine which explanatory variables are important and to describe how these variables relate to the response variable. Confidence intervals can be attached to estimates or predictions obtained from the model, and p-values can be assigned to the hypotheses being tested. The analysis is based on the method of least squares estimation, where the residual sum of squares not only provides an estimate of the error variance but, perhaps more importantly, also offers a means of assessing the acceptability of any proposed model.
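As an informal illustration of the ideas above, the following Python sketch fits a general linear model by least squares and reports the residual sum of squares (RSS) together with the usual error variance estimate RSS / (n - p), where n is the number of observations and p the number of fitted parameters. It is not part of the module materials: the function names (design_matrix, solve, fit_least_squares) and the data are hypothetical, chosen only to show how polynomial regression fits the linear framework through the choice of design matrix columns.

```python
# Minimal sketch: least squares for a general linear model y = X b + e.
# All names and data here are illustrative assumptions, not module material.

def design_matrix(xs, degree):
    """Rows [1, x, ..., x^degree]: polynomial regression as a linear model."""
    return [[x ** k for k in range(degree + 1)] for x in xs]

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z

def fit_least_squares(X, y):
    """Return (coefficients, RSS, error variance estimate RSS / (n - p))."""
    n, p = len(X), len(X[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bk * xk for bk, xk in zip(beta, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, rss, rss / (n - p)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 9.2, 18.8, 33.1, 50.9]  # made-up data with visible curvature

# Compare a straight line (degree 1) with a quadratic (degree 2).
for degree in (1, 2):
    beta, rss, sigma2 = fit_least_squares(design_matrix(xs, degree), ys)
    print(degree, [round(b, 3) for b in beta], round(rss, 3), round(sigma2, 3))
```

Comparing the RSS of the two fits gives a rough sense of how the residual sum of squares can be used to judge whether a proposed model is acceptable. A formal comparison would use an F-test, which this sketch does not implement.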

This module covers the fundamentals of probability theory, statistical inference, and general linear models. A brief introduction to least squares estimation of simple linear regression lines shows how a linear relationship can be established between two variables. The module then presents the method of least squares for estimating the parameters of linear models, together with methods for making inferences about these parameters through confidence intervals and hypothesis tests. In particular, it aims to impart an understanding of how statistical models explain variation. The theory of general linear models is covered so that it can be extended to multiple linear regression, polynomial regression, and analysis of variance (ANOVA). The module also provides the necessary foundation for the study of generalised linear models.
