Review of the course “Statistics: Making Sense of Data” by Alison Gibbs and Jeffrey Rosenthal
This spring the University of Toronto ran an online course on statistics and statistical calculus on the Coursera.org platform. We are delighted to present a review of this course written by Viktoria Denisova.
I explored the course ‘Statistics: Making Sense of Data’ by Alison Gibbs and Jeffrey Rosenthal, which I think is particularly interesting for those who would like to learn how to make statistical inferences. In general, the course teaches how to draw conclusions from given data. It also touches on the question of how to collect data in order to make a statistical inference. Last but not least, it discusses statistical inference itself, i.e. how what we observe in the real world relates to the conclusions we can draw in a theoretical world.
I appreciated the structure of the course and its step-by-step explanation of the basic statistical concepts. The program is divided into an obligatory part and an optional one, and the lectures of each week contain both. The obligatory part is about real-world data and the use of statistical theory in its analysis. The optional part is dedicated to R, which is used as a tool for statistical analysis. I found it engaging to spend time not only on the necessary theory but also on getting an immediate feel for its actual use. Besides making the course more interesting and fun to follow, this is also an extremely useful way of studying for a person with a professional interest, such as myself. If your aim is to go beyond theory and get your hands dirty with practical tasks, I strongly recommend this course.
The lectures start by introducing the basic statistical terms, with examples of their use in connection with data samples. In this way the abstract notions stay very close to practical work. The introductory part shows how real-world data can be described in a theoretical world; it is followed by a discussion of theoretical topics such as probability and confidence intervals. Each week ends with a test and exercises which allow you to check your knowledge. The course ends with some examples of the application of the course material to data analysis.
Just to give you a taste of the course, here are a couple of the topics we covered. Terms such as categorical and quantitative variables are discussed at length. The importance of the five-number summary, the median, and the mean in the analysis of quantitative data is thoroughly explained. Here it started to become clear how you can work successfully with a seemingly unmanageable amount of data. Each definition is, of course, followed by some examples.
Two examples are given below. They show ways of representing quantitative and categorical variables in order to explore the data.
This picture shows one of the ways of representing quantitative variables. The example discusses how the estimated age at death differs from the actual age at death, based on a sample of 400 skeletons. The modified boxplot represents the five-number summary: minimum, first quartile, median, third quartile and maximum. The interquartile range (IQR), the lower inner fence and the upper inner fence are calculated. The modified boxplot is used to detect unusual observations. In this example, one of the unusual observations corresponds to a difference of 60 years: the actual age at death is 60 years less than the estimated age. Such unusual observations are important when exploring the data. This topic is discussed at length in the subsequent lectures.
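The calculation behind a modified boxplot can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's own material (the course uses R): the data values are made up, the function names are hypothetical, the quartiles are taken as the medians of the lower and upper halves of the sorted data (one of several common conventions), and the inner fences use the usual 1.5 × IQR rule.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data. Other quartile conventions give slightly
    different numbers.
    """
    data = sorted(values)
    if not data:
        raise ValueError("need at least one value")
    half = len(data) // 2
    lower = data[:half] or data              # lower half (whole data if n == 1)
    upper = data[len(data) - half:] or data  # upper half (whole data if n == 1)
    return data[0], median(lower), median(data), median(upper), data[-1]


def inner_fences(q1, q3):
    """Return (lower, upper) inner fences, 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def unusual_observations(values):
    """Return the values lying outside the inner fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    low, high = inner_fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Hypothetical differences (estimated age minus actual age, in years).
# These numbers are invented for illustration; they are not the
# skeleton data set used in the course.
differences = [-14, -9, -6, -4, -2, 0, 1, 3, 5, 8, 12, 60]

print("five-number summary:", five_number_summary(differences))
print("unusual observations:", unusual_observations(differences))
```

With these invented numbers, the value 60 falls above the upper inner fence and is flagged as unusual, which mirrors the kind of observation highlighted in the lecture slide.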
This slide shows a pie chart, which is one of the ways to represent categorical variables. The pie chart shows the proportions in which the countries and territories fall into 6 regions.
The course is taught in ordinary language, so it is easy to follow. The authors avoid difficult explanations of complex formulae. The definitions are given straightforwardly, without going into details such as who introduced these formulae and why. At the same time, the importance of these definitions in practice is clearly emphasized. The use of box plots, bar charts and pie charts is explained. The lectures contain questions to check your understanding of the material.
The authors explain the relationships between categorical and quantitative variables as well as between two categorical variables, and they show that these relationships can be illustrated with plots. The distribution of data in tables is discussed in an accessible way. What I especially liked in the course was that each example is illustrated with tables and the important patterns are highlighted. The authors clearly explain methods of data collection such as sampling, observational studies and experiments. I feel it is very important that we also pay attention to collecting the data and not just to operating on it, since the first will ultimately determine the practical value of the second.
The last week of the course is dedicated to case studies. These cases clearly show how the theoretical material of the previous weeks can be applied to real-world scenarios. I found this the most exciting part of the course.
I would recommend this course to anyone who would like to learn the basics of statistics and its use in working with real-world data. The course can serve as an introduction to data analysis, and I believe it is very useful for those who are interested in statistics professionally or are pursuing a career in the business world.
About the author
Viktoria Denisova studied philosophy and logic at Saint Petersburg State University, where she obtained a specialist degree in philosophy. She then continued her studies at the University of Amsterdam, where she obtained her Master of Science in Logic degree from the Institute for Logic, Language and Computation. She now focuses on forecasting and data analysis, to which she brings expertise in logical analysis. Viktoria may be contacted via email at msc.denisova@gmail.com