The linear model
A scientific model is an approximation of the behaviour of a natural system, based on observations of it. The heliocentric model of the solar system, famously championed by Galileo, is a good example of this.
A "linear" model is the approximation of a system using a line equation, which in early education usually takes the form \(y = ax + b\) (sometimes the \(a\) is replaced with an \(m\)). In this equation, \(x\) and \(y\) are variables, and \(a\) and \(b\) are parameters, which assume constant values that define the slope and y-intercept of the line, respectively. It is common, especially in statistics, to refer to these parameters with the Greek letter \(\beta\); so in what follows I'll be using \(\beta_0\) to refer to the intercept and \(\beta_1\) to refer to the slope.
So how can a linear model be useful for an aspiring scientist? As an example, suppose you live in the early 20th century, when the automobile industry was just getting off the ground, and you notice that more people seem to be getting into accidents after having consumed alcohol than when sober. You formulate this as a hypothesis: alcohol consumption impairs reaction time. You can express this as a linear model:
$$\hat{RT} = \beta_0 + \beta_1 BAC,$$
where \(\hat{RT}\) is the predicted reaction time, in milliseconds (called the outcome variable), and \(BAC\) is blood alcohol concentration, in grams per millilitre (called the predictor variable).
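To make this concrete, here's a minimal sketch in Python of the model as a prediction function. The coefficient values (a 250 ms sober baseline and a slope of 600) are hypothetical numbers I've made up purely for illustration:

```python
# A minimal sketch of the model RT_hat = beta0 + beta1 * BAC.
# The coefficient values below are hypothetical, for illustration only.

def predict_rt(bac, beta0=250.0, beta1=600.0):
    """Predicted reaction time (ms) for a given blood alcohol concentration."""
    return beta0 + beta1 * bac

# Example: predicted RT at a few (hypothetical) BAC values.
for bac in (0.0, 0.05, 0.10):
    print(f"BAC = {bac:.2f} -> predicted RT = {predict_rt(bac):.1f} ms")
```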
As a scientist, you design an experiment and measure your observations from a randomly selected, representative sample. You define your null hypothesis (the thing you are trying to disprove) as a complete lack of relationship between \(BAC\) and \(RT\) in the population of interest. This corresponds to a horizontal line, where \(\beta_1=0\):
$$H_0: \beta_1=0$$ $$H_a: \beta_1 \neq 0$$
What do you do next?
Fitting a linear model to observations
The first thing you want to know in this scenario is: "how well can I approximate the data with my model?"
Problem is, you don't know what \(\beta\) parameters to use. You can get an idea about this by plotting your variables against one another as a scatterplot:
These plots need a bit of explanation. For the scatterplot at top, each circle is a single participant, and its \(x\) and \(y\) positions represent the values of the two variables you measured from that participant (e.g., \(BAC\) and \(RT\)). The red line is the line we are trying to use to approximate the relationship between these variables. This line allows us to predict the value of \(y\) from an observed value of \(x\).
This predicted value of \(y\) is referred to as \(\hat{y}\) (pronounced "y hat"), and the fact that we are making a prediction can be expressed like this:
$$\hat{y}=\beta_0+\beta_1 x$$
The quality of our model can be assessed by looking at how much the model's prediction \(\hat{y}\) deviates from the actual observed value \(y\) for each data point. In other words, how far do our data points fall from the line? This is called residual error, and is shown above as the green lines.
In the above plots, you can use the slider to play with the value of \(\beta_1\) to try and minimize the model error. The line equation for that value is given in the red box. The model error can be quantified as a single value called mean squared error, or \(MSE\), shown in the green box. \(MSE\) can be obtained by squaring the residual errors (which makes them positive) and averaging across all data points:
$$MSE = \frac{\sum{(y-\hat{y})^2}}{N},$$
where \(N\) is the number of data points.
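To make the \(MSE\) calculation concrete, here's a minimal sketch in Python; the data points and candidate parameter values are made up purely for illustration:

```python
import numpy as np

# Made-up observations (x could be BAC, y could be RT); values are illustrative only.
x = np.array([0.00, 0.02, 0.04, 0.06, 0.08, 0.10])
y = np.array([248.0, 255.0, 262.0, 270.0, 281.0, 290.0])

def mse(y, y_hat):
    """Mean squared error: average of the squared residuals (y - y_hat)."""
    return np.mean((y - y_hat) ** 2)

# Evaluate a candidate line y_hat = beta0 + beta1 * x.
beta0, beta1 = 248.0, 400.0   # hypothetical parameter values
y_hat = beta0 + beta1 * x
print(f"MSE for beta1 = {beta1}: {mse(y, y_hat):.2f}")
```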
Our goal here is to find the values of \(\beta\) that best fit our observations, and this corresponds to the values where \(MSE\) is minimal. In the bottom plot, we are plotting \(MSE\) as a function of the different values of \(\beta_1\) that you are exploring with the slider. If you move the slider enough, you can see that this forms a U-shaped curve, with a minimum around 0.62.
This is the slope we are looking for!
The simplest way to solve this minimization problem (without playing with a slider) is through a method called ordinary least squares, commonly abbreviated as OLS, which for a single predictor variable is derived here (disclaimer: math!). The basic logic is: as long as our \(MSE\) values form a U-shaped (also called convex) curve, we can use calculus to determine analytically where the minimum is [1,2]. We first take the derivative (rate of change of \(MSE\) over values of \(\beta_1\)) and then determine at which value of \(\beta_1\) it is exactly 0 [3]:
$$\frac{d MSE}{d \beta_1}=0$$
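For the single-predictor case, setting this derivative to zero yields the standard closed-form OLS estimates, \(\hat{\beta}_1 = \frac{\sum{(x-\bar{x})(y-\bar{y})}}{\sum{(x-\bar{x})^2}}\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\). Here's a minimal sketch of that calculation (again with made-up data), checked against numpy's least-squares fit:

```python
import numpy as np

# Made-up observations, for illustration only.
x = np.array([0.00, 0.02, 0.04, 0.06, 0.08, 0.10])
y = np.array([248.0, 255.0, 262.0, 270.0, 281.0, 290.0])

# Closed-form OLS estimates for a single predictor:
#   beta1_hat = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
#   beta0_hat = y_bar - beta1_hat * x_bar
x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar
print(f"OLS estimates: beta0 = {beta0_hat:.2f}, beta1 = {beta1_hat:.2f}")

# Sanity check against numpy's least-squares polynomial fit (degree 1).
beta1_np, beta0_np = np.polyfit(x, y, deg=1)
print(f"np.polyfit:    beta0 = {beta0_np:.2f}, beta1 = {beta1_np:.2f}")
```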
Statistical inference on linear models
Going back to our original example, suppose we do use OLS to obtain an optimal model for the relationship between \(BAC\) and \(RT\). Where do we go from here?
Our next task is to show that our estimated \(\beta\) parameters generalize to our population of interest. In other words, can we use this evidence to reject \(H_0\) and thereby infer that a relationship exists between our two variables? To do this, we need to show that our estimated \(\beta_1\) is further away from zero than would be expected by chance (within some acceptance threshold \(p < \alpha\)) from a random sample of size \(N\).
Intuitively, the issue we are grappling with is whether the difference between our model and the null model (i.e., where \(\beta_1=0\)) is sufficiently greater than the overall variation in our data. These are plotted below:
On the left, we've got the null model: in this case, \(\beta_1=0\) and our best guess at \(y\) is its mean: \(\hat{y}=\bar{y}\). In other words, \(x\) gives us no information whatsoever about \(y\). The deviations of each data point from the mean (\(y-\bar{y}\)), or total deviations, are shown as orange lines. In the middle, we've estimated a model to fit our data, shown again as a red line. The differences between this and the null model (\(\hat{y}-\bar{y}\)), or model deviations, are shown as purple lines. On the right, the remaining variability (\(y-\hat{y}\)), or residual deviations, are the same as we saw in the plot above, and are again coloured green.
Hopefully it is clear that the model and residual deviations sum to the total deviations, but also that the balance between the model and residual squared deviations (called \(SS_M\) and \(SS_R\), respectively) gives us an estimate of how well our model explains the data. In the extreme cases: for the null model, all deviation is residual and model deviation is 0: \(SS_T=SS_R\) (where \(SS_T\) is the sum of squared total deviations); and for a perfect model, all data points lie on the red line, and residual deviation is 0: \(SS_T=SS_M\).
Indeed, based on this balance, we can quantify the proportion of the variance of \(y\) that is explained by our model as:
$$R^2=\frac{SS_M}{SS_T}$$
You may note that this is analogous to the \(r^2\) for Pearson correlations.
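As a quick sketch of this decomposition (with made-up data, as before), we can compute the sums of squares directly, confirm that \(SS_T = SS_M + SS_R\), and check that \(R^2\) matches the squared Pearson correlation:

```python
import numpy as np

# Made-up observations, for illustration only.
x = np.array([0.00, 0.02, 0.04, 0.06, 0.08, 0.10])
y = np.array([248.0, 255.0, 262.0, 270.0, 281.0, 290.0])

# OLS fit (slope and intercept), as above.
beta1_hat, beta0_hat = np.polyfit(x, y, deg=1)
y_hat = beta0_hat + beta1_hat * x

# Total, model, and residual sums of squares.
ss_t = np.sum((y - y.mean()) ** 2)        # total deviations from the mean
ss_m = np.sum((y_hat - y.mean()) ** 2)    # deviations explained by the model
ss_r = np.sum((y - y_hat) ** 2)           # residual deviations

print(f"SS_T = {ss_t:.2f}, SS_M + SS_R = {ss_m + ss_r:.2f}")  # these should match
print(f"R^2 = {ss_m / ss_t:.4f}")
print(f"Squared Pearson r = {np.corrcoef(x, y)[0, 1] ** 2:.4f}")  # same value
```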
Another thing we can do is get a sampling distribution for this ratio that can be used to derive a p value (the probability of observing this ratio or higher when the null hypothesis is true). For this purpose, however, we use a slightly different ratio based on "mean" sums of squares:
$$MS_M = \frac{SS_M}{df_M}$$ $$MS_R = \frac{SS_R}{df_R}$$
I put "mean" in quotes because, confusingly, these are not the same as the \(MSE\) introduced above. Instead, they are divided by the degrees of freedom (\(df\)) for each quantity (more about those here). The reason we introduce df's at this point is to deal with sample size and model complexity. As is true of any sampling distribution, parameter estimates will have more variability for smaller sample sizes. Similarly, they will have more variability for more complex models (i.e., the number of predictor variables we are using to fit the data), because you can always overfit sample data with more parameters. This is reflected in the equations for these df's:
$$df_M=k$$ $$df_R=N-1-k,$$
where \(k\) is the number of predictor variables (in the case of a single \(x\) variable, this is 1). Both more predictor variables and fewer participants reduce the resulting ratio of \(MS_M\) to \(MS_R\), to account for this increased variability.
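For example, with a single predictor (\(k=1\)) and a hypothetical sample of \(N=20\) participants, we would have:

$$df_M = 1, \qquad df_R = 20 - 1 - 1 = 18$$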
Conveniently, the ratio of model to residual mean sums of squares has an F distribution under the null hypothesis, and is referred to as the F ratio:
$$F(df_M,df_R)=\frac{MS_M}{MS_R}$$
Comparing our F ratio to this distribution allows us to determine a p value. In the plot below [4], you can play with \(k\), \(N\), \(SS_M\), and \(SS_R\), to see their effect on the shape of the F distribution and the p value for our linear model. The p value is computed as the area under the F probability density function (PDF) between our F ratio (red line) and \(+\infty\), shown as the blue shaded area.
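Putting the pieces together, here's a sketch of the whole calculation, from sums of squares to the F ratio and its p value; it uses scipy's F distribution rather than the JavaScript routines behind the interactive plot, and the data are again made up:

```python
import numpy as np
from scipy import stats

# Made-up observations, for illustration only.
x = np.array([0.00, 0.02, 0.04, 0.06, 0.08, 0.10])
y = np.array([248.0, 255.0, 262.0, 270.0, 281.0, 290.0])

N, k = len(y), 1                            # sample size and number of predictors
beta1_hat, beta0_hat = np.polyfit(x, y, deg=1)
y_hat = beta0_hat + beta1_hat * x

ss_m = np.sum((y_hat - y.mean()) ** 2)      # model sum of squares
ss_r = np.sum((y - y_hat) ** 2)             # residual sum of squares

df_m, df_r = k, N - 1 - k                   # degrees of freedom
ms_m, ms_r = ss_m / df_m, ss_r / df_r       # "mean" sums of squares

f_ratio = ms_m / ms_r
p_value = stats.f.sf(f_ratio, df_m, df_r)   # area under the F PDF from our F ratio to +infinity
print(f"F({df_m},{df_r}) = {f_ratio:.2f}, p = {p_value:.4g}")
```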
Interpreting a simple linear regression result
Okay, so let's say you've obtained a p-value for your reaction time experiment, which is less than your pre-specified threshold of \(\alpha=0.05\). You can now go ahead and reject \(H_0\), on which basis you infer that there is, indeed, a likely relationship between \(RT\) and \(BAC\) in the general population.
Can you make the bolder inference that higher \(BAC\) causes slower \(RT\)? This depends on your research design. If you had randomly (and blindly) assigned alcohol to participants, then you could fairly confidently make this claim, as there are no conceivable confounding variables left uncontrolled. If, on the other hand, you had let participants determine their own alcohol intake, then you might have a problem: a third factor might conceivably have influenced both their choice of imbibement and their reaction time.
In general, unless your research design is strictly experimental (i.e., the predictor variable was manipulated in a controlled manner), it is fairly difficult to support a causal inference from a linear regression result. The more epidemiological the design (i.e., measuring many variables from a large sample with few or no controls), the harder this becomes. This is because unmeasured factors may well exist that influence or mediate the association between two measured ones. If we were, for instance, to use a questionnaire to ask people for their average alcohol consumption and frequency of automobile accidents, and find an association between these variables, we could not be certain that one variable causes the other.
One way to approach confounding is to measure the variables we think may be confounders. For example, if we hypothesized that males drink more on average than females, or that younger people partake more often than do older people, then we could add sex and age as covariates in our linear model.
That brings us to multiple linear regression, which is another kettle of fish that I will serve up in a future blog post. :)
Incidentally, the use of squared model errors instead of absolute values is largely so that the \(MSE\) curve shown above is U-shaped (smooth) rather than V-shaped; the derivative of a V-shaped curve is undefined at its minimum.
Handily, it can be shown that this so-called cost function for any input set will be convex (here's some more math), which makes OLS quite ubiquitous.
Actually, we need to use partial derivatives here, as we are minimizing with respect to both \(\beta_0\) and \(\beta_1\) (and more parameters for multiple linear regression), but for our purposes this illustration gets the point across!
Javascript code for computing the F-ratio probability density function (PDF) and cumulative distribution function (CDF) was obtained from this excellent site.