For each of the 10 plots: identify the direction, strength, form, outliers, variability, and grouping. Submit your answers to see your score and feedback.
Select a dataset and variable pair, then drag the slider to your best estimate of Pearson's r and submit. r ranges from −1 (perfect negative) to +1 (perfect positive).
Pearson's r measures the strength and direction of a linear association. It always falls between −1 and +1.
Key warnings:
r only measures the strength and direction of a linear association between two quantitative variables.
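The warning above can be checked numerically. The sketch below computes Pearson's r from its definition on made-up data (the values and the function name `pearson_r` are illustrative, not taken from the tool) and shows that a perfect curved relationship can still have r = 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for paired quantitative data, computed from its definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration only.
xs = [-2, -1, 0, 1, 2]
linear = [3 * x + 1 for x in xs]   # perfect straight line
curved = [x ** 2 for x in xs]      # perfect curve, symmetric about x = 0

print(pearson_r(xs, linear))  # perfect positive linear association
print(pearson_r(xs, curved))  # y depends entirely on x, yet no linear trend
```

The curved data are completely determined by x, but r does not detect the relationship because it is not linear.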
| 5 pts | Error ≤ 0.05 (bullseye) |
| 4 pts | Error ≤ 0.15 |
| 3 pts | Error ≤ 0.22 |
| 2 pts | Error ≤ 0.32 |
| 1 pt | Error ≤ 0.50 |
| 0 pts | Error > 0.50 |
Max: 75 pts (15 × 5)
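As a rough illustration, the scoring table can be read as a lookup on the absolute error |guess − true r|. The function name `score_guess` is hypothetical, and how the tool itself treats values exactly on a boundary is not stated, so this sketch simply takes the table literally.

```python
def score_guess(guess, true_r):
    """Points for one estimate of r, following the thresholds in the table."""
    error = abs(guess - true_r)
    # (largest allowed error, points), checked from strictest to loosest
    thresholds = [(0.05, 5), (0.15, 4), (0.22, 3), (0.32, 2), (0.50, 1)]
    for limit, points in thresholds:
        if error <= limit:
            return points
    return 0  # error greater than 0.50

print(score_guess(0.80, 0.78))   # error 0.02
print(score_guess(-0.30, 0.10))  # error 0.40
```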
Drag the line handles to get your SSE as close to the true minimum SSE as possible. The OLS line is the only line that achieves that minimum — your goal is to find it.
For any line ŷ = b₀ + b₁x, each point has a residual — the vertical gap between the actual y and what the line predicts:
eᵢ = yᵢ − ŷᵢ
We square each residual so negatives don't cancel positives, then sum them all:
SSE = Σ eᵢ²
The line of best fit (OLS line) is the unique line that makes SSE as small as possible. No other line through that data will have a smaller SSE.
The orange boxes on the plot are the squared residuals — shrink the total area of those boxes and you are minimising the SSE.
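A minimal sketch of the idea, assuming made-up data: `sse` adds up the squared residuals for any candidate line, and `ols_line` uses the standard closed-form least-squares formulas. Both function names are illustrative, not part of the tool.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def ols_line(xs, ys):
    """Intercept and slope of the least-squares line (closed form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_line(xs, ys)
best = sse(xs, ys, b0, b1)
nudged = sse(xs, ys, b0 + 0.3, b1 - 0.1)  # any other line does worse

print(b0, b1)
print(best, nudged)
```

Nudging either the intercept or the slope away from the OLS values increases the SSE, which is what dragging the handles shows on the plot.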
Drag the ● handles on the line to adjust slope and intercept.
Scored on how close your SSE is to the true minimum SSE:
| 5 pts | Within 2× true SSE |
| 4 pts | Within 3× |
| 3 pts | Within 5× |
| 2 pts | Within 10× |
| 1 pt | Within 20× |
| 0 pts | More than 20× |
Max: 100 pts (20 × 5)
Drag the line to see how R² = 1 − SSE/SST changes. Red boxes = SST (fixed). Blue boxes = SSE (shrink them to maximise R²).
Without x, your best guess for any y is ȳ. The red boxes show the squared deviations from ȳ — this is the total variability in y:
SST = Σ(yᵢ − ȳ)²
The blue boxes show the squared residuals from your line — the variability your model does not explain:
SSE = Σ(yᵢ − ŷᵢ)²
R² is the fraction of the total variability in y that your line explains:
R² = 1 − SSE/SST
The dashed line is ȳ. For a fixed dataset, SST stays the same — as the fit improves, SSE gets smaller and R² gets larger.
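The formulas above can be sketched directly. The data and the function name `r_squared` are made up for illustration. Because the line here can be any line (not only the OLS line), the sketch also shows that a line that fits worse than the flat line at ȳ gives a negative R².

```python
def r_squared(xs, ys, b0, b1):
    """R^2 = 1 - SSE/SST for the line y_hat = b0 + b1 * x."""
    y_bar = sum(ys) / len(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)                       # fixed for the dataset
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))   # depends on the line
    return 1 - sse / sst

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
y_bar = sum(ys) / len(ys)

print(r_squared(xs, ys, 0.05, 1.99))   # close to the least-squares line: near 1
print(r_squared(xs, ys, y_bar, 0.0))   # flat line at y-bar: SSE equals SST, so 0
print(r_squared(xs, ys, 0.0, -1.0))    # worse than y-bar: negative
```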
Scored on how close your R² is to the true maximum (OLS):
| 5 pts | Within 0.04 of true R² |
| 4 pts | Within 0.08 |
| 3 pts | Within 0.15 |
| 2 pts | Within 0.25 |
| 1 pt | Within 0.40 |
| 0 pts | More than 0.40 off |
Max: 75 pts (15 × 5)
The permutation test asks: could the observed slope happen by chance if x and y were unrelated? Hit Permute to shuffle y values and build the null distribution one slope at a time.
① Start with the real data. Compute r and the observed slope b₁.
② Ask: if there were no real linear association between x and y, could we have seen a slope this far from 0 just by chance?
③ Simulate the null: keep the x-values fixed, and randomly reassign the y-values to different x positions. This breaks any real pairing between x and y.
④ Refit the line on the scrambled data and record the new slope b₁*.
⑤ Repeat many times. The collection of b₁* values forms the null distribution — the slopes we would expect just by chance if the true slope were 0.
⑥ p-value = fraction of permuted slopes that are at least as extreme as the observed b₁. If the observed slope falls far into the tail, that is evidence against the null and suggests a real linear association.
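Steps ① to ⑥ can be sketched in a few lines. This is a minimal illustration on made-up data, not the tool's implementation: the function names are hypothetical, and "at least as extreme" is interpreted here as a two-sided comparison of absolute slopes, which is an assumption.

```python
import random

def slope(xs, ys):
    """Least-squares slope b1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Observed slope and the fraction of permuted slopes at least as extreme."""
    rng = random.Random(seed)
    observed = slope(xs, ys)
    shuffled = list(ys)              # x stays fixed; only y is reassigned
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)        # breaks the pairing between x and y
        if abs(slope(xs, shuffled)) >= abs(observed):  # two-sided (assumption)
            extreme += 1
    return observed, extreme / n_perm

# Made-up data with a clear upward trend, for illustration only.
xs = list(range(1, 13))
ys = [2.3, 2.9, 4.1, 4.0, 5.6, 6.2, 6.8, 8.1, 8.3, 9.9, 10.4, 11.8]

observed, p = permutation_p_value(xs, ys)
print(observed, p)
```

With a trend this strong, almost no shuffled dataset produces a slope as far from 0 as the observed one, so the estimated p-value is very small.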
Press "Permute Once" to run the first simulation.
Drag the slider to slice through the true regression line at any x. The cross-section shows how individual y-values are distributed at that x.
The red line is the true population line — fitted to 53,940 ggplot2 diamonds: price = −$2,254 + $7,753·carat, σ = $1,548.
At every carat value, prices scatter around μy with standard deviation σ. This scatter is the irreducible error in the model.
🎲 Simulated: points drawn randomly from the true model — perfect for understanding the concept.
💎 Real Diamonds: reveal actual diamonds from our 500-diamond sample one batch at a time — see the same tunnel emerge from real data.
At any x, individual y-values follow a normal distribution centred at μy(x) with spread σ.
Drag the slider left and right — the curve moves because μy changes, but it stays exactly the same width. σ does not depend on x.
This is the equal-variance (homoscedasticity) assumption of SLR: σ is constant across all values of x.
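The "Simulated" mode described above can be sketched as follows, assuming the stated model (price = −2,254 + 7,753·carat, σ = 1,548) with normal errors. The function names and the seed are illustrative. At each carat value the sample mean tracks μy(x), while the sample standard deviation stays near σ.

```python
import random
import statistics

B0, B1, SIGMA = -2254.0, 7753.0, 1548.0   # values quoted in the text

def mean_price(carat):
    """mu_y(x): the height of the population line at this carat value."""
    return B0 + B1 * carat

def simulate_prices(carat, n, rng):
    """Draw n prices at one carat value: normal, centred at mu_y(x), spread sigma."""
    mu = mean_price(carat)
    return [rng.gauss(mu, SIGMA) for _ in range(n)]

rng = random.Random(42)
for carat in (0.5, 1.0, 1.5):
    draws = simulate_prices(carat, 5000, rng)
    # The centre shifts with carat; the spread does not.
    print(carat, round(statistics.mean(draws)), round(statistics.stdev(draws)))
```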
True model fitted to 53,940 ggplot2 diamonds: price = −$2,254 + $7,753·carat · σ = $1,548 · Each click adds 20 real diamonds from our 500-diamond sample.
Each bootstrap resample (n=500 with replacement) produces a slightly different OLS line. Watch the confidence band form, then see the sampling distributions of the slope and intercept.
Each bootstrap resample (n=500 with replacement from our 500 diamonds) produces a slightly different OLS line. The blue lines scatter around the true red line — forming the bootstrap confidence band.
The tunnel is narrowest near the mean carat (~0.8 ct) and widens toward the extremes, because the uncertainty in the fitted line grows with distance from x̄.
Drag the slider to slice through the tunnel at any x and see the spread of ŷ values from all samples.
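The bootstrap procedure can be sketched as below. The real 500-diamond sample is not available here, so the sketch substitutes a simulated stand-in sample drawn from the stated model with uniformly spread carat values; that stand-in, the function names, and the number of resamples are all assumptions made for illustration.

```python
import random
import statistics

def ols_line(xs, ys):
    """Intercept and slope of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def bootstrap_lines(xs, ys, n_boot, rng):
    """Refit the OLS line on n_boot resamples of (x, y) pairs, drawn with replacement."""
    n = len(xs)
    lines = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample size equals n
        lines.append(ols_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return lines

def spread_at(lines, x):
    """Standard deviation of the fitted values y_hat at one x across all lines."""
    return statistics.stdev([b0 + b1 * x for b0, b1 in lines])

# Stand-in sample (assumption): 500 simulated points, NOT the real diamonds.
rng = random.Random(7)
xs = [rng.uniform(0.2, 2.0) for _ in range(500)]
ys = [rng.gauss(-2254.0 + 7753.0 * x, 1548.0) for x in xs]

lines = bootstrap_lines(xs, ys, 200, rng)
x_bar = statistics.mean(xs)
for x in (x_bar, x_bar + 0.75, x_bar + 1.5):
    print(round(x, 2), round(spread_at(lines, x)))
```

The spread of ŷ is smallest at x̄ and grows as x moves away from it, which is the narrowing and widening of the band that the slider reveals.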
True model fitted to 53,940 ggplot2 diamonds: price = −$2,254 + $7,753·carat · σ = $1,548 · Bootstrap: resamples of size n=500, drawn with replacement from our 500-diamond sample
Draw samples to begin.
Drag any point. The OLS line recomputes instantly. Watch the residual plot update live.
Generate a dataset to begin.
For each point, the residual is: e = y − ŷ
It is the vertical distance from the point to the OLS line — positive if above, negative if below.
When you drag a point, the OLS line moves, so ŷ changes for every point. Every residual changes, not just the one for the point you moved.
High leverage — point far from x̄. Has the potential to pull the line but may not if it follows the trend.
High influence — actually changes the line substantially. Measured by Cook's D: high leverage combined with a large residual produces high influence.
Outlier (low leverage) — large residual near x̄. Doesn't move the line much but inflates RMSE.
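These three ideas can be made concrete with the standard simple-regression formulas: leverage hᵢ = 1/n + (xᵢ − x̄)²/Sxx, and Cook's Dᵢ = eᵢ²/(2s²) · hᵢ/(1 − hᵢ)², where s² = SSE/(n − 2). The sketch below uses made-up data with one point placed far from x̄ and off the trend. The function name `diagnostics` is illustrative, and this is not necessarily how the tool computes its values.

```python
def diagnostics(xs, ys):
    """Per-point (leverage, residual, Cook's D) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in residuals) / (n - 2)      # residual variance estimate
    out = []
    for x, e in zip(xs, residuals):
        h = 1 / n + (x - x_bar) ** 2 / sxx            # leverage: distance from x-bar
        d = (e * e / (2 * s2)) * h / (1 - h) ** 2     # Cook's D with 2 coefficients
        out.append((h, e, d))
    return out

# Made-up data: six points on a trend, plus one far from x-bar and off the trend.
xs = [1, 2, 3, 4, 5, 6, 15]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 12.0]

diag = diagnostics(xs, ys)
for x, (h, e, d) in zip(xs, diag):
    print(x, round(h, 3), round(e, 2), round(d, 2))
```

In this example the last point has by far the largest Cook's D even though its residual is not the largest in absolute size: a high-leverage point pulls the line toward itself, so its own residual can look modest.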
Note: This is a conceptual tool — the data shown is mathematically constructed to illustrate shape patterns. Use it to build intuition for what skew and heavy/light tails look like on a QQ plot. Real residuals will be messier.