A statistical analysis of the World Happiness Report dataset (1,969 observations across 163 countries, 2019–2024) investigating how GDP, social support, healthy life expectancy, freedom, generosity, and perceptions of corruption predict national happiness scores. Multiple linear regression shows that these six factors together explain about 71% of the variation in happiness scores (about 76% once region is added), and a permutation-based hypothesis test shows that high-income nations score significantly higher than low-income peers.
The World Happiness Report is an annual publication by the Sustainable Development Solutions Network (SDSN), based on Gallup World Poll surveys. Happiness is measured as a Cantril Ladder score: respondents rate their current life on a scale from 0 (worst possible) to 10 (best possible), and the responses are averaged per country. The report also attributes each country's score to six contributing factors, enabling both descriptive and inferential analysis.
This project addresses two research questions: (1) How do economic, social, and governance factors jointly influence national happiness scores? (2) Is there a statistically significant difference in happiness between high-income and low-income countries? The dataset spans 163 countries across six years (2019–2024) with 1,969 complete observations after cleaning.
The analytical approach moves from exploratory visualization (a ggpairs correlation matrix) through six simple linear regressions, one per predictor, each with full model diagnostics (residuals vs. fitted, Q-Q plot, histogram). These single-predictor models establish baselines for a full multiple regression model that includes all six factors plus Region dummy variables; the full model achieves an adjusted R² of 0.7585.
For the hypothesis test, countries are split into High and Low income groups at the median GDP. A permutation-based test evaluates whether the observed difference in mean happiness scores (Δ = 1.31 points) could plausibly arise by chance. The p-value is extremely small, so the null hypothesis of equal group means is rejected.
Happiness Score: Self-assessed life rating on the Cantril Ladder (0 = worst possible life, 10 = best possible life). Global range in dataset: 1.72 (Afghanistan, 2023) to 7.74 (Finland, 2024).
GDP: Log-transformed GDP per capita, scaled to the happiness score contribution. Measures how much each country's economic output per person contributes to its Ladder score.
Social Support: Survey response to: "If you were in trouble, do you have relatives or friends you can count on?" Reflects the strength of personal support networks within a country.
Health: Number of years an individual is expected to live in good health, combining length and quality of life using WHO data and Gallup polling on perceived health.
Freedom: National average of responses to: "Are you satisfied or dissatisfied with your freedom to choose what you do with your life?" Covers personal, political, and civil freedoms.
Corruption: Average of two binary questions on perceived government and business corruption. Higher values indicate lower perceived corruption, meaning a country is seen as cleaner.
Six simple linear regression models were fitted, one predictor at a time, to isolate each factor's individual relationship with the Happiness Score. Each model was validated with full diagnostics: residuals vs. fitted values (linearity), histogram and Q-Q plot of residuals (normality), and visual inspection for heteroscedasticity. The five significant predictors broadly meet the regression conditions. Generosity is the sole non-significant predictor (p = 0.209, R² ≈ 0.002).
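The one-predictor-at-a-time workflow can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the data frame `happy` is synthetic stand-in data generated on the spot, and the column names (`Score`, `GDP`, `SocialSupport`, and so on) are assumed for illustration.

```r
# Synthetic stand-in data: NOT the World Happiness Report values.
set.seed(606)
n <- 200
happy <- data.frame(
  GDP           = runif(n, 0, 2),
  SocialSupport = runif(n, 0, 1.6),
  Health        = runif(n, 0, 0.9),
  Freedom       = runif(n, 0, 0.9),
  Generosity    = runif(n, 0, 0.4),
  Corruption    = runif(n, 0, 0.6)
)
happy$Score <- 2.3 + 0.7 * happy$GDP + 0.6 * happy$SocialSupport +
  1.1 * happy$Health + 0.9 * happy$Freedom + rnorm(n, sd = 0.5)

predictors <- setdiff(names(happy), "Score")

# Fit Score ~ <predictor> once per predictor and keep slope, R^2 and p-value.
simple_fits <- do.call(rbind, lapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "Score"), data = happy)
  s   <- summary(fit)
  data.frame(
    predictor = p,
    slope     = unname(coef(fit)[p]),
    r_squared = s$r.squared,
    p_value   = s$coefficients[p, "Pr(>|t|)"]
  )
}))

print(simple_fits, digits = 3)
```

Each row of `simple_fits` corresponds to one single-predictor model, which makes it easy to compare slopes and R² values side by side before moving on to the multiple regression.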
GDP (highest single-predictor R²): For every one-unit increase in GDP contribution, the Happiness Score rises by 1.668 points on average. GDP alone explains 47.42% of happiness variation, the highest R² of any single predictor. Residuals are evenly spread and near-normal, which supports the model's assumptions.
Social Support (significant, ★★★): Social support alone explains 47.37% of happiness variation, nearly identical to GDP, which suggests that social infrastructure matters about as much as economic strength. A one-unit increase corresponds to a 2.180-point higher happiness score.
Health (significant, ★★★): Health alone explains 43.27% of happiness variation, with a steep slope of 3.314 points per unit, second only to corruption (4.040) among the significant predictors. The Q-Q plot shows slight deviation at the lower tail, suggesting some left skew for low-health countries.
Freedom (significant, ★★★): Freedom alone explains 29.33% of happiness variation, which is significant but weaker than GDP, social support, and health individually. The residuals vs. fitted plot shows modest heteroscedasticity at low fitted values, reflecting greater variability in less-free countries.
Corruption (significant, ★★★): Higher corruption perception scores (less perceived corruption) are associated with meaningfully higher happiness scores. R² = 0.19 is the lowest among the significant predictors, but the slope (4.040) is the steepest among them, so differences in perceived corruption correspond to large differences in predicted happiness.
Generosity (not significant in the simple model): Generosity is the only predictor that fails to reach statistical significance in simple regression (p = 0.209). It explains just 0.18% of happiness variation. However, it becomes significant in the full multiple regression model (p < 0.001), suggesting that its association with happiness emerges only once the other predictors are controlled for.
The full model (GDP + SocialSupport + Health + Freedom + Generosity + Corruption + Region) explains 75.85% of happiness score variation. All six predictors are statistically significant (p < 0.001). Adding simplified Region dummies (Eastern Europe, Latin America, North America, Western Europe) raises adjusted R² by a further 7.03% in relative terms over the model without region.
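To make the "with and without Region" comparison concrete, here is a hedged base-R sketch. It uses synthetic data, assumed column names, and a hypothetical reference level called "Other"; the coefficients baked into the simulation are arbitrary and are not the project's estimates.

```r
# Synthetic stand-in data: NOT the World Happiness Report values.
set.seed(606)
n <- 300
regions <- c("Other", "Eastern Europe", "Latin America",
             "North America", "Western Europe")  # "Other" is a hypothetical baseline
happy <- data.frame(
  GDP           = runif(n, 0, 2),
  SocialSupport = runif(n, 0, 1.6),
  Health        = runif(n, 0, 0.9),
  Freedom       = runif(n, 0, 0.9),
  Generosity    = runif(n, 0, 0.4),
  Corruption    = runif(n, 0, 0.6),
  Region        = factor(sample(regions, n, replace = TRUE), levels = regions)
)
region_shift <- c(0, 0.4, 0.7, 0.6, 0.6)[as.integer(happy$Region)]
happy$Score <- 2.3 + 0.7 * happy$GDP + 0.6 * happy$SocialSupport +
  1.1 * happy$Health + 0.9 * happy$Freedom + 1.7 * happy$Generosity +
  1.0 * happy$Corruption + region_shift + rnorm(n, sd = 0.5)

# Six-factor model, then the same model with Region dummies added.
fit_base   <- lm(Score ~ GDP + SocialSupport + Health + Freedom +
                   Generosity + Corruption, data = happy)
fit_region <- update(fit_base, . ~ . + Region)

adj_base   <- summary(fit_base)$adj.r.squared
adj_region <- summary(fit_region)$adj.r.squared

# Relative change in adjusted R^2, in percent.
relative_gain <- (adj_region / adj_base - 1) * 100

# Partial F-test: do the Region dummies jointly improve the fit?
print(anova(fit_base, fit_region))
```

Because `Region` is a factor, `lm()` expands it into one dummy column per non-reference level, so each region coefficient is read as a shift relative to the baseline level.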
A permutation test comparing High vs. Low income groups (split by median GDP, n = 433 and 435 respectively) finds a statistically significant difference: High group mean = 6.19, Low group mean = 4.88, Δ = 1.31 points. The p-value from the null distribution is effectively zero, so a gap this large is very unlikely to arise by chance alone.
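The logic of the permutation test can be sketched in base R. The project itself lists the infer package for this step; the version below swaps in plain `sample()` and `replicate()` so it runs without extra packages, and it uses synthetic data with assumed variable names (`gdp`, `score`).

```r
# Synthetic stand-in data: NOT the World Happiness Report values.
set.seed(606)
n     <- 400
gdp   <- runif(n, 0, 2)
score <- 4 + 1.2 * gdp + rnorm(n, sd = 0.8)

# Median split into income groups.
income_group <- ifelse(gdp > median(gdp), "High", "Low")

diff_in_means <- function(y, g) mean(y[g == "High"]) - mean(y[g == "Low"])
observed <- diff_in_means(score, income_group)

# Null distribution: shuffle the group labels, which breaks any link
# between group and score, and recompute the difference each time.
reps       <- 2000
null_diffs <- replicate(reps, diff_in_means(score, sample(income_group)))

# Two-sided p-value; the +1 correction keeps it from being exactly zero.
p_value <- (sum(abs(null_diffs) >= abs(observed)) + 1) / (reps + 1)

cat("Observed difference:", round(observed, 3), "\n")
cat("Permutation p-value:", signif(p_value, 3), "\n")
```

With a finite number of permutations the smallest reportable p-value is 1 / (reps + 1), which is one way to read "effectively zero" in a permutation setting.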
| Model | Adj R² | Relative improvement in Adj R² | Key Insight |
|---|---|---|---|
| GDP only | 0.4736 | — | Baseline economic model |
| + Health | 0.5985 | +26.37% | Largest single-predictor gain |
| + Freedom | 0.6590 | +10.10% | Governance adds significant power |
| + Social Support | 0.6840 | +3.79% | Partial overlap with GDP/Health |
| + Generosity | 0.7010 | +2.49% | Becomes significant in MLR context |
| + Corruption | 0.7081 | +1.01% | Incremental governance signal |
| + Region (simplified) | 0.7579 | +7.03% | Cultural/geographic context matters |
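The improvement column is consistent with the relative change in adjusted R² from one row to the next. The short check below recomputes it from the rounded values in the table; the variable names are illustrative, and small differences in the second decimal place are expected because the inputs are already rounded to four decimals.

```r
# Adjusted R^2 values copied from the table above.
adj_r2 <- c(
  "GDP only"              = 0.4736,
  "+ Health"              = 0.5985,
  "+ Freedom"             = 0.6590,
  "+ Social Support"      = 0.6840,
  "+ Generosity"          = 0.7010,
  "+ Corruption"          = 0.7081,
  "+ Region (simplified)" = 0.7579
)
reported <- c(26.37, 10.10, 3.79, 2.49, 1.01, 7.03)

# Relative change versus the previous row, in percent.
relative_gain <- (adj_r2[-1] / head(adj_r2, -1) - 1) * 100

print(data.frame(
  recomputed = round(relative_gain, 2),
  reported   = reported
))
```

For example, moving from GDP only to GDP + Health gives (0.5985 / 0.4736 − 1) × 100 ≈ 26.37, which matches the reported figure.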
| Predictor | Estimate | Std Error | t-value | Significance |
|---|---|---|---|---|
| (Intercept) | 2.299 | 0.093 | 24.59 | *** |
| GDP | 0.656 | 0.066 | 9.978 | *** |
| Social Support | 0.636 | 0.079 | 8.094 | *** |
| Health | 1.127 | 0.123 | 9.174 | *** |
| Freedom | 0.862 | 0.134 | 6.456 | *** |
| Generosity | 1.719 | 0.246 | 6.986 | *** |
| Corruption | 1.047 | 0.210 | 4.983 | *** |
| Region: Latin America | 0.688 | 0.076 | 9.113 | *** |
| Region: Eastern Europe | 0.443 | 0.083 | 5.321 | *** |
| Region: Western Europe | 0.615 | 0.100 | 6.169 | *** |
| Region: North America | 0.650 | 0.180 | 3.622 | *** |
| Region: Asia-Pacific | −0.028 | 0.065 | −0.430 | n.s. |
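As a reading aid for the table, each t-value is the estimate divided by its standard error. The sketch below recomputes the ratios for the six factor rows from the rounded table entries; the p-values use a normal approximation, which is an assumption made here because the residual degrees of freedom are not stated.

```r
# Values copied from the coefficient table above (six factor rows only).
coefs <- data.frame(
  term       = c("GDP", "Social Support", "Health",
                 "Freedom", "Generosity", "Corruption"),
  estimate   = c(0.656, 0.636, 1.127, 0.862, 1.719, 1.047),
  std_error  = c(0.066, 0.079, 0.123, 0.134, 0.246, 0.210),
  t_reported = c(9.978, 8.094, 9.174, 6.456, 6.986, 4.983)
)

# t = estimate / standard error; small gaps come from rounding in the table.
coefs$t_recomputed <- coefs$estimate / coefs$std_error

# Approximate two-sided p-value (normal approximation, assumed large sample).
coefs$p_approx <- 2 * pnorm(-abs(coefs$t_recomputed))

print(coefs, digits = 4)
```

All six recomputed ratios land close to the reported t-values and give approximate p-values well below 0.001, in line with the *** markers.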
Diagnostic plots were built with augment() and patchwork for 2×2 grids. Conditions for linearity, normality, and equal variance were confirmed for all five significant predictors.
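A 2×2 diagnostic grid of this kind can be approximated in base R. Note the swap: this sketch uses `fitted()`, `resid()`, and `par(mfrow = ...)` rather than augment() and patchwork, so that it runs without additional packages, and the data are synthetic with assumed variable names.

```r
# Synthetic stand-in data: NOT the World Happiness Report values.
set.seed(606)
n     <- 200
gdp   <- runif(n, 0, 2)
score <- 3.5 + 1.7 * gdp + rnorm(n, sd = 0.8)

fit <- lm(score ~ gdp)

# Fitted values and residuals, the same quantities augment() would
# return as .fitted and .resid.
diag_df <- data.frame(fitted = fitted(fit), resid = resid(fit))

op <- par(mfrow = c(2, 2))                      # 2x2 plotting grid
plot(gdp, score, main = "Data and fitted line")
abline(fit)
plot(diag_df$fitted, diag_df$resid,
     main = "Residuals vs fitted",
     xlab = "Fitted", ylab = "Residual")        # linearity, equal variance
abline(h = 0, lty = 2)
hist(diag_df$resid, main = "Histogram of residuals",
     xlab = "Residual")                         # rough normality check
qqnorm(diag_df$resid)                           # normality in the tails
qqline(diag_df$resid)
par(op)                                         # restore plotting settings
```

A residuals vs. fitted panel with no curve or funnel shape, and Q-Q points close to the reference line, are the visual signs that the linearity, equal-variance, and normality conditions are reasonable.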
| Item | Details |
|---|---|
| Language | R · R Markdown |
| Packages | tidyverse · ggplot2 · GGally · broom · patchwork · glmnet · infer · readxl |
| Dataset | World Happiness Report 2019–2024 · 1,969 obs × 14 vars |
| Source | worldhappiness.report/data-sharing · Gallup World Poll · SDSN |
| Models | 6× Simple Linear Regression · Multiple Linear Regression + Region |
| Inference | Permutation test (infer) · High/Low income group split by median GDP |
| Diagnostics | Residuals vs Fitted · Histogram of Residuals · Q-Q Plot (per model) |
| Course | DATA 606 · Data Analysis · CUNY School of Professional Studies · 2024 |
Full R Markdown report on RPubs · source code on GitHub.