R · Statistics · CUNY DATA 606

World Happiness Report: Economic, Social & Governance Influences

A statistical analysis of the World Happiness Report dataset (1,969 observations across 163 countries, 2019–2024) investigating how GDP, social support, healthy life expectancy, freedom, generosity, and perceptions of corruption predict national happiness scores. Multiple linear regression shows that these six factors together explain about 71% of the variation in happiness scores (adjusted R² = 0.7081), and a permutation-based hypothesis test shows that high-income nations score significantly higher than low-income peers.

R · Multiple Regression · Hypothesis Testing · ggplot2 · World Happiness Report · CUNY DATA 606

Language

R · R Markdown

Dataset

World Happiness Report · 2019–2024

Methods

Simple & Multiple Linear Regression · Permutation Test

Course

DATA 606 · CUNY SPS · Final Project

Data Source

worldhappiness.report · Gallup World Poll

Links

GitHub Repo ↗ RPubs Report ↗

01 About

The World Happiness Report is an annual publication by the Sustainable Development Solutions Network (SDSN), based on Gallup World Poll surveys. Happiness is self-reported as a Cantril Ladder score: respondents rate their current life on a scale from 0 (worst possible) to 10 (best possible). The report also attributes each country's score to six contributing factors, enabling both descriptive and inferential analysis.

This project addresses two research questions: (1) How do economic, social, and governance factors jointly influence national happiness scores? (2) Is there a statistically significant difference in happiness between high-income and low-income countries? The dataset spans 163 countries across six years (2019–2024) with 1,969 complete observations after cleaning.

The analytical approach moves from exploratory visualization (a ggpairs correlation matrix) through six simple linear regressions, one per predictor, each with full model diagnostics (residuals vs. fitted, Q-Q plot, histogram). These single-predictor models establish baselines before fitting a full multiple regression model that includes all six factors plus a categorical Region variable, which achieves an adjusted R² of about 0.76.

For the hypothesis test, countries are split into High and Low income groups at the median GDP. A permutation-based test evaluates whether the observed difference in mean happiness scores (Δ = 1.31 points) could plausibly arise by chance. The p-value is effectively zero, so the null hypothesis of equal group means is rejected.

02 Dataset

World Happiness Report Data

1,969 observations × 14 variables · 163 countries · 2019–2024 · Source: worldhappiness.report/data-sharing

1,969 Observations · 163 Countries · 6 Years (2019–2024) · 6 Predictors

Dependent Variable

Ladder Score (Happiness)

Self-assessed life rating on the Cantril Ladder: 0 = worst possible life, 10 = best possible life. Global range in dataset: 1.72 (Afghanistan, 2023) to 7.74 (Finland, 2024).

Economic

Log GDP per Capita

Log-transformed GDP per capita, scaled to the happiness score contribution. Measures how much each country's economic output per person contributes to its Ladder score.

Social

Social Support

Survey response to: "If you were in trouble, do you have relatives or friends you can count on?" Reflects the strength of personal support networks within a country.

Health

Healthy Life Expectancy

Number of years an individual is expected to live in good health, combining length and quality of life using WHO data and Gallup polling on perceived health.

Governance

Freedom to Make Life Choices

National average of responses to: "Are you satisfied or dissatisfied with your freedom to choose what you do with your life?" Covers personal, political, and civil freedoms.

Governance

Perceptions of Corruption

Average of two binary questions on perceived government and business corruption. Higher values indicate lower corruption perception — a country is seen as cleaner.

03 Simple Regressions

Six simple linear regression models were fitted, one predictor at a time, to isolate each factor's individual relationship with the Happiness Score. Each model was checked with full diagnostics: residuals vs. fitted values (linearity and constant variance), plus a histogram and Q-Q plot of residuals (normality). The five significant predictors broadly meet the regression conditions, with minor departures noted for Health and Freedom below. Generosity is the sole non-significant predictor (p = 0.209, R² ≈ 0.002).
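The one-predictor-at-a-time workflow described above can be sketched in base R. This is a minimal illustration, not the project's code: the data frame `happy`, its column names, and all of its values are simulated stand-ins so that the block runs on its own, and the report itself mentions broom and patchwork for the diagnostic plots rather than base graphics.

```r
# Simulated stand-in data; column names and values are hypothetical.
set.seed(606)
n <- 200
happy <- data.frame(
  gdp        = runif(n, 0, 2),
  support    = runif(n, 0, 1.6),
  generosity = runif(n, 0, 0.4)
)
# In this toy example the score depends on gdp and support only;
# generosity is pure noise.
happy$score <- 3.5 + 1.2 * happy$gdp + 0.9 * happy$support + rnorm(n, sd = 0.6)

predictors <- c("gdp", "support", "generosity")

# Fit one simple regression per predictor and collect the headline statistics.
fit_one <- function(p) {
  fit <- lm(reformulate(p, response = "score"), data = happy)
  s <- summary(fit)
  data.frame(
    predictor = p,
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = s$r.squared,
    p_value   = s$coefficients[2, "Pr(>|t|)"]
  )
}
results <- do.call(rbind, lapply(predictors, fit_one))
print(results)

# Base R diagnostics for one model: residuals vs. fitted, Q-Q plot, histogram.
fit_gdp <- lm(score ~ gdp, data = happy)
op <- par(mfrow = c(1, 3))
plot(fitted(fit_gdp), resid(fit_gdp), xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(resid(fit_gdp))
qqline(resid(fit_gdp))
hist(resid(fit_gdp), main = "Residuals", xlab = "Residual")
par(op)
```

Collecting the six fits into one table makes it easy to compare R² and p-values side by side before moving on to the multiple regression.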

GDP

Log GDP per Capita

Ŷ = 3.499 + 1.668 × GDP

R² = 0.4742 | F = 781.1 | p < 2e-16

For every one-unit increase in GDP contribution, the predicted Happiness Score rises by 1.668 points. GDP alone explains 47.42% of happiness variation, the highest R² of the six single-predictor models (marginally ahead of Social Support). Residuals are evenly spread around zero and near-normal, which supports the model's assumptions.

Highest R² (single predictor)

Social Support

Ŷ = 3.188 + 2.180 × SocialSupport

R² = 0.4737 | F = 779.5 | p < 2e-16

Social support explains 47.37% of happiness variation — nearly identical to GDP — underscoring that social infrastructure matters as much as economic strength. A one-unit increase corresponds to 2.180 points of additional happiness score.

Significant ★★★

Health

Healthy Life Expectancy

Ŷ = 3.736 + 3.314 × Health

R² = 0.4327 | F = 660.4 | p < 2e-16

Health explains 43.27% of happiness variation, with a steeper slope (3.314 per unit) than GDP or Social Support, though not as steep as Freedom (3.383) or Corruption (4.040). The Q-Q plot shows slight deviation at the lower tail, suggesting some left skew for low-health countries.

Significant ★★★

Freedom

Freedom to Make Life Choices

Ŷ = 3.629 + 3.383 × Freedom

R² = 0.2933 | F = 359.4 | p < 2e-16

Freedom explains 29.33% of happiness — significant but weaker than GDP, social support, and health individually. The residuals vs. fitted plot shows modest heteroscedasticity at low fitted values, reflecting greater variability in less-free countries.

Significant ★★★

Corruption

Perceptions of Corruption

Ŷ = 4.952 + 4.040 × Corruption

R² = 0.1867 | F = 198.8 | p < 2e-16

Higher corruption-perception scores (that is, less perceived corruption) are associated with higher happiness scores. The R² of 0.19 is the lowest among the significant predictors, although the slope (4.040) is the steepest of the six simple models. Slope size depends on each predictor's scale, so slopes are not directly comparable across predictors.

Significant ★★★

Generosity

Ŷ = 5.450 + 0.553 × Generosity

R² = 0.0018 | F = 1.583 | p = 0.209

Generosity is the only predictor that fails to reach statistical significance in simple regression (p = 0.209). It explains just 0.18% of happiness variation. However, it becomes significant in the full multiple regression model (p < 0.001), suggesting that its association with happiness only emerges once the other predictors are controlled for.

Not Significant (simple model)
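The reported statistics for the six simple models can be cross-checked against one another, because with a single predictor F = R² / (1 − R²) × (n − 2) and |r| = √R². The sketch below assumes n = 868 rows (the sum of the two income-group sizes, 433 + 435); that sample size is inferred from the reported figures, not stated in the report.

```r
# Reported R-squared and F statistics from the six simple regressions.
reported <- data.frame(
  predictor = c("GDP", "SocialSupport", "Health", "Freedom",
                "Corruption", "Generosity"),
  r_squared = c(0.4742, 0.4737, 0.4327, 0.2933, 0.1867, 0.0018),
  f_stat    = c(781.1, 779.5, 660.4, 359.4, 198.8, 1.583)
)

# Assumption: n = 433 + 435 = 868 rows (inferred, not stated in the report).
n <- 868

# With one predictor, F = R^2 / (1 - R^2) * (n - 2).
reported$f_implied <- with(reported, r_squared / (1 - r_squared) * (n - 2))

# Magnitude of the Pearson correlation implied by a simple regression.
reported$abs_r <- sqrt(reported$r_squared)

print(transform(reported,
                f_implied = round(f_implied, 1),
                abs_r     = round(abs_r, 3)))
```

Under that assumption the implied F values agree with the reported ones to within the rounding of R², and the implied correlations line up with the ggpairs values quoted for GDP, Social Support, and Health.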

04 Key Results

Overall Findings

Multiple Regression — Adj R²

0.7585 with Region

The full model (GDP + SocialSupport + Health + Freedom + Generosity + Corruption + Region) explains about 76% of happiness score variation. All six core predictors are statistically significant (p < 0.001). Adding the simplified Region variable raises adjusted R² from 0.7081 to 0.7579, a 7.03% relative improvement over the non-region model. The Latin America, Eastern Europe, Western Europe, and North America terms are significant, while Asia-Pacific is not.

Hypothesis Test — Income vs Happiness

Reject H₀ — p ≈ 0

A permutation test comparing High vs. Low income groups (split by median GDP, n = 433 and 435 respectively) finds a statistically significant difference: High group mean = 6.19, Low group mean = 4.88, Δ = 1.31 points. The p-value from the null distribution is effectively zero — this gap cannot plausibly arise by chance.

Sequential R² Improvement — Adding Predictors

Model	Adj R²	Relative Improvement	Key Insight
GDP only	0.4736	—	Baseline economic model
+ Health	0.5985	+26.37%	Largest single-predictor gain
+ Freedom	0.6590	+10.10%	Governance adds significant power
+ Social Support	0.6840	+3.79%	Partial overlap with GDP/Health
+ Generosity	0.7010	+2.49%	Becomes significant in MLR context
+ Corruption	0.7081	+1.01%	Incremental governance signal
+ Region (simplified)	0.7579	+7.03%	Cultural/geographic context matters
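The improvement column reads as a relative change in adjusted R² from one row to the next, not a change in percentage points. A short sketch that recomputes both from the adjusted R² values in the table (small last-digit differences are expected, since the table was presumably built from unrounded values):

```r
# Adjusted R-squared values as reported in the sequential table.
steps <- data.frame(
  model  = c("GDP only", "+ Health", "+ Freedom", "+ Social Support",
             "+ Generosity", "+ Corruption", "+ Region (simplified)"),
  adj_r2 = c(0.4736, 0.5985, 0.6590, 0.6840, 0.7010, 0.7081, 0.7579)
)

# Adjusted R-squared of the previous row (NA for the baseline model).
prev <- c(NA, head(steps$adj_r2, -1))

# Relative change versus the previous row, in percent.
steps$relative_pct <- round(100 * (steps$adj_r2 - prev) / prev, 2)

# Absolute change in adjusted R-squared, in percentage points.
steps$absolute_pts <- round(100 * (steps$adj_r2 - prev), 2)

print(steps)
```

For example, the +7.03% for Region corresponds to roughly 5 percentage points of adjusted R² (0.7081 to 0.7579). The gains also depend on the order in which predictors enter the model.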

Multiple Regression Coefficients — Full Model

Predictor	Estimate	Std Error	t-value	Significance
(Intercept)	2.299	0.093	24.59	***
GDP	0.656	0.066	9.978	***
Social Support	0.636	0.079	8.094	***
Health	1.127	0.123	9.174	***
Freedom	0.862	0.134	6.456	***
Generosity	1.719	0.246	6.986	***
Corruption	1.047	0.210	4.983	***
Region: Latin America	0.688	0.076	9.113	***
Region: Eastern Europe	0.443	0.083	5.321	***
Region: Western Europe	0.615	0.100	6.169	***
Region: North America	0.650	0.180	3.622	***
Region: Asia-Pacific	−0.028	0.065	−0.430	n.s.
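The Region rows in the table are dummy (indicator) terms: each one is the expected difference in happiness score from a baseline region, holding the other predictors fixed. A minimal sketch of how R expands a factor into such columns follows. The baseline label "Other" and the sample rows are hypothetical, since the report does not name its reference level.

```r
# Hypothetical region labels; the baseline level "Other" is an assumption.
region_levels <- c("Other", "Latin America", "Eastern Europe",
                   "Western Europe", "North America", "Asia-Pacific")
region <- factor(
  c("Other", "Latin America", "Eastern Europe", "Western Europe",
    "North America", "Asia-Pacific", "Other", "Latin America"),
  levels = region_levels
)

# model.matrix() expands the factor into an intercept plus one 0/1 column
# per non-baseline level (treatment contrasts, R's default for
# unordered factors).
X <- model.matrix(~ region)
print(colnames(X))
print(X[1:3, ])
```

A baseline row has zeros in every Region column, so its regional effect is absorbed into the intercept. A Latin America row has a 1 only in the Latin America column, which is where a coefficient such as 0.688 would apply.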

Hypothesis Test — Income Group vs Happiness

Permutation Test · α = 0.05 · Method: infer package (R)

H₀: μ_high = μ_low (no difference in mean happiness scores between high-income and low-income countries)

Hₐ: μ_high ≠ μ_low (the mean happiness scores differ)

High GDP group mean: 6.19 (n = 433)

Low GDP group mean: 4.88 (n = 435)

Observed difference (Δ): 1.31 points

p-value (permutation): ≈ 0 → Reject H₀

Conclusion: Significant difference confirmed
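The logic of the permutation test can be sketched in base R. This is an illustration under stated assumptions, not the project's code, which used the infer package. The scores are simulated, the group sizes and means mirror the reported summary, and the standard deviation of 0.9 and the 2,000 permutations are arbitrary choices.

```r
# Simulated stand-in data: group sizes and means follow the reported summary;
# the individual scores and the sd of 0.9 are made up for illustration.
set.seed(606)
scores <- c(rnorm(433, mean = 6.19, sd = 0.9),
            rnorm(435, mean = 4.88, sd = 0.9))
group <- rep(c("High", "Low"), times = c(433, 435))

# Test statistic: difference in mean score, High minus Low.
diff_in_means <- function(y, g) mean(y[g == "High"]) - mean(y[g == "Low"])

observed <- diff_in_means(scores, group)

# Null distribution: shuffle the group labels and recompute the difference.
reps <- 2000
null_diffs <- replicate(reps, diff_in_means(scores, sample(group)))

# Two-sided p-value; the +1 terms keep the estimate from being exactly zero.
p_value <- (sum(abs(null_diffs) >= abs(observed)) + 1) / (reps + 1)

cat("observed difference:", round(observed, 3), "\n")
cat("permutation p-value:", signif(p_value, 3), "\n")
```

With a finite number of permutations the smallest attainable p-value is about 1 / reps, so a result shown as "≈ 0" is more precisely read as "smaller than the resolution of the simulation".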

05 Contributions

01 Exploratory data analysis: produced a full ggpairs correlation matrix across all six predictors and the happiness score, identifying GDP (0.689), Social Support (0.688), and Health (0.658) as the strongest individual correlates, and showing that Generosity (0.043) has a near-zero correlation with the score.
02 Six simple linear regression models: each with full diagnostic plots (residuals vs. fitted, histogram, Q-Q) built with broom's augment() and arranged in 2×2 grids with patchwork. Conditions for linearity, normality, and equal variance were judged acceptable for all five significant predictors.
03 Multiple linear regression with Region: added a categorical Region variable (7 levels, later simplified to 5 by merging levels) using dummy coding. Achieved Adj R² = 0.7579, a 7.03% relative improvement in adjusted R² over the six core predictors alone.
04 Sequential predictor analysis: built an incremental table of the relative improvement in adjusted R² as each predictor is added to the model, showing that Health (+26.37%) and Freedom (+10.10%) add the most explanatory power after GDP in this order of entry.
05 Permutation-based hypothesis test: using the infer package, split countries by median GDP into High/Low income groups and tested H₀: μ_high = μ_low. Generated a null distribution via permutation and computed the p-value, confirming a statistically significant happiness gap of 1.31 points between groups.

Language	R · R Markdown
Packages	tidyverse · ggplot2 · GGally · broom · patchwork · glmnet · infer · readxl
Dataset	World Happiness Report 2019–2024 · 1,969 obs × 14 vars
Source	worldhappiness.report/data-sharing · Gallup World Poll · SDSN
Models	6× Simple Linear Regression · Multiple Linear Regression + Region
Inference	Permutation test (infer) · High/Low income group split by median GDP
Diagnostics	Residuals vs Fitted · Histogram of Residuals · Q-Q Plot (per model)
Course	DATA 606 · Data Analysis · CUNY School of Professional Studies · 2024

Read the full analysis

Full R Markdown report on RPubs · source code on GitHub.
