R · Statistics · CUNY DATA 606

World Happiness Report
Economic, Social & Governance Influences

A statistical analysis of the World Happiness Report dataset (1,969 observations across 163 countries, 2019–2024) investigating how GDP, social support, healthy life expectancy, freedom, generosity, and perceptions of corruption predict national happiness scores. Multiple linear regression shows that the six factors together explain about 71% of the variation in happiness scores (adjusted R² = 0.7081 before region is added), and a permutation test finds that high-income nations score significantly higher than low-income peers.

R · Multiple Regression · Hypothesis Testing · ggplot2 · World Happiness Report · CUNY DATA 606
Language: R · R Markdown
Dataset: World Happiness Report · 2019–2024
Methods: Simple & Multiple Linear Regression · Permutation Test
Course: DATA 606 · CUNY SPS · Final Project
Data Source: worldhappiness.report · Gallup World Poll
01 About

The World Happiness Report is an annual publication by the Sustainable Development Solutions Network (SDSN), based on Gallup World Poll surveys. Happiness is measured as a Cantril Ladder score: respondents rate their current life on a scale from 0 (worst possible) to 10 (best possible), and each country's score is the average of its respondents' ratings. The report also attributes each country's score to six contributing factors, enabling both descriptive and inferential analysis.

This project addresses two research questions: (1) How do economic, social, and governance factors jointly influence national happiness scores? (2) Is there a statistically significant difference in happiness between high-income and low-income countries? The dataset spans 163 countries across six years (2019–2024) with 1,969 complete observations after cleaning.

The analytical approach moves from exploratory visualization (ggpairs correlation matrix) through six simple linear regressions — one per predictor — each with full model diagnostics (residuals vs. fitted, Q-Q plot, histogram). These single-predictor models establish baselines before fitting a full multiple regression model including all six factors plus a Region dummy variable, which achieves an adjusted R² of 0.7585.

For the hypothesis test, observations are split into High and Low income groups at the median GDP value. A permutation test evaluates whether the observed difference in mean happiness scores (Δ = 1.31 points) could plausibly arise by chance. The p-value is extremely small, so the null hypothesis of equal group means is rejected.

02 Dataset
World Happiness Report Data
1,969 observations × 14 variables · 163 countries · 2019–2024 · Source: worldhappiness.report/data-sharing
1,969 Observations · 163 Countries · 6 Years (2019–2024) · 6 Predictors
Dependent Variable
Ladder Score (Happiness)

Self-assessed life rating on the Cantril Ladder: 0 = worst possible life, 10 = best possible life. Global range in dataset: 1.72 (Afghanistan, 2023) to 7.74 (Finland, 2024).

Economic
Log GDP per Capita

Log-transformed GDP per capita, scaled to the happiness score contribution. Measures how much each country's economic output per person contributes to its Ladder score.

Social
Social Support

Survey response to: "If you were in trouble, do you have relatives or friends you can count on?" Reflects the strength of personal support networks within a country.

Health
Healthy Life Expectancy

Number of years an individual is expected to live in good health, combining length and quality of life using WHO data and Gallup polling on perceived health.

Governance
Freedom to Make Life Choices

National average of responses to: "Are you satisfied or dissatisfied with your freedom to choose what you do with your life?" Covers personal, political, and civil freedoms.

Governance
Perceptions of Corruption

Average of two binary questions on perceived government and business corruption. Higher values indicate lower corruption perception — a country is seen as cleaner.

03 Simple Regressions

Six simple linear regression models were fitted, one predictor at a time, to isolate each factor's individual relationship with the Happiness Score. Each model was checked with a full set of diagnostics: residuals vs. fitted values (linearity and constant variance), plus a histogram and Q-Q plot of residuals (normality). The significant predictors broadly meet the regression conditions, with minor deviations noted for individual models. Generosity is the sole non-significant predictor (p = 0.209, R² ≈ 0.002).
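
The fit-then-diagnose workflow described above can be sketched in base R. This is a minimal illustration on simulated data, not the project's code: the data frame `happy`, the column names `gdp` and `ladder`, and all simulated values are assumptions made for the example.

```r
# Simulated stand-in for the cleaned data; names and values are hypothetical.
set.seed(606)
n <- 200
happy <- data.frame(gdp = runif(n, 0, 2))
happy$ladder <- 3.5 + 1.7 * happy$gdp + rnorm(n, sd = 0.8)

# Simple linear regression: one predictor at a time.
fit <- lm(ladder ~ gdp, data = happy)
s <- summary(fit)
s$coefficients   # estimate, std. error, t value, p-value
s$r.squared      # share of variance explained by the single predictor

# The three diagnostics named above, drawn to a null PDF device
# so the script also runs non-interactively.
pdf(NULL)
par(mfrow = c(1, 3))
plot(fitted(fit), resid(fit), xlab = "Fitted", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
hist(resid(fit), main = "Histogram of residuals", xlab = "Residual")
qqnorm(resid(fit))
qqline(resid(fit))
invisible(dev.off())
```

Repeating the same block with a different predictor on the right-hand side of the formula gives one model per factor.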

GDP
Log GDP per Capita
Ŷ = 3.499 + 1.668 × GDP
R² = 0.4742  |  F = 781.1  |  p < 2e-16

For every one-unit increase in GDP contribution, the Happiness Score rises by 1.668 points. GDP explains 47.42% of happiness variation on its own, the highest R² of any single predictor. Residuals are evenly distributed and near-normal, which supports the linear model's assumptions.

Highest R² (single predictor)
Social Support
Social Support
Ŷ = 3.188 + 2.180 × SocialSupport
R² = 0.4737  |  F = 779.5  |  p < 2e-16

Social support explains 47.37% of happiness variation — nearly identical to GDP — underscoring that social infrastructure matters as much as economic strength. A one-unit increase corresponds to 2.180 points of additional happiness score.

Significant ★★★
Health
Healthy Life Expectancy
Ŷ = 3.736 + 3.314 × Health
R² = 0.4327  |  F = 660.4  |  p < 2e-16

Health explains 43.27% of happiness variation with a steep slope (3.314 per unit), larger than the slopes for GDP (1.668) and social support (2.180) but smaller than those for freedom (3.383) and corruption (4.040). The Q-Q plot shows slight deviation at the lower tail, suggesting some left skew for low-health countries.

Significant ★★★
Freedom
Freedom to Make Life Choices
Ŷ = 3.629 + 3.383 × Freedom
R² = 0.2933  |  F = 359.4  |  p < 2e-16

Freedom explains 29.33% of happiness — significant but weaker than GDP, social support, and health individually. The residuals vs. fitted plot shows modest heteroscedasticity at low fitted values, reflecting greater variability in less-free countries.

Significant ★★★
Corruption
Perceptions of Corruption
Ŷ = 4.952 + 4.040 × Corruption
R² = 0.1867  |  F = 198.8  |  p < 2e-16

Higher corruption perception scores (less corrupt) associate with meaningfully higher happiness scores. R² = 0.19 is the weakest of the significant predictors, but the steep slope (4.040) indicates corruption has a strong directional impact when it varies.

Significant ★★★
Generosity
Generosity
Ŷ = 5.450 + 0.553 × Generosity
R² = 0.0018  |  F = 1.583  |  p = 0.209

Generosity is the only predictor that fails to reach statistical significance in simple regression (p = 0.209). It explains just 0.18% of happiness variation. However, it becomes significant in the full multiple regression model (p < 0.001), suggesting that its association with happiness is masked until the other predictors are controlled for.

Not Significant (simple model)
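
One way a predictor can look negligible on its own yet matter in a multiple regression is sketched below on simulated data. The variables `x1`, `g`, and `y` and the negative relationship between `x1` and `g` are assumptions chosen to produce the pattern. They are not estimates from the happiness data and do not claim to be the mechanism behind the generosity result.

```r
set.seed(1)
n <- 500
x1 <- rnorm(n)                                # stand-in for a strong predictor
g  <- -0.5 * x1 + rnorm(n, sd = sqrt(0.75))   # negatively related to x1
y  <- 1.0 * x1 + 0.5 * g + rnorm(n)           # g has a true partial effect of 0.5

# Slope for g alone vs. slope for g after controlling for x1.
simple   <- summary(lm(y ~ g))$coefficients["g", ]
adjusted <- summary(lm(y ~ x1 + g))$coefficients["g", ]
rbind(simple, adjusted)
```

In this construction the marginal covariance between `g` and `y` is zero by design, so the simple slope hovers near zero while the adjusted slope recovers the partial effect.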
04 Key Results
Overall Findings
Multiple Regression — Adj R²
0.7585 with Region

The full model (GDP + SocialSupport + Health + Freedom + Generosity + Corruption + Region) explains 75.85% of happiness score variation. All six predictors are statistically significant (p < 0.001). Adding a simplified Region dummy (Eastern Europe, Latin America, North America, Western Europe) yields a further 7.03% relative improvement in adjusted R² over the non-region model.
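
Comparing a model with and without a region term can be sketched as follows. The data are simulated, only two numeric predictors are used to keep the example short, and the names `happy`, `gdp`, `support`, `region`, and the region effects in `bump` are all assumptions for illustration.

```r
set.seed(606)
n <- 300
happy <- data.frame(
  gdp     = runif(n, 0, 2),
  support = runif(n, 0, 1.5),
  region  = factor(sample(c("Other", "Latin America", "Western Europe"),
                          n, replace = TRUE))
)
# Make "Other" the reference level so each region term is a contrast against it.
happy$region <- relevel(happy$region, ref = "Other")

# Hypothetical region effects used only to generate the outcome.
bump <- c("Other" = 0, "Latin America" = 0.7, "Western Europe" = 0.6)
happy$ladder <- 2.3 + 0.7 * happy$gdp + 0.6 * happy$support +
  unname(bump[as.character(happy$region)]) + rnorm(n, sd = 0.5)

base_fit   <- lm(ladder ~ gdp + support, data = happy)
region_fit <- lm(ladder ~ gdp + support + region, data = happy)

adj <- c(base        = summary(base_fit)$adj.r.squared,
         with_region = summary(region_fit)$adj.r.squared)
adj
anova(base_fit, region_fit)   # F-test for the added region terms
```

Because `region` is a factor, `lm()` expands it into one dummy column per non-reference level, which is why a coefficient table lists each region as its own row.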

Hypothesis Test — Income vs Happiness
Reject H₀ — p ≈ 0

A permutation test comparing High vs. Low income groups (split by median GDP, n = 433 and 435 respectively) finds a statistically significant difference: High group mean = 6.19, Low group mean = 4.88, Δ = 1.31 points. The p-value from the null distribution is effectively zero, so a gap this large is very unlikely to arise by chance alone.

Sequential R² Improvement — Adding Predictors
Model | Adj R² | Improvement | Key Insight
GDP only | 0.4736 | n/a | Baseline economic model
+ Health | 0.5985 | +26.37% | Largest single-predictor gain
+ Freedom | 0.6590 | +10.10% | Governance adds significant power
+ Social Support | 0.6840 | +3.79% | Partial overlap with GDP/Health
+ Generosity | 0.7010 | +2.49% | Becomes significant in MLR context
+ Corruption | 0.7081 | +1.01% | Incremental governance signal
+ Region (simplified) | 0.7579 | +7.03% | Cultural/geographic context matters
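
The Improvement column is consistent with a relative change in adjusted R² from one row to the next, (new − old) / old, rather than a difference in percentage points. A short check using only the adjusted R² values listed in the table:

```r
adj_r2 <- c("GDP only" = 0.4736, "+ Health" = 0.5985, "+ Freedom" = 0.6590,
            "+ Social Support" = 0.6840, "+ Generosity" = 0.7010,
            "+ Corruption" = 0.7081, "+ Region (simplified)" = 0.7579)

# Relative gain of each step over the previous model, in percent.
rel_gain <- diff(adj_r2) / head(adj_r2, -1) * 100
round(rel_gain, 2)
```

The recomputed gains agree with the table to within rounding of the four-decimal adjusted R² values.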
Multiple Regression Coefficients — Full Model
Predictor | Estimate | Std Error | t-value | Significance
(Intercept) | 2.299 | 0.093 | 24.59 | ***
GDP | 0.656 | 0.066 | 9.978 | ***
Social Support | 0.636 | 0.079 | 8.094 | ***
Health | 1.127 | 0.123 | 9.174 | ***
Freedom | 0.862 | 0.134 | 6.456 | ***
Generosity | 1.719 | 0.246 | 6.986 | ***
Corruption | 1.047 | 0.210 | 4.983 | ***
Region: Latin America | 0.688 | 0.076 | 9.113 | ***
Region: Eastern Europe | 0.443 | 0.083 | 5.321 | ***
Region: Western Europe | 0.615 | 0.100 | 6.169 | ***
Region: North America | 0.650 | 0.180 | 3.622 | ***
Region: Asia-Pacific | −0.028 | 0.065 | −0.430 | n.s.
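
Each t-value in the table is the estimate divided by its standard error. The sketch below recomputes that ratio from the rounded figures shown; the column names are chosen for this example, and small gaps against the reported t-values are expected because the published estimates and standard errors are rounded to three decimals.

```r
coefs <- data.frame(
  term = c("(Intercept)", "GDP", "Social Support", "Health", "Freedom",
           "Generosity", "Corruption", "Region: Latin America",
           "Region: Eastern Europe", "Region: Western Europe",
           "Region: North America", "Region: Asia-Pacific"),
  estimate   = c(2.299, 0.656, 0.636, 1.127, 0.862, 1.719, 1.047,
                 0.688, 0.443, 0.615, 0.650, -0.028),
  std_error  = c(0.093, 0.066, 0.079, 0.123, 0.134, 0.246, 0.210,
                 0.076, 0.083, 0.100, 0.180, 0.065),
  t_reported = c(24.59, 9.978, 8.094, 9.174, 6.456, 6.986, 4.983,
                 9.113, 5.321, 6.169, 3.622, -0.430)
)

# t = estimate / standard error
coefs$t_recomputed <- round(coefs$estimate / coefs$std_error, 2)
coefs[, c("term", "t_reported", "t_recomputed")]
```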
Hypothesis Test — Income Group vs Happiness
Permutation Test · α = 0.05 · Method: infer package (R)
H₀: μ_high = μ_low
(Mean happiness scores are equal in high-income and
low-income countries)

Hₐ: μ_high ≠ μ_low
(Mean happiness scores differ between the two groups)
High GDP group mean: 6.19 (n = 433)
Low GDP group mean: 4.88 (n = 435)
Observed difference (Δ): 1.31 points
p-value (permutation): ≈ 0 → Reject H₀
Conclusion: Significant difference confirmed
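
The logic of the permutation test can be sketched in base R. The project itself used the infer package, so this is a hand-rolled equivalent under stated assumptions rather than the original code: the scores are simulated, the group sizes and means echo the figures above, and the standard deviation of 0.9 and the 2,000 permutations are made-up choices.

```r
set.seed(606)
# Simulated scores; only the group sizes and target means come from the report.
high  <- rnorm(433, mean = 6.19, sd = 0.9)
low   <- rnorm(435, mean = 4.88, sd = 0.9)
score <- c(high, low)
group <- rep(c("High", "Low"), times = c(433, 435))

diff_in_means <- function(score, group) {
  mean(score[group == "High"]) - mean(score[group == "Low"])
}
observed <- diff_in_means(score, group)

# Null distribution: shuffle the group labels and recompute the difference.
reps <- 2000
null_diffs <- replicate(reps, diff_in_means(score, sample(group)))

# Two-sided p-value with a +1 correction, so it is never reported as exactly 0.
p_value <- (sum(abs(null_diffs) >= abs(observed)) + 1) / (reps + 1)
c(observed = observed, p_value = p_value)
```

With a finite number of permutations the smallest attainable p-value is 1 / (reps + 1), so a result of "≈ 0" is more precisely read as "smaller than the resolution of the simulation".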
05 Contributions
Language: R · R Markdown
Packages: tidyverse · ggplot2 · GGally · broom · patchwork · glmnet · infer · readxl
Dataset: World Happiness Report 2019–2024 · 1,969 obs × 14 vars
Source: worldhappiness.report/data-sharing · Gallup World Poll · SDSN
Models: 6× Simple Linear Regression · Multiple Linear Regression + Region
Inference: Permutation test (infer) · High/Low income group split by median GDP
Diagnostics: Residuals vs Fitted · Histogram of Residuals · Q-Q Plot (per model)
Course: DATA 606 · Data Analysis · CUNY School of Professional Studies · 2024

Read the full analysis

Full R Markdown report on RPubs · source code on GitHub.