R · Quarto · Shiny · CUNY DATA 607

Housing Affordability
Dynamics in New York State

A multi-source analysis integrating Census ACS, FRED macroeconomic data, and HUD Fair Market Rent benchmarks to investigate what drives housing burden across all 62 New York counties — with a novel application of machine unlearning to assess how 2022–2024 interest rate shocks shaped predictive models.

R · Quarto · Shiny · Machine Learning · Machine Unlearning · Housing Policy · CUNY DATA 607
Language: R · Quarto
Data Sources: Census ACS · FRED · HUD FMR
Models: Linear Regression · Random Forest
Course: CUNY DATA 607 Final Project
Status: Published · Live
01 About

This is the final project for CUNY's DATA 607 — but the research question came from genuine curiosity, not a rubric. Housing affordability is one of the defining pressures on New York residents, and I wanted to understand it rigorously: what actually drives the gap between what people earn and what they pay to live here?

The project integrates three heterogeneous government data sources — Census ACS 5-Year estimates for 62 counties (2009–2024), FRED API data for mortgage rates and inflation, and HUD Fair Market Rent schedules as a policy-grounded rental benchmark — into a single unified panel dataset built from scratch via API calls and GitHub-hosted CSVs.
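As a minimal sketch of how such a panel can be assembled, the base R snippet below joins three toy tables on their shared keys. The data frames, column names (`median_rent`, `fmr_2br`, `mortgage_rate`), and values are all hypothetical stand-ins, not the project's actual schema or data; the real inputs came from API calls and hosted CSVs.

```r
# Hypothetical toy inputs standing in for the three sources (all values made up)
acs <- data.frame(
  county = c("Albany", "Albany", "Kings", "Kings"),
  year = c(2021, 2022, 2021, 2022),
  median_income = c(70000, 72000, 65000, 67000),
  median_rent = c(1100, 1150, 1700, 1800)
)
fred <- data.frame(
  year = c(2021, 2022),
  mortgage_rate = c(3.0, 5.3)
)
hud <- data.frame(
  county = c("Albany", "Albany", "Kings", "Kings"),
  year = c(2021, 2022, 2021, 2022),
  fmr_2br = c(1200, 1250, 2000, 2100)
)

# County-level sources join on county + year
panel <- merge(acs, hud, by = c("county", "year"))
# The macro series has no county dimension, so it joins on year only
# and its values repeat across every county in that year
panel <- merge(panel, fred, by = "year")

panel <- panel[order(panel$county, panel$year), ]
rownames(panel) <- NULL

# Sanity check: exactly one row per county-year
stopifnot(!anyDuplicated(panel[, c("county", "year")]))
```

Joining the macro series on `year` alone is what lets a statewide rate sit alongside county-level rents in the same row.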

The workflow follows an OSEMN framework (Obtain, Scrub, Explore, Model, Interpret) — progressing from raw API ingestion and cleaning, through exploratory visualization, into predictive modeling with tidymodels and a novel experiment in machine unlearning.

The machine unlearning component — selectively removing 2022–2024 high-interest-rate observations by zeroing out case weights — was the methodological centerpiece. It let me ask: how much of what these models "know" is rate-cycle specific versus structurally stable? That question has real implications for how HUD FMR benchmarks should be adjusted across rate-cycle transitions.
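The idea of zeroing case weights can be illustrated with a small base R sketch. It uses `lm()` with its `weights` argument as a stand-in for the project's tidymodels workflow, and the panel is synthetic: the variable names (`home_price`, `fmr`, `rate_era`) and the built-in 80-unit bump for 2022–2024 are assumptions made purely for illustration.

```r
set.seed(607)

# Synthetic county-year panel (hypothetical values, not the project's data)
panel <- expand.grid(county_id = 1:20, year = 2009:2024)
panel$home_price <- 150 + 10 * panel$county_id + rnorm(nrow(panel), sd = 5)
panel$rate_era <- panel$year >= 2022
# Rents get an extra bump in the high-rate years so there is something to "forget"
panel$fmr <- 500 + 4 * panel$home_price + 80 * panel$rate_era +
  rnorm(nrow(panel), sd = 20)

# Baseline: every observation carries weight 1
panel$w_full <- 1
fit_full <- lm(fmr ~ home_price, data = panel, weights = w_full)

# "Unlearned": 2022-2024 rows get weight 0, so they contribute nothing to the fit
panel$w_unlearn <- ifelse(panel$rate_era, 0, 1)
fit_unlearn <- lm(fmr ~ home_price, data = panel, weights = w_unlearn)

# For a weighted least-squares fit, zero weights are equivalent to dropping the rows
fit_drop <- lm(fmr ~ home_price, data = panel[!panel$rate_era, ])
stopifnot(isTRUE(all.equal(coef(fit_unlearn), coef(fit_drop))))

# Prediction shift: unlearned model minus full model, evaluated on the same rows
shift <- predict(fit_unlearn, newdata = panel) - predict(fit_full, newdata = panel)
mean_shift <- mean(shift)
```

Keeping the rows in the table and switching them off by weight means the same data frame, formula, and evaluation rows serve both fits, so the difference in predictions is attributable only to the down-weighted years.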

02 Research Questions

Two questions structure the modeling work — one about prediction, one about sensitivity. Both are answered through the same unified county-year panel of 62 New York counties from 2009 to 2024.

RQ 1
What structural and macroeconomic factors best predict a county's housing burden?
  • Local structural factors — home prices, population density, and regional rent levels — explain approximately 96% of housing burden variation across counties.
  • Macroeconomic variables such as mortgage rates and the Fed Funds Rate contributed minimally once local conditions were accounted for.
  • Random Forest outperformed Linear Regression on held-out data, which is consistent with nonlinear interactions between geographic and economic variables.
RQ 2
How sensitive are HUD FMR predictions to removal of high-interest-rate years (2022–2024)?
  • Machine unlearning produced a mean statewide prediction shift of −$70/month, indicating a modest but real influence of the rate-hike era on what the models learned.
  • Sensitivity was heavily concentrated in high-cost downstate counties. Rural and mid-size upstate markets were largely unaffected.
  • Both models maintained high R² post-unlearning, suggesting that FMR is driven primarily by structural county factors rather than by the interest rate environment.
03 Interactive App
Live Shiny Dashboard
Explore housing burden by county, year, and income group. Interact with the maps, filters, and model outputs directly.
04 What I practiced
Language: R · Quarto
Data: tidycensus, fredr, httr, jsonlite
Wrangling: tidyverse, lubridate, zoo, dplyr
Modeling: tidymodels, vip, randomForest
Spatial: leaflet, tigris, sf
Viz: ggplot2, patchwork, scales, gt
App: Shiny · ShinyApps.io
Publish: RPubs · GitHub
Dataset: Census ACS 5-Yr · FRED · HUD FMR (62 NY counties, 2009–2024)

Explore the full project

Published report, interactive app, and all source code are publicly available.