--- title: "ACS-Decennial-Block Harmonization" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{ACS-Decennial-Block Harmonization} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include=FALSE} knitr::opts_chunk$set(collapse=TRUE, comment = "#>", eval=file.exists(here::here("data-raw/fcc.csv"))) ``` This vignette examines the accuracy of the ACS-Decennial-Block harmonization calculated by `bl_harmonize_vars()`. We start where the [README](../) leaves off. ```{r eval=F} library(blockpop) library(dplyr) library(stringr) library(tidyr) acs_d = bl_download_acs_vars("WA") census_d = bl_download_2010_vars("WA") block_d = bl_download_fcc("data-raw/fcc.csv") %>% bl_load_state("WA", .) %>% bl_est_2020() %>% bl_harmonize_vars(census_d, acs_d) print(block_d) ``` ```{r setup, echo=F, message=F} library(blockpop) library(dplyr) library(stringr) library(tidyr) fcc_d = bl_load_state("WA", here::here("data-raw/fcc.csv")) block_d = bl_est_2020(fcc_d) acs_d = readr::read_rds(here::here("data-raw/acs.rds")) census_d = readr::read_rds(here::here("data-raw/census2010.rds")) block_d = bl_harmonize_vars(block_d, census_d, acs_d) print(block_d) ``` Then we join summarize our block-level estimates at the block group level and join them to the ACS data to compare. ```{r} sum_d = block_d %>% select(-starts_with("vap_")) %>% mutate(bgroup = str_sub(block, 1, 12)) %>% select(-block, -state) %>% group_by(bgroup) %>% summarize(across(everything(), sum)) %>% rename_with(~ str_c("est_", .), .cols=starts_with("pop_")) %>% inner_join(rename_with(acs_d, ~ str_c("acs_", .), .cols=-bgroup), by="bgroup") %>% pivot_longer(c(starts_with("est_pop_"), starts_with("acs_pop_")), names_to=c("source", "type", "race"), names_sep="_", values_to="count") %>% select(-type) %>% pivot_wider(names_from=source, values_from=count) %>% mutate(est_pct = est/pop2020, acs_pct = acs/acs_pop) ``` ```{r eval=T, include=F} if (exists("sum_d")) { readr::write_rds(sum_d, here::here("inst/extdata/sum.rds"), compress="gz") } else { sum_d = readr::read_rds(system.file("extdata", "sum.rds", package="blockpop")) } ``` Plotting, we see excellent agreement between the ACS and estimated demographic percentages at the block group level. In fact, the Spearman correlation is in excess of 99.9%. ```{r eval=T} library(ggplot2) ggplot(sum_d, aes(acs_pct, est_pct, color=race)) + geom_point(size=0.1, alpha=0.5) + coord_cartesian(expand=0) + guides(color=guide_legend(override.aes=list(size=5))) + theme_bw() ``` This agreement occurs with 2020 block total population estimates exactly equal to the projections from `bl_est_2020()`, and using block-level demographic estimates that are as close as possible to the 2010 Census counts. This means much more accuracy and granularity than simple population-based disaggregation from block-group-level ACS data.