Title: | Read Census Privacy Protected Microdata Files |
---|---|
Description: | Implements data processing described in <doi:10.1126/sciadv.abk3283> to align modern differentially private data with formatting of older US Census data releases. The primary goal is to read in Census Privacy Protected Microdata Files data in a reproducible way. This includes tools for aggregating to relevant levels of geography by creating geographic identifiers which match the US Census Bureau's numbering. Additionally, there are tools for grouping race numeric identifiers into categories, consistent with OMB (Office of Management and Budget) classifications. Functions exist for downloading and linking to existing sources of privacy protected microdata. |
Authors: | Christopher T. Kenny [aut, cre] |
Maintainer: | Christopher T. Kenny <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2024-11-03 03:04:54 UTC |
Source: | https://github.com/christopherkenny/ppmf |
Adds the GEOID identifier common to spatial census data sets, such as those loaded by tigris. This allows for easier merging or aggregation by a single variable.
add_geoid( ppmf, state = TABBLKST, county = TABBLKCOU, tract = TABTRACT, block_group = TABBLKGRP, block = TABBLK, level = "block" )
ppmf |
tibble of ppmf data |
state |
Column in ppmf with state (FIPS) ID. Default is TABBLKST. |
county |
Column in ppmf with county (FIPS) ID. Default is TABBLKCOU. |
tract |
Column in ppmf with tract ID. Default is TABTRACT. |
block_group |
Column in ppmf with block group ID. Default is TABBLKGRP. |
block |
Column in ppmf with block ID. Default is TABBLK. |
level |
Geographic level to write the GEOID for. Options are block (default), block_group, tract, and county. |
input data ppmf with added column GEOID
data(ppmf_ex)
ppmf_ex <- ppmf_ex |> add_geoid()
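The idea behind the identifier can be sketched in base R. This is an illustration only, not the package's implementation: `make_geoid()` is a hypothetical helper, the toy tract and block values are made up, and the sketch assumes the component columns are zero-padded character strings (2-digit state, 3-digit county, 6-digit tract, 4-digit block, with the block group being the first digit of the block).

```r
# Hypothetical helper, not part of ppmf: builds a GEOID by pasting
# zero-padded FIPS components together, truncated at the requested level.
make_geoid <- function(state, county, tract, block, level = "block") {
  switch(level,
    county      = paste0(state, county),
    tract       = paste0(state, county, tract),
    block_group = paste0(state, county, tract, substr(block, 1, 1)),
    block       = paste0(state, county, tract, block),
    stop("unknown level: ", level)
  )
}

# Toy row with made-up tract and block values
toy <- data.frame(
  TABBLKST = "01", TABBLKCOU = "105", TABTRACT = "000100", TABBLK = "1002",
  stringsAsFactors = FALSE
)
toy$GEOID <- make_geoid(toy$TABBLKST, toy$TABBLKCOU, toy$TABTRACT, toy$TABBLK)
geoid_bg <- make_geoid(toy$TABBLKST, toy$TABBLKCOU, toy$TABTRACT, toy$TABBLK,
                       level = "block_group")
```

A block-level GEOID built this way has 15 characters and a block-group GEOID has 12, which is why a single column is enough for merging at either level.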
Add ppmf12 path to Renviron
add_ppmf12_path(path, overwrite = FALSE, install = FALSE)
path |
path where ppmf12 data is stored |
overwrite |
Defaults to FALSE. Should existing ppmf12 in Renviron be overwritten? |
install |
Defaults to FALSE. Should ppmf12 be added to '~/.Renviron' file? |
path, invisibly
## Not run:
tp <- tempfile(fileext = '.csv')
add_ppmf12_path(tp)
path12 <- Sys.getenv('path12')
## End(Not run)
Add ppmf19 path to Renviron
add_ppmf19_path(path, overwrite = FALSE, install = FALSE)
path |
path where ppmf19 data is stored |
overwrite |
Defaults to FALSE. Should existing ppmf19 in Renviron be overwritten? |
install |
Defaults to FALSE. Should ppmf19 be added to '~/.Renviron' file? |
path, invisibly
## Not run:
tp <- tempfile(fileext = '.csv')
add_ppmf19_path(tp)
path19 <- Sys.getenv('path19')
## End(Not run)
Add ppmf19r path to Renviron. This is the path for the 19.61 replication in 2023.
add_ppmf19r_path(path, overwrite = FALSE, install = FALSE)
path |
path where ppmf19r data is stored |
overwrite |
Defaults to FALSE. Should existing ppmf19r in Renviron be overwritten? |
install |
Defaults to FALSE. Should ppmf19r be added to '~/.Renviron' file? |
path, invisibly
## Not run:
tp <- tempfile(fileext = '.csv')
add_ppmf19r_path(tp)
path19 <- Sys.getenv('path19')
## End(Not run)
Add ppmf4 path to Renviron
add_ppmf4_path(path, overwrite = FALSE, install = FALSE)
path |
path where ppmf4 data is stored |
overwrite |
Defaults to FALSE. Should existing ppmf4 in Renviron be overwritten? |
install |
Defaults to FALSE. Should ppmf4 be added to '~/.Renviron' file? |
path, invisibly
## Not run:
tp <- tempfile(fileext = '.csv')
add_ppmf4_path(tp)
path4 <- Sys.getenv('path4')
## End(Not run)
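The path helpers above store a file location in an environment variable so later calls can find the data. A minimal base R sketch of that mechanism, for the current session only, might look like the following. The variable name `MY_PPMF_PATH` is hypothetical and chosen purely for illustration; it is not a name used by the package, and nothing here writes to '~/.Renviron'.

```r
# Hypothetical variable name, used only for illustration.
tp <- tempfile(fileext = ".csv")

# Set the variable for the current R session only
Sys.setenv(MY_PPMF_PATH = tp)

# Later code can look the path up again by name
stored <- Sys.getenv("MY_PPMF_PATH")

# An unset variable comes back as an empty string
missing_value <- Sys.getenv("MY_PPMF_PATH_THAT_IS_NOT_SET")
```

Persisting the value across sessions is what the `install = TRUE` option is described as doing; the sketch deliberately stops short of that.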
Aggregate PPMF Data
agg(ppmf, group = GEOID, age = VOTING_AGE, race = CENRACE, hisp = CENHISP)
ppmf |
tibble of ppmf data |
group |
Column in ppmf to group by, typically GEOID |
age |
Column in ppmf containing 1 for not voting age and 2 for voting age |
race |
Column in ppmf containing race codes |
hisp |
Column in ppmf containing 1 for Not Hispanic and 2 for Hispanic |
tibble of ppmf data aggregated by group, with race classified into the following columns:
group
: named by entry group
pop
: total population
pop_hisp
: total population - Hispanic or Latino (of any race)
pop_white
: total population - White alone, not Hispanic or Latino
pop_black
: total population - Black or African American alone, not Hispanic or Latino
pop_aian
: total population - American Indian and Alaska Native alone, not Hispanic or Latino
pop_asian
: total population - Asian alone, not Hispanic or Latino
pop_nhpi
: total population - Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino
pop_other
: total population - Some Other Race alone, not Hispanic or Latino
pop_two
: total population - Population of two or more races, not Hispanic or Latino
vap
: voting age population
vap_hisp
: voting age population - Hispanic or Latino (of any race)
vap_white
: voting age population - White alone, not Hispanic or Latino
vap_black
: voting age population - Black or African American alone, not Hispanic or Latino
vap_aian
: voting age population - American Indian and Alaska Native alone, not Hispanic or Latino
vap_asian
: voting age population - Asian alone, not Hispanic or Latino
vap_nhpi
: voting age population - Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino
vap_other
: voting age population - Some Other Race alone, not Hispanic or Latino
vap_two
: voting age population - Population of two or more races, not Hispanic or Latino
data(ppmf_ex)
ppmf_ex <- ppmf_ex |> add_geoid()
blocks <- agg(ppmf_ex)
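Since each PPMF row is one person, aggregation amounts to counting rows per group. The base R sketch below illustrates that idea on a made-up toy table for three of the output columns; it is not the package's implementation, and it only assumes the codings stated in the arguments (2 for voting age, 2 for Hispanic).

```r
# Toy person-level data: one row per person (values are made up)
toy <- data.frame(
  GEOID      = c("A", "A", "A", "B"),
  VOTING_AGE = c(1, 2, 2, 2),   # 2 = voting age
  CENHISP    = c(1, 1, 2, 1),   # 2 = Hispanic
  stringsAsFactors = FALSE
)

# Count persons per group; tapply orders groups alphabetically each time
pop      <- tapply(rep(1L, nrow(toy)), toy$GEOID, sum)
vap      <- tapply(toy$VOTING_AGE == 2, toy$GEOID, sum)
pop_hisp <- tapply(toy$CENHISP == 2, toy$GEOID, sum)

out <- data.frame(
  GEOID    = names(pop),
  pop      = as.vector(pop),
  vap      = as.vector(vap),
  pop_hisp = as.vector(pop_hisp),
  stringsAsFactors = FALSE
)
```

The remaining race columns would follow the same pattern, each counting the rows that fall in one non-Hispanic race group.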
Breakdown GEOID into Components
breakdown_geoid(ppmf, GEOID = GEOID)
ppmf |
tibble of ppmf data |
GEOID |
Column in ppmf with GEOID. Default is GEOID. |
tibble. ppmf with columns added for state, county, tract, block group, and/or block
data(ppmf_ex)
ppmf_ex <- ppmf_ex |> add_geoid()
ppmf_ex <- ppmf_ex |> censable::breakdown_geoid()
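Splitting a block-level GEOID is a matter of fixed character positions. The sketch below shows this with `substr()`; `split_geoid()` is a hypothetical helper and its output column names are assumptions, not necessarily those produced by `breakdown_geoid()`. The example GEOID is made up.

```r
# Hypothetical helper, not part of ppmf: splits 15-character block GEOIDs
# into their fixed-width components.
split_geoid <- function(geoid) {
  data.frame(
    state       = substr(geoid, 1, 2),    # 2-digit state FIPS
    county      = substr(geoid, 3, 5),    # 3-digit county FIPS
    tract       = substr(geoid, 6, 11),   # 6-digit tract
    block_group = substr(geoid, 12, 12),  # first digit of the block
    block       = substr(geoid, 12, 15),  # 4-digit block
    stringsAsFactors = FALSE
  )
}

parts <- split_geoid("011050001001002")
```

Shorter GEOIDs (tract or county level) would simply yield empty strings for the components they do not contain.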
Downloads zipped ppmf files from GitHub.
download_ppmf(dsn, dir = "", version = "19r", overwrite = FALSE)
dsn |
(data save name) string to unzip the data to |
dir |
the folder or directory to save the file in |
version |
string in '19r', '19', '12' or '4' signifying the revised 19.61, original 19.61, 12.2 or 4.5 versions respectively |
overwrite |
If a file is found at dir/dsn, should it be overwritten? Defaults to FALSE. |
a string path to where the file was downloaded to
## Not run:
# Takes a few minutes and requires read access to files
temp <- tempdir()
path <- download_ppmf(dsn = 'ppmf_12', dir = temp)
## End(Not run)
Returns the urls for the data. This will be expanded to link to prior or any new releases.
get_ppmf_links(version = "19r", compressed = TRUE)
version |
string in '19r', '19', '12' or '4' signifying the revised 19.61, original 19.61, 12.2, or 4.5 versions respectively |
compressed |
boolean. Return a compressed version (TRUE). FALSE gives the Census Bureau link to the uncompressed data. |
a string with url
# default version ('19r')
get_ppmf_links()
# 04.28.2021 version 4.5
get_ppmf_links(version = '4')
Overwrite Races with Hispanic
overwrite_hisp_race(ppmf, race = CENRACE, hisp = CENHISP)
ppmf |
tibble of ppmf data |
race |
Column in ppmf containing race codes |
hisp |
Column in ppmf containing 1 for Not Hispanic and 2 for Hispanic |
tibble with race column entries replaced if the individual is Hispanic
data(ppmf_ex)
ppmf_ex |> replace_race() |> overwrite_hisp_race()
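The overwrite step can be pictured as a conditional replacement on the race column. The sketch below is an assumption-laden illustration on made-up data: the race labels and the replacement value `"hisp"` are placeholders, and the package may use different labels.

```r
# Toy data after a race-grouping step; labels here are assumed placeholders
toy <- data.frame(
  race    = c("white", "black", "asian"),
  CENHISP = c(1, 2, 1),   # 2 = Hispanic, as described in the arguments
  stringsAsFactors = FALSE
)

# Replace the race entry wherever the person is coded as Hispanic
toy$race <- ifelse(toy$CENHISP == 2, "hisp", toy$race)
```

After this step each person falls into exactly one category, which is what makes the per-group counts in the aggregated output mutually exclusive.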
Includes Perry County, Alabama PPMF data from the April 28, 2021 PPMF data release. This is a subset taken from the 12-2 P data.
Because each observation is a person, the data does not cover every block in the county, and because of the Disclosure Avoidance System (DAS), not every block with population appears in it.
data('ppmf_ex')
tibble with sample ppmf data
data('ppmf_ex')
This data includes the basic race classifications used for redistricting, which reduce the codes to an easier-to-work-with set of values. It does not include the hisp grouping, which the census handles separately from race.
data('races')
tibble with three columns
code: the two digit code used to code races
desc: the description of the races
group: the summary group used
data('races')
Read PPMF data and Merge with Census 2010 Data
read_merge_ppmf( state, level, versions = c("19"), prefixes = paste0("v", versions, "_"), paths = Sys.getenv(paste0("ppmf", versions)) )
state |
state abbreviation |
level |
geography level. One of 'block', 'block group', 'tract', 'county' |
versions |
character vector of ppmf versions. Currently '19', '12', and/or '4' |
prefixes |
prefixes to give pop and vap columns in output. Default is paste0("v", versions, "_"). |
paths |
paths to PPMF data. Default is Sys.getenv(paste0("ppmf", versions)). |
sf tibble of PPMF merged with Census 2010 data
## Not run:
# Requires Census Bureau API
de_bg <- read_merge_ppmf('DE', 'block group')
## End(Not run)
This reads in PPMF data from a file. Use download_ppmf() if you do not have a local copy of the ppmf data.
read_ppmf(state, path, ...)
state |
two-letter state (+ DC + PR) abbreviation or two-digit state FIPS code |
path |
where the data is saved to |
... |
additional arguments passed on to |
tibble of ppmf data
## Not run:
# Takes a few minutes and requires read access to files
temp <- tempdir()
path <- download_ppmf('ppmf_12.csv', dir = temp)
# If you already have it downloaded, point to it with path:
ppmf <- read_ppmf('AL', path)
## End(Not run)
Replaces the Census's numeric categories for race with less specific racial classifications, typically useful for redistricting purposes.
replace_race(ppmf, race = CENRACE)
ppmf |
tibble of ppmf data |
race |
Column in ppmf containing race codes |
tibble with race column replaced by simpler racial classifications
data(ppmf_ex)
ppmf_ex |> replace_race()
This data includes the 52 geographies (50 states plus D.C. and P.R.). Within the 2010 PPMF, skip and n_max indicate the relevant rows for a geography.
data('states')
tibble with sample ppmf data
data('states')
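The role of skip and n_max can be illustrated with a small base R sketch that reads only one geography's rows from a larger file. Everything here is made up for illustration: the toy file, its columns, and the convention that skip counts data rows after the header are assumptions, and base R's `nrows` stands in for n_max.

```r
# Write a toy file in which each geography occupies a contiguous run of rows
tmp <- tempfile(fileext = ".csv")
full <- data.frame(
  id = 1:6,
  st = c("AA", "AA", "BB", "BB", "CC", "CC"),
  stringsAsFactors = FALSE
)
write.csv(full, tmp, row.names = FALSE)

# Assumed lookup values for geography "BB": skip 2 data rows, read 2 rows
skip  <- 2
n_max <- 2

# Recover column names from the header, then read only the relevant rows
header <- names(read.csv(tmp, nrows = 1))
bb <- read.csv(tmp, header = FALSE, skip = skip + 1, nrows = n_max,
               col.names = header, stringsAsFactors = FALSE)
```

Reading by row offsets like this avoids loading the full national file when only one state is needed.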