Package 'ppmf' reference manual

Title:	Read Census Privacy Protected Microdata Files
Description:	Implements data processing described in <doi:10.1126/sciadv.abk3283> to align modern differentially private data with the formatting of older US Census data releases. The primary goal is to read in Census Privacy Protected Microdata Files data in a reproducible way. This includes tools for aggregating to relevant levels of geography by creating geographic identifiers that match the US Census Bureau's numbering. Additionally, there are tools for grouping numeric race identifiers into categories, consistent with OMB (Office of Management and Budget) classifications. Functions exist for downloading and linking to existing sources of privacy protected microdata.
Authors:	Christopher T. Kenny [aut, cre]
Maintainer:	Christopher T. Kenny
License:	MIT + file LICENSE
Version:	0.2.0
Built:	2025-04-02 03:05:36 UTC
Source:	https://github.com/christopherkenny/ppmf

Add Standard GEOID to PPMF Data

Description

Adds the GEOID identifier common to spatial census data sets, such as those loaded by tigris. This allows for easier merging or aggregation by a single variable.

Usage

add_geoid(
  ppmf,
  state = TABBLKST,
  county = TABBLKCOU,
  tract = TABTRACT,
  block_group = TABBLKGRP,
  block = TABBLK,
  level = "block"
)

Arguments

`ppmf`	tibble of ppmf data
`state`	Column in ppmf with state (fips) ID. Default is `TABBLKST`.
`county`	Column in ppmf with county (fips) ID. Default is `TABBLKCOU`.
`tract`	Column in ppmf with tract ID. Default is `TABTRACT`.
`block_group`	Column in ppmf with block group ID. Default is `TABBLKGRP`.
`block`	Column in ppmf with block ID. Default is `TABBLK`.
`level`	Geographic level to write the GEOID for. Options are block (default), block_group, tract, and county.

Value

input data ppmf with added column GEOID

Examples

data(ppmf_ex)
ppmf_ex <- ppmf_ex |> add_geoid()
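
The base R sketch below roughly illustrates the GEOID that `add_geoid()` builds. It pastes zero-padded codes together using the standard Census widths: 2-digit state, 3-digit county, 6-digit tract, 1-digit block group, and 4-digit block. The block number already begins with its block group digit. This is not the package's implementation. `make_geoid()` is a hypothetical helper, and it assumes the codes arrive as unpadded numbers or digit strings.

```r
# Hypothetical helper (not part of ppmf): build a GEOID from separate
# geographic codes using standard Census field widths.
make_geoid <- function(state, county, tract, block_group, block,
                       level = c("block", "block_group", "tract", "county")) {
  level <- match.arg(level)
  st  <- sprintf("%02d", as.integer(state))        # 2-digit state FIPS
  cty <- sprintf("%03d", as.integer(county))       # 3-digit county FIPS
  trt <- sprintf("%06d", as.integer(tract))        # 6-digit tract code
  bg  <- sprintf("%01d", as.integer(block_group))  # 1-digit block group
  blk <- sprintf("%04d", as.integer(block))        # 4-digit block code
  switch(level,
    county      = paste0(st, cty),
    tract       = paste0(st, cty, trt),
    block_group = paste0(st, cty, trt, bg),
    block       = paste0(st, cty, trt, blk)
  )
}

make_geoid(1, 105, 100, 1, 1001)  # "011050001001001"
```

The widths add up to 5 characters for a county GEOID, 11 for a tract, 12 for a block group, and 15 for a block.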

Add ppmf12 path to Renviron

Description

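Adds the path to the PPMF 12.2 data to Renviron as `ppmf12`, optionally saving it to '~/.Renviron' so that it persists across R sessions.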

Usage

add_ppmf12_path(path, overwrite = FALSE, install = FALSE)

Arguments

`path`	path where ppmf12 data is stored
`overwrite`	Defaults to FALSE. Should existing ppmf12 in Renviron be overwritten?
`install`	Defaults to FALSE. Should ppmf12 be added to '~/.Renviron' file?

Value

path, invisibly

Examples

## Not run: 
tp <- tempfile(fileext = '.csv')
add_ppmf12_path(tp)
path12 <- Sys.getenv('ppmf12')

## End(Not run)
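
The general pattern behind helpers like `add_ppmf12_path()` can be sketched in base R, assuming it works this way. Set the variable for the current session with `Sys.setenv()`. When installing, also write a `NAME=value` line to an Renviron file, which R reads at startup. This is an assumption about the mechanism, not the package's code. `set_renviron_path()` is hypothetical, and the sketch writes to a file you pass in rather than to '~/.Renviron' so it can be tried safely.

```r
# Hypothetical helper (not part of ppmf) sketching how a path can be stored
# as an environment variable and, optionally, persisted in an Renviron file.
set_renviron_path <- function(var, path, renviron, overwrite = FALSE,
                              install = FALSE) {
  if (install) {
    lines <- if (file.exists(renviron)) readLines(renviron) else character()
    existing <- startsWith(lines, paste0(var, "="))
    if (any(existing) && !overwrite) {
      stop(var, " already exists in ", renviron, "; set overwrite = TRUE.")
    }
    # Drop any earlier definition and append the new NAME=value line.
    writeLines(c(lines[!existing], paste0(var, "=", path)), renviron)
  }
  # Make the value available in the current session as well.
  do.call(Sys.setenv, stats::setNames(list(path), var))
  invisible(path)
}

renv_file <- tempfile()  # stands in for '~/.Renviron'
set_renviron_path("ppmf12", "/data/ppmf12.csv", renv_file, install = TRUE)
Sys.getenv("ppmf12")
```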

Add ppmf19 path to Renviron

Description

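Adds the path to the PPMF 19.61 data to Renviron as `ppmf19`, optionally saving it to '~/.Renviron' so that it persists across R sessions.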

Usage

add_ppmf19_path(path, overwrite = FALSE, install = FALSE)

Arguments

`path`	path where ppmf19 data is stored
`overwrite`	Defaults to FALSE. Should existing ppmf19 in Renviron be overwritten?
`install`	Defaults to FALSE. Should ppmf19 be added to '~/.Renviron' file?

Value

path, invisibly

Examples

## Not run: 
tp <- tempfile(fileext = '.csv')
add_ppmf19_path(tp)
path19 <- Sys.getenv('ppmf19')

## End(Not run)

Add ppmf19r path to Renviron

Description

Adds the path to the 2023 replication of the 19.61 data (version '19r') to Renviron as `ppmf19r`.

Usage

add_ppmf19r_path(path, overwrite = FALSE, install = FALSE)

Arguments

`path`	path where ppmf19r data is stored
`overwrite`	Defaults to FALSE. Should existing ppmf19r in Renviron be overwritten?
`install`	Defaults to FALSE. Should ppmf19r be added to '~/.Renviron' file?

Value

path, invisibly

Examples

## Not run: 
tp <- tempfile(fileext = '.csv')
add_ppmf19r_path(tp)
path19r <- Sys.getenv('ppmf19r')

## End(Not run)
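
The default `paths = Sys.getenv(paste0("ppmf", versions))` in `read_merge_ppmf()` suggests that each version string maps to a variable named `ppmf` followed by the version, so '19r' would map to `ppmf19r`. The sketch below resolves paths that way and warns about variables that are not set. `Sys.getenv()` returns an empty string for those. `ppmf_paths()` is a hypothetical helper, not a package function.

```r
# Hypothetical helper (not part of ppmf): look up PPMF paths stored in
# environment variables named "ppmf<version>", e.g. "ppmf19r".
ppmf_paths <- function(versions) {
  vars <- paste0("ppmf", versions)
  paths <- Sys.getenv(vars, names = FALSE)  # "" for unset variables
  unset <- vars[!nzchar(paths)]
  if (length(unset) > 0) {
    warning("Not set: ", paste(unset, collapse = ", "))
  }
  paths
}

Sys.setenv(ppmf19r = "/data/ppmf19r.csv")  # illustrative path
ppmf_paths("19r")
```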

Add ppmf4 path to Renviron

Description

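Adds the path to the PPMF 4.5 data to Renviron as `ppmf4`, optionally saving it to '~/.Renviron' so that it persists across R sessions.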

Usage

add_ppmf4_path(path, overwrite = FALSE, install = FALSE)

Arguments

`path`	path where ppmf4 data is stored
`overwrite`	Defaults to FALSE. Should existing ppmf4 in Renviron be overwritten?
`install`	Defaults to FALSE. Should ppmf4 be added to '~/.Renviron' file?

Value

path, invisibly

Examples

## Not run: 
tp <- tempfile(fileext = '.csv')
add_ppmf4_path(tp)
path4 <- Sys.getenv('ppmf4')

## End(Not run)

Aggregate PPMF Data

Description

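Aggregates person-level PPMF records into total and voting age population counts by race and ethnicity within each group, typically a geography identified by GEOID.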

Usage

agg(ppmf, group = GEOID, age = VOTING_AGE, race = CENRACE, hisp = CENHISP)

Arguments

`ppmf`	tibble of ppmf data
`group`	Column in ppmf to group by, typically GEOID
`age`	Column in ppmf containing 1 for not voting age and 2 for voting age
`race`	Column in ppmf containing race codes
`hisp`	Column in ppmf containing 1 for Not Hispanic and 2 for Hispanic

Value

tibble of ppmf data aggregated by group, with race classified, containing the columns:

group: the grouping column, named as passed to `group`
pop: total population
pop_hisp: total population - Hispanic or Latino (of any race)
pop_white: total population - White alone, not Hispanic or Latino
pop_black: total population - Black or African American alone, not Hispanic or Latino
pop_aian: total population - American Indian and Alaska Native alone, not Hispanic or Latino
pop_asian: total population - Asian alone, not Hispanic or Latino
pop_nhpi: total population - Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino
pop_other: total population - Some Other Race alone, not Hispanic or Latino
pop_two: total population - Two or more races, not Hispanic or Latino
vap: voting age population
vap_hisp: voting age population - Hispanic or Latino (of any race)
vap_white: voting age population - White alone, not Hispanic or Latino
vap_black: voting age population - Black or African American alone, not Hispanic or Latino
vap_aian: voting age population - American Indian and Alaska Native alone, not Hispanic or Latino
vap_asian: voting age population - Asian alone, not Hispanic or Latino
vap_nhpi: voting age population - Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino
vap_other: voting age population - Some Other Race alone, not Hispanic or Latino
vap_two: voting age population - Two or more races, not Hispanic or Latino

Examples

data(ppmf_ex)
ppmf_ex <- ppmf_ex |> add_geoid()
blocks <- agg(ppmf_ex)
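
The column definitions above amount to conditional counts within each group. The base R sketch below computes a handful of them on an invented toy data set. It is not the package's implementation, and `sketch_agg()` is hypothetical. Race is assumed to be already collapsed to labels like those produced by `replace_race()`; the labels here are placeholders. The 1/2 codings for `VOTING_AGE` and `CENHISP` follow the argument descriptions.

```r
# Invented person-level records for illustration only.
people <- data.frame(
  GEOID      = c("A", "A", "A", "B", "B"),
  VOTING_AGE = c(2, 1, 2, 2, 2),  # 1 = not voting age, 2 = voting age
  CENHISP    = c(1, 1, 2, 1, 1),  # 1 = not Hispanic, 2 = Hispanic
  race       = c("white", "white", "white", "black", "two"),
  stringsAsFactors = FALSE
)

# Hypothetical sketch: one 0/1 indicator per output column, summed by group.
sketch_agg <- function(df) {
  nh  <- df$CENHISP == 1
  vap <- df$VOTING_AGE == 2
  ind <- data.frame(
    pop       = 1L,
    pop_hisp  = as.integer(!nh),
    pop_white = as.integer(nh & df$race == "white"),
    pop_black = as.integer(nh & df$race == "black"),
    pop_two   = as.integer(nh & df$race == "two"),
    vap       = as.integer(vap),
    vap_hisp  = as.integer(vap & !nh),
    vap_white = as.integer(vap & nh & df$race == "white")
  )
  aggregate(ind, by = list(GEOID = df$GEOID), FUN = sum)
}

sketch_agg(people)
```

Hispanic individuals are counted only in the `_hisp` columns, so the race columns describe non-Hispanic people, matching the Value description.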

Breakdown GEOID into Components

Description

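Splits a GEOID column into its geographic components (state, county, tract, block group, and/or block), depending on the level of the GEOID.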

Usage

breakdown_geoid(ppmf, GEOID = GEOID)

Arguments

`ppmf`	tibble of ppmf data
`GEOID`	Column in ppmf with GEOID. Default is `GEOID`.

Value

tibble. ppmf with columns added for state, county, tract, block group, and/or block

Examples

data(ppmf_ex)
ppmf_ex <- ppmf_ex |> add_geoid()
ppmf_ex <- ppmf_ex |> censable::breakdown_geoid()
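
Splitting a GEOID reverses building one, because each component sits at a fixed position. The sketch assumes full 15-digit block GEOIDs, where the block group is the first digit of the 4-digit block number. Shorter GEOIDs would need only the leading components. `split_geoid()` is a hypothetical helper, not the implementation behind `breakdown_geoid()`.

```r
# Hypothetical helper: split 15-digit block GEOIDs by fixed positions
# (2 state + 3 county + 6 tract + 4 block; block group = first block digit).
split_geoid <- function(geoid) {
  data.frame(
    state       = substr(geoid, 1, 2),
    county      = substr(geoid, 3, 5),
    tract       = substr(geoid, 6, 11),
    block_group = substr(geoid, 12, 12),
    block       = substr(geoid, 12, 15),
    stringsAsFactors = FALSE
  )
}

split_geoid("011050001001001")
```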

Download PPMF Files

Description

Downloads zipped ppmf files from GitHub.

Usage

download_ppmf(dsn, dir = "", version = "19r", overwrite = FALSE)

Arguments

`dsn`	data save name: the file name to unzip the data to
`dir`	the folder or directory to save the file in
`version`	string in '19r', '19', '12' or '4' signifying the revised 19.61, original 19.61, 12.2 or 4.5 versions respectively
`overwrite`	If a file is found at `dir`/`dsn`, should it be overwritten? Defaults to FALSE.

Value

a string with the path the file was downloaded to

Examples

## Not run: 
# Takes a few minutes and requires read access to files
temp <- tempdir()
path <- download_ppmf(dsn = 'ppmf_12', dir = temp, version = '12')

## End(Not run)
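
The `dir`, `dsn`, and `overwrite` arguments together decide where a download lands. The base R sketch below checks that destination. It assumes an empty `dir` means the working directory and treats an existing file as an error unless `overwrite = TRUE`; the real function may skip or warn instead. `resolve_dest()` is hypothetical, and the actual download and unzip steps are left out.

```r
# Hypothetical helper: compute the destination path for dsn inside dir
# and refuse to reuse an existing file unless overwrite = TRUE.
resolve_dest <- function(dsn, dir = "", overwrite = FALSE) {
  dest <- if (nzchar(dir)) file.path(dir, dsn) else dsn
  if (file.exists(dest) && !overwrite) {
    stop("File already exists at ", dest, "; set overwrite = TRUE to replace it.")
  }
  dest
}

resolve_dest("ppmf_12", dir = tempdir())
```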

Get PPMF File Links

Description

Returns the URL for the requested PPMF release. Links to prior or new releases may be added in the future.

Usage

get_ppmf_links(version = "19r", compressed = TRUE)

Arguments

`version`	string in '19r', '19', '12', or '4', signifying the revised 19.61, original 19.61, 12.2, or 4.5 versions, respectively
`compressed`	boolean. Return a compressed version (TRUE). FALSE gives the Census Bureau link to the uncompressed data.

Value

a string with the URL

Examples

# default: revised 19.61 version ('19r')
get_ppmf_links()
# 04.28.2021 version 4.5
get_ppmf_links(version = '4')
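
The version codes accepted by `get_ppmf_links()` and `download_ppmf()` can be validated up front with `match.arg()`. The sketch below does that. `describe_version()` is a hypothetical helper whose labels come from the version descriptions in this manual. It does not return real URLs.

```r
# Hypothetical helper: validate a PPMF version code and describe it.
describe_version <- function(version = "19r") {
  labels <- c(
    "19r" = "revised 19.61",
    "19"  = "original 19.61",
    "12"  = "12.2",
    "4"   = "4.5"
  )
  version <- match.arg(version, names(labels))  # errors on unknown codes
  unname(labels[version])
}

describe_version("4")  # "4.5"
```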

Overwrite Races with Hispanic

Description

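For individuals recorded as Hispanic, replaces the race entry with a Hispanic category, so that the remaining race categories describe non-Hispanic individuals.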

Usage

overwrite_hisp_race(ppmf, race = CENRACE, hisp = CENHISP)

Arguments

`ppmf`	tibble of ppmf data
`race`	Column in ppmf containing race codes
`hisp`	Column in ppmf containing 1 for Not Hispanic and 2 for Hispanic

Value

tibble with race column entries replaced if the individual is Hispanic

Examples

data(ppmf_ex)
ppmf_ex |> replace_race() |> overwrite_hisp_race()
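
The recoding reduces to a vectorized conditional. This minimal sketch assumes `CENHISP` uses 2 for Hispanic, as documented above. The replacement label "hisp" is a placeholder; the label the package actually uses may differ. `overwrite_hisp_sketch()` is hypothetical.

```r
# Hypothetical sketch: replace race with "hisp" for Hispanic individuals
# (hisp == 2) and keep the race value for everyone else.
overwrite_hisp_sketch <- function(race, hisp) {
  ifelse(hisp == 2, "hisp", race)
}

overwrite_hisp_sketch(c("white", "black", "white"), c(1, 2, 2))
# "white" "hisp" "hisp"
```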

Example PPMF Data

Description

Includes Perry County, Alabama PPMF data from the April 28, 2021 PPMF data release. This is a subset taken from the 12-2 P data.

Because each observation is a person, blocks without population do not appear. In addition, because of the Disclosure Avoidance System (DAS), not every block with population appears in this data.

Usage

data('ppmf_ex')

Value

tibble with sample ppmf data

Examples

data('ppmf_ex')

Race Classifications

Description

This data includes the basic race classifications used for redistricting, which reduce the Census race codes to a smaller set of values that is easier to work with. It does not include Hispanic grouping, which the Census records separately from race.

Usage

data('races')

Value

tibble with three columns

code: the two-digit code used for each race
desc: a description of the race
group: the summary group used

Examples

data('races')
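
The sketch below shows how a lookup table with `code`, `desc`, and `group` columns is typically applied. It maps race codes to their summary group with `match()`. The three rows are a hypothetical excerpt with placeholder group labels, not the actual contents of `races`.

```r
# Hypothetical excerpt of a races-style lookup table (illustrative values).
races_demo <- data.frame(
  code  = c("01", "02", "07"),
  desc  = c("White alone", "Black or African American alone",
            "White; Black or African American"),
  group = c("white", "black", "two"),
  stringsAsFactors = FALSE
)

# Map two-digit race codes to their summary group; unknown codes give NA.
lookup_group <- function(codes, table = races_demo) {
  table$group[match(codes, table$code)]
}

lookup_group(c("07", "01", "01"))  # "two" "white" "white"
```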

Read PPMF data and Merge with Census 2010 Data

Description

Reads PPMF data for one or more versions at the requested geographic level and merges it with Census 2010 data.

Usage

read_merge_ppmf(
  state,
  level,
  versions = c("19"),
  prefixes = paste0("v", versions, "_"),
  paths = Sys.getenv(paste0("ppmf", versions))
)

Arguments

`state`	state abbreviation
`level`	geography level. One of 'block', 'block group', 'tract', 'county'
`versions`	character vector of ppmf versions. Currently '19', '12', and/or '4'
`prefixes`	prefixes to give pop and vap columns in output. Default is `paste0('v', versions, '_')`
`paths`	paths to PPMF data. Default is `Sys.getenv(paste0('ppmf', versions))`

Value

sf tibble of PPMF merged with Census 2010 data

Examples

## Not run: 
# Requires Census Bureau API
de_bg <- read_merge_ppmf('DE', 'block group')

## End(Not run)
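
When several versions are merged side by side, the `prefixes` argument keeps their population columns apart. The base R sketch below applies the default `paste0('v', versions, '_')` to invented data and merges by GEOID. It omits the Census 2010 merge and the sf output. `prefix_cols()` is hypothetical, and prefixing every column starting with "pop" or "vap" is an assumption.

```r
# Hypothetical sketch (not ppmf's code): prefix pop/vap columns per version.
versions <- c("19", "12")
prefixes <- paste0("v", versions, "_")   # "v19_" "v12_"

prefix_cols <- function(df, prefix) {
  hit <- grepl("^(pop|vap)", names(df))  # pop, pop_white, vap, ...
  names(df)[hit] <- paste0(prefix, names(df)[hit])
  df
}

# Invented counts for two versions of the same geographies.
v19 <- prefix_cols(data.frame(GEOID = c("1", "2"), pop = c(5, 7), vap = c(4, 6)), prefixes[1])
v12 <- prefix_cols(data.frame(GEOID = c("1", "2"), pop = c(6, 7), vap = c(4, 5)), prefixes[2])
merged <- merge(v19, v12, by = "GEOID")
names(merged)  # "GEOID" "v19_pop" "v19_vap" "v12_pop" "v12_vap"
```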

Read in PPMF Data

Description

This reads in PPMF data from a file. Use download_ppmf() if you do not have a local copy of the ppmf data.

Usage

read_ppmf(state, path, ...)

Arguments

`state`	two-letter state abbreviation (including DC and PR) or two-digit state FIPS code
`path`	path to the saved PPMF data
`...`	additional arguments passed on to `readr::read_csv()`

Value

tibble of ppmf data

Examples

## Not run: 
# Takes a few minutes and requires read access to files
temp <- tempdir()
path <- download_ppmf('ppmf_12.csv', dir = temp, version = '12')
# If you already have it downloaded, point to it with path:
ppmf <- read_ppmf('AL', path)

## End(Not run)
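
Because `state` accepts either an abbreviation or a FIPS code, a reader has to normalize it first. The sketch below shows one way to do that, assuming it is done by lookup. `normalize_state()` is hypothetical and lists only four geographies (Alabama 01, Delaware 10, D.C. 11, Puerto Rico 72). A complete version would cover all 52.

```r
# Hypothetical lookup with a few geographies (FIPS codes as strings).
fips_demo <- c(AL = "01", DE = "10", DC = "11", PR = "72")

# Accept a two-letter abbreviation or a FIPS code; return two-digit FIPS.
normalize_state <- function(state) {
  abb <- toupper(as.character(state))
  if (abb %in% names(fips_demo)) {
    return(unname(fips_demo[abb]))
  }
  code <- suppressWarnings(sprintf("%02d", as.integer(state)))
  if (!code %in% fips_demo) stop("Unknown state: ", state)
  code
}

normalize_state("al")  # "01"
normalize_state(10)    # "10"
```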

Replace Race Categories

Description

Replaces the Census Bureau's numeric race categories with broader racial classifications, typically useful for redistricting purposes.

Usage

replace_race(ppmf, race = CENRACE)

Arguments

`ppmf`	tibble of ppmf data
`race`	Column in ppmf containing race codes

Value

tibble with race column replaced by simpler racial classifications

Examples

data(ppmf_ex)
ppmf_ex |> replace_race()
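
The sketch below shows the kind of collapsing `replace_race()` performs. It assumes `CENRACE` follows the 63-category Census coding, where 01 to 06 are the single-race categories (White, Black, AIAN, Asian, NHPI, Some Other Race) and 07 to 63 are combinations of two or more races. The output labels are placeholders, and `collapse_race()` is hypothetical.

```r
# Hypothetical sketch: collapse 63-category CENRACE codes to summary groups.
collapse_race <- function(cenrace) {
  code <- as.integer(cenrace)
  single <- c("white", "black", "aian", "asian", "nhpi", "other")
  out <- rep(NA_character_, length(code))
  is_single <- !is.na(code) & code >= 1 & code <= 6
  out[is_single] <- single[code[is_single]]
  out[!is.na(code) & code >= 7 & code <= 63] <- "two"  # two or more races
  out
}

collapse_race(c("01", "04", "07", "63"))  # "white" "asian" "two" "two"
```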

State Rows

Description

This data includes the 52 geographies (50 states plus D.C. and P.R.). For each geography, skip and n_max give the rows of the 2010 PPMF that contain its records.

Usage

data('states')

Value

tibble with one row per geography, including the skip and n_max columns

Examples

data('states')
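
Row offsets like skip and n_max let a reader pull one geography out of a large national file without parsing all of it. The base R sketch below writes a tiny invented CSV and reads one slice of it. `readr::read_csv()`, to which `read_ppmf()` passes `...`, has `skip` and `n_max` arguments of the same names. This sketch assumes skip counts data rows after the header, which may differ from how the `states` table defines it. `read_slice()` is hypothetical.

```r
# Hypothetical sketch: read n_max rows after skipping `skip` data rows,
# keeping the header names from the first line of the file.
read_slice <- function(path, skip, n_max) {
  header <- names(read.csv(path, nrows = 1))
  read.csv(path, header = FALSE, skip = skip + 1, nrows = n_max,
           col.names = header)
}

# Tiny invented file: two rows for state 1, then one row for state 2.
tmp <- tempfile(fileext = ".csv")
writeLines(c("TABBLKST,CENRACE", "1,1", "1,2", "2,1"), tmp)
st1 <- read_slice(tmp, skip = 0, n_max = 2)
st2 <- read_slice(tmp, skip = 2, n_max = 1)
```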
