Title: | Ecological Inference and Higher-Dimension Data Management |
---|---|
Description: | Provides methods for analyzing R by C ecological contingency tables using the extreme case analysis, ecological regression, and Multinomial-Dirichlet ecological inference models. Also provides tools for manipulating higher-dimension data objects. |
Authors: | Olivia Lau, Ryan T. Moore, Michael Kellermann |
Maintainer: | Michael Kellermann |
License: | GPL (>= 2) \| file LICENSE |
Version: | 0.2-2 |
Built: | 2025-03-07 04:03:38 UTC |
Source: | https://github.com/cran/eiPack |
Calculates the deterministic bounds on the proportion of row members within a specified column.
bounds(formula, data, rows, column, excluded = NULL, threshold = 0.9, total = NULL)
Argument | Description |
---|---|
formula | a formula of the form cbind(col1, col2, ...) ~ cbind(row1, row2, ...) |
data | a data frame containing the variables specified in formula |
rows | a character vector specifying the rows of interest |
column | a character string specifying the column marginal of interest |
excluded | an optional character string (or vector of character strings) specifying the columns to be excluded from the bounds calculation. For example, if the quantity of interest is the Democratic share of the two-party vote, non-voters would be excluded. |
threshold | the minimum proportion of the unit that row members must comprise for the bounds to be calculated for the unit. If … |
total | if row and/or column marginals are given as proportions, … |
A list with elements:

Element | Description |
---|---|
bounds | a list of deterministic bounds for all units in which row proportions meet the threshold |
intersection | if the intersection of the deterministic bounding intervals is non-empty, the intersection is returned. Otherwise, … |
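The bounds come from the accounting identity for a single unit: if row members make up a share x of the unit and the column takes a share t, the proportion of row members in the column must lie in [max(0, (t - (1 - x))/x), min(1, t/x)]. The base-R sketch below computes these unit bounds and their intersection. dd_bounds is a hypothetical helper, not eiPack's bounds(), and it assumes a single row, no excluded columns, and raw counts.

```r
# Duncan-Davis deterministic bounds for one row and one column.
# Hypothetical helper for illustration; it is not eiPack's bounds().
dd_bounds <- function(row_count, col_count, total, threshold = 0.9) {
  xr <- row_count / total   # share of each unit made up of row members
  tc <- col_count / total   # share of each unit falling in the column
  keep <- xr >= threshold   # bound only units dominated by row members
  if (!any(keep)) return(list(bounds = NULL, intersection = NULL))
  xr <- xr[keep]
  tc <- tc[keep]
  # Row members fill at most tc of the unit, and at least the part of tc
  # that the non-row members (1 - xr) cannot cover.
  lower <- pmax(0, (tc - (1 - xr)) / xr)
  upper <- pmin(1, tc / xr)
  lo <- max(lower)
  hi <- min(upper)
  list(bounds = cbind(lower = lower, upper = upper),
       intersection = if (lo <= hi) c(lower = lo, upper = hi) else NULL)
}
```

For example, units with (row, column, total) counts of (95, 60, 100), (92, 58, 100), and (50, 30, 100) yield bounds for the first two units only. Their intervals intersect in roughly [0.579, 0.630].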
Ryan T. Moore
Otis Dudley Duncan and Beverley Davis. 1953. “An Alternative to Ecological Correlation.” American Sociological Review 18: 665-666.
plot.bounds
Generates a plot of central credible intervals for the unit-level beta parameters from the Multinomial-Dirichlet ecological inference model (see ei.MD.bayes).
cover.plot(object, row, column, x = NULL, CI = 0.95, medians = TRUE, col = NULL, ylim = c(0,1), ylab, lty = par("lty"), lwd = par("lwd"), ...)
Argument | Description |
---|---|
object | output from ei.MD.bayes |
row | a character string specifying the row marginal of interest |
column | a character string specifying the column marginal of interest |
x | an optional covariate to index the units along the x-axis |
CI | a fraction between 0 and 1 (defaults to 0.95) specifying the coverage of the central credible interval to be plotted for each unit |
medians | a logical value specifying whether to plot the median (defaults to TRUE) |
col | an optional vector of colors to be passed to … |
ylim | an optional range for the y-axis (defaults to c(0,1)) |
ylab | an optional label for the y-axis (defaults to …) |
lty | an optional line type passed to … |
lwd | an optional line width argument passed to … |
... | additional arguments passed to … |
A plot with vertical intervals indicating the central credible intervals for each ecological unit.
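A central interval with coverage CI runs from the (1 - CI)/2 quantile to the 1 - (1 - CI)/2 quantile of each unit's draws. The minimal sketch below assumes the draws for one row/column pair are already in a samples-by-units matrix. central_intervals and plot_intervals are hypothetical names, and only base graphics (plot, segments) are used.

```r
# Central credible intervals and medians for unit-level draws.
# Hypothetical helpers; eiPack's cover.plot() extracts draws from an eiMD object itself.
central_intervals <- function(draws, CI = 0.95) {
  # draws: one row per posterior sample, one column per unit
  probs <- c((1 - CI) / 2, 0.5, 1 - (1 - CI) / 2)
  out <- t(apply(draws, 2, quantile, probs = probs, names = FALSE))
  colnames(out) <- c("lower", "median", "upper")
  out
}

plot_intervals <- function(iv, x = seq_len(nrow(iv)), ylim = c(0, 1)) {
  plot(x, iv[, "median"], ylim = ylim, pch = 19, xlab = "unit", ylab = "beta")
  segments(x, iv[, "lower"], x, iv[, "upper"])  # one vertical interval per unit
}
```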
Olivia Lau
plot, segments, par
Generates a density plot for population-level quantities of interest output by lambda.MD, lambda.reg, and lambda.reg.bayes. For the Bayesian methods, densityplot plots a kernel density estimate of the draws. For the frequentist lambda.reg method, densityplot plots the Normal density whose mean and standard deviation are the point estimate and standard error output by lambda.reg.
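The two kinds of density described above can be sketched in base R. The helper names are hypothetical. The evaluation grid on [0, 1] and the use of density()'s default bandwidth are assumptions, not documented behaviour of densityplot.

```r
# Frequentist case: a Normal density with mean equal to the point estimate
# and standard deviation equal to the (delta-method) standard error.
normal_share_density <- function(estimate, se, grid = seq(0, 1, by = 0.001)) {
  data.frame(x = grid, y = dnorm(grid, mean = estimate, sd = se))
}

# Bayesian case: a Gaussian kernel density estimate of the posterior draws.
kernel_share_density <- function(draws) {
  d <- density(draws)  # default bandwidth rule (bw.nrd0)
  data.frame(x = d$x, y = d$y)
}

# Either result can be drawn with, e.g., plot(y ~ x, data = nd, type = "l").
```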
## S3 method for class 'lambdaMD'
densityplot(x, by = "column", col, xlim, ylim, main = "", sub = NULL, xlab, ylab, lty = par("lty"), lwd = par("lwd"), ...)
## S3 method for class 'lambdaRegBayes'
densityplot(x, by = "column", col, xlim, ylim, main = "", sub = NULL, xlab, ylab, lty = par("lty"), lwd = par("lwd"), ...)
## S3 method for class 'lambdaReg'
densityplot(x, by = "column", col, xlim, ylim, main = "", sub = NULL, xlab, ylab, lty = par("lty"), lwd = par("lwd"), ...)
Argument | Description |
---|---|
x | output from lambda.MD, lambda.reg, or lambda.reg.bayes |
by | a character string specifying the margin to plot, "row" or "column" (defaults to "column") |
col | an optional vector of colors, with length corresponding to the number of marginals selected in by |
xlim, ylim | optional limits for the x-axis and y-axis, passed to … |
main, sub | optional title and subtitle, passed to … |
xlab, ylab | optional labels for the x- and y-axes, passed to … |
lty, lwd | optional arguments for line type and line width, passed to … |
... | additional arguments passed to … |
A plot with density lines for the selected margin (row or column).
Olivia Lau
plot, segments, par
Implements a version of the hierarchical model suggested in Rosen et al. (2001).
ei.MD.bayes(formula, covariate = NULL, total = NULL, data, lambda1 = 4, lambda2 = 2, covariate.prior.list = NULL, tune.list = NULL, start.list = NULL, sample = 1000, thin = 1, burnin = 1000, verbose = 0, ret.beta = 'r', ret.mcmc = TRUE, usrfun = NULL)
Argument | Description |
---|---|
formula | A formula of the form cbind(col1, col2, ...) ~ cbind(row1, row2, ...) |
covariate | An optional formula of the form … |
total | if row and/or column marginals are given as proportions, … |
data | A data frame containing the variables specified in … |
lambda1 | The shape parameter for the gamma prior (defaults to 4) |
lambda2 | The rate parameter for the gamma prior (defaults to 2) |
covariate.prior.list | a list containing the parameters for normal prior distributions on delta and gamma for the model with a covariate. See ‘details’ for more information. |
tune.list | A list containing tuning parameters for each block of parameters. See ‘details’ for more information. Typically, this will be a list generated by tuneMD. |
start.list | A list containing starting values for each block of parameters. See ‘details’ for more information. The default is NULL. |
sample | Number of draws to be saved from the chain and returned as output from the function (defaults to 1000). The total length of the chain is … |
thin | an integer specifying the thinning interval for posterior draws (defaults to 1, but most problems will require a much larger thinning interval) |
burnin | an integer specifying the number of initial iterations to be discarded (defaults to 1000, but most problems will require a longer burn-in) |
verbose | an integer specifying whether the progress of the sampler is printed to the screen (defaults to 0). If … |
ret.beta | A character indicating how the posterior draws of beta should be handled: … |
ret.mcmc | A logical value indicating how the samples from the posterior should be returned. If … |
usrfun | the name of an optional user-defined function to obtain quantities of interest while drawing from the MCMC chain (defaults to NULL) |
ei.MD.bayes implements a version of the hierarchical Multinomial-Dirichlet model for ecological inference in $R \times C$ tables suggested by Rosen et al. (2001). Let $r = 1, \ldots, R$ index rows, $c = 1, \ldots, C$ index columns, and $i = 1, \ldots, n$ index units. Let $N_{ic}$ be the marginal count for column $c$ in unit $i$ and $X_{ir}$ be the marginal proportion for row $r$ in unit $i$. Finally, let $\beta_{irc}$ be the proportion of row $r$ in column $c$ for unit $i$.
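The first stage of the model assumes that the vector of column marginal counts in unit $i$ follows a Multinomial distribution of the form:

$$(N_{i1}, \ldots, N_{iC}) \sim \text{Multinomial}\left(N_i, \sum_{r=1}^{R} \beta_{ir1} X_{ir}, \ldots, \sum_{r=1}^{R} \beta_{irC} X_{ir}\right)$$

where $N_i = \sum_{c=1}^{C} N_{ic}$ is the total count in unit $i$.

The second stage of the model assumes that the vector of $\beta_{irc}$ for row $r$ in unit $i$ follows a Dirichlet distribution with $C$ parameters. The model may be fit with or without a covariate.

If the model is fit without a covariate, the distribution of the vector $\boldsymbol{\beta}_{ir} = (\beta_{ir1}, \ldots, \beta_{irC})$ is:

$$(\beta_{ir1}, \ldots, \beta_{irC}) \sim \text{Dirichlet}(\alpha_{r1}, \ldots, \alpha_{rC})$$

In this case, the prior on each $\alpha_{rc}$ is assumed to be (with shape $\lambda_1$ and rate $\lambda_2$):

$$\alpha_{rc} \sim \text{Gamma}(\lambda_1, \lambda_2)$$

If the model is fit with a covariate $Z_i$, the distribution of the vector $\boldsymbol{\beta}_{ir}$ is:

$$(\beta_{ir1}, \ldots, \beta_{irC}) \sim \text{Dirichlet}\left(d_r \exp(\gamma_{r1} + \delta_{r1} Z_i), \ldots, d_r \exp(\gamma_{rC} + \delta_{rC} Z_i)\right)$$

The parameters $\gamma_{rC}$ and $\delta_{rC}$ are constrained to be zero for identification. (In this function, the last column entered in the formula is so constrained.)

Finally, the prior for $d_r$ is:

$$d_r \sim \text{Gamma}(\lambda_1, \lambda_2)$$

while $\gamma_{rc}$ and $\delta_{rc}$ are given improper uniform priors if covariate.prior.list = NULL or have independent normal priors of the form:

$$\gamma_{rc} \sim N(\mu_{\gamma_{rc}}, \sigma^2_{\gamma_{rc}}), \qquad \delta_{rc} \sim N(\mu_{\delta_{rc}}, \sigma^2_{\delta_{rc}})$$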
If the user wishes to estimate the model with proper normal priors on $\gamma_{rc}$ and $\delta_{rc}$, a list with four elements must be provided for covariate.prior.list:
mu.delta
a matrix of prior means for Delta
sigma.delta
a matrix of prior standard deviations for Delta
mu.gamma
a matrix of prior means for Gamma
sigma.gamma
a matrix of prior standard deviations for Gamma
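To make the two stages concrete, here is a forward simulation from the no-covariate model: alpha_rc ~ Gamma(lambda1, lambda2), each row's beta vector ~ Dirichlet(alpha_r1, ..., alpha_rC), and column counts ~ Multinomial with cell probabilities sum_r X_ir beta_irc. This is a sketch for building intuition or test data, not the package's sampler. simulate_md, the uniform-Dirichlet row margins, and the fixed unit size N are all illustrative assumptions.

```r
# Forward simulation from the no-covariate Multinomial-Dirichlet model.
# Hypothetical helpers for illustration; assumes R >= 2 and C >= 2.
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)  # normalised Gamma draws are Dirichlet(alpha)
}

simulate_md <- function(n_units, R, C, N = 500, lambda1 = 4, lambda2 = 2) {
  # alpha_rc ~ Gamma(shape = lambda1, rate = lambda2)
  alpha <- matrix(rgamma(R * C, shape = lambda1, rate = lambda2), R, C)
  beta <- array(NA_real_, dim = c(R, C, n_units))
  X <- matrix(NA_real_, n_units, R)           # row proportions X_ir
  counts <- matrix(NA_integer_, n_units, C)   # column counts N_ic
  for (i in seq_len(n_units)) {
    X[i, ] <- rdirichlet1(rep(1, R))          # arbitrary row margins (assumption)
    for (r in seq_len(R)) beta[r, , i] <- rdirichlet1(alpha[r, ])
    p <- as.vector(X[i, ] %*% beta[, , i])    # sum_r X_ir * beta_irc
    counts[i, ] <- rmultinom(1, size = N, prob = p)[, 1]
  }
  list(alpha = alpha, beta = beta, X = X, counts = counts)
}
```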
Applying the model without a covariate is most reasonable in situations where one can think of individuals being randomly assigned to units, so that there are no aggregation or contextual effects. When this assumption is not reasonable, including an appropriate covariate may improve inferences; note, however, that there is typically little information in the data about the relationship of any given covariate to the unit parameters, which can lead to extremely slow mixing of the MCMC chains and difficulty in assessing convergence.
Because the conditional distributions are non-standard, draws from the posterior are obtained using a Metropolis-within-Gibbs algorithm. The proposal density for each parameter is a univariate normal distribution centered at the current parameter value with standard deviation equal to the tuning constant. The only exception is for draws of $\gamma_{rc}$ and $\delta_{rc}$, which use a bivariate normal proposal with covariance zero.
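A single update of this kind can be sketched as a random-walk Metropolis step. Because the normal proposal is symmetric, the acceptance probability reduces to the ratio of target densities. This is a generic R sketch of the technique, not eiPack's sampler. rw_metropolis_step is a hypothetical name.

```r
# One random-walk Metropolis update for a scalar parameter: propose from a
# Normal centred at the current value with sd equal to the tuning constant.
# Assumes log_target(current) is finite.
rw_metropolis_step <- function(current, log_target, tune) {
  proposal <- rnorm(1, mean = current, sd = tune)
  # Symmetric proposal, so the Hastings correction cancels.
  log_ratio <- log_target(proposal) - log_target(current)
  if (log(runif(1)) < log_ratio) {
    list(value = proposal, accepted = TRUE)
  } else {
    list(value = current, accepted = FALSE)
  }
}
```

For example, chaining 20,000 calls with log_target = function(a) dgamma(a, shape = 4, rate = 2, log = TRUE) and tune = 1 should give draws whose mean is close to the Gamma mean of 2. Proposals below zero get a log density of -Inf and are always rejected. A larger tuning constant generally lowers the acceptance rate, and a smaller one raises it.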
The function will accept user-specified starting values as an argument. If the model includes a covariate, the starting values must be a list with the following elements, in this order:
start.dr
a vector of length $R$ of starting values for Dr. Starting values for Dr must be greater than zero.
start.betas
an $R \times C \times$ (number of precincts) array of starting values for Beta. Each row of every precinct must sum to 1.
start.gamma
an $R \times C$ matrix of starting values for Gamma. Values in the right-most column must be zero.
start.delta
an $R \times C$ matrix of starting values for Delta. Values in the right-most column must be zero.
If there is no covariate, the starting values must be a list with the following elements:
start.alphas
an $R \times C$ matrix of starting values for Alpha. Starting values for Alpha must be greater than zero.
start.betas
an $R \times C \times$ (number of units) array of starting values for Beta. Each row in every unit must sum to 1.
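Starting values that satisfy these constraints can be built and checked in a few lines. The sketch below assumes the betas array is laid out as rows × columns × units. make_start_list and check_start_list are hypothetical helpers, and splitting each row evenly across columns is just one valid choice.

```r
# Starting values for the no-covariate model (hypothetical helpers).
make_start_list <- function(R, C, n_units, alpha0 = 1) {
  list(
    start.alphas = matrix(alpha0, nrow = R, ncol = C),   # must be > 0
    start.betas  = array(1 / C, dim = c(R, C, n_units))  # each row sums to 1
  )
}

check_start_list <- function(sl, tol = 1e-12) {
  row_sums <- apply(sl$start.betas, c(1, 3), sum)  # R x units matrix of row sums
  all(sl$start.alphas > 0) && all(abs(row_sums - 1) < tol)
}
```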
The function will accept user-specified tuning parameters as an argument. The tuning parameters define the standard deviation of the normal distribution used to generate candidate values for each parameter. For the model with a covariate, a bivariate normal distribution is used to generate proposals for $\gamma_{rc}$ and $\delta_{rc}$; the covariance of these normal distributions is fixed at zero. If the model includes a covariate, the tuning parameters must be a list with the following elements, in this order:
tune.dr
a vector of length $R$ of tuning parameters for Dr
tune.beta
an array of tuning parameters for Beta, indexed by precinct in its last dimension
tune.gamma
a matrix of tuning parameters for Gamma
tune.delta
a matrix of tuning parameters for Delta
If there is no covariate, the tuning parameters are a list with the following elements:
tune.alpha
an $R \times C$ matrix of tuning parameters for Alpha
tune.beta
an array of tuning parameters for Beta, indexed by precinct in its last dimension
A list containing:

Element | Description |
---|---|
draws | A list containing samples from the posterior distribution of the parameters. If a covariate is included in the model, the list contains: … If the model is fit without a covariate, the list includes: … |
acc.ratios | A list containing acceptance ratios for the parameters. If the model includes a covariate, the list includes: … If the model is fit without a covariate, the list includes: … |
usrfun | Output from the optional usrfun |
call | Call to ei.MD.bayes |
Michael Kellermann and Olivia Lau
Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2002. Output Analysis and Diagnostics for MCMC (CODA). https://CRAN.R-project.org/package=coda.
Ori Rosen, Wenxin Jiang, Gary King, and Martin A. Tanner. 2001. “Bayesian and Frequentist Inference for Ecological Inference: The R × C Case.” Statistica Neerlandica 55: 134-156.
lambda.MD, cover.plot, densityplot, tuneMD, mergeMD
Estimate an ecological regression using least squares.
ei.reg(formula, data, ...)
Argument | Description |
---|---|
formula | An R formula object of the form cbind(col1, col2, ...) ~ cbind(row1, row2, ...) |
data | a data frame containing the variables specified in formula |
... | Additional arguments passed to … |
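For $R \times C$ tables, $C$ regressions of the form
c_i ~ cbind(r1, r2, ...)
are performed.
These regressions make use of the accounting identities and the constancy assumption, that $\beta_{irc} = \beta_{rc}$ for all $i$.
The accounting identities include

- $\beta_{rc} = \sum_i N_{irc} \big/ \sum_i N_{ir}$, defining the population cell fractions $\beta_{rc}$ such that $\sum_{c=1}^{C} \beta_{rc} = 1$ for every $r$;
- $X_{ir} = N_{ir} / N_i$ for $r = 1, \ldots, R$;
- $T_{ic} = N_{ic} / N_i$ for $c = 1, \ldots, C$; and
- $T_{ic} = \sum_{r=1}^{R} \beta_{irc} X_{ir}$,

where $N_{irc}$ is the number of row-$r$ members of unit $i$ in column $c$, $N_{ir}$ and $N_{ic}$ are the row and column totals, $N_i$ is the unit total, and $\beta_{irc} = N_{irc} / N_{ir}$. Then regressing

$$T_{ic} = \sum_{r=1}^{R} \beta_{rc} X_{ir} + \epsilon_{ic}$$

for $c = 1, \ldots, C$ recovers the population parameters $\beta_{rc}$ when the standard linear regression assumptions apply, including $E[\epsilon_{ic}] = 0$ and $\operatorname{Var}(\epsilon_{ic}) = \sigma_c^2$ for all $i$.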
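The regression step can be illustrated on simulated data that satisfy the constancy assumption exactly. The sketch below fits one least-squares regression per column with lm(). The intercept is omitted because the row proportions sum to one; whether ei.reg does the same is an assumption here. goodman_regression is a hypothetical helper, not ei.reg().

```r
# Goodman-style ecological regression: one no-intercept least-squares fit per
# column, T_ic = sum_r beta_rc X_ir + e_ic. Hypothetical helper, not ei.reg().
goodman_regression <- function(Tprop, X) {
  # Tprop: units x C column proportions; X: units x R row proportions
  coefs <- sapply(seq_len(ncol(Tprop)), function(j) coef(lm(Tprop[, j] ~ X - 1)))
  dimnames(coefs) <- list(colnames(X), colnames(Tprop))
  coefs  # R x C matrix of estimated beta_rc
}

# Simulated data with beta_irc = beta_rc in every unit, plus small noise.
set.seed(2024)
n <- 200
black <- runif(n, 0.1, 0.9)
X <- cbind(black = black, white = 1 - black)
beta <- rbind(black = c(dem = 0.8, rep = 0.2),
              white = c(dem = 0.3, rep = 0.7))
Tprop <- X %*% beta + matrix(rnorm(n * 2, sd = 0.01), n, 2)
est <- goodman_regression(Tprop, X)
```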
A list containing:

Element | Description |
---|---|
call | the call to ei.reg |
coefficients | an … |
se | an … |
cov.matrices | A list of the … |
Olivia Lau and Ryan T. Moore
Leo Goodman. 1953. “Ecological Regressions and the Behavior of Individuals.” American Sociological Review 18:663–664.
Estimate an ecological regression using Bayesian normal regression.
ei.reg.bayes(formula, data, sample = 1000, weights = NULL, truncate=FALSE)
Argument | Description |
---|---|
formula | An R formula object of the form cbind(col1, col2, ...) ~ cbind(row1, row2, ...) |
data | a data frame containing the variables specified in formula |
sample | the number of draws from the posterior |
weights | a vector of weights |
truncate | if TRUE, imposes a proper uniform prior on the unit hypercube for the coefficients; if FALSE, an improper uniform prior is assumed |
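For $R \times C$ tables, $C$ Bayesian regressions of the form
c_i ~ cbind(r1, r2, ...)
are performed. See the documentation for ei.reg for the accounting identities and constancy assumption underlying this Bayesian linear model.

Let $T_{ic}$ be the proportion of unit $i$ in column $c$, $X_{ir}$ the proportion of unit $i$ in row $r$, and $\boldsymbol\beta_c = (\beta_{1c}, \ldots, \beta_{Rc})$. The sampling density is given by

$$T_{ic} \mid \boldsymbol\beta_c, \sigma_c^2 \sim N\left(\sum_{r=1}^{R} \beta_{rc} X_{ir}, \ \sigma_c^2\right)$$

The improper prior is $p(\boldsymbol\beta_c) \propto 1$.
The proper prior is $p(\boldsymbol\beta_c) \propto \mathbf{1}\{\boldsymbol\beta_c \in [0,1]^R\}$, that is, uniform on the unit hypercube.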
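The text above fixes only the prior on the coefficients. The sketch below therefore also assumes the standard noninformative prior p(σ²) ∝ σ⁻². Under that prior, σ² has a scaled inverse-χ² posterior and β given σ² is normal around the least-squares estimate. Truncation to the unit hypercube is done by simple rejection sampling. bayes_reg_draws is hypothetical and is not necessarily how ei.reg.bayes is implemented.

```r
# Posterior draws for one column's regression under p(beta, sigma^2) ~ 1/sigma^2,
# optionally truncated to [0, 1]^k by rejection. Hypothetical helper.
bayes_reg_draws <- function(y, X, sample = 1000, truncate = FALSE) {
  n <- nrow(X); k <- ncol(X)
  XtX_inv <- solve(crossprod(X))
  beta_hat <- drop(XtX_inv %*% crossprod(X, y))      # least-squares estimate
  s2 <- sum((y - X %*% beta_hat)^2) / (n - k)        # residual variance
  L <- t(chol(XtX_inv))                              # L %*% t(L) = (X'X)^{-1}
  out <- matrix(NA_real_, sample, k, dimnames = list(NULL, colnames(X)))
  m <- 0
  while (m < sample) {
    sigma2 <- (n - k) * s2 / rchisq(1, df = n - k)   # scaled inverse chi-square
    b <- beta_hat + sqrt(sigma2) * drop(L %*% rnorm(k))
    # Rejection step when truncating; this may loop for a long time if
    # little posterior mass lies inside the unit hypercube.
    if (!truncate || all(b >= 0 & b <= 1)) {
      m <- m + 1
      out[m, ] <- b
    }
  }
  out
}
```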
A list containing:

Element | Description |
---|---|
call | the call to ei.reg.bayes |
draws | A … |
Olivia Lau and Ryan T. Moore
Leo Goodman. 1953. “Ecological Regressions and the Behavior of Individuals.” American Sociological Review 18:663–664.
Calculates the population share of row members in a particular column as a proportion of the total number of row members in the selected subset of columns.
lambda.MD(object, columns, ret.mcmc = TRUE)
Argument | Description |
---|---|
object | an R object of class eiMD, as output by ei.MD.bayes |
columns | a character vector of column names to be included in calculating the shares |
ret.mcmc | a logical value indicating how the samples from the posterior should be returned. If … |
This function allows users to define subpopulations within the data and calculate the proportion of individuals within each of the columns that define that subpopulation. For example, if the model includes the groups Democrat, Republican, and Unaffiliated, the argument columns = c("Democrat", "Republican") will calculate the two-party shares of Democrats and Republicans for each row.
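The shares can be computed draw by draw. The sketch below assumes that the "population share" aggregates unit-level betas weighted by unit row counts. For row r and a set S of selected columns, that gives lambda_rc = sum_i N_ir beta_irc / sum_i N_ir sum_{c' in S} beta_irc'. The weighting, the array layout, and the helper name subset_shares are assumptions, not a description of lambda.MD's internals.

```r
# Share of row r's members in each selected column, among the selected columns,
# for ONE draw of the unit-level betas (hypothetical helper):
#   lambda_rc = sum_i N_ir * beta_irc / sum_i N_ir * sum_{c' in S} beta_irc'
subset_shares <- function(beta, row_counts, columns) {
  # beta: R x C x units array with dimnames;
  # row_counts: units x R matrix of N_ir, columns in the same order as beta's rows
  R <- dim(beta)[1]
  out <- matrix(NA_real_, R, length(columns),
                dimnames = list(dimnames(beta)[[1]], columns))
  for (r in seq_len(R)) {
    b <- matrix(beta[r, columns, ], nrow = length(columns))  # columns x units
    cell <- b %*% row_counts[, r]   # sum_i N_ir * beta_irc for each selected c
    out[r, ] <- cell / sum(cell)
  }
  out
}
```

Applied to every posterior draw (for example with lapply over the draws), this yields a posterior sample of each share.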
Returns either an $(R \times k)$ by samples matrix, as an mcmc object, or an $R \times k \times$ samples array, where $R$ is the number of rows and $k$ is the number of included columns.
Michael Kellermann and Olivia Lau
Calculates the population share of row members in a particular column.
lambda.reg(object, columns)
Argument | Description |
---|---|
object | An R object of class …, as output by ei.reg |
columns | a character vector of column names to be included in calculating the shares |
Standard errors are calculated using the delta method as implemented in the package msm. The arguments passed to deltamethod in msm include:
g
a list of transformations of the form ~ x1 / (x1 + x2 + ... + xk), ~ x2 / (x1 + x2 + ... + xk), etc. Each $x_j$ is the estimated proportion of all row members in column $j$.
mean
the estimated proportions of the row members in the specified columns, as a proportion of the total number of row members.
cov
a diagonal matrix with the estimated variance of each $x_j$ on the diagonal. Each column marginal is assumed to be independent, so the off-diagonal elements of this matrix are zero. Estimates come from `object$cov.matrices`, the estimated covariance matrix from the regression of the relevant column. Thus,

$$\text{cov} = \begin{pmatrix} \widehat{\operatorname{Var}}(x_1) & 0 & \cdots & 0 \\ 0 & \widehat{\operatorname{Var}}(x_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \widehat{\operatorname{Var}}(x_k) \end{pmatrix}$$
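For g_j = x_j / S with S = x_1 + ... + x_k, the gradient is dg_j/dx_m = 1{j = m}/S - x_j/S². With a diagonal covariance, the delta-method variance is the sum over m of (dg_j/dx_m)² Var(x_m). The base-R sketch below implements that calculation. delta_shares is a hypothetical helper.

```r
# Delta-method shares and standard errors with a diagonal covariance matrix.
delta_shares <- function(x, v) {
  # x: estimated proportions x_1..x_k; v: their variances (diagonal of cov)
  S <- sum(x)
  se <- vapply(seq_along(x), function(j) {
    grad <- (seq_along(x) == j) / S - x[j] / S^2   # d(x_j / S) / d(x_m)
    sqrt(sum(grad^2 * v))                          # off-diagonal cov terms are 0
  }, numeric(1))
  list(lambda = x / S, se = se)
}
```

With the package msm installed, msm::deltamethod(list(~ x1 / (x1 + x2), ~ x2 / (x1 + x2)), mean = x, cov = diag(v)) should return the same standard errors for k = 2.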
Returns a list with the following elements:

Element | Description |
---|---|
call | the call to lambda.reg |
lambda | an … |
se | standard errors calculated using the delta method as implemented in the package msm |
Ryan T. Moore
Calculates the population share of row members in selected columns.
lambda.reg.bayes(object, columns, ret.mcmc = TRUE)
Argument | Description |
---|---|
object | An R object of class …, as output by ei.reg.bayes |
columns | a character vector indicating which column marginals to include in calculating the shares |
ret.mcmc | If TRUE, posterior shares are returned as an mcmc object; if FALSE, as an array |
If ret.mcmc = TRUE, draws are returned as an mcmc object with dimensions sample × …. If ret.mcmc = FALSE, draws are returned as an array with dimensions … × samples.
Ryan T. Moore
Allows users to combine the output of several chains generated by ei.MD.bayes.
mergeMD(list, discard = 0)
Argument | Description |
---|---|
list | A list containing the names of multiple eiMD objects generated from the same model. |
discard | The number of draws to discard from the beginning of each chain. Default is to retain all draws. |
Returns an eiMD object of the same format as the input.
Michael Kellermann
Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2002. Output Analysis and Diagnostics for MCMC (CODA). https://CRAN.R-project.org/package=coda.
Ori Rosen, Wenxin Jiang, Gary King, and Martin A. Tanner. 2001. “Bayesian and Frequentist Inference for Ecological Inference: The R × C Case.” Statistica Neerlandica 55: 134-156.
Plots the deterministic bounds on the proportion of row members within a specified column.
## S3 method for class 'bounds' plot(x, row, column, labels = TRUE, order = NULL, intersection = TRUE, xlab, ylab, col = par("fg"), lty = par("lty"), lwd = par("lwd"), ...)
Argument | Description |
---|---|
x | output from bounds |
row | a character string specifying the row of interest |
column | a character string specifying the column of interest |
labels | a logical toggle specifying whether precinct labels should be printed above interval bounds |
order | an optional vector of values between 0 and 1 specifying the order (left-to-right) in which interval bounds are plotted |
intersection | a logical toggle specifying whether the intersection of all plotted bounds (if it exists) should be plotted |
xlab, ylab, ... | additional arguments passed to … |
col, lty, lwd | additional arguments passed to … |
A plot with vertical intervals indicating the deterministic bounds on the quantity of interest, and (optionally) a single horizontal interval indicating the intersection of these unit bounds.
Ryan T. Moore
bounds
In ei.MD.bayes, users have the option to save parameter chains for the unit-level betas to disk rather than returning them to the workspace. This function reconstructs the parameter chains by reading them back into R and producing either an array or an mcmc object.
read.betas(rows, columns, units, dir = NULL, ret.mcmc = TRUE)
Argument | Description |
---|---|
rows | a character vector of the row marginals to be read back in |
columns | a character vector of the column marginals to be read back in |
units | a character or numeric vector with the units to be read back in |
dir | an optional character string identifying the directory in which parameter chains are stored (defaults to NULL) |
ret.mcmc | a logical value specifying whether to return the parameters as an mcmc object |
If ret.mcmc = TRUE, an mcmc object with row names corresponding to the parameter chains. If ret.mcmc = FALSE, an array with dimensions named according to the selected rows, columns, and units.
Olivia Lau
ei.MD.bayes, mcmc
Precinct-level observations for a hypothetical jurisdiction with four proposed districts.
data(redistrict)
A table containing 150 observations and 11 variables:
precinct identifier
proposed district number
average age
percent homeowners
number of Black voting-age persons
number of White voting-age persons
number of Hispanic voting-age persons
total number of voting-age persons
number of votes for the Democratic candidate
number of votes for the Republican candidate
number of non-voters
Daniel James Greiner
Registration data for White, Black, and Native American voters in eight counties of south-eastern North Carolina in 2001.
data(senc)
A table containing 212 observations and 18 variables:
county name
precinct name
number of registered voters in precinct
number of White registered voters
number of Black registered voters
number of Native American registered voters
number of registered Democrats
number of registered Republicans
number of registered voters without major party affiliation
number of White registered Democrats
number of White registered Republicans
number of White registered voters without major party affiliation
number of Black registered Democrats
number of Black registered Republicans
number of Black registered voters without major party affiliation
number of Native American registered Democrats
number of Native American registered Republicans
number of Native American registered voters without major party affiliation
Excerpted from North Carolina General Assembly 2001 redistricting data, https://www.ncleg.gov/Redistricting/BaseData2001
Tuning parameters for hyperpriors in RxC EI model
data(tuneA)
A table containing 3 rows and 3 columns.
A vector containing tuning parameters for the precinct level parameters in the RxC EI model.
data(tuneB)
A vector of length 3 x 2 x 150 containing the precinct level tuning parameters for the redistricting sample data.
data(tuneB)
tuneB <- array(tuneB[[1]], dim = c(3, 2, 150))
An adaptive algorithm to generate tuning parameters for the MCMC algorithm implemented in ei.MD.bayes. Since each parameter is drawn one at a time, target acceptance rates are between 0.4 and 0.6.
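Here is a simple version of such an adaptive scheme for a single parameter. It runs the sampler, measures the acceptance rate, and rescales the proposal standard deviation until the rate falls in [0.4, 0.6]. The rescaling rule (multiply by rate / 0.5, with a floor) and the helper name tune_sd are assumptions for illustration. tuneMD's actual update rule is not described here.

```r
# Hypothetical adaptive tuning loop for one parameter with a random-walk proposal.
tune_sd <- function(log_target, start, sd = 1, ntunes = 10, draws = 2000) {
  for (k in seq_len(ntunes)) {
    x <- start
    acc <- 0
    for (s in seq_len(draws)) {
      prop <- rnorm(1, mean = x, sd = sd)
      if (log(runif(1)) < log_target(prop) - log_target(x)) {
        x <- prop
        acc <- acc + 1
      }
    }
    rate <- acc / draws
    # Stop once the rate is on target (or tuning runs are used up).
    if ((rate >= 0.4 && rate <= 0.6) || k == ntunes) break
    # Too few acceptances -> shrink the proposal; too many -> widen it.
    sd <- sd * max(rate, 0.05) / 0.5
  }
  list(sd = sd, rate = rate)  # rate is the one achieved with the returned sd
}
```

For a standard Normal target started from a proposal sd of 20, the acceptance rate is initially far below 0.4. The loop then shrinks the sd toward roughly 2 to 2.5.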
tuneMD(formula, covariate = NULL, data, ntunes = 10, totaldraws = 10000, ...)
Argument | Description |
---|---|
formula | A formula of the form cbind(col1, col2, ...) ~ cbind(row1, row2, ...) |
covariate | An R formula for the optional covariate in the form … |
data | a data frame containing the variables specified in … |
ntunes | the number of times to iterate the tuning algorithm |
totaldraws | the number of iterations for each tuning run |
... | additional arguments passed to … |
A list containing matrices of tuning parameters.
Olivia Lau