Thursday, April 14, 2011

How should I detect and handle the single PSU in a stratum for NLAAS or CPES Latino groups with Stata?

Analysts who use Stata svy commands may see the message "missing standard errors because of stratum with single sampling unit" and get no standard errors when Stata encounters a stratum that contains a single PSU. This happens when analysts extract either the Latino or the Asian subpopulation from the full NLAAS data set and run svy commands on that separate data set alone.

The Stata message arises because the sampling error codes for NLAAS were assigned jointly for the Latino and Asian samples, and several NLAAS sampling error strata include sampling error clusters that contain only Latino or only Asian respondents. As a consequence, if the user restricts the input data to respondents from one NLAAS subpopulation, Stata finds sampling error strata in which all remaining cases belong to a single sampling error cluster.
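As a rough illustration of why one cluster per stratum is a problem, consider the standard with-replacement linearization estimator for the variance of an estimated total, ignoring finite population corrections. The notation here is introduced for illustration only and is not taken from the NLAAS documentation: L is the number of strata, n_h is the number of PSUs in stratum h, and y_hi is the weighted total for PSU i in stratum h.

```latex
\hat{V}(\hat{Y}) = \sum_{h=1}^{L} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left( y_{hi} - \bar{y}_h \right)^2,
\qquad \bar{y}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi}
```

When a stratum has n_h = 1, the factor n_h / (n_h - 1) divides by zero and the single PSU total equals its own stratum mean, so that stratum's contribution is undefined and no standard error can be reported.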

If you are conducting analysis that is restricted to subpopulations of respondents from the full NLAAS or CPES data sets, the following steps are recommended:

First, use the svydes command (documented as svydescribe in more recent Stata releases) after svyset to show how respondents are distributed across sampling error strata and clusters. The output lists each stratum with the number of PSUs it contains, so any stratum with a single PSU is easy to spot.
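A minimal sketch of this check, assuming the data set is already loaded in Stata and carries the design variables named in this post (SECLUSTR, SESTRAT, NLAASWG). On a Latino-only or Asian-only extract, use the weight variable that belongs to that extract instead.

```stata
* Declare the survey design: cluster (PSU), sampling weight, and stratum
svyset SECLUSTR [pweight=NLAASWG], strata(SESTRAT)

* List every stratum with its number of PSUs and observations
svydescribe

* List only the strata that contain a single PSU
svydescribe, single
```

If the last command lists no strata, the "single PSU" problem does not arise for the data currently in memory.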

In theory, the preferred approach is to perform an unconditional subpopulation analysis based on the full NLAAS data set, which contains both the Latino and the Asian subpopulations. To perform the analysis only for Latinos (or any other ancestry or demographic subpopulation of interest), first create an indicator variable that has a value of "1" for all eligible cases you wish to include in your analysis and a value of "0" for all other NLAAS cases. Then use the Stata subpop() option to restrict your analysis to the chosen subpopulation of cases. This approach gives correct estimates of the subpopulation statistics and correct sampling errors for those estimates. Because every stratum keeps all of its PSUs in the design, the "single PSU in a stratum" problem should not occur.

For example, to analyze NLAAS data for the Latino subpopulation, the subpop variable (e.g. latinos) should be equal to 1 for Latinos and 0 for Asians. Example syntax for this analysis, where command stands for the estimation command you want to run, is:

svyset SECLUSTR [pweight=NLAASWG], strata(SESTRAT)
svy, subpop(latinos): command
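The following sketch fills in the steps around that syntax, under stated assumptions. The variable name ancestry and its codes 1 to 4, as well as the analysis variable age, are hypothetical placeholders and not actual NLAAS names; substitute the variables and codes from your own copy of the data.

```stata
* ASSUMPTION: "ancestry" and the codes 1-4 are hypothetical placeholders
* for the variable and categories that identify Latino respondents.
* inlist() returns 1 for a match and 0 otherwise, so all other cases get 0.
generate byte latinos = inlist(ancestry, 1, 2, 3, 4)

* Declare the design on the FULL NLAAS file (Latino and Asian cases together)
svyset SECLUSTR [pweight=NLAASWG], strata(SESTRAT)

* ASSUMPTION: "age" is a hypothetical analysis variable.
* The prefix options go before the colon; the estimation command follows it.
svy, subpop(latinos): mean age
```

Note that subpop() is not the same as dropping the Asian cases or adding an if condition: the excluded cases stay in the data so that the full set of strata and PSUs is used for variance estimation.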

Although the unconditional approach to subpopulation analysis of the NLAAS/CPES data is the correct method, we recognize that many analysts may be working exclusively with either the NLAAS Asian or Latino data. The unconditional method described above requires the analyst to process all NLAAS cases even though the statistical analyses focus on only one of the major subpopulations. In this case, analysts may employ an approximate method and use one of the following settings of Stata's ad hoc singleunit() option for svyset, which deals with sampling error strata in which the subpopulation occurs in a single sampling error cluster:

  • singleunit(certainty): singleton PSUs are treated as certainty PSUs. Certainty PSUs are PSUs that were selected into the sample with a probability of 1, so they do not contribute to the standard error.
  • singleunit(scaled): a scaled version of the certainty option. For each stratum with one PSU, the scaling factor uses the average of the variances from the strata with multiple sampling units.
  • singleunit(centered): strata with one sampling unit are centered at the grand mean instead of the stratum mean.

To analyze the separate Latino data set, the svyset syntax with the singleunit(centered) option is:

svyset SECLUSTR [pweight=NLSWTLA], strata(SESTRAT) singleunit(centered)
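Because all three settings are approximations, it can be useful to see how much the standard errors change from one to another. The loop below is a sketch of such a comparison on the Latino-only data set; the analysis variable age is a hypothetical placeholder, not an actual NLAAS variable name.

```stata
* Re-declare the design under each singleunit() setting and re-run
* the same estimate, so the standard errors can be compared.
foreach opt in certainty scaled centered {
    display as text "singleunit(`opt')"
    svyset SECLUSTR [pweight=NLSWTLA], strata(SESTRAT) singleunit(`opt')
    * ASSUMPTION: "age" is a hypothetical analysis variable
    svy: mean age
}
```

The point estimates should be the same in every run, since the weights do not change. Only the standard errors depend on how the single-PSU strata are handled.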
