Tuesday, April 17, 2012

How should I handle strata with no subpopulation (Asian or Latino) members when analyzing only the NLAAS Asian or Latino groups with Stata?



In FAQ #67 we discussed "How should I detect and handle the single PSU in a stratum for NLAAS or CPES Latino groups with Stata?" This FAQ explains how to handle strata that contain no subpopulation (Asian or Latino) members when analyzing the NLAAS Asian or Latino groups, and how to perform an unconditional subpopulation analysis based on the full NLAAS data set.
  • Step 1: Create a copy of the race/ethnicity variable, and recode the groups outside your subpopulation to one of the ethnicity values used by groups inside your subpopulation.
  • Step 2: As FAQ #76 suggested, to perform the analysis only for Latinos (or any other ancestry or demographic subpopulation of interest), first create an indicator variable that has a value of "1" for all eligible cases you wish to include in your analysis and a value of "0" for all other NLAAS cases.
  • Step 3: Use the Stata subpop() option to restrict your analysis to the chosen subpopulation of cases.
Here we provide a case study along with two common questions/problems analysts usually need to deal with:

Let's say we would like to test whether there is a relationship between race/ancestry and gender for the NLAAS Latino groups. We can then generate the cross-table of the RANCEST and SEX variables and compute chi-square statistics for the NLAAS Latino groups. However, 12 strata contain no Latino respondents, so the chi-square statistics will not be computed.

  • Step 1: Create a new RANCEST2 variable that recodes all Asian groups' values (RANCEST=1, 2, 3, or 4) to the Cuban group's value (RANCEST=5).
  • Step 2: Create an indicator variable LATINO that has a value of "1" for all Latino groups and a value of "0" for all other NLAAS cases.
  • Step 3: Use the Stata subpop(LATINO) option with the corresponding cluster, strata, and weight variables.

. svyset SECLUSTR [pweight=NLAASWGT], strata(SESTRAT)

      pweight: NLAASWGT
          VCE: linearized
  Single unit: missing
     Strata 1: SESTRAT
         SU 1: SECLUSTR
        FPC 1: <zero>

. generate RANCEST2 = 0

. replace RANCEST2 = 5 if RANCEST<=5
(2672 real changes made)

. replace RANCEST2  = 6 if RANCEST==6
(495 real changes made)

. replace RANCEST2  = 7 if RANCEST==7
(868 real changes made)

. replace RANCEST2  = 8 if RANCEST==8
(614 real changes made)

. generate LATINO = 0

. replace LATINO = 1 if RANCEST>=5
(2554 real changes made)

. svy, subpop (LATINO): tab RANCEST2 SEX
(running tabulate on estimation sample)

Number of strata   =        57                  Number of obs      =      3956
Number of PSUs     =       114                  Population size    =  27942479
                                                Subpop. no. of obs =      2554
                                                Subpop. size       =  21654900
                                                Design df          =        57

----------------------------------
          |          Sex         
 RANCEST2 |   MALE  FEMALE   Total
----------+-----------------------
        5 |  .0243   .0219   .0463
        6 |  .0489   .0516   .1005
        7 |  .3052   .2611   .5663
        8 |  .1366   .1503   .2869
          |
    Total |   .515    .485       1
----------------------------------
  Key:  cell proportions

  Pearson:
    Uncorrected   chi2(3)         =   13.3482
    Design-based  F(2.23, 126.94) =    4.3779     P = 0.0117

Note: 12 strata omitted because they contain no subpopulation members.
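
Steps 1 and 2 above can also be written more compactly. The following is a minimal sketch, not the commands used to produce the output above; it assumes RANCEST is coded 1-4 for the Asian groups and 5-8 for the Latino groups, as in this FAQ. Unlike "replace LATINO = 1 if RANCEST>=5", the inrange() function returns 0 when RANCEST is missing, so a case with missing ancestry would not be flagged as Latino.

```stata
* Sketch only: assumes the full NLAAS file is in memory and that
* RANCEST is coded 1-4 (Asian groups) and 5-8 (Latino groups).

* Step 1: copy RANCEST (keeping its value labels) and fold the Asian
* groups into the Cuban category so no table row is empty in the subpopulation
clonevar RANCEST2 = RANCEST
replace RANCEST2 = 5 if inrange(RANCEST, 1, 4)

* Step 2: 0/1 indicator for the Latino subpopulation;
* inrange() evaluates to 0 when RANCEST is missing
generate byte LATINO = inrange(RANCEST, 5, 8)

* Step 3: unconditional subpopulation analysis on the full data set
svyset SECLUSTR [pweight=NLAASWGT], strata(SESTRAT)
svy, subpop(LATINO): tabulate RANCEST2 SEX
```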

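Below is the output of the analysis if we skip Step 1.

You can see that the chi-square statistics were not computed: the Asian rows of the table have zero marginals because those groups contain no subpopulation (Latino) members, and 12 strata likewise contain no subpopulation members.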

. svyset SECLUSTR [pweight=NLAASWGT], strata(SESTRAT)

      pweight: NLAASWGT
          VCE: linearized
  Single unit: missing
     Strata 1: SESTRAT
         SU 1: SECLUSTR
        FPC 1: <zero>

. generate LATINO = 0

. replace LATINO = 1 if RANCEST>=5
(2554 real changes made)

. svy, subpop (LATINO): tab RANCEST SEX
(running tabulate on estimation sample)

Number of strata   =        57                  Number of obs      =      3956
Number of PSUs     =       114                  Population size    =  27942479
                                                Subpop. no. of obs =      2554
                                                Subpop. size       =  21654900
                                                Design df          =        57

----------------------------------
Race/Ance |          Sex         
stry      |   MALE  FEMALE   Total
----------+-----------------------
 VIETNAME |      0       0       0
 FILIPINO |      0       0       0
  CHINESE |      0       0       0
 ALL OTHE |      0       0       0
    CUBAN |  .0243   .0219   .0463
 PUERTO R |  .0489   .0516   .1005
  MEXICAN |  .3052   .2611   .5663
 ALL OTHE |  .1366   .1503   .2869
          |
    Total |   .515    .485       1
----------------------------------
  Key:  cell proportions

  Table contains a zero in the marginals.
  Statistics cannot be computed.

Note: 12 strata omitted because they contain no subpopulation members.
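
To see which strata trigger the note above, you can count the subpopulation members within each stratum. This is a rough sketch that assumes the LATINO indicator from Step 2 already exists; the variable names n_latino and one_per_stratum are hypothetical helpers introduced only for this illustration.

```stata
* Sketch only: assumes LATINO (0/1) was created as in Step 2 and the
* full NLAAS file is in memory. n_latino and one_per_stratum are
* illustrative names, not NLAAS variables.

* Number of Latino respondents in each stratum
egen n_latino = total(LATINO), by(SESTRAT)

* Flag exactly one observation per stratum so each stratum is listed once
egen byte one_per_stratum = tag(SESTRAT)

* Strata with no subpopulation members
list SESTRAT if one_per_stratum & n_latino == 0
count if one_per_stratum & n_latino == 0
```

The count reported by the last command can be compared with the number of omitted strata in Stata's note.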

Below is the output if you conduct the analysis on only the NLAAS Latino groups, with the corresponding cluster, strata, and weight variables, after dropping all Asian groups from the data set.

You will see that the distributions across the race/ancestry and gender groups, and the chi-square statistics computed using the Latino-specific weight variable (NLSWTLAT), differ from the case study shown earlier. These differences arise because the two approaches use different weights. The overall NLAAS weight (NLAASWGT) adjusts the sample to a different population than the Latino-specific weight does.

We recommend that you NEVER simply delete cases that are not in a particular subpopulation. After you create your indicator variable (see Step 2), always use the subpop() option rather than dropping cases or using if qualifiers.

. drop if NLSWTLAT==.
(2095 observations deleted)

. svyset SECLUSTR [pweight=NLSWTLAT], strata(SESTRAT)

      pweight: NLSWTLAT
          VCE: linearized
  Single unit: centered
     Strata 1: SESTRAT
         SU 1: SECLUSTR
        FPC 1: <zero>

. svy: tab RANCEST SEX
(running tabulate on estimation sample)

Number of strata   =        57                  Number of obs      =      2554
Number of PSUs     =       110                  Population size    =  21654900
                                                Design df          =        53

----------------------------------
Race/Ance |          Sex          
stry      |   MALE  FEMALE   Total
----------+-----------------------
    CUBAN |  .0238   .0224   .0463
 PUERTO R |  .0517   .0487   .1005
  MEXICAN |  .2917   .2746   .5663
 ALL OTHE |  .1478   .1391   .2869
          |
    Total |   .515    .485       1
----------------------------------
  Key:  cell proportions

  Pearson:
    Uncorrected   chi2(3)         =    0.0000
    Design-based  F(2.30, 122.11) =    0.0000     P = 1.0000
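
The recommendation above applies to any svy estimation command, not only to tabulate. As a hedged illustration, the sketch below estimates the gender distribution among Latinos with svy: proportion. It assumes the full NLAAS file is in memory and that LATINO was created as in Step 2; no output is shown because this example was not part of the original analysis.

```stata
* Sketch only: assumes the full NLAAS file (Asian and Latino cases)
* is in memory and that LATINO (0/1) was created as in Step 2.
svyset SECLUSTR [pweight=NLAASWGT], strata(SESTRAT)

* Recommended: subpop() keeps every stratum and PSU in the design
* while restricting the estimates to the Latino cases
svy, subpop(LATINO): proportion SEX

* Not recommended: both alternatives remove cases from the design,
* which can change the PSU count and the design degrees of freedom
* svy: proportion SEX if LATINO == 1
* drop if LATINO == 0
```

In the outputs above, the subpop() run reports 114 PSUs and 57 design degrees of freedom, while the run on the reduced file reports 110 PSUs and 53 design degrees of freedom.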

CPES Team
