Analysis and Variance Estimation with MEPS
The Household Component of the Medical Expenditure Panel Survey (hereafter, MEPS) is a complex, multistage probability sample that incorporates stratification, clustering, and oversampling of some subpopulations (Black, Hispanic, and Asian) in some years. For more information about the MEPS sample design, users are advised to review the user note on MEPS sample design. Because of the complex sample design, users of MEPS data must make use of sampling weights to produce representative estimates. Analysts are advised to review the user note on sampling weights with MEPS data for additional information on the use of weights and on the different weights available.
While appropriate use of sampling weights will produce correct point estimates (e.g., means, proportions), statistical techniques that account for the complex sample design are also necessary to produce correct standard errors and statistical tests. Specifically, variables to account for the complex sample design (STRATANN/STRATAPLD and PSUANN/PSUPLD) are available in the MEPS dataset and must be used to obtain appropriate variance estimates (standard errors) when computing annual estimates, pooled estimates, or multivariate estimates.
MEPS Technical Variables for Analysis and Variance Estimation
Three technical variables are needed for analysis of the MEPS data:
- A sampling weight (e.g., PERWEIGHT, SAQWEIGHT, or DIABWEIGHT) must be chosen, based on the sampling universe of the variables. The sampling weight represents the inverse probability of selection into the sample, with adjustment for non-response, as well as post-stratification adjustments for age, race/ethnicity, and sex. Analysts should review variable descriptions of the variables of interest and the user note on SAMPLING WEIGHTS for more information about which weight to use. For analyses of pooled data, divide the sampling weight by the number of samples pooled together so that population estimates based on the pooled data will generalize to the average annual population over the pooled period.
- STRATANN is an integrated variable that represents the impact of the sample design stratification on the estimates of variance and standard errors; it is appropriate for generating annual estimates for all years. For pooled analyses, analysts should use the pooled variance sampling strata variable, STRATAPLD.
- PSUANN is an integrated variable that represents the impact of the sample design clustering on the estimates of variance and standard errors; it is appropriate for generating annual estimates for all years. Prior to 2002, MEPS variance primary sampling units were developed independently from year to year. For pooled analyses, analysts should use the pooled variance sampling PSU variable, PSUPLD.
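As a minimal Stata sketch of the pooled-data guidance in the list above, assume an extract that pools three annual samples. The count of 3, the new variable name poolwt, and the analysis variable var1 are illustrative placeholders, not IPUMS MEPS names.

```stata
* Sketch only: assumes three annual samples were pooled in the extract.
* poolwt is a hypothetical name for the rescaled weight.
generate poolwt = perweight / 3

* Declare the design with the pooled strata and PSU variables.
svyset psupld [pweight=poolwt], strata(stratapld)

* The mean is unaffected by rescaling the weight; the total is what
* now refers to the average annual population over the pooled period.
svy: mean var1
svy: total var1
```

Dividing every weight by the same constant leaves means and proportions unchanged, so the rescaling matters mainly for totals and population counts.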
Guidance for Analysis of Longitudinal MEPS Data
The original MEPS longitudinal panel files include longitudinal weights to support longitudinal analyses of the data. The MEPS longitudinal weights can be used to generate nationally-representative estimates of person-level change over the time period covered by the panel (e.g., from 2017 to 2018) or cross-sectional estimates for the two-year period covered by the panel (e.g., 2017-2018) or for each of the years represented in the panel individually (e.g., 2017 or 2018). MEPS includes two longitudinal sampling weights:
- LONGWT: A longitudinal weight intended for use with most longitudinal variables derived from the MEPS household interview, generalizable to the civilian, non-institutionalized population present for the period of time covered by the longitudinal panel.
- LSAQWT: A longitudinal weight intended for use with longitudinal variables derived from the Self-Administered Questionnaire or the Preventive Self-Administered Questionnaire, generalizable to civilian, non-institutionalized adults 18 and older present for the period of time covered by the longitudinal panel.
There are no longitudinal weights for the Diabetes Care Supplement or longitudinal family-level weights.
IPUMS MEPS does not currently offer integrated versions of the MEPS longitudinal sampling weights. However, data users may find it useful to merge the IPUMS MEPS data with the longitudinal MEPS data in order to capitalize on features of the IPUMS MEPS data such as consistent variable names and codes or event summary variables generated by the IPUMS MEPS Variable Builder.
Users who wish to merge IPUMS MEPS data to the longitudinal weights and other information from the MEPS longitudinal panel files can refer to our user note on linking IPUMS and original MEPS data for guidance and sample code, specifically the section on linking person-level data. Note that not all records will link: the longitudinal panel datasets exclude persons who were not key and in-scope, whereas these persons are included in the annual full-year consolidated files that provide the bulk of the integrated data currently available through IPUMS MEPS (see our glossary of key MEPS concepts and terms). In addition to the two longitudinal weights, users may want to include variables that help with sample selection, such as ALL5RD (an indicator of whether a person was in-scope and data were collected in all 5 MEPS interview rounds), YEARIND (an indicator of whether the person was in both years or in one year or the other), and DIED (an indicator of whether the person died during the MEPS panel).
NOTE: The longitudinal panel files include survey design variables corresponding to strata and primary sampling unit (PSU) that are identical to the ones included in the annual files. These variables, STRATANN and PSUANN, can be downloaded from the IPUMS MEPS website (and are automatically included in every extract). When analyzing a data set made up of multiple longitudinal panels pooled together, please use the pooled strata and PSU variables, STRATAPLD and PSUPLD, which are also automatically included in every IPUMS MEPS extract. Additionally, users analyzing several panels pooled together are advised to divide LONGWT or LSAQWT by the number of panels pooled together to generate average annual or two-year, rather than cumulative, estimates.
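The note above might be applied in Stata roughly as follows. This sketch assumes two longitudinal panels have been pooled and that LONGWT has already been merged onto the IPUMS MEPS extract under the lowercase name longwt. The panel count, the name poollongwt, and var1 are illustrative assumptions.

```stata
* Sketch only: assumes two pooled panels and a merged-in variable longwt.
* poollongwt is a hypothetical name for the rescaled longitudinal weight.
generate poollongwt = longwt / 2

* Pooled panels call for the pooled strata and PSU variables.
svyset psupld [pweight=poollongwt], strata(stratapld)

* Totals are now averaged across panels rather than cumulative.
svy: total var1
```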
General Syntax to Account for Sample Design
The following general syntax will allow users to account for sampling weights and design variables when using STATA, SAS, or SAS-callable SUDAAN to estimate, for example, means using MEPS data. This example uses PERWEIGHT, STRATANN and PSUANN; analysts should substitute alternate variance and weight variables as appropriate for their analyses.
STATA:
svyset psuann [pweight=perweight], strata(stratann)
svy: mean var1
SAS:
proc sort data = datasetname;
by stratann psuann;
run;
proc surveymeans data = datasetname;
weight perweight;
strata stratann;
cluster psuann;
var var1;
run;
SAS-Callable SUDAAN:
proc sort data = datasetname;
by stratann psuann;
run;
proc descript data = datasetname filetype = sas design = wr;
nest stratann psuann;
weight perweight;
var var1;
print nsum wsum mean semean / nohead;
run;
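The same design declaration carries over to estimators other than the mean, including the multivariate estimates mentioned earlier. The following Stata sketch illustrates this; catvar, var1, var2, and var3 are placeholder variable names rather than IPUMS MEPS variables.

```stata
* Declare the design once; the svy: prefix then applies it to each command.
svyset psuann [pweight=perweight], strata(stratann)

* Summarize the declared design (number of strata and PSUs per stratum).
svydescribe

* Design-based proportions for a categorical variable (placeholder name).
svy: proportion catvar

* Design-based linear regression (placeholder outcome and covariates).
svy: regress var1 var2 var3
```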
Subsetting IPUMS MEPS Data
Often, analysts are interested in restricting analyses to a specific population (e.g., women ages 18 to 64 or Hispanic/Latino persons). In these situations, many analysts will then either exclude all other cases in the database or use an if-statement during analysis. While correct point estimates will still be produced if the remaining cases are properly weighted, standard errors may be incorrectly computed, because dropping cases can remove entire PSUs or strata from the file and the software then no longer sees the full sample design.
If the analyst is interested in a specific subpopulation, it is necessary to use analytic techniques that do not compromise the sample design information. Specifically, it is typically recommended to use the full database with a statistical package (such as STATA, SAS, or SUDAAN) that can accommodate subpopulation analysis.
Syntax for Subpopulation Analysis
The following syntax demonstrates, generally, how an analyst can conduct subpopulation analysis using IPUMS MEPS data without compromising the design structure of the data. This approach has the effect of producing estimates for the population of interest, while incorporating the full sample design information for variance estimation.
STATA
svyset psuann [pweight=perweight], strata(stratann)
svy, subpop(if age >= 65): mean var1
SAS
data datasetname;
set datasetname;
if age ge 65 then subpopvar = 1;
else subpopvar = 0;
run;
proc sort data = datasetname;
by stratann psuann;
run;
proc surveymeans data = datasetname;
weight perweight;
strata stratann;
cluster psuann;
domain subpopvar;
var var1;
run;
SAS-callable SUDAAN
proc sort data = datasetname;
by stratann psuann;
run;
proc descript data = datasetname filetype = sas design = wr;
nest stratann psuann;
weight perweight;
subpopn age >= 65 / name = "Population 65 years and older";
var var1;
print nsum wsum mean semean / nohead;
run;
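In Stata, the subpopulation can also be defined through an indicator variable, which parallels the SAS domain variable used above. This is a sketch under assumptions: over65 is a hypothetical variable name, and analysts should check how unknown ages are coded in their extract before relying on the age comparison.

```stata
svyset psuann [pweight=perweight], strata(stratann)

* over65 is a hypothetical indicator name. Stata treats numeric missing
* values as larger than any number, so missing ages are excluded explicitly.
generate byte over65 = (age >= 65 & !missing(age))

* Estimate for the subpopulation while keeping every record in the design.
svy, subpop(over65): mean var1

* Estimates for both groups from the full sample.
svy: mean var1, over(over65)
```

Both commands keep all records in the file, so the full set of strata and PSUs remains available for variance estimation.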
References
Agency for Healthcare Research and Quality. (2016). MEPS HC-171 2014 Full Year Consolidated Data File. http://meps.ahrq.gov/data_stats/download_data/pufs/h171/h171doc.pdf