# Analysis and Variance Estimation with MEPS

The Household Component of the Medical Expenditure Panel Survey (hereafter, MEPS) is a complex, multistage probability sample that incorporates stratification, clustering, and oversampling of some subpopulations (Black, Hispanic, and Asian) in some years. For more information about the MEPS sample design, users are advised to review the user note on SAMPLE DESIGN. Because of the complex sample design, users of MEPS data must make use of sampling weights to produce representative estimates. Analysts are advised to review the user note on SAMPLING WEIGHTS with MEPS data for additional information on the use of weights and on the different weights available.

While appropriate use of sampling weights will produce correct point estimates (e.g., means, proportions), statistical techniques that account for the complex sample design are also necessary to produce correct standard errors and statistical tests. Specifically, variables to account for the complex sample design (**STRATANN**/**STRATAPLD** and **PSUANN**/**PSUPLD**) are available in the MEPS dataset and must be used to obtain appropriate variance estimates (standard errors) when computing annual estimates, pooled estimates, or multivariate estimates.
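
To make concrete why both design variables enter the calculation, the following is a sketch of the textbook Taylor-linearization variance estimator for a weighted mean, assuming PSUs are treated as sampled with replacement within strata. The notation ($h$ for strata, $j$ for PSUs, $i$ for persons, $w$ for the sampling weight, $y$ for the analysis variable) is introduced here for illustration only and does not come from the MEPS documentation; statistical packages may differ in the details.

$$
\hat{\bar{y}} = \frac{1}{\hat{W}} \sum_{h}\sum_{j}\sum_{i} w_{hji}\, y_{hji},
\qquad
\hat{W} = \sum_{h}\sum_{j}\sum_{i} w_{hji}
$$

$$
z_{hj} = \frac{1}{\hat{W}} \sum_{i} w_{hji}\,\bigl(y_{hji} - \hat{\bar{y}}\bigr),
\qquad
\bar{z}_{h} = \frac{1}{n_h} \sum_{j=1}^{n_h} z_{hj}
$$

$$
\widehat{\mathrm{Var}}\bigl(\hat{\bar{y}}\bigr) = \sum_{h} \frac{n_h}{n_h - 1} \sum_{j=1}^{n_h} \bigl(z_{hj} - \bar{z}_{h}\bigr)^2
$$

Here $n_h$ is the number of PSUs in stratum $h$. The weight alone determines the point estimate, while the stratum and PSU identifiers determine how the weighted residuals are grouped and how many PSUs per stratum contribute to the variance.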

### MEPS Technical Variables for Analysis and Variance Estimation

Three technical variables are needed for analysis of the MEPS data:

- A sampling weight (i.e., **PERWEIGHT**, **SAQWEIGHT**, or **DIABWEIGHT**) must be chosen, based on the sampling universe of the variables. The sampling weight represents the inverse probability of selection into the sample, with adjustment for non-response, as well as post-stratification adjustments for age, race/ethnicity, and sex. Analysts should review variable descriptions of the variables of interest and the user note on SAMPLING WEIGHTS for more information about which weight to use.
- **STRATANN** is an integrated variable that represents the impact of the sample design stratification on the estimates of variance and standard errors; it is appropriate for generating annual estimates for all years. Prior to 2002, MEPS variance strata were developed independently from year to year. Beginning in 2002, the MEPS variance strata were developed to remain constant within sample design periods, each of which corresponds to the introduction of a new sample design period in NHIS the year prior. STRATANN is constant in 2002-2007 (through panel 11) and 2007-forward (for panels 12 and later). For pooled analyses including years before 2002, IPUMS offers the pooled variance sampling strata variable STRATAPLD. Analyses limited to the years 2002 and later may use STRATANN for annual and pooled analyses.
- **PSUANN** is an integrated variable that represents the impact of the sample design clustering on the estimates of variance and standard errors; it is appropriate for generating annual estimates for all years. Prior to 2002, MEPS variance primary sampling units were developed independently from year to year. Beginning in 2002, the MEPS variance primary sampling units remain constant within sample design periods, each of which corresponds to the introduction of a new sample design period in NHIS the year prior. PSUANN is constant in 2002-2007 (through panel 11) and 2007-forward (for panels 12 and later). For pooled analyses including years before 2002, IPUMS offers the pooled primary sampling unit variable PSUPLD. Analyses limited to the years 2002 and later may use PSUANN for annual and pooled analyses.

### General Syntax to Account for Sample Design

The following general syntax will allow users to account for sampling weights and design variables when using STATA, SAS, or SAS-callable SUDAAN to estimate, for example, means using MEPS data. This example uses PERWEIGHT, STRATANN and PSUANN; analysts should substitute alternate variance and weight variables as appropriate for their analyses.

##### STATA:

```
svyset psuann [pweight=perweight],strata(stratann)
svy: mean var1
```
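
As one illustration of substituting alternate design variables, a pooled analysis that includes years before 2002 might declare the design with STRATAPLD and PSUPLD instead. This is a sketch only: `var1` remains a placeholder, and PERWEIGHT is used unchanged. Whether the weight needs adjusting for the number of pooled years is not addressed here, so analysts should consult the user note on SAMPLING WEIGHTS.

```
* Pooled analysis including years before 2002:
* use the pooled strata and PSU variables in place of STRATANN/PSUANN
svyset psupld [pweight=perweight], strata(stratapld)
svy: mean var1
```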

##### SAS:

```
proc sort data = datasetname;
by stratann psuann;
run;
proc surveymeans data = datasetname;
weight perweight;
strata stratann;
cluster psuann;
var var1;
run;
```

##### SAS-Callable SUDAAN:

```
proc sort data = datasetname;
by stratann psuann;
run;
proc descript data = datasetname filetype = sas design = wr;
nest stratann psuann;
weight perweight;
var var1;
print nsum wsum mean semean / nohead;
run;
```

### Subsetting IPUMS MEPS Data

Often, analysts are interested in restricting analyses to a specific population (e.g., women ages 18 to 64 or Hispanic/Latino persons). In these situations, many analysts either exclude all other cases from the dataset or use an if-statement during analysis. Both approaches still produce correct point estimates if the remaining cases are properly weighted, but standard errors may be computed incorrectly, because the excluded cases carry information about the sample design (strata and PSUs) that variance estimation requires.

If the analyst is interested in a specific subpopulation, it is necessary to use analytic techniques that do not compromise the sample design information. Specifically, it is typically recommended to use the full database with a statistical package (such as STATA, SAS, or SUDAAN) that can accommodate subpopulation analysis.

### Syntax for Subpopulation Analysis

The following syntax demonstrates, generally, how an analyst can conduct subpopulation analysis using IPUMS MEPS data without compromising the design structure of the data. This approach has the effect of producing estimates for the population of interest, while incorporating the full sample design information for variance estimation.

#### STATA

```
svyset psuann [pweight=perweight], strata(stratann)
svy, subpop(if age >= 65): mean var1
```
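
A variant of the same analysis, mirroring the indicator-variable approach used in the SAS example, is sketched below. The variable name `subpopvar` is hypothetical. The explicit `!missing(age)` check matters only if `age` can be system-missing in the extract: Stata treats numeric missing values as larger than any number, so `age >= 65` alone would also be true for those cases.

```
* Hypothetical 0/1 indicator: 1 = age 65 or older, 0 = everyone else
* (cases with missing age are coded 0, so they stay in the design
*  but fall outside the subpopulation)
generate byte subpopvar = (age >= 65 & !missing(age))
svyset psuann [pweight=perweight], strata(stratann)
svy, subpop(subpopvar): mean var1
```

Because every case stays in the dataset, all strata and PSUs remain available for variance estimation.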

#### SAS

```
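/* Create the subpopulation indicator in a DATA step */
data datasetname;
set datasetname;
if age ge 65 then subpopvar = 1;
else subpopvar = 0;
run;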
proc sort data = datasetname;
by stratann psuann;
run;
proc surveymeans data = datasetname;
weight perweight;
strata stratann;
cluster psuann;
domain subpopvar;
var var1;
run;
```

#### SAS-callable SUDAAN

```
proc sort data = datasetname;
by stratann psuann;
run;
proc descript data = datasetname filetype = sas design = wr;
nest stratann psuann;
weight perweight;
subpopn age >= 65 / NAME = "Population 65 years and older";
var var1;
print nsum wsum mean semean / nohead;
run;
```

### References

Agency for Healthcare Research and Quality. (2016). MEPS HC-171 2014 Full Year Consolidated Data File. http://meps.ahrq.gov/data_stats/download_data/pufs/h171/h171doc.pdf