Frequently Asked Questions (FAQ)
What is IPUMS MEPS?
What is in the future for IPUMS MEPS?
How do IPUMS MEPS files differ from the MEPS public use files already in distribution?
How does IPUMS MEPS add value to MEPS data?
Where should a new user start?
How do I get access to IPUMS MEPS data?
What are microdata?
What are "weights"?
What does "universe" mean in the variable descriptions?
Why is a variable from the MEPS that I have worked with before not included in IPUMS MEPS?
Can I combine IPUMS MEPS data with other MEPS variables needed for my research?
How is a record uniquely identified?
Data Limitations and Cautions to Users
What are the major limitations of the data?
Are there aspects of IPUMS MEPS data about which to be particularly careful?
Is help available if I encounter problems using IPUMS MEPS?
How are the data delivered?
What is the data format?
How do I get data from the MEPS data extraction system?
How long does a data extract take?
How does "sample selection" work on the IPUMS MEPS website?
Can I get the original data?
General information about the project
What is IPUMS MEPS?
The IPUMS Medical Expenditure Panel Survey (IPUMS MEPS) is a harmonized set of data and documentation based on material originally included in the public use files of the U.S. Medical Expenditure Panel Survey and distributed for free over the internet. IPUMS MEPS variables are given consistent codes and have been thoroughly documented to facilitate cross-temporal comparisons.
IPUMS MEPS is not a collection of compiled statistics; it is composed of microdata. Each record represents a person, with all characteristics of that person numerically coded. Because the data refer to individuals and not tables, researchers commonly use a statistical package to analyze the records in the IPUMS MEPS database. A data extraction system enables users to select only the survey years and variables they require.
What is in the future for IPUMS MEPS?
The IPUMS MEPS Project is funded by a grant from the National Institute of Child Health and Human Development (NICHD). We currently offer 1,000 annual person-level variables from the 1996-present full-year consolidated files. Future goals are to facilitate easier use of MEPS data longitudinally and explore additional file types.
We hope to continue the project beyond our present five-year funding period, but we will have to secure further funding as our current grant expires. To be successful, we need to demonstrate the existence of a large body of works based on IPUMS MEPS data or documentation. If you use IPUMS MEPS to create educational materials, satisfy a course requirement, or prepare a report, presentation, publication, or thesis, please tell us about it by adding to our bibliography site.
How do IPUMS MEPS files differ from the MEPS public use files already in distribution?
Public use files for the MEPS are the basis for IPUMS MEPS data. These original public use files also include variables not yet included in the IPUMS MEPS public database. Researchers can access the original files through the MEPS website. Currently, IPUMS MEPS only includes annual variables offered on the person-level Full-Year Consolidated data files.
The IPUMS MEPS Project recodes the original public use data to increase consistency over time. For the most part, IPUMS MEPS does not use the same variable names included in the original public use data; variables have been renamed to increase consistency over time and within subject categories. Detail from the original public use variables is preserved in IPUMS MEPS integrated variables, but codes and value labels are often different in the IPUMS MEPS version of a variable.
How does IPUMS MEPS add value to MEPS data?
IPUMS MEPS includes many variable features not available for the original MEPS public use files. By using the IPUMS MEPS data extraction system, analysts can select the years and variables they are interested in and work with a single dataset, without having to link or combine multiple files. IPUMS MEPS provides on-line documentation that describes variable meaning and addresses comparability issues, along with providing information about years available, universes, codes and frequencies, question wording, and appropriate weights for each included variable.
The large number of topics covered by the MEPS makes it difficult for researchers to determine from the public use files which variables are available across time. Changes in variable names, even when the question wording remains the same, pose a further challenge. IPUMS MEPS provides consistent variable names and displays which variables are available, by year and topic, in a user-friendly display on its website. Once researchers identify the years and variables relevant to their research project, they can create a data extract or analyze online the variables they choose.
Where should a new user start?
The natural starting points are the "Select Data" or "Browse and Select Data" links on the top banner and the left navigation bar. These links open the variables page, the primary tool for exploring the contents of IPUMS MEPS. By default, the variables page displays one variable group at a time for all years in the data series. You can filter the information at any point to include only the years of interest to you ("Select samples"). More detailed information on using the variable menu is available.
When you select samples, the page will display only variables present in those survey years. An "x" indicates the availability of a variable for a given year in the current IPUMS MEPS database.
On the variables page, clicking on a variable name brings up the variable's documentation. The information about the variable is contained in a number of tabs. The default tab is the overview of the codes for the variable. These categories can suggest the types of research possible with a given sample. Via the "codes" tab, users can also view the unweighted frequencies for each response category in each year. The "description" tab includes a brief description of the variable; additional information on such topics as data collection, definitions, and related variables will display if a user clicks on a "show more" link. By clicking on hyperlinks within a variable description, you can access similar information for closely related variables. The "comparability" tab discusses comparability issues across years. Other tabs report the years the variable is available, the variable universe (i.e., who was asked the question), and the appropriate weight(s) to use.
If you have a specific substantive interest, such as optometrist visits, you may wish to use the "Search Variables" feature on the variables page. Entering a word (such as "optometrists") and hitting the "search" button will bring up a list of all variables that include that term in the variable name, label, and, if you wish, variable descriptions and categories. Thus, for example, searching on "optometrists" brings up a list of variables that are otherwise spread between the two topical groups of "office-based provider expenditures" and "ambulatory care expenditures".
Throughout the variable documentation system, there are buttons to "Add to cart." Any variables you select in this way are put in your data cart to include in a data extract. Your selections only last for the current web session.
The Data Cart in the upper right of the variables page keeps track of your variable and sample selections. Once you have made some selections, you can click on "View Cart" to review your choices. If you have selected variables and samples, you can enter the data extract system. To make a data extract, you must be registered to use IPUMS MEPS data. You may, however, browse the website and explore the steps involved in making a data extract without actually logging in and producing an extract. Instructions for using the extraction system are below.
Before beginning analysis of IPUMS MEPS data, users are advised to review the material in the "User Guide" section. These documents discuss such issues as variance estimation, sample design, and the use of weights. The user notes also provide counts of the number of person and household records in the IPUMS MEPS database for each year.
How do I get access to IPUMS MEPS data?
To get access to the data for downloading a customized data extract, users must agree to specified conditions of responsible use, which are similar to the conditions for using the MEPS public use files.
For purposes of internal recordkeeping, and to provide the IPUMS MEPS staff with a clear sense of the user constituency (to improve outreach and better serve users), registration also requires users to provide some information about themselves, such as their discipline, academic or non-academic status, and institutional affiliation. Registered users are automatically added to the IPUMS MEPS e-mail list and receive occasional newsletters reporting data releases and new website features. To register for access to the data, go to the IPUMS MEPS registration webpage.
What are microdata?
Microdata are composed of individual records containing information collected on persons and households. The unit of observation is the individual. The responses of each person to the different survey questions are recorded in separate variables.
Microdata stand in contrast to more familiar "summary" or "aggregate" data. Aggregate data are compiled statistics, such as a table of marital status by sex for some locality. There are no such tabular or summary statistics in the IPUMS MEPS data.
Microdata are flexible. One need not depend on published statistics from a survey that compiled the data in a certain way, if at all. Users can generate their own statistics from the data in any manner desired, including performing individual-level multivariate analyses.
See an image of microdata here. All IPUMS microdata are in this general format.
What are "weights"?
MEPS data are collected through a complex stratified sampling scheme that includes oversampling of some population subgroups. This means that persons and households with some characteristics are over-represented in the samples, while others are underrepresented. To obtain representative statistics using IPUMS MEPS data, users must apply weights.
IPUMS MEPS contains several weights. Which weight to use depends on the unit of analysis (household or person) and the sampling approach for the variable(s) in question (e.g., all persons versus a sample adult or sample child drawn from each family).
Each variable description contains a tab specifying the weight that should be used with that variable in each year, if that variable were analyzed in isolation. If multiple variables using different sampling strategies and weights are combined in one table or in a multivariate analysis, then the weight employed should fit the variable with the most restrictive sampling scheme.
For more information about the use of weights with IPUMS MEPS data, consult the User Note on Weights.
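To illustrate why weights matter, here is a minimal Python sketch of a weighted mean compared with an unweighted one. The records, the expenditure values, and the weights are invented for illustration and are not drawn from IPUMS MEPS; the appropriate weight variable for a real analysis is the one listed on each variable's weights tab.

```python
from typing import Iterable, Tuple


def weighted_mean(pairs: Iterable[Tuple[float, float]]) -> float:
    """Return sum(weight * value) / sum(weight) for (value, weight) pairs."""
    total_weight = 0.0
    total_weighted_value = 0.0
    for value, weight in pairs:
        total_weight += weight
        total_weighted_value += weight * value
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return total_weighted_value / total_weight


# Hypothetical records: (annual expenditure, person weight).
# The third person stands in for many more people than the first.
records = [(100.0, 1000.0), (500.0, 3000.0), (0.0, 6000.0)]

unweighted = sum(value for value, _ in records) / len(records)
weighted = weighted_mean(records)
```

In this made-up example the unweighted mean is 200.0 while the weighted mean is 160.0, because the person with zero expenditure represents the largest share of the population. This sketch covers point estimates only; standard errors require the survey design variables discussed in the user guide.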
What does "universe" mean in the variable descriptions?
The universe is the population at risk of having a response for the variable in question. In most cases, these are the households or persons to whom the survey question was asked, as reflected on the survey questionnaire. For example, employment variables do not include children, since the MEPS does not ask children under the age of 16 about employment.
Cases that are outside of the universe for a variable are labeled "NIU" (Not In Universe) on the codes page. A change in a variable's universe across years is a common data comparability issue.
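In practice, NIU cases are usually set aside before computing percentages. The sketch below shows one way to do that in Python. The code 0 for NIU and the yes/no values are assumptions made for this example; the actual NIU code for any variable is listed on its codes page.

```python
from collections import Counter

# Assumption for illustration: 0 marks "NIU" for this hypothetical variable.
NIU_CODES = {0}


def tabulate_in_universe(values, niu_codes=NIU_CODES):
    """Count response codes, leaving out cases that are not in universe."""
    return Counter(v for v in values if v not in niu_codes)


# Hypothetical yes/no variable using the IPUMS convention 1 = No, 2 = Yes.
employment = [0, 1, 2, 2, 0, 1, 2]

counts = tabulate_in_universe(employment)
share_yes = counts[2] / sum(counts.values())
```

Here two of the seven cases are NIU, so the share answering "Yes" is computed over the five in-universe cases rather than all seven. These are unweighted counts, shown only to illustrate the filtering step.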
Why is a variable from the MEPS that I have worked with before not included in IPUMS MEPS?
As of the spring of 2019, IPUMS MEPS includes more than 1,400 integrated variables from Full Year Consolidated files covering the period 1996-2016. During the remaining years of the current IPUMS MEPS grant period, we anticipate delivering all round-level variables from the 1996-present Full Year Consolidated files.
In a few cases, the MEPS survey contained questions about topics not included in the original MEPS public use files. The survey responses may never have been processed, or the responses may be included only as part of a composite recoded variable, or the variable may have been left out of the public use files due to confidentiality concerns. Because the MEPS public use files are the raw material used to create the IPUMS MEPS database, variables missing from the public use files are missing from IPUMS MEPS.
Can I combine IPUMS MEPS data with other MEPS variables needed for my research?
Interested users can combine variables from IPUMS MEPS and MEPS public use files. Variables from the original MEPS public use files (that are not yet in the IPUMS MEPS system) can be linked to an IPUMS MEPS data extract. IPUMS MEPS has created a linking key (MEPSID) from the series of original MEPS variables that are used to uniquely identify persons. MEPSID is an IPUMS MEPS variable that combines the original MEPS values for DUID, PID, and PANEL to create a single unique identifier. Analysts interested in linking IPUMS MEPS and other MEPS variables may use these unique identifiers as linking keys.
How is a record uniquely identified?
Three variables constitute a unique identifier for each person record in MEPS: DUID, PID, and PANEL (dwelling unit identifier, person identifier, and panel number). These variables are all available in IPUMS MEPS, but the IPUMS MEPS variable MEPSID combines them into a single unique identifier variable. These unique identifiers can be used as linking keys to merge variables from the MEPS public use files to IPUMS MEPS data.
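A merge on these keys might look like the following Python sketch. It assumes both inputs cover the same single survey year and have already been read into lists of dictionaries. The function name merge_on_person_key and the public use variable EXTRAVAR are hypothetical, and the identifier values are invented.

```python
def person_key(row):
    """Build the linking key from DUID, PID, and PANEL."""
    return (row["DUID"], row["PID"], row["PANEL"])


def merge_on_person_key(ipums_rows, puf_rows, puf_vars):
    """Left-join selected public use variables onto IPUMS MEPS rows."""
    lookup = {person_key(row): row for row in puf_rows}
    merged = []
    for row in ipums_rows:
        match = lookup.get(person_key(row))
        out = dict(row)
        for var in puf_vars:
            # Unmatched persons get None rather than a made-up value.
            out[var] = match[var] if match is not None else None
        merged.append(out)
    return merged


# Hypothetical rows for one survey year.
ipums_rows = [
    {"DUID": 10001, "PID": 101, "PANEL": 21, "AGE": 45},
    {"DUID": 10002, "PID": 102, "PANEL": 20, "AGE": 7},
]
puf_rows = [
    {"DUID": 10001, "PID": 101, "PANEL": 21, "EXTRAVAR": 7},
]

merged = merge_on_person_key(ipums_rows, puf_rows, ["EXTRAVAR"])
```

The left join keeps every IPUMS MEPS record, so a missing match is visible as None instead of silently dropping the person. When pooling several years, the survey year would also need to be part of the key, since the same person can appear in more than one annual file.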
Data Limitations and Cautions to Users
What are the major limitations of the data?
The data consist entirely of records for individual persons from the public use files of the MEPS. IPUMS MEPS does not deliver aggregate or published statistics from the survey. Researchers interested in aggregate data will find it on the Agency for Healthcare Research and Quality (AHRQ) MEPS website.
The number of persons and households in the survey varies from year to year, but, on average, the survey covers about 30,000 persons in about 12,000 households each year. Exact figures on the number of households and persons included in IPUMS MEPS in each year are available in the user guide note on sample sizes. To achieve adequate sample sizes for some subgroups, researchers may wish to combine data from two or more survey years.
Because the MEPS data are public-use, measures have been taken to assure confidentiality. Names and other identifying information are suppressed. You cannot find specific individuals in the IPUMS MEPS data or use these data for genealogical research. Moreover, because the MEPS uses population samples to generate the data, there is no guarantee that any given individual will be in the dataset. Finally, the registration form requires potential users to commit to using the data responsibly, including utilizing the data for statistical reporting and analysis only and making no effort to identify particular individuals in the data.
Geographic detail in the MEPS public use files, and thus in IPUMS MEPS, is limited to the identification of census regions. Researchers can access more geographic detail and add it to an IPUMS MEPS data extract by working with the staff of the AHRQ Data Center. If your research proposal is approved, you can access restricted data (including geographic identifiers) through on-site analysis at the AHRQ Data Center in Rockville, MD or at a Census Restricted Data Center, via remote access.
Are there aspects of IPUMS MEPS data about which to be particularly careful?
IPUMS MEPS is an integrated dataset based on the MEPS public use files. However, IPUMS MEPS coding schemes follow different conventions than MEPS in many instances. For example, MEPS uses the convention of 1 = Yes and 2 = No, while IPUMS MEPS uses the convention 1 = No and 2 = Yes. To pick another example, values of "-1" in the original MEPS public use data files are converted to non-negative values (usually beginning with a 0 or a 9, to indicate "not in universe" cases) in IPUMS MEPS. Moreover, to achieve comparably coded variables over time, IPUMS MEPS has recoded most variables from the original MEPS coding schemes. Users are strongly urged to review the IPUMS MEPS documentation carefully and to not assume that variable values will be coded the same in IPUMS MEPS as they were in the MEPS files.
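The yes/no difference matters most when a variable taken from the original public use files is combined with IPUMS MEPS variables. The Python sketch below recodes the MEPS convention (1 = Yes, 2 = No) to the IPUMS MEPS convention (1 = No, 2 = Yes) described above. The function name is hypothetical, and values other than 1 and 2 (such as negative MEPS codes) are deliberately left unmapped, because their IPUMS MEPS equivalents vary by variable and must be taken from the documentation.

```python
# MEPS public use convention: 1 = Yes, 2 = No.
# IPUMS MEPS convention:      1 = No,  2 = Yes.
MEPS_TO_IPUMS_YES_NO = {1: 2, 2: 1}


def to_ipums_yes_no(meps_value):
    """Recode a MEPS yes/no value; return None for any other code."""
    return MEPS_TO_IPUMS_YES_NO.get(meps_value)


# Hypothetical MEPS values, including a negative code left unmapped.
meps_values = [1, 2, -1, 1]
ipums_values = [to_ipums_yes_no(v) for v in meps_values]
```

Returning None for unrecognized codes forces an explicit decision about missing and not-in-universe cases instead of guessing a target code.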
The MEPS uses a complex sampling scheme, so all IPUMS MEPS samples are weighted. Put another way, individuals in the data do not all represent an identical number of persons in the population in a given year. It is therefore necessary to use the appropriate weight variables when analyzing these samples. A user guide note on sampling weights discusses the proper use of weights with IPUMS MEPS data. In addition, the "Weights" section at the top of each variable description specifies the suggested IPUMS MEPS weight to use with that variable, by year.
The MEPS does not contain the full universe of persons in the U.S. population. Rather, the survey samples the civilian non-institutionalized population and thus excludes such persons as residents of nursing homes and members of the armed forces living in barracks. A user guide note on sample design contains information about the MEPS sampling scheme and changes in MEPS sampling over time. A user guide note on analysis and variance estimation discusses appropriate practices using IPUMS MEPS data.
It is important to examine the documentation for the variables you are using. The codes and labels for variable categories do not tell the whole story. Two features of the variable documentation merit special attention. First, examine the universe for a variable (the population at risk of answering the question), which can differ subtly or markedly across years. Second, read the comparability discussions for the variables in which you are interested.
Is help available if I encounter problems using IPUMS MEPS?
Users who encounter problems with the IPUMS MEPS extract system, data, or documentation can e-mail firstname.lastname@example.org to receive assistance. The IPUMS MEPS Project staff also welcomes feedback from users who encounter errors, inconsistencies, or lack of clarity in the data and documentation. Users who contact us with information about a legitimate and substantial error in the data or documentation will be sent a complimentary IPUMS mug.
How are the data delivered?
IPUMS MEPS data are delivered through our data extraction system. Users select the variables and years they are interested in, and the system creates a custom-made extract containing only this information. The system will pool data from multiple survey years into a single data file.
Data are generated on our server. The system sends out an email message to the user when the extract is completed. The user must download the extract and analyze it on a local machine. Instructions for downloading and reading the data are available here. To create and access data extracts, users must log in as an IPUMS MEPS data user (with their e-mail address and password) or register as an IPUMS MEPS data user.
Instructions for using the extraction system are below.
What is the data format?
IPUMS MEPS produces fixed-column ASCII data.
In addition to the ASCII data file, the system creates a statistical package syntax file to accompany each extract. The syntax file is designed to read in the ASCII data while applying appropriate variable and value labels. The statistical packages Stata, SAS, and SPSS are supported. You must download the syntax file with the extract, or you will be unable to read the data. The syntax file requires minor editing to identify the location of the data file on your local computer; directions regarding these minor edits are available here. You may also choose to download your data as a pre-formatted file ready for use in Stata, SAS, or SPSS, or in CSV format.
A codebook file is also created with each extract. This codebook file records the characteristics of your extract and should be downloaded for recordkeeping.
All data files are created in gzip compressed format. You must decompress the file to analyze it. Most data decompression utilities will handle the files. For example, if you are using Windows Vista or 7, right-click the file name and choose the option to "Extract all." Among the available free software for decompressing files are WinGzip (for Windows) and MacGZIP (for Macs).
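For users working outside Stata, SAS, or SPSS, a gzip-compressed fixed-column file can also be read directly. The following Python sketch uses only the standard library. The column layout, the AGE variable, and the file name are assumptions made for this example; the real column positions for an extract come from its syntax file or codebook. The sketch writes a tiny sample file first so that it can run on its own.

```python
import gzip
import os
import tempfile

# Hypothetical layout: variable name -> (start, end), zero-based, end exclusive.
# Real positions must be taken from the extract's syntax file or codebook.
LAYOUT = {"YEAR": (0, 4), "PANEL": (4, 6), "AGE": (6, 9)}


def read_fixed_width_gz(path, layout):
    """Read a gzip-compressed fixed-column ASCII file into a list of dicts."""
    rows = []
    with gzip.open(path, "rt", encoding="ascii") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            rows.append(
                {name: int(line[start:end]) for name, (start, end) in layout.items()}
            )
    return rows


# Build a two-record sample file so the sketch is self-contained.
sample_lines = ["201621045\n", "201620007\n"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "extract.dat.gz")
    with gzip.open(path, "wt", encoding="ascii") as handle:
        handle.writelines(sample_lines)
    rows = read_fixed_width_gz(path, LAYOUT)
```

Reading through gzip.open avoids a separate decompression step. This sketch treats every field as an integer and applies no labels or implied decimals, which the supplied syntax files handle for the supported packages.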
IPUMS MEPS offers rectangular and hierarchical data files through the extract system.
The IPUMS MEPS data access system delivers rectangular data at the person level by default. Only person records (selected through the "Annual" drop-down menu on the main variable selection page of the extract builder) are included in these extracts.
IPUMS MEPS also delivers data files rectangularized at the round level in which person information is repeated on each individual's round records. With rectangularization to the round level, there are no separate person-level records in the data extract. No information is lost.
The rectangular person format default can, however, be overridden to yield hierarchical data consisting of person records followed by the round records of each person. Users who request hierarchical data will need to select variables for inclusion in their extracts from the round records. This is done from the same variable selection page in the extract builder as the selection of annual (person-level) variables.
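The relationship between the hierarchical and round-level rectangular layouts can be sketched in Python as follows. The record-type marker RECTYPE with values "P" and "R", and the variables AGE, ROUND, and EMPLOYED, are hypothetical names chosen for this example and may not match how an actual extract labels its records.

```python
def rectangularize_to_rounds(records):
    """Repeat each person's fields on that person's round records.

    Expects person records ("P") each followed by their round records ("R").
    """
    rows = []
    current_person = None
    for rec in records:
        fields = {k: v for k, v in rec.items() if k != "RECTYPE"}
        if rec["RECTYPE"] == "P":
            current_person = fields
        elif rec["RECTYPE"] == "R":
            if current_person is None:
                raise ValueError("round record found before any person record")
            row = dict(current_person)
            row.update(fields)
            rows.append(row)
    return rows


# Hypothetical hierarchical input: one person with two rounds, then another.
hierarchical = [
    {"RECTYPE": "P", "PID": 101, "AGE": 45},
    {"RECTYPE": "R", "ROUND": 1, "EMPLOYED": 2},
    {"RECTYPE": "R", "ROUND": 2, "EMPLOYED": 1},
    {"RECTYPE": "P", "PID": 102, "AGE": 7},
    {"RECTYPE": "R", "ROUND": 1, "EMPLOYED": 0},
]

round_rows = rectangularize_to_rounds(hierarchical)
```

Each output row carries both the person-level and the round-level fields, so the result has one row per round and no separate person rows.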
How do I get data from the MEPS data extraction system?
The data extraction system is a flexible tool. There is no need to download variables or survey years you do not expect to use for your current analysis.
Browsing and selecting data
You begin by browsing and selecting data, which you can do by clicking "Select Data" from the top menu or "Browse and Select Data" from the left menu. This takes you to the variables page, where you can browse or search variables and limit your view to certain samples. Select variables by checking the boxes next to each variable's name, or select samples by clicking the grey "Select Samples" button at the top of the screen. When you are done, click the green "VIEW CART" button at the top right corner of your screen. This takes you to your Data Cart.
The data cart
Your data cart provides you with the list of variables you just selected (together with certain variables that are automatically included with all extracts). From here you can remove variables from your data cart, return to the variables page to add more variables, or add/change your sample selections. When you are ready to make an extract, click the green "CREATE DATA EXTRACT" button. This takes you to the extract system.
The extract system
On the first extract page, you will see a summary of your data extract, including the number of variables, samples, estimated size, and data format. Note that some variables are preselected for you. The data extract system automatically supplies variables that indicate the sample year (YEAR) and panel (PANEL), are needed for variance estimation (PSU and STRATA), uniquely identify records (MEPSID), and are used for weighting the variables and years selected. On this page, you may also select data quality flags if they are available for any variables in your cart.
On the first page, you may also include a brief description of your extract; descriptions are useful if you want to modify or resubmit your extract in the future. The system records every extract you make. You can reload and modify an old extract, dropping or adding variables or survey years. To explore past extracts, go to the Download or Revise Extracts page and click on the "Revise" link.
After selecting data quality flags and entering a description of your extract, you will be prompted for your e-mail address (which provides us with a means of contacting you) and your password. To use the extract system, users must register. Users are automatically granted access to create data extracts when they create an account and agree to all conditions for use.
How long does a data extract take?
The time needed to make an extract differs, depending on the number and size of samples requested and the load on our server. Creating an extract generally takes only a few minutes. The system sends an email upon completion of the extract, so there is no need to stay active on the IPUMS MEPS site during the creation of the extract.
How does "sample selection" work on the IPUMS MEPS website?
When a user first enters the variable documentation system, data samples from all years are selected by default. Every variable in the system will display on relevant screens.
Users can filter the information displayed by selecting only the sample years of interest to them. Only the variables available in the selected sample years will then appear in the variable lists. The integrated variable descriptions and codes pages will be filtered to display only the linked survey text and codes and frequencies corresponding to the selected samples. Sample selections can be altered at any time in your session. Selections do not persist beyond the current session.
Can I get the original data?
As noted, the raw material for the IPUMS MEPS database comes from the MEPS public use files provided by the Agency for Healthcare Research and Quality (AHRQ). These original data are available on the AHRQ MEPS data and documentation page. The AHRQ's MEPS public use files also include variables and supplements not yet covered by IPUMS MEPS.