User Notes
Changes to MEPS Missing Data Codes
In the MEPS public use data files, specific codes indicate missing data, and each code indicates a different reason for missingness. This user note refers to the missing data codes from the original data files as "MEPS Codes" and to the data offered by IPUMS as "IPUMS MEPS." The table below shows the codes used in the original MEPS data, their definitions, the years in which each code was used, and the IPUMS MEPS missing data label that is usually mapped to each MEPS missing data code. In the IPUMS MEPS data, the numeric code for missing data varies by variable, so this user note uses labels rather than numeric codes when referring to missing data codes in the IPUMS MEPS data.
The original MEPS data code of -1 "Inapplicable" is recoded and assigned a value of "Not in Universe," or NIU, in the IPUMS MEPS data for all years. The original MEPS data code of -2 "Determined in previous round" is recoded and assigned a value of "Determined in previous round" in the IPUMS MEPS data for all years. The original MEPS data codes -3 "No data in round" (only used in 1996), -9 "Not ascertained" (used from 1996-2017), -13 "Initial wage imputed" (used from 2004-present), and -15 "Cannot be computed" (used from 2018-present) are recoded and assigned a value of "Unknown-not ascertained" in the IPUMS MEPS data. The original MEPS data codes of -7 "Refused" and -8 "DK" are recoded and assigned values of "Unknown-refused" and "Unknown-don't know", respectively, in the IPUMS MEPS data for all years. The original MEPS data code of -10 represents a top-code that changes each year for hourly wage data. Users who are interested in how -10 codes from the original MEPS data are recoded in the IPUMS MEPS data should read the documentation for WAGEHRRD and WAGENEWRD.
MEPS Code and Label | Definition | Years Used | IPUMS Label |
---|---|---|---|
-1 (INAPPLICABLE) | Question was not asked due to skip pattern | 1996 forward | NIU (Not in universe) |
-2 (DETERMINED IN PREVIOUS ROUND) | Question was not asked in round because there was no change in current main job since previous round | 1996 forward | Determined in previous round |
-3 (NO DATA IN ROUND) | Person has no data in round | 1996 | Unknown-not ascertained |
-7 (REFUSED) | Question was asked and respondent refused to answer question | 1996 forward | Unknown-refused |
-8 (DK) | Question was asked and respondent did not know answer or the information could not be ascertained | 1996 forward | Unknown-don't know |
-9 (NOT ASCERTAINED) | Interviewer did not record the data | 1996-2017 | Unknown-not ascertained |
-10 (HOURLY WAGE >= $XX.XX)¹ | Hourly wage was top-coded for confidentiality | 1997 forward | See WAGEHRRD and WAGENEWRD |
-13 (INITIAL WAGE IMPUTED) | Hourly wage was previously imputed so an updated wage is not included in this file | 2004 forward | Unknown-not ascertained |
-15 (CANNOT BE COMPUTED) | Value cannot be derived from data | 2018 forward | Unknown-not ascertained |

¹ The top-code for the hourly wage changes each year.
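As an illustration only, the usual mapping in the table above can be written as a small Python lookup. The names `MEPS_MISSING_CODES` and `usual_ipums_label` are hypothetical and are not part of any IPUMS or MEPS tool. The sketch returns labels rather than numeric values because the IPUMS MEPS numeric code varies by variable. It leaves out -10, which is handled per variable (see WAGEHRRD and WAGENEWRD), and it encodes only the usual assignment, so variable-specific exceptions are not covered.

```python
from typing import Optional

# Hypothetical lookup built from the table above:
# MEPS code -> (usual IPUMS label, first year used, last year used).
# A last year of None means the code is still in use ("forward").
MEPS_MISSING_CODES = {
    -1: ("NIU (Not in universe)", 1996, None),
    -2: ("Determined in previous round", 1996, None),
    -3: ("Unknown-not ascertained", 1996, 1996),
    -7: ("Unknown-refused", 1996, None),
    -8: ("Unknown-don't know", 1996, None),
    -9: ("Unknown-not ascertained", 1996, 2017),
    -13: ("Unknown-not ascertained", 2004, None),
    -15: ("Unknown-not ascertained", 2018, None),
}


def usual_ipums_label(meps_code: int, year: int) -> Optional[str]:
    """Return the usual IPUMS label for a MEPS missing data code.

    Returns None when the code is not in the lookup (for example -10,
    which depends on the variable) or was not used in the given year.
    """
    entry = MEPS_MISSING_CODES.get(meps_code)
    if entry is None:
        return None
    label, first_year, last_year = entry
    if year < first_year:
        return None
    if last_year is not None and year > last_year:
        return None
    return label
```

For example, `usual_ipums_label(-9, 2017)` gives "Unknown-not ascertained", while `usual_ipums_label(-9, 2018)` gives `None` because -9 was no longer used in 2018.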
Introduction and Handling of -15 CANNOT BE COMPUTED Codes
In 2018, the design of the MEPS data collection instrument changed. As part of this change, the missing data code -9 (NOT ASCERTAINED) was removed and the missing data code -15 (CANNOT BE COMPUTED) was introduced. The main takeaways for IPUMS MEPS users are that the meanings of "Unknown-don't know" and "Unknown-not ascertained" changed slightly in 2018 compared to earlier years and that, unless you are analyzing missing data, this change is unlikely to affect your analysis. The most noticeable result is that, for some variables, frequencies of IPUMS "Unknown-don't know" codes will be higher from 2018 forward than in earlier years.
Prior to 2018, "Unknown-not ascertained" codes were used for a variety of reasons (see Table 2 for these reasons and an example variable for each). They were used to ensure confidentiality; to indicate skipped questions on pencil-and-paper questionnaires such as the Self-Administered Questionnaire (SAQ); to indicate that a value could not be assigned for a constructed variable due to lack of information; to indicate a difference in a respondent's eligibility between edited variables and information in CAPI; and in other instances where the information could not be ascertained. Starting with the 2018 sample, the IPUMS MEPS label "Unknown-not ascertained" retains all of the explicit definitions of missingness listed above, but all other instances where the information could not be ascertained are now coded to the IPUMS MEPS label "Unknown-don't know." This expands the meaning of that label from only cases where the respondent chose "don't know" to also include cases where the information from the question was not ascertained. The documentation for the MEPS-HC 2018 Full Year Consolidated Data File states, "Cases that used to contain -9 (NOT ASCERTAINED) in MEPS variables are now distributed between -8 (DK) and -15 (CANNOT BE COMPUTED). Most of the cases that were previously -9 (NOT ASCERTAINED) will now be assigned -8 (DK)."
"Unknown-not ascertained" Reason | Example IPUMS Variable |
---|---|
Confidentiality | ARTHGLUPAGE |
Skipped question on pencil-and-paper survey | ADHECR |
Lack of information | MARSTRD |
Edited variables different from information in CAPI | HYPERTENEV |
Based on the MEPS documentation and comparisons of the 2018 Full Year Consolidated (FYC) data to earlier data, IPUMS MEPS staff decided not to introduce a new code for the -15 MEPS code. In the majority of cases, -15 codes are assigned to "Unknown-not ascertained," which is the same way MEPS -9 codes were assigned in earlier years. On rare occasions, IPUMS assigns MEPS -15 codes to "NIU." We do this only when IPUMS has defined a variable's universe by excluding "Not Ascertained" codes, and the purpose is to minimize disruption to analyses that include data for both 2018 and earlier years. Because the meaning of "Unknown-don't know" expanded in 2018 to encompass some of what was previously included in "Unknown-not ascertained," frequencies of "Unknown-don't know" will likely be higher for some variables.
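Users who do analyze missing data across the 2018 change may want the two affected labels to be comparable over time. One possible approach, sketched below in Python, is to combine "Unknown-don't know" and "Unknown-not ascertained" into a single category before tabulating. This is an assumption about how a user might proceed, not an IPUMS recommendation. The function names and the combined label are hypothetical, and the sketch works on label strings because the numeric codes vary by variable.

```python
from collections import Counter
from typing import Dict, Iterable

# The two IPUMS MEPS labels whose meanings shifted in 2018.
SHIFTED_LABELS = {"Unknown-don't know", "Unknown-not ascertained"}

# Hypothetical combined label used only in this sketch.
COMBINED_LABEL = "Unknown (don't know or not ascertained)"


def collapse_unknown(label: str) -> str:
    """Map either shifted label to one combined label; leave others as-is."""
    return COMBINED_LABEL if label in SHIFTED_LABELS else label


def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return the share of records in each label after collapsing."""
    counts = Counter(collapse_unknown(label) for label in labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}
```

Applying `label_shares` separately to the labels for a pre-2018 year and for a later year gives combined "Unknown" shares that do not depend on how cases are split between the two labels. "Unknown-refused" and "NIU" are left untouched because the note does not describe any change to them.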