Income Data in MEPS


Data Collection

MEPS primarily collected data about income sources and amounts on the Income Section of the Household Component questionnaire, fielded during rounds 3 and 5. Additional financial data about household members' real estate, businesses, vehicles, investments, other assets, and debts were collected in the Assets Section of the questionnaire, fielded during round 5. Employment data, including hourly wage information, were collected in every round.

Before being asked about income sources and amounts, respondents were prompted to retrieve and refer to any "tax materials that you may have." Reported income amounts reflect the amount of income reported on annual tax returns by the person and, if applicable, the secondary filer on joint tax returns.

Survey Instrument

While the survey instrument used to collect income data for MEPS targets certain persons to answer specific income questions (see Target Populations section for more information), it also provides a mechanism for all persons in a household to report receiving income from any source. The final questionnaire item in the income section asks if anyone in the family receives income from other sources that they haven't already discussed; as part of this question, field interviewers show a card that includes examples of what other types of income the person may report. The examples of other income sources listed on the show card change over time. Because all persons could technically report income for any of the sources asked about in the survey via this "other" income source mechanism, IPUMS has defined the universe for all income variables as "All Persons." It is not clear whether income of a type that is explicitly asked about in the survey (e.g., interest income), but that is reported only in response to the "other" income question, is classified as other income (INCOTH) or reassigned to the specific income type (e.g., interest income (INCINT)).

Target Populations

Despite all persons having the opportunity to report income from any source, the survey instrument targets specific persons to report information on certain income sources. Some income sources were asked about using "family-style" questions, where the family respondent was asked whether anyone in the family received income from a source. Questions about other types of income sources, however, were asked of specific persons based on their tax filing status (specifically the type of tax form they completed or planned to complete) and, beginning in 2002, their age. For 2002 and later MEPS-HC data, there are three target populations for income questions.

  1. Primary filers (any tax form) AND all persons aged 16 and older (regardless of tax filing status) were explicitly asked questions about the following income sources:
  2. Primary filers using a 1040 long form, other form (excluding 1040A and 1040EZ), or an unknown type of form, AND all persons aged 16 and older (regardless of tax filing status) were explicitly asked about the following income sources:
  3. Using "family style" questions, all persons were eligible to report income from the following sources:

The target populations have changed slightly over time. In 2002 and later years, all persons aged 16 and older were asked income questions, regardless of their tax filing status. Additionally, in these years, persons who had filed or indicated they would file a tax return as a primary filer were asked income questions, regardless of age. Prior to 2002, only persons who were the primary filer of a tax return and who knew the type of form they filed were targeted for most income questions. For 1996-2001, there are four target populations for MEPS income questions. Income sources whose tax-form requirements differ between the pre-2002 and 2002-forward periods (beyond the change to include persons filing an unknown type of tax form and all persons aged 16 and older) are noted with an asterisk.

    1. Primary filers, regardless of tax form (excluding filers who did not know the type of tax form they file):
    2. Primary filers using a 1040 long form, short form 1040A, or other form (excluding 1040EZ forms and filers who do not know the type of tax form they file):
    3. Primary filers using a 1040 long form or other form (excluding 1040A and 1040 EZ forms and filers who do not know the type of tax form they file):
    4. Using "family style" questions, all persons were eligible to report income from the following sources:
      Key Changes to Income Data over Time

      As suggested by changes to the target population over time, the skip patterns for the income section of the questionnaire were revised substantially in 2002. Primary filers who did not know the type of tax form they filed or would file, and all persons aged 16 and older regardless of tax filing status, were added to the target population. AHRQ notes that these updates increased response rates for many income types and led to improvements in the imputation process, and that while these changes had a larger impact on persons aged 65 and older, they had little effect for persons under the age of 65. Additionally, farm and non-farm business income were combined into a single variable (INCBUS), and non-income questions about tax forms (e.g., total medical expenses) were removed in 2002. Prior to 2002, persons were also asked to report what percentage of their income was taxable for several income sources (unemployment, interest, social security, and IRA); these questions were also eliminated in 2002. Prior to 2013, the list of income sources included income from state or local tax refunds, but these sources of income were removed from the show card in 2013 and later years.

      Data Editing & Imputation

      AHRQ documentation indicates that person-level income amounts are edited or imputed for all FYC records. Generally, complete responses were not edited further, except in rare situations (see Other Editing section below). Income variables were edited sequentially, to include information about previously edited income amounts and maintain patterns of correlation across income sources. The order for editing income sources was wages/salary, interest, business, farm (1996-2001 only), dividends, tax refunds (1996-2012 only), alimony, property sales, trusts and rents, retirement/pensions, IRAs, social security, unemployment, worker's compensation, veteran's income, regular cash income, other income sources, child support, Supplemental Security Income, and public assistance.

      Top and Bottom-Coding of Income Data

      For each income source, top codes were applied to the top percentile of all cases for the income source. This includes negative amounts that exceeded income thresholds in absolute value; for example, if the top percentile of a given income source was an absolute value of $15,000, the values of that income source would be top-coded at $15,000 and bottom-coded at -$15,000. In cases where less than one percent of all persons received an income source, all income amounts for recipients of that income source were top-coded. Total person-level income is constructed by summing all person-level income components and is also top-coded using the same method applied to the individual income sources. After summed total person-level income was top-coded, component income amounts were adjusted to make the sum of the individual income sources consistent with total person-level income.
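The symmetric capping rule in the $15,000 example can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, the threshold is taken as given rather than derived from the data, and the sketch shows plain capping, not the masking that AHRQ applies to top-coded amounts on the public use file.

```python
def top_and_bottom_code(amounts, threshold):
    """Cap each amount so that its absolute value does not exceed `threshold`.

    `threshold` stands in for the top-percentile cutoff of one income source;
    how that cutoff is computed is not shown here.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return [max(-threshold, min(threshold, amount)) for amount in amounts]


# Two values exceed the threshold in absolute value; two are left unchanged.
coded = top_and_bottom_code([22_000, 4_500, -18_000, -300], 15_000)
```

Amounts inside the range from -$15,000 to $15,000 pass through unchanged; only the tails are altered.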

      AHRQ provides the following details about their process for masking top-coded income amounts:

      Top-coded income amounts were masked using a regression-based approach. The regressions relied on many of the same variables used in the hot-deck imputations, with the dependent variable in each case being the natural logarithm of the amount that the income component was in excess of its top-code threshold. Predicted values from this regression were reconverted from logarithms to levels using a smearing correction, and these predicted amounts were then added back to the top-code thresholds. This approach preserves the component-by-component weighted means (both overall and among top-coded cases), while also preserving much of the income distribution conditional on the variables contained in the regressions. At the same time, this approach ensures that every reported amount in excess of its respective threshold is altered on the public use file. The process of top-coding income amounts in this way inevitably introduces measurement error in cases where income amounts were reported correctly by respondents. Note, however, that top-coding can also help to reduce the impact of outliers that occur due to reporting errors. Retrieved from: /h181/h181doc.shtml#IncomeTax on March 12, 2019.
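The general shape of a regression-based mask with a smearing correction can be sketched as below. The sketch rests on several assumptions that are not taken from AHRQ documentation: it uses a single hypothetical covariate, an unweighted ordinary least squares fit, and a Duan-style smearing factor (the mean of the exponentiated residuals). The actual procedure uses many covariates and survey weights, so this should be read as an illustration of the idea rather than a reproduction of the method.

```python
import math


def fit_simple_ols(x, y):
    """Return (intercept, slope) of an unweighted least squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0  # fall back to intercept-only fit
    return mean_y - slope * mean_x, slope


def mask_top_coded(amounts, covariate, threshold):
    """Replace each amount above `threshold` with threshold + predicted excess."""
    top = [i for i, amount in enumerate(amounts) if amount > threshold]
    if not top:
        return list(amounts)
    # Dependent variable: natural log of the excess over the threshold.
    log_excess = [math.log(amounts[i] - threshold) for i in top]
    x = [covariate[i] for i in top]
    intercept, slope = fit_simple_ols(x, log_excess)
    fitted = [intercept + slope * xi for xi in x]
    # Smearing factor (assumed Duan-style): mean of exponentiated residuals.
    smear = sum(math.exp(y - f) for y, f in zip(log_excess, fitted)) / len(top)
    masked = list(amounts)
    for i, f in zip(top, fitted):
        masked[i] = threshold + math.exp(f) * smear
    return masked


# Hypothetical incomes and ages; the threshold of 100,000 is illustrative.
masked = mask_top_coded(
    [50_000, 120_000, 150_000, 210_000, 400_000], [30, 45, 50, 60, 55], 100_000
)
```

Because the predicted excess is always positive, every masked amount stays above the threshold, while amounts at or below the threshold are returned unchanged.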

      Tax Form Information

      Persons were encouraged to retrieve any federal tax return information when answering the income section. Using these forms, the original data include logical edits to assign separate income amounts to married persons filing a joint federal tax return who replied to income questions with a combined total of both persons' incomes from a source.

      Other Editing

      Income amounts that were greater than $0 but less than $1 were treated as missing and hot-decked from persons with a positive amount of the corresponding income. AHRQ documentation notes that a limited number of outlier responses were edited as well, specifying that this was done primarily for public income sources where the reported amounts exceeded the possible amount a person could receive. Finally, for wage and salary income only, round-level employment data were used to assign wage/salary amounts to persons who reported $0 (or missing) wage/salary income but indicated they had worked for pay.

      Unedited Variables

      Tax variables, food stamp variables, the SSI disability flag (when available), and the welfare participation flag are unedited. AHRQ notes that there has been no effort to address inconsistencies between program participation and tax variables with other MEPS data, and instructs users to be careful when using unedited variables.

      Imputation Methods

      For all years of MEPS data, income variables were revised using logical editing and weighted, sequential hot-decks to replace missing income responses; this includes values missing because of item nonresponse as well as values for persons who were included in the annual file but were not present in the rounds when income questions were asked. Imputations used additional information from the survey, including age, education, employment status, race, sex, and region. Some income types used additional relevant information for imputations. Wage and salary income imputations used data on past year employment and number of weeks worked, if available. Child support income imputations used family-level information about marital status and number of children; Supplemental Security Income and public assistance income imputations also used simulated program eligibility indicators that compared state-level program eligibility with family composition and income data.

      General Imputation Approach

      The imputation proceeds in steps. First, reported income amounts are generally left unedited. Second, persons who provided a broad income range rather than a specific dollar amount are assigned income amounts via weighted, sequential hot-deck imputation. Next, missing values are set to zero for persons for whom it would be illogical to have income from this source or for whom there are too few recipients for hot-decking. Then, persons who did not report a specific dollar amount but indicated receipt of an income source are assigned values using a weighted, sequential hot-deck imputation and a donor pool restricted to persons with nonzero income values. Finally, persons with missing data on both income receipt and amounts are assigned values using a weighted, sequential hot-deck imputation and a donor pool that includes persons receiving both zero and nonzero income amounts. The individual steps of the imputation strategy are described in greater detail under the Imputation Flags section.

      Imputation Flags

      Detailed imputation flags are included to indicate the method by which person-level income amounts were derived.

      Complete responses (flag value of 1)

      Imputation flag variables with a value of 1 indicate that the person provided a complete response, which is reported in the corresponding income variable. The exact response provided may not be used if the person's income was top- and bottom-coded, or if the response was otherwise edited (see footnote 2).

      Converted bracket responses (flag value of 2)

      Imputation flag variables with a value of 2 indicate that the person provided their income in a bracketed range, rather than a specific dollar amount. Specific dollar amounts for these persons were imputed using weighted sequential hot-decking, using donors who report specific dollar amounts within the same broad income brackets. Some persons are assigned zero dollars of income via this method; these are married couples filing jointly that report a bracketed joint income and indicate that one spouse earned 0% of the income range listed.

      Missing values imputed to 0 (flag value of 3)

      Imputation flag variables with a value of 3 indicate that the person had missing amount information for this income source, and that the value of that income source was set to zero. This may include cases with too few recipients to warrant hot-deck imputations of positive values (e.g., males receiving alimony), or persons who were logically assigned to a value of zero (e.g., persons with missing wage/salary data who reported not working for pay in the employment section of the survey).

      Employment data used for non-zero incomes (flag value of 4)

      QINCWAGE is the only imputation flag variable that may have a value of 4. Persons with a value of 4 for QINCWAGE did not provide a valid dollar amount or dollar range, but related information from the employment section of the survey could be used to construct annualized wages for them. This construction uses information on hourly wage, number of hours worked, and number of weeks worked. Persons without a full year of data were assumed to be fully employed during the remainder of the year if employed at the time they provided data; persons who died or were institutionalized were assigned zero dollars in wages/salary for the period that they were not in MEPS.
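As a rough illustration of how the three employment inputs could combine into an annual amount, the sketch below multiplies them together. The exact formula AHRQ uses is not given here, so the simple product, the function name, and the parameter names are all assumptions.

```python
def annualized_wages(hourly_wage, hours_per_week, weeks_worked):
    """Approximate annual wage/salary income as wage x hours x weeks (assumed)."""
    if min(hourly_wage, hours_per_week, weeks_worked) < 0:
        raise ValueError("inputs must be non-negative")
    return hourly_wage * hours_per_week * weeks_worked


# A person earning $20 per hour, working 40 hours a week for 52 weeks.
full_year = annualized_wages(20.0, 40, 52)

# Zero weeks worked yields zero wages, as for a period spent outside MEPS.
no_weeks = annualized_wages(20.0, 40, 0)
```

Under the rule described above, a person observed for only part of the year but employed when interviewed would be treated as working the remaining weeks as well, which in this sketch corresponds to passing the full-year number of weeks.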

      Conditional hot-decked income amounts (flag value of 5)

      Imputation flag variables with a value of 5 indicate that the person reported receipt of income from a specific source, but did not provide a specific or bracketed dollar amount; values were assigned to these cases via a conditional hot-deck. The donor pool was restricted to persons with non-zero dollar amounts of this income source.

      Unconditional hot-decked income amounts (flag value of 6)

      Imputation flag variables with a value of 6 indicate that the person had missing income information, and that values were assigned using an unconditional hot-deck. Unconditionally hot-decked income values were assigned using a donor pool of persons receiving both zero and nonzero income amounts; this was employed where there was little or no information about the person's income source. For imputation of wage and salary income specifically, this included both workers and non-workers as donors.
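The difference between the conditional and unconditional donor pools can be sketched as follows. This is a deliberately simplified stand-in: it draws one donor at random with probability proportional to the donor's weight, which is not the weighted, sequential hot-deck procedure used for MEPS, and it ignores the imputation classes (age, education, region, and so on). The function name and data layout are hypothetical.

```python
import random


def hot_deck_draw(donors, rng, conditional):
    """Draw one donor amount from a list of (amount, weight) pairs.

    conditional=True  restricts donors to nonzero amounts (flag value 5).
    conditional=False keeps zero and nonzero donors alike (flag value 6).
    """
    pool = [(a, w) for a, w in donors if a != 0] if conditional else list(donors)
    if not pool:
        return None  # no eligible donor in this pool
    amounts, weights = zip(*pool)
    return rng.choices(amounts, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the illustration is repeatable
donors = [(0, 3.0), (0, 2.0), (1_200, 1.5), (800, 1.0)]

conditional_draws = [hot_deck_draw(donors, rng, conditional=True) for _ in range(50)]
unconditional_draws = [hot_deck_draw(donors, rng, conditional=False) for _ in range(50)]
```

A conditional draw can never return zero, whereas an unconditional draw may, which mirrors the inclusion of non-workers as donors for wage and salary income.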

      Edited using NHIS data (flag value of 7; only available 1999-2001)

      Imputation flag variables with a value of 7 are only available in 1999-2001, and indicate that a person's response was edited using NHIS data. In 1999-2001, MEPS income values were edited using NHIS data via a cold-deck imputation strategy, described below.
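Setting aside flag value 7, the flag values described in this section can be read as a decision sequence. The sketch below encodes that reading. The function and argument names are hypothetical, and the position of flag value 4 in the sequence is an assumption, since the order in which employment data are consulted is not stated.

```python
FLAG_LABELS = {
    1: "complete response",
    2: "converted bracket response",
    3: "missing value imputed to 0",
    4: "employment data used for non-zero income (wage/salary only)",
    5: "conditional hot-deck",
    6: "unconditional hot-deck",
    7: "edited using NHIS data (1999-2001 only)",
}


def derive_flag(has_amount=False, has_bracket=False, logical_zero=False,
                is_wage=False, has_employment_data=False, reported_receipt=False):
    """Return the flag value (1-6) implied by what is known about a response."""
    if has_amount:
        return 1
    if has_bracket:
        return 2
    if logical_zero:  # illogical to have this income, or too few recipients
        return 3
    if is_wage and has_employment_data:  # placement in the sequence is assumed
        return 4
    if reported_receipt:
        return 5
    return 6


example = derive_flag(reported_receipt=True)
```

Here `example` evaluates to 5, the conditional hot-deck case: receipt was reported but no amount or bracket was given.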

      Use of NHIS Data in Imputation

      The MEPS sample is drawn from the previous year's NHIS sample; because of this, individuals and responding units can be matched between the MEPS and NHIS datasets. This linkage is used as part of the income imputation strategy for MEPS income data.


      Hot-deck imputation is the use of a donor response from the same dataset that contains the missing response (in this case the MEPS-HC FYC data).

      In all years, hot-deck imputation is used to assign income values from a donor pool of MEPS participants with valid income responses to MEPS participants with missing data. In 1998 and later years, receipt of income data in a matched NHIS tax filing unit is used to inform whether or not a person received income from a specific source, which, when possible, then informs the hot-deck imputation strategy (i.e., conditional or unconditional). The donor pool consists of MEPS participants, but recipients receive conditional or unconditional hot-decks based on information about receipt of income types in NHIS. Not all income sources in MEPS have an equivalent measure in NHIS; matching of cases for income receipt information is only used for income from interest, dividends, business, pensions, and social security.


      Cold-deck imputation is the use of a donor response from a different dataset than the dataset containing the missing response (in this case, 1995 NHIS data that can be linked to the 1996 MEPS are used as the donor pool for 1999-2001 MEPS-HC FYC data).

      In 1999-2001, matched data between NHIS and MEPS were used in the hot-deck imputations described above, as well as in cold-deck imputations for persons who were not filing a tax return or did not know the type of tax form they used to file. Prior to 2002, only primary filers who knew the type of form they were filing were explicitly asked questions about income. This resulted in high rates of missing data for persons who did not file tax returns or who did not know the type of tax form they filed.

      Data from participants who were matched between the 1995 NHIS (see footnote 3) and the MEPS panel that entered in 1996 were used to create a donor pool of non-filers and persons who did not know the type of tax form they filed for cold-deck imputations; this group of matched participants was used for all cold-deck imputations in 1999-2001. Given differences in income patterns between filers and non-filers, cold-decking using the same linked NHIS-MEPS pool that includes non-filers and unknown tax form filers was preferable to hot-decking with MEPS data of the same year that only included filers. Cold-decks were run prior to the hot-decks for each variable, and persons with a value imputed by cold-deck could not be donors in subsequent hot-decks. Cold-decks were used for interest, dividends, pension, and social security incomes; dollar amounts from the 1995 NHIS data were adjusted for inflation. AHRQ documentation notes that a similar cold-deck imputation process was also used for certain filers of the 1040EZ tax form, another group with high rates of missing data because of income data skip patterns.
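The inflation adjustment applied to the 1995 donor amounts can be illustrated with a simple price-index ratio. The price index and adjustment formula actually used are not specified here, so the ratio method, the function name, and the index values below are assumptions chosen only to show the arithmetic.

```python
def adjust_for_inflation(amount, base_index, target_index):
    """Scale a donor-year amount to target-year dollars with a price-index ratio."""
    if base_index <= 0:
        raise ValueError("base_index must be positive")
    return amount * target_index / base_index


# Hypothetical index values: 100.0 in the donor year, 110.0 in the target year.
adjusted = adjust_for_inflation(1_000.0, 100.0, 110.0)
```

With these placeholder values, a $1,000 donor amount is carried forward as roughly $1,100 in target-year dollars.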

      Imputation based on Current Population Survey (CPS) data

      In 1996-1998, social security income amounts were imputed for persons aged 65 and older who initially reported neither social security nor earnings income. AHRQ staff compared these income amounts to those reported by the same age group in the CPS Annual Social and Economic Supplement (ASEC), and found that, in CPS ASEC data, such cases are quite rare. AHRQ staff used an imputation strategy based on CPS ASEC data to address the likely underestimation of social security income for persons aged 65 and older with no earnings income. The strategy employed a probabilistic model based on CPS ASEC data for the subsequent year (e.g., the 1997 CPS ASEC was used for the 1996 MEPS) to select persons/couples whose social security income was changed from zero to a positive imputed amount.

      1. The family style question about "other" income sources may be the mechanism by which persons who are not explicitly targeted to answer questions about an income source report income from that source.

      2. This is rare, but includes persons reporting zero dollars of wage/salary income despite being employed for pay in the last year, or persons aged 65 and older who reported zero dollars of wage/salary income and zero dollars of social security income in 1996-1998.

      3. 1995 was the most recent year of NHIS data that reported income amounts by source instead of simple receipt of each income source; income amounts for the donor pool were based on NHIS responses.