A Comparison of Annual Earnings Data in the Current Population Survey and in the Social Security Administration's Detailed Earnings Record
Research and Statistics Note No. 2024-01 (released March 2024)
Patrick J. Purcell is with the Office of Research, Evaluation, and Statistics, Office of Retirement and Disability Policy, Social Security Administration.
Contents of this publication are not copyrighted; any items may be reprinted, but citation is requested. The findings and conclusions presented in this note are those of the author and do not necessarily represent the views of the Social Security Administration.
Introduction
Abbreviation | Definition |
---|---|
CPS | Current Population Survey |
CPS/ASEC | Current Population Survey Annual Social and Economic Supplement |
DER | Detailed Earnings Record |
OLS | ordinary least squares |
SIPP | Survey of Income and Program Participation |
SSA | Social Security Administration |
For earnings data, primary sources include household surveys conducted by the Census Bureau and administrative data files maintained by the Social Security Administration (SSA). Under a cooperative agreement, self-reported earnings data from Census Bureau surveys are linked with the amounts reported for those workers by their employers and recorded in the SSA files (Genadek, Hokayem, and Pendergast 2021).1 This linkage allows researchers to assess the accuracy of the earnings and benefits amounts reported by workers in household surveys, which are subject to reporting error. Research conducted with linked data sets helps the Census Bureau to improve its surveys and SSA to administer its programs more efficiently.2
There are several possible sources of error when researchers use only survey data. First, individuals have become less willing to participate in surveys in recent years, and respondents have been less likely to answer certain questions (Meyer, Mok, and Sullivan 2015; Bollinger and others 2019). Rising nonresponse rates could bias research results that use survey data alone. Second, imputation errors can occur when researchers use only survey data. For example, when respondents decline to answer questions about their earnings, the Census Bureau uses statistical methods to impute the missing information by examining the responses of other participants with similar demographic traits who answered the relevant questions. Bollinger and Hirsch (2006) found that including imputed values of earnings in regressions relating earnings to other attributes resulted in biased coefficients. However, researchers who use linked survey and administrative data can replace imputed values with values from administrative records.
Using administrative data sets alone also has limitations. Administrative data that are specific to participants in a particular program or to a geographic area may not be nationally representative of the population. Inferences about people who are not enrolled in the program or who live in other locations cannot be drawn from data that are not nationally representative. Moreover, administrative data often include information relevant only to a particular government program or policy. For example, SSA does not collect information about beneficiaries' education levels because educational attainment does not affect its program qualifications. However, because education has a strong statistical relationship to lifetime earnings and other social and economic outcomes, the Census Bureau regularly collects educational attainment information in its household surveys. Using Census Bureau survey data linked to SSA records allows researchers to include educational attainment data when studying outcomes such as lifetime earnings and Social Security benefits. Linked data files provide more and better data for studying the relationship between demographic characteristics and economic outcomes than either survey data or administrative data alone can provide.
Household surveys linked to administrative records combine the strengths of both sources, while also reducing the limitations of each source separately. The Census Bureau collects important demographic and economic information in its household surveys, such as the Current Population Survey (CPS), but surveys are also subject to recall error and misreporting. Administrative data files, including SSA's Master Earnings File and Master Beneficiary Record, contain more accurate records of earnings and benefits. Linking survey files to administrative records combines surveys' rich demographic details with administrative records' greater accuracy on certain topics.
One way that researchers use linked data is to evaluate measurement error, nonresponse, and imputation effects on survey data accuracy. Davies and Fisher (2009) reported that further research is needed on the extent to which self-reported earnings in household surveys agree with earnings recorded in administrative records from SSA. They noted that comparing earnings in household surveys with earnings in administrative records could lead to improved methods for imputing missing data and more accurate analyses of proposed revisions to the Old-Age, Survivors, and Disability Insurance and Supplemental Security Income programs. Following their suggestion, this research and statistics note compares wage and salary earnings data in the CPS and in SSA's data files.
Previous Research
Several researchers have used either CPS or Survey of Income and Program Participation (SIPP) data linked to SSA data files to examine how reported earnings in the surveys compare with earnings in administrative records. In most of these studies, differences between survey and administrative data were assumed to represent survey measurement error. Following standard practice, this note compares earnings reported in the CPS with earnings recorded in SSA's Detailed Earnings Record (DER), and differences are assumed to represent survey measurement error. Nevertheless, readers are cautioned that all sources of data, including administrative records, are subject to error.
Several studies that compared self-reported earnings in surveys with earnings in SSA's records found that survey measurement error is not random and varies with observable worker characteristics. Bound and Krueger (1991) compared a sample of heads of households' earnings from the CPS with their earnings from SSA records over 2 years. They found that CPS measurement error is nonrandom and that for men, error is negatively correlated with actual earnings. Bollinger (1998) compared earnings in the March 1978 CPS with SSA data and found that low earners significantly overreported their earnings. He concluded that overreporting among low earners is largely responsible for survey measurement error. Roemer (2002) compared CPS and SIPP earnings data with SSA earnings records and found that the distribution of annual earnings differs between the CPS and the SIPP. The CPS Annual Social and Economic Supplement (CPS/ASEC) showed an excess of high earnings and a shortage of low earnings while the SIPP showed the opposite. He attributed the shortage of low earnings in the CPS mainly to underestimates of earnings among part-year or part-time workers.
To study the nonresponse and imputation effects in the CPS on earnings measures, Bollinger and others (2019) examined CPS data linked to SSA earnings records for 2006–2011. They found that earnings were more frequently missing among lower earners and higher earners than among those with earnings closer to the median. The authors noted that if nonresponse were random, researchers could use a respondent-only sample reweighted by the inverse probability of being in the respondent sample; however, this adjustment is not sufficient when nonresponse is nonrandom. They suggested that researchers use linked survey and administrative data and replace survey earnings with administrative earnings. They also concluded that even if nonresponse were random, earnings estimates could be biased if they include imputed values for missing data.
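The reweighting idea mentioned above can be illustrated with a minimal sketch. The data are synthetic and the response probabilities are hypothetical values assumed to be known; in practice they would have to be estimated, and nothing here reproduces the cited authors' method.

```python
# Illustrative sketch only: synthetic earnings and assumed (hypothetical)
# response probabilities, not data or code from the cited study.

def ipw_mean(earnings, responded, p_respond):
    """Mean earnings among respondents, each weighted by 1 / P(respond)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for y, r, p in zip(earnings, responded, p_respond):
        if r:
            w = 1.0 / p
            weighted_sum += w * y
            weight_total += w
    return weighted_sum / weight_total

# Two hypothetical groups: lower earners respond half the time,
# higher earners always respond.
earnings = [20_000, 20_000, 60_000, 60_000]
responded = [True, False, True, True]
p_respond = [0.5, 0.5, 1.0, 1.0]

full_mean = sum(earnings) / len(earnings)
naive = sum(y for y, r in zip(earnings, responded) if r) / sum(responded)
weighted = ipw_mean(earnings, responded, p_respond)
```

In this toy case the unweighted respondent mean is about 46,667, while the reweighted mean equals the full-sample mean of 40,000. The correction works only because response depends on a characteristic that is observed for everyone; if response depends on earnings in ways that cannot be observed, the weights cannot be constructed, which is the situation in which the authors recommend substituting administrative earnings.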
Pedace and Bates (2000), Gottschalk and Huynh (2005), and Cristia and Schwabish (2007) examined SIPP data linked to SSA earnings records and found that reporting error is negatively correlated with earnings level (as measured in the administrative data). All three studies found that respondents with lower earnings tend to overreport earnings and those with higher earnings tend to underreport earnings. Kim and Tamborini (2012) found that SIPP respondents' misreports of earnings are nonrandom and that low earners tend to overreport earnings and high earners tend to underreport earnings, confirming earlier studies. They also found that low-earning Black workers overreport earnings more than low-earning White workers. Kim and Tamborini (2014) also used linked SIPP and SSA data and found that at higher earnings levels, higher-educated workers were less likely to underreport their earnings than were less-educated workers, but at lower earnings levels, higher-educated workers were more likely to overreport their earnings than were less-educated workers. Abowd and Stinson (2013) found that earnings data in SIPP and administrative records from SSA were similar except for imputed earnings in the SIPP, where the authors found greater survey measurement error.
Data and Methods
The data analyzed for this note consist of records from the Census Bureau's CPS/ASEC linked to SSA's DER. The CPS has been conducted since 1948 and is extensively documented on the Census Bureau's and the Bureau of Labor Statistics' websites. The CPS/ASEC is conducted annually in March, and the Census Bureau publishes detailed technical documentation for the supplement every year (Census Bureau 2022). Rothbaum and Berchick (2019) described recent changes to the CPS/ASEC income questions, including the addition of follow-up questions that allow respondents to report income in ranges rather than as specific amounts. Administrative files from SSA are available only on a restricted-use basis and are not documented as extensively as are Census Bureau public-use data files. Olsen and Hudson (2009) described SSA's earnings data, explained how the data are collected and stored, and identified some of the limitations and complexities of using the data for research purposes.
This analysis compares annual wage and salary earnings data reported by workers aged 18–69 in the CPS/ASEC with those recorded for the same workers in the DER.3 Wage and salary earnings in the DER are derived from the Form W-2 that employers submit to SSA.4
Specifically, this note compares the CPS variable WSAL_VAL (total wage and salary earnings) with a DER variable derived from Box 5 of Form W-2.5 Census Bureau (2022) defines wages and salaries as “total money earnings received for work performed as an employee during the income year. It includes wages, salary, Armed Forces pay, commissions, tips, piece-rate payments, and cash bonuses earned, before deductions are made for taxes, bonds, pensions, union dues, etc.” Box 5 of Form W-2 consists of all wages, salaries, and tips subject to the Medicare payroll tax. This includes amounts deferred into 401(k) plans, which are excluded from income tax but are subject to Medicare payroll tax.6
The DER data in this study are linked to nine CPS/ASEC files for selected years from 2005 to 2021, the most recent linked file available when this note was written. This analysis does not include self-employment income, which workers report directly to the Internal Revenue Service on Form 1040, because it is subject to different reporting requirements and different potential sources of error than wage and salary income. In this note, the term “earnings” refers to wage and salary income.
In the sections to follow, 10 charts, with accompanying discussion, present descriptive statistics on CPS/ASEC respondents and on the differences in the earnings data between the CPS and the DER by selected worker characteristics.
The first three charts show key characteristics of the March CPS/ASEC sample for 2005, 2006, 2010, 2011, 2015, 2016, and 2019 through 2022.7 These charts represent the entire CPS public-use file each year, rather than just the subset of each file linked to SSA records. Chart 1 shows the total number of households in the CPS sample and the number and percentage of sample households that were interviewed. Chart 2 shows the proportion of households that were not interviewed because the occupants could not be contacted or they declined to participate in the survey. Chart 3 shows the annual numbers and percentages of CPS respondents whose earnings were imputed by the Census Bureau because they did not answer the relevant survey questions.
Charts 4–10 show the percentage difference between CPS results and DER records on earnings for selected CPS files from March 2005 through March 2021. The percentage difference between earnings data in the CPS and in the DER—defined here as (CPS − DER) ÷ DER—was calculated for each respondent who had both CPS and DER earnings recorded.8 The percentage differences were then sorted from the largest positive difference to the largest negative difference. Each chart shows three measures of percentage difference between CPS and DER earnings data: the median difference and the 75th and 25th percentile differences (that is, the interquartile range). This analysis focuses on the percentage differences between the CPS and DER earnings data for each respondent. Therefore, the relevant median difference is the median of all respondents' CPS and DER earnings differences. This is not the same as the difference in medians between CPS and DER earnings. This distinction is important for interpreting the results of the analysis correctly.
Finding the median (or other percentile) percentage difference between CPS and DER earnings data involves sorting the differences for each person from the largest positive difference (where the CPS earnings value is higher than the DER earnings value) to the largest negative difference (where the CPS earnings value is lower than the DER earnings value). A positive difference in CPS earnings − DER earnings indicates higher earnings recorded in the CPS than in the DER. A negative difference in CPS earnings − DER earnings indicates lower earnings recorded in the CPS than in the DER. For convenience, I refer to positive differences as overreporting of earnings in the CPS and negative differences as underreporting in the CPS, regardless of whether the CPS earnings data were reported or imputed.
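The calculation described above can be sketched in a few lines of Python. The earnings pairs are hypothetical, the percentiles are unweighted, and the interpolation method is an assumption; the note does not specify either, so this illustrates the definition rather than reproducing the charts.

```python
import statistics

def pct_diff(cps, der):
    """Percentage difference as defined in the text: (CPS - DER) / DER, times 100."""
    return (cps - der) / der * 100.0

# Hypothetical (CPS, DER) annual earnings for five respondents
pairs = [(30_000, 30_000), (22_000, 20_000), (45_000, 50_000),
         (12_000, 8_000), (61_000, 60_000)]

# One percentage difference per respondent, then percentiles of those differences
diffs = sorted(pct_diff(cps, der) for cps, der in pairs)
q25, median, q75 = statistics.quantiles(diffs, n=4, method="inclusive")

# For contrast: the difference between the two medians (a different statistic)
cps_median = statistics.median(cps for cps, _ in pairs)
der_median = statistics.median(der for _, der in pairs)
diff_in_medians = pct_diff(cps_median, der_median)
```

With these made-up values, the median of the individual differences is about 1.7 percent, whereas the difference in medians is 0 percent because both sources have a median of 30,000. A positive `median` corresponds to what the text calls overreporting in the CPS.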
Chart 4 shows the differences between CPS and DER earnings data for all individuals with earnings recorded in the CPS by earnings imputation status, that is, both self-reported and imputed earnings, only self-reported earnings, and only imputed earnings. Charts 5–10 show the percentage difference between CPS and DER earnings data categorized by workers' DER earnings quartile, age, sex, racial or ethnic group, education level, and type of hours worked. These charts cover workers with earnings data in both the CPS and the DER, regardless of whether CPS earnings were reported or imputed. The charts are followed by a section that describes the results of an ordinary least squares (OLS) regression in which the difference between CPS and DER earnings data is regressed on the characteristics shown in the charts (imputation status, DER earnings quartile, age, sex, racial or ethnic group, education level, and type of hours worked) and on a variable indicating coverage by employer-provided health insurance.
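In an OLS regression of this kind, the CPS-minus-DER difference is the dependent variable and the worker characteristics enter as regressors, typically as indicator (0/1) variables. The sketch below uses made-up data with two hypothetical indicators and a small hand-written solver; the variable names, values, and specification are assumptions for illustration, not the model estimated in this note.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Columns: intercept, imputed earnings (0/1), lowest DER quartile (0/1).
# Hypothetical rows; y is constructed as -1 + 4*imputed + 25*lowest_quartile.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0], [1, 1, 1]]
y = [-1.0, 3.0, 24.0, 28.0, -1.0, 28.0]

intercept, b_imputed, b_lowest_quartile = ols(X, y)
```

Because the outcome was constructed from known coefficients, the solver recovers them. Each coefficient is read as the change in the percentage difference associated with that characteristic, holding the other regressors fixed.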
Trends in Completed CPS Interviews and Imputation of Earnings
For each year of CPS data shown in Chart 1, the initial sample size was between 89,000 and 100,000 units. More than 99 percent of these units were households. The remainder were group quarters, such as college dormitories. Over time, both the number and percentage of units for which CPS interviews were successfully completed declined. The March 2005 CPS/ASEC sample consisted of 99,699 units, of which 77,482 (77.7 percent) were interviewed. The March 2006 CPS/ASEC sample consisted of 97,461 units, of which 76,048 (78.0 percent) were interviewed. For the March 2020, 2021, and 2022 surveys, an average of 90,485 units were in the sample and an average of 60,819 (67.2 percent) were interviewed. The percentage of sample units interviewed in these years may have been reduced by the COVID-19 pandemic, but the decline in the percentage of sample units interviewed began before the 2020 onset of the pandemic.
Year | Units in sample | Units interviewed | Percentage of sample units interviewed |
---|---|---|---|
2005 | 99,699 | 77,482 | 77.7 |
2006 | 97,461 | 76,048 | 78.0 |
2010 | 97,263 | 76,260 | 78.4 |
2011 | 96,958 | 75,188 | 77.5 |
2015 | 99,461 | 74,257 | 74.7 |
2016 | 94,097 | 69,484 | 73.8 |
2019 | 94,589 | 68,301 | 72.2 |
2020 | 91,500 | 60,460 | 66.1 |
2021 | 90,759 | 62,850 | 69.2 |
2022 | 89,197 | 59,148 | 66.3 |
SOURCE: CPS/ASEC. |
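The percentages in the table above, and the 2020–2022 averages quoted in the text, follow directly from the unit counts. A short check in Python (the variable names are illustrative):

```python
# (units in sample, units interviewed) by survey year, from the table above
chart1 = {
    2005: (99_699, 77_482), 2006: (97_461, 76_048),
    2010: (97_263, 76_260), 2011: (96_958, 75_188),
    2015: (99_461, 74_257), 2016: (94_097, 69_484),
    2019: (94_589, 68_301), 2020: (91_500, 60_460),
    2021: (90_759, 62_850), 2022: (89_197, 59_148),
}

# Percentage of sample units interviewed, rounded to one decimal place
rates = {
    year: round(100 * interviewed / sample, 1)
    for year, (sample, interviewed) in chart1.items()
}

# Averages over the three most recent survey years
recent = [2020, 2021, 2022]
avg_sample = sum(chart1[y][0] for y in recent) / len(recent)
avg_interviewed = sum(chart1[y][1] for y in recent) / len(recent)
avg_rate = round(100 * avg_interviewed / avg_sample, 1)
```

Here `avg_rate` is the ratio of the two averages, which gives the 67.2 percent cited for 2020 through 2022.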
The decline in the proportion of CPS sample units interviewed was the result of a rising proportion of households selected for the sample that chose not to participate. The Census Bureau classifies units that are not interviewed as Type A, Type B, or Type C noninterviews.9 Type A noninterviews consist of households where the residents refused to participate, no one was home after repeated attempts to contact them, the occupants were temporarily absent, a language barrier prevented the interview, the sample unit could not be located, or the occupants could not be interviewed for some other reason. Type B noninterviews are the result of an unoccupied housing unit. Type C noninterviews consist of housing units that were abandoned or demolished. In recent years, both the number of Type A noninterviews and the percentage of noninterviews that were Type A have risen. In March 2005, there were 7,485 Type A noninterviews, which constituted 33.7 percent of all noninterviews. In March 2022, there were 19,046 Type A noninterviews, which constituted 63.4 percent of all noninterviews (Chart 2).
Year | Type A noninterviews (number) | Type A as a percentage of all noninterviews |
---|---|---|
2005 | 7,485 | 33.7 |
2006 | 7,070 | 33.0 |
2010 | 5,678 | 27.0 |
2011 | 6,549 | 30.1 |
2015 | 10,271 | 40.8 |
2016 | 10,590 | 43.0 |
2019 | 13,511 | 51.4 |
2020 | 18,981 | 61.2 |
2021 | 16,455 | 59.0 |
2022 | 19,046 | 63.4 |
SOURCE: CPS/ASEC. |
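The Type A shares can be cross-checked against the Chart 1 counts. The sketch below assumes that the total number of noninterviews (Types A, B, and C combined) equals units in sample minus units interviewed; that assumption is not stated in the text, but it reproduces the published percentages for the years shown.

```python
# (units in sample, units interviewed), from the Chart 1 table
sample_and_interviewed = {
    2005: (99_699, 77_482),
    2010: (97_263, 76_260),
    2019: (94_589, 68_301),
    2022: (89_197, 59_148),
}
# Type A noninterviews, from the Chart 2 table
type_a = {2005: 7_485, 2010: 5_678, 2019: 13_511, 2022: 19_046}

type_a_share = {}
for year, count in type_a.items():
    sample, interviewed = sample_and_interviewed[year]
    all_noninterviews = sample - interviewed  # assumed: Types A + B + C
    type_a_share[year] = round(100 * count / all_noninterviews, 1)
```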
When a household declines to participate in the CPS, it is called a “unit nonresponse.” If a household participates in the survey but declines to answer a certain question, it is called an “item nonresponse.” Survey participants sometimes decline to answer questions about sources and amounts of income. When that occurs, the Census Bureau imputes a response through statistical procedures that match the nonrespondent to a respondent with similar characteristics that are correlated with receipt of the specific type of income in question.
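The general idea of matching a nonrespondent to a similar respondent can be illustrated with a simple cell-based donor match. This is a deliberately simplified stand-in: the matching variables, records, and function name are hypothetical, and the Census Bureau's actual procedure is more elaborate than this sketch.

```python
import random

def donor_impute(records, keys, target, seed=0):
    """Fill a missing `target` with the value of a random donor from the same cell."""
    rng = random.Random(seed)
    # Group reported values by cell (the combination of matching variables)
    donors = {}
    for rec in records:
        if rec[target] is not None:
            cell = tuple(rec[k] for k in keys)
            donors.setdefault(cell, []).append(rec[target])
    filled = []
    for rec in records:
        new = dict(rec, imputed=False)
        if new[target] is None:
            cell = tuple(rec[k] for k in keys)
            if cell in donors:  # left missing if the cell has no donor
                new[target] = rng.choice(donors[cell])
                new["imputed"] = True
        filled.append(new)
    return filled

# Hypothetical respondents; the third did not report earnings
people = [
    {"age_group": "30-39", "education": "BA", "earnings": 52_000},
    {"age_group": "30-39", "education": "BA", "earnings": 61_000},
    {"age_group": "30-39", "education": "BA", "earnings": None},
    {"age_group": "60-69", "education": "HS", "earnings": 28_000},
]
result = donor_impute(people, keys=("age_group", "education"), target="earnings")
```

The `imputed` flag plays the same role as the imputation indicators carried on the CPS public-use files: it lets an analyst separate reported values from imputed ones.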
CPS public-use files include variables that indicate whether the amount of income from a particular source on an individual's record was imputed. Chart 3 shows the number and percentage of survey respondents whose earnings amounts were imputed. The Census Bureau imputed earnings for 19.7 percent of participants who had any wage and salary income in the March 2005 CPS and for 17.7 percent in the March 2006 CPS. For the March 2021 and March 2022 surveys, the proportions were 21.4 percent and 21.9 percent, respectively. In recent years, the proportion of interviews for which the Census Bureau has imputed wage and salary income has been relatively stable at about 21 percent to 23 percent.
Year | Respondent reported (number) | Census imputed (number) | Percentage imputed |
---|---|---|---|
2005 | 77,698 | 19,055 | 19.7 |
2006 | 79,361 | 17,020 | 17.7 |
2010 | 77,505 | 17,480 | 18.4 |
2011 | 74,703 | 17,324 | 18.8 |
2015 | 68,445 | 20,872 | 23.4 |
2016 | 64,048 | 19,177 | 23.0 |
2019 | 63,358 | 18,397 | 22.5 |
2020 | 55,864 | 16,248 | 22.5 |
2021 | 57,593 | 15,659 | 21.4 |
2022 | 53,071 | 14,839 | 21.9 |
SOURCE: CPS/ASEC. |
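The percentage imputed in the table above is the imputed count divided by the sum of the reported and imputed counts. A brief check for the four years cited in the text (the function name is illustrative):

```python
def pct_imputed(reported, imputed):
    """Imputed records as a percentage of all records with wage and salary earnings."""
    return round(100 * imputed / (reported + imputed), 1)

# (respondent reported, Census imputed), from the table above
chart3 = {
    2005: (77_698, 19_055),
    2006: (79_361, 17_020),
    2021: (57_593, 15_659),
    2022: (53_071, 14_839),
}
shares = {year: pct_imputed(*counts) for year, counts in chart3.items()}
```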
Comparison of CPS and DER Earnings Information
Chart 4 shows the percentage difference between CPS and DER earnings information for all workers aged 18–69 who had such information in both sources. In Panel A, the CPS component of the study sample includes all respondents regardless of whether their earnings information was self-reported or imputed by the Census Bureau. The median percentage difference falls into a narrow range from −0.6 percent to 0.0 percent and averages −0.3 percent. The 25th percentile difference between the CPS and DER amounts averages −16.1 percent and the 75th percentile difference averages 23.6 percent, for an average interquartile range of 39.7 percentage points. From 2005 through 2021, the interquartile range grew wider. In the March 2005 CPS, the 25th percentile difference between the CPS and DER data was −12.4 percent and the 75th percentile difference was 19.2 percent, an interquartile range of 31.6 percentage points. In March 2021, the 25th percentile difference was −18.3 percent and the 75th percentile difference was 25.7 percent, an interquartile range of 44.0 percentage points. Thus, the middle 50 percent of observations in March 2021 represented a wider range of difference between CPS and DER earnings data than in March 2005. Both underreporting and overreporting of earnings in the CPS increased by about 6 percentage points over this period.
Year | 75th percentile | Median | 25th percentile |
---|---|---|---|
Panel A: Respondent-reported and imputed CPS earnings | |||
2005 | 19.2 | -0.1 | -12.4 |
2006 | 21.4 | 0.0 | -13.8 |
2010 | 19.9 | -0.4 | -14.4 |
2011 | 21.2 | -0.3 | -15.2 |
2015 | 26.1 | -0.2 | -17.8 |
2016 | 25.9 | -0.6 | -18.4 |
2019 | 27.2 | 0.0 | -16.7 |
2020 | 25.6 | -0.2 | -17.6 |
2021 | 25.7 | -0.6 | -18.3 |
Panel B: Respondent-reported CPS earnings | |||
2005 | 14.1 | -0.2 | -10.4 |
2006 | 15.1 | -0.1 | -11.0 |
2010 | 14.1 | -0.4 | -11.5 |
2011 | 15.0 | -0.3 | -12.1 |
2015 | 17.3 | -0.3 | -13.6 |
2016 | 17.3 | -0.7 | -14.5 |
2019 | 19.4 | 0.0 | -13.6 |
2020 | 17.7 | -0.3 | -14.3 |
2021 | 17.5 | -0.8 | -15.5 |
Panel C: Imputed CPS earnings | |||
2005 | 84.0 | 6.0 | -36.6 |
2006 | 76.1 | 3.3 | -37.2 |
2010 | 74.6 | 2.5 | -38.0 |
2011 | 69.5 | 2.0 | -38.6 |
2015 | 75.7 | 2.5 | -38.8 |
2016 | 75.0 | 1.2 | -38.4 |
2019 | 80.6 | 4.6 | -37.2 |
2020 | 80.4 | 4.1 | -38.6 |
2021 | 87.5 | 5.2 | -40.1 |
SOURCE: Author's calculations based on CPS/ASEC and DER. |
By Imputation Status
Panel B of Chart 4 shows the percentage difference between CPS and DER earnings data for workers aged 18–69 with CPS data restricted to respondent-reported earnings. The restricted sample represents an average of 81.4 percent of CPS respondents aged 18–69 with earnings data in both the CPS and DER.
The median percentage difference ranges from −0.8 percent to 0.0 percent and averages −0.4 percent. The 25th percentile difference between the CPS and DER earnings data averages −12.9 percent and the 75th percentile difference averages 16.4 percent, an average interquartile range of 29.3 percentage points. As in Panel A, the interquartile range of difference between CPS and DER earnings data increased over time, but throughout the period the interquartile range of difference between CPS and DER earnings data is narrower in Panel B than in Panel A. In Panel B, for the March 2005 CPS, the 25th percentile difference between CPS and DER earnings data was −10.4 percent and the 75th percentile difference was 14.1 percent, an interquartile range of 24.5 percentage points. In March 2021, the 25th percentile difference was −15.5 percent and the 75th percentile difference was 17.5 percent, an interquartile range of 33.0 percentage points.
Panel C of Chart 4 shows the percentage difference between CPS and DER earnings data for workers aged 18–69 with CPS data restricted to respondents whose earnings amounts were imputed by the Census Bureau. For the years shown in Panel C, this restricted sample represents an average of 18.6 percent of workers aged 18–69. In Panel C, the median difference between CPS and DER earnings data is greater, and the interquartile range is much larger, than in Panels A and B (note the differing vertical axis scales).
In Panel C, the median difference ranges from 1.2 percent to 6.0 percent and averages 3.5 percent. The interquartile range of difference is larger for workers with imputed CPS earnings than for those who self-reported their earnings. The 25th percentile difference between CPS and DER earnings data averages −38.2 percent and the 75th percentile difference averages 78.2 percent, for an average interquartile range of 116.4 percentage points. Moreover, although the interquartile range in Panel B is almost symmetrical, the interquartile range in Panel C is asymmetrical. In Panel B, the 25th percentile difference between CPS and DER earnings in March 2021 is −15.5 percent and the 75th percentile difference is 17.5 percent. Both differ from 0 percent by almost the same amount. In Panel C, the 25th percentile difference between CPS and DER earnings in March 2021 is −40.1 percent but the 75th percentile difference is 87.5 percent. When earnings data are imputed in the CPS, overestimates of earnings are larger than underestimates. Why imputed CPS earnings differ from reported earnings in this asymmetrical way is a potential area for future research.
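The averages and the asymmetry described above can be recomputed from the Chart 4 values. In this sketch the interquartile range is taken as the difference between the rounded averages, which is an assumption chosen because it matches the figures quoted in the text; the asymmetry ratio is an illustrative measure, not one used in the note.

```python
# (75th percentile, 25th percentile) differences by year, from Chart 4
panel_b = {  # respondent-reported CPS earnings
    2005: (14.1, -10.4), 2006: (15.1, -11.0), 2010: (14.1, -11.5),
    2011: (15.0, -12.1), 2015: (17.3, -13.6), 2016: (17.3, -14.5),
    2019: (19.4, -13.6), 2020: (17.7, -14.3), 2021: (17.5, -15.5),
}
panel_c = {  # imputed CPS earnings
    2005: (84.0, -36.6), 2006: (76.1, -37.2), 2010: (74.6, -38.0),
    2011: (69.5, -38.6), 2015: (75.7, -38.8), 2016: (75.0, -38.4),
    2019: (80.6, -37.2), 2020: (80.4, -38.6), 2021: (87.5, -40.1),
}

def summarize(panel):
    """Average 75th and 25th percentile differences and the implied average IQR."""
    avg75 = round(sum(hi for hi, _ in panel.values()) / len(panel), 1)
    avg25 = round(sum(lo for _, lo in panel.values()) / len(panel), 1)
    return avg75, avg25, round(avg75 - avg25, 1)

def asymmetry(hi, lo):
    """Ratio of the upper distance from zero to the lower distance from zero."""
    return round(hi / abs(lo), 2)

summary_b = summarize(panel_b)
summary_c = summarize(panel_c)
asym_b_2021 = asymmetry(*panel_b[2021])
asym_c_2021 = asymmetry(*panel_c[2021])
```

For March 2021 the ratio is about 1.13 for reported earnings and about 2.18 for imputed earnings, so the upper quartile boundary lies roughly twice as far from zero as the lower one when earnings are imputed.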
By DER Earnings Quartile
Chart 5 shows the difference between CPS and DER earnings data among workers aged 18–69 in each of the four DER earnings quartiles. For this chart, and all charts that follow, the CPS component of the study sample includes all respondents, regardless of whether their earnings information was self-reported or imputed by the Census Bureau.
Year | 75th percentile | Median | 25th percentile |
---|---|---|---|
Panel A: Fourth DER earnings quartile | |||
2005 | 3.0 | -3.5 | -16.8 |
2006 | 3.1 | -4.1 | -21.3 |
2010 | 2.6 | -4.8 | -22.7 |
2011 | 2.4 | -5.5 | -25.0 |
2015 | 2.3 | -7.2 | -32.5 |
2016 | 1.9 | -7.8 | -32.5 |
2019 | 3.2 | -6.8 | -29.3 |
2020 | 2.6 | -7.7 | -31.7 |
2021 | 1.9 | -7.9 | -32.1 |
Panel B: Third DER earnings quartile | |||
2005 | 7.7 | -1.1 | -11.5 |
2006 | 8.9 | -1.0 | -13.1 |
2010 | 8.5 | -1.2 | -13.1 |
2011 | 9.0 | -1.0 | -13.1 |
2015 | 10.9 | -1.4 | -16.1 |
2016 | 10.3 | -1.8 | -17.2 |
2019 | 11.0 | -1.3 | -15.9 |
2020 | 9.9 | -1.4 | -15.9 |
2021 | 8.7 | -2.3 | -17.0 |
Panel C: Second DER earnings quartile | |||
2005 | 23.9 | 0.6 | -9.9 |
2006 | 24.7 | 1.3 | -11.1 |
2010 | 25.7 | 0.9 | -10.8 |
2011 | 27.9 | 1.3 | -11.3 |
2015 | 31.8 | 2.1 | -12.7 |
2016 | 31.3 | 1.6 | -13.2 |
2019 | 29.2 | 2.2 | -12.5 |
2020 | 27.3 | 1.2 | -13.7 |
2021 | 27.1 | 0.9 | -14.5 |
Panel D: First DER earnings quartile | |||
2005 | 135.8 | 15.5 | -9.7 |
2006 | 145.4 | 20.2 | -8.9 |
2010 | 130.8 | 14.0 | -10.4 |
2011 | 133.7 | 16.9 | -10.6 |
2015 | 164.3 | 26.7 | -9.3 |
2016 | 162.2 | 26.7 | -9.8 |
2019 | 162.8 | 32.3 | -6.6 |
2020 | 162.5 | 30.4 | -7.9 |
2021 | 184.2 | 33.2 | -7.8 |
SOURCE: Author's calculations based on CPS/ASEC and DER. |
The difference between CPS and DER earnings data varies substantially across DER earnings quartiles. In general, workers in the top two quartiles of DER earnings underreport earnings and those in the bottom two quartiles overreport earnings. Workers in the fourth (highest) quartile underreport earnings by a larger percentage on average than those in the third quartile, and those in the first (lowest) quartile overreport earnings by a larger percentage on average than those in the second quartile.
Panel A of Chart 5 shows the difference between CPS and DER earnings data among workers in the fourth quartile of DER earnings:
- The median difference ranges from −3.5 percent to −7.9 percent and averages −6.1 percent.
- The 25th percentile difference averages −27.1 percent and the 75th percentile difference averages 2.6 percent, an interquartile range of 29.7 percentage points.
Panel B of Chart 5 shows the difference between CPS and DER earnings data among workers in the third quartile of DER earnings:
- The median percentage difference ranges from −1.0 percent to −2.3 percent and averages −1.4 percent.
- The 25th percentile difference averages −14.8 percent and the 75th percentile difference averages 9.4 percent, an interquartile range of 24.2 percentage points.
Panel C of Chart 5 shows the difference between CPS and DER earnings data among workers in the second quartile of DER earnings:
- The median percentage difference ranges from 0.6 percent to 2.2 percent and averages 1.3 percent.
- The 25th percentile difference averages −12.2 percent and the 75th percentile difference averages 27.6 percent, an interquartile range of 39.8 percentage points.
Panel D of Chart 5 shows the difference between CPS and DER earnings data among workers in the first quartile of DER earnings:
- The median percentage difference ranges from 14.0 percent to 33.2 percent and averages 24.0 percent.
- The 25th percentile difference averages −9.0 percent and the 75th percentile difference averages 153.5 percent, an interquartile range of 162.5 percentage points.
- The 75th percentile average difference of 153.5 percent indicates that, on average, one-fourth of workers in this quartile had CPS earnings amounts that were more than two and a half times the amounts recorded in the DER.
Several factors could contribute to the large percentage difference between the CPS and DER data for earners in the first quartile. First, because earnings are relatively low among these workers, a relatively small dollar difference between CPS and DER data can appear large when expressed as a percentage. Second, earnings are imputed for a larger proportion of workers in the lowest quartile. CPS earnings data were imputed for an average of 17.2 percent of workers in the fourth quartile, 17.4 percent in the third quartile, 19.2 percent in the second quartile, and 20.6 percent in the first quartile (not shown). Third, low earners could overreport earnings because of the social stigma associated with low earnings. For example, workers who experience a year of below-average earnings might report a higher amount if they believe it is more representative of their typical annual earnings. Fourth, lower-earning CPS respondents could report cash earnings not captured on Form W-2, from which earnings data in the DER are derived. Finally, low-earning workers might differ from high-earning workers systematically in other unidentified characteristics.
By Age
Chart 6 shows the percentage differences between CPS and DER earnings data for workers in five age groups: 18–29, 30–39, 40–49, 50–59, and 60–69.
Year | 75th percentile | Median | 25th percentile |
---|---|---|---|
Panel A: Aged 18–29 | |||
2005 | 38.6 | 1.7 | -13.1 |
2006 | 41.6 | 2.3 | -15.3 |
2010 | 40.6 | 1.3 | -13.9 |
2011 | 44.3 | 2.2 | -15.2 |
2015 | 53.2 | 2.6 | -17.6 |
2016 | 50.5 | 1.7 | -18.5 |
2019 | 50.7 | 3.2 | -17.4 |
2020 | 48.5 | 2.3 | -17.8 |
2021 | 53.3 | 1.9 | -17.8 |
Panel B: Aged 30–39 | |||
2005 | 19.3 | 0.0 | -10.5 |
2006 | 21.4 | 0.2 | -11.4 |
2010 | 18.4 | 0.0 | -11.9 |
2011 | 18.4 | 0.0 | -12.7 |
2015 | 25.4 | 0.4 | -14.1 |
2016 | 25.0 | 0.0 | -14.5 |
2019 | 25.2 | 0.9 | -13.5 |
2020 | 22.3 | 0.0 | -15.8 |
2021 | 21.9 | -0.3 | -16.2 |
Panel C: Aged 40–49 | |||
2005 | 12.5 | -1.0 | -12.3 |
2006 | 14.6 | -0.7 | -14.0 |
2010 | 14.6 | -0.9 | -14.5 |
2011 | 15.1 | -1.0 | -15.8 |
2015 | 18.0 | -1.2 | -18.6 |
2016 | 17.3 | -1.7 | -20.3 |
2019 | 19.2 | -0.7 | -17.6 |
2020 | 18.6 | -1.0 | -18.8 |
2021 | 16.9 | -1.7 | -18.7 |
Panel D: Aged 50–59 | |||
2005 | 10.2 | -1.4 | -13.6 |
2006 | 12.8 | -1.0 | -14.3 |
2010 | 11.9 | -1.7 | -15.9 |
2011 | 13.5 | -1.4 | -16.7 |
2015 | 15.9 | -1.9 | -20.2 |
2016 | 16.0 | -1.9 | -20.1 |
2019 | 18.2 | -1.3 | -18.5 |
2020 | 18.9 | -1.3 | -19.2 |
2021 | 18.1 | -2.0 | -20.8 |
Panel E: Aged 60–69 | |||
2005 | 17.3 | -0.4 | -13.7 |
2006 | 16.6 | -0.7 | -14.3 |
2010 | 14.6 | -1.2 | -16.6 |
2011 | 17.2 | -0.9 | -15.5 |
2015 | 17.2 | -0.9 | -18.5 |
2016 | 19.6 | -1.2 | -18.8 |
2019 | 20.4 | -0.7 | -17.5 |
2020 | 20.4 | -0.4 | -17.0 |
2021 | 20.4 | -1.5 | -18.5 |
SOURCE: Author's calculations based on CPS/ASEC and DER. |
- Among workers aged 18–29, the median percentage difference ranges from 1.3 percent to 3.2 percent and averages 2.1 percent (Panel A). The 25th percentile difference averages −16.3 percent and the 75th percentile difference averages 46.8 percent, an interquartile range of 63.1 percentage points.
- Among workers aged 30–39, the median percentage difference ranges from −0.3 percent to 0.9 percent and averages 0.1 percent (Panel B). The 25th percentile difference averages −13.4 percent and the 75th percentile difference averages 21.9 percent, an interquartile range of 35.3 percentage points.
- Among workers aged 40–49, the median percentage difference ranges from −1.7 percent to −0.7 percent and averages −1.1 percent (Panel C). The 25th percentile difference averages −16.7 percent and the 75th percentile difference averages 16.3 percent, an interquartile range of 33.0 percentage points.
- Among workers aged 50–59, the median percentage difference ranges from −2.0 percent to −1.0 percent and averages −1.5 percent (Panel D). The 25th percentile difference averages −17.7 percent and the 75th percentile difference averages 15.0 percent, an interquartile range of 32.7 percentage points.
- Among workers aged 60–69, the median percentage difference ranges from −1.5 percent to −0.4 percent and averages −0.9 percent (Panel E). The 25th percentile difference averages −16.7 percent and the 75th percentile difference averages 18.2 percent, an interquartile range of 34.9 percentage points.
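The averages and interquartile ranges in the list above can be reproduced from the panel data. The sketch below does so for Panel A (aged 18–29), on the assumption that each average is a simple unweighted mean of the nine yearly values; the variable names are illustrative.

```python
from statistics import mean

# Panel A (aged 18-29), copied from the table above for the nine CPS years.
p75 = [38.6, 41.6, 40.6, 44.3, 53.2, 50.5, 50.7, 48.5, 53.3]
median = [1.7, 2.3, 1.3, 2.2, 2.6, 1.7, 3.2, 2.3, 1.9]
p25 = [-13.1, -15.3, -13.9, -15.2, -17.6, -18.5, -17.4, -17.8, -17.8]

avg_p75 = round(mean(p75), 1)
avg_median = round(mean(median), 1)
avg_p25 = round(mean(p25), 1)

# Interquartile range: distance between the 75th and 25th percentile averages.
iqr = round(avg_p75 - avg_p25, 1)

print(avg_p25, avg_median, avg_p75, iqr)
```

Because the table values are already rounded to one decimal place, averages recomputed this way for other panels can differ from the published figures by about 0.1.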
By Sex
Chart 7 shows the percentage difference between CPS and DER earnings data for men and women. Among men aged 18–69, the median percentage difference between CPS and DER earnings data ranges from −0.5 percent to 0.0 percent and averages −0.2 percent (Panel A). Among women aged 18–69, the median percentage difference between CPS and DER earnings data ranges from −0.8 percent to 0.0 percent and averages −0.4 percent (Panel B).
Year | 75th percentile | Median | 25th percentile |
---|---|---|---|
Panel A: Men | |||
2005 | 21.4 | 0.0 | -12.1 |
2006 | 23.3 | 0.0 | -13.9 |
2010 | 22.6 | 0.0 | -14.5 |
2011 | 22.4 | -0.2 | -15.4 |
2015 | 26.3 | -0.2 | -18.7 |
2016 | 26.7 | -0.5 | -19.2 |
2019 | 26.1 | 0.0 | -17.2 |
2020 | 24.2 | -0.2 | -18.3 |
2021 | 26.0 | -0.5 | -18.8 |
Panel B: Women | |||
2005 | 17.0 | -0.4 | -12.7 |
2006 | 19.4 | -0.2 | -13.7 |
2010 | 17.3 | -0.8 | -14.3 |
2011 | 20.3 | -0.4 | -15.0 |
2015 | 25.8 | -0.3 | -17.0 |
2016 | 25.2 | -0.6 | -17.7 |
2019 | 28.1 | 0.0 | -16.2 |
2020 | 26.6 | -0.1 | -17.0 |
2021 | 25.4 | -0.7 | -18.0 |
SOURCE: Author's calculations based on CPS/ASEC and DER. |
The interquartile ranges of difference were similar for men and women. Among men, the 25th percentile difference between the CPS and DER averages −16.4 percent and the 75th percentile difference averages 24.3 percent, an interquartile range of 40.7 percentage points. Among women, the 25th percentile difference averages −15.8 percent and the 75th percentile difference averages 22.8 percent, an interquartile range of 38.6 percentage points.
By Racial or Ethnic Group
Chart 8 shows the percentage difference between CPS and DER earnings data for workers in four racial or ethnic groups: non-Hispanic White, non-Hispanic Black, Hispanic, and Asian.10
Year | 75th percentile | Median | 25th percentile |
---|---|---|---|
Panel A: Non-Hispanic White | |||
2005 | 17.2 | 0.0 | -10.5 |
2006 | 19.5 | 0.0 | -11.9 |
2010 | 18.6 | -0.1 | -12.5 |
2011 | 19.6 | -0.1 | -13.2 |
2015 | 22.8 | 0.0 | -15.6 |
2016 | 22.3 | -0.3 | -16.3 |
2019 | 24.2 | 0.1 | -14.2 |
2020 | 22.7 | 0.0 | -15.2 |
2021 | 23.6 | -0.2 | -15.7 |
Panel B: Non-Hispanic Black | |||
2005 | 29.9 | -1.1 | -20.2 |
2006 | 32.6 | 0.0 | -19.1 |
2010 | 28.4 | -0.4 | -18.6 |
2011 | 33.5 | -0.1 | -19.7 |
2015 | 41.3 | 0.4 | -22.3 |
2016 | 44.4 | -0.1 | -23.0 |
2019 | 41.8 | 1.2 | -22.1 |
2020 | 41.6 | 0.3 | -23.1 |
2021 | 39.6 | -0.7 | -23.8 |
Panel C: Hispanic | |||
2005 | 25.4 | -1.1 | -18.1 |
2006 | 23.6 | -1.4 | -20.6 |
2010 | 20.2 | -2.2 | -20.3 |
2011 | 22.8 | -1.5 | -22.0 |
2015 | 30.3 | -1.9 | -23.5 |
2016 | 31.6 | -1.7 | -23.8 |
2019 | 30.0 | -1.6 | -22.4 |
2020 | 29.6 | -1.2 | -22.1 |
2021 | 27.8 | -2.1 | -23.9 |
Panel D: Asian | |||
2005 | 19.8 | -1.9 | -21.1 |
2006 | 20.4 | -1.8 | -20.1 |
2010 | 20.3 | -1.6 | -20.5 |
2011 | 19.0 | -2.4 | -20.5 |
2015 | 22.7 | -2.7 | -23.7 |
2016 | 25.5 | -2.8 | -24.9 |
2019 | 23.5 | -1.8 | -20.6 |
2020 | 22.3 | -2.4 | -21.9 |
2021 | 16.9 | -3.9 | -25.3 |
SOURCE: Author's calculations based on CPS/ASEC and DER. |
- Among non-Hispanic White workers, the median percentage difference ranges from −0.3 percent to 0.1 percent and averages −0.1 percent. The 25th percentile difference averages −13.9 percent and the 75th percentile difference averages 21.2 percent, an interquartile range of 35.1 percentage points (Panel A).
- Among non-Hispanic Black workers, the median percentage difference ranges from −1.1 percent to 1.2 percent and averages −0.1 percent. The 25th percentile difference averages −21.3 percent and the 75th percentile difference averages 37.0 percent, an interquartile range of 58.3 percentage points (Panel B).
- Among Hispanic workers, the median percentage difference ranges from −2.2 percent to −1.1 percent and averages −1.6 percent. The 25th percentile difference averages −21.8 percent and the 75th percentile difference averages 26.8 percent, an interquartile range of 48.6 percentage points (Panel C).
- Among Asian workers, the median percentage difference ranges from −3.9 percent to −1.6 percent and averages −2.4 percent. The 25th percentile difference averages −22.1 percent and the 75th percentile difference averages 21.1 percent, an interquartile range of 43.2 percentage points (Panel D).
By Education Level
Chart 9 shows the percentage difference between CPS and DER earnings data for workers at four levels of educational attainment: did not finish high school, received a high school diploma, attended college but did not earn a 4-year degree, and received a bachelor's degree or higher.
Year | 75th percentile | Median | 25th percentile |
---|---|---|---|
Panel A: Less than high school diploma | |||
2005 | 28.4 | -1.5 | -22.0 |
2006 | 30.0 | -1.9 | -24.4 |
2010 | 24.4 | -2.0 | -23.6 |
2011 | 24.1 | -2.3 | -24.7 |
2015 | 32.2 | -2.4 | -27.9 |
2016 | 32.5 | -3.1 | -29.7 |
2019 | 32.4 | -3.5 | -28.9 |
2020 | 34.0 | -2.7 | -27.7 |
2021 | 33.9 | -3.2 | -29.9 |
Panel B: High school diploma | |||
2005 | 19.6 | -0.3 | -13.7 |
2006 | 23.1 | -0.1 | -14.9 |
2010 | 22.4 | -0.3 | -15.8 |
2011 | 24.4 | -0.3 | -16.7 |
2015 | 31.2 | -0.4 | -20.6 |
2016 | 30.4 | -0.4 | -20.3 |
2019 | 31.1 | 0.0 | -18.8 |
2020 | 31.9 | 0.0 | -20.8 |
2021 | 34.0 | -0.2 | -21.1 |
Panel C: Some college | |||
2005 | 21.1 | 0.0 | -11.3 |
2006 | 22.0 | 0.0 | -12.4 |
2010 | 21.6 | -0.2 | -13.3 |
2011 | 24.2 | 0.0 | -14.0 |
2015 | 28.7 | 0.0 | -16.9 |
2016 | 29.9 | -0.2 | -17.8 |
2019 | 32.1 | 0.5 | -15.8 |
2020 | 28.7 | 0.1 | -16.3 |
2021 | 29.4 | -0.3 | -17.8 |
Panel D: Bachelor's degree or higher | |||
2005 | 14.6 | -0.1 | -10.5 |
2006 | 17.1 | 0.0 | -11.7 |
2010 | 15.7 | -0.3 | -12.6 |
2011 | 15.8 | -0.5 | -13.5 |
2015 | 20.3 | -0.1 | -15.3 |
2016 | 19.4 | -0.7 | -16.2 |
2019 | 21.0 | 0.0 | -14.6 |
2020 | 19.3 | -0.4 | -16.1 |
2021 | 17.9 | -0.9 | -16.3 |
SOURCE: Author's calculations based on CPS/ASEC and DER. |
- Among workers without a high school diploma, the median percentage difference ranges from −3.5 percent to −1.5 percent and averages −2.5 percent. The 25th percentile difference averages −26.5 percent and the 75th percentile difference averages 30.2 percent, an interquartile range of 56.7 percentage points (Panel A).
- Among workers with a high school diploma, the median percentage difference ranges from −0.4 percent to 0.0 percent and averages −0.2 percent. The 25th percentile difference averages −18.1 percent and the 75th percentile difference averages 27.6 percent, an interquartile range of 45.7 percentage points (Panel B).
- Among workers who attended college but did not earn a bachelor's degree, the median percentage difference ranges from −0.3 percent to 0.5 percent and averages 0.0 percent. The 25th percentile difference averages −15.1 percent and the 75th percentile difference averages 26.4 percent, an interquartile range of 41.5 percentage points (Panel C).
- Among college graduates, the median percentage difference ranges from −0.9 percent to 0.0 percent and averages −0.3 percent. The 25th percentile difference averages −14.1 percent and the 75th percentile difference averages 17.9 percent, an interquartile range of 32.0 percentage points (Panel D).
By Type of Hours Worked
Chart 10 shows the percentage difference between CPS and DER earnings data for workers who were employed full time and year-round and those who worked part-year or part time.
Year | 75th percentile | Median | 25th percentile |
---|---|---|---|
Panel A: Year-round and full time | |||
2005 | 15.1 | -0.1 | -10.4 |
2006 | 16.6 | 0.0 | -11.8 |
2010 | 15.9 | -0.3 | -11.8 |
2011 | 16.6 | -0.2 | -12.6 |
2015 | 21.3 | -0.2 | -15.2 |
2016 | 21.8 | -0.5 | -15.5 |
2019 | 22.8 | 0.0 | -14.6 |
2020 | 21.2 | -0.1 | -15.5 |
2021 | 18.5 | -0.8 | -15.7 |
Panel B: Part-year or part time | |||
2005 | 32.5 | -0.1 | -19.7 |
2006 | 37.1 | 0.0 | -21.5 |
2010 | 31.9 | -0.6 | -22.5 |
2011 | 35.7 | -0.5 | -23.8 |
2015 | 42.3 | -0.4 | -27.3 |
2016 | 41.1 | -0.8 | -29.2 |
2019 | 45.0 | 0.0 | -25.7 |
2020 | 42.7 | -0.2 | -26.1 |
2021 | 48.2 | 0.0 | -28.4 |
SOURCE: Author's calculations based on CPS/ASEC and DER. |
- Among workers employed full time year-round, the median percentage difference ranges from −0.8 percent to 0.0 percent and averages −0.2 percent. The 25th percentile difference averages −13.7 percent and the 75th percentile difference averages 18.9 percent, an interquartile range of 32.6 percentage points (Panel A).
- Among workers employed part-year or part time, the median percentage difference ranges from −0.8 percent to 0.0 percent and averages −0.3 percent. The 25th percentile difference averages −24.9 percent and the 75th percentile difference averages 39.6 percent, an interquartile range of 64.5 percentage points (Panel B).
Multivariate Analysis
Charts 4 through 10 illustrate the differences between CPS and DER earnings data by imputation status, DER earnings quartile, age, sex, racial or ethnic group, education level, and type of hours worked. Observing these differences individually is instructive, but it does not tell us how each variable is associated with the difference between CPS and DER earnings data when the other variables are also considered. Regression models can test multiple variables simultaneously. Table 1 shows the results of several OLS regressions that test the relationship between CPS and DER earnings data for each of the variables listed above and an additional variable that indicates if a worker paid all or part of an employer-provided health insurance plan premium. Employees pay premiums for employer-provided health insurance with pre-tax earnings, and this amount is not included in the DER. Consequently, the earnings amounts that CPS respondents report can exceed the amounts in the DER if the respondents pay for employment-based health insurance with pre-tax earnings.
Characteristic | Average | CPS 2006 | CPS 2011 | CPS 2016 | CPS 2019 | CPS 2021 |
---|---|---|---|---|---|---|
Number of respondents | 70,153 | 80,937 | 76,573 | 67,984 | 65,983 | 59,289 |
Dependent mean | 0.0604 | 0.0710 | 0.0569 | 0.0493 | 0.0690 | 0.0560 |
Adjusted R-squared | . . . | 0.173 | 0.174 | 0.194 | 0.185 | 0.195 |
Weighted sample means | ||||||
CPS earnings imputed | 0.1913 | 0.1750 | 0.1792 | 0.2177 | 0.1983 | 0.1863 |
Age | 41.1494 | 40.2166 | 41.1939 | 41.3989 | 41.3975 | 41.5400 |
Men | 0.5096 | 0.5112 | 0.5073 | 0.5106 | 0.5092 | 0.5098 |
Race or ethnicity | ||||||
Non-Hispanic White | 0.6770 | 0.7231 | 0.7083 | 0.6647 | 0.6502 | 0.6386 |
Non-Hispanic Black | 0.1163 | 0.1120 | 0.1096 | 0.1172 | 0.1209 | 0.1219 |
Hispanic | 0.1340 | 0.1073 | 0.1181 | 0.1423 | 0.1483 | 0.1538 |
Asian | 0.0492 | 0.0384 | 0.0435 | 0.0512 | 0.0552 | 0.0575 |
Other or mixed a | 0.0236 | 0.0193 | 0.0205 | 0.0246 | 0.0254 | 0.0282 |
Education level | ||||||
Less than high school diploma | 0.0634 | 0.0863 | 0.0666 | 0.0623 | 0.0542 | 0.0478 |
High school diploma | 0.2667 | 0.2968 | 0.2758 | 0.2591 | 0.2538 | 0.2479 |
Some college | 0.3054 | 0.3152 | 0.3171 | 0.3117 | 0.2969 | 0.2862 |
Bachelor's degree or higher | 0.3645 | 0.3017 | 0.3405 | 0.3669 | 0.3951 | 0.4182 |
Employed year-round and full time | 0.7048 | 0.7073 | 0.6827 | 0.7134 | 0.7404 | 0.6802 |
Pre-tax health insurance | 0.4652 | 0.4639 | 0.4600 | 0.4645 | 0.4689 | 0.4688 |
Parameter estimates | ||||||
Intercept | -1.5336 | -1.5386*** | -1.5291*** | -1.5700*** | -1.5477*** | -1.4829*** |
CPS earnings imputed | 0.0078 | 0.0065 | -0.0120* | -0.0166** | 0.0347*** | 0.0264*** |
DER earnings quartile | ||||||
Third | 0.2582 | 0.2217*** | 0.2543*** | 0.2905*** | 0.2545*** | 0.2698*** |
Second | 0.4981 | 0.4439*** | 0.5022*** | 0.5514*** | 0.4795*** | 0.5135*** |
First | 1.2237 | 1.1475*** | 1.1791*** | 1.3038*** | 1.2187*** | 1.2695*** |
Age | 0.0285 | 0.0334*** | 0.0297*** | 0.0248*** | 0.0291*** | 0.0255*** |
Age squared | . . . | -0.0004*** | -0.0003*** | -0.0003*** | -0.0003*** | -0.0003*** |
Men | 0.1188 | 0.1306*** | 0.1161*** | 0.1179*** | 0.1134*** | 0.1163*** |
Race or ethnicity | ||||||
Non-Hispanic Black | -0.0516 | -0.0411*** | -0.0445*** | -0.0484*** | -0.0629*** | -0.0610*** |
Hispanic | -0.0552 | -0.0542*** | -0.0573*** | -0.0356*** | -0.0638*** | -0.0651*** |
Asian | -0.0350 | -0.0226* | -0.0296** | -0.0313* | -0.0223** | -0.0692*** |
Other or mixed a | -0.0301 | -0.0354** | -0.0053 | 0.0029 | -0.0587*** | -0.0543*** |
Education level | ||||||
Less than high school diploma | -0.1233 | -0.1049*** | -0.1076*** | -0.1526*** | -0.1340*** | -0.1173*** |
Some college | 0.0635 | 0.0667*** | 0.0669*** | 0.0563*** | 0.0608*** | 0.0667*** |
Bachelor's degree or higher | 0.2192 | 0.2013*** | 0.2022*** | 0.2426*** | 0.2205*** | 0.2297*** |
Employed year-round and full time | 0.4658 | 0.4343*** | 0.4679*** | 0.5047*** | 0.4634*** | 0.4587*** |
Pre-tax health insurance | 0.0606 | 0.0608*** | 0.0739*** | 0.0887*** | 0.0423*** | 0.0371*** |
SOURCE: Author's calculations based on CPS/ASEC and DER. | ||||||
NOTES: Dependent variable: log(CPS earnings) − log(DER earnings).
. . . = not applicable.
* = statistically significant at the 0.10 level; ** = statistically significant at the 0.05 level; *** = statistically significant at the 0.01 level.
a. Consists primarily of respondents identifying as multiracial or American Indian/Alaska Native. |
Following the research of Kim and Tamborini (2012, 2014), the dependent variable in Table 1 is the difference between each person's earnings amounts in the CPS and the DER. CPS and DER earnings amounts have been logarithmically transformed. If we assume that the DER represents actual earnings, then log(CPS earnings) − log(DER earnings) estimates the measurement error in the CPS.
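Written out, the dependent variable is a log ratio. The notation below is illustrative rather than the author's: y_i denotes the dependent variable for respondent i, and E_i^CPS and E_i^DER denote that person's annual earnings in the two sources.

```latex
y_i = \ln\left(E_i^{\mathrm{CPS}}\right) - \ln\left(E_i^{\mathrm{DER}}\right)
    = \ln\left(\frac{E_i^{\mathrm{CPS}}}{E_i^{\mathrm{DER}}}\right),
\qquad
\frac{E_i^{\mathrm{CPS}} - E_i^{\mathrm{DER}}}{E_i^{\mathrm{DER}}}
    = e^{y_i} - 1 \approx y_i \quad \text{when } |y_i| \text{ is small.}
```

A positive value of y_i means the CPS amount exceeds the DER amount, and for small values y_i is close to the proportional difference between the two.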
The OLS regressions use data from the 2006, 2011, 2016, 2019, and 2021 CPS/ASEC surveys linked to the DER. Each year's sample consists of workers aged 18–69 with earnings data in both the CPS and the DER, regardless of whether the CPS earnings were reported or imputed. The unweighted sample sizes range from 59,289 to 80,937 respondents with an average of 70,153 respondents per year. The mean of the dependent variable, log(CPS earnings) − log(DER earnings), for those 5 years ranges from 0.0493 to 0.0710 with an average of 0.0604. Thus, annual earnings amounts in the CPS exceed those in the DER by an average of about 6 percent for a typical respondent in the 5 years tested in the OLS model. In the regression results, a positive coefficient indicates association with a larger percentage difference between the CPS and DER earnings data than the mean difference while a negative coefficient indicates a smaller percentage difference than the mean.
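The step from a mean log difference of 0.0604 to "about 6 percent" can be checked by exponentiating. The sketch below applies that conversion to the dependent means in Table 1; the function name is illustrative. Strictly, exponentiating a mean of logs gives the geometric mean of the CPS-to-DER ratio, so the result describes a typical respondent and not the ratio of average earnings.

```python
import math


def log_diff_to_percent(log_diff):
    """Convert log(CPS) - log(DER) into the implied percentage difference."""
    return 100.0 * (math.exp(log_diff) - 1.0)


# Dependent means reported in Table 1.
dependent_means = {
    "2006": 0.0710,
    "2011": 0.0569,
    "2016": 0.0493,
    "2019": 0.0690,
    "2021": 0.0560,
    "Average": 0.0604,
}

implied = {k: round(log_diff_to_percent(v), 1) for k, v in dependent_means.items()}

for label, pct in implied.items():
    print(f"{label}: {pct} percent")
```

For the 5-year average, the implied difference is 6.2 percent, slightly above the log value itself because the approximation log(1 + x) ≈ x understates x when x is positive.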
The difference between CPS and DER earnings data varies by whether CPS earnings were self-reported or imputed and by the quartile rank of workers' DER earnings. In the regression model, a dummy variable indicates whether the earnings values were imputed. If so, the variable has a value of 1; if earnings were self-reported, its value is 0. Three variables indicate whether a worker's DER earnings ranked in the third, second, or first quartile in the calendar year preceding the survey. The fourth (highest) earnings quartile is the omitted category.
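The dummy-variable coding described above can be sketched in a few lines. The example is only an illustration: it uses synthetic, noise-free data with made-up coefficients (not the estimates in Table 1), includes only the imputation and quartile variables, and solves the OLS normal equations directly instead of calling a statistics package. The function names `encode` and `ols` are hypothetical.

```python
def encode(imputed, der_quartile):
    """Return [intercept, imputed, third, second, first] for one worker.

    der_quartile runs from 1 (lowest) to 4 (highest). The fourth quartile
    is the omitted reference category, so it maps to three zeros.
    """
    return [1.0, float(imputed)] + [float(der_quartile == q) for q in (3, 2, 1)]


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic data: every combination of imputation status and DER quartile.
workers = [(imp, q) for imp in (False, True) for q in (1, 2, 3, 4)]
true_beta = [0.02, 0.03, 0.25, 0.50, 1.20]  # made-up values for illustration
X = [encode(imp, q) for imp, q in workers]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta_hat = ols(X, y)
print([round(b, 4) for b in beta_hat])
```

Because the fourth quartile is omitted, each quartile coefficient is read as a difference from fourth-quartile workers with the same imputation status, and the intercept absorbs the level for the reference group.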
In the 5 years tested in the model, the weighted proportion of workers aged 18–69 with earnings data in both the CPS and DER for whom the Census Bureau imputed wage and salary income ranged from 17.5 percent to 21.8 percent. The coefficient for the variable indicating that the Census Bureau imputed earnings is negative for the March 2011 and March 2016 CPS and positive for the March 2019 and March 2021 CPS. (It was positive but not statistically significant for the March 2006 CPS.) One reason the sign changed may be that the Census Bureau changed its data processing and imputation procedures during this period. CPS technical documentation notes that the “imputation system was updated to make use of income ranges provided by some non-respondents as well as to increase the number of characteristics used in the imputation models” (Census Bureau 2019). Although the imputation variable was statistically significant in four of the five years, the coefficient was not large. In the regression on the March 2021 CPS, the mean of log(CPS earnings) − log(DER earnings) was 0.0560 and the coefficient for the imputation variable was 0.0264. All else being equal, log(CPS earnings) − log(DER earnings) was 0.0264 higher when earnings were imputed rather than reported, which corresponds to CPS earnings roughly 2.7 percent higher relative to DER earnings.
Chart 5 shows the median differences between CPS and DER earnings data for workers in each DER earnings quartile. For the 9 years in the study period, the median differences average −6.1 percent in the fourth quartile, −1.4 percent in the third quartile, 1.3 percent in the second quartile, and 24.0 percent in the first quartile.11 In general, workers in the fourth quartile of DER earnings tend to underreport earnings in the CPS and those in the first quartile tend to overreport their earnings. In the OLS regression, the fourth quartile is omitted as the reference group. The coefficients for the three lower quartiles are positive, are statistically significant, and increase as quartile rank falls. Other things being equal, log(CPS earnings) − log(DER earnings) is negatively correlated with earnings as measured by DER earnings quartile rank.
Results for the other independent variables generally reflect the relationships seen in the charts. The coefficients for men and for year-round, full-time workers are positive and statistically significant. The coefficient for age is positive and the coefficient for age squared is negative; both are significant. Relative to non-Hispanic White workers, the coefficients for non-Hispanic Black, Hispanic, and Asian workers are negative and significant. Relative to high school graduates, the coefficient for workers without a high school diploma is negative and significant and the coefficients for workers with some college and college graduates are positive and significant.
The variable indicating that a worker pays all or part of the premium for employer-provided health insurance is statistically significant in each survey year examined and has an average coefficient of 0.06, indicating that it is associated with a slightly higher-than-average percentage difference between CPS and DER data on wages, all else being equal. In a regression run separately on workers in each quartile of DER earnings (not shown), this variable was statistically significant only for workers in the third and second quartiles. Most workers in the lowest quartile of earnings do not have employer-provided health insurance unless it is through a family member's employer. For workers in the highest quartile, health insurance premiums often represent a smaller percentage of earnings than they do for workers in the middle two quartiles.
Given that the coefficient of determination (adjusted R-squared) across the 5 CPS years ranges from 0.173 to 0.195, the regression model explains less than 20 percent of the variability observed in the percentage difference between CPS and DER earnings data (Table 1). Other factors not included in the model also affect the difference between earnings reported in the CPS and amounts recorded in the DER. Whether other worker characteristics that we can observe in the CPS or in other data sets can explain more of the difference between CPS and DER earnings data is a possible topic for further research.
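For reference, the standard definition of adjusted R-squared is shown below. It is an assumption that the software used for Table 1 applies exactly this formula; the symbols n (number of observations) and k (number of regressors, excluding the intercept) are introduced here for illustration.

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}
```

Table 1 lists 16 regressors besides the intercept, and each year's sample has at least 59,289 respondents, so the factor (n − 1)/(n − k − 1) is very close to 1 and the adjusted and unadjusted values would be nearly identical.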
Summary and Conclusion
This note has examined earnings data in the CPS and earnings amounts recorded for the same workers in the DER administrative data file. The results generally confirm the findings of earlier research that compared earnings data from household surveys with those in SSA's records. If we assume that SSA's records represent actual earnings, the results suggest that the misreporting of earnings in the CPS is not random. Misreporting in the CPS varies by imputation status, DER earnings quartile, age, sex, racial or ethnic group, education level, and type of hours worked.
Both underreporting and overreporting of earnings in the CPS increased over the period studied. From 2005 through 2021, the interquartile range of differences in earnings data between the two sources, representing the middle 50 percent of observations, grew wider. The 25th percentile difference between the CPS and DER moved from −12.4 percent to −18.3 percent and the 75th percentile difference rose from 19.2 percent to 25.7 percent. Thus, the interquartile range increased from 31.6 percentage points to 44.0 percentage points (Chart 4, Panel A).
The difference between CPS and DER earnings data varied substantially by DER earnings quartile. In general, workers in the highest quartile of DER earnings underreported earnings on the CPS, and those in the lowest quartile of DER earnings overreported earnings. Among workers in the fourth quartile of DER earnings, the median percentage difference between the CPS and DER averaged −6.1 percent (Chart 5, Panel A). Among workers in the first quartile of DER earnings, the median percentage difference between the CPS and DER averaged 24.0 percent (Chart 5, Panel D). Because higher-earning workers underreported earnings and lower-earning workers overreported earnings, researchers should be cautious about using CPS/ASEC public-use files to study the distribution of earnings and earnings inequality. Ideally, such research should use CPS files linked to SSA earnings records.
Appendix
Characteristic | CPS 2005 | CPS 2006 | CPS 2010 | CPS 2011 | CPS 2015 | CPS 2016 | CPS 2019 | CPS 2020 | CPS 2021 |
---|---|---|---|---|---|---|---|---|---|
Wage data in— | |||||||||
Both DER and CPS | 58,604 | 80,937 | 78,356 | 76,573 | 73,066 | 67,984 | 65,983 | 57,930 | 59,289 |
DER only | 4,595 | 6,913 | 7,195 | 7,545 | 7,481 | 6,867 | 6,577 | 5,710 | 6,265 |
CPS only | 4,259 | 5,732 | 6,011 | 5,867 | 5,949 | 5,478 | 5,248 | 4,595 | 4,567 |
CPS wages imputed | 8,475 | 13,165 | 13,120 | 13,242 | 16,139 | 14,787 | 13,029 | 11,634 | 11,228 |
CPS wages imputed (%) | 14.5 | 16.3 | 16.7 | 17.3 | 22.1 | 21.8 | 19.7 | 20.1 | 18.9 |
DER earnings quartile | |||||||||
Fourth | 14,451 | 19,995 | 19,227 | 19,307 | 18,304 | 16,799 | 16,279 | 14,443 | 14,666 |
Third | 14,886 | 20,403 | 19,705 | 19,436 | 18,553 | 17,360 | 16,560 | 14,820 | 15,116 |
Second | 14,743 | 20,340 | 19,883 | 19,100 | 18,412 | 17,145 | 16,771 | 14,540 | 14,906 |
First | 14,524 | 20,199 | 19,541 | 18,730 | 17,797 | 16,680 | 16,373 | 14,127 | 14,601 |
Age | |||||||||
18–29 | 14,081 | 19,048 | 18,033 | 17,814 | 16,401 | 15,346 | 14,670 | 12,008 | 12,620 |
30–39 | 14,930 | 19,454 | 18,325 | 17,814 | 17,330 | 16,272 | 16,050 | 14,150 | 14,726 |
40–49 | 16,554 | 22,617 | 20,239 | 19,641 | 17,087 | 15,735 | 15,050 | 12,987 | 13,439 |
50–59 | 9,940 | 15,116 | 15,878 | 15,729 | 15,265 | 13,973 | 12,959 | 11,940 | 11,778 |
60–69 | 3,099 | 4,702 | 5,881 | 6,283 | 6,983 | 6,658 | 7,284 | 6,845 | 6,726 |
Sex | |||||||||
Men | 29,460 | 40,735 | 39,183 | 38,202 | 36,712 | 34,288 | 33,223 | 29,244 | 30,004 |
Women | 29,144 | 40,202 | 39,173 | 38,371 | 36,354 | 33,696 | 32,760 | 28,686 | 29,285 |
Race or ethnicity | |||||||||
Non-Hispanic White | 41,910 | 56,931 | 53,650 | 51,969 | 47,286 | 43,322 | 41,917 | 36,974 | 37,061 |
Non-Hispanic Black | 5,699 | 8,152 | 8,193 | 7,940 | 8,120 | 7,798 | 7,200 | 6,078 | 6,566 |
Hispanic | 6,619 | 9,996 | 10,418 | 10,341 | 11,232 | 11,022 | 10,953 | 9,380 | 9,935 |
Asian | 2,287 | 3,233 | 3,737 | 3,960 | 4,084 | 3,696 | 3,816 | 3,640 | 3,670 |
Other or mixed a | 2,089 | 2,625 | 2,358 | 2,363 | 2,344 | 2,146 | 2,097 | 1,858 | 2,057 |
Education level | |||||||||
Less than high school diploma | 5,559 | 7,354 | 5,903 | 5,566 | 5,002 | 4,638 | 3,943 | 3,147 | 3,087 |
High school diploma | 17,545 | 24,032 | 22,068 | 21,017 | 19,333 | 17,697 | 16,891 | 14,320 | 14,994 |
Some college | 18,631 | 25,490 | 24,758 | 24,034 | 22,700 | 21,142 | 19,709 | 17,063 | 16,967 |
Bachelor's degree or higher | 16,869 | 24,061 | 25,627 | 25,956 | 26,031 | 24,507 | 25,440 | 23,400 | 24,241 |
Employment | |||||||||
Worked year-round and full time | 40,125 | 57,048 | 53,131 | 52,348 | 52,308 | 48,856 | 49,050 | 43,209 | 40,694 |
Worked part-year or part time | 18,479 | 23,889 | 25,225 | 24,225 | 20,758 | 19,128 | 16,933 | 14,721 | 18,595 |
SOURCE: Author's calculations based on CPS/ASEC and DER. | |||||||||
a. Consists primarily of respondents identifying as multiracial or American Indian/Alaska Native. |
Notes
1 Survey respondents can opt out of the data linkage.
2 These restricted-use linked files are available only for research and analysis by individuals who have completed required training on the federal laws that protect the confidentiality of data provided by Census Bureau survey participants. Access to the linked files is available through secure computing facilities for research projects that have been approved by the Census Bureau.
3 Appendix Table A-1 shows the annual numbers of observations in the linked data sets with earnings data in the CPS but not in the DER and the number with earnings data in the DER but not in the CPS.
4 CPS respondents are asked the amount they earned before any deductions. Employees pay premiums for employer-provided health insurance with pre-tax earnings, and the premium is not included in the DER data. CPS earnings could be more than the amount in the DER for respondents with employer-provided health insurance.
5 Procedures the Census Bureau uses to prevent disclosure of confidential information may result in differences between the public-use CPS/ASEC and the DER. These procedures are unlikely to have affected the results of this analysis.
6 Not all W-2 forms are posted to the DER. Some are posted to the Earnings Suspense File because they fail to meet SSA match criteria. Some employers or payroll providers fail to submit W-2 forms on time, and some forms contain errors that require correction; however, a large majority of W-2 forms are submitted on time and with accurate information.
7 DER linkage was not yet available for the 2022 CPS public-use file.
8 The percentage difference between CPS earnings and DER earnings is the same whether CPS and DER earnings are both in current or constant dollars.
9 The Census Bureau assigns noninterview households a sample weight of zero and adjusts the sample weights of interview households accordingly.
10 In the race and ethnicity categories used by the Census Bureau, a person of Hispanic ethnicity may be of any race.
11 For the 5 years used in the OLS model, the median differences average −6.4 percent in the fourth quartile, −1.5 percent in the third quartile, 1.5 percent in the second quartile, and 25.9 percent in the first quartile.
References
Abowd, John M., and Martha H. Stinson. 2013. “Estimating Measurement Error in Annual Job Earnings: A Comparison of Survey and Administrative Data.” Review of Economics and Statistics 95(5): 1451–1467.
Bollinger, Christopher R. 1998. “Measurement Error in the Current Population Survey: A Nonparametric Look.” Journal of Labor Economics 16(3): 576–594.
Bollinger, Christopher R., and Barry T. Hirsch. 2006. “Match Bias from Earnings Imputation in the Current Population Survey: The Case of Imperfect Matching.” Journal of Labor Economics 24(3): 483–519.
Bollinger, Christopher R., Barry T. Hirsch, Charles M. Hokayem, and James P. Ziliak. 2019. “Trouble in the Tails? What We Know about Earnings Nonresponse 30 Years after Lillard, Smith, and Welch.” Journal of Political Economy 127(5): 2143–2185.
Bound, John, and Alan B. Krueger. 1991. “The Extent of Measurement Error in Longitudinal Earnings Data: Do Two Wrongs Make a Right?” Journal of Labor Economics 9(1): 1–24.
Census Bureau. 2019. Current Population Survey: March 2019 Annual Social and Economic Supplement (ASEC). Technical Documentation. https://www2.census.gov/programs-surveys/cps/techdocs/cpsmar19.pdf.
———. 2022. Current Population Survey: March 2022 Annual Social and Economic Supplement (ASEC). Technical Documentation. https://www2.census.gov/programs-surveys/cps/techdocs/cpsmar22.pdf.
Cristia, Julian, and Jonathan A. Schwabish. 2007. “Measurement Error in the SIPP: Evidence from Matched Administrative Records.” Working Paper No. 2007-03. Washington, DC: Congressional Budget Office. https://www.cbo.gov/sites/default/files/110th-congress-2007-2008/workingpaper/2007-03_0.pdf.
Davies, Paul S., and T. Lynn Fisher. 2009. “Measurement Issues Associated with Using Survey Data Matched with Administrative Data from the Social Security Administration.” Social Security Bulletin 69(2): 1–12. https://www.ssa.gov/policy/docs/ssb/v69n2/v69n2p1.html.
Genadek, Katie R., Charles Hokayem, and Philip Pendergast. 2021. “The Summary Earnings Record and Detailed Earnings Record Extracts.” Working Paper No. 2021-05. Washington, DC: Census Bureau. https://www.census.gov/library/working-papers/2021/econ/earnings-record-extracts.html.
Gottschalk, Peter, and Minh Huynh. 2005. “Validation Study of Earnings Data in the SIPP—Do Older Workers Have Larger Measurement Error?” Working Paper No. 2005-07. Chestnut Hill, MA: Center for Retirement Research at Boston College. https://crr.bc.edu/wp-content/uploads/2005/05/wp_2005-071.pdf.
Kim, ChangHwan, and Christopher R. Tamborini. 2012. “Do Survey Data Estimate Earnings Inequality Correctly? Measurement Errors Among Black and White Male Workers.” Social Forces 90(4): 1157–1181.
———. 2014. “Response Error in Earnings: An Analysis of the Survey of Income and Program Participation Matched with Administrative Data.” Sociological Methods & Research 43(1): 39–72.
Meyer, Bruce D., Wallace K. C. Mok, and James X. Sullivan. 2015. “Household Surveys in Crisis.” Journal of Economic Perspectives 29(4): 199–226. https://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.29.4.199.
Olsen, Anya, and Russell Hudson. 2009. “Social Security Administration's Master Earnings File: Background Information.” Social Security Bulletin 69(3): 29–46. https://www.ssa.gov/policy/docs/ssb/v69n3/v69n3p29.html.
Pedace, Roberto, and Nancy Bates. 2000. “Using Administrative Records to Assess Earnings Reporting Error in the Survey of Income and Program Participation.” Journal of Economic and Social Measurement 26: 173–192.
Roemer, Marc. 2002. “Using Administrative Earnings Records to Assess Wage Data Quality in the March Current Population Survey and the Survey of Income and Program Participation.” Working Paper. Washington, DC: Census Bureau. https://www.census.gov/content/dam/Census/library/working-papers/2002/demo/asa2002.pdf.
Rothbaum, Jonathan, and Edward Berchick. 2019. “Redesign of the Current Population Survey Annual Social and Economic Supplement.” Presented at the Census Scientific Advisory Committee Spring 2019 Meeting, Washington, DC. https://www2.census.gov/cac/sac/meetings/2019-03/current-population-survey-annual-social-economic-supplement.pdf.