Evaluating a New Process for Assigning Geographic Residence Codes and Identifying Demographic Information for Workers in a Given Tax Year

by Michael Compson
Social Security Bulletin, Vol. 84 No. 1, 2024

The Social Security Administration's Office of Research, Evaluation, and Statistics (ORES) produces annual statistical publications that estimate the employment and earnings of U.S. workers. This article evaluates a new methodology developed by ORES to assign a state and county of residence code and identify the date of birth and sex of nearly all workers, rather than a 1-percent sample of workers, for whom tax records provide earnings data for a given year. The evaluation compares the estimates generated by the current methodology with those of the new methodology using microdata for tax year 2017. The results align with preevaluation expectations and highlight the importance of using a much larger sample of workers, which the new process enables, to generate the annual employment and earnings estimates.


Michael Compson is a senior economist with the Office of Statistical Analysis and Support, Office of Research, Evaluation, and Statistics, Office of Retirement and Disability Policy, Social Security Administration.

Acknowledgments: I would like to thank Greg Diez, for procuring access to the geographic data that made the MGD process possible; Angela Harper and Hansa Patel, for assistance in the early stages of developing the MGD process; Pat Purcell, Richard Chard, and Glenn Springstead, for comments on the draft of this article; and Ben Pitkin and Jessie Dalrymple, for editorial assistance. I dedicate this article to my wife and daughter for their steadfast love, support, and encouragement throughout the years.

The findings and conclusions presented in the Bulletin are those of the author and do not necessarily represent the views of the Social Security Administration.

The original version of this article contained errors in the first paragraph of the section titled “The MGD Process.” The numbers of workers contained in the 1-percent Continuous Work History Sample and in the MGD data file for tax year 2017 were incorrect. The correct figures now appear in the text. [Posted: February 26, 2024.]

Introduction

Selected Abbreviations
ASA Assigned State_SE_Active (merged data file)
CWHS Continuous Work History Sample
IRS Internal Revenue Service
MEF Master Earnings File
MGD Master Geographic and Demographic
Numident Numerical Identification System
OASDI Old-Age, Survivors, and Disability Insurance
OEIS Office of Enterprise Information Systems
ORES Office of Research, Evaluation, and Statistics
SCC state and county code
SSA Social Security Administration
SSN Social Security number

In 2021, the Office of Research, Evaluation, and Statistics (ORES) in the Social Security Administration (SSA) completed development of a new methodology for assigning geographic residence codes and identifying demographic information for nearly all workers with earnings in a given year. Compson (2022) describes the new methodology in detail, and this article evaluates it by comparing it to the methodology SSA currently uses to generate the comprehensive earnings and employment estimates it publishes in its annual statistical publications. ORES has applied the new methodology to tax information for tax years 2014 through 2020,1 producing a standalone Master Geographic and Demographic (MGD) data file for each year. For that reason, the terms “new methodology” and “MGD process” are used interchangeably throughout this article. ORES considers the development and evaluation of the new methodology to be the first and second steps, respectively, in a multistep process that will result in a dramatic expansion of the sample size used to generate the estimates for its statistical publications.

The evaluation of the MGD methodology consists of two distinct assessments. The first, a procedural assessment, uses internal audit reports to assess the completeness and accuracy of the new methodology in processing tax records for a 7-year span. It involves looking at the number of records processed, the number of unique Social Security numbers (SSNs) represented, the sources of the tax information, the methodology used to assign state and county codes (SCCs), and the results of various imputations used in populating missing data fields. The second assessment compares the MGD-assigned SCCs and demographic identifiers with those assigned under the current methodology. Specifically, this assessment involves comparing the estimated numbers of workers with Social Security taxable earnings and the amounts of those earnings by state, sex, age, and type of earnings (wage and salary, self-employment). This assessment also compares the two methodologies' estimated numbers of workers by county.

In general, the procedural assessment finds that the MGD process is consistent and thorough across the 7 years of tax data analyzed, although it raises minor questions, noted later, that ORES is currently investigating. The comparative assessment finds match rates of nearly 99 percent for workers' state code assignments and sex and age identifications. For county code assignments, the match rate is lower, at 94.5 percent. This result was expected because the MGD process uses more detailed geographic identifiers to assign county codes than were available when the current process was developed.2 Thus, in this circumstance, the lower match rate might reflect greater accuracy in the new methodology.
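As a rough illustration of what a match rate means in this comparison, the sketch below compares two hypothetical sets of SCC assignments keyed by SSN. The function name, the toy data, and the choice to compute the rate only over SSNs present in both sources are assumptions made for illustration; the article does not spell out the exact formula. The state comparison relies on the first two digits of the five-digit SCC identifying the state, as described in the text equivalent for Chart 1.

```python
def match_rate(current, mgd):
    """Percentage of shared SSNs whose assigned code agrees in both sources."""
    shared = current.keys() & mgd.keys()
    if not shared:
        return None  # nothing to compare
    agree = sum(1 for ssn in shared if current[ssn] == mgd[ssn])
    return 100.0 * agree / len(shared)


# Toy, made-up assignments (SSN -> five-digit SCC); not real data.
current_scc = {"001": "24031", "002": "11001", "003": "51059", "004": "24033"}
mgd_scc = {"001": "24031", "002": "11001", "003": "51013", "004": "24033"}

# County-level comparison uses the full five-digit code.
county_rate = match_rate(current_scc, mgd_scc)

# State-level comparison uses only the first two digits of each code.
state_rate = match_rate(
    {ssn: scc[:2] for ssn, scc in current_scc.items()},
    {ssn: scc[:2] for ssn, scc in mgd_scc.items()},
)
```

In this toy case one worker's county differs while the state agrees, so the state-level rate is higher than the county-level rate, which is the same direction as the pattern reported above.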

This introduction is followed by five sections. The first section highlights the key points of the MGD process for identifying demographic information and assigning geographic codes for the population of workers in a given tax year. The second section details the procedural assessment of the MGD process for the 7 years of tax data currently available. The third section discusses the methodology for conducting the microdata comparison, presents the preevaluation expectations of the comparisons, and assesses the results for worker counts by type of earnings, sex, age, and state. The fourth section discusses the comparison of the county-level estimates, and the fifth section concludes.

The MGD Process

The MGD process was developed to address the current methodology's limitations in assigning accurate geographic codes to workers' tax records and its reliance on sample data too small in size for the required scope of the work. The current methodology uses administrative microdata (that is, person-level information) from SSA's 1-percent Continuous Work History Sample (CWHS). For tax year 2017, the sample consists of fewer than 1.7 million workers. By contrast, the new methodology culminates in the creation of a standalone MGD data file containing geographic and demographic information for nearly all workers in a given tax year (179 million in 2017). The process generates 32 audit reports for each year. The reports allow ORES to track and evaluate the results of each step in the process of assigning a single SCC to each worker in a given tax year.

Each year, SSA and the Internal Revenue Service (IRS) share information from IRS tax forms for those agencies' respective programmatic needs. To that end, SSA's Office of Enterprise Information Systems (OEIS) receives hundreds of millions of IRS Forms W-2 and W-2c (filed by employers) and millions of Form 1040 Schedule SE (filed by the self-employed) from the IRS.3 As part of its elaborate annual wage reporting process, OEIS extracts all the information SSA needs to administer its programs. In a separate and distinct process undertaken for ORES and SSA's Office of the Chief Actuary, OEIS extracts the full address information reported on these forms and uses Pitney-Bowes' Finalist software to assign an SCC for each record.4 The resulting data files are the basis of the MGD process for assigning geographic residence codes for nearly all workers in a tax year.

The MGD process begins with job-level data—that is, records that contain both the worker's SSN and the employer identification number—and converts them to worker-level data, assigning a single SCC to each SSN as it does so. For this article, “number of records processed in a given tax year” is synonymous with the number of jobs in a given tax year, and “number of SSNs” refers to the number of workers in a given tax year. The number of records for each of the data sources (that is, the tax forms) is always greater than the number of SSNs because many individuals hold multiple jobs during the year.
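The distinction between records (jobs) and SSNs (workers) can be sketched as follows. This is a minimal illustration with made-up records and hypothetical field names, not the ORES implementation; it only collapses job-level rows to one entry per SSN and leaves the choice of a single SCC to later steps.

```python
from collections import defaultdict

# Hypothetical job-level records: (SSN, employer identification number, SCC).
job_records = [
    ("001", "EIN-A", "24031"),
    ("001", "EIN-B", "24031"),
    ("002", "EIN-A", "11001"),
    ("003", "EIN-C", "51059"),
    ("003", "EIN-D", "24033"),
]


def to_worker_level(records):
    """Collapse job-level records to one entry per SSN, keeping the SCCs seen."""
    sccs_by_ssn = defaultdict(set)
    for ssn, _ein, scc in records:
        sccs_by_ssn[ssn].add(scc)
    return dict(sccs_by_ssn)


workers = to_worker_level(job_records)
n_records = len(job_records)  # analogous to the number of jobs
n_ssns = len(workers)         # analogous to the number of workers
```

Worker "003" ends up with two candidate SCCs, which is the situation the later steps of the process resolve.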

Audits track the number of records or SSNs throughout each step of the MGD process. For example, the audit reports provide the number of records processed for each type of data source (Forms W-2, W-2c, and 1040 Schedule SE) and the number of unique SSNs associated with each data source, on an annual basis and over time.

In general, the audit reports contain the following information:

Once the underlying OEIS/Finalist data are extracted, the MGD process sorts workers into one of the following mutually exclusive data-source categories:

The audit reports detail the number of records and the number of unique SSNs for each of these data-source categories and compute the total number of unique SSNs in a given tax year.

The next step in the MGD process uses an administrative data master file called the Numerical Identification System (Numident) to identify valid and invalid SSNs and to supply information on each worker's sex and date of birth.5 Any SSN in both the MGD file and the Numident file is deemed to be valid. SSNs in the MGD file but not in the Numident file are deemed to be invalid. ORES can record sex and date of birth only for workers whose SSNs appear in the Numident file. For records with invalid SSNs, ORES enters “Missing” in the sex and date of birth data fields. The MGD process creates a data file containing demographic information that is set aside while the process of assigning a single SCC to each worker in a given tax year, described below, proceeds.
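The validity rule described above amounts to a lookup against the Numident file. The sketch below assumes a hypothetical in-memory dictionary standing in for Numident and hypothetical field names; only the rule itself (present means valid, absent means invalid with “Missing” demographics) comes from the text.

```python
def attach_demographics(mgd_ssns, numident):
    """Flag each SSN as valid or invalid and attach sex and date of birth."""
    result = {}
    for ssn in mgd_ssns:
        record = numident.get(ssn)
        if record is None:
            # SSN is not in Numident: deemed invalid, demographics set to "Missing".
            result[ssn] = {"valid": False, "sex": "Missing", "date_of_birth": "Missing"}
        else:
            result[ssn] = {
                "valid": True,
                "sex": record["sex"],
                "date_of_birth": record["date_of_birth"],
            }
    return result


# Made-up stand-in for the Numident file.
numident = {
    "001": {"sex": "F", "date_of_birth": "1980-05-17"},
    "002": {"sex": "M", "date_of_birth": "1975-11-02"},
}
demographics = attach_demographics(["001", "002", "999"], numident)
```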

First, the MGD process groups workers by the number of SCCs (one, multiple, none) that the OEIS/Finalist process assigned to them. The records for workers with a single SCC assigned by the OEIS/Finalist step are referred to as the “gold-standard file,” and for them the process of assigning an SCC is complete. For workers who were not assigned an SCC, ORES uses the frequency distribution of SCCs in gold-standard file records that share the worker's ZIP Code to try to impute an SCC.6 ORES employs a multistep process (briefly summarized later and detailed in Compson 2022) to assign the “best” SCC to the records of workers who have multiple SCCs after the OEIS/Finalist step. The audit reports show which method ORES used to assign a single SCC for each worker.
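One way to picture ZIP Code imputation is sketched below. It builds, from hypothetical gold-standard records, a frequency distribution of SCCs for each ZIP Code and then picks the most frequent SCC for a worker with no assigned code. Choosing the most frequent value is an assumption made for this sketch; the text says only that the frequency distribution is used, and Compson (2022) documents the actual procedure.

```python
from collections import Counter, defaultdict


def build_zip_distribution(gold_standard):
    """Map each ZIP Code to a Counter of SCCs observed in gold-standard records."""
    distribution = defaultdict(Counter)
    for zip_code, scc in gold_standard:
        distribution[zip_code][scc] += 1
    return distribution


def impute_scc(zip_code, distribution):
    """Return the most frequent SCC for the ZIP Code, or None if it is unseen."""
    counts = distribution.get(zip_code)
    if not counts:
        return None  # SCC stays unassigned
    return counts.most_common(1)[0][0]


# Made-up gold-standard records: (ZIP Code, SCC).
gold_standard = [
    ("20850", "24031"),
    ("20850", "24031"),
    ("20850", "24033"),
    ("20001", "11001"),
]
zip_distribution = build_zip_distribution(gold_standard)
imputed = impute_scc("20850", zip_distribution)
not_imputed = impute_scc("99999", zip_distribution)
```

Workers whose ZIP Code never appears in the gold-standard records remain without an SCC in this sketch.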

Once a single SCC has been assigned to each worker, the resulting file is rejoined with the file containing demographic information to create the standalone MGD file for that tax year. The merged file contains the following data fields for each worker:

Researchers and policy analysts using the MGD file must consider several important points. First, as noted earlier, the process that OEIS uses to assign SCCs for each job is based on the full address reported on the tax forms and is separate and distinct from the annual wage reporting process undertaken as part of program operations. As a result, the data used to create the MGD file have not been subjected to the cleaning and evaluation techniques that the tax data must undergo before they can be posted to SSA's Master Earnings File (MEF) for programmatic purposes. One result of using the raw tax data is that the MGD file contains invalid or improperly assigned SSNs. The latter may occur if the employer incorrectly enters an SSN when filling out the worker's Form W-2 or W-2c, or a self-employed individual enters the wrong SSN when filing Form 1040 Schedule SE. There is currently no way for ORES to correct such errors in its files.

Second, because the MGD file does not contain any information on the type or amount of earnings reported on the tax forms,7 it cannot, by itself, be used to estimate earnings covered or taxable under Social Security and Medicare. ORES is currently developing a new process to generate estimates using a much larger sample of workers extracted from the MEF or even, possibly, the entire population of workers in a tax year. The MEF and the MGD files together would contain the data necessary to generate the annual earnings estimates.

Third, the MGD file for a given tax year y contains data only for tax records that were processed in calendar year y + 1. For example, the first MGD file contains data for tax year 2017, but only for forms that were processed in 2018. In turn, for processing year 2018, 2017 is the primary tax year. The 2017 MGD file excludes any data for tax year 2017 that were processed in a calendar year other than 2018, and it excludes data for tax years other than 2017 that were processed in 2018. In developing the MGD methodology, ORES decided to focus on the data for a single tax year that were processed in the single calendar year that followed. ORES chose this method despite knowing that some tax year 2017 earnings were processed in 2017 or would not be processed until after 2018. Whether it was possible to include these data in the 2017 MGD file, and if so, how, was yet to be determined.8

To illustrate, the MGD file for 2017 contains records for 178,863,694 workers whose tax forms were processed in 2018. However, an additional 2,618,600 workers had earnings in tax year 2017, but their forms were processed in other years, as follows: 233,222 in 2017; 1,737,114 in 2019; 404,899 in 2020; and 243,365 in 2021. Thus, the 2017 MGD file omits up to 1.44 percent of the population of workers with reported earnings in 2017.
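The 1.44 percent figure can be reproduced from the counts given above. The variable names are illustrative; the numbers are those stated in the preceding paragraph.

```python
# Workers in the 2017 MGD file (forms processed in 2018).
in_mgd_file = 178_863_694

# Workers with tax year 2017 earnings whose forms were processed in other years.
processed_elsewhere = {2017: 233_222, 2019: 1_737_114, 2020: 404_899, 2021: 243_365}

omitted = sum(processed_elsewhere.values())
population = in_mgd_file + omitted

# Upper bound on the share omitted, because some of these workers
# may already appear in the 2017 MGD file.
omitted_share_pct = 100 * omitted / population
```

The denominator is the combined count of workers in the file and workers processed in other years, which is why the result is an upper bound rather than an exact omission rate.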

This circumstance raises several critical questions. First, are any of these individuals already in the 2017 MGD file? (This can occur for multiple job holders or those with earnings reported on both a Form W-2 and an amended Form W-2c, or because of filer error in entering the tax year.) Identifying these instances can reduce the number of individuals whose records need to be incorporated into the MGD file. Second, to add the records for workers whose tax forms were not processed in 2018 into the 2017 MGD file, how many processing years should be included, and how reliable will those data be? Experience shows that some data for a given tax year may not be reported for several years. However, over time, the number of workers being added trends toward zero, so the potential effect on the MGD file becomes inconsequential.

Another concern is the reliability of the address information reported on the tax forms processed in later years. For example, if a Form W-2 or W-2c for tax year 2017 is not processed until 2021, the individual may no longer reside in the same location. ORES is evaluating the possibility of incorporating the additional tax information reported in subsequent years into the MGD files. This issue is especially pertinent given that the COVID-19 pandemic led to substantial delays in IRS processing of tax returns from 2020 to April 2023.

The Procedural Evaluation

Table 1 presents the number of records extracted in each processing year from 2015 to 2021 and the number of unique SSNs associated with those records, by type of data source (W-2, W-2c, and 1040 Schedule SE). The number of records is analogous to the number of jobs, and the number of SSNs reflects the number of workers. The number of records far exceeds the number of unique SSNs each year because workers may have multiple jobs, each requiring its own tax form. A worker may have tax forms of more than one type for a given year, or multiple forms of the same type in a year, or both. The total number of unique SSNs for each year overstates the actual number of workers because it includes duplicates (that is, the SSNs of workers with more than one type of tax form). Note the relatively large volume of W-2c records processed in 2017, the decrease in the number of Schedule SE records processed in 2020, and the drop in the numbers of W-2s and associated SSNs in 2021. It is not clear if the lower numbers of Schedule SE records processed in 2020 and W-2s processed in 2021 reflect fewer jobs in the economy or the effect of COVID-19 on employers' ability to timely file W-2s or W-2cs for their employees and the IRS' ability to process Schedule SEs.9

Table 1. ORES data extraction volume: Numbers of tax records processed and unique SSNs contained therein, by type of form, 2015–2021
Processing year | Primary tax year | Total: Records | Total: Unique SSNs a | Form W-2: Records | Form W-2: Unique SSNs | Form W-2c: Records | Form W-2c: Unique SSNs | Form 1040 Schedule SE: Records | Form 1040 Schedule SE: Unique SSNs
2015 2014 259,791,044 181,523,762 237,765,591 160,795,805 3,243,285 2,538,180 18,782,168 18,189,777
2016 2015 269,436,834 186,400,337 245,528,242 163,550,439 3,227,003 2,799,543 20,681,589 20,050,355
2017 2016 278,488,758 191,098,053 251,509,338 166,219,172 6,214,674 4,695,964 20,764,746 20,182,917
2018 2017 279,435,723 191,637,671 254,788,713 168,297,764 3,452,217 2,840,058 21,194,793 20,499,849
2019 2018 284,888,320 194,365,647 259,798,529 170,468,612 3,709,345 3,179,679 21,380,446 20,717,356
2020 2019 286,651,734 195,429,463 262,691,363 172,374,107 3,428,934 3,024,076 20,531,437 20,031,280
2021 2020 276,478,907 194,970,016 250,693,566 170,750,781 4,167,226 3,760,955 21,618,115 20,458,280
SOURCE: Author's calculations based on SSA data processing audit reports.
a. Because some workers have more than one type of tax form in a given year, the total number of unique SSNs includes duplicates.
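The relationships in Table 1 can be checked with a few lines of arithmetic. The sketch below uses the processing-year 2018 row; the variable names are illustrative. It confirms that the totals are simple sums across form types, which is why the total unique SSN count includes duplicates, and it computes the average number of W-2 records per unique SSN.

```python
# Processing year 2018 (primary tax year 2017), from Table 1.
records = {"W-2": 254_788_713, "W-2c": 3_452_217, "Schedule SE": 21_194_793}
unique_ssns = {"W-2": 168_297_764, "W-2c": 2_840_058, "Schedule SE": 20_499_849}

total_records = sum(records.values())
# Summing per-form unique SSNs counts a worker once for each form type filed.
total_unique_ssns = sum(unique_ssns.values())

w2_records_per_ssn = records["W-2"] / unique_ssns["W-2"]
```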

As mentioned earlier, data for nonprimary tax years are included in a calendar year's processing workload. Table 2 shows the prevalence of primary-year and nonprimary-year data for each type of tax form in 2015–2021. The number of W-2s processed dropped by nearly 12 million from 2020 to 2021, which is likely due to COVID-19's effect on the labor market and employers' W-2 filings.

Table 2. ORES extraction volume for primary and nonprimary tax year data: Numbers of tax records processed and unique SSNs contained therein, by type of form, 2015–2021
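Processing year | Primary tax year | Records processed, number: Total | Records processed, number: For primary tax year | Records processed, number: For other tax year | Records processed, percent: Total | Records processed, percent: For primary tax year | Records processed, percent: For other tax year | Number of unique SSNs: For primary tax year | Number of unique SSNs: For other tax year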
  Form W-2
2015 2014 237,765,591 235,615,820 2,149,771 100.00 99.10 0.90 160,535,225 2,007,148
2016 2015 245,528,242 243,723,231 1,805,011 100.00 99.26 0.74 163,366,783 1,704,720
2017 2016 251,509,338 249,530,278 1,979,060 100.00 99.21 0.79 166,000,893 1,839,742
2018 2017 254,788,713 253,365,171 1,423,542 100.00 99.44 0.56 168,108,594 1,329,548
2019 2018 259,798,529 258,510,183 1,288,346 100.00 99.50 0.50 170,275,487 1,217,177
2020 2019 262,691,363 261,583,557 1,107,806 100.00 99.58 0.42 172,238,245 1,041,896
2021 2020 250,693,566 249,832,215 861,351 100.00 99.66 0.34 170,623,150 812,196
  Form W-2c
2015 2014 3,243,285 2,179,694 1,063,591 100.00 67.21 32.79 1,870,493 814,835
2016 2015 3,227,003 2,000,757 1,226,246 100.00 62.00 38.00 1,886,081 996,142
2017 2016 6,214,674 3,699,613 2,515,061 100.00 59.53 40.47 3,443,782 1,840,675
2018 2017 3,452,217 2,591,048 861,169 100.00 75.05 24.95 2,192,494 709,762
2019 2018 3,709,345 2,785,824 923,521 100.00 75.10 24.90 2,532,575 723,700
2020 2019 3,428,934 2,695,360 733,574 100.00 78.61 21.39 2,517,683 584,314
2021 2020 4,167,226 3,474,898 692,328 100.00 83.39 16.61 3,253,048 557,342
  Form 1040 Schedule SE
2015 2014 18,782,168 17,813,779 968,389 100.00 94.84 5.16 17,812,721 728,932
2016 2015 20,681,589 19,664,474 1,017,115 100.00 95.08 4.92 19,663,466 780,799
2017 2016 20,764,746 19,804,112 960,634 100.00 95.37 4.63 19,803,275 750,329
2018 2017 21,194,793 20,050,718 1,144,075 100.00 94.60 5.40 20,050,006 908,497
2019 2018 21,380,446 20,278,455 1,101,991 100.00 94.85 5.15 20,277,674 859,115
2020 2019 20,531,437 19,601,328 930,109 100.00 95.47 4.53 19,601,024 795,290
2021 2020 21,618,115 19,308,932 2,309,183 100.00 89.32 10.68 19,308,531 2,031,568
SOURCE: Author's calculations based on SSA data processing audit reports.
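The percentage columns in Table 2 follow directly from the record counts. The short sketch below recomputes two rows; the helper name is hypothetical and the inputs are taken from the table.

```python
def primary_and_other_shares(primary, other):
    """Return (primary %, other %) of records processed, rounded to 2 decimals."""
    total = primary + other
    return round(100 * primary / total, 2), round(100 * other / total, 2)


# Form W-2, processing year 2018.
w2_2018 = primary_and_other_shares(253_365_171, 1_423_542)

# Form W-2c, processing year 2017 (the year with the unusually high W-2c volume).
w2c_2017 = primary_and_other_shares(3_699_613, 2_515_061)
```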

The number of W-2cs processed nearly doubled in 2017 and increased sharply in 2021. Part of the increase in W-2c processing in 2017 reflects a large payroll service provider's issuance of corrections to approximately 500,000 records (SSA 2017). The increase in 2021 is most likely a rebound after COVID-19 limited W-2c processing in 2020. The steep increase in 2021 processing of 1040 Schedule SEs for nonprimary tax years most likely reflects IRS efforts to reduce the backlog caused by the pandemic.

As noted earlier, the MGD process uses data from the Numident master file to identify valid and invalid SSNs and to provide information on each worker's sex and date of birth, yet the tax data used in the OEIS process are not subject to the cleaning and verification associated with the programmatic annual wage reporting process. As a result, some of the SSNs in the data extracted for the new process are not in the Numident file and are deemed to be invalid.10 Table 3 shows the number of valid and invalid SSNs and expresses both numbers as a percentage of the unique SSNs contained in the records processed each year. The percentage of SSNs that are valid is stable over the years, ranging from 99.02 percent to 99.24 percent.

Table 3. Unique SSNs in records processed by ORES, by whether valid, 2015–2021
Processing year | Primary tax year | Number: Total | Number: Valid | Number: Invalid | Percent: Total | Percent: Valid | Percent: Invalid
2015 2014 170,260,465 168,962,452 1,298,013 100.00 99.24 0.76
2016 2015 174,002,077 172,610,971 1,391,106 100.00 99.20 0.80
2017 2016 176,723,136 175,237,389 1,485,747 100.00 99.16 0.84
2018 2017 178,863,694 177,339,293 1,524,401 100.00 99.15 0.85
2019 2018 181,131,038 179,553,005 1,578,033 100.00 99.13 0.87
2020 2019 182,622,507 181,050,599 1,571,908 100.00 99.14 0.86
2021 2020 181,232,792 179,465,649 1,767,143 100.00 99.02 0.98
SOURCE: Author's calculations based on SSA data processing audit reports.

In the next step of the MGD process, ORES identifies the demographic information for each worker using data from the Numident file. Table 4 shows the volume of records processed for this step and the breadth of the demographic information the records contained, which enabled ORES to identify, in each tax year, the sex and date of birth of nearly 99 percent of workers whose records include a valid SSN.

Table 4. Number and percentage of tax records processed that include populated demographic data fields, by type of demographic information, 2015–2021
Processing year | Primary tax year | Number: Sex | Number: Date of birth | Number: Date of death a | Percentage: Sex | Percentage: Date of birth | Percentage: Date of death a
2015 2014 168,147,105 168,082,142 5,175,248 98.76 98.72 3.04
2016 2015 171,808,654 171,745,301 4,547,893 98.74 98.70 2.61
2017 2016 174,446,485 174,385,193 3,948,451 98.74 98.70 2.61
2018 2017 176,571,242 176,512,242 2,003,789 98.72 98.69 1.12
2019 2018 178,801,354 178,744,349 2,696,883 98.71 98.68 1.49
2020 2019 180,317,027 180,262,406 2,128,766 98.74 98.71 1.17
2021 2020 178,765,797 178,714,205 1,566,516 98.64 98.61 0.86
SOURCE: Author's calculations based on SSA data processing audit reports.
a. Lags in posting death date information result in apparent annual declines in deaths that do not reflect actual annual mortality.

Table 5 shows, for unique SSNs associated with worker records processed, the number to which the OEIS/Finalist process assigned either one SCC, multiple SCCs, or no SCCs. The number of workers for whom the OEIS/Finalist process assigned a single SCC is dramatically lower for 2015 than for any other year, which is reflected in the aberrantly high number of workers with no SCC assigned in that year. In addition, the number of workers with multiple assigned SCCs is much lower for 2015 than for any other year. These results raise concerns about the quality of the processing-year 2015 MGD data, and ORES will carefully evaluate the distribution of the state and county assignments for that year. The record-processing results for the other years are consistent over time.

Table 5. Unique SSNs in records processed by ORES, by number of SCCs assigned in the OEIS/Finalist process, 2015–2021
Processing year Primary tax year Total One SCC Multiple SCCs No SCC
    Number
2015 2014 170,260,465 58,018,347 1,025,466 111,216,652
2016 2015 174,002,077 163,954,526 8,674,681 1,372,870
2017 2016 176,723,136 166,415,923 9,099,500 1,207,713
2018 2017 178,863,694 168,338,342 9,304,745 1,220,607
2019 2018 181,131,038 170,390,900 9,532,040 1,208,098
2020 2019 182,622,507 171,744,208 9,673,477 1,204,822
2021 2020 181,232,792 171,189,181 8,835,307 1,208,304
    Percent
2015 2014 100.00 34.08 0.60 65.32
2016 2015 100.00 94.23 4.99 0.79
2017 2016 100.00 94.17 5.15 0.68
2018 2017 100.00 94.12 5.20 0.68
2019 2018 100.00 94.07 5.26 0.67
2020 2019 100.00 94.04 5.30 0.66
2021 2020 100.00 94.46 4.88 0.67
SOURCE: Author's calculations based on SSA data processing audit reports.
NOTE: Rounded components of percentage distributions do not necessarily sum to 100.00.

ZIP Code imputation is the first of several steps ORES takes to assign a single SCC for records that were not assigned an SCC in the OEIS/Finalist process. Table 6 shows that ZIP Code imputation dramatically affects the distribution for 2015, converting many worker records from zero to one assigned SCC. Yet for 2015, the number of workers with multiple SCCs is still much lower than in subsequent years and the number of workers with no SCC is much higher than in later years. The high volume of records that were subject to imputation because they were not assigned an SCC in the OEIS/Finalist process probably accounts for the anomalous 2015 figures. The results for the other years are consistent.

Table 6. Unique SSNs in records processed by ORES, by number of SCCs assigned after ZIP Code imputation, 2015–2021
Processing year Primary tax year Total One SCC Multiple SCCs No SCC
    Number
2015 2014 170,260,465 163,623,449 4,883,072 1,753,944
2016 2015 174,002,077 165,122,626 8,678,490 200,961
2017 2016 176,723,136 167,433,007 9,104,367 185,762
2018 2017 178,863,694 169,358,474 9,308,397 196,823
2019 2018 181,131,038 171,373,714 9,535,314 222,010
2020 2019 182,622,507 172,723,721 9,676,552 222,234
2021 2020 181,232,792 172,172,586 8,836,639 223,567
    Percent
2015 2014 100.00 96.10 2.87 1.03
2016 2015 100.00 94.90 4.99 0.12
2017 2016 100.00 94.74 5.15 0.11
2018 2017 100.00 94.69 5.20 0.11
2019 2018 100.00 94.61 5.26 0.12
2020 2019 100.00 94.58 5.30 0.12
2021 2020 100.00 95.00 4.88 0.12
SOURCE: Author's calculations based on SSA data processing audit reports.
NOTE: Rounded components of percentage distributions do not necessarily sum to 100.00.

The next step in the MGD process determines, for workers with multiple SCCs, which one is the best to assign. For this, ORES first generates a file containing all the SSNs that have multiple SCCs and extracts the earnings data for each worker from the MEF. The SCC for the location of the worker's highest-paying job is assigned, when that information is available. For the remaining workers, ORES applies one of several additional imputation techniques (detailed in Compson 2022) that involve matching the frequency distribution of employer location and worker SCCs in the gold-standard file to select the best SCC.
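The highest-paying-job rule can be sketched as below. The function name and the input layout are hypothetical, and the sketch deliberately returns no answer when there are no earnings data or when the top earnings amount is tied across different locations; in those cases the actual process applies the additional imputation techniques detailed in Compson (2022), which are not reproduced here.

```python
def scc_from_highest_paying_job(jobs):
    """Return the SCC of the single highest-paying job, else None.

    jobs is a list of (earnings, scc) pairs for one worker.
    """
    if not jobs:
        return None  # no earnings data available for this worker
    top_earnings = max(earnings for earnings, _scc in jobs)
    top_sccs = {scc for earnings, scc in jobs if earnings == top_earnings}
    if len(top_sccs) == 1:
        return next(iter(top_sccs))
    return None  # tie across locations: left to further imputation


# Made-up earnings and codes for illustration.
single_top = scc_from_highest_paying_job([(52_000.0, "24031"), (8_500.0, "11001")])
tied_top = scc_from_highest_paying_job([(10_000.0, "24031"), (10_000.0, "11001")])
no_data = scc_from_highest_paying_job([])
```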

Table 7 quantifies the methods by which records received a single SCC assignment. Excluding processing year 2015, the volume of records having a single SCC assigned via each method is consistent over time. In 2016–2021, the OEIS/Finalist process produces most of the single-SCC assignments, with the resulting gold-standard file constituting at least 94 percent of workers each year. In those same years, using the highest-paying job to assign a single SCC for workers with multiple SCCs accounts for at least 4.7 percent and as much as 5.1 percent of workers in a given year. Combined, these techniques enabled ORES to assign a single SCC to at least 99 percent of workers with a valid SSN in 2016–2021. The frequencies of the other imputation techniques are also consistent over time, as is the percentage of SSNs for which ORES could not assign an SCC.

Table 7. Unique SSNs in records processed by ORES, by number of SCCs assigned and method of assignment, 2015–2021
Processing year | Primary tax year | Total | One SCC after OEIS/Finalist process (gold-standard records) | No SCC after OEIS/Finalist process (single SCC assigned via ZIP Code imputation) | More than one SCC after OEIS/Finalist process, single SCC assigned based on imputation of MEF data on location of highest-paying job: MEF data identify a single highest-paying job | Highest-paying job has multiple locations a | No highest-paying job a | No earnings data in MEF a | Missing data; cannot assign SCC
  Number
2015 2014 170,260,465 58,018,347 105,605,102 4,706,334 13,536 692 137,444 1,779,010
2016 2015 174,002,077 163,954,526 1,168,100 8,364,726 29,937 1,886 280,216 202,686
2017 2016 176,723,136 166,415,923 1,017,084 8,757,866 40,532 1,745 302,377 187,609
2018 2017 178,863,694 168,338,342 1,020,132 8,995,263 31,444 1,939 277,928 198,646
2019 2018 181,131,038 170,390,900 982,814 9,230,337 26,826 1,912 274,257 223,992
2020 2019 182,622,507 171,744,208 979,513 9,369,156 25,673 2,070 277,695 224,192
2021 2020 181,232,792 171,189,181 983,405 8,532,308 25,240 2,112 274,689 225,857
  Percent
2015 2014 100.00 34.08 62.03 2.76 0.01 (L) 0.08 1.04
2016 2015 100.00 94.23 0.67 4.81 0.02 (L) 0.16 0.12
2017 2016 100.00 94.17 0.58 4.96 0.02 (L) 0.17 0.11
2018 2017 100.00 94.12 0.57 5.03 0.02 (L) 0.16 0.11
2019 2018 100.00 94.07 0.54 5.10 0.01 (L) 0.15 0.12
2020 2019 100.00 94.04 0.54 5.13 0.01 (L) 0.15 0.12
2021 2020 100.00 94.46 0.54 4.71 0.01 (L) 0.15 0.12
SOURCE: Author's calculations based on SSA data processing audit reports.
NOTES: Rounded components of percentage distributions do not necessarily sum to 100.00.
(L) = less than 0.005.
a. Imputations involve matching the frequency distributions of employer location and worker SCC combinations in the gold-standard file with data available in the MEF.
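The Table 7 counts for a given year should account for every unique SSN. The check below uses the processing-year 2018 row; the dictionary keys are shorthand labels chosen for this sketch.

```python
# Processing year 2018 (primary tax year 2017), from Table 7.
total_ssns = 178_863_694
by_method = {
    "gold_standard": 168_338_342,
    "zip_imputation": 1_020_132,
    "single_highest_paying_job": 8_995_263,
    "highest_job_multiple_locations": 31_444,
    "no_highest_paying_job": 277_928,
    "no_earnings_in_mef": 1_939,
    "cannot_assign": 198_646,
}

accounted_for = sum(by_method.values())

# Share assigned by the two main techniques combined.
main_share_pct = 100 * (
    by_method["gold_standard"] + by_method["single_highest_paying_job"]
) / total_ssns
```

The method counts sum exactly to the total, and the two main techniques together cover a little over 99 percent of unique SSNs for that year.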

The procedural evaluation of the MGD process shows consistency over time and provides evidence that the process is stable and robust. However, some observations warrant further investigation. Why did so many records have no SCC assigned in 2015, and did that affect the assumed geographic distribution of workers for that year? Why did the number of W-2c records processed increase sharply in 2017? What accounts for the drop, shown in Table 4, in the number of records with a date of death from 5.18 million (3.0 percent of records processed) in 2015 to 1.57 million (0.9 percent) in 2021? Comparing the following tabulation, which shows all U.S. deaths for 2014–2022, with the number of worker records containing a value in the date of death field shown in Table 4 suggests that many deaths from earlier years were not posted until 2015, 2016, and 2017, and that many deaths occurring in 2018 or later have not been posted yet.

Year Number
2014 2,626,418
2015 2,712,630
2016 2,744,248
2017 2,813,503
2018 2,839,205
2019 2,854,838
2020 3,390,079
2021 3,471,742
2022 3,289,236
SOURCE: Centers for Disease Control and Prevention (2022, 2023).

The Comparative Evaluation

This section describes the steps ORES took in preparing to compare the current-methodology and new-process estimates, summarizes the results that ORES staff expected the evaluation would produce, and describes the construction and characteristics of the data files used in the evaluation. Then, it discusses the differences between the two methodologies in the estimated number of workers with covered earnings and the amounts of those earnings by state, sex, and age.

Comparison of Current-Methodology and MGD-Process Geographic Estimates

The current methodology provides the estimates that ORES publishes in annual statistical publications. ORES publishes covered employment and earnings estimates by state in the Annual Statistical Supplement to the Social Security Bulletin (hereafter, the Annual Statistical Supplement; see https://www.ssa.gov/policy/docs/statcomps/supplement/index.html) and by state and county in Earnings and Employment Data for Workers Covered Under Social Security and Medicare, by State and County (hereafter, Earnings and Employment; see https://www.ssa.gov/policy/docs/statcomps/eedata_sc/index.html). This evaluation uses the microdata and the estimation methods currently used for those publications (with slight modifications, described later) to generate a data file that allows comparison with the MGD process used for assigning SCCs and identifying demographic information. The evaluation comprises two distinct comparisons. The first comparison focuses solely on estimates of the number of workers and their taxable earnings amounts by state, sex, and age. The second comparison focuses on county-level estimates. It is addressed in a separate section because it is significantly more complex than the state-level comparison.

Chart 1 diagrams the steps ORES currently takes to generate the state- and county-level earnings estimates in its statistical publications.11 The process begins by merging the contents of three distinct component files in the 1-percent CWHS file system: the Assigned State file, the Active file, and the SE file. The Assigned State file contains annual earnings and geographic data at the job level, with one record for each SSN/employer identification number combination. The Active file contains time-series earnings and demographic data at the SSN level for each worker with reported earnings over time. The SE file contains job-level earnings and geographic data for self-employed individuals for a given year. The resulting merged microdata file is called the Assigned State_SE_Active (ASA) file. After various manipulations, this merged file contains worker-level data and includes the following data fields:

Text equivalent for Chart 1.
ORES current-methodology process for estimating state- and county-level covered earnings and employment for its statistical publications

Step 1: Extract geographic and demographic data elements from the CWHS component data files that house them and create a new data file to hold the combined data. The image shows three CWHS component data files, namely the Assigned State file, the SE (self-employed) file, and the Active file, merging to become the ASA merged microdata file.

Step 2: Create summarized data files that include the SCC data available in the ASA file. Use the first two digits of the five-digit SCC to identify state names. Arrows indicate that the ASA microdata file from Step 1 flows down to the Summarized data files, which then flow out to the published estimates of covered earnings and employment by state.

Step 3: For county-level estimates, merge the SCC data in the summarized data files with the geographic identifiers contained in the LABELS data file, which links the five-digit SCCs with the corresponding county names. Arrows indicate that the Summarized data files from Step 2 are matched on five-digit SCCs to the LABELS file to become the New file from joined Summarized and LABELS files. This new file then flows out to the published estimates of covered earnings and employment data by county.

SOURCE: ORES.

ORES currently uses the merged ASA file to create several summarized data files from which it generates state- and county-level employment and earnings estimates. Generating the county-level estimates requires an extra step because the microdata do not contain the county names associated with the SCCs. Specifically, the summarized county-level data must be joined with a separate data file called the LABELS file that contains both the numeric SCCs and the corresponding county names. Further details are provided below in the section on county-level estimates.
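The state and county steps described above can be illustrated with a minimal Python sketch. The SCC values, the state-prefix mapping, and the county names are invented placeholders rather than actual SSA codes, and the `labels` dictionary merely stands in for the LABELS file; the production process is not shown in this article.

```python
from collections import Counter

# Hypothetical five-digit SCCs for six workers (placeholder values, not real SSA codes).
worker_sccs = ["11010", "11010", "11020", "22030", "22030", "22030"]

# Stand-in state lookup: the first two digits of the SCC identify the state.
state_names = {"11": "State A", "22": "State B"}

# Stand-in for the LABELS file: five-digit SCC -> county name.
labels = {"11010": "County X", "11020": "County Y", "22030": "County Z"}

# State-level summary: count workers by the two-digit SCC prefix.
state_counts = Counter(state_names[scc[:2]] for scc in worker_sccs)

# County-level summary: count by five-digit SCC, then join to LABELS for names.
county_counts = Counter(worker_sccs)
county_table = [
    (scc, labels.get(scc, "Unknown"), n)
    for scc, n in sorted(county_counts.items())
]

print(dict(state_counts))
for row in county_table:
    print(row)
```

The extra lookup in the county step mirrors the point made in the text: the state can be read directly off the code, whereas the county name has to come from a separate file.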

Comparing the current and MGD methodologies involves joining the ASA and MGD files, linking the two files' records by SSN. The resulting joined ASA-MGD file—the evaluation file—contains all the information needed to generate two versions of the earnings tables with state-level estimates. This allows a direct comparison between the current and MGD processes of the estimated number of workers and total earnings amounts by sex, age, and type of earnings.
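As a rough illustration of that join, the sketch below links two toy files by SSN and keeps only the workers found in both. All records, field names, and SSNs are hypothetical (the SSNs use the invalid 000 area number on purpose); the actual file layouts are not described here.

```python
# Hypothetical worker-level records keyed by SSN (all values invented for illustration).
asa = {
    "000-00-0001": {"asa_state": "A", "earnings": 41_000},
    "000-00-0002": {"asa_state": "B", "earnings": 27_500},
    "000-00-0003": {"asa_state": "A", "earnings": 63_000},
}
mgd = {
    "000-00-0001": {"mgd_state": "A", "sex": "F", "birth_year": 1975},
    "000-00-0002": {"mgd_state": "C", "sex": "M", "birth_year": 1988},
    "000-00-0004": {"mgd_state": "B", "sex": "F", "birth_year": 1969},
}

# Inner join on SSN: keep only workers present in both files.
evaluation = {ssn: {**asa[ssn], **mgd[ssn]} for ssn in asa.keys() & mgd.keys()}

# ASA records with no MGD counterpart are set aside rather than evaluated.
asa_only = sorted(asa.keys() - mgd.keys())

# With both state codes on one record, the two assignments can be compared directly.
matches = sum(
    1 for rec in evaluation.values() if rec["asa_state"] == rec["mgd_state"]
)

print(len(evaluation), asa_only, matches)
```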

The ASA microdata file that is used to generate the tax year 2017 earnings tables by state and county contains 1,758,471 SSNs (Table 8). Of those, 1,751,807 SSNs are found in both files and 6,664 are in the ASA but not in the MGD file. Given that the MGD file represents the entire population of workers in a tax year, what explains the 6,664 workers represented in the ASA but not in the MGD file? There are two possible answers.

Table 8. Characteristics of the 2017 ASA microdata file and the merged ASA-MGD evaluation file
Criterion Number Percent
  ASA microdata file
Workers represented 1,758,471 100.00
With records used in evaluating MGD 1,751,807 99.62
With records not in the MGD file 6,664 0.38
  Merged ASA-MGD evaluation file
Total 1,751,807 100.00
Workers with Social Security–taxable earnings a 1,687,544 96.33
Wage and salary 1,580,879 90.24
Self-employment 186,697 10.66
Workers with earnings not covered for Social Security 64,263 3.67
Workers with Medicare-taxable earnings a 1,726,916 98.58
Wage and salary 1,622,793 92.64
Self-employment 194,288 11.09
Workers with earnings not covered for Medicare 24,891 1.42
SOURCE: Author's calculations using 2017 ASA and merged ASA-MGD files.
a. Because some workers accrued both wage and salary and self-employment earnings, the sum of those two categories exceeds the total number of workers with taxable earnings.

Recall that the MGD file for a given tax year excludes records for earnings that were not processed in the calendar year following that tax year. For example, in 2018, the MGD process excluded all records containing information for tax years other than 2017. Therefore, some of the MGD file's “missing” SSNs for tax year 2017 were processed in a year other than 2018.12 As previously noted, tax year 2017 data for 2,618,600 individuals were processed in 2017, 2019, 2020, and 2021, and were therefore omitted from the tax year 2017 MGD file.

Of the 6,664 individuals with a 2017 ASA record but no MGD record, ORES identified 1,001 whose records were processed in 2017, 50 whose records were processed in 2019, and 8 whose records were processed in 2020 or 2021. ORES did not attempt to assign a geographic code or identify the sex and date of birth for the 1,059 individuals whose tax year 2017 information was not processed in 2018.

A second explanation for the “missing” individuals is the possibility that incorrect SSNs were entered on the tax forms. Recall that the MGD process for assigning location codes and identifying sex and date of birth is separate and distinct from the OEIS process that cleans and verifies the information before posting the data to the MEF. For example: In compiling its annual wage reports, OEIS matches the name and SSN shown on Form W-2 to that worker's administrative records. If one of the digits in the SSN was entered incorrectly, OEIS undertakes one or more procedures to assign the W-2 information to the correct worker. However, the MGD process does not have this capability. Instead, ORES simply takes the SSN as given and uses it to assign a geographic code and identify the worker's sex and date of birth using the Numident file. As a result, the record for a worker whose information was incorrectly reported on Form W-2 could be retained in the ASA file but would not be included in the MGD file.

Whatever the cause of the discrepancy, the 6,664 individuals with records missing from the MGD file represent less than 0.4 percent of the 1,758,471 workers in the 2017 ASA file. Therefore, ORES removed them from the merged ASA-MGD file that was used in evaluating the MGD process results.
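The counts quoted in the preceding paragraphs can be reconciled with a few lines of arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative, and the remainder it computes is simply the number of ASA-only workers not accounted for by processing year, not a count attributed to any particular cause.

```python
asa_total = 1_758_471   # workers in the 2017 ASA microdata file
in_both = 1_751_807     # workers also found in the MGD file
asa_only = asa_total - in_both

# ASA-only workers whose tax year 2017 records were processed outside 2018.
processed_other_years = {"2017": 1_001, "2019": 50, "2020 or 2021": 8}
explained_by_timing = sum(processed_other_years.values())

# Remainder not accounted for by the year of processing.
remainder = asa_only - explained_by_timing

# Share of the ASA file that is absent from the MGD file, in percent.
share_missing = 100 * asa_only / asa_total

print(asa_only, explained_by_timing, remainder, round(share_missing, 2))
```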

Table 8 shows the number of workers represented in the ASA microdata file and in the large subgroup that makes up the MGD evaluation file, with detail by earnings type (wage and salary, self-employment). It also distinguishes workers whose earnings are taxable for Social Security and Medicare from those whose earnings are not.

The evaluation begins by comparing MGD-process estimates with slightly modified versions of those published in Annual Statistical Supplement Tables 4.B10 and 4.B12, which respectively show Social Security– and Medicare-covered workers and taxable earnings, by state.13 In the next step, MGD-process estimates are compared with the worker counts and earnings amounts by sex and state found in the modified versions of Earnings and Employment Tables 1 and 4. The third step involves comparing the MGD-process estimates with worker counts and earnings amounts by sex, age, and state, as published in Earnings and Employment Tables 2 and 5. After comparing the state-level estimates, the MGD file's county-level estimates of workers by sex are compared with those in Earnings and Employment Tables 3 and 6.

Preevaluation Expectations

Prior to comparing the estimates produced by the current and MGD processes, ORES expected the outcomes to include larger percentage differences between the methodologies' worker counts and earnings amounts for less populous states than for larger ones. To provide a deliberately exaggerated example, consider two hypothetical states: SSA statistical publications estimate that state A has 10,000 workers and state B has 50,000 workers. If the MGD process assigns 2,000 more workers to each state, the estimated number of workers differs by 20 percent in state A but only 4 percent in state B. In such a scenario, the estimated amounts of taxable earnings reported in the states would be similarly affected. In addition, because there are far fewer self-employed individuals (186,697) than wage and salary workers (1,580,879) in the CWHS microdata that underlie the current methodology, smaller absolute changes will likewise generate larger percentage differences for the self-employed than for other workers. ORES expected a similar effect in the estimates by age for the age groups that include comparatively few workers in the CWHS.
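The two-state example can be worked through directly. The short Python sketch below reproduces it; the function name is illustrative, and the figures are the hypothetical ones given in the paragraph above.

```python
def percent_difference(current: int, new: int) -> float:
    """Percentage change of the new estimate relative to the current one."""
    return 100 * (new - current) / current


# Hypothetical worker counts from the example in the text.
current_estimates = {"State A": 10_000, "State B": 50_000}
added_workers = 2_000

for state, current in current_estimates.items():
    pct = percent_difference(current, current + added_workers)
    print(f"{state}: {pct:.0f} percent")
```

The same absolute change is divided by a smaller base in state A, which is why small states, the self-employed, and sparsely populated age groups are expected to show the largest percentage differences.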

ORES also expected match rates between the current methodology and the MGD process to be higher for state assignments than for county assignments. Any worker whose state code does not match in the two files will also have a nonmatching county code, even before considering the several reasons why county assignments within a state may differ between the files. The current methodology assigns state and county codes based on abbreviated geographic identifiers (the first five letters of the city name and the five-digit ZIP Codes reported on tax forms). Although the same abbreviated city name can appear in multiple states, the fact that few (if any) ZIP Codes cross state lines indicates that the current methodology generates reasonably accurate state code assignments.

For county code assignments, however, abbreviated geographic information can be problematic. ZIP Codes speed the flow of mail by designating efficient postal delivery zones which, at the five-digit level, may cross county boundaries. Thus, using only the first five letters of a city name and the five-digit ZIP Code can lead to occasional county code inaccuracies.

Furthermore, under the current methodology, an SCC assigned for a worker with both wage and salary and self-employment earnings might be based on data reported on Form 1040 Schedule SE and on either or both of Forms W-2 or W-2c. When the current methodology was developed more than 30 years ago, the SCC corresponding with the self-employment income was typically assigned because the address reported on Schedule SE was viewed as more reliable than a conflicting address reported on another form. However, the MGD process has revealed that millions of individuals are assigned multiple SCCs in a given tax year and there is no reason to believe that the address reported on Schedule SE is more reliable than the address on the W-2 or W-2c. The MGD process provides several options for assigning an SCC and ORES has determined that the best option is to use the SCC corresponding with the highest-paying job regardless of the type of earnings. For these reasons, differences between the current methodology and the MGD process are more likely in county assignments than in state assignments.
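To make the two assignment rules concrete, the sketch below applies each to one hypothetical worker with three jobs. The records, field names, and SCC values are invented, and both functions are simplified readings of the rules described above (for instance, neither addresses ties), not the actual ORES code.

```python
# Hypothetical job-level records for one worker in one tax year (values invented).
jobs = [
    {"source": "W-2", "scc": "11010", "earnings": 38_000},
    {"source": "Schedule SE", "scc": "22030", "earnings": 12_500},
    {"source": "W-2", "scc": "11020", "earnings": 4_200},
]


def scc_highest_paying(job_records):
    """Return the SCC of the job with the largest earnings, whatever its type."""
    return max(job_records, key=lambda job: job["earnings"])["scc"]


def scc_prefer_self_employment(job_records):
    """Simplified older rule: use a Schedule SE address when one exists."""
    se_jobs = [job for job in job_records if job["source"] == "Schedule SE"]
    return scc_highest_paying(se_jobs or job_records)


print(scc_highest_paying(jobs), scc_prefer_self_employment(jobs))
```

For this worker the two rules pick different SCCs, which is the kind of case that can produce a county (or even state) mismatch between the two methodologies.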

Third, ORES expected very high match rates between the current methodology and the MGD process for worker sex and age. Where discrepancies emerged, ORES expected that the MGD process would be more accurate than the current process. This is because the Numident master file is the sole source of the sex and age information used in the MGD process, while in the current methodology, that information may be drawn from either of two files that are derived from the Numident, rather than from the source file itself.

Evaluating Worker Counts

Of the 1,751,807 individuals represented in the full MGD evaluation file, which includes those with noncovered earnings as well as those with earnings covered by Social Security or Medicare, 98.87 percent have the same state code assigned by the current and MGD processes (not shown). Thus, only 19,875 workers (1.13 percent) have nonmatching state codes. However, among those workers with nonmatching state codes are 2,511 to whom the current methodology assigns one of the following location categories: Armed Forces, International Operations, Other, and Reserves, categories that are not included in the MGD process.14 Because those categories do not represent a state or U.S. territory, calculating a “true” match rate (one that accounts only for cases in which it is possible for the two state codes to match) requires removing those 2,511 individuals from the total of 1,751,807 workers. The resulting “true” match rate is 99.01 percent, which leaves only 17,364 of 1,749,296 workers (0.99 percent) whose MGD-process and current-methodology state codes do not match. This result aligns with ORES expectations of high state code match rates given that few ZIP Codes, if any, cross state lines.
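The adjustment described above is a matter of removing the same 2,511 workers from both the total and the nonmatching count before recomputing the rate. A minimal sketch, using only the counts given in the paragraph and illustrative variable names:

```python
total_workers = 1_751_807
nonmatching = 19_875
no_state_equivalent = 2_511  # Armed Forces, International Operations, Other, Reserves

# Match rate before any adjustment.
raw_match_rate = 100 * (total_workers - nonmatching) / total_workers

# Drop workers whose current-methodology category cannot match any MGD state code.
adjusted_total = total_workers - no_state_equivalent
adjusted_nonmatching = nonmatching - no_state_equivalent
adjusted_match_rate = 100 * (adjusted_total - adjusted_nonmatching) / adjusted_total

print(f"{raw_match_rate:.2f}")
print(f"{adjusted_total:,} {adjusted_nonmatching:,}")
print(f"{adjusted_match_rate:.2f}")
```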

The tables that follow compare the numbers of workers and the taxable earnings amounts estimated using the current and the MGD processes for assigning geographic and demographic information. Recall that the current-methodology estimates are slightly modified so that the estimates for both processes are based on the same unadjusted and unweighted raw data from the microdata file derived from the 1-percent CWHS.

Table 9 shows the estimated number of workers with earnings taxable for Social Security—that is, Old-Age, Survivors, and Disability Insurance (OASDI)—by state or other area (as assigned using the current methodology) and type of earnings. It also shows the number and percentage of workers who are assigned the same state codes using the MGD process. Note that these estimates include the workers for whom the current methodology assigned the codes Armed Forces, International Operations, Other, and Reserves. As a result, the state-code match rates are slightly understated.

Table 9. Number of workers with Social Security (OASDI) taxable earnings, by state or other area as assigned under the current methodology; and number and percent of workers with matching state codes in the MGD file; by type of earnings, tax year 2017
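Current-methodology assigned state or area; All: Total, Workers with matching state code in MGD file (Number), Workers with matching state code in MGD file (Percent); Wage and salary: Total, Workers with matching state code in MGD file (Number), Workers with matching state code in MGD file (Percent); Self-employed: Total, Workers with matching state code in MGD file (Number), Workers with matching state code in MGD file (Percent)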
All areas 1,687,544 1,669,082 98.91 1,580,879 1,563,528 98.90 186,697 183,118 98.08
Alabama 23,856 23,657 99.17 22,531 22,340 99.15 2,411 2,374 98.47
Alaska 3,791 3,760 99.18 3,561 3,531 99.16 417 410 98.32
Arizona 33,785 33,553 99.31 31,847 31,629 99.32 3,455 3,374 97.66
Arkansas 14,690 14,428 98.22 13,774 13,516 98.13 1,608 1,578 98.13
California 189,421 188,343 99.43 173,786 172,769 99.41 25,134 24,829 98.79
Colorado 29,337 29,041 98.99 27,275 26,995 98.97 3,647 3,568 97.83
Connecticut 19,621 19,452 99.14 18,326 18,164 99.12 2,228 2,189 98.25
Delaware 5,199 5,120 98.48 4,984 4,905 98.41 422 416 98.58
District of Columbia 4,155 3,986 95.93 3,939 3,775 95.84 437 415 94.97
Florida 104,426 103,565 99.18 96,427 95,607 99.15 13,431 13,164 98.01
Georgia 52,577 52,067 99.03 49,197 48,709 99.01 6,020 5,916 98.27
Hawaii 7,715 7,652 99.18 7,183 7,124 99.18 867 854 98.50
Idaho 8,866 8,745 98.64 8,325 8,208 98.59 951 931 97.90
Illinois 66,450 65,557 98.66 62,455 61,585 98.61 7,220 7,096 98.28
Indiana 36,500 36,229 99.26 34,895 34,643 99.28 3,119 3,062 98.17
Iowa 17,681 17,516 99.07 16,723 16,565 99.06 1,847 1,817 98.38
Kansas 15,798 15,670 99.19 14,921 14,799 99.18 1,641 1,612 98.23
Kentucky 22,194 22,006 99.15 20,975 20,793 99.13 2,177 2,143 98.44
Louisiana 21,612 21,339 98.74 20,175 19,909 98.68 2,537 2,491 98.19
Maine 7,164 7,096 99.05 6,631 6,563 98.97 913 898 98.36
Maryland 33,296 32,996 99.10 31,493 31,206 99.09 3,385 3,322 98.14
Massachusetts 36,585 36,209 98.97 34,164 33,799 98.93 4,154 4,095 98.58
Michigan 52,165 51,845 99.39 49,353 49,040 99.37 5,206 5,142 98.77
Minnesota 32,585 32,310 99.16 30,920 30,650 99.13 3,220 3,191 99.10
Mississippi 14,298 14,229 99.52 13,406 13,340 99.51 1,691 1,660 98.17
Missouri 31,759 31,517 99.24 30,041 29,808 99.22 3,196 3,142 98.31
Montana 6,098 5,688 93.28 5,723 5,318 92.92 671 651 97.02
Nebraska 11,127 10,801 97.07 10,525 10,202 96.93 1,151 1,129 98.09
Nevada 13,930 13,851 99.43 13,095 13,021 99.43 1,459 1,422 97.46
New Hampshire 8,055 7,983 99.11 7,548 7,478 99.07 826 816 98.79
New Jersey 49,423 49,059 99.26 46,467 46,124 99.26 5,287 5,200 98.35
New Mexico 9,740 9,690 99.49 9,198 9,150 99.48 932 910 97.64
New York 105,970 104,884 98.98 98,858 97,849 98.98 12,494 12,260 98.13
North Carolina 52,577 52,238 99.36 49,529 49,199 99.33 5,482 5,401 98.52
North Dakota 4,469 4,337 97.05 4,222 4,092 96.92 510 501 98.24
Ohio 58,397 57,740 98.87 54,935 54,288 98.82 5,895 5,839 99.05
Oklahoma 19,624 19,513 99.43 18,488 18,384 99.44 2,038 2,008 98.53
Oregon 21,674 21,547 99.41 20,326 20,207 99.41 2,287 2,240 97.94
Pennsylvania 68,886 68,531 99.48 65,408 65,070 99.48 6,426 6,334 98.57
Rhode Island 5,964 5,885 98.68 5,650 5,573 98.64 587 574 97.79
South Carolina 25,479 25,336 99.44 24,176 24,040 99.44 2,450 2,398 97.88
South Dakota 5,470 5,131 93.80 5,158 4,822 93.49 612 600 98.04
Tennessee 34,994 34,754 99.31 32,637 32,409 99.30 4,124 4,045 98.08
Texas 134,668 133,888 99.42 124,891 124,142 99.40 16,667 16,438 98.63
Utah 16,305 16,206 99.39 15,631 15,535 99.39 1,481 1,464 98.85
Vermont 3,786 3,747 98.97 3,553 3,514 98.90 434 428 98.62
Virginia 46,057 45,662 99.14 43,680 43,312 99.16 4,510 4,409 97.76
Washington 39,559 39,303 99.35 37,498 37,252 99.34 3,629 3,571 98.40
West Virginia 8,378 8,327 99.39 7,992 7,941 99.36 683 677 99.12
Wisconsin 32,812 32,663 99.55 31,346 31,207 99.56 2,742 2,712 98.91
Wyoming 3,217 3,180 98.85 3,036 3,001 98.85 357 343 96.08
Outlying areas a 10,197 10,019 98.25 9,424 9,248 98.13 997 985 98.80
Other and unknown 5,162 1,231 23.85 4,578 1,178 25.73 632 74 11.71
SOURCE: Author's calculations using 2017 merged ASA-MGD file.
NOTE: Because some workers accrued both wage and salary and self-employment earnings, the sum of those two categories exceeds the number of all workers with taxable earnings.
a. Most of the workers in this category are assigned a Puerto Rico state code. Other outlying areas are American Samoa, Guam, Northern Mariana Islands, and U.S. Virgin Islands.

The match rate for all workers with OASDI taxable earnings is 98.9 percent. It is at least 99 percent in 34 states, at least 98 percent in 46 states, and at least 97 percent in 48 states.15 The lowest match rates are those for the District of Columbia (95.9 percent), South Dakota (93.8 percent), and Montana (93.3 percent)—states with relatively few workers.

The match rate for all wage and salary workers with OASDI taxable earnings is also 98.9 percent.16 The match rate is at least 99 percent in 33 states and at least 98 percent in 46 states. The lowest match rates are 96.9 percent for Nebraska and North Dakota, 95.8 percent for the District of Columbia, and around 93 percent for Montana and South Dakota.

The match rate for all self-employed individuals with OASDI taxable earnings is 98.1 percent. In most states, the match rate for self-employed individuals tends to be lower than that for wage and salary workers, likely because self-employed individuals are far less numerous than wage and salary workers in the CWHS. Only three states have a match rate of at least 99 percent, although 39 states have a match rate of at least 98 percent and 49 have a match rate of at least 97 percent. The match rate for Wyoming is 96.1 percent and for the District of Columbia it is 95.0 percent.

Results of the same analysis for workers with earnings covered under the Medicare programs were similar to those for workers covered under OASDI, and this pattern recurred for all subsequent comparisons between the two methodologies. Therefore, the results for Medicare-covered workers are not shown in separate tables and are not discussed hereafter unless they diverge from those for OASDI-covered workers.

Differences in Estimated Worker Counts

Table 10 shows the number of workers for whom the current methodology and the MGD process assigned a state code, by the assigned state or area. For all workers with OASDI taxable earnings, the difference in the number of state assignments ranges from 420 fewer workers estimated in the MGD file for Illinois to 1,331 additional workers estimated in the MGD file for California. For only seven states does the percentage differ by more than 1 percent, with the MGD file assigning fewer workers for six of them: Montana (−6.1 percent), South Dakota (−5.1 percent), the District of Columbia (−2.7 percent), Nebraska (−2.5 percent), North Dakota (−1.2 percent), Delaware (−1.1 percent), and Missouri (1.1 percent). These states all have relatively few workers.
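The differences reported in Table 10 follow from a simple calculation on each pair of counts. The sketch below applies it to five rows taken from the "All" columns of Table 10 and flags those that differ by more than 1 percent; the function and variable names are illustrative.

```python
# (current methodology, MGD process) worker counts, selected rows of Table 10, all workers.
table10_all = {
    "California": (189_421, 190_752),
    "Illinois": (66_450, 66_030),
    "Missouri": (31_759, 32_121),
    "Montana": (6_098, 5_728),
    "South Dakota": (5_470, 5_192),
}


def difference(current, mgd):
    """Return the numeric and percentage difference of MGD relative to current."""
    diff = mgd - current
    return diff, 100 * diff / current


# Flag states whose MGD count differs from the current count by more than 1 percent.
flagged = {}
for state, (current, mgd) in table10_all.items():
    diff, pct = difference(current, mgd)
    if abs(pct) > 1:
        flagged[state] = round(pct, 2)
    print(f"{state}: {diff:+,} ({pct:+.2f} percent)")
```

California has the largest numeric difference in this subset yet is not flagged, while Montana and South Dakota are, which illustrates the small-base effect anticipated in the preevaluation expectations.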

Table 10. Number of workers with Social Security (OASDI) taxable earnings for whom a state was assigned using the current methodology and the MGD process, by state or other area and type of earnings, tax year 2017
State or area; All: Current methodology, MGD process, Difference (Number), Difference (Percent); Wage and salary: Current methodology, MGD process, Difference (Number), Difference (Percent); Self-employed: Current methodology, MGD process, Difference (Number), Difference (Percent)
All areas 1,687,544 1,687,544 0 0.00 1,580,879 1,580,879 0 0.00 186,697 186,697 0 0.00
Alabama 23,856 23,874 18 0.08 22,531 22,547 16 0.07 2,411 2,420 9 0.37
Alaska 3,791 3,805 14 0.37 3,561 3,576 15 0.42 417 420 3 0.72
Arizona 33,785 33,912 127 0.38 31,847 31,975 128 0.40 3,455 3,442 -13 -0.38
Arkansas 14,690 14,637 -53 -0.36 13,774 13,716 -58 -0.42 1,608 1,610 2 0.12
California 189,421 190,752 1,331 0.70 173,786 175,077 1,291 0.74 25,134 25,170 36 0.14
Colorado 29,337 29,420 83 0.28 27,275 27,328 53 0.19 3,647 3,682 35 0.96
Connecticut 19,621 19,609 -12 -0.06 18,326 18,312 -14 -0.08 2,228 2,231 3 0.13
Delaware 5,199 5,142 -57 -1.10 4,984 4,926 -58 -1.16 422 423 1 0.24
District of Columbia 4,155 4,044 -111 -2.67 3,939 3,829 -110 -2.79 437 447 10 2.29
Florida 104,426 104,990 564 0.54 96,427 96,911 484 0.50 13,431 13,450 19 0.14
Georgia 52,577 52,601 24 0.05 49,197 49,234 37 0.08 6,020 6,010 -10 -0.17
Hawaii 7,715 7,709 -6 -0.08 7,183 7,181 -2 -0.03 867 869 2 0.23
Idaho 8,866 8,816 -50 -0.56 8,325 8,279 -46 -0.55 951 949 -2 -0.21
Illinois 66,450 66,030 -420 -0.63 62,455 62,024 -431 -0.69 7,220 7,199 -21 -0.29
Indiana 36,500 36,608 108 0.30 34,895 35,014 119 0.34 3,119 3,135 16 0.51
Iowa 17,681 17,630 -51 -0.29 16,723 16,671 -52 -0.31 1,847 1,852 5 0.27
Kansas 15,798 15,790 -8 -0.05 14,921 14,912 -9 -0.06 1,641 1,643 2 0.12
Kentucky 22,194 22,153 -41 -0.18 20,975 20,939 -36 -0.17 2,177 2,165 -12 -0.55
Louisiana 21,612 21,468 -144 -0.67 20,175 20,036 -139 -0.69 2,537 2,520 -17 -0.67
Maine 7,164 7,213 49 0.68 6,631 6,670 39 0.59 913 916 3 0.33
Maryland 33,296 33,296 0 0.00 31,493 31,496 3 0.01 3,385 3,378 -7 -0.21
Massachusetts 36,585 36,491 -94 -0.26 34,164 34,057 -107 -0.31 4,154 4,168 14 0.34
Michigan 52,165 52,213 48 0.09 49,353 49,391 38 0.08 5,206 5,204 -2 -0.04
Minnesota 32,585 32,598 13 0.04 30,920 30,927 7 0.02 3,220 3,245 25 0.78
Mississippi 14,298 14,326 28 0.20 13,406 13,435 29 0.22 1,691 1,691 0 0.00
Missouri 31,759 32,121 362 1.14 30,041 30,391 350 1.17 3,196 3,216 20 0.63
Montana 6,098 5,728 -370 -6.07 5,723 5,358 -365 -6.38 671 659 -12 -1.79
Nebraska 11,127 10,854 -273 -2.45 10,525 10,253 -272 -2.58 1,151 1,147 -4 -0.35
Nevada 13,930 14,037 107 0.77 13,095 13,204 109 0.83 1,459 1,470 11 0.75
New Hampshire 8,055 8,092 37 0.46 7,548 7,580 32 0.42 826 842 16 1.94
New Jersey 49,423 49,543 120 0.24 46,467 46,589 122 0.26 5,287 5,306 19 0.36
New Mexico 9,740 9,806 66 0.68 9,198 9,264 66 0.72 932 927 -5 -0.54
New York 105,970 106,741 771 0.73 98,858 99,672 814 0.82 12,494 12,500 6 0.05
North Carolina 52,577 52,579 2 0.00 49,529 49,533 4 0.01 5,482 5,480 -2 -0.04
North Dakota 4,469 4,414 -55 -1.23 4,222 4,166 -56 -1.33 510 516 6 1.18
Ohio 58,397 58,066 -331 -0.57 54,935 54,599 -336 -0.61 5,895 5,904 9 0.15
Oklahoma 19,624 19,657 33 0.17 18,488 18,517 29 0.16 2,038 2,053 15 0.74
Oregon 21,674 21,712 38 0.18 20,326 20,368 42 0.21 2,287 2,276 -11 -0.48
Pennsylvania 68,886 69,062 176 0.26 65,408 65,573 165 0.25 6,426 6,438 12 0.19
Rhode Island 5,964 5,929 -35 -0.59 5,650 5,615 -35 -0.62 587 587 0 0.00
South Carolina 25,479 25,648 169 0.66 24,176 24,338 162 0.67 2,450 2,452 2 0.08
South Dakota 5,470 5,192 -278 -5.08 5,158 4,881 -277 -5.37 612 614 2 0.33
Tennessee 34,994 34,976 -18 -0.05 32,637 32,624 -13 -0.04 4,124 4,101 -23 -0.56
Texas 134,668 135,072 404 0.30 124,891 125,224 333 0.27 16,667 16,702 35 0.21
Utah 16,305 16,384 79 0.48 15,631 15,707 76 0.49 1,481 1,498 17 1.15
Vermont 3,786 3,792 6 0.16 3,553 3,558 5 0.14 434 434 0 0.00
Virginia 46,057 46,235 178 0.39 43,680 43,844 164 0.38 4,510 4,516 6 0.13
Washington 39,559 39,797 238 0.60 37,498 37,724 226 0.60 3,629 3,648 19 0.52
West Virginia 8,378 8,410 32 0.38 7,992 8,020 28 0.35 683 692 9 1.32
Wisconsin 32,812 32,907 95 0.29 31,346 31,436 90 0.29 2,742 2,758 16 0.58
Wyoming 3,217 3,211 -6 -0.19 3,036 3,028 -8 -0.26 357 352 -5 -1.40
Outlying areas a 10,197 10,137 -60 -0.59 9,424 9,362 -62 -0.66 997 1,009 12 1.20
Other and unknown 5,162 2,315 -2,847 -55.15 4,578 1,988 -2,590 -56.57 632 361 -271 -42.88
SOURCE: Author's calculations using 2017 merged ASA-MGD file.
NOTE: Because some workers accrued both wage and salary and self-employment earnings, the sum of those two categories exceeds the number of all workers with taxable earnings.
a. Most of the workers in this category are assigned a Puerto Rico state code. Other outlying areas are American Samoa, Guam, Northern Mariana Islands, and U.S. Virgin Islands.

For wage and salary workers with OASDI taxable earnings, the difference in the number of state assignments ranges from 431 fewer workers in the MGD file for Illinois to 1,291 additional workers in the MGD file for California. Because wage and salary workers far outnumber self-employed individuals, their results in Table 10 are similar to those for all workers: MGD assignments for the same seven states differ by more than 1 percent from those of the current methodology (Montana, −6.4 percent; South Dakota, −5.4 percent; the District of Columbia, −2.8 percent; Nebraska, −2.6 percent; North Dakota, −1.3 percent; Delaware, −1.2 percent; and Missouri, 1.2 percent).

For self-employed individuals, the differences in the numbers of state assignments range from 23 fewer workers in the MGD file for Tennessee to 36 additional workers in the MGD file for California. The MGD state code assignments differ by more than 1 percent from the current-methodology assignments in seven states: the District of Columbia (2.3 percent), New Hampshire (1.9 percent), West Virginia (1.3 percent), North Dakota (1.2 percent), Utah (1.2 percent), Wyoming (−1.4 percent), and Montana (−1.8 percent). In a notable departure from the pattern for wage and salary workers, MGD code assignments for the self-employed are more than 1 percent higher than those in the current methodology for five states.

A parallel analysis for workers with Medicare-taxable earnings produced very similar results, with one difference worth noting. The MGD process assigned the District of Columbia code to 1.3 percent more individuals with Medicare-covered self-employment income than the current methodology did (not shown), compared with 2.3 percent more for self-employed individuals with OASDI taxable earnings.

Differences in Estimated Taxable Earnings Amounts

Given the high match rates in the estimated numbers of workers with OASDI taxable earnings for both earnings types, one might expect the estimated taxable earnings amounts by state to be similar under the two methodologies as well. However, some of the workers with different state codes assigned by the MGD process could have earnings that are high enough to alter some of the estimated state-level earnings. Potential state-level shifts in estimated Medicare-covered earnings amounts could be even greater because, unlike OASDI-covered earnings, Medicare-covered earnings are not subject to a cap on the amount to which the payroll tax applies.
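The effect of the cap can be seen with a single hypothetical high earner who is reassigned from one state to another. The sketch below assumes the 2017 OASDI taxable maximum of $127,200; the worker's earnings and the function names are invented for illustration, and the treatment of earnings is deliberately simplified to one worker-level total.

```python
OASDI_TAXABLE_MAX_2017 = 127_200  # 2017 contribution and benefit base, in dollars


def oasdi_taxable(earnings: float) -> float:
    """OASDI taxable earnings are capped at the annual taxable maximum."""
    return min(earnings, OASDI_TAXABLE_MAX_2017)


def medicare_taxable(earnings: float) -> float:
    """Medicare taxable earnings have no cap."""
    return earnings


# Hypothetical worker whose state code differs between the two methodologies.
high_earner = 900_000

# Amount that would move from one state's total to another's under each program.
oasdi_shift = oasdi_taxable(high_earner)
medicare_shift = medicare_taxable(high_earner)

print(f"OASDI shift: {oasdi_shift:,}")
print(f"Medicare shift: {medicare_shift:,}")
```

Because the OASDI amount is bounded by the cap, any one worker can move only a limited amount between state totals, whereas the Medicare amount moves in full.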

Table 11 compares the estimated amounts of Social Security taxable earnings for workers whose state code was assigned under the current methodology with those whose state code was assigned under the MGD process.

Table 11. Earnings of workers with Social Security (OASDI) taxable earnings for whom a state was assigned using the current methodology and the MGD process, by state or other area and type of earnings, tax year 2017 (in 2017 dollars)
State or area; Current methodology; MGD process; Difference (Amount); Difference (Percent)
  All
All areas 68,423,438,380 68,423,438,380 0 0.00
Alabama 857,716,152 858,592,612 876,460 0.10
Alaska 153,874,120 154,421,206 547,086 0.36
Arizona 1,302,959,209 1,306,516,111 3,556,902 0.27
Arkansas 491,163,014 489,982,217 -1,180,797 -0.24
California 8,433,766,110 8,481,973,766 48,207,656 0.57
Colorado 1,238,763,159 1,243,011,623 4,248,464 0.34
Connecticut 921,093,226 919,812,262 -1,280,964 -0.14
Delaware 216,156,133 212,893,943 -3,262,190 -1.51
District of Columbia 231,648,128 222,967,504 -8,680,624 -3.75
Florida 3,791,556,468 3,812,600,272 21,043,804 0.56
Georgia 1,993,330,826 1,991,930,932 -1,399,894 -0.07
Hawaii 319,080,649 318,270,329 -810,320 -0.25
Idaho 302,386,845 301,788,771 -598,074 -0.20
Illinois 2,748,333,915 2,727,070,643 -21,263,272 -0.77
Indiana 1,352,562,930 1,360,357,123 7,794,193 0.58
Iowa 665,729,171 662,820,670 -2,908,501 -0.44
Kansas 598,835,629 598,542,985 -292,644 -0.05
Kentucky 762,050,270 760,602,506 -1,447,764 -0.19
Louisiana 773,282,923 764,287,215 -8,995,708 -1.16
Maine 248,549,671 251,190,221 2,640,550 1.06
Maryland 1,630,296,173 1,631,735,995 1,439,822 0.09
Massachusetts 1,748,735,103 1,747,889,863 -845,240 -0.05
Michigan 2,042,364,653 2,042,894,366 529,713 0.03
Minnesota 1,404,270,994 1,405,178,905 907,911 0.06
Mississippi 467,146,390 467,671,454 525,064 0.11
Missouri 1,143,533,770 1,158,822,712 15,288,942 1.34
Montana 200,619,056 190,604,753 -10,014,303 -4.99
Nebraska 418,010,730 410,128,643 -7,882,087 -1.89
Nevada 503,719,119 507,489,173 3,770,054 0.75
New Hampshire 362,380,857 362,902,978 522,121 0.14
New Jersey 2,418,851,606 2,424,681,320 5,829,714 0.24
New Mexico 337,908,949 341,565,374 3,656,425 1.08
New York 4,771,907,899 4,804,933,827 33,025,928 0.69
North Carolina 1,974,284,037 1,976,214,801 1,930,764 0.10
North Dakota 180,947,605 179,144,543 -1,803,062 -1.00
Ohio 2,139,409,841 2,119,620,918 -19,788,923 -0.92
Oklahoma 694,532,355 693,883,937 -648,418 -0.09
Oregon 871,808,720 873,599,440 1,790,720 0.21
Pennsylvania 2,849,416,932 2,855,317,004 5,900,072 0.21
Rhode Island 243,885,478 243,124,171 -761,307 -0.31
South Carolina 919,022,725 925,946,911 6,924,186 0.75
South Dakota 186,155,770 178,413,757 -7,742,013 -4.16
Tennessee 1,277,836,973 1,276,286,028 -1,550,945 -0.12
Texas 5,403,455,636 5,423,910,900 20,455,264 0.38
Utah 611,070,284 614,279,890 3,209,606 0.53
Vermont 142,580,658 142,886,697 306,039 0.21
Virginia 2,088,562,332 2,099,671,869 11,109,537 0.53
Washington 1,862,102,324 1,871,095,019 8,992,695 0.48
West Virginia 293,519,718 295,441,183 1,921,465 0.65
Wisconsin 1,292,637,852 1,295,553,320 2,915,468 0.23
Wyoming 122,145,345 121,929,576 -215,769 -0.18
Outlying areas a 237,340,154 235,879,261 -1,460,893 -0.62
Other and unknown 180,139,794 65,106,881 -115,032,913 -63.86
  Wage and salary
All areas 65,799,740,190 65,799,740,190 0 0.00
Alabama 827,957,923 828,946,232 988,309 0.12
Alaska 147,351,844 147,905,856 554,012 0.38
Arizona 1,260,295,968 1,263,843,879 3,547,911 0.28
Arkansas 473,254,737 471,965,870 -1,288,867 -0.27
California 8,023,239,762 8,070,405,328 47,165,566 0.59
Colorado 1,184,284,373 1,187,477,629 3,193,256 0.27
Connecticut 875,217,959 874,178,633 -1,039,326 -0.12
Delaware 210,174,607 206,902,502 -3,272,105 -1.56
District of Columbia 222,607,680 213,946,772 -8,660,908 -3.89
Florida 3,650,205,329 3,670,244,169 20,038,840 0.55
Georgia 1,925,737,777 1,924,700,902 -1,036,875 -0.05
Hawaii 305,121,578 304,328,035 -793,543 -0.26
Idaho 290,087,661 289,567,847 -519,814 -0.18
Illinois 2,652,242,379 2,631,031,675 -21,210,704 -0.80
Indiana 1,314,385,276 1,322,446,028 8,060,752 0.61
Iowa 641,044,554 638,025,174 -3,019,380 -0.47
Kansas 574,789,255 574,751,422 -37,833 -0.01
Kentucky 737,285,458 735,928,648 -1,356,810 -0.18
Louisiana 742,800,185 733,968,962 -8,831,223 -1.19
Maine 235,689,047 238,101,805 2,412,758 1.02
Maryland 1,581,885,107 1,583,384,514 1,499,407 0.09
Massachusetts 1,675,437,348 1,674,458,736 -978,612 -0.06
Michigan 1,975,784,546 1,976,301,140 516,594 0.03
Minnesota 1,357,964,358 1,358,654,929 690,571 0.05
Mississippi 448,508,759 449,122,191 613,432 0.14
Missouri 1,104,700,118 1,119,716,148 15,016,030 1.36
Montana 191,194,586 181,294,921 -9,899,665 -5.18
Nebraska 402,459,145 394,582,716 -7,876,429 -1.96
Nevada 484,192,411 488,181,527 3,989,116 0.82
New Hampshire 345,023,243 345,473,658 450,415 0.13
New Jersey 2,321,913,541 2,327,899,404 5,985,863 0.26
New Mexico 326,703,029 330,365,888 3,662,859 1.12
New York 4,592,694,595 4,626,594,435 33,899,840 0.74
North Carolina 1,906,929,140 1,908,884,471 1,955,331 0.10
North Dakota 172,882,650 170,998,260 -1,884,390 -1.09
Ohio 2,062,139,701 2,042,274,693 -19,865,008 -0.96
Oklahoma 671,254,262 670,515,050 -739,212 -0.11
Oregon 835,388,680 837,410,794 2,022,114 0.24
Pennsylvania 2,754,547,086 2,760,069,736 5,522,650 0.20
Rhode Island 235,265,798 234,450,170 -815,628 -0.35
South Carolina 890,369,058 896,951,561 6,582,503 0.74
South Dakota 178,287,749 170,632,577 -7,655,172 -4.29
Tennessee 1,209,383,919 1,208,427,144 -956,775 -0.08
Texas 5,173,321,386 5,192,270,975 18,949,589 0.37
Utah 596,024,683 599,105,522 3,080,839 0.52
Vermont 137,165,055 137,460,528 295,473 0.22
Virginia 2,025,521,370 2,036,073,995 10,552,625 0.52
Washington 1,799,128,895 1,807,248,859 8,119,964 0.45
West Virginia 283,313,293 285,180,695 1,867,402 0.66
Wisconsin 1,256,046,557 1,258,873,867 2,827,310 0.23
Wyoming 117,372,594 117,079,703 -292,891 -0.25
Outlying areas a 224,617,215 223,109,020 -1,508,195 -0.67
Other and unknown 168,546,961 58,024,995 -110,521,966 -65.57
  Self-employed
All areas 5,799,361,603 5,799,361,603 0 0.00
Alabama 66,648,167 66,545,016 -103,151 -0.15
Alaska 14,519,740 14,719,815 200,075 1.38
Arizona 100,645,843 100,315,372 -330,471 -0.33
Arkansas 40,949,914 40,505,569 -444,345 -1.09
California 797,303,016 797,260,291 -42,725 -0.01
Colorado 120,265,806 121,486,330 1,220,524 1.01
Connecticut 89,875,562 89,283,750 -591,812 -0.66
Delaware 14,895,078 14,902,505 7,427 0.05
District of Columbia 20,949,978 21,524,978 575,000 2.74
Florida 317,106,454 315,124,001 -1,982,453 -0.63
Georgia 156,585,224 156,027,875 -557,349 -0.36
Hawaii 28,383,465 28,535,683 152,218 0.54
Idaho 28,006,004 28,077,951 71,947 0.26
Illinois 218,901,727 217,987,027 -914,700 -0.42
Indiana 97,753,666 99,101,243 1,347,577 1.38
Iowa 60,342,164 60,399,916 57,752 0.10
Kansas 56,259,390 56,278,057 18,667 0.03
Kentucky 61,320,285 61,070,243 -250,042 -0.41
Louisiana 68,502,378 67,810,281 -692,097 -1.01
Maine 26,088,892 26,075,417 -13,475 -0.05
Maryland 124,010,332 124,322,266 311,934 0.25
Massachusetts 153,009,485 153,360,152 350,667 0.23
Michigan 157,817,436 157,261,359 -556,077 -0.35
Minnesota 113,550,586 114,494,480 943,894 0.83
Mississippi 42,668,082 42,780,374 112,292 0.26
Missouri 94,816,194 95,176,683 360,489 0.38
Montana 20,833,803 20,360,948 -472,855 -2.27
Nebraska 38,648,035 38,576,907 -71,128 -0.18
Nevada 42,495,330 43,113,267 617,937 1.45
New Hampshire 32,061,866 32,339,105 277,239 0.86
New Jersey 205,454,248 206,256,749 802,501 0.39
New Mexico 26,110,321 25,969,312 -141,009 -0.54
New York 413,468,918 414,833,661 1,364,743 0.33
North Carolina 156,079,879 156,247,750 167,871 0.11
North Dakota 20,229,337 20,502,566 273,229 1.35
Ohio 168,884,200 169,135,108 250,908 0.15
Oklahoma 58,331,043 58,587,588 256,545 0.44
Oregon 75,262,265 74,820,957 -441,308 -0.59
Pennsylvania 223,224,904 223,504,490 279,586 0.13
Rhode Island 19,783,599 19,866,812 83,213 0.42
South Carolina 71,291,266 71,880,825 589,559 0.83
South Dakota 20,107,307 20,077,426 -29,881 -0.15
Tennessee 137,155,734 136,201,945 -953,789 -0.70
Texas 487,609,948 486,965,031 -644,917 -0.13
Utah 49,916,410 50,305,835 389,425 0.78
Vermont 12,999,807 12,964,845 -34,962 -0.27
Virginia 154,147,599 154,729,247 581,648 0.38
Washington 134,758,092 136,214,362 1,456,270 1.08
West Virginia 22,225,842 22,463,427 237,585 1.07
Wisconsin 90,998,522 91,812,978 814,456 0.90
Wyoming 12,254,030 12,040,548 -213,482 -1.74
Outlying areas a 20,637,445 20,776,379 138,934 0.67
Other and unknown 13,216,985 8,386,901 -4,830,084 -36.54
SOURCE: Author's calculations using 2017 merged ASA-MGD file.
NOTE: Because some workers accrued both wage and salary and self-employment earnings, the sum of the earnings in those two categories exceeds the amount shown for all workers with taxable earnings.
a. Most of the workers in this category are assigned a Puerto Rico state code. Other outlying areas are American Samoa, Guam, Northern Mariana Islands, and U.S. Virgin Islands.

For all workers, the MGD earnings estimate differs by at least 1 percent from that of the current methodology in 10 states. The MGD estimate is lower in seven of those states: Montana (−5.0 percent), South Dakota (−4.2 percent), the District of Columbia (−3.8 percent), Nebraska (−1.9 percent), Delaware (−1.5 percent), Louisiana (−1.2 percent), and North Dakota (−1.0 percent). The MGD estimate is higher in three states: Missouri (1.3 percent) and New Mexico and Maine (1.1 percent).

For wage and salary workers, the MGD estimate differs by at least 1 percent from that of the current methodology in 11 states. The MGD estimate is lower in eight of those states: Montana (−5.2 percent), South Dakota (−4.3 percent), the District of Columbia (−3.9 percent), Nebraska (−2.0 percent), Delaware (−1.6 percent), Louisiana (−1.2 percent), North Dakota (−1.1 percent), and Ohio (−1.0 percent). In three states, the MGD estimate is at least 1 percent higher than the current-methodology estimate: Missouri (1.4 percent), New Mexico (1.1 percent), and Maine (1.0 percent).

For self-employed individuals, the MGD estimate differs by at least 1 percent from that of the current methodology in 12 states. The MGD estimate is higher for eight of them: the District of Columbia (2.7 percent); Nevada (1.5 percent); Alaska, Indiana, and North Dakota (1.4 percent); Washington and West Virginia (1.1 percent); and Colorado (1.0 percent). The MGD estimate is lower for four states: Montana (−2.3 percent), Wyoming (−1.7 percent), Arkansas (−1.1 percent), and Louisiana (−1.0 percent).
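The percentage differences discussed above can be reproduced from the dollar amounts in Table 11. The following Python sketch uses a handful of rows copied from the all-workers panel of that table. The function names are illustrative, and the rule of rounding to one decimal place before applying the 1 percent threshold is an assumption that appears consistent with the states listed in the text; it is not a description of the author's actual procedure.

```python
# Rows copied from Table 11 (all workers):
# area -> (current-methodology estimate, MGD estimate) of taxable earnings.
TABLE_11_ALL = {
    "Maine": (248_549_671, 251_190_221),
    "Missouri": (1_143_533_770, 1_158_822_712),
    "Montana": (200_619_056, 190_604_753),
    "North Dakota": (180_947_605, 179_144_543),
    "Ohio": (2_139_409_841, 2_119_620_918),
}


def percent_difference(current, mgd):
    """Percentage difference of the MGD estimate relative to the current estimate."""
    return 100.0 * (mgd - current) / current


def flag_large_differences(rows, threshold=1.0):
    """Return areas whose percentage difference is at least `threshold` in
    absolute value after rounding to one decimal place (assumed rounding rule)."""
    flagged = {}
    for area, (current, mgd) in rows.items():
        pct = percent_difference(current, mgd)
        if abs(round(pct, 1)) >= threshold:
            flagged[area] = round(pct, 2)
    return flagged


if __name__ == "__main__":
    for area, pct in flag_large_differences(TABLE_11_ALL).items():
        print(f"{area}: {pct:+.2f} percent")
```

Of the five sample rows, only Ohio falls below the threshold for all workers, which agrees with its absence from the all-workers list above.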

The percentage changes in the estimated amounts of OASDI taxable earnings between the two methodologies are generally small, as expected. But are the percentage changes in estimated earnings proportional to the percentage changes in the estimated numbers of workers? That is, if the MGD estimate of workers in a given state is 1.5 percent lower than the current-methodology estimate, is there a corresponding decrease in the estimated amount of taxable OASDI earnings?

Table 12 shows the percentage differences between the current-methodology estimates and the MGD-process estimates of both the number of workers with OASDI taxable earnings (from Table 10) and the amounts of those earnings (from Table 11), and presents the percentage-point differences between those two measures. For all workers, the percentage-point difference between the two measures exceeds 0.5 in only four states: Montana and the District of Columbia (1.1 percentage points), South Dakota (0.9 percentage point), and Nebraska (0.6 percentage point). The results for wage and salary workers are similar.
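As a rough illustration of the comparison just described, the sketch below computes the percentage-point difference as the absolute gap between the two published percentage differences. The input values are copied from the all-workers columns of Table 12. Treating the gap as an absolute difference of the rounded percentages is an assumption that matches the published rows, and the function and variable names are hypothetical.

```python
# area -> (percentage difference in number of workers,
#          percentage difference in taxable earnings), all workers, from Table 12.
TABLE_12_ALL = {
    "District of Columbia": (-2.67, -3.75),
    "Louisiana": (-0.67, -1.16),
    "Montana": (-6.07, -4.99),
    "Nebraska": (-2.45, -1.89),
    "South Dakota": (-5.08, -4.16),
}


def percentage_point_gap(pct_workers, pct_earnings):
    """Absolute gap, in percentage points, between the two percentage differences."""
    return round(abs(pct_earnings - pct_workers), 2)


def areas_exceeding(rows, cutoff=0.5):
    """Return areas whose percentage-point gap is greater than `cutoff`."""
    gaps = {area: percentage_point_gap(w, e) for area, (w, e) in rows.items()}
    return {area: gap for area, gap in gaps.items() if gap > cutoff}


if __name__ == "__main__":
    for area, gap in areas_exceeding(TABLE_12_ALL).items():
        print(f"{area}: {gap:.2f} percentage points")
```

Louisiana is included as a near miss: its gap of 0.49 percentage point falls just under the cutoff.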

Table 12. Percentage difference between current-methodology and MGD-process estimates of the number of workers with Social Security (OASDI) taxable earnings and their taxable earnings amounts, and the percentage-point difference between those two estimates, by state or other area and type of earnings, tax year 2017
State or area, followed by three columns each for All, Wage and salary, and Self-employed:
(1) percentage difference in estimated number of workers, (2) percentage difference in estimated taxable earnings, (3) percentage-point difference between (1) and (2)
All areas 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Alabama 0.08 0.10 0.02 0.07 0.12 0.05 0.37 -0.15 0.52
Alaska 0.37 0.36 0.01 0.42 0.38 0.04 0.72 1.38 0.66
Arizona 0.38 0.27 0.11 0.40 0.28 0.12 -0.38 -0.33 0.05
Arkansas -0.36 -0.24 0.12 -0.42 -0.27 0.15 0.12 -1.09 1.21
California 0.70 0.57 0.13 0.74 0.59 0.15 0.14 -0.01 0.15
Colorado 0.28 0.34 0.06 0.19 0.27 0.08 0.96 1.01 0.05
Connecticut -0.06 -0.14 0.08 -0.08 -0.12 0.04 0.13 -0.66 0.79
Delaware -1.10 -1.51 0.41 -1.16 -1.56 0.40 0.24 0.05 0.19
District of Columbia -2.67 -3.75 1.08 -2.79 -3.89 1.10 2.29 2.74 0.45
Florida 0.54 0.56 0.02 0.50 0.55 0.05 0.14 -0.63 0.77
Georgia 0.05 -0.07 0.12 0.08 -0.05 0.13 -0.17 -0.36 0.19
Hawaii -0.08 -0.25 0.17 -0.03 -0.26 0.23 0.23 0.54 0.31
Idaho -0.56 -0.20 0.36 -0.55 -0.18 0.37 -0.21 0.26 0.47
Illinois -0.63 -0.77 0.14 -0.69 -0.80 0.11 -0.29 -0.42 0.13
Indiana 0.30 0.58 0.28 0.34 0.61 0.27 0.51 1.38 0.87
Iowa -0.29 -0.44 0.15 -0.31 -0.47 0.16 0.27 0.10 0.17
Kansas -0.05 -0.05 0.00 -0.06 -0.01 0.05 0.12 0.03 0.09
Kentucky -0.18 -0.19 0.01 -0.17 -0.18 0.01 -0.55 -0.41 0.14
Louisiana -0.67 -1.16 0.49 -0.69 -1.19 0.50 -0.67 -1.01 0.34
Maine 0.68 1.06 0.38 0.59 1.02 0.43 0.33 -0.05 0.38
Maryland 0.00 0.09 0.09 0.01 0.09 0.08 -0.21 0.25 0.46
Massachusetts -0.26 -0.05 0.21 -0.31 -0.06 0.25 0.34 0.23 0.11
Michigan 0.09 0.03 0.06 0.08 0.03 0.05 -0.04 -0.35 0.31
Minnesota 0.04 0.06 0.02 0.02 0.05 0.03 0.78 0.83 0.05
Mississippi 0.20 0.11 0.09 0.22 0.14 0.08 0.00 0.26 0.26
Missouri 1.14 1.34 0.20 1.17 1.36 0.19 0.63 0.38 0.25
Montana -6.07 -4.99 1.08 -6.38 -5.18 1.20 -1.79 -2.27 0.48
Nebraska -2.45 -1.89 0.56 -2.58 -1.96 0.62 -0.35 -0.18 0.17
Nevada 0.77 0.75 0.02 0.83 0.82 0.01 0.75 1.45 0.70
New Hampshire 0.46 0.14 0.32 0.42 0.13 0.29 1.94 0.86 1.08
New Jersey 0.24 0.24 0.00 0.26 0.26 0.00 0.36 0.39 0.03
New Mexico 0.68 1.08 0.40 0.72 1.12 0.40 -0.54 -0.54 0.00
New York 0.73 0.69 0.04 0.82 0.74 0.08 0.05 0.33 0.28
North Carolina 0.00 0.10 0.10 0.01 0.10 0.09 -0.04 0.11 0.15
North Dakota -1.23 -1.00 0.23 -1.33 -1.09 0.24 1.18 1.35 0.17
Ohio -0.57 -0.92 0.35 -0.61 -0.96 0.35 0.15 0.15 0.00
Oklahoma 0.17 -0.09 0.26 0.16 -0.11 0.27 0.74 0.44 0.30
Oregon 0.18 0.21 0.03 0.21 0.24 0.03 -0.48 -0.59 0.11
Pennsylvania 0.26 0.21 0.05 0.25 0.20 0.05 0.19 0.13 0.06
Rhode Island -0.59 -0.31 0.28 -0.62 -0.35 0.27 0.00 0.42 0.42
South Carolina 0.66 0.75 0.09 0.67 0.74 0.07 0.08 0.83 0.75
South Dakota -5.08 -4.16 0.92 -5.37 -4.29 1.08 0.33 -0.15 0.48
Tennessee -0.05 -0.12 0.07 -0.04 -0.08 0.04 -0.56 -0.70 0.14
Texas 0.30 0.38 0.08 0.27 0.37 0.10 0.21 -0.13 0.34
Utah 0.48 0.53 0.05 0.49 0.52 0.03 1.15 0.78 0.37
Vermont 0.16 0.21 0.05 0.14 0.22 0.08 0.00 -0.27 0.27
Virginia 0.39 0.53 0.14 0.38 0.52 0.14 0.13 0.38 0.25
Washington 0.60 0.48 0.12 0.60 0.45 0.15 0.52 1.08 0.56
West Virginia 0.38 0.65 0.27 0.35 0.66 0.31 1.32 1.07 0.25
Wisconsin 0.29 0.23 0.06 0.29 0.23 0.06 0.58 0.90 0.32
Wyoming -0.19 -0.18 0.01 -0.26 -0.25 0.01 -1.40 -1.74 0.34
Outlying areas a -0.59 -0.62 0.03 -0.66 -0.67 0.01 1.20 0.67 0.53
Other and unknown -55.15 -63.86 8.71 -56.57 -65.57 9.00 -42.88 -36.54 6.34
SOURCE: Author's calculations using 2017 merged ASA-MGD file.
a. Most of the workers in this category are assigned a Puerto Rico state code. Other outlying areas are American Samoa, Guam, Northern Mariana Islands, and U.S. Virgin Islands.

For self-employed individuals, the two measures differ by more than 0.5 percentage point, after rounding to one decimal place, in nine states: Arkansas (1.2 percentage points); New Hampshire (1.1 percentage points); Indiana (0.9 percentage point); Connecticut, Florida, and South Carolina (0.8 percentage point); Nevada and Alaska (0.7 percentage point); and Washington (0.6 percentage point).

Estimates by State and Sex

The evaluation continues by comparing the results of the current methodology and the MGD process for identifying the sex of workers. Table 13 shows that the match rate of the reported sex for all workers is 99.3 percent. However, the MGD file includes two categories of incomplete data, Missing and Unknown, that are not duplicated in the CWHS microdata file. If the records for the 5,237 workers with Missing values and the 648 workers with Unknown values for sex are removed from the MGD file, the match rate is 99.6 percent (not shown).

Table 13. Number of workers with Social Security (OASDI) taxable earnings, by state or other area as assigned under the current methodology; and number and percent of workers with matching sex identifiers in the MGD file; by type of earnings, tax year 2017
Current-methodology assigned state or area, followed by three columns each for All, Wage and salary, and Self-employed:
(1) total workers, (2) number of workers with matching sex identifier in MGD file, (3) percent of workers with matching sex identifier in MGD file
All areas 1,687,544 1,675,898 99.31 1,580,879 1,570,114 99.32 186,697 185,287 99.24
Alabama 23,856 23,710 99.39 22,531 22,391 99.38 2,411 2,397 99.42
Alaska 3,791 3,763 99.26 3,561 3,534 99.24 417 414 99.28
Arizona 33,785 33,585 99.41 31,847 31,659 99.41 3,455 3,434 99.39
Arkansas 14,690 14,598 99.37 13,774 13,687 99.37 1,608 1,599 99.44
California 189,421 187,988 99.24 173,786 172,497 99.26 25,134 24,921 99.15
Colorado 29,337 29,178 99.46 27,275 27,129 99.46 3,647 3,628 99.48
Connecticut 19,621 19,487 99.32 18,326 18,201 99.32 2,228 2,209 99.15
Delaware 5,199 5,168 99.40 4,984 4,954 99.40 422 421 99.76
District of Columbia 4,155 4,121 99.18 3,939 3,909 99.24 437 432 98.86
Florida 104,426 103,726 99.33 96,427 95,778 99.33 13,431 13,337 99.30
Georgia 52,577 52,176 99.24 49,197 48,832 99.26 6,020 5,968 99.14
Hawaii 7,715 7,673 99.46 7,183 7,146 99.48 867 861 99.31
Idaho 8,866 8,821 99.49 8,325 8,286 99.53 951 944 99.26
Illinois 66,450 65,964 99.27 62,455 61,995 99.26 7,220 7,171 99.32
Indiana 36,500 36,330 99.53 34,895 34,740 99.56 3,119 3,100 99.39
Iowa 17,681 17,478 98.85 16,723 16,536 98.88 1,847 1,818 98.43
Kansas 15,798 15,727 99.55 14,921 14,855 99.56 1,641 1,634 99.57
Kentucky 22,194 22,076 99.47 20,975 20,866 99.48 2,177 2,164 99.40
Louisiana 21,612 21,463 99.31 20,175 20,051 99.39 2,537 2,504 98.70
Maine 7,164 7,134 99.58 6,631 6,604 99.59 913 908 99.45
Maryland 33,296 33,112 99.45 31,493 31,321 99.45 3,385 3,365 99.41
Massachusetts 36,585 36,388 99.46 34,164 33,980 99.46 4,154 4,127 99.35
Michigan 52,165 51,676 99.06 49,353 48,900 99.08 5,206 5,153 98.98
Minnesota 32,585 32,417 99.48 30,920 30,761 99.49 3,220 3,204 99.50
Mississippi 14,298 14,210 99.38 13,406 13,322 99.37 1,691 1,681 99.41
Missouri 31,759 31,563 99.38 30,041 29,857 99.39 3,196 3,178 99.44
Montana 6,098 6,063 99.43 5,723 5,691 99.44 671 665 99.11
Nebraska 11,127 11,072 99.51 10,525 10,474 99.52 1,151 1,145 99.48
Nevada 13,930 13,839 99.35 13,095 13,011 99.36 1,459 1,448 99.25
New Hampshire 8,055 8,017 99.53 7,548 7,512 99.52 826 822 99.52
New Jersey 49,423 49,045 99.24 46,467 46,107 99.23 5,287 5,254 99.38
New Mexico 9,740 9,675 99.33 9,198 9,139 99.36 932 922 98.93
New York 105,970 105,455 99.51 98,858 98,388 99.52 12,494 12,414 99.36
North Carolina 52,577 52,141 99.17 49,529 49,127 99.19 5,482 5,430 99.05
North Dakota 4,469 4,445 99.46 4,222 4,199 99.46 510 507 99.41
Ohio 58,397 57,902 99.15 54,935 54,468 99.15 5,895 5,850 99.24
Oklahoma 19,624 19,502 99.38 18,488 18,373 99.38 2,038 2,028 99.51
Oregon 21,674 21,542 99.39 20,326 20,207 99.41 2,287 2,268 99.17
Pennsylvania 68,886 68,274 99.11 65,408 64,842 99.13 6,426 6,360 98.97
Rhode Island 5,964 5,925 99.35 5,650 5,612 99.33 587 585 99.66
South Carolina 25,479 25,325 99.40 24,176 24,029 99.39 2,450 2,434 99.35
South Dakota 5,470 5,445 99.54 5,158 5,135 99.55 612 608 99.35
Tennessee 34,994 34,780 99.39 32,637 32,434 99.38 4,124 4,109 99.64
Texas 134,668 133,380 99.04 124,891 123,713 99.06 16,667 16,499 98.99
Utah 16,305 16,242 99.61 15,631 15,571 99.62 1,481 1,476 99.66
Vermont 3,786 3,771 99.60 3,553 3,538 99.58 434 434 100.00
Virginia 46,057 45,803 99.45 43,680 43,445 99.46 4,510 4,475 99.22
Washington 39,559 39,230 99.17 37,498 37,185 99.17 3,629 3,601 99.23
West Virginia 8,378 8,339 99.53 7,992 7,956 99.55 683 678 99.27
Wisconsin 32,812 32,676 99.59 31,346 31,218 99.59 2,742 2,729 99.53
Wyoming 3,217 3,200 99.47 3,036 3,019 99.44 357 356 99.72
Outlying areas a 10,197 10,136 99.40 9,424 9,371 99.44 997 987 99.00
Other and unknown 5,162 5,142 99.61 4,578 4,559 99.58 632 631 99.84
SOURCE: Author's calculations using 2017 merged ASA-MGD file.
NOTE: Because some workers accrued both wage and salary and self-employment earnings, the sum of those two categories exceeds the number of all workers with taxable earnings.
a. Most of the workers in this category are assigned a Puerto Rico state code. Other outlying areas are American Samoa, Guam, Northern Mariana Islands, and U.S. Virgin Islands.

Table 13 shows that the sex identified in the current methodology and the MGD process matches for at least 99 percent of all workers and for wage and salary workers in all states except Iowa, which has a 98.9 percent match rate. For self-employed individuals, the match rate by sex is lower than 99 percent in seven states. However, the match rate for all seven of those states exceeds 98.4 percent.
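The match rates in Table 13 are simple ratios of matched workers to total workers. The sketch below recomputes them for a few self-employed rows copied from the table and flags those below 99 percent. The names are illustrative only, and the sample is a subset of the table rather than the full list of seven states.

```python
# area -> (total self-employed workers, workers with matching sex identifier),
# copied from the self-employed columns of Table 13.
TABLE_13_SELF_EMPLOYED = {
    "Iowa": (1_847, 1_818),
    "Louisiana": (2_537, 2_504),
    "Texas": (16_667, 16_499),
    "Vermont": (434, 434),
}


def match_rate(total, matched):
    """Percent of workers whose MGD sex identifier matches the current methodology."""
    return 100.0 * matched / total


def below_threshold(rows, threshold=99.0):
    """Return areas whose match rate is lower than `threshold` percent."""
    rates = {area: match_rate(t, m) for area, (t, m) in rows.items()}
    return {area: round(r, 2) for area, r in rates.items() if r < threshold}


if __name__ == "__main__":
    for area, rate in below_threshold(TABLE_13_SELF_EMPLOYED).items():
        print(f"{area}: {rate:.2f} percent")
```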

Table 14 repeats Table 12 with detail by sex; that is, it shows the percentage differences between the current-methodology estimates and the MGD-process estimates of both the number of workers with OASDI taxable earnings and the amounts of those earnings, and presents the percentage-point differences between those two measures. For all workers, the percentage-point difference between the number of workers and the amount of taxable OASDI earnings exceeds 0.5 in absolute value in only nine states (Louisiana, Ohio, South Dakota, and Wyoming for women; the District of Columbia, Maine, Nebraska, and Oklahoma for men; and Montana for both).

Table 14. Percentage difference between current-methodology and MGD-process estimates of the number of workers with Social Security (OASDI) taxable earnings and their taxable earnings amounts, and the percentage-point difference between those two estimates, by sex, state or other area, and type of earnings, tax year 2017
State or area (with separate rows for men and women), followed by three columns each for All, Wage and salary, and Self-employed:
(1) percentage difference in estimated number of workers, (2) percentage difference in estimated taxable earnings, (3) percentage-point difference between (1) and (2)
All areas -0.33 -0.41 -0.08 -0.32 -0.41 -0.08 -0.40 -0.50 -0.10
Men -0.33 -0.42 -0.09 -0.32 -0.42 -0.10 -0.40 -0.46 -0.06
Women -0.33 -0.39 -0.06 -0.33 -0.39 -0.06 -0.42 -0.45 -0.03
Alabama
Men 0.06 0.28 0.22 0.08 0.31 0.23 -0.08 -0.45 -0.37
Women -0.28 -0.69 -0.41 -0.29 -0.69 -0.40 0.27 -0.05 -0.32
Alaska
Men 0.25 0.31 0.06 0.32 0.33 0.02 1.63 1.37 -0.26
Women -0.17 -0.32 -0.15 -0.12 -0.33 -0.21 -1.16 1.34 2.50
Arizona
Men 0.10 -0.05 -0.14 0.11 -0.06 -0.17 -0.11 0.03 0.14
Women 0.26 0.13 -0.13 0.29 0.15 -0.14 -1.07 -1.59 -0.51
Arkansas
Men -0.67 -0.71 -0.03 -0.79 -0.78 0.00 0.00 -1.76 -1.76
Women -0.47 -0.14 0.33 -0.49 -0.13 0.36 0.00 -0.63 -0.63
California
Men 0.70 0.30 -0.40 0.81 0.36 -0.45 -0.19 -0.43 -0.23
Women -0.03 -0.24 -0.20 -0.03 -0.25 -0.22 -0.45 -0.75 -0.30
Colorado
Men -0.09 -0.09 0.00 -0.17 -0.20 -0.03 0.40 0.66 0.26
Women 0.15 0.19 0.04 0.05 0.14 0.09 1.15 1.18 0.03
Connecticut
Men -0.17 -0.25 -0.08 -0.17 -0.18 -0.01 -0.24 -1.04 -0.80
Women -0.62 -0.77 -0.15 -0.62 -0.80 -0.17 -0.52 -1.03 -0.51
Delaware
Men -1.18 -1.62 -0.44 -1.28 -1.67 -0.40 0.00 -0.33 -0.33
Women -1.36 -1.81 -0.45 -1.41 -1.86 -0.45 0.52 0.55 0.04
District of Columbia
Men -2.09 -4.42 -2.33 -2.42 -4.72 -2.31 5.99 4.93 -1.06
Women -3.55 -3.52 0.02 -3.44 -3.51 -0.07 -1.82 -0.44 1.38
Florida
Men 0.36 0.28 -0.09 0.33 0.28 -0.06 0.23 -0.55 -0.78
Women 0.17 0.07 -0.10 0.12 0.05 -0.06 -0.54 -1.85 -1.31
Georgia
Men -0.37 -0.57 -0.19 -0.33 -0.55 -0.22 -0.43 -0.47 -0.05
Women -0.29 -0.50 -0.21 -0.24 -0.47 -0.23 -0.84 -1.28 -0.44
Hawaii
Men -0.73 -0.72 0.01 -0.65 -0.74 -0.09 -0.43 0.39 0.82
Women 0.00 -0.18 -0.18 0.06 -0.16 -0.22 0.00 0.04 0.04
Idaho
Men -1.13 -0.68 0.45 -1.11 -0.68 0.43 -1.14 -0.19 0.95
Women -0.31 0.15 0.46 -0.23 0.19 0.42 0.00 0.99 0.99
Illinois
Men -1.12 -1.36 -0.24 -1.18 -1.42 -0.24 -0.91 -0.73 0.19
Women -0.91 -1.04 -0.13 -0.98 -1.04 -0.06 -0.40 -0.78 -0.38
Indiana
Men 0.10 0.48 0.37 0.18 0.54 0.35 -0.11 0.68 0.79
Women 0.17 0.31 0.13 0.22 0.34 0.11 0.67 2.31 1.64
Iowa
Men -1.33 -1.67 -0.34 -1.27 -1.61 -0.34 -1.45 -1.76 -0.31
Women -1.11 -1.28 -0.17 -1.13 -1.34 -0.20 -0.81 -1.58 -0.77
Kansas
Men -0.26 -0.38 -0.12 -0.25 -0.30 -0.05 -0.31 -0.61 -0.30
Women -0.25 -0.34 -0.09 -0.27 -0.35 -0.08 0.45 0.97 0.52
Kentucky
Men -0.17 -0.06 0.11 -0.09 -0.02 0.07 -1.31 -1.20 0.11
Women -0.47 -0.67 -0.21 -0.51 -0.71 -0.20 -0.21 0.16 0.37
Louisiana
Men -1.03 -1.51 -0.49 -1.00 -1.54 -0.53 -1.69 -1.53 0.17
Women -0.85 -1.51 -0.67 -0.83 -1.47 -0.63 -1.02 -2.66 -1.64
Maine
Men 0.38 0.91 0.54 0.24 0.83 0.59 -0.39 -0.59 -0.21
Women 0.72 0.87 0.15 0.68 0.88 0.20 1.01 0.82 -0.19
Maryland
Men -0.22 -0.41 -0.20 -0.20 -0.39 -0.19 -0.51 0.11 0.62
Women -0.15 0.16 0.30 -0.15 0.15 0.30 -0.19 -0.14 0.04
Massachusetts
Men -0.32 -0.07 0.25 -0.37 -0.07 0.30 0.17 -0.14 -0.31
Women -0.51 -0.45 0.06 -0.57 -0.47 0.10 -0.06 -0.32 -0.26
Michigan
Men -0.60 -0.74 -0.14 -0.60 -0.72 -0.12 -1.07 -1.35 -0.28
Women -0.51 -0.61 -0.11 -0.50 -0.61 -0.10 -0.42 -0.45 -0.03
Minnesota
Men -0.12 -0.13 0.00 -0.14 -0.16 -0.01 0.64 0.81 0.17
Women -0.30 -0.37 -0.07 -0.31 -0.36 -0.05 0.45 0.31 -0.14
Mississippi
Men 0.00 -0.25 -0.25 0.02 -0.20 -0.22 0.00 -0.48 -0.48
Women 0.01 0.09 0.07 0.04 0.09 0.05 -0.47 0.77 1.25
Missouri
Men 0.86 0.96 0.10 0.93 1.00 0.07 0.00 -0.35 -0.35
Women 0.86 0.97 0.11 0.86 1.01 0.15 0.82 0.65 -0.16
Montana
Men -6.17 -5.57 0.59 -6.55 -5.81 0.73 -2.36 -2.42 -0.06
Women -6.48 -4.50 1.98 -6.67 -4.64 2.02 -2.43 -2.30 0.13
Nebraska
Men -2.73 -2.01 0.72 -2.86 -2.09 0.77 -0.84 -0.25 0.58
Women -2.52 -2.09 0.43 -2.65 -2.14 0.50 -0.46 -0.46 0.00
Nevada
Men 0.39 0.24 -0.15 0.48 0.34 -0.15 0.40 1.12 0.72
Women 0.62 0.77 0.15 0.69 0.81 0.12 0.14 1.22 1.07
New Hampshire
Men -0.02 -0.14 -0.12 0.00 -0.10 -0.10 1.82 0.99 -0.83
Women 0.62 0.18 -0.44 0.57 0.14 -0.43 1.21 -0.50 -1.71
New Jersey
Men -0.19 -0.16 0.03 -0.19 -0.16 0.03 -0.23 0.06 0.29
Women -0.16 -0.39 -0.22 -0.14 -0.37 -0.22 0.31 -0.22 -0.54
New Mexico
Men 0.59 0.86 0.27 0.64 0.95 0.31 -1.12 -2.36 -1.24
Women 0.25 0.58 0.33 0.33 0.62 0.29 -1.03 -0.58 0.45
New York
Men 1.12 0.83 -0.29 1.29 0.90 -0.38 -0.01 -0.14 -0.12
Women 0.07 0.15 0.08 0.13 0.18 0.06 -0.29 0.48 0.77
North Carolina
Men -0.42 -0.46 -0.03 -0.41 -0.44 -0.03 -0.41 -0.82 -0.41
Women -0.38 -0.27 0.11 -0.34 -0.27 0.07 -0.83 -0.01 0.82
North Dakota
Men -1.07 -0.92 0.16 -1.15 -1.03 0.12 0.00 0.44 0.44
Women -1.86 -1.38 0.48 -1.93 -1.41 0.53 2.82 3.89 1.06
Ohio
Men -1.00 -1.33 -0.33 -1.04 -1.38 -0.33 -0.49 -0.43 0.05
Women -1.15 -1.78 -0.62 -1.20 -1.82 -0.62 -0.19 -0.40 -0.21
Oklahoma
Men -0.21 -0.75 -0.54 -0.15 -0.72 -0.58 -0.27 -0.90 -0.63
Women -0.06 -0.25 -0.19 -0.13 -0.32 -0.19 1.19 1.47 0.28
Oregon
Men -0.18 -0.24 -0.07 -0.14 -0.20 -0.06 -0.68 -0.93 -0.25
Women 0.05 -0.10 -0.15 0.09 -0.02 -0.11 -0.99 -1.33 -0.33
Pennsylvania
Men -0.22 -0.32 -0.09 -0.17 -0.29 -0.12 -0.68 -0.48 0.20
Women -0.52 -0.72 -0.21 -0.55 -0.76 -0.21 -0.23 -0.52 -0.30
Rhode Island
Men -0.50 -0.42 0.08 -0.57 -0.48 0.09 0.00 0.18 0.18
Women -1.31 -1.01 0.29 -1.30 -1.04 0.26 -0.78 0.59 1.37
South Carolina
Men 0.46 0.53 0.06 0.49 0.51 0.02 0.08 1.12 1.05
Women 0.42 0.50 0.08 0.40 0.48 0.08 -0.51 -0.51 0.00
South Dakota
Men -4.65 -4.19 0.46 -4.92 -4.32 0.61 0.00 -0.33 -0.33
Women -5.93 -4.73 1.20 -6.16 -4.85 1.30 -0.44 -1.75 -1.30
Tennessee
Men -0.41 -0.43 -0.02 -0.37 -0.39 -0.01 -1.24 -0.91 0.33
Women -0.30 -0.66 -0.37 -0.32 -0.64 -0.31 -0.06 -0.92 -0.86
Texas
Men -0.26 -0.35 -0.09 -0.26 -0.33 -0.06 -0.69 -1.42 -0.73
Women -0.18 -0.25 -0.07 -0.22 -0.28 -0.06 -0.15 -0.17 -0.02
Utah
Men 0.45 0.47 0.02 0.46 0.43 -0.03 1.08 1.15 0.07
Women 0.26 0.12 -0.13 0.25 0.18 -0.07 1.08 -0.16 -1.24
Vermont
Men -0.16 -0.04 0.11 -0.17 -0.05 0.12 0.00 0.12 0.12
Women 0.22 0.14 -0.08 0.17 0.12 -0.05 0.00 -0.72 -0.72
Virginia
Men 0.00 0.05 0.04 0.00 0.04 0.04 -0.62 -0.52 0.11
Women 0.27 0.46 0.20 0.28 0.48 0.20 0.14 0.32 0.18
Washington
Men 0.11 -0.12 -0.23 0.10 -0.18 -0.28 0.10 0.91 0.81
Women 0.14 -0.09 -0.23 0.16 -0.07 -0.23 -0.12 -0.07 0.06
West Virginia
Men 0.31 0.71 0.40 0.35 0.74 0.39 -0.26 0.42 0.67
Women 0.05 0.09 0.04 -0.03 0.06 0.09 2.41 1.92 -0.49
Wisconsin
Men -0.07 -0.11 -0.04 -0.09 -0.12 -0.03 0.50 0.71 0.21
Women 0.27 0.16 -0.12 0.28 0.15 -0.13 0.00 0.45 0.45
Wyoming
Men 0.00 -0.48 -0.48 0.00 -0.51 -0.51 -0.49 -1.98 -1.49
Women -0.59 0.08 0.67 -0.76 -0.07 0.69 -2.63 -1.34 1.29
Outlying areas a
Men -0.74 -0.59 0.15 -0.78 -0.60 0.18 0.46 -1.04 -1.51
Women -0.71 -0.91 -0.21 -0.75 -1.00 -0.26 1.43 4.43 3.00
Other and unknown
Men -62.44 -70.79 -8.36 -64.88 -72.96 -8.08 -34.64 -28.96 5.68
Women -42.53 -47.14 -4.61 -40.76 -46.82 -6.06 -50.61 -46.50 4.12
SOURCE: Author's calculations using 2017 merged ASA-MGD file.
a. Most of the workers in this category are assigned a Puerto Rico state code. Other outlying areas are American Samoa, Guam, Northern Mariana Islands, and U.S. Virgin Islands.

Estimates by State and Age

Earnings and Employment includes tables showing the numbers of workers with taxable Social Security and Medicare earnings by state or other area, sex, and age. Table 15 compares the worker ages identified using the current methodology and the MGD process and shows that the ages assigned by the MGD process match those identified under the current methodology for 98.9 percent of workers overall. However, the records for 5,734 workers in the MGD file are missing an age value and therefore cannot match the current-methodology age value. Removing these records from consideration would produce a “true” match rate of 99.2 percent. Further, for an additional 0.6 percent of all workers, the age assigned in the MGD file is within 2 years (plus or minus) of the age assigned by the current methodology. Combining the true match rate and the share of workers whose ages are within 2 years of the current-methodology assigned age would result in a 99.8 percent match rate for all workers.
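The arithmetic behind these match rates can be sketched as follows. The worker counts come from the "All areas" row of Table 15 and from the text above, and the 0.6 percent share of workers within 2 years is taken from the text as given. The function names are illustrative.

```python
TOTAL_WORKERS = 1_687_544      # Table 15, all areas, total
MATCHED_AGE = 1_668_449        # Table 15, all areas, matching age in MGD file
MISSING_AGE = 5_734            # MGD records with no age value (from the text)
WITHIN_TWO_YEARS_SHARE = 0.6   # percent of workers within 2 years (from the text)


def raw_match_rate(matched, total):
    """Match rate over all workers, including records with a missing age."""
    return 100.0 * matched / total


def true_match_rate(matched, total, missing):
    """Match rate after removing records that cannot match because age is missing."""
    return 100.0 * matched / (total - missing)


if __name__ == "__main__":
    raw = raw_match_rate(MATCHED_AGE, TOTAL_WORKERS)
    true = true_match_rate(MATCHED_AGE, TOTAL_WORKERS, MISSING_AGE)
    combined = round(round(true, 1) + WITHIN_TWO_YEARS_SHARE, 1)
    print(f"raw: {raw:.1f}  true: {true:.1f}  combined: {combined:.1f}")
```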

Table 15. Number of workers with Social Security (OASDI) taxable earnings, by state or other area as assigned under the current methodology; and number and percent of workers with matching ages in the MGD file; tax year 2017
Current-methodology assigned state or area, followed by: (1) total workers, (2) number of workers with matching age in MGD file, (3) percent of workers with matching age in MGD file
All areas 1,687,544 1,668,449 98.87
Alabama 23,856 23,613 98.98
Alaska 3,791 3,748 98.87
Arizona 33,785 33,442 98.98
Arkansas 14,690 14,529 98.90
California 189,421 187,067 98.76
Colorado 29,337 29,054 99.04
Connecticut 19,621 19,369 98.72
Delaware 5,199 5,148 99.02
District of Columbia 4,155 4,110 98.92
Florida 104,426 103,209 98.83
Georgia 52,577 51,900 98.71
Hawaii 7,715 7,652 99.18
Idaho 8,866 8,782 99.05
Illinois 66,450 65,572 98.68
Indiana 36,500 36,201 99.18
Iowa 17,681 17,436 98.61
Kansas 15,798 15,659 99.12
Kentucky 22,194 22,005 99.15
Louisiana 21,612 21,381 98.93
Maine 7,164 7,102 99.13
Maryland 33,296 32,942 98.94
Massachusetts 36,585 36,246 99.07
Michigan 52,165 51,468 98.66
Minnesota 32,585 32,323 99.20
Mississippi 14,298 14,119 98.75
Missouri 31,759 31,419 98.93
Montana 6,098 6,042 99.08
Nebraska 11,127 11,048 99.29
Nevada 13,930 13,769 98.84
New Hampshire 8,055 7,979 99.06
New Jersey 49,423 48,800 98.74
New Mexico 9,740 9,627 98.84
New York 105,970 104,947 99.03
North Carolina 52,577 51,945 98.80
North Dakota 4,469 4,433 99.19
Ohio 58,397 57,662 98.74
Oklahoma 19,624 19,401 98.86
Oregon 21,674 21,490 99.15
Pennsylvania 68,886 67,960 98.66
Rhode Island 5,964 5,901 98.94
South Carolina 25,479 25,145 98.69
South Dakota 5,470 5,421 99.10
Tennessee 34,994 34,612 98.91
Texas 134,668 133,007 98.77
Utah 16,305 16,193 99.31
Vermont 3,786 3,750 99.05
Virginia 46,057 45,543 98.88
Washington 39,559 39,114 98.88
West Virginia 8,378 8,306 99.14
Wisconsin 32,812 32,557 99.22
Wyoming 3,217 3,181 98.88
Outlying areas a 10,197 9,999 98.06
Other and unknown 5,162 5,121 99.21
SOURCE: Author's calculations using 2017 merged ASA-MGD file.
a. Most of the workers in this category are assigned a Puerto Rico state code. Other outlying areas are American Samoa, Guam, Northern Mariana Islands, and U.S. Virgin Islands.

Estimates by age in Earnings and Employment are shown for each of nine age groups. In many states, some of those categories contain relatively few workers. Specifically, five of the age groups (under 20, 60–61, 62–64, 65–69, and 70 or older) contain far fewer workers than the other four. As previously noted, lower numbers of workers in these categories are likely to result in larger percentage differences in the estimates by age between the current methodology and the MGD process. Comparing the differences between the two processes is therefore problematic because many of the larger percentage changes may reflect relatively small changes in the number of workers.
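A minimal sketch of the nine age groups, using the group labels from the Table 16 column headings, shows how an individual age maps to a group. It also illustrates why sparsely populated groups produce larger percentage differences. The worker counts in the second function's example are hypothetical and chosen only for illustration; they are not taken from the data.

```python
import bisect

# Lower bounds of every group after the first; labels follow the Table 16 headings.
AGE_GROUP_LOWER_BOUNDS = [20, 30, 40, 50, 60, 62, 65, 70]
AGE_GROUP_LABELS = [
    "Under 20", "20–29", "30–39", "40–49", "50–59",
    "60–61", "62–64", "65–69", "70 or older",
]


def age_group(age):
    """Return the label of the age group that contains `age` (in whole years)."""
    return AGE_GROUP_LABELS[bisect.bisect_right(AGE_GROUP_LOWER_BOUNDS, age)]


def relative_change(count_change, group_size):
    """Percentage change implied by `count_change` workers in a group of `group_size`."""
    return 100.0 * count_change / group_size


if __name__ == "__main__":
    print(age_group(61), age_group(62), age_group(70))
    # Hypothetical group sizes: the same 5-worker change in a small and a large group.
    print(relative_change(5, 50), relative_change(5, 5000))
```

In this hypothetical case, a change of 5 workers amounts to 10 percent of a 50-worker group but only 0.1 percent of a 5,000-worker group.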

Table 16 shows how the MGD-estimated counts of workers with OASDI taxable earnings by sex and age differ from the current-methodology estimates (after removing the MGD records for 5,734 workers with a missing value for age). Because the differences are slight, the MGD assignment of age requires no further evaluation.

Table 16. Difference from the current-methodology estimates of the number of workers with Social Security (OASDI) taxable earnings when using the MGD process, by age, sex, and state or other area, tax year 2017
State or area All ages Under 20 20–29 30–39 40–49 50–59 60–61 62–64 65–69 70 or older
All areas -6,070 1 42 -214 -755 -2,956 -1,213 -462 -321 -192
Men -3,147 -9 29 -176 -442 -1,494 -531 -267 -87 -170
Women -2,923 10 13 -38 -313 -1,462 -682 -195 -234 -22
Alabama
Men 1 3 7 -2 23 -21 0 4 -10 -3
Women -33 6 -12 -8 -3 -3 -4 -7 1 -3
Alaska
Men 5 -1 2 6 -2 -2 2 0 1 -1
Women -4 0 3 -2 2 -3 -4 0 -1 1
Arizona
Men 13 3 7 17 -1 -5 -3 -9 3 1
Women 35 6 16 12 9 -15 0 3 5 -1
Arkansas
Men -51 0 -2 1 -7 -30 -4 -4 -5 0
Women -36 5 2 -8 -4 -10 -4 -8 -9 0
California
Men 673 55 562 249 -19 -76 -49 -22 0 -27
Women -46 22 144 31 2 -150 -49 -12 -39 5
Colorado
Men -21 0 15 -8 0 -20 -1 -1 2 -8
Women 16 -1 10 0 16 8 -16 -7 4 2
Connecticut
Men -21 2 -6 0 -1 1 -15 -4 0 2
Women -64 -1 7 -1 -7 -32 -6 -4 -13 -7
Delaware
Men -31 -2 -3 -8 -10 -2 -3 -1 -2 0
Women -35 -1 -12 -9 -3 -6 -1 -2 -2 1
District of Columbia
Men -42 -1 11 -17 -20 -10 -5 1 0 -1
Women -76 -3 -24 -7 -18 -17 -5 -1 -2 1
Florida
Men 174 17 54 47 77 -12 -16 -16 12 11
Women 76 11 38 41 -11 3 -13 -3 -1 11
Georgia
Men -112 10 12 -16 -2 -53 -27 -18 -4 -14
Women -80 15 13 8 -8 -61 -16 -18 -8 -5
Hawaii
Men -32 -6 -15 -4 -4 -1 0 3 1 -6
Women 0 1 2 10 -4 -5 -3 3 -5 1
Idaho
Men -55 -2 -13 -13 -14 -6 -6 -1 0 0
Women -15 -5 -1 4 -2 -7 -3 0 -1 0
Illinois
Men -389 -19 -29 -69 -41 -148 -40 -22 -10 -11
Women -301 -4 -35 -25 -25 -120 -54 -6 -17 -15
Indiana
Men 15 -1 4 13 11 -6 -7 9 -6 -2
Women 25 5 3 17 16 -5 -4 -11 4 0
Iowa
Men -121 -2 -1 -11 -32 -66 -7 -3 4 -3
Women -98 0 -4 -6 -27 -45 -11 -2 -2 -1
Kansas
Men -22 -2 2 -12 -1 -9 1 1 -1 -1
Women -21 -1 -2 2 -5 -14 -1 1 -1 0
Kentucky
Men -24 0 5 -2 -14 -1 5 -8 -5 -4
Women -52 -8 -7 -7 -7 -14 -1 -7 -3 2
Louisiana
Men -119 -7 -7 -16 -28 -32 -11 -3 -7 -8
Women -93 0 -13 -22 -33 -18 3 -7 -3 0
Maine
Men 13 1 4 3 4 1 -1 2 -1 0
Women 25 -2 -2 3 12 12 2 2 -1 -1
Maryland
Men -42 3 21 6 -11 -37 -9 -6 -5 -4
Women -34 -4 -10 6 9 -32 -5 5 -9 6
Massachusetts
Men -66 -11 -6 -26 -20 7 -4 0 -5 -1
Women -98 -2 -2 -14 -17 -25 -13 -17 0 -8
Michigan
Men -172 3 7 -5 15 -119 -54 -10 1 -10
Women -132 1 7 4 1 -96 -46 5 -4 -4
Minnesota
Men -28 1 -12 4 -1 -7 4 -2 -10 -5
Women -50 4 2 -14 -5 -27 -4 -3 -2 -1
Mississippi
Men -3 5 11 -7 0 -9 4 -7 -1 1
Women -3 -2 -4 9 -3 -14 -3 7 3 4
Missouri
Men 136 19 48 16 44 1 -8 -2 12 6
Women 131 29 20 32 12 36 -10 11 -1 2
Montana
Men -201 -26 -36 -50 -29 -44 0 -7 -6 -3
Women -187 -27 -48 -40 -24 -19 -4 -7 -7 -11
Nebraska
Men -159 -12 -35 -45 -23 -30 -1 -7 -4 -2
Women -134 -25 -27 -27 -15 -20 -13 -1 -5 -1
Nevada
Men 27 1 21 -1 11 0 -5 5 0 -5
Women 39 6 9 26 5 -4 -3 3 -2 -1
New Hampshire
Men -3 3 2 4 -2 0 -2 -3 -2 -3
Women 22 7 2 7 2 4 -1 -1 2 0
New Jersey
Men -54 -1 22 11 -2 -44 -23 -7 4 -14
Women -44 0 34 8 -1 -48 -45 1 -1 8
New Mexico
Men 28 5 12 5 10 -1 -3 1 0 -1
Women 10 -1 2 9 -1 0 -5 6 2 -2
New York
Men 575 9 332 191 73 -8 -12 -12 1 1
Women 21 7 80 36 16 -64 -10 -15 -25 -4
North Carolina
Men -120 8 -2 -19 -35 -64 -11 7 -4 0
Women -104 2 5 7 9 -61 -34 -20 -11 -1
North Dakota
Men -26 1 8 -3 -9 -11 -5 -5 1 -3
Women -39 -5 -10 -6 -4 -4 0 -4 1 -7
Ohio
Men -317 -1 -37 -33 -66 -106 -42 -27 4 -9
Women -327 -15 -13 -17 -80 -99 -63 -22 -18 0
Oklahoma
Men -23 1 0 -6 -8 -5 -4 0 5 -6
Women -9 1 10 -6 8 -19 -1 4 -9 3
Oregon
Men -23 -2 -20 7 0 -7 2 4 -4 -3
Women 3 3 19 -8 1 -5 3 -9 2 -3
Pennsylvania
Men -93 11 3 12 26 -92 -40 -3 -20 10
Women -176 10 27 21 -9 -98 -67 -39 -19 -2
Rhode Island
Men -20 -4 -6 0 1 -3 -1 -1 -2 -4
Women -39 -3 -8 -7 -5 -8 -7 1 -2 0
South Carolina
Men 55 4 32 27 8 -27 3 2 -2 8
Women 47 12 11 12 3 -5 2 9 2 1
South Dakota
Men -130 -8 -33 -23 -25 -25 -3 -5 -5 -3
Women -160 -21 -37 -33 -25 -31 -4 -5 -3 -1
Tennessee
Men -78 8 -7 2 -22 -27 -9 -13 -7 -3
Women -56 10 -11 -19 10 -30 -21 2 -1 4
Texas
Men -197 16 -2 62 -55 -145 -50 -33 1 9
Women -127 28 30 45 -17 -134 -78 16 -16 -1
Utah
Men 38 4 15 20 16 -10 -1 -6 3 -3
Women 17 3 9 5 4 -4 -1 -3 2 2
Vermont
Men -4 -1 -5 -2 3 4 -2 0 1 -2
Women 4 0 -3 5 6 -3 0 -2 0 1
Virginia
Men -5 7 3 -23 25 -2 -13 -8 9 -3
Women 51 1 18 53 12 -9 -17 -3 -2 -2
Washington
Men 15 5 31 22 9 -28 -13 -4 -5 -2
Women 25 11 19 24 24 -36 -28 -4 6 9
West Virginia
Men 13 1 3 10 2 2 -1 1 0 -5
Women 0 -1 6 5 1 -6 2 -1 -4 -2
Wisconsin
Men -17 -2 6 14 -2 -10 -9 -1 -3 -10
Women 39 0 21 18 4 -3 -5 2 -1 3
Wyoming
Men 0 1 -1 1 -1 -2 -3 2 3 0
Women -10 3 -6 -6 -3 4 1 -4 0 1
Outlying areas a
Men -41 0 -9 -2 -7 -16 -4 -6 8 -5
Women -38 -2 -8 -14 -5 0 -8 1 -6 4
Other and unknown
Men -2,066 -105 -946 -503 -286 -131 -25 -22 -27 -21
Women -788 -65 -257 -192 -126 -100 -4 -22 -12 -10
SOURCE: Author's calculations using 2017 merged ASA-MGD file.
a. Most of the workers in this category are assigned a Puerto Rico state code. Other outlying areas are American Samoa, Guam, Northern Mariana Islands, and U.S. Virgin Islands.

Estimates by County

Earnings and Employment includes 102 tables showing county-level statistics: 51 (one for each state plus one for Puerto Rico) for workers covered under Social Security and 51 for those covered under Medicare. Each table presents worker counts, taxable earnings, and trust fund contributions, by sex, for all workers, wage and salary workers, and self-employed individuals.

Evaluating the results of the MGD process at the county level is much more complex than assessing the estimates shown by state, sex, and age for three primary reasons. First, the current methodology and the MGD process use distinct sets of county codes and names. The current methodology uses SSA-designated SCCs, whereas the MGD process uses Federal Information Processing Standards SCCs. As a result, ORES must confirm the consistency of the county names used in the two methodologies and determine whether any counties are identified in one process and not the other. For example, some states recognize independent cities as well as counties.17 Earnings and Employment includes estimates for those independent cities. Is each of those independent cities also identified in the MGD file?

Second, data nondisclosure requirements significantly affect the quantity of county-level estimates that SSA may publish. More than one-half of the cells showing county-level data in the Earnings and Employment tables are suppressed to comply with disclosure restrictions. Primary cell suppression rules require any unweighted estimate of fewer than 10 workers to be suppressed. For tables that include sex, age, or type-of-earnings breakdowns, SSA must also apply secondary cell suppression. Consider a small county with an unweighted count of 25 workers. If 13 are men and 12 are women, SSA can publish estimates for the total number of workers and workers by sex for this county. However, if 16 of the workers are women and only nine are men, secondary data disclosure rules require SSA to suppress the estimates by sex and publish only the total number of workers for the county (because suppressing only the number of men would leave that value open to computation). Estimates with breakdowns by age and type of earnings only increase the instances that require cell suppression. More than one-half of the estimates of self-employed individuals are subject to primary cell suppression, which requires SSA to apply secondary cell suppression to the corresponding estimates for wage and salary workers.
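The worked example in the preceding paragraph can be expressed as a minimal sketch. It covers only the two rules described there (a threshold of 10 unweighted workers and suppression of the complementary cell) and only a breakdown by sex; SSA's actual disclosure procedures are more extensive, and the function and constant names are assumptions of this sketch.

```python
PRIMARY_THRESHOLD = 10  # per the text: unweighted counts below 10 are suppressed


def publishable_cells(men: int, women: int) -> dict:
    """Return the county cells that may be published; None marks a suppressed cell.

    Primary rule: a cell with fewer than 10 workers is suppressed.
    Secondary rule: if either sex cell is suppressed, both are, because the
    published total minus the published cell would reveal the suppressed one.
    """
    total = men + women
    total_ok = total >= PRIMARY_THRESHOLD
    by_sex_ok = (
        total_ok and men >= PRIMARY_THRESHOLD and women >= PRIMARY_THRESHOLD
    )
    return {
        "total": total if total_ok else None,
        "men": men if by_sex_ok else None,
        "women": women if by_sex_ok else None,
    }


# The two cases from the text: a county with 25 workers split 13/12 and 9/16.
print(publishable_cells(men=13, women=12))
print(publishable_cells(men=9, women=16))
```

In the second case only the county total survives, which mirrors the outcome described in the text.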

Third, evaluating county-level estimates is complicated by their sheer volume. In the 2017 edition of Earnings and Employment, the tables showing county-level data for Social Security–covered workers contain 88,182 discrete estimates, as do the tables for Medicare-covered workers.

The comparison of the SCCs assigned via the current methodology and the MGD process takes place in two steps. The first step aligns the universe of geographic identifiers by comparing all possible state and county combinations in the two methodologies, irrespective of the actual distribution of workers. This ensures that the SCCs cover every possible state and county combination in both methodologies, not just the combinations found in the CWHS microdata file, and thereby allows a direct comparison of the resulting distributions of workers. The second step extends the first by directly comparing the numbers of workers estimated under each methodology.

Identifying All Possible State and County Combinations and Removing Incomplete or Incompatible Records

The universe of state and county combinations is drawn from the current methodology's LABELS file (Chart 1) and the MGD file. Box 1 shows an excerpt from the LABELS file and provides examples of the geographic coding it contains. For example, row 1 shows the codes that designate workers with a missing value for both the state and county, row 2 shows the codes for workers with an “unknown” state code and a missing county value, and row 3 contains the codes for workers with the Alabama state code and a missing county value. Rows 4 through 10 and 69–70 show the data fields that apply when state and county codes are assigned. Row 71 applies the “Statewide” identifier in the county name field and indicates data for all workers in Alabama.

Box 1. Sample data fields from current methodology's LABELS file
ROW STATE_SCC COUNTY_SCC COUNTY_NAME STATE_ABBR STATE_NAME SCC
1 00 000   Nn   00000
2 00     Aa   00
3 64     AL Alabama 64
4 64 000 Autauga AL Alabama 64000
5 64 010 Baldwin AL Alabama 64010
6 64 020 Barbour AL Alabama 64020
7 64 030 Bibb AL Alabama 64030
8 64 040 Blount AL Alabama 64040
9 64 050 Bullock AL Alabama 64050
10 64 060 Butler AL Alabama 64060
69 64 650 Wilcox AL Alabama 64650
70 64 660 Winston AL Alabama 64660
71 64 990 Statewide AL Alabama 64990
SOURCE: SSA LABELS file, derived from 2017 CWHS.
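In the excerpt, the combined SCC field appears to be the state code followed by the county code, with the county part omitted when it is missing. That reading is an inference from the rows shown in Box 1, not a documented rule, and the helper name below is hypothetical. A minimal consistency check under that assumption:

```python
def compose_scc(state_scc: str, county_scc: str = "") -> str:
    """Join the state and county parts; a missing county leaves only the state part."""
    return state_scc + county_scc


# (STATE_SCC, COUNTY_SCC, SCC) copied from rows 1, 3, 5, and 71 of Box 1.
rows = [
    ("00", "000", "00000"),
    ("64", "", "64"),
    ("64", "010", "64010"),
    ("64", "990", "64990"),
]

# Rows whose recorded SCC does not equal the concatenation of its parts.
inconsistent = [row for row in rows if compose_scc(row[0], row[1]) != row[2]]
print(f"{len(inconsistent)} inconsistent row(s) among {len(rows)} checked")
```

Keeping the codes as strings preserves the leading zeros that an integer type would drop.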

To focus the evaluation on counties, ORES removed LABELS file records with the values American Samoa, Armed Forces, District of Columbia, Guam, International Operations, Northern Mariana Islands, Other, Reserves, UNKNOWN, or Virgin Islands in the STATE_NAME field; and with Statewide or no value in the COUNTY_NAME field. ORES used the resulting adjusted LABELS file in comparing the current methodology with the MGD process.

The 2017 MGD file contains records for 178,863,694 workers. To limit the file to records that are relevant for comparison, ORES removed the records of workers with the values American Samoa, District of Columbia, Federated State of Micronesia, Guam, Marshall Islands, Northern Mariana Islands, Palau, UNKNOWN, or Virgin Islands in the STATE_NAME field; and UNKNOWN in the COUNTY_NAME field.

This step removed records for 1,031,176 workers from the file, leaving 177,832,518 workers represented in the modified MGD file. Those records were then exported to a separate data file that sorts the workers across the U.S. counties, which can be compared with the data from the current methodology's modified LABELS file. In both files, the county-level records are arranged by state.
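The record filter described above might be sketched as follows. The excluded STATE_NAME values are copied from the text; the sample records are hypothetical, and treating the two conditions as alternatives (a record is dropped if either applies) is an assumption of this sketch rather than a documented rule.

```python
# STATE_NAME values excluded from the MGD file, as listed in the text.
EXCLUDED_STATE_NAMES = {
    "American Samoa", "District of Columbia", "Federated State of Micronesia",
    "Guam", "Marshall Islands", "Northern Mariana Islands", "Palau",
    "UNKNOWN", "Virgin Islands",
}


def keep_for_county_comparison(record: dict) -> bool:
    """True if the record has a usable state and a known county."""
    return (
        record["STATE_NAME"] not in EXCLUDED_STATE_NAMES
        and record["COUNTY_NAME"] != "UNKNOWN"
    )


# Hypothetical records, for illustration only.
sample_records = [
    {"STATE_NAME": "Alabama", "COUNTY_NAME": "Baldwin"},
    {"STATE_NAME": "Guam", "COUNTY_NAME": "Guam"},
    {"STATE_NAME": "Texas", "COUNTY_NAME": "UNKNOWN"},
    {"STATE_NAME": "Puerto Rico", "COUNTY_NAME": "Aibonito"},
]
kept = [r for r in sample_records if keep_for_county_comparison(r)]
print(len(kept), "of", len(sample_records), "sample records kept")

# The totals reported in the text are consistent with one another.
assert 178_863_694 - 1_031_176 == 177_832_518
```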

The comparison begins by ensuring that the entries in the state name data fields are consistent in both files and confirming that the number of observations (that is, counties) in the state tables match. For the tax year 2017 data, this process revealed duplicate entries for Waukesha County in Wisconsin (with the same SCC) and two different SCCs associated with Teton County in Wyoming, enabling ORES to remove the duplicate records from the LABELS file.
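A check of this kind can be sketched by grouping the county records by state and county name and flagging any name that appears more than once. The SCC values below are placeholders, because the text does not report the actual codes, and the function name is hypothetical.

```python
from collections import defaultdict


def counties_listed_more_than_once(rows):
    """Map (state, county) to its list of SCCs for every county with several records."""
    codes = defaultdict(list)
    for state, county, scc in rows:
        codes[(state, county)].append(scc)
    return {key: sccs for key, sccs in codes.items() if len(sccs) > 1}


# Placeholder codes; only the pattern (same code twice, two different codes) follows the text.
rows = [
    ("Wisconsin", "Waukesha", "code-1"),
    ("Wisconsin", "Waukesha", "code-1"),
    ("Wyoming", "Teton", "code-2"),
    ("Wyoming", "Teton", "code-3"),
    ("Wyoming", "Uinta", "code-4"),
]

flagged = counties_listed_more_than_once(rows)
for (state, county), sccs in flagged.items():
    kind = "same SCC repeated" if len(set(sccs)) == 1 else "different SCCs"
    print(f"{county}, {state}: {kind}")
```

Keying on the state as well as the county name matters because county names recur across states.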

Next, ORES compared the county names in the two files and identified nonmatching names. This review revealed mismatches caused by variant spellings of the county names, such as the following:

State County name from—
LABELS file (current methodology) MGD file
Illinois De Witt Dewitt
Indiana LaGrange Lagrange
Indiana LaPorte La Porte
Louisiana St. Bernard Saint Bernard
Missouri St. Clair Saint Clair
New York St. Lawrence Saint Lawrence
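One way to reconcile spelling variants such as those listed above is to compare names on a normalized key. The rules below (lowercasing, expanding a leading "St." to "Saint", and dropping spaces and punctuation) are assumptions chosen to cover these six examples; they are not ORES's documented procedure.

```python
import re


def normalize_county_name(name: str) -> str:
    """Build a comparison key that ignores case, spacing, and the St./Saint variant."""
    key = name.strip().lower()
    key = re.sub(r"^st\.?\s+", "saint ", key)  # "St. Clair" -> "saint clair"
    return re.sub(r"[^a-z]", "", key)          # drop spaces and punctuation


# (LABELS spelling, MGD spelling) pairs from the text.
variant_pairs = [
    ("De Witt", "Dewitt"),
    ("LaGrange", "Lagrange"),
    ("LaPorte", "La Porte"),
    ("St. Bernard", "Saint Bernard"),
    ("St. Clair", "Saint Clair"),
    ("St. Lawrence", "Saint Lawrence"),
]

for labels_name, mgd_name in variant_pairs:
    same = normalize_county_name(labels_name) == normalize_county_name(mgd_name)
    print(f"{labels_name!r} vs {mgd_name!r}: {'match' if same else 'no match'}")
```

The key is meant for matching only, not for display. Because it keeps only unaccented letters, names containing accented characters (as some Puerto Rico municipality names do) would need additional handling.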
 

After standardizing the spelling of county names, ORES identified the following counties (or county equivalents) in the LABELS file but not the MGD file:

State County name
Alaska Kusilvak
Puerto Rico Puerto Rico
Montana Yellowstone National Park
South Dakota Oglala Lakota
Virginia Clifton Forge City
Virginia Emporia City
Virginia Nansemond City
Virginia South Boston City
 

Kusilvak Census Area in Alaska and Oglala Lakota County in South Dakota were, until 2015, named Wade Hampton Census Area and Shannon County, respectively. The part of Yellowstone National Park located in Montana was a county equivalent until 1978, when the area was absorbed by two adjacent counties.18 Administrative districts called municipalities are the Puerto Rican equivalent of counties, but because no municipality is named “Puerto Rico,” that term's appearance in the county-name data field seems to be similar to “Statewide,” or a proxy for the entire territory. Of the four independent cities in Virginia named in LABELS but not in the MGD file, Clifton Forge and South Boston voluntarily dissolved their charters as independent cities (in 2001 and 1994, respectively), and became part of their surrounding counties; Nansemond merged with Suffolk Independent City in 1974; and Emporia remains an independent city. ORES is in the process of standardizing the county names in the two files.

Comparing County Assignments

The final step in compiling the data that allows a comparison of the two methodologies' county assignments is to compare the number of counties allocated to each state via the two processes. The number of allocated counties differed in six states: The current methodology allocated one more county to Alaska, South Dakota, and Virginia than the MGD process did, and the MGD process allocated one more county to Montana, Puerto Rico, and Texas, and two more counties to Virginia, than the current methodology did.

Regarding the counties that are identified in the current methodology but not the MGD process, 35 workers were assigned by the current process to Kusilvak Census Area in Alaska, 54 were assigned to Oglala Lakota County in South Dakota, and 66 were assigned to the independent city of Emporia in Virginia. Conversely, the MGD process assigned 1 worker to Wibaux County in Montana, 2 workers to Aibonito Municipality in Puerto Rico, 4 workers to Borden County in Texas, and 91 workers to Manassas Park Independent City and 79 workers to Poquoson Independent City in Virginia; the current methodology assigned no workers to those areas. The records for these 332 workers in 8 areas were removed from the merged county-comparison file because the evaluation requires the state and county names to align across the two methodologies. The resulting file contains records for 1,731,546 workers and 3,202 counties.
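The removal step can be sketched as a symmetric difference between the sets of areas to which each methodology assigned workers. The worker counts for the eight unmatched areas are those reported above; the "Example County" row and its counts are hypothetical and are included only to show that an area present in both files is retained.

```python
def unmatched_areas(current_counts: dict, mgd_counts: dict) -> dict:
    """Areas with workers under exactly one methodology, with their worker counts."""
    only_one = set(current_counts) ^ set(mgd_counts)
    return {
        area: current_counts.get(area, mgd_counts.get(area))
        for area in sorted(only_one)
    }


current_counts = {
    ("Alaska", "Kusilvak"): 35,
    ("South Dakota", "Oglala Lakota"): 54,
    ("Virginia", "Emporia"): 66,
    ("Example State", "Example County"): 100,  # hypothetical shared area
}
mgd_counts = {
    ("Montana", "Wibaux"): 1,
    ("Puerto Rico", "Aibonito"): 2,
    ("Texas", "Borden"): 4,
    ("Virginia", "Manassas Park"): 91,
    ("Virginia", "Poquoson"): 79,
    ("Example State", "Example County"): 101,  # hypothetical shared area
}

removed = unmatched_areas(current_counts, mgd_counts)
print(len(removed), "areas,", sum(removed.values()), "workers removed")
```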

Evaluating the County-Level Estimates

With the preliminary processes complete, the resulting merged file allows a comparison of current-methodology and MGD-process county-level estimates of worker counts by type of earnings. Note that the MGD process, unlike the current methodology, does not generate any county-level estimates if the microdata file has no workers with a given type of earnings in that county. This has a pronounced effect on the number of counties to which self-employed individuals are assigned.

Table 17 shows the numbers and percentages of workers whose records have matching and nonmatching county assignments by type of earnings. More than 97 percent of the individuals represented in the county-comparison file have earnings that are taxable under Social Security. Among all workers with taxable earnings, the county-assignment match rate is 94.5 percent. Workers with OASDI taxable wage and salary earnings account for 90.3 percent of the workers in the county-comparison file. For them, the match rate for county assignments is also 94.5 percent.

Table 17. County assignment match rates between the current methodology and the MGD process, by type of earnings, tax year 2017
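Columns: Records in microdata file; All workers, number; All workers, percent of workers in microdata file; Worker records with matching county assignments, number; Worker records with matching county assignments, percent; Worker records with nonmatching county assignments, number; Worker records with nonmatching county assignments, percent; Counties represented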
Total 1,731,546 100.00 . . . . . . . . . . . . 3,202
Workers with taxable earnings 1,688,819 97.53 1,596,103 94.51 92,716 5.49 3,202
Wage and salary 1,563,334 90.29 1,477,184 94.49 86,150 5.51 3,202
Self-employed 184,978 10.68 170,637 92.25 14,341 7.75 3,140
SOURCE: Author's calculations using 2017 LABELS, MGD, and merged ASA-MGD files.
NOTES: Because some workers accrued both wage and salary and self-employment earnings, the sum of those two categories exceeds the numbers of all workers with taxable earnings (and all workers represented in the microdata file).
. . . = not applicable.
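The match rates in Table 17 follow directly from the matching and nonmatching record counts. The short sketch below recomputes them from the table's own figures; the variable and function names are illustrative.

```python
# (matching, nonmatching) worker records by type of earnings, from Table 17.
TABLE_17 = {
    "Workers with taxable earnings": (1_596_103, 92_716),
    "Wage and salary": (1_477_184, 86_150),
    "Self-employed": (170_637, 14_341),
}


def match_rate(matching: int, nonmatching: int) -> float:
    """Percentage of records whose county assignment agrees across the two processes."""
    return 100.0 * matching / (matching + nonmatching)


rates = {
    group: round(match_rate(matching, nonmatching), 2)
    for group, (matching, nonmatching) in TABLE_17.items()
}
for group, rate in rates.items():
    print(f"{group}: {rate:.2f} percent")
```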

Nearly 11 percent of workers represented in the county-comparison file have OASDI taxable self-employment income. Among them, the match rate for county assignments is 92.3 percent. Note that because self-employed individuals are far fewer than wage and salary workers, 62 counties have at least one of the latter but none of the former, resulting in fewer counties assigned for self-employed individuals (3,140) than for wage and salary workers (3,202). Table 18 shows the county-assignment match rates by state.19 The match rates for all workers range from a high of 99.3 percent for Hawaii to a low of 80.3 percent for Virginia.

Table 18. Number of workers with Social Security (OASDI) taxable earnings for whom the county assigned using the current methodology and the MGD process matches, by state or other area and type of earnings, tax year 2017
Columns: State or area; then, for each of All, Wage and salary, and Self-employed in turn: Worker records; County code matches, number; County code matches, percent; Number of counties
All areas 1,668,819 1,577,202 94.51 3,202 1,563,334 1,477,184 94.49 3,202 184,978 170,637 92.25 3,140
Alabama 23,818 22,233 93.35 67 22,493 21,007 93.39 67 2,410 2,160 89.63 67
Alaska 3,657 3,623 99.07 22 3,438 3,405 99.04 22 395 384 97.22 21
Arizona 33,700 32,810 97.36 15 31,764 30,922 97.35 15 3,450 3,283 95.16 15
Arkansas 14,478 13,750 94.97 75 13,562 12,887 95.02 75 1,608 1,466 91.17 75
California 188,006 183,290 97.49 58 172,485 168,093 97.45 58 24,969 23,862 95.57 56
Colorado 28,982 23,907 82.49 63 26,933 22,170 82.32 63 3,626 2,927 80.72 61
Connecticut 19,587 19,060 97.31 8 18,293 17,804 97.33 8 2,225 2,118 95.19 8
Delaware 5,177 5,075 98.03 3 4,962 4,862 97.98 3 422 411 97.39 3
District of Columbia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Florida 104,227 99,721 95.68 67 96,233 92,008 95.61 67 13,422 12,526 93.32 67
Georgia 52,187 46,476 89.06 159 48,814 43,474 89.06 159 6,010 5,176 86.12 155
Hawaii 7,461 7,405 99.25 4 6,939 6,886 99.24 4 850 836 98.35 4
Idaho 8,798 8,499 96.60 44 8,258 7,976 96.59 44 950 893 94.00 42
Illinois 66,187 61,773 93.33 102 62,195 58,023 93.29 102 7,214 6,623 91.81 101
Indiana 36,428 35,043 96.20 92 34,827 33,540 96.30 92 3,115 2,878 92.39 92
Iowa 17,645 16,826 95.36 99 16,687 15,950 95.58 99 1,844 1,662 90.13 98
Kansas 15,738 15,315 97.31 105 14,861 14,465 97.34 105 1,640 1,558 95.00 104
Kentucky 22,161 21,157 95.47 120 20,943 20,002 95.51 120 2,171 2,009 92.54 118
Louisiana 21,576 20,338 94.26 64 20,140 18,973 94.21 64 2,534 2,350 92.74 62
Maine 7,133 6,994 98.05 16 6,600 6,465 97.95 16 912 883 96.82 16
Maryland 33,182 30,384 91.57 24 31,388 28,732 91.54 24 3,368 3,003 89.16 24
Massachusetts 36,422 35,445 97.32 14 34,006 33,083 97.29 14 4,144 3,951 95.34 14
Michigan 51,993 49,248 94.72 83 49,183 46,586 94.72 83 5,202 4,816 92.58 83
Minnesota 32,227 29,958 92.96 87 30,575 28,425 92.97 87 3,198 2,912 91.06 87
Mississippi 14,291 13,452 94.13 82 13,399 12,600 94.04 82 1,691 1,562 92.37 81
Missouri 31,474 28,454 90.40 115 29,766 26,905 90.39 115 3,183 2,802 88.03 115
Montana 5,715 5,622 98.37 55 5,343 5,261 98.47 55 664 626 94.28 53
Nebraska 10,846 10,228 94.30 93 10,245 9,676 94.45 93 1,142 1,047 91.68 84
Nevada 13,924 13,775 98.93 17 13,090 12,953 98.95 17 1,453 1,407 96.83 15
New Hampshire 8,025 7,928 98.79 10 7,518 7,424 98.75 10 825 806 97.70 10
New Jersey 49,323 48,086 97.49 21 46,376 45,192 97.45 21 5,278 5,041 95.51 21
New Mexico 9,730 9,297 95.55 33 9,189 8,771 95.45 33 931 875 93.98 30
New York 105,544 102,894 97.49 62 98,464 95,944 97.44 62 12,456 11,900 95.54 61
North Carolina 52,259 49,334 94.40 100 49,224 46,479 94.42 100 5,460 5,007 91.70 100
North Dakota 4,364 4,233 97.00 53 4,118 4,001 97.16 53 505 475 94.06 50
Ohio 58,197 54,923 94.37 88 54,742 51,646 94.34 88 5,885 5,498 93.42 88
Oklahoma 19,579 17,637 90.08 77 18,444 16,616 90.09 77 2,037 1,774 87.09 76
Oregon 21,652 19,965 92.21 36 20,306 18,729 92.23 36 2,284 2,048 89.67 35
Pennsylvania 68,629 65,547 95.51 67 65,159 62,255 95.54 67 6,410 5,963 93.03 67
Rhode Island 5,925 5,851 98.75 5 5,611 5,540 98.73 5 587 564 96.08 5
South Carolina 25,448 23,533 92.47 46 24,147 22,327 92.46 46 2,448 2,195 89.67 46
South Dakota 5,100 4,743 93.00 65 4,789 4,455 93.03 65 606 545 89.93 63
Tennessee 34,939 33,201 95.03 95 32,584 30,962 95.02 95 4,121 3,789 91.94 95
Texas 133,970 124,157 92.68 252 124,227 115,108 92.66 252 16,603 14,957 90.09 240
Utah 16,266 15,992 98.32 29 15,592 15,328 98.31 29 1,481 1,417 95.68 28
Vermont 3,757 3,614 96.19 14 3,524 3,389 96.17 14 433 406 93.76 14
Virginia 45,625 36,642 80.31 130 43,268 34,768 80.35 130 4,468 3,392 75.92 128
Washington 39,425 38,303 97.15 39 37,367 36,298 97.14 39 3,623 3,444 95.06 39
West Virginia 8,358 7,984 95.53 55 7,972 7,620 95.58 55 681 631 92.66 53
Wisconsin 32,506 31,108 95.70 72 31,055 29,734 95.75 72 2,714 2,531 93.26 72
Wyoming 3,205 3,161 98.63 23 3,026 2,984 98.61 23 355 340 95.77 23
Puerto Rico 9,973 9,208 92.33 77 9,210 8,481 92.08 77 975 908 93.13 75
SOURCE: Author's calculations using 2017 LABELS, MGD, and merged ASA-MGD files.
NOTES: Because some workers accrued both wage and salary and self-employment earnings, the sum of those two categories exceeds the number of all workers with taxable earnings.
. . . = not applicable.

As noted earlier, the critical limitation of the current methodology is that data disclosure restrictions require some estimates to be suppressed, and estimates based on a 1-percent sample of self-employed individuals fall under that rule in many counties. Table 19 shows, for each state, the percentage distribution of counties by the number of self-employed workers with Social Security taxable earnings who have records assigned by the current methodology to that county. Nearly 30 percent of Alabama's 67 counties, for example, have fewer than 10 self-employed individuals assigned to them in the current methodology, and primary cell suppression rules require SSA to suppress the estimates for those counties in Earnings and Employment. Although the estimated number of wage and salary workers exceeds 10 in most if not all of those counties, secondary cell suppression rules require SSA to suppress those estimates as well. In total, more than 37 percent of the county estimates for self-employed individuals (and, therefore, also for wage and salary workers) must be suppressed.

Table 19. Percentage distribution of counties by the number of self-employed individuals with Social Security (OASDI) taxable earnings identified under the current methodology, by state or other area, tax year 2017
State or area Total 0–9 10–19 20–29 30–49 50–99 100–249 250–499 500–999 1,000 or more
All areas 100.00 37.36 22.90 11.11 9.59 8.38 5.92 2.61 1.43 0.70
Alabama 100.00 29.85 26.87 13.43 8.96 13.43 5.97 1.49 0.00 0.00
Alaska 100.00 66.67 9.52 4.76 4.76 9.52 4.76 0.00 0.00 0.00
Arizona 100.00 13.33 6.67 6.67 13.33 33.33 13.33 0.00 6.67 6.67
Arkansas 100.00 41.33 32.00 10.67 8.00 4.00 4.00 0.00 0.00 0.00
California 100.00 7.14 12.50 5.36 7.14 12.50 21.43 12.50 8.93 12.50
Colorado 100.00 45.90 18.03 8.20 6.56 6.56 4.92 8.20 1.64 0.00
Connecticut 100.00 0.00 0.00 0.00 0.00 37.50 25.00 25.00 12.50 0.00
Delaware 100.00 0.00 0.00 0.00 0.00 33.33 66.67 0.00 0.00 0.00
District of Columbia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Florida 100.00 17.91 13.43 8.96 5.97 14.93 17.91 11.94 4.48 4.48
Georgia 100.00 41.94 25.16 7.74 7.74 10.97 3.87 0.00 2.58 0.00
Hawaii 100.00 0.00 0.00 0.00 0.00 25.00 50.00 0.00 25.00 0.00
Idaho 100.00 57.14 16.67 11.90 4.76 4.76 2.38 2.38 0.00 0.00
Illinois 100.00 34.65 27.72 11.88 8.91 6.93 4.95 2.97 0.99 0.99
Indiana 100.00 28.26 33.70 9.78 11.96 9.78 5.43 0.00 1.09 0.00
Iowa 100.00 38.78 42.86 8.16 3.06 5.10 1.02 1.02 0.00 0.00
Kansas 100.00 67.31 19.23 5.77 2.88 2.88 0.96 0.96 0.00 0.00
Kentucky 100.00 54.24 24.58 11.02 5.08 3.39 0.85 0.85 0.00 0.00
Louisiana 100.00 24.19 35.48 8.06 8.06 12.90 8.06 3.23 0.00 0.00
Maine 100.00 6.25 0.00 37.50 18.75 25.00 6.25 6.25 0.00 0.00
Maryland 100.00 0.00 12.50 16.67 20.83 12.50 20.83 12.50 4.17 0.00
Massachusetts 100.00 0.00 0.00 7.14 14.29 7.14 28.57 28.57 7.14 7.14
Michigan 100.00 21.69 27.71 13.25 13.25 13.25 6.02 2.41 2.41 0.00
Minnesota 100.00 26.44 39.08 11.49 9.20 8.05 3.45 1.15 1.15 0.00
Mississippi 100.00 34.57 38.27 9.88 7.41 7.41 2.47 0.00 0.00 0.00
Missouri 100.00 40.87 30.43 13.04 7.83 2.61 2.61 2.61 0.00 0.00
Montana 100.00 73.58 11.32 1.89 5.66 7.55 0.00 0.00 0.00 0.00
Nebraska 100.00 66.67 19.05 9.52 2.38 0.00 1.19 1.19 0.00 0.00
Nevada 100.00 53.33 6.67 13.33 13.33 0.00 6.67 0.00 0.00 6.67
New Hampshire 100.00 0.00 0.00 0.00 20.00 40.00 10.00 10.00 0.00 0.00
New Jersey 100.00 0.00 0.00 0.00 9.52 19.05 28.57 33.33 9.52 0.00
New Mexico 100.00 36.67 30.00 10.00 13.33 0.00 6.67 3.33 0.00 0.00
New York 100.00 3.28 11.48 16.39 26.23 13.11 13.11 4.92 6.56 4.92
North Carolina 100.00 19.00 20.00 17.00 17.00 16.00 8.00 1.00 2.00 0.00
North Dakota 100.00 74.00 14.00 4.00 6.00 2.00 0.00 0.00 0.00 0.00
Ohio 100.00 10.23 19.32 23.86 18.18 13.64 10.23 1.14 3.41 0.00
Oklahoma 100.00 44.74 26.32 15.79 7.89 1.32 1.32 2.63 0.00 0.00
Oregon 100.00 25.71 22.86 8.57 20.00 2.86 17.14 0.00 2.86 0.00
Pennsylvania 100.00 10.45 16.42 14.93 17.91 17.91 11.94 5.97 4.48 0.00
Rhode Island 100.00 0.00 0.00 0.00 20.00 60.00 0.00 20.00 0.00 0.00
South Carolina 100.00 23.91 19.57 13.04 13.04 10.87 15.22 4.35 0.00 0.00
South Dakota 100.00 69.84 23.81 3.17 0.00 1.59 1.59 0.00 0.00 0.00
Tennessee 100.00 28.42 26.32 16.84 11.58 9.47 4.21 1.05 2.11 0.00
Texas 100.00 44.58 18.33 9.58 10.42 6.25 5.42 2.50 1.25 1.67
Utah 100.00 53.57 10.71 10.71 3.57 10.71 3.57 3.57 3.57 0.00
Vermont 100.00 14.29 7.14 42.86 21.43 7.14 7.14 0.00 0.00 0.00
Virginia 100.00 41.41 21.88 10.16 11.72 6.25 7.81 0.00 0.78 0.00
Washington 100.00 20.51 17.95 20.51 12.82 7.69 12.82 5.13 0.00 2.56
West Virginia 100.00 60.38 18.87 7.55 11.32 1.89 0.00 0.00 0.00 0.00
Wisconsin 100.00 20.83 27.78 18.06 12.50 16.67 1.39 2.78 0.00 0.00
Wyoming 100.00 56.52 8.70 21.74 13.04 0.00 0.00 0.00 0.00 0.00
Puerto Rico 100.00 66.67 20.00 4.00 2.67 5.33 0.00 0.00 0.00 0.00
SOURCE: Author's calculations using 2017 LABELS, MGD, and merged ASA-MGD files.
NOTE: . . . = not applicable.

Further, as noted earlier, publishing county-level estimates by worker sex requires that a county contain a minimum of 20 self-employed individuals to meet the data disclosure threshold. Adding this restriction requires SSA to suppress more than 60 percent of the county-level estimates for self-employed individuals, and secondary cell suppression applies to the corresponding county estimates for wage and salary workers. Given the complexity of incorporating data disclosure procedures into the large number of county-level estimates that would have to be generated, ORES decided to forgo any attempt to compare the estimates of the amount of taxable OASDI earnings for the two methodologies. These circumstances highlight the importance of using a much larger sample of workers to generate the annual employment and earnings estimates.
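The suppression shares cited above can be recovered from the "All areas" row of Table 19. The sketch below also shows one way a county's count of self-employed individuals could be placed in the table's size classes; the bin edges follow the column headings, while the names and the use of `bisect` are assumptions of this sketch.

```python
from bisect import bisect_right

# Lower bounds of every size class after the first, following Table 19's columns.
BIN_EDGES = [10, 20, 30, 50, 100, 250, 500, 1000]
BIN_LABELS = [
    "0–9", "10–19", "20–29", "30–49", "50–99",
    "100–249", "250–499", "500–999", "1,000 or more",
]


def size_class(self_employed_count: int) -> str:
    """Return the Table 19 size class for a county's unweighted count."""
    return BIN_LABELS[bisect_right(BIN_EDGES, self_employed_count)]


# Percentage of counties in the two smallest classes, "All areas" row of Table 19.
share_0_to_9 = 37.36
share_10_to_19 = 22.90

primary_suppressed = share_0_to_9                            # fewer than 10
by_sex_suppressed = round(share_0_to_9 + share_10_to_19, 2)  # fewer than 20

print(f"suppressed in total: {primary_suppressed:.2f} percent of counties")
print(f"suppressed by sex:   {by_sex_suppressed:.2f} percent of counties")
```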

Conclusion

This article presents two distinct assessments of the MGD process: a procedural evaluation of the completeness and consistency of the MGD data produced over time and a comparison of current-methodology and MGD-process assignment of residential location and demographic data for earners in tax year 2017. The procedural evaluation shows very consistent outcomes for the MGD process across tax years 2015–2020. Although the procedural evaluation identified some minor issues that ORES is investigating, it found that the MGD process is robust and working as expected. In comparing the estimated number of all workers with taxable earnings, the state code assigned in the MGD process matched that of the current methodology for 98.9 percent of the records (Table 9). As was expected prior to the evaluation, the match rate for county assignments was lower, at 94.5 percent (Table 17). The primary reason for occasional disagreement between the two methodologies is a difference in the level of detail with which geographic information is recorded. The current methodology assigns county codes using only the first five letters of the city name and the five-digit ZIP Codes reported on the workers' tax forms. Additionally, the current process uses the SCCs generated for two different data files within the CWHS system and does not consistently select the code from only one of those files. ORES believes that the MGD process is more accurate because it relies on more recently developed software that uses the full address information reported on workers' tax forms to assign SCCs.

Worker sex and age identified in the MGD process match those identified in the current methodology at very high rates (99.3 percent and 98.9 percent, respectively; Tables 13 and 15). The current methodology extracts age and sex information from either of two different CWHS files. In theory, the values in these files should match. Although the nonmatch rates for sex and age are low, ORES believes that the MGD process is the more accurate of the two methodologies because it assigns sex and age identifiers based on a single authoritative source.

The evaluation's results are encouraging. ORES will continue developing the MGD process to provide a streamlined, modern method of generating its annual earnings estimates using a much larger sample of earners. Using a larger sample will eliminate the need for cell suppression in many instances and enable ORES statistical publications to report county-level estimates with much greater depth and accuracy.

Notes

1 A tax year is the calendar year in which wage, salary, or self-employment income is earned.

2 The current methodology was developed in the 1990s, when limited computer storage capacity required ORES to abbreviate city names to their first five letters and use five-digit (rather than nine-digit) ZIP Codes in its geographic data fields.

3 IRS Form W-2 is the annual wage and tax statement that employers file on behalf of employees. Form W-2c, “Corrected Wage and Tax Statement,” is filed when a worker's original W-2 contained any errors or otherwise needs to be updated.

4 Finalist is capable of assigning SCCs using full addresses with nine-digit ZIP Codes rather than relying on the five-digit ZIP Codes, which sometimes cross county lines, and the abbreviated city names that the current methodology uses to assign SCCs.

5 The Numident contains records for all SSNs ever issued. The information is derived from SSA Form SS-5, the application for an SSN, which contains the individual's name, place and date of birth, and sex.

6 For all tax years except 2015, the percentage of workers who were not assigned an SCC by the OEIS/Finalist process was less than 1 percent. The lack of an assigned SCC may be caused by an incomplete address on the worker's tax form or the absence of an address in the underlying Finalist database that contains every U.S. postal delivery address. (The software cross-references the address reported on tax sources with the postal delivery data file to assign SCCs.)

7 This information is included on the tax forms but the OEIS process uses only the address information because its sole focus is on assigning an SCC for each job.

8 I discuss the results of those determinations later.

9 The COVID-19 pandemic led to a significant backlog in Schedule SE processing in 2021.

10 The invalid SSNs can be used in the process of assigning a single SCC to each worker and their use enables ORES to have a complete picture of the geographic location of the worker population in a given tax year. As previously noted, the sex and date of birth for these workers cannot be identified.

11 Compson (2022) discusses the limitations of the methodology currently used to assign geographic codes to workers in the CWHS.

12 The timing of the processing depends on the timing of the tax form submissions by employers and self-employed workers. In SSA, the processing year typically runs through December 15th, meaning that some forms are likely to be submitted and processed early. In addition, the COVID-19 pandemic led to delays in submitting and processing some tax forms.

13 Modifications are necessary because the published estimates are weighted and adjusted to reflect a nationwide population of workers based on a 1-percent sample. To enable a comparison of statistically compatible estimates, the modification entails using the unweighted and unadjusted raw data from the 1-percent CWHS that underlie the published estimates rather than the published estimates themselves.

14 These workers are included in the “Other” category in Earnings and Employment and the “Other and unknown” category in the Annual Statistical Supplement.

15 For brevity, the District of Columbia is referred to as a state throughout the discussion to follow.

16 Because wage and salary workers vastly outnumber self-employed individuals, similarity in the match rates for all workers and for wage and salary workers is a recurring pattern in the evaluation.

17 Hereafter, “counties” can be assumed to include county equivalents such as independent cities, parishes, and census areas.

18 Although Montana dissolved the area as a standalone county equivalent in 1978, the Census Bureau continued to recognize the area as a county equivalent until 1997.

19 Because the District of Columbia does not have county-equivalent subdistricts, it is included among the Earnings and Employment tables showing statistics by state but not among those showing statistics by county. Therefore, Tables 18 and 19 omit values for the District of Columbia (and include those for Puerto Rico, which is covered in the Earnings and Employment tables showing statistics by county).

References

Centers for Disease Control and Prevention. 2022. “Monthly Counts of Deaths by Select Causes, 2014–2019.” https://data.cdc.gov/NCHS/Monthly-Counts-of-Deaths-by-Select-Causes-2014-201/bxq8-mugm.

———. 2023. “Monthly Provisional Counts of Deaths by Select Causes, 2020–2023.” https://data.cdc.gov/NCHS/Monthly-Provisional-Counts-of-Deaths-by-Select-Cau/9dzk-mvmi.

Compson, Michael. 2022. “Improving County-Level Earnings Estimates with a New Methodology for Assigning Geographic and Demographic Information to U.S. Workers.” Social Security Bulletin 82(1): 11–28.

[SSA] Social Security Administration. 2017. “Updated 2018 Taxable Maximum Amount Announced.” Press Release. https://www.ssa.gov/news/press/releases/2017/#11-2017-1.