Earnings Public-Use File, 2006
Social Security is a social insurance program that pays benefits to insured workers and eligible family members based on covered earnings. The 2006 Earnings Public-Use File (EPUF) contains administrative earnings data from the Social Security program and limited demographic data for more than 4 million individuals. Its sample size is larger than that of any other SSA public-use microdata file containing administrative earnings data.
The 2006 Earnings Public-Use File (EPUF) is a data file containing earnings records for individuals drawn from a systematic 1-percent random sample of all Social Security numbers issued before January 1, 2007. With a few minor exceptions, all of the data in this file are from the summary segment of the Social Security Administration's Master Earnings File, the administrative file used to determine an individual's eligibility for Social Security benefits and the amount of benefits paid.
The EPUF consists of two separate, linkable subfiles—one with demographic information (the demographic subfile) and one with annual earnings information for 1951–2006 (the annual earnings subfile). Each record has a unique, randomly assigned identifier allowing linkage across subfiles. The demographic subfile contains 4,384,254 records, one for each individual included in EPUF, and includes aggregate earnings information for 1937–1950. The annual earnings subfile contains 60,326,474 earnings records for 3,131,424 individuals who had positive earnings in at least 1 year during 1951–2006.
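Because the two subfiles share a unique identifier, linking them is a left join from the demographic subfile onto the annual earnings subfile. Only 3,131,424 of the 4,384,254 individuals appear in the annual earnings subfile, so some demographic records have no matching earnings rows. The sketch below assumes a hypothetical identifier column named `id` (take the real name from the data dictionary) and plain dictionaries for rows. It is an illustration, not SSA's own procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def link_subfiles(
    demographic_rows: Iterable[Dict[str, str]],
    earnings_rows: Iterable[Dict[str, str]],
    id_col: str = "id",  # hypothetical column name; check the data dictionary
) -> List[dict]:
    """Left-join annual earnings rows onto demographic records by identifier."""
    # Group the annual earnings rows by person identifier.
    earnings_by_id: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in earnings_rows:
        earnings_by_id[row[id_col]].append(row)

    linked = []
    for person in demographic_rows:
        # People with no positive earnings in 1951-2006 have no annual rows,
        # so they keep their demographic record with an empty earnings list.
        # .get() avoids inserting empty keys into the defaultdict.
        linked.append({**person, "annual": earnings_by_id.get(person[id_col], [])})
    return linked
```

Holding all 60,326,474 annual rows in memory this way requires substantial RAM, so for the full files a database join or chunked processing may be the more practical choice.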
Because this public-use file is based on a systematic 1-percent random sample and the design effect is effectively equal to one, all records have a weight equal to 100. Variances and standard errors can be approximated with the standard formulas used for simple random sampling.
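With a constant weight of 100, a population total is the sample count times 100, and the standard error of an estimated proportion follows the simple-random-sampling approximation sqrt(p(1 - p) / n), optionally multiplied by the finite population correction sqrt(1 - f) with sampling fraction f = 0.01. The function names below are hypothetical; this is a minimal sketch of that arithmetic, not an SSA-provided tool.

```python
import math
from typing import Tuple

WEIGHT = 100                      # every EPUF record carries a weight of 100
SAMPLING_FRACTION = 1 / WEIGHT    # 1-percent sample, so f = 0.01


def estimate_total(sample_count: int) -> int:
    """Expand a sample count to an estimated population total."""
    return sample_count * WEIGHT


def proportion_with_se(
    successes: int, n: int, use_fpc: bool = True
) -> Tuple[float, float]:
    """Estimate a proportion and its approximate simple-random-sampling SE.

    The optional finite population correction (1 - f) shrinks the
    variance by 1 percent when f = 0.01.
    """
    p = successes / n
    variance = p * (1 - p) / n
    if use_fpc:
        variance *= 1 - SAMPLING_FRACTION
    return p, math.sqrt(variance)
```

For example, `estimate_total(4_384_254)` gives 438,425,400, an estimate of the number of Social Security numbers issued before January 1, 2007, and `proportion_with_se(3_131_424, 4_384_254)` estimates the share of sampled individuals with positive earnings in at least 1 year during 1951–2006 (about 71.4 percent) together with its standard error.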
To prevent identification of an individual, SSA:
- Removed any identifiable information and evaluated the risk of disclosure from overlap with other SSA public-use files,
- Adjusted earnings amounts to create a range of uncertainty between the amount of earnings reported to SSA and the amount contained in EPUF, and
- Zeroed out earnings records because of age considerations.
Further details on our disclosure protection procedures are available in The 2006 Earnings Public-Use Micro Data File: An Introduction.
Available Files
- Data Dictionary and Field Descriptors
- 2006 Earnings Files, including data dictionary, demographic subfile, and annual earnings subfile
  - CSV format (285 MB ZIP file, which unzips to 1.9 GB of subfiles)
  - SAS format (362 MB ZIP file, which unzips to 1.5 GB of subfiles)
Because this data set is very large, it will not work properly in Microsoft Excel. Use data software capable of handling large files instead.
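One way to work with the CSV files without loading them whole is to stream them row by row. The sketch below sums earnings per year from the annual earnings subfile while keeping only one row in memory at a time. The column names `year` and `earnings` are hypothetical placeholders; take the actual field names from the data dictionary.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable


def total_earnings_by_year(
    lines: Iterable[str],
    year_col: str = "year",        # hypothetical column name
    earn_col: str = "earnings",    # hypothetical column name
) -> Dict[int, float]:
    """Stream CSV lines and sum sample earnings for each year."""
    totals: Dict[int, float] = defaultdict(float)
    # csv.DictReader reads lazily, so the whole file never sits in memory.
    for row in csv.DictReader(lines):
        totals[int(row[year_col])] += float(row[earn_col])
    return dict(totals)
```

With a real file, pass an open handle, for example `with open(path, newline="") as f: totals = total_earnings_by_year(f)`. Multiplying each sample total by the constant weight of 100 gives an estimated population total.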