Study: Social Security numbers are predictable

TL;DR: SSNs suck. If you know someone's birthday and place of birth, they're super easy to guess.

https://www.computerworld.com/article/2526471/study--social-security-numbers-are-predictable.amp.html

Social Security numbers (SSNs) may not be as random as believed, as a new study contends that powerful mathematical techniques combined with open-source research can, in some cases, reveal a person's secret number.

The study, published on Monday in the journal Proceedings of the National Academy of Sciences, serves as a stark warning that SSNs are increasingly vulnerable, putting more people at risk of identity theft.

"Unless mitigating strategies are implemented, the predictability of SSNs exposes them to risks of identity theft on mass scales," the study said.

The study comes from Carnegie Mellon University's Alessandro Acquisti, an assistant professor of information technology and public policy, and Ralph Gross, a postdoctoral researcher.

The Social Security Administration responded on Tuesday, saying the public should not be alarmed since there is no foolproof method for predicting an SSN. However, the agency said it is developing a new system to randomly assign SSNs that will be in place next year, although those efforts are unrelated to the study.

"The method by which Social Security assigns numbers has been a matter of public record for years," the statement said. "The suggestion that Mr. Acquisti has cracked a code for predicting an SSN is a dramatic exaggeration."

Gross and Acquisti developed an algorithm that analyzed data from the Social Security Administration's Death Master File, a public database of some 65 million Americans who have died and their SSNs, which is used for antifraud purposes.

They looked for numerical patterns in the deceased's SSNs, drawing correlations between where a person was born and their birth date and how that data relates to their SSN.

"Our prediction algorithm exploits the observation that individuals with close birth dates and identical state of SSN assignment are likely to share similar SSNs," they wrote.

The first three digits of an SSN are an area number, which is based on the ZIP code of the mailing address provided when a card was applied for. The next two digits are a group number, which is assigned in a "precise but nonconsecutive order between one and 99." The last four digits are a serial number.
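The three-field layout described above can be sketched in a few lines of Python. This is only an illustration of the format: the names `SSNParts` and `split_ssn` are made up for this example, and the sample value is an obvious placeholder rather than a real number.

```python
from typing import NamedTuple


class SSNParts(NamedTuple):
    area: str    # first three digits (area number)
    group: str   # next two digits (group number)
    serial: str  # last four digits (serial number)


def split_ssn(ssn: str) -> SSNParts:
    """Split a nine-digit SSN, with or without dashes, into its three fields."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected nine digits, got {ssn!r}")
    return SSNParts(digits[:3], digits[3:5], digits[5:])


parts = split_ssn("123-45-6789")  # placeholder value, not a real SSN
print(parts.area, parts.group, parts.serial)  # 123 45 6789
```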

The algorithm, which the authors did not detail, successfully ascertained the first five digits for 44% of the records in the Death Master File for people born between 1989 and 2003. The complete SSN could be picked out for 8.5% of those people in under 1,000 attempts. For people born between 1973 and 1988, the algorithm could predict the first five digits for 7% of those in the Death Master File.

"SSNs were designed as identifiers at a time when personal computers and identity theft were unthinkable," the study said.

Other changes in how the Social Security Administration assigns numbers have made guessing even easier. In 1989, the agency started a program called Enumeration at Birth, assigning SSNs to newborns as part of the birth certification process.

The changes, however, increased the correlation between a person's birth date and all nine digits of an SSN, especially for people in less populated states, making SSNs easier to discover, the researchers wrote.

Additionally, the proliferation of information on social-networking profiles, such as a person's hometown and birth date, puts people at greater risk, since that information could be used to infer SSNs.

"Such findings highlight the hidden privacy costs of widespread information dissemination and the complex interactions among multiple data sources in modern information economies," the researchers wrote.

Attackers could then take the SSNs they think are accurate and run them through credit approval services. Even though many of those services will limit the number of attempts to verify data, botnets could be employed to test vast numbers of SSNs to ensure they're valid, they wrote.

The Social Security Administration also said that it has cautioned the private sector against using SSNs as a personal identifier.


https://www.pnas.org/doi/10.1073/pnas.0904891106

Paper abstract:

Information about an individual's place and date of birth can be exploited to predict his or her Social Security number (SSN). Using only publicly available information, we observed a correlation between individuals' SSNs and their birth data and found that for younger cohorts the correlation allows statistical inference of private SSNs. The inferences are made possible by the public availability of the Social Security Administration's Death Master File and the widespread accessibility of personal information from multiple sources, such as data brokers or profiles on social networking sites. Our results highlight the unexpected privacy consequences of the complex interactions among multiple data sources in modern information economies and quantify privacy risks associated with information revelation in public forums.

PDF:

https://www.pnas.org/doi/pdf/10.1073/pnas.0904891106?download=true


Research website:

https://www.heinz.cmu.edu/~acquisti/ssnstudy/

Q. What exactly does it mean that SSNs are "predictable"?

It means that information about an individual's state and date of birth can be sufficient to statistically infer narrow ranges of values wherein that individual's SSN is likely to fall.

"Can," because this is true (in general, and simplifying things a bit) only for individuals who received their SSN around the time of their birth (by 2005, at least 92 percent of SSNs assigned to US citizens were assigned at birth [SSA, 2006]; the percentages of individuals receiving their SSNs around the time of their birth started increasing dramatically in the late 1980s as a result of the Enumeration at Birth initiative).

"Ranges of values" means that the predictions are based on statistical inferences: in general, the first 5 digits can be predicted with a very high degree of accuracy with a single attempt - especially for individuals born after 1988 and in less populous states. In some cases, we were able to predict the whole 9 digits of individual SSNs at the very first attempt. More often, the predictions produce windows of values that are likely to include the actual 9 digits. These windows can be very large (and, therefore, inaccurate) for certain years and states (for instance, for individuals born in California in 1973), but can get very narrow (and therefore more concerning, in terms of identity theft risks) for smaller states and recent years (for instance, 1 out of 20 SSNs of individuals born in DE in 1996 in our dataset could be identified with just 10 or fewer attempts per SSN).

Q. How do your SSN predictions work?

Our predictions are based on the fact that SSNs are assigned according to a complex yet regular - and therefore predictable - pattern. The prediction works based on the interpolation of an individual's date and state of birth with SSN issuance patterns derived from the so-called "Death Master File", a publicly available file reporting SSNs, names, dates of birth and death, and states of SSN application for individuals whose deaths have been reported to the SSA (also popularly known as SSDI or SSN Death Index). Part of the process is described in the PNAS paper. Certain details have been omitted from publication.
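To make the idea of a "window of values" concrete, here is a toy sketch. It is not the authors' algorithm, whose details were omitted from publication. It only assumes, as the paper observes, that people with close birth dates in the same state tend to have similar SSNs. The function name `predict_window`, the parameter `k`, and every record in `reference` are invented for illustration (the 900 prefix is used to mark the values as synthetic).

```python
from bisect import bisect_left
from datetime import date


def predict_window(reference, birth, k=2):
    """Return (low, high) of the SSN values held by up to k reference records
    on each side of `birth`.

    `reference` is a list of (birth_date, ssn_as_int) pairs for one state,
    sorted by birth date.
    """
    dates = [d for d, _ in reference]
    i = bisect_left(dates, birth)                 # where `birth` would be inserted
    neighbours = reference[max(0, i - k): i + k]  # records just before and after
    if not neighbours:
        raise ValueError("no reference records")
    values = [ssn for _, ssn in neighbours]
    return min(values), max(values)


# Synthetic reference records for a single hypothetical state (not real SSNs).
reference = [
    (date(1996, 1, 3), 900_10_0100),
    (date(1996, 1, 10), 900_10_0450),
    (date(1996, 1, 17), 900_10_0820),
    (date(1996, 1, 24), 900_10_1200),
    (date(1996, 1, 31), 900_10_1570),
]

low, high = predict_window(reference, date(1996, 1, 15), k=1)
print(low, high, high - low + 1)  # 900100450 900100820 371
```

In this made-up data the window holds 371 candidate values instead of a billion. The sparser the reference records around a birth date, the wider the window gets.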

Q. How did you verify your predictions?

We ran two tests. In the first test, we plotted the SSNs of Death Master File (DMF) records versus time for data between 1973 and 2003. We observed statistical patterns that appeared in the DMF data; then, we used these patterns to predict the SSNs of DMF records. In a second test, we interpolated demographic data extracted from students' profiles on an online social network, with patterns extracted from the DMF, and used it to predict the profile owners' SSNs. We verified the accuracy of our predictions against the individuals' actual SSNs using a secure, IRB-approved, anonymized protocol which only produced aggregate statistics, without revealing to us the actual SSN of any individual in particular.

Q. If the algorithm only produces windows of values likely to include the correct SSN, why is this a concern?

Because various public- and private-sector online services may be attacked to test (using brute-force verifications) subsets of variations predicted by the algorithm.

Statistical predictions of windows of possible SSNs do not imply, alone, that an exact SSN will be found. However, when the range of values wherein an SSN is likely to fall gets dramatically reduced, a number of "brute force" attacks which would be otherwise inefficient or unfeasible become possible and feasible. When one or two attempts are sufficient to identify a large proportion of issued SSNs' first five digits, an attacker has incentives to invest resources into harvesting the remaining four from public documents or commercial services. When fewer than 10, 100, or 1,000 attempts are sufficient to identify complete SSNs for massive amounts of targets, attackers can exploit various public- and private-sector online services (such as online "instant" credit approval sites, as discussed in the paper) to test subsets of variations predicted by the algorithm in order to verify which SSN corresponds to an individual with a given birth date.

Q. Have you "broken" some secret code? Doesn't the Social Security Administration publicly disclose information about the assignment scheme?

No, we have not broken a secret code, and yes, the assignment scheme is publicly available. The SSN assignment scheme was created in the 1930s and was not designed to be "secure": back then, it was not imagined that one day SSNs would start being used for authentication. The assignment scheme is complex, and that complexity has led to the belief that the assignment, from the perspective of the user, is effectively random (see "SSNs are assigned randomly by computer within the confines of the area numbers allocated to a particular state based on data keyed to the Modernized Enumeration System" [SSA, 2001]). Indeed, we only used publicly available information, and ended up discovering, based on that information, that the randomness is effectively so low that the entire 9 digits of an SSN can be predicted with a limited number of attempts. We also discovered that certain interpretations of the assignment scheme held outside the SSA were, in fact, incorrect.

Q. Isn't this old news? Everybody knows that Area Numbers are associated with states (etc.)

Yes, the SSN assignment scheme is well known, and yes, the existence of a link between Area Numbers and states is public knowledge - but the patterns we discovered (and the accuracy of the predictions based on them) are not.

As noted in the manuscript, the SSN assignment scheme is public knowledge (p. 1). In fact, previous work in this area used those patterns to estimate when and where an SSN may have been issued (p. 1 and [Wessmiller, 2002], [Sweeney, 2004], [EPIC, 2008]): that is, starting from a known SSN, and trying to infer the state and the range of years when it may have been issued. Instead, our work focused on the inverse, harder, and much more consequential inference: exploiting the presumptive exact date and location of SSN issuance to estimate, quite reliably, SSNs. This became possible because:

  • We discovered (p. 3) that the interpretation held outside the SSA about how Area Numbers are assigned was incorrect: contrary to a commonly held view about their assignment, the same AN is used for 9,999 consecutively assigned SSNs. (Under the interpretation of the assignment scheme held outside the SSA, the SSA was believed to rotate through all of a state's ANs for each assigned SN. Such a scheme would render the AN random for states with multiple ANs, and the predictions we present in this article dramatically less accurate.)
  • We discovered (p. 4) that the assignment of the last 4 digits is not only sequential (as indeed stated in the publicly available information about the assignment scheme), but in fact highly correlated with the applicant's date of birth, and therefore not random (note that the SSA states, instead, that "SSNs are assigned randomly by computer within the confines of the area numbers allocated to a particular state" [SSA, 2001]). In various cases, we were able to predict the entire 9 digits of an SSN at the first attempt (the odds of that happening by random guess are roughly 1 over 1 billion). This is particularly the case for SSNs assigned after the onset of the EAB (1987 onwards).
  • We discovered that the analysis of publicly available SSNs assigned to deceased individuals (and included in the DMF) allows the inference of granular assignment patterns that make it possible to predict the SSNs of individuals still alive. For instance, the relationship between Area Numbers and states, while public knowledge, would not be sufficient, alone, to predict Area Numbers except in very specific cases (see p. 1): low-population states (such as WY) and certain U.S. possessions are allocated 1 AN each - implying that knowledge that an individual applied for his/her SSN in that state or possession does indeed provide almost certain knowledge of the first 3 digits of his/her SSN. However, other states are allocated sets of ANs. For instance, an individual applying from a ZIP code within the state of New York may be assigned any of 85 possible first 3 SSN digits. Therefore, knowledge that an individual applied for his/her SSN in that state provides low odds (1 over 85) of correctly guessing his/her first 3 digits with a single random guess. Those odds do not even include the probability of also correctly guessing the Group Numbers - which vary from 01 to 99 in combination with the different Area Numbers.

In short, without the discovery of patterns linking SSN digits to demographic data, knowledge of the assignment scheme would not be sufficient to predict either the first 5 digits or, in fact, the entire 9 digits of an SSN with the degree of accuracy necessary to expose them to practical risks of identification. For instance, the probability of correctly guessing the first 5 digits of the SSN of an individual born in NY in 1998, even assuming knowledge that the SSN was issued within that state, would be 0.012%, and the probability of correctly guessing the entire 9 digits with fewer than 1,000 attempts would be 0.0012%. However, under the more granular understanding of the relationships between assignment scheme and demographic patterns described in the manuscript, those probabilities are 30% and 3% respectively: several orders of magnitude larger, and much more vulnerable to brute-force attacks. See Table 6 on p. 27 of the Supporting Information.
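The two baseline figures for New York can be reproduced with simple arithmetic, assuming uniform random guessing over the 85 Area Numbers and 99 Group Numbers mentioned above. The 9,999 Serial Numbers per group is an assumption taken from the "9,999 consecutively assigned SSNs" remark, and the constant names are made up for this sketch.

```python
AREA_NUMBERS_NY = 85    # possible first-three-digit values for New York (from the FAQ)
GROUP_NUMBERS = 99      # Group Numbers 01..99 (from the FAQ)
SERIAL_NUMBERS = 9_999  # assumption: Serial Numbers 0001..9999

# One random guess at the first five digits.
p_first_five = 1 / (AREA_NUMBERS_NY * GROUP_NUMBERS)

# 1,000 distinct random guesses at all nine digits.
p_full_1000 = 1_000 / (AREA_NUMBERS_NY * GROUP_NUMBERS * SERIAL_NUMBERS)

print(f"{p_first_five:.3%}")  # 0.012%
print(f"{p_full_1000:.4%}")   # 0.0012%
```

Against these baselines, the reported 30% and 3% are each roughly 2,500 times larger.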

Q. Can the predictability of SSNs lead to identity theft? Does this research publication provide all that is needed to acquire SSNs?

No. Aside from the fact that sensitive details were omitted from the article, to move from mere statistical predictions to actual identity theft an attacker needs to exploit holes and weaknesses in the U.S. identity "infrastructure:" the widespread availability of personal, demographic data for millions of individuals, the existence of large botnets of compromised computers, and the lax identity matching and authentication techniques adopted in the credit/financial sectors (among others). Our findings can help combat and decrease identity theft by showing why such known (yet underestimated) weaknesses in our identity infrastructure should finally be addressed; by alerting industry and policy-makers of a new exploit; and by highlighting the need to abandon SSNs as passwords and move toward more secure, efficient, and privacy-preserving means of authenticating identities.

Q. How does this differ from previous research?

Previous research in the area of SSNs focused on detecting SSNs in public databases, using SSNs to link data across multiple data sources, or - in the cases closest to our study - inferring the year[s] and state of issuance of known SSNs. Per se, the existence of SSN issuance patterns is well known - the SSA makes certain details available through public materials, and others (notably, Latanya Sweeney and her "SSN Watch") have used those patterns, plus a combination of public and private SSN data, to estimate when and where a known SSN may have been issued [Wessmiller, 2002], [Sweeney, 2004], [EPIC, 2008]. However, our work focuses on the inverse, harder, and much more consequential inference: it shows that it is possible to exploit the presumptive time and location of SSN issuance to estimate, quite reliably, unknown SSNs.

Q. What data do you need to predict SSNs? Isn't birth data hard to come by?

Data about SSNs from the so-called "Death Master File," which is publicly available, and demographic data (dates of birth and states of birth) from wherever it is available. Mass amounts of birth data for US residents can be obtained or inferred - often for free, or at negligible per unit prices - from multiple sources, including commercial data brokers (such as www.peoplefinders.com, which sells access to birth data and personal addresses for "almost every adult in the United States"); voter registration lists (for most states); online free people searches (such as www.zabasearch.com); as well as social networking sites: our estimates indicate that at least 10 million US residents make their birthday information publicly available or inferable on their online profiles.

Q. From which social networking site did you find data for one of your tests?

There is no specific networking site which is uniquely exposed. The data can be extracted from several such sites, as well as other sources, as noted above.

Q. Aren't SSNs in fact as available as birth data?

They are not.

It is true that SSNs are widely available. They have been found in public records of federal agencies, states, counties, courts, hospitals, and so forth [The President’s Identity Theft Task Force, 2007], as well as in personal documents, such as online resumes [Sweeney, 2006]. Companies exchange SSNs in personal information markets, and individuals obtain "credit reports," containing their SSNs, from credit bureaus; stolen SSNs are lucratively exchanged in underground cybermarkets [Franklin, 2007]. However, the GAO found that only a few brokers offering SSNs for sale to the general public are actually able to sell whole SSNs [GAO, 2006]. Furthermore, the GAO also found that while still widespread, SSNs are becoming harder to find in public documents [GAO, 2008]. In fact, the number of SSNs widely available may also be decreasing because of numerous legislative initiatives in this area. Various recent initiatives have been focusing on removing SSNs from public exposure or redacting their first five digits [NCSL, 2007], [FTC, 2008], and [GAO, 2008]. On the other hand, birth data remains widely available, as noted above.


:#capysneedboat2::#capyantischizo::#space:


Social Security Administration's Death Master File

:#marseykvlt:


Everyone put their SSN in this thread to see how predictable they are. I’ll start

563-91-1377


All I see is *** - ** -***.


We should compare credit card numbers to see if the same thing happens there


420-69-1337


Mines 3218375531 haha oh no hope someone doesnt get my identity :marseybeantonguepoke:


867-53-5309


I was going to comment that! :marseyrain:


615-62-5912


567-68-0515


Truly our greatest president


867-53-0009


1


537-29-7197

can OP plox run Aquaph, Florida into the algorithm to see how many digits it would get right :marseyexcited:


9439-39-4278


:#marseytwerking:

:marseycoin::marseycoin::marseycoin:

My social was not assigned until well after my birth. I'm resistant to this attack vector.


Reported by:
  • Pog : Bardfinn

Let's see... Steve Atkins... :marseynotesglow:


bardfinn


:siren:BARD BOT ALERT!:siren: Reset the counter! Current counter was: 0 days 01 hours 11 minutes and 32 seconds

Record is 0 days 22 hours 00 minutes and 00 seconds by no_one

longest streak broken in the last 7 days was mrboondigga which was 0 days 09 hours 27 minutes and 14 seconds

███████████████████  19 2022-09-24
████████████████████ 20 2022-09-25
███████              7 2022-09-26

rdrama is currently running at 2.082042e-04 Bardyhertz with 46 total mentions since 9/24/2022


Well yes, I thought this was just basic info. They are all numeric and limited in length. But you can't do much with a social security number alone.




It's not only basic info, but also the perpetual topic of sneeding among crypto nerds, various forms of internet pedants (including the wildly popular CCP Grey with 9+ million views on his rant about it), and foreigners who express bemused bafflement.

The article linked by OP itself is over 10 years old, with no apparent attempt since then to make it more secure, or replace it with some even mildly sensible alternative.


Yeah well it delayed the application processing for my first apartment for a few weeks because some old mexican b-word used my number and racked up a shitty renting history.

:marseycry::!marseymariachi:


Darn dude. Did she have your name? She must have had more info than a social. My ex got his stolen and he thinks it was his hillbilly ex-gf or her family. Someone filed with the IRS and got $10k back with his social. Fricked up his taxes and he had to get a CPA to fix it.




Nah they apparently had never even ran it before but in previous cases accepted it allowing a history to build w/o anyone double checking the name lol


So what? You can predict with some degree of reliability first 5 digits of a confirmed (and recorded by SSA/VA/State/Local) dead dude’s SSN.

So what? After much calculation you can maybe guess a rando’s SSN not to mention the other data you’ll need to get any significant amount of money with someone’s identity.

Nigerian scammers aren’t PhDs in computer science, nor are they capable of doing all of these calculations. They’re experts in how r-slurred the average person can be.

Social engineering is way easier to pull off than all of these histrionic articles about algorithms or quantum computers cracking passwords and AES encryption lmao.

There’s a much larger chance that you’ll hand the data or money to a scammer yourself, lmao—and happily.

Lmao, wonder how long it will take for this algorithm to be leaked.


>The complete SSN could be picked out for 8.5% of those people in under 1,000 attempts.

I think any sort of brute force attack would be highly suspect in every single one of the situations where an SSN is ever used.


You only need to find it once tho




They have the first 5 numbers, and they’re guessing the last 4. There are only 10000 possible numbers. If they guessed entirely randomly, they could get 10% of them within 1000 tries.

:marseygigaretard:


Thanks for repeating the exact part I quoted. Next tell me which situation lets me submit a thousand applications using an ssn that wouldnt set off any alarm bells


My point is that their algorithm is worse than literally guessing


Oh my bad I thought you were just rounding it. Normally i would call this a victim of low sample size but their database was fricking massive i don know how they cocked that up


I didnt read this but whenever ive had my ssn run in the past year it comes back saying i dont exist and im really curious wtf that means

I want to go back to the plasma clinic for free money but they wont let me because of this >:(


Why is there a blank comment here. Weird.


You are an anomaly in the simulation



As with all things, if this was finna be a problem it would be already


Hey schizo I’d like to send you a birthday present, when were you born? Also thinking of sending ur birth hospital for giving us such a wonderful user so that would be good to know as well :marseywholesome:


Wait, me and my sister have SEQUENTIAL ssns? I just figured my mother applied for them at the same time. Is it instead a massive coincidence?


Secured my spot as a top 100 most memorable rdrama poster


Nope, that's normal. My grandma applied for my mom, my aunt and my uncles at the same time and they're also sequential


Hackers hate this one neat trick to extract any and all personal information

![](/images/16642121386644065.webp)


I dont have a SSN


Nothing is truly random


The government should never have been allowed to use numbers

When paying my taxes the government should only be able to comprehend that I made nothing, one dollar, a few dollars, or many dollars


Literally this has been public for years, they’re still extremely hard to guess. The article says it’s about 8.5% of people are vulnerable to a brute force attack of 1000 attempts, that’s extremely unlikely to succeed and it’ll only work till someone notices it. What a worthless article with extremely rudimentary combinatorics masquerading as groundbreaking.


Just don’t use ssn as i.d.?

