Pass Phrases, Not Passwords « Vincent Gable’s Blog

June 1, 2009

Filed under: Accessibility,Research,Security,Usability | Authentication, English, Pass-Phrases, Password, Randomness
― Vincent Gable on June 1, 2009

Thomas Baekdal makes a convincing argument for using pass-phrases, not passwords. It’s excellent advice, and I know I’m not alone in having advocated it for years.

My keyboard has 26 letters, 10 numbers, and 12 symbol keys, like ~. All but the spacebar make a different symbol when I hold down shift, giving me 93 characters to use in my passwords. But the number of words that can make up a pass-phrase is easily in the 100,000s. Estimating exactly how many is a bit tricky, but I will stick with 250,000 here (I think it’s an undercount; more on this later).

We Know How To Talk

The human brain has an amazing aptitude for language. But “passwords” aren’t really words, so they don’t tap into this ability. In fact, we often use words to try and remember the nonsense-characters of a password.

Wouldn’t it make more sense to just use the words directly, if we can remember them more easily?

Hard For Computers, Not Hard For Us

People feel that if security system A is harder for them to use than system B, then A must be harder for an attacker to bypass. But the facts don’t always match this intuition.

Which authentication code do you think is harder for a bad guy to hack: the 7-character strong password “1Ea.$]/”, or the mnemonic for its first 3 characters, “One Elvis Amazon”? Certainly “1Ea.$]/” is harder for a person to remember. It feels like it should be harder to break. But a computer, not a person, is going to be doing the guessing, and all it cares about is how big the search space is. There are 93⁷ possible 7-character passwords. Let’s say there are 250,000 possible English words (more on that figure later). Then there are 250,000³ 3 word combinations, meaning an attacker would have to do about 260 times more work to guess “One Elvis Amazon” than to guess “1Ea.$]/”.
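The arithmetic behind that comparison can be checked with a short sketch. It uses only the two figures assumed in this post (93 typeable characters, 250,000 candidate words) and treats every character or word as an independent, uniformly random choice; the function and variable names are just for illustration.

```python
# Figures assumed in the post, not measured values.
CHARSET_SIZE = 93          # characters reachable from the keyboard
VOCABULARY_SIZE = 250_000  # rough count of candidate English words


def search_space(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of `length` symbols chosen independently."""
    return alphabet_size ** length


password_space = search_space(CHARSET_SIZE, 7)   # 7 random characters
phrase_space = search_space(VOCABULARY_SIZE, 3)  # 3 random words
ratio = phrase_space / password_space

print(f"7-character passwords: {password_space:,}")
print(f"3-word pass-phrases:   {phrase_space:,}")
print(f"pass-phrase space is about {ratio:.0f} times larger")
```

The ratio comes out at roughly 260, which is where the figure above comes from.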

With pass phrases, easier for the good guys is also harder for the bad guys.

Exactly How Much Harder

The “250,000 word” figure is a bunch of hand-waving, but I believe it’s an undercount. I picked it because I wanted a round number to crunch, it’s what Thomas Baekdal picked, and it’s about the size of the Mac OS X words file:

$ wc -l /usr/share/dict/words 
  234936

But liberally descriptive linguists say that the 1,000,000th word will be added to the English Language on June 10th, 2009. The more conservative Webster’s Third New International Dictionary, Unabridged lists 475,000 English words. Obviously neologisms, slang, and archaic terms are fine for pass phrases. People like discovering quirky words. I see far more people embracing the login “kilderkin of locats” than rejecting it.

Different conjugations (can) count as different words in pass-phrases. There’s only one entry in a dictionary for swim, but swim, swimming, swam, etc. make for distinct pass-phrases (e.g. “Elvis swims fast” and “Elvis swam fast”; neither phrase shows up in a Google search, by the way). So the real number of words should be a few times larger than a dictionary indicates.

But not all words are equally likely to be chosen, just as some characters are more popular in passwords. My earlier figure of “250,000³ 3 word combinations” was based on the naive assumption that each of the 3 words is picked independently and with equal probability. But people do not pick things at random. And a phrase is by definition not completely random; it must have some structure. I’m unaware of research into exactly how predictable people are when making up pass-phrases.

But given how terrible we are at picking good passwords, and how good we are at remembering non-nonsense-words, I am optimistic that we can remember pass-phrases that are orders of magnitude harder to guess than the “good” passwords we can’t remember today.
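To see how much the independence assumption matters, here is a rough sketch that expresses each search space in bits. The 93 and 250,000 figures are the ones used in this post; the 2,000-word “everyday vocabulary” is purely a hypothetical number chosen for illustration, not a measurement of how people actually pick words.

```python
import math


def bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy for `length` symbols chosen uniformly and independently."""
    return length * math.log2(alphabet_size)


# Figures from the post.
password_bits = bits(93, 7)
phrase_bits = bits(250_000, 3)

# Hypothetical: the user only ever draws from 2,000 everyday words.
EVERYDAY_WORDS = 2_000
everyday_phrase_bits = bits(EVERYDAY_WORDS, 3)

print(f"7 random characters:           {password_bits:.1f} bits")
print(f"3 words out of 250,000:        {phrase_bits:.1f} bits")
print(f"3 words out of {EVERYDAY_WORDS:,} (assumed): {everyday_phrase_bits:.1f} bits")
```

Under that assumed small vocabulary, three words fall below the 7-character password, so the advantage depends on drawing from a genuinely large pool of words (or on using more of them).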

Fewer Ways To Fail

We’ve all locked ourselves out of an account because of typos or caps lock. But pass-phrases can be more forgiving.

Pass-phrases can be case-insensitive. There’s no need to lock someone out over “ELvis…”.

Common typos can be auto-corrected, much as Google automatically suggests words. Consider the authentication attempt “Elvis Swimmms fast”. The system could recognize that “Swimmms” isn’t a word, and try the most likely correction, “Elvis swims fast”. If it matches, then there’s no reason to ask the user if it’s what they really meant. (Note that only one pass-phrase is checked per login attempt.) I don’t have hard data here, but given how successful Google is at interpreting typos, I’d expect such a system to work very well.
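A minimal sketch of such a forgiving check might look like the following. Everything here is an assumption made for illustration: the tiny `KNOWN_WORDS` list stands in for a real dictionary, Python’s `difflib.get_close_matches` stands in for a proper spelling corrector, and the fixed salt is only there to keep the example short.

```python
import difflib
import hashlib
import hmac

# Hypothetical word list; a real system would load a full dictionary.
KNOWN_WORDS = ["elvis", "swims", "swam", "fast", "amazon", "one"]


def normalize(attempt: str) -> str:
    """Lowercase the attempt and replace each unknown word with its closest known word."""
    corrected = []
    for word in attempt.lower().split():
        if word not in KNOWN_WORDS:
            # Best match comes first; keep the word unchanged if nothing is close.
            matches = difflib.get_close_matches(word, KNOWN_WORDS, n=1, cutoff=0.6)
            word = matches[0] if matches else word
        corrected.append(word)
    return " ".join(corrected)


def digest(phrase: str, salt: bytes) -> bytes:
    """Salted, deliberately slow hash of a normalized pass-phrase."""
    return hashlib.pbkdf2_hmac("sha256", phrase.encode("utf-8"), salt, 100_000)


def check_login(attempt: str, salt: bytes, stored: bytes) -> bool:
    """Check exactly one candidate per attempt: the normalized, corrected phrase."""
    return hmac.compare_digest(digest(normalize(attempt), salt), stored)


salt = b"example-salt"  # illustration only; use a random per-user salt
stored = digest(normalize("Elvis swims fast"), salt)

print(check_login("ELvis Swimmms fast", salt, stored))  # typo and stray capital
print(check_login("Elvis swam fast", salt, stored))     # a different real word
```

Because the correction happens before hashing, each login attempt still costs the attacker one guess. The trade-off is that several misspellings now map to the same phrase, which shrinks the effective search space a little.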

Pass-phrases might be more difficult on phones and on similar devices that are awkward to write with. Writing more letters means more work. Predictive text can only do so much. Repeatedly typing 3 letters and accepting a suggestion is clearly more work than just tapping out 6 characters. Additionally, there are security concerns with a predictive text system remembering your pass-phrase, or even a small part of it.

But for computers, pass phrases look like a clear usability win.

Easily Secure Conclusion

(In case you were wondering, that was a unique phrase when I wrote this.) Using pass-phrases over passwords (which are really pass-strings-of-nonsense-symbols-that-nobody-can-remember) makes a system significantly harder to crack. Pass-phrases are easier for humans to remember, and a system that uses them can be very forgiving. But as always, the devil is in the details. It’s terrifying to be an early adopter of a new security practice, even if it seems sound.

7 Comments »

Mnemonic, not pneumonic. The former pertains to a memorization device, the latter to the lungs.

Comment by phleabo — June 1, 2009 @ 12:42 pm

Thanks phleabo!

Maybe words are harder than I make them out to be after all ;-)

Comment by Vincent Gable — June 1, 2009 @ 4:28 pm

Good post!

One thing of note is you ought to mention how dictionary attacks can drastically affect how easily a computer can guess a password or passphrase. If you choose a passphrase that shows up as a common n-gram then it would be easily guessed – you need to pick something that is sufficiently obscure but is still memorable.

Comment by Toby — June 2, 2009 @ 4:08 pm

Toby,

I just don’t believe there is a way to choose a password (random string) that is strong but strongly memorable. Here’s Bruce Schneier’s advice, which includes a nice explanation of dictionary attacks, n-grams, etc. (If anyone has a better source, I’d love to read it!)

I know anecdotes are of dubious value, but I can recite the first stanza of Jabberwocky cold; it’s full of nonsense words, and I learned it in the 3rd grade. But I can’t remember the UTEID password I used 3 or 4 years ago, and it was insecure enough that I was forced to change it. And I often can’t remember my passwords without typing them in.

I didn’t go into it at all, but there’s research into how people use rhyme or other structure to exactly recall epic ballads and whatnot. The analysis of that from a security standpoint is really scary though, because it obviously involves really complex inference rules. And that’s why my example was “One Elvis Amazon”, not “Elvis swims in the Amazon” — because I just don’t know how secure real English is, and I don’t want to guess.

But people are really good at dealing with language, and terrible at dealing with random strings. So I think there’s a lot of potential here.

Comment by Vincent Gable — June 6, 2009 @ 12:22 am

I think it’s a terrific idea! I’m amazed no one has developed it. My only password right now is my middle name and if I REALLY want to be secure, I add my birth year to it. I know it’s insecure but I’ve tried being tricky and always wind up calling Tech Support with a lame story about how I locked myself out of AllNudeDanceGirls.com. Besides, for once in my life it would make me feel smarter than the computer!
ernie
PS – the plural of advertisements is “ads”, not “adds”

Comment by Ernie — June 6, 2009 @ 1:50 pm

I think there are a few issues with this.

First is that the entropy of language is quite low. It is true that a language like English has more than one million words, but it is also true that everyone only uses an embarrassingly small portion of them daily, maybe a few hundred or at most a thousand words. Language also has grammatical structure. This is the reason why, if you do a hearing test with hearing-impaired people, it is better to use unrelated words than phrases. The brain just fills in what could fit. These are things which make choosing truly random words much more difficult.

The other important issue is the attack scenario, and I think this is becoming more important. You argue that attacks are usually from outside, but this does not need to be the case. One example: one day I set up an online banking account. The next day I got several phishing mails, which I had never received before. It is quite plausible that some people working at the same bank get paid for passing the email addresses of new customers to a criminal organization. The money made with that would be more than enough to pay them. Because the stupid users become victims of “outside” phishing attacks, the persons inside the bank are very hard to detect.

Take another example. Many people use site abc to order articles via internet. Let’s assume that this site has a very good reputation, highly skilled technicians and a strict security policy. No way for organized cybercrime to get in.

But as we all know, many users use the same passwords for all web accounts they have, like the nice guy above. So, the only thing that needs to happen is that enough of these users log in to a third-class web content provider – an ultracheap internet domain hoster, a mail service, a photo community – and that all these passwords are, as usual, stored in a hashed database on the server. Organized cybercrime just has to get ONE person in who copies that database and passes it outside. Finito. As the password cracker can be run on a fast system, and can test EASILY 200,000 passwords per second, a large number of passwords will be retrieved. Of course, the criminal organization won’t use these precious passwords to log in to the dumb and uninteresting photo community. It will use them to log in at the webshop of abc and buy a host of brand new wide-screen television sets.

The key point is this, and it is clear to everyone who has ever managed a local area network or the like: Security is not a localized, isolated issue. In today’s webbed and multiply linked world, even a well-managed web shop isn’t more secure than a third-class photo community, JUST BECAUSE THE SAME PEOPLE USE THEM.

Comment by Zafolo — October 9, 2009 @ 4:52 pm

To clarify, I wasn’t suggesting using conversational English as passwords. I was saying that people are better at remembering words than random alphanumeric strings. So let’s use words, drawn from what we know, not just the ones we use to chew the fat.

But maybe conversational English is a good idea, because it’s easier to remember. Yes, entropy is low: from what I’ve read, about 1 bit per character. But that might be enough. Say we need a 128-bit password; at 1 bit per character that is about 128 characters, which requires less typing than one tweet. That’s not all that onerous.

But I don’t know all the details. More research is very much needed here.

RE security isn’t localized: Yes, but so what? It’s important stuff, but it’s outside the scope of what I’m talking about — making password (aka “something you know”) authentication both easier and more secure. Of course, if passwords were easy to remember, we wouldn’t “stupidly” use the same one for different things so often.

Comment by Vincent Gable — October 9, 2009 @ 5:28 pm