Vincent Gable’s Blog

June 1, 2009

Pass Phrases, Not Passwords

Filed under: Accessibility, Research, Security, Usability
― Vincent Gable on June 1, 2009

Thomas Baekdal makes a convincing argument for using pass-phrases not passwords (via). It’s excellent advice, and I know I’m not alone in having advocated it for years.

My keyboard has 26 letters, 10 numbers, and 12 symbol keys, like ~. All but the spacebar make a different symbol when I hold down shift, giving me 93 characters to use in my passwords. But the number of words that can make up a pass-phrase is easily in the 100,000s. Estimating exactly how many is a bit tricky, but I will stick with 250,000 here (I think it’s an undercount; more on this later).

We Know How To Talk

The human brain has an amazing aptitude for language. But “passwords” aren’t really words, so they don’t tap into this ability. In fact, we often use words to try to remember the nonsense characters of a password.

Wouldn’t it make more sense to just use the words directly, if we can remember them more easily?

Hard For Computers, Not Hard For Us

People feel that if security system A is harder for them to use than system B, then A must be harder for an attacker to bypass. But the facts don’t always match this intuition.

What authentication code do you think is harder for a bad guy to hack: the 7-character strong password “1Ea.$]/”, or the mnemonic for the first 3 characters, “One Elvis Amazon”? Certainly “1Ea.$]/” is harder for a person to remember. It feels like it should be harder to break. But a computer, not a person, is going to be doing the guessing, and all it cares about is how big the search space is. There are 93⁷ possible 7-character passwords. Let’s say there are 250,000 possible English words (more on that figure later). Then there are 250,000³ 3-word combinations, meaning an attacker would have to do about 260 times more work to guess “One Elvis Amazon” than to guess “1Ea.$]/”.
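For the skeptical, here is a quick sanity check of that arithmetic in Python (the 93 and 250,000 are the rough estimates from above, not exact counts):

# Rough comparison of the two search spaces, using the estimates from this post.
CHARACTERS = 93       # typeable symbols, per the keyboard count above
WORDS = 250000        # rough estimate of usable English words

password_space = CHARACTERS ** 7   # every possible 7-character password
phrase_space = WORDS ** 3          # every possible 3-word pass-phrase

print("7-character passwords: %.2e" % password_space)
print("3-word pass-phrases:   %.2e" % phrase_space)
print("ratio: about %.0fx" % (phrase_space / password_space))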

With pass phrases, easier for the good guys is also harder for the bad guys.

Exactly How Much Harder

The “250,000 word” figure is a bunch of hand-waving, but I believe it’s an undercount. I picked it because I wanted a round number to crunch; it’s what Thomas Baekdal picked; and it’s about the size of the Mac OS X words file:

$ wc -l /usr/share/dict/words 
  234936

But liberally descriptive linguists say that the 1,000,000th word will be added to the English language on June 10th, 2009. The more conservative Webster’s Third New International Dictionary, Unabridged lists 475,000 English words. Obviously neologisms, slang, and archaic terms are fine for pass-phrases. People like discovering quirky words. I see far more people embracing the login “kilderkin of lolcats” than rejecting it.

Different conjugations (can) count as different words in pass-phrases. There’s only one entry in a dictionary for swim, but swim, swimming, swam, etc. make for distinct pass-phrases (e.g. “Elvis swims fast”, “Elvis swam fast”, etc. Neither phrase shows up in a Google search, by the way). So the real number of words should be a few fold larger than a dictionary indicates.

But not all words are equally likely to be chosen, just as some characters are more popular in passwords. My earlier figure of “250,000³ 3-word combinations” was based on the naive assumption that each of the 3 words is independent. But people do not pick things at random. And a phrase is by definition not completely random; it must have some structure. I’m unaware of research into exactly how predictable people are when making up pass-phrases.

But given how terrible we are at picking good passwords, and how good we are at remembering non-nonsense-words, I am optimistic that we can remember pass-phrases that are orders of magnitude harder to guess than the “good” passwords we can’t remember today.

Fewer Ways To Fail

We’ve all locked ourselves out of an account because of typos or caps lock. But pass-phrases can be more forgiving.

Pass-phrases can be case-insensitive. There’s no need to lock someone out over “ELvis…”.

Common typos can be auto-corrected, much as Google automatically suggests words. Consider the authentication attempt “Elvis Swimmms fast”. The system could recognize that “Swimmms” isn’t a word, and try the most likely correction, “Elvis Swims fast”. If the correction matches, then there’s no reason to ask the user if it’s what they really meant. (Note that only one pass-phrase is checked per login attempt.) I don’t have hard data here, but given how successful Google is at interpreting typos, I’d expect such a system to work very well.
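To make that concrete, here is a minimal sketch of such a forgiving check. It is not a real implementation: the hashing choices are only illustrative, and correct_word stands in for whatever spell-checker the system would actually use.

import hashlib
import hmac

def normalize(phrase):
    # Lowercase and collapse whitespace, so case and spacing never lock anyone out.
    return " ".join(phrase.lower().split())

def phrase_hash(phrase, salt):
    # Illustrative only; a real system would pick a deliberately slow password hash.
    return hashlib.pbkdf2_hmac("sha256", normalize(phrase).encode("utf-8"), salt, 100000)

def check_phrase(attempt, stored_hash, salt, correct_word):
    # Accept the attempt as typed, or at most one auto-corrected variant of it.
    if hmac.compare_digest(phrase_hash(attempt, salt), stored_hash):
        return True
    corrected = " ".join(correct_word(word) for word in attempt.split())
    if corrected != attempt:
        return hmac.compare_digest(phrase_hash(corrected, salt), stored_hash)
    return False

So “Elvis Swimmms fast” costs the user nothing, as long as the spell-checker’s top suggestion for “Swimmms” is “swims”; either way, only one extra pass-phrase is checked per login attempt.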

Pass-phrases might be more difficult on phones and other devices that are awkward to type on. Writing more letters means more work. Predictive text can only do so much: repeatedly typing 3 letters and accepting a suggestion is clearly more work than just tapping out 6 characters. Additionally, there are security concerns with a predictive text system remembering your pass-phrase, or even a small part of it.

But for computers, pass phrases look like a clear usability win.

Easily Secure Conclusion

(In case you were wondering, that was a unique phrase when I wrote this.) Using pass-phrases over passwords (which are really pass-strings-of-nonsense-symbols-that-nobody-can-remember) makes a system significantly harder to crack. Pass-phrases are easier for humans to remember, and a system that uses them can be very forgiving. But as always, the devil is in the details. It’s terrifying to be an early adopter of a new security practice, even if it seems sound.

April 10, 2009

Pre-announcing Prometheus

Filed under: Announcement, iPhone
― Vincent Gable on April 10, 2009

For the past month I have been working on my first iPhone application, code-named Prometheus. It’s a dedicated editor for Simple English Wikipedia.

Simple Wikipedia

The Simple English Wikipedia is a wikipedia written in simplified English, using only a few common words, simple grammar, and shorter sentences. The goal is for it to be as accessible as possible. That’s something that really resonates with me. There are many reasons why I think it’s an important project, but I’ll only briefly mention one sappy example.

Children can’t understand all of the Grown Up Wikipedia. Simple Wikipedia helps put more knowledge within their reach. I started writing weekly reports in the 5th grade. I’m sure many schools make students start sooner, and certainly I had to write infrequent reports much earlier. An encyclopedia at a 4th grade reading level would have been a fantastic tool to learn more about the world.

Prometheus

In Greek mythology Prometheus was, “the Titan god of forethought and the creator of mankind. He cheated the gods on several occasions on behalf of man, including the theft of fire.”

Similarly, the Prometheus iPhone app brings knowledge from the American-English Wiki Gods to the majority of the world that does not speak English natively.

Given the platform, I believe enabling small quick edits is the best way to go. I want to make it easy for a native English speaker to spend 30 seconds correcting grammar while waiting in line.

With more constrained language, writing-aids become even more helpful. Surprisingly, I find myself using a thesaurus more when writing Simple English, to be sure I’m using the clearest synonym I can. There’s a lot more an application like Prometheus can do to help, because it’s not targeting a complex language.

Roadmap

I’ll be brutally honest: the first release is not going to have any of the fancy analytical features I originally envisioned. My goal is to get a 1.0 release out quickly, then iterate on the feedback and what I learn. I also have a selfish motivation here, because I’m looking for a job, and not having something in the App Store yet has been a big barrier.

What I can promise for 1.0 is:

  • A greatly streamlined interface, as compared to editing through Safari.
  • A safe, save-free experience, where nothing you write is lost if you suddenly have to quit and do something else.
  • Shake to load a random page. (Look, I know that’s nothing to brag about. But it is something Safari can’t do.)
  • A $0 price tag.

I’m trying to have 1.0 ready in about a month, but that’s an aggressive goal.

Beyond that I would like to add a “simple-checker” that can flag complex terms, and a mechanism for as-you-type suggestions of more common terms. But both of these are technically challenging, and my main priority will be building on what I’ve learned by putting something in front of people.

Teaser

Here’s a picture of what I have running today.

[Screenshot: prerelease.jpg]

I have a lot to say about the evolving design in future posts. But this should give you an idea of what I’m shooting for: as minimal an interface as I can manage, hopefully condensed into one toolbar.

I don’t have an icon yet; if you can help there, please let me know. Unfortunately this is a free project all around, so I’m not in a position to hire someone. My current idea for an icon is a hand holding a fennel stalk with fire inside, much like an Olympic torch. I’d love to hear your ideas.

March 2, 2009

Initial Findings: How Long is an (English) Word?

Filed under: Research
― Vincent Gable on March 2, 2009

My brief research into the English language revealed the average character count of a word is eight. Throw together a bunch of a smaller and bigger words, some single spaces and punctuation and you roughly end up with the average 140-character tweet being somewhere between 14 and 20 words. Let’s call it 15.

Rands in Repose

That contradicts the common wisdom I’ve heard: the average word is 5 letters, so divide your character count by 6 to get a word count.

But that was a rule of thumb from the days of typewriters. Hypertext and formatting changes things. For example, every time you see something in boldface on my blog, there are an extra 17 characters for the HTML code, <strong></strong>, that makes the text bold.

Just to poke at the problem, I used wc to find the number of characters per word in a few documents. What I found supports the 6 characters per word rule of thumb for content, but not for HTML code. The number of characters per word in HTML was higher than 6, and varied greatly.

The text of the front page article on today’s New York Times was 5880 characters, 960 words: 6 characters per word.

The plain text of Rands’ webpage claiming 15 chars per word was 6794 characters, 1175 words: 6 characters per word. By plain text, I mean just the words of the HTML after it was rendered, so formatting, images, links, etc. were ignored. The HTML source for the page, however, was 15952 characters, meaning about 14 characters per word.

What about technical stuff? The best paper I read last year was Some thoughts on security after ten years of qmail 1.0 (PDF). It has no pictures, just 9517 formatted words. A PDF represents it with 161496 bytes (17 bytes per word), but ignoring formatting it is 62567 characters (7 characters per word).
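If you want to poke at a document yourself, a tiny script along these lines does the same arithmetic (the file paths are whatever you want to measure; it counts characters and whitespace-separated words, much like wc does):

import sys

for path in sys.argv[1:]:
    text = open(path, encoding="utf-8", errors="replace").read()
    characters = len(text)
    words = len(text.split())
    print("%s: %d characters, %d words, %.1f characters per word"
          % (path, characters, words, characters / words))

Running it over a page’s rendered text and its HTML source makes the gap between the two obvious.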

I’m still looking into how long English words are in practice. Please share your research, if you have an opinion.

January 26, 2009

Compressibility of English Text

Filed under: Research
― Vincent Gable on January 26, 2009

Theory:

Some early experiments by Shannon67 and Cover and King68 attempted to find a lower bound for the compressibility of English text by having human subjects make the predictions for a compression system. These authors concluded that English has an inherent entropy of around 1 bit per character, and that we are unlikely ever to be able to compress it at a better rate than this.

67 C. E. Shannon, “Prediction and entropy of printed English”, Bell Systems Technical J. 30 (1951) 55. (Here’s a bad PDF scan)

68 T.M. Cover and R. C. King, “A convergent gambling estimate of the entropy of English”, IEEE Trans. on Information Theory IT-24 (1978) 413-421

Signal Compression: Coding of Speech, Audio, Text, Image and Video
By N. Jayant

Shannon says 0.6-1.3 bits per character of English — 0.6 bits is the lowest value I have seen anyone claim.

Practice:

Just as a data point, I tried gzip --best on a plain-text file of The Adventures of Sherlock Holmes, weighing in at 105471 words and using 578798 bytes. The compressed file was 220417 bytes.

If we assume the uncompressed version used one byte (8 bits) per character, then gzip --best used about 3 bits per character.
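That arithmetic is easy to reproduce from Python. The file name below is a placeholder for whatever plain text you want to measure, and the result should come out close to, though not byte-identical with, the command-line gzip --best:

import gzip

text = open("sherlock.txt", "rb").read()            # plain ASCII text: 1 byte per character
compressed = gzip.compress(text, compresslevel=9)   # roughly what gzip --best does
bits_per_character = 8.0 * len(compressed) / len(text)
print("%d bytes -> %d bytes, %.2f bits per character"
      % (len(text), len(compressed), bits_per_character))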

Best so Far

The state-of-the-art in the Hutter Prize, a challenge to compress 150 MB of Wikipedia content, is 1.319 bits per character. But that’s with a program tuned just for that data set, and it took 9 hours to run.

September 26, 2008

Simple English

Filed under: Quotes, Usability
― Vincent Gable on September 26, 2008

There are 400 million native English speakers, but over a billion people who speak English as a second language. … At any given instant on this planet, most people who are speaking English are not native speakers.

Perhaps we should take a good look at common forms of incorrect grammar and see if they actually make our language easier to learn. Maybe we should give a loose leash to those who are trying to make English more accessible.

I am going to try to use simple language and limited slang in my writing. When one considers the population of the world, it seems rather rude to address only the native English speakers.

Aaron Hillegass
