Vincent Gable’s Blog

March 2, 2009

Initial Findings: How Long is an (English) Word?

Filed under: Research
― Vincent Gable on March 2, 2009

My brief research into the English language revealed the average character count of a word is eight. Throw together a bunch of a smaller and bigger words, some single spaces and punctuation and you roughly end up with the average 140-character tweet being somewhere between 14 and 20 words. Let’s call it 15.

Rands in Repose

That contradicts the common wisdom I’ve heard: the average word is 5 letters, so divide your character count by 6 to get a word count.

But that was a rule of thumb from the days of typewriters. Hypertext and formatting change things. For example, every time you see something in boldface on my blog, there are an extra 17 characters for the HTML code, <strong></strong>, that makes the text bold.

Just to poke at the problem, I used wc to find the number of characters per word in a few documents. What I found supports the 6 characters per word rule of thumb for content, but not for HTML code. The number of characters per word in HTML was higher than 6, and varied greatly.

The text of the front page article on today’s New York Times was 5880 characters, 960 words: 6 characters per word.

The plain text of the Rands in Repose page quoted above was 6794 characters, 1175 words: 6 characters per word. By plain text, I mean just the words of the HTML after it was rendered, so formatting, images, links, etc. were ignored. The HTML source for the page, however, was 15952 characters, meaning 14 characters per word.

What about technical stuff? The best paper I read last year was Some thoughts on security after ten years of qmail 1.0 (PDF). It has no pictures, just 9517 formatted words. A PDF represents it with 161496 bytes (17 bytes per word), but ignoring formatting it is 62567 characters (7 characters per word).
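The ratios above can be re-derived with a few lines of Python. This is only a sketch: `chars_per_word` is a rough stand-in for dividing `wc -m` by `wc -w`, assuming a word is any run of non-whitespace characters, and both function names are hypothetical rather than part of any tool mentioned here.

```python
def chars_per_word(text: str) -> float:
    """Characters per whitespace-delimited word, roughly `wc -m` / `wc -w`."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(text) / len(words)


def ratio(characters: int, words: int) -> int:
    """Round a characters-per-word ratio to the nearest whole number."""
    return round(characters / words)


# Counts reported in this post.
print(ratio(5880, 960))      # New York Times article
print(ratio(6794, 1175))     # Rands page, rendered text
print(ratio(15952, 1175))    # Rands page, HTML source
print(ratio(62567, 9517))    # qmail paper, text only
print(ratio(161496, 9517))   # qmail paper, PDF bytes per word

# Markup overhead of a single bold run.
print(len("<strong></strong>"))
```

Note that the character counts include spaces and punctuation, which is why plain prose lands near 6 rather than 5.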

I’m still looking into how long English words are in practice. If you have research or an opinion, please share it.

February 20, 2009

Bad Apples

Filed under: Quotes,Research
― Vincent Gable on February 20, 2009

What they found, in short, is that the worst team member is the best predictor of how any team performs. It doesn’t seem to matter how great the best member is, or what the average member of the group is like. It all comes down to what your worst team member is like. The teams with the worst person performed the poorest.

Jeff Atwood

February 19, 2009

“Enhanced” Sports

Filed under: Research
― Vincent Gable on February 19, 2009


200px-Oscar_Pistorius-2.jpg


Oscar Pistorius, “The fastest man on no legs”, uses carbon-fiber prosthetic feet to run … apparently more efficiently than an able-bodied sprinter. And if he isn’t more efficient today, it’s a sure bet that technology will surpass mere flesh in the near future (at least in sprinting).

The cultural, ethical, and even technological issues surrounding cyborg/transhuman athletes are fascinating.

The Genie is Out of the Bottle

Let’s be blunt: technology plays a role in every sport today, and there is no going back.

Technology goes into equipment as basic as a shoe, making it lighter, springier, and more adhesive than anything humans have worn before.

The impact of better equipment was popularly recognized by at least the 1920s (if you have an earlier source, please share):

Much of Improvement in Baseball Is Attributed to Evolution and Steady Progress of Mechanics and Invention

WHEN Babe Ruth hits three home runs in one game or the home team cracks out a barrage of base hits to score seven or eight times in one inning, it does not necessarily mean that long-distance hitting in modern baseball comes from superiority of today’s players over those of years past. The truth is that much of the improvement in the game itself and in the proficiency of its players has come from evolution and progress in science and invention.

Popular Mechanics, May, 1924

Then there’s the elephant in the room: the athlete’s body, and the “stuff” that goes into it.

The prisoner’s dilemma essentially forces athletes to dope, because the only way to be sure your opponent does not have an advantage over you is to take that advantage as well. (This is the best overview of the doping problem, and its solution, that I have seen.)

But it’s not just drugs and steroids. There’s also nutrition, and sports medicine. Where exactly is the line between a supplement and a drug? More chemical sophistication goes into today’s vitamins than the drugs of the past.

Modern training regimens and equipment seem to have more to do with the science of conditioning than the love of a sport. It’s interesting that someone who just played all day would be at a disadvantage compared to someone who used targeted exercise machines.

Genetic engineering might be the most interesting future trend to watch. Obviously genetics are a huge part of determining physical ability.

What do We Want?

We love to watch superhumans compete. Professional athletes are supermen, since they play significantly above average human ability.

But we also want a “fair” and “honorable” fight. I honestly don’t know exactly what it all means. It’s OK to have an unplanned genetic advantage. Drugs are bad, even if everyone has access to them. We love the underdogs the most, yet celebrate the winners who have the most funding going into their training.

What’s Sportsmanlike

It’s not whether you win or lose, it’s how you place the blame.

–Oscar Wilde

The problem with giving disabled athletes accommodations, like carbon fiber feet, is that they only work until the athletes start winning. Then accommodations become an unfair advantage. It doesn’t matter if they are unfair in reality, because they look unfair.

But there’s a quality of life problem with essentially saying, “you cripples can only play with the other cripples”.

Accommodations in the context of sportsmanship are a sticky issue, and I don’t pretend to have the answers. But I’m not necessarily against “play until you win”, as the lesser of many evils. Sometimes playing is more important than winning.

One analogue is gender differences. There is good reason behind having separate men’s, women’s, and weight categories for sports. But in recreational play, mixed gender teams are often the norm (Ultimate seems to work very well with mixed gender teams).

There’s also a good case to be made for letting “enabled” athletes compete separately but to their fullest, essentially making the Paralympics the Cyberlimpics.

Conclusion

Maybe these pretty women will distract you from realizing I don’t have any answers (via Sensory Metrics):

Bilde 1-1.png

17453042_p1_mullins2.jpg

February 12, 2009

The Values of Science

Filed under: Quotes,Research
― Vincent Gable on February 12, 2009

Science is not a monument of received Truth but something that people do to look for truth.

That endeavor, which has transformed the world in the last few centuries, does indeed teach values. Those values, among others, are honesty, doubt, respect for evidence, openness, accountability and tolerance and indeed hunger for opposing points of view. These are the unabashedly pragmatic working principles that guide the buzzing, testing, poking, probing, argumentative, gossiping, gadgety, joking, dreaming and tendentious cloud of activity — the writer and biologist Lewis Thomas once likened it to an anthill — that is slowly and thoroughly penetrating every nook and cranny of the world.

…It is no coincidence that these are the same qualities that make for democracy and that they arose as a collective behavior about the same time that parliamentary democracies were appearing. If there is anything democracy requires and thrives on, it is the willingness to embrace debate and respect one another and the freedom to shun received wisdom. Science and democracy have always been twins.

Dennis Overbye

January 26, 2009

Compressibility of English Text

Filed under: Research
― Vincent Gable on January 26, 2009

Theory:

Some early experiments by Shannon67 and Cover and King68 attempted to find a lower bound for the compressibility of English text by having human subjects make the predictions for a compression system. These authors concluded that English has an inherent entropy of around 1 bit per character, and that we are unlikely ever to be able to compress it at a better rate than this.

67 C. E. Shannon, “Prediction and entropy of printed English”, Bell Systems Technical J. 30 (1951) 55. (Here’s a bad PDF scan)

68 T.M. Cover and R. C. King, “A convergent gambling estimate of the entropy of English”, IEEE Trans. on Information Theory IT-24 (1978) 413-421

Signal Compression: Coding of Speech, Audio, Text, Image and Video
By N. Jayant

Shannon says 0.6-1.3 bits per character of English — 0.6 bits is the lowest value I have seen anyone claim.

Practice:

Just as a datapoint, I tried gzip --best on a plain-text file of The Adventures of Sherlock Holmes, weighing in at 105471 words and 578798 bytes. The compressed file was 220417 bytes.

If we assume the uncompressed version used one byte (8 bits) per character, then gzip --best used about 3 bits per character.
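That figure is just the compressed size in bits divided by the original character count. The sketch below redoes the arithmetic and, as an assumption about how one might repeat the experiment, uses Python's `gzip.compress` at level 9 in place of the `gzip --best` command line tool (the two can differ by a few header bytes). The function names are hypothetical.

```python
import gzip


def bits_per_char(original_bytes: int, compressed_bytes: int) -> float:
    """Compressed bits spent per original character, assuming 1 byte per character."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return compressed_bytes * 8 / original_bytes


def gzip_bits_per_char(data: bytes) -> float:
    """Compress at gzip's highest level and report bits per input byte."""
    compressed = gzip.compress(data, compresslevel=9)
    return bits_per_char(len(data), len(compressed))


# Sizes from the Sherlock Holmes experiment above.
print(round(bits_per_char(578798, 220417), 2))  # 3.05
```

To try it on another text, read the file in binary mode and pass the bytes to `gzip_bits_per_char`.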

Best so Far

The state of the art in the Hutter Prize, a challenge to compress 100 MB of Wikipedia content, is 1.319 bits per character. But that’s with a program tuned just for that data set, and it took 9 hours to run.

January 17, 2009

Lessons From Fast Food: Efficiency Matters

Filed under: Design,Programming,Quotes,Research,Usability
― Vincent Gable on January 17, 2009

Every six seconds of improvement in speed of service amounts to typically a 1% increase in sales. And it has a dramatic impact on the bottom line.

–John Ludutsky, President of Phase Research, quoted on the “Fast Food Tech” episode of Modern Marvels, aired 2007-12-29 on the History Channel.

I wouldn’t expect things to be much different in the software world. The faster you get your burger bits the better.

UPDATED: 2009-02-05:
Apparently people want service much faster from software. Greg Linden reports,

Half a second delay caused a 20% drop in traffic. Half a second delay killed user satisfaction.

This conclusion may be surprising — people notice a half second delay? — but we had a similar experience at Amazon.com. In A/B tests, we tried delaying the page in increments of 100 milliseconds and found that even very small delays would result in substantial and costly drops in revenue.

If Mr. Ludutsky’s figure is accurate, a 20% drop in fast-food revenue would require a two-minute delay. Does this mean every second spent waiting on a computer is as bad as waiting 4 minutes in meatspace? I don’t know; I’m doing a lot of extrapolation from hearsay. But it’s something to consider.
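The extrapolation can be written out as a short calculation. It is a back-of-the-envelope sketch only: it assumes the six-seconds-per-1% figure scales linearly and works the same for slowdowns as for speedups, and it treats a drop in traffic as comparable to a drop in sales. The constant and function names are made up for illustration.

```python
# Figures quoted in this post.
SECONDS_PER_PERCENT_FAST_FOOD = 6.0  # Ludutsky: 6 s of service time per 1% of sales
SOFTWARE_DELAY_SECONDS = 0.5         # Linden: a half-second delay...
SOFTWARE_DROP_PERCENT = 20.0         # ...went with a 20% drop in traffic


def fast_food_delay_for_drop(percent: float) -> float:
    """Seconds of fast-food delay matching a percent drop, assuming linear scaling."""
    return percent * SECONDS_PER_PERCENT_FAST_FOOD


# Fast-food delay with the same cost as the half-second software delay.
equivalent_seconds = fast_food_delay_for_drop(SOFTWARE_DROP_PERCENT)

# Fast-food seconds "worth" one second of waiting on a computer.
per_software_second = equivalent_seconds / SOFTWARE_DELAY_SECONDS

print(equivalent_seconds / 60, per_software_second / 60)  # 2.0 4.0
```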

January 13, 2009

Wireless Network Names Don’t Tell You Much

Filed under: Design,Research
― Vincent Gable on January 13, 2009

It’s been my experience that the names of wireless networks do not tell you much about where they are. I do not use wireless network names as default names for anything.

Be aware that Wi-Fi network names are sometimes offensive! As I write this, one of the visible wireless networks is “DANKNASTY BALLASAUCE” (I kid you not). But more importantly, they are mostly generic and useless: “linksys”, “netgear1”, “2WIRE985”, etc.

Before you use a wireless network name for something, make sure there isn’t something more appropriate you could use.

January 9, 2009

Biometrics

Filed under: Design,Quotes,Research,Security
― Vincent Gable on January 9, 2009

Summary of an article by Bruce Schneier for The Guardian,

Biometrics can vastly improve security, especially when paired with another form of authentication such as passwords. But it’s important to understand their limitations as well as their strengths. On the strength side, biometrics are hard to forge. It’s hard to affix a fake fingerprint to your finger or make your retina look like someone else’s. Some people can mimic voices, and make-up artists can change people’s faces, but these are specialized skills.

On the other hand, biometrics are easy to steal. You leave your fingerprints everywhere you touch, your retinal scan everywhere you look. Regularly, hackers have copied the prints of officials from objects they’ve touched, and posted them on the Internet. …

Biometrics are unique identifiers, but they’re not secrets.

biometrics work best if the system can verify that the biometric came from the person at the time of verification. The biometric identification system at the gates of the CIA headquarters works because there’s a guard with a large gun making sure no one is trying to fool the system.

One more problem with biometrics: they don’t fail well. Passwords can be changed, but if someone copies your thumbprint, you’re out of luck: you can’t update your thumb. Passwords can be backed up, but if you alter your thumbprint in an accident, you’re stuck. The failures don’t have to be this spectacular: a voice print reader might not recognize someone with a sore throat…

In Why Identity and Authentication Must Remain Distinct, Steve Riley cautions,

Proper biometrics are identity only and will be accompanied, like all good identifiers, by a secret of some kind — a PIN, a private key on a smart card, or, yes, even a password.

October 11, 2008

An AppleScript Quine

Filed under: Announcement,Programming,Research
― Vincent Gable on October 11, 2008

Here is my first quine. It’s written in AppleScript, because I wasn’t able to find another AppleScript quine.

When run, quine.applescript makes Script Editor create a new window containing the source code. It’s particularly meta if you use Script Editor (the default application) to run the quine, because it’s not just printing itself, it’s writing itself in the IDE!

Fortunately, the problems I’d originally had with Script Editor and the quine seem to have been fixed.

EDITED TO ADD: Here’s the quine’s source, but you really should download it to run it, because WordPress has a habit of subtly mucking with copied code…

set d to "on string_from_ASCII_numbers(x)
	set s to ASCII character of item 1 of x
	repeat with i from 2 to number of items in x
		set s to s & (ASCII character of item i of x)
	end repeat
end string_from_ASCII_numbers
set set_d_to to {115, 101, 116, 32, 100, 32, 116, 111, 32}
set scriptEditor to {83, 99, 114, 105, 112, 116, 32, 69, 100, 105, 116, 111, 114}
set quine to string_from_ASCII_numbers(set_d_to) & quote & d & quote & return & d
tell application string_from_ASCII_numbers(scriptEditor) to make new document with properties {contents:quine}"
on string_from_ASCII_numbers(x)
	set s to ASCII character of item 1 of x
	repeat with i from 2 to number of items in x
		set s to s & (ASCII character of item i of x)
	end repeat
end string_from_ASCII_numbers
set set_d_to to {115, 101, 116, 32, 100, 32, 116, 111, 32}
set scriptEditor to {83, 99, 114, 105, 112, 116, 32, 69, 100, 105, 116, 111, 114}
set quine to string_from_ASCII_numbers(set_d_to) & quote & d & quote & return & d
tell application string_from_ASCII_numbers(scriptEditor) to make new document with properties {contents:quine}
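
The script follows the usual quine recipe: the string `d` holds a copy of the code, and the code emits `set d to`, then `d` in quotes, then `d` again as plain code. The lists of ASCII numbers (they spell "set d to " and "Script Editor") keep quotation marks out of `d`. For comparison, here is a sketch of the same data-plus-code trick in Python. It is a well-known two-line quine, not a translation of the AppleScript; the `template` variable and the `run_and_capture` helper are hypothetical scaffolding added only to check that the program's output matches its own source.

```python
import contextlib
import io

# The two-line quine, built from a template so it can be checked below.
# %r inserts the string with its quotes and escapes, which is the job the
# ASCII-number lists do in the AppleScript version.
template = 's = %r\nprint(s %% s)'
quine_source = template % template


def run_and_capture(source: str) -> str:
    """Execute Python source in a fresh namespace and return what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


print(quine_source)
# True when the program printed exactly its own source (plus print's newline).
print(run_and_capture(quine_source) == quine_source + "\n")
```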

July 30, 2008

Hell Hath No Fury…

Filed under: Research,Security
― Vincent Gable on July 30, 2008

The New York Times ran an article on research into the economics of vengeance. It’s fairly interesting, but to quote the article, “Most of (the) findings confirm what researchers in different disciplines have already found”.

The meat:

people who have been victims of the same kind of crime … tend to be more vengeful, but not if they have been victims of a different crime…

Vengeful feelings are stronger in countries with low levels of income and education, a weak rule of law and those who recently experienced a war or are ethnically or linguistically fragmented.

…most surprising was that women turned out to be more vengeful than men. If a woman had been a victim of (a crime), she was 10 percent more likely to (seek a stricter punishment); for men the figure was 5 percent.
