www.manythings.org/voa/wm

September 2, 2004 - Wordcount.org



AA: I'm Avi Arditti with Rosanne Skirble, and this week on Wordmaster: counting words.

RS: If you wanted to show people the 88,000 most common words in English, how would you do it? Jonathan Harris thought of a sentence -- or something that looks like one. He works on interactive art projects. He laid out the words in a straight line, from the most frequently used to the least frequently used.

AA: This is all on a Web site, so you keep clicking to the right to read the words on the screen. Or you can look up specific words to see their ranking. There's also a visual trick that displays the words as a graph. The most common are in really big type; the least common are in really small type.
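
The layout described here (words ordered from most to least frequent, a rank lookup, and type size that shrinks with rank) can be sketched in a few lines of Python. This is only an illustrative sketch and not how WordCount itself was built. The function names, the tiny sample text, the alphabetical tie-breaking, and the linear font-size scale are all assumptions made for the example.

```python
from collections import Counter


def rank_words(text):
    """Order the distinct words of `text` from most to least frequent.

    Ties are broken alphabetically (an assumption made so the order is stable).
    Returns the ordered word list and the raw counts.
    """
    counts = Counter(text.lower().split())
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return ordered, counts


def rank_of(word, ordered):
    """Return the 1-based rank of `word`, or None if it is not in the list."""
    try:
        return ordered.index(word) + 1
    except ValueError:
        return None


def type_size(rank, total, largest=72, smallest=8):
    """Map a rank to a font size by linear interpolation (hypothetical scale)."""
    if total <= 1:
        return largest
    fraction = (rank - 1) / (total - 1)
    return round(largest - fraction * (largest - smallest))


# A toy corpus standing in for real frequency data.
sample = "the cat and the dog and the bird"
ordered, counts = rank_words(sample)

for position, word in enumerate(ordered, start=1):
    print(position, word, counts[word], type_size(position, len(ordered)))
```

With real frequency data in place of the toy sentence, `rank_of` would answer the "look up a specific word" question, and `type_size` would give the big-to-small effect described above.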

RS: Jonathan Harris is an artist in the field of "information visualization." What he created is wordcount-dot-org.

JONATHAN HARRIS: "The experience I was trying to create for the user was like an archeologist sort of sifting through sand. And you never really get a look at the whole language at any one time. You really have to zero in on one specific part and explore there. And in this sense you can really spend hours just killing time on this and playing around."

RS: "You say it's like one very long sentence, but is there anything connecting these words?"

JONATHAN HARRIS: "That's what's really interesting, and this is the one aspect of WordCount that people have really gravitated toward, as I've found. Because the data is essentially random -- I mean, it's not random, but the fact that a given word is next to another word is only based on how often those words appear in normal English usage. But when you have 88,000 words placed back to back, chances are pretty good that a few of those sequences are going to form some pretty conspiratorial meanings.

"Every morning I sort of come into work and I check my e-mail and I have a pile of e-mails waiting for me from people all around the globe that have found interesting sequences in WordCount. Some of my favorites are words 992 to 995 are 'American ensure oil opportunity.' Then 4304 to 4307 is 'Microsoft acquire salary tremendous.'"

AA: "I like this one, 5283 to 5285, which is 'angel seeks supper.'"

JONATHAN HARRIS: "Exactly. I found that a lot of people suggest that this be used as a good device for people trying to come up with a name for their band."

RS: "How is it determined, the frequency of any given word?"

JONATHAN HARRIS: "The frequency is data that is not generated by me. The frequency data was all coming from this source data that I used, which is the British National Corpus and that's a collection of written and spoken English words that were collected over a few years, I think back in the mid-1990s, by this group in England. It's a little bit dated; I've found one word that people are often surprised does not appear at all in the archive is blog. So clearly the phenomenon of Web logging came up after this data was collected."

AA: "So now you describe this basically as an 88,000-word-long sentence, starting with the word 'the,' the most frequently used word in the English language. What's at the other end?"

JONATHAN HARRIS: "The other end is surprising, and this is a big point of contention for a lot of people that actually find what the last word is. But the last word, surprisingly or not, is conquistador. And if you look through the list and you spend some time with it, you'll find that there are many words much, much further in front of conquistador that you've never even heard of. So clearly there seems to be some errata in their data."

AA: "So conquistador, as in a Spanish conqueror?"

JONATHAN HARRIS: "Some other interesting sort of comparative rankings: war is 304 and peace is 1,155. Love beats hate, Coke beats Pepsi and love beats sex by over 1,000."

AA: "Now this is according to British usage from a few years ago, right?"

JONATHAN HARRIS: "That's right, so maybe this has all changed since then. WordCount went online about five months ago, and almost nobody saw it for about four months. And then back at the beginning of July a friend of mine posted it on his blog and within about a day or two days, the site was getting about 20,000 unique visitors a day.

"And I was getting e-mails from all over the world, mainly people taking issue with some of the apparent disparities in the data, how some seemingly obscure words were being placed ahead of seemingly more common ones, but other people that were just sort of touched by how fun it was. And people, you know, found these little comparisons entertaining, like the Coke and Pepsi, and the love and the hate, and the war and the peace. Things like this."

RS: Jonathan Harris, talking to us from Fabrica, a creative think tank for young artists where he has a year-long fellowship. It’s located near Venice, Italy, and it's where he developed wordcount dot o-r-g.

AA: And that's all for this week. Our e-mail address is word@voanews.com. And our Web site is voanews.com/wordmaster. With Rosanne Skirble.
