News Column

A Tower of Babble, Built Out of Old Tweets

January 8, 2013

Rex W. Huppke

Like most Americans, I believe every passing thought I have is brilliant enough to be broadcast to the world, archived in the Library of Congress and made available for future researchers to study.

If you find that concept far-fetched, you don't know much about Twitter.

Back in 2010, the 140-character-per-blast social media site announced that all tweets -- past, present and future -- would be stored electronically at the Library of Congress, lending considerable gravitas to people who send tweets like "Eating a sandwich" or "What's up with this weather?? LOL."

While the whole archival arrangement sounded historically ideal, I have learned trouble is brewing.

A white paper released by the Library of Congress this month revealed that, despite 400 inquiries, researchers are not getting access to the library's treasure trove of word blurts. The problem? A single search of just the tweets from 2006 to 2012 would take 24 hours to run.

Apparently the Library of Congress underestimated how much we all had to say.

The archive currently consists of 170 billion tweets, which consume more than 130 terabytes of server space. (A "terabyte" is a huge, winged, prehistoric reptile capable of holding a large number of tweets.)

Worse yet, the library is receiving about a half-billion more tweets each day and the "thousands of servers" needed to make the Twitter archive searchable are "cost-prohibitive and impractical for a public institution."

Clearly, this is a matter of grave national importance. Who knows how many people are out there right now, desperate to read all the tweets I've written on the subjects of "narwhals," "flatulence" and "narwhal flatulence"?

Not to mention the significant words of our political leaders and cultural icons.

Consider these actual tweets from Republican Sen. Chuck Grassley of Iowa (@ChuckGrassley):

"I now h v an iphone"

"U hv herad saying: 'deer in headlight look'. It is a frightening xperience when a real deer is there"

And who could forget: "P"

Imagine being a researcher studying the significance of one-letter tweets accidentally sent by political leaders of the early 21st century and having your incredibly important work halted because America is too cheap to make its Twitter archive searchable. It's madness!

Earlier this month, actress and person-who-is-pretty Megan Fox joined Twitter -- @meganfox -- and already she has sent out this critical update on her life: ":("

Kids once dreamed of growing up to study the cultural impact of celebrity emoticons. But those dreams are gone. And that makes me :(

Fortunately, I have some ideas to help the Library of Congress pare down its tweet stash to a more manageable level.

First off, sort all tweets that include the words "Kardashian," "Belieber," "Spears" and "Trump." Take that data, pack it into a large crate, launch the crate in a rocket to the surface of the moon and then fire at least 12 nuclear warheads at the moon. Use a giant space vacuum to collect the moon and tweet dust and then bury that dust at the bottom of the Atlantic Ocean.

Then find a new moon.

Now the Twitter archive should be roughly half its original size. The next step is to delete all tweets that include links to pictures of guys flexing their muscles in front of a mirror. (Hold on to the ones from former U.S. Rep. Anthony Weiner -- those are hilarious.)

The archive has now been reduced by another third.

To cut down the data flowing into the archive, library officials should go to the pipe that carries all the tweets into the building (it's probably in the basement) and install a Burrito Filter. Research has shown that Twitter users send approximately 17 million burrito-related tweets per hour. They range in significance from "Now that's a burrito!" to "literally going to get a burrito now."

What people do with their burritos is not the business of any future researchers, so let's just get those out of there.

The final, and probably most important, step is for folks at the Library of Congress to remember the old saying, "When life hands you a database of human babbling so gigantic that it's rendered useless, start making stuff up."

Consider this description of an actual research request the Library of Congress received: "The student is focusing on real-time microblogging of terrorist attacks. The questions focus on the timeliness and accuracy of tweets during specified events."

That's easy. As anyone familiar with Twitter knows, the tweets in the immediate aftermath of a terrorist attack would all basically say either "OMG, A TERROR ATTACK!!!!!" or "Traffic sux, was there a terrorist attack or sumthing??" Throw a few of those together, pop 'em in an email tube and send them off to that eager young student.

Research request complete, and at no added cost to the taxpayers.

In return for this advice, I ask only a small favor of the Library of Congress. Please delete that one tweet of mine from 11:39 p.m. on Oct. 31, 2011.

Let's just say it would ease the tension between me and a number of narwhal advocacy groups.


Source: (c) 2013 the Chicago Tribune. Distributed by MCT Information Services