Number 236 - January 2003

Pattern recognition makes for
a new tool in fight against spam

Similar technology (exists) at PayPal, Yahoo!
by Dan Richman, P-I reporter, December 12, 2002


    You can read these characters, with their irregular edges, sizes and shapes, but computer "bots"--which also send unwanted e-mail--are unable to.

    Ironic.

    Rather than exploiting the things that computers do well--which is how the software industry usually works--Microsoft Corp. and others are trying to protect users by taking advantage of the things a computer can't do well.

    For example, computers can't usually recognize letters or numbers when their edges are irregular and they're set against a complex background. Despite enormous increases in the sophistication and power of computers, that kind of edge detection and pattern recognition remains a tough challenge.

    So yesterday (December 11, 2002), Microsoft began using just such images on the registration page of its popular free Hotmail e-mail service, hoping that will reduce Hotmail's notorious spam glut.

    On a trial basis, registrants won't see just a space to type in the password they create. They'll also see a box containing a randomly generated series of numbers and letters--irregularly shaped, spaced and aligned, with some random marks thrown in on top of them.

    Registrants, who number in the hundreds of thousands per day, will have to read those characters and then type them into a box.
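    The server-side half of such a check can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Microsoft's implementation: the alphabet, the eight-character length and the names new_challenge and is_correct are all hypothetical, and the step that draws the characters as a distorted image is left out.

```python
import hmac
import secrets
import string

# Assumed alphabet: uppercase letters and digits, minus look-alikes such as O/0 and I/1.
ALPHABET = "".join(c for c in string.ascii_uppercase + string.digits if c not in "O0I1")


def new_challenge(length: int = 8) -> str:
    """Return a random string that the server would render as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_correct(expected: str, typed: str) -> bool:
    """Compare the visitor's answer to the challenge, ignoring case and stray spaces."""
    cleaned = typed.strip().upper()
    return hmac.compare_digest(expected.encode(), cleaned.encode())


challenge = new_challenge()
print(is_correct(challenge, challenge.lower()))  # True
```

    The random string itself is easy for a program to produce. The protection comes entirely from the image: a bot that cannot read the distorted picture has no way to learn which string to type back.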

    Big deal? Not for humans, but it's nearly impossible for "bots," or automated pieces of software that can create accounts. Advanced bots capable of optical-character recognition can make out regularly shaped and spaced letters--but not these weird-looking things.

    The idea is to defeat bots, because they can generate untraceable, unwanted e-mail, better known as spam, to legitimate users. Microsoft calls the technology Human Interaction Proof. It's the third phase of protecting users from spam, MSN spokesman Larry Grothaus said.

    The other two are server technology licensed from San Francisco's Brightmail Inc., which aims to stop spam before it even hits the network, and filters developed by Microsoft and built into MSN 8 that examine an e-mail's header, subject line and contents to determine whether it's spam.

    In addition to being annoying, spam consumes computing power and hogs network storage, Grothaus said. Microsoft hasn't quantified the cost savings its anti-spam efforts might provide, but he said "there will be an impact" from reducing spam.

    Similar bot-resistant technology is also in use at Web sites PayPal and Yahoo! That technology was created at Carnegie Mellon University.

    It's called a Completely Automated Public Turing test to tell Computers and Humans Apart, or--in the inevitable computer-industry acronym, though this one embodies a nice pun--CAPTCHA.

    One test, called Pix, takes advantage of something else a computer can't do: examine a series of images and say what they have in common.

    It presents six thematically linked photographs, drawings or paintings--of, say, infants or dogs--and asks the visitor what they portray.

    "Current computer programs should not be able to answer this question," the CAPTCHA Web site says.

    Another test, called Gimpy, uses five pairs of distorted, randomly generated, overlapped words against a multicolored background. Users are asked to type out any three of them.

    "While human users have no problem typing the words displayed, current bots are simply unable to do the same," Carnegie Mellon researchers say on their site.
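    The "any three of them" rule can be sketched as follows. This is an illustration only, not the Carnegie Mellon code: the short word list and the names pick_words and passes are hypothetical, and the rendering of the words as an overlapped, distorted image is omitted.

```python
import secrets

# Hypothetical word list; a real test would draw from a much larger dictionary.
WORDS = ["apple", "river", "stone", "cloud", "tiger", "piano",
         "lemon", "chair", "bread", "green", "house", "light"]


def pick_words(count: int = 10) -> list[str]:
    """Choose `count` distinct words (five pairs) to draw in the image."""
    return secrets.SystemRandom().sample(WORDS, count)


def passes(displayed: list[str], typed: list[str], required: int = 3) -> bool:
    """Accept the visitor if at least `required` distinct typed words were displayed."""
    shown = {word.lower() for word in displayed}
    matched = {word.strip().lower() for word in typed} & shown
    return len(matched) >= required
```

    Counting distinct matches means that typing the same word three times does not pass. Asking for only three of the ten words gives a human some slack when a few of them are too badly overlapped to read.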

    A variant on Gimpy picks a word or a sequence of numbers at random, renders it into a sound clip and distorts the clip. It then presents the distorted sound clip to its user and asks the user to type out its contents.

    Carnegie Mellon researchers say using CAPTCHAs in addition to passwords could not only reduce spam, but also prevent computer-aided attacks on computers. Such attacks use automated dictionaries that quickly generate millions of potential passwords in an attempt to enter a computer. If a password alone weren't sufficient to gain entry to a computer, such attacks would be irrelevant.
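    A minimal sketch of that idea, assuming a hypothetical login_allowed function that demands a correct CAPTCHA answer alongside the password. The plain-text password and the sample values are for illustration only; a real system would store a salted hash and issue a fresh challenge for every attempt.

```python
import hmac


def login_allowed(stored_password: str, password: str,
                  captcha_expected: str, captcha_typed: str) -> bool:
    """Allow entry only if both the CAPTCHA answer and the password are right."""
    captcha_ok = hmac.compare_digest(
        captcha_expected.encode(), captcha_typed.strip().upper().encode()
    )
    password_ok = hmac.compare_digest(stored_password.encode(), password.encode())
    return captcha_ok and password_ok


# A dictionary attack that guesses the right password still fails without the CAPTCHA.
guesses = ["123456", "password", "letmein"]
print(any(login_allowed("letmein", g, "K7WQ", "") for g in guesses))  # False
```

    Because every guess must be paired with a correctly read image, an automated dictionary can no longer try millions of passwords unattended.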

    Eventually, computers may be able to defeat CAPTCHAs and similar technology. But that would mean only that they've become capable of reading nearly anything, making inferences about the logical connections between images and generally acting intelligently. And the benefits from that, researchers believe, could far outweigh the drawbacks.

    P-I reporter Dan Richman can be reached at 206-448-8032 or danrichman@seattlepi.com