Most search engines use
spiders, which crawl through the Web picking up every link on every
page, without considering relevancy. "It's ironic, but the bigger the
Internet gets, the more difficult it is to find a simple, accurate
answer to your questions," said Lawrence Page, a founder of Google,
along with Sergey Brin. (Google derives from the word "googol"--the
number 1 followed by 100 zeroes.) Even the largest search engine,
Inktomi, has indexed only about half the Web. Yahoo! isn't really a
search engine at all; it has a team of editors who index the Internet.
If you want a page to show up on a Yahoo! search, you must submit a form
with information about the site.
One of the biggest problems to be solved in
trying to retrieve the information you want is "the verbal disagreement
problem." Verbal disagreement means that you and someone else may not use
the same words or phrases to describe what you're looking for or what
you've found. A search for "automobile" may miss pages that use "car."
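The "automobile" versus "car" mismatch can be shown with a minimal Python sketch. Everything in it is an assumption made for illustration: the page names, the page texts, and the tiny hand-made synonym table are invented, and nothing here describes how any particular search engine works.

```python
# Toy corpus: page names and texts are invented for illustration.
PAGES = {
    "page_a": "used automobile listings and dealer reviews",
    "page_b": "how to wash your car at home",
    "page_c": "train timetables for the weekend",
}

# Hypothetical hand-made synonym table; a real system would need far more.
SYNONYMS = {
    "automobile": {"automobile", "car"},
    "car": {"car", "automobile"},
}


def literal_search(query, pages):
    """Return names of pages whose text contains the exact query word."""
    return sorted(name for name, text in pages.items() if query in text.split())


def expanded_search(query, pages):
    """Return names of pages containing the query word or a listed synonym."""
    terms = SYNONYMS.get(query, {query})
    return sorted(
        name for name, text in pages.items() if terms & set(text.split())
    )


literal_hits = literal_search("automobile", PAGES)    # finds only page_a
expanded_hits = expanded_search("automobile", PAGES)  # finds page_a and page_b
```

The literal search overlooks the page that says "car," while the expanded search finds it, at the cost of someone having to maintain the synonym table.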
Different search engines or indexes use
different methods for determining which pages will be at the top of a
listing. Some page makers repeat words many times on a page, in
invisible type, so that a search engine rates the page as more relevant
than it otherwise would. Some search engines or indexers accept money
(x cents per click) from page owners, which encourages them to put
certain pages higher in their listings than they otherwise would.
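To see why repeating a word can fool a ranking method, here is a small Python sketch. It assumes a deliberately naive scoring rule, counting how often the query word appears on the page; that rule, the function name, and the sample text are all assumptions for illustration, not a description of any real engine.

```python
def term_count_score(query, text):
    """Score a page by how many times the query word appears in its text."""
    return text.lower().split().count(query.lower())


honest_page = "a guide to choosing a car that fits your budget"

# The same page with the keyword repeated in text a reader would never see.
stuffed_page = honest_page + " car" * 20

honest_score = term_count_score("car", honest_page)
stuffed_score = term_count_score("car", stuffed_page)
```

Under this rule the stuffed page outscores the honest one by a wide margin even though a reader would see the same content on both.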
What Google's founders saw, apparently before others, was that
hyperlinks could be used to assign
values to web pages--the more links, the more
value. They go even further by assigning values based on the pages
where those links are found. A link on The New York Times' home page
carries more value than a link on a personal home page might, because
more pages link to the Times' page than to the personal one.
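The idea that a link's worth depends on the page it sits on can be sketched in Python as a repeated passing of value along links. This is a simplified illustration, not Google's actual formula: the function name, the damping setting of 0.85, the iteration count, and the link graph are all assumptions.

```python
def link_values(links, damping=0.85, iterations=50):
    """Assign each page a value based on the values of the pages linking to it.

    `links` maps each page to the list of pages it links to.
    The damping and iteration settings are assumed, not taken from the article.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    value = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base value.
        new_value = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its value evenly among the pages it links to.
                share = damping * value[page] / len(targets)
                for target in targets:
                    new_value[target] += share
            else:
                # A page with no outgoing links spreads its value evenly.
                for target in pages:
                    new_value[target] += damping * value[page] / n
        value = new_value
    return value


# Hypothetical link graph: three pages link to "news_home",
# but only one links to "personal_home".
LINKS = {
    "blog_1": ["news_home"],
    "blog_2": ["news_home"],
    "blog_3": ["news_home", "personal_home"],
    "news_home": ["story"],
    "personal_home": ["hobby"],
}

values = link_values(LINKS)
```

In this toy graph "news_home" ends up with more value than "personal_home" because more pages link to it, and "story" ends up ahead of "hobby" even though each has exactly one incoming link, because its one link comes from the more valuable page.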
IBM's Almaden Research Center's Clever
Project looks at links much as Google does, but can assign them
different values, depending on the search request. When it finds a page
filled with useful links on a subject, it calls it a "hub" page. "Then,
unlike Google, it analyzes the hubs to discover 'authorities'--pages
that online experts in [a particular subject] regard as the most useful
and interesting--and uses the authorities to judge the quality of the
hubs. Emerging from all that is ...[what IBM's Andrew Tomkins calls]
'the footprint of a community. The surprising thing is that as the
number of pages grows, the number of communities shrinks. ...This is a
way to understand the emergence of ...patterns ...trends, ideas,
communities. It could be beyond search. It could give people what they
are looking for.'"
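The back-and-forth between hubs and authorities described in that quotation can be sketched in Python as two scores that reinforce each other: a page's authority score sums the hub scores of pages linking to it, and a page's hub score sums the authority scores of the pages it links to. This is only a sketch of the general idea, not IBM's code; the function name, the iteration count, the normalization step, and the link graph are assumptions.

```python
import math


def hubs_and_authorities(links, iterations=20):
    """Return (hub, authority) score dicts for a link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Rescale both sets of scores so they stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical graph: two link-rich directory pages and one lone page.
LINKS = {
    "directory_1": ["expert_site", "reference_site", "fan_page"],
    "directory_2": ["expert_site", "reference_site"],
    "lone_page": ["fan_page"],
}

hub, auth = hubs_and_authorities(LINKS)
```

Here the two directory pages come out as the strongest hubs, and the sites that both of them point to score higher as authorities than the page that only one directory and the lone page link to.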
Because there doesn't seem to be much money
in simply searching, many search engines and web indexers have become
portals, which encourage other uses and accept advertising.
The article ends with this, from Google's
Page, "The great thing about search is that we are not going to solve it
any time soon. ...I see no end to what we need to do. If we aren't a
lot better next year, we will already be forgotten."