The elusive needle in the haystack is getting harder to find. The needle's
not getting any smaller, but the haystack is growing by about 1,000 bales a
minute.
If you're after one small detail and you want to find it on the Internet, can
you? Several advanced search engines offer some interesting new techniques for
threading the needle at the touch of a key.
If you believe that when you log on to Yahoo! you're searching the entire Web,
you'd better reconsider. Yahoo! is one of the better search engines on the Web,
but like its many brethren, it's anything but exhaustive.
When most people want to search, they turn to one of a variety of
search-engine portals. These portals look much alike, but they work in different
ways.
Some use Web crawlers, spiders, or robots to automatically visit Web sites,
follow their links, and store what they find in a huge catalog or database. When
users come to these sites and enter their search queries, the queries are run
against that spidered catalog.
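The spidering approach can be illustrated with a minimal Python sketch. It is not how any particular portal works: the `WEB` dictionary and its `.example` URLs are a hypothetical stand-in for real pages, so the crawler never touches the network. Starting from one page, it follows links breadth-first and records every word in a catalog (an inverted index mapping each word to the pages that contain it), which queries then run against.

```python
from collections import deque

# Hypothetical stand-in for the Web: URL -> (page text, outgoing links).
# A real spider would fetch pages over HTTP and parse their links instead.
WEB = {
    "a.example": ("electronic books and e-readers", ["b.example", "c.example"]),
    "b.example": ("used cars for sale", ["a.example"]),
    "c.example": ("law firms and electronic filing", ["d.example"]),
    "d.example": ("entertainment news", []),
}


def crawl(start: str) -> dict[str, set[str]]:
    """Visit pages breadth-first from `start` and build an inverted index."""
    index: dict[str, set[str]] = {}  # word -> set of URLs containing it
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = WEB.get(url, ("", []))
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        for link in links:  # pursue the page's links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return URLs containing every query word (an implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy, so the index is untouched
    for word in words[1:]:
        results &= index.get(word, set())
    return results


catalog = crawl("a.example")
print(sorted(search(catalog, "electronic")))  # ['a.example', 'c.example']
```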
Other search-engine portals have human indexers examine Web sites, or review
sites submitted through search-engine submission forms, and then catalog or
classify them according to their content: cars, entertainment, law, and the like.
When researchers search these portals, they're actually searching the
human-made classifications of Web sites, much as an old library card catalog held
a single card describing each volume on its shelves. Some search-engine portals
combine both approaches.
Other search-engine portals fall into a category called metasearch sites. In
Internet searching, metasearch means searching the listings of many search sites
at once. When you enter a search at one of these sites, your search is rephrased
to fit the search syntax of each of a variety of other search engines and then
submitted to each one. These portals search 10 or more search engines at once,
displaying the results in a common view.
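The rephrase-and-merge step can be sketched in Python as follows. Everything engine-specific here is hypothetical: the two stub engines, their quoting rules, and their canned results are made up for illustration, and ranking merged pages by how many engines returned them is just one reasonable merge rule, not the method of any site named in this article.

```python
from typing import Callable


# Hypothetical engines. Each stub ignores its query and returns canned,
# ranked results; a real metasearch site would send the query over the
# network and parse the engine's result page.
def engine_a(query: str) -> list[str]:
    return ["x.example", "y.example", "z.example"]


def engine_b(query: str) -> list[str]:
    return ["y.example", "w.example"]


# Assumed syntax rules: engine A wants a phrase in quotes, engine B wants
# the words joined with "+".
ENGINES: dict[str, tuple[Callable[[str], str], Callable[[str], list[str]]]] = {
    "A": (lambda phrase: f'"{phrase}"', engine_a),
    "B": (lambda phrase: "+".join(phrase.split()), engine_b),
}


def rephrase(phrase: str) -> dict[str, str]:
    """Rewrite one user query into each engine's own syntax."""
    return {name: rewrite(phrase) for name, (rewrite, _) in ENGINES.items()}


def metasearch(phrase: str) -> list[tuple[str, list[str]]]:
    """Query every engine with its rephrased query and merge the results."""
    merged: dict[str, list[str]] = {}  # URL -> engines that returned it
    for name, query in rephrase(phrase).items():
        _, engine = ENGINES[name]
        for url in engine(query):
            merged.setdefault(url, []).append(name)
    # One possible common view: pages found by more engines come first;
    # sorted() is stable, so ties keep the order in which they arrived.
    return sorted(merged.items(), key=lambda item: -len(item[1]))


for url, engines in metasearch("electronic books"):
    print(url, engines)
```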
And still other search engines are beginning to offer more advanced Internet
search tools and techniques. Some of these tools involve more powerful search
syntax, search agents, or changes to some part of the search process that make
it more automated and efficient.
If you really want to leverage a search engine's more advanced features, take
any of the 20 or so top engines, figure out what it searches (its content
domain), and learn its advanced search syntax.
Most of the top search engines enable you to perform a variety of more
specific searches. Some of those techniques include:
Phrase searching--searching for phrases, often entered in quotes.
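Proximity searching--searching for words or phrases that appear near other
words or phrases, for example within a set number of words of each other. Some
engines also enable you to specify the order of your proximity terms.
Wild-card searching--searching for terms that begin or end with a given string
of letters, with a symbol such as * standing in for any remaining letters
(comput* finds computer, computing, and computation).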
Exclusionary searching--sometimes referred to as NOT searches (retrieve all
documents with a particular word or phrase, but NOT with another particular word
or phrase).
HTML title searching--restricting your search to the title of a document.
Site searching--specifying the site against which you want your search to
be run.
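The techniques in the list above can be modeled over a small set of documents. This Python sketch uses hypothetical function names and made-up documents, splits text on whitespace only, and does not reproduce any particular engine's operators (each engine had its own syntax for phrases, proximity, wild cards, NOT, and title or site restrictions).

```python
import fnmatch
from dataclasses import dataclass


@dataclass
class Doc:
    site: str
    title: str
    body: str

    def words(self) -> list[str]:
        # Naive tokenization: lowercase and split on whitespace.
        return self.body.lower().split()


def phrase_match(doc: Doc, phrase: str) -> bool:
    """Phrase searching: the words must appear together, in order."""
    target, words = phrase.lower().split(), doc.words()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def near(doc: Doc, a: str, b: str, within: int, ordered: bool = False) -> bool:
    """Proximity searching: `a` and `b` at most `within` words apart.

    With ordered=True, `b` must also come after `a`.
    """
    words = doc.words()
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [j for j, w in enumerate(words) if w == b.lower()]
    for i in pos_a:
        for j in pos_b:
            gap = j - i
            if ordered and gap <= 0:
                continue
            if 0 < abs(gap) <= within:
                return True
    return False


def wildcard(doc: Doc, pattern: str) -> bool:
    """Wild-card searching: 'comput*' matches 'computer' and 'computing'."""
    return any(fnmatch.fnmatchcase(w, pattern.lower()) for w in doc.words())


def excludes(doc: Doc, wanted: str, unwanted: str) -> bool:
    """Exclusionary (NOT) searching: `wanted` present, `unwanted` absent."""
    words = doc.words()
    return wanted.lower() in words and unwanted.lower() not in words


def title_has(doc: Doc, word: str) -> bool:
    """HTML title searching: look only at the document's title."""
    return word.lower() in doc.title.lower().split()


def on_site(doc: Doc, site: str) -> bool:
    """Site searching: keep only documents from one site."""
    return doc.site == site


DOCS = [
    Doc("books.example", "Electronic Books", "electronic books are computer files"),
    Doc("cars.example", "Car Reviews", "books about cars and computing"),
]

print([d.site for d in DOCS if excludes(d, "books", "cars")])  # ['books.example']
```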
In addition to more powerful search syntax, some sites are also beginning to
offer intelligent agents that search the Web while you're sleeping or attending
to more pressing business.
These agents can be set up so that they'll perform searches (using any one of
the means previously noted) and periodically notify you via e-mail of the
results.
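The core of such an agent is small: rerun a saved query, keep track of what has already been reported, and compose a message only when something is new. In this Python sketch, `run_agent_once` and `fake_search` are hypothetical names; a real agent would call this step on a schedule (for example from cron) and send the message by e-mail (for example with Python's `smtplib`), neither of which is shown.

```python
from typing import Callable, Iterable, Optional


def run_agent_once(
    search: Callable[[str], Iterable[str]],
    query: str,
    seen: set[str],
) -> Optional[str]:
    """Rerun `query`; return a notification message if any results are new.

    `seen` is updated in place so the next run reports only newer results.
    """
    # dict.fromkeys drops duplicate URLs while keeping their order.
    new = [url for url in dict.fromkeys(search(query)) if url not in seen]
    seen.update(new)
    if not new:
        return None
    return "\n".join([f"New results for {query!r}:"] + [f"  {url}" for url in new])


# Hypothetical search whose results grow between runs.
_runs = iter([["a.example"], ["a.example", "b.example"], ["a.example", "b.example"]])


def fake_search(query: str) -> list[str]:
    return next(_runs)


seen: set[str] = set()
messages = [run_agent_once(fake_search, "electronic books", seen) for _ in range(3)]
for message in messages:
    print(message)  # the third run finds nothing new and prints None
```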
Still other search tools use a variety of techniques to streamline how a search
is entered, how it is processed, or how its results are displayed. For example,
some search-engine tools can display all results at once, rather than in groups
of 10, eliminating the need to keep clicking the NEXT button to continue
reviewing sites.
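The difference between paging and showing everything comes down to a page-size parameter. In this hypothetical Python helper, `page` returns one page of results and `size=None` returns all of them at once; with 25 made-up results and pages of 10, a reader must view three pages to see them all.

```python
import math
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def page(results: Sequence[T], number: int, size: Optional[int] = 10) -> list[T]:
    """Return page `number` (1-based) of `results`; size=None returns everything."""
    if size is None:
        return list(results)
    start = (number - 1) * size
    return list(results[start:start + size])


def pages_needed(total: int, size: int = 10) -> int:
    """How many pages a reader must click through to see every result."""
    return math.ceil(total / size)


results = [f"site{n}.example" for n in range(1, 26)]  # 25 made-up results
print(page(results, 3))             # the last 5 results
print(len(page(results, 1, None)))  # 25
print(pages_needed(len(results)))   # 3
```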
Contributing Editor Cary Griffith is the president of the Electronic Book
Co., a Minneapolis new media firm.
Sidebar
Next-Generation Search Sites
If it's access to the largest possible index of Internet sites you're after,
consider Fast: Search. To date, no search engine indexes the entire Web, but
these guys are among the giants. Using spidering technology, Fast: Search has
indexed over 300 million Web pages, and counting. At this site you'll also find
specialized subsearches (its FTP search turned up 100 million files; its MP3
search, 1 million), as well as advanced search techniques, including
multilingual support and easy-to-use Boolean and domain-search filters.
Fast: Search's latest effort is Fast WAP Search (wap.fast.no), one of the first
wireless Web search engines.
For a simple, advanced, novel, and thorough approach to locating Internet
information, why not Ask Jeeves? One of the best features of
this service is the ability to enter your requests in plain English (also
referred to as natural-language searching). For example, if you wanted to know
about electronic books, you could simply enter: "Can you tell me about electronic
books?" Jeeves, of course, will answer.
Ask Jeeves actually catalogs questions and their results, and draws in part on
that catalog when other users ask questions. It also now lets users customize a
personal newspaper with its Personal Jeeves service.
Some advanced search engines use algorithms to decide what to display and how
to display it. Popularity engines, for instance, rank search results according
to how often each Web site has been hit. Direct Hit, HotBot, Lycos, LookSmart,
MSN Search, and Snap all either display results by popularity automatically or
give users the option of doing so.
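Ranking by popularity amounts to sorting matching sites by a hit count. This Python sketch assumes a hypothetical click log (`clicks`) as the source of hits; actual popularity engines used their own measures, which it does not reproduce. The `by_popularity` flag mirrors the option some engines gave users.

```python
from collections import Counter


def display(results: list[str], hits: Counter, by_popularity: bool = True) -> list[str]:
    """Return results most-popular first, or in their original order.

    Sites with equal hit counts keep their original order, because
    sorted() is stable even with reverse=True.
    """
    if not by_popularity:
        return list(results)
    return sorted(results, key=lambda url: hits[url], reverse=True)


# Hypothetical click log: one entry per visit to a site.
clicks = ["b.example", "c.example", "b.example"]
hits = Counter(clicks)

matches = ["a.example", "b.example", "c.example"]  # in relevance order
print(display(matches, hits))  # ['b.example', 'c.example', 'a.example']
```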
The preceding search engines are notable for other reasons as well. Lycos, for
example, owns HotBot, one illustration of how much overlap there is in the
search-engine market.
Several of the preceding services (and others listed here) also use the Open
Directory for their classified listings of Web sites. The Open Directory project
is Netscape's continuing effort to open up the Web. It relies on 20,000
volunteer Web site catalogers to produce the most comprehensive directory of the
Web currently available. Any search engine service can use the Open Directory,
and several do; one of the most notable is Google.com.
Google.com enhances the Open Directory by using its own search technology to
create a tool that analyzes the content of each Web page, sorts the results,
determines if the site is active and, in certain instances, classifies results
into categories.
For changing how results are displayed, another interesting advanced search
engine is Quickbrowse.com, a search engine with a twist.
With a QB-Masterpage, users can easily metabrowse search portal sites:
Quickbrowse.com lets you stitch together any number of search-engine portals
into one long page. This collection of pages can then be saved, bookmarked, and
even e-mailed to you daily.
Similarly, the site's QB-Search page lets you enter a search in a particular
engine and, instead of viewing only the first 10 results, specify however many
results you want to see at once. The entire process simplifies browsing search
engines and search results.
Quickbrowse and some of the other sites listed here fall into a category of
advanced search engines known as metasearch engines. These run searches against
multiple search engines, displaying the results in a common interface and format.
Mamma.com is another example of a metasearch engine. It also has a compelling
tag line: "The mother of all search engines." Because it takes your search,
modifies it for each of 10 major search engines, and then recompiles the results
in its own format, you are in effect searching 10 major search engines at once.
One of its more advanced search features is the ability to search by major
categories.