Teoma’s model might be just what users are looking for.
The vast majority of users navigate the Web via search engines. Yet searching can be the most frustrating thing you do in a browser. Type in a keyword or phrase, and you’re likely to get thousands of responses, only a handful of which are close to what you’re looking for. And those often appear only on secondary results pages, after a set of sponsored links or paid-placement advertisements.
Still, search engines have come a long way in the past few years. Google developed a sophisticated algorithm that measures the popularity of pages on the Web and lists the most popular pages first in its search results. It also separated sponsored links from its regular results to give users a better sense of which pages are advertiser-driven. Ask Jeeves developed a natural language interface that lets users ask direct questions. AllTheWeb boasts a comprehensive index of Web sites, resulting in a longer list of search results. Metasearch engines like Dogpile combine results from those three and others for even more comprehensive searches. And there are dozens of other engines.
So which search engine should you use regularly? My new favorite is Teoma, the engine behind Ask Jeeves and others. After using it for a couple of weeks, I was so impressed with the quality of its search results that I just had to talk with the people behind it. In the course of my research, I learned that Teoma is owned by Ask Jeeves, so it made sense to talk to people from both organizations. What follows are conversations with Teoma’s head of Research and Development, Apostolos Gerasoulis, and Steve Berkowitz, president of Ask Jeeves Web properties.
Teoma’s Apostolos Gerasoulis
The buzzword in search engines is relevance. Every engine claims to produce results that are more relevant to the user’s interests than everyone else’s. When you say you have greater relevancy than, say, Google, what do you mean?
You’ve touched on the central issue here: what separates one search engine from another? For us, relevance is multidimensional. It consists of textual relevance, authority, and, new with Teoma 2.0, community.
You start with a string of words and try to match the user’s query to sites that contain those words. We call this textual relevance. But the Web is chaotic; there’s no uniform way of matching words or phrases exactly to what people want. Still, you try to come up with the most comprehensive list of sites that contain that string of words. The next step is to list the pages in order of authority. We define authority in terms of which pages are the most respected on the Web.
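To make the first step concrete, here is a minimal Python sketch of textual relevance, assuming a toy in-memory collection of pages and a deliberately naive rule: a page matches only if it contains every query word, and it is scored by how often those words appear. The function names `tokenize` and `textual_matches` and the sample URLs are hypothetical; the interview does not describe Teoma’s actual matching.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def textual_matches(query: str, pages: dict[str, str]) -> list[tuple[str, int]]:
    """Return (url, score) pairs for pages that contain every query word.

    The score is just the total number of query-word occurrences, a naive
    stand-in for real textual-relevance signals (hypothetical, not Teoma's).
    """
    terms = set(tokenize(query))
    results = []
    for url, text in pages.items():
        words = tokenize(text)
        if terms.issubset(words):
            score = sum(words.count(term) for term in terms)
            results.append((url, score))
    # Highest score first; ties broken alphabetically by URL.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


# Hypothetical sample pages.
pages = {
    "mac-news.example": "Apple computer news and Apple software",
    "butter.example": "Recipes for apple butter",
    "repair.example": "Computer repair tips",
}
print(textual_matches("apple computer", pages))  # [('mac-news.example', 3)]
```

This only produces the candidate list; ordering by authority, the next step Gerasoulis describes, would be layered on top.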
Sort of like Google, right? They sort results by the popularity of pages on the Web. If one site is hit more often than another, it ranks higher in the search results?
Here we do things a little differently from Google. Google counts hits without weighting them: everyone votes, and every vote counts the same. We think authority is better defined by experts in the subject matter. They are the ones who vote, at run time, on which sites are the most authoritative. We call this subject-specific popularity.
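Gerasoulis does not spell out the math behind subject-specific popularity. One well-known way to weight votes by expertise is Jon Kleinberg’s HITS (hubs and authorities) algorithm, which is normally run on a query-specific subgraph at query time. It is sketched below as an assumption, not as Teoma’s documented method. Pages that link to many good authorities earn high "hub" (expert) scores, and a page’s authority is the sum of the hub scores of the pages pointing to it. The link graph and page names are invented for illustration.

```python
import math


def in_degree(links: dict[str, list[str]]) -> dict[str, int]:
    # "One page, one vote": every inbound link counts the same.
    votes: dict[str, int] = {}
    for targets in links.values():
        for dst in targets:
            votes[dst] = votes.get(dst, 0) + 1
    return votes


def hits(links: dict[str, list[str]], iterations: int = 50):
    """HITS-style hub/authority scores, used here as a stand-in for expert weighting.

    A page's authority is the sum of the hub scores of the pages linking to it;
    a page's hub ("expert") score is the sum of the authority scores of the
    pages it links to. Both vectors are renormalized every round.
    """
    nodes = set(links) | {dst for targets in links.values() for dst in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        hub = {n: sum(auth[dst] for dst in links.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return auth, hub


# Invented query-specific link graph: e1 and e2 act like experts that endorse
# A, B and C; s1..s3 each link only to X.
links = {
    "e1": ["A", "B", "C"],
    "e2": ["A", "B", "C"],
    "s1": ["X"],
    "s2": ["X"],
    "s3": ["X"],
}
votes = in_degree(links)
auth, hub = hits(links)
print(votes["X"], votes["A"])  # 3 2: X has the most raw votes
print(auth["A"] > auth["X"])   # True: expert-weighted authority favors A
```

Raw vote counting crowns X, but the pages voting for X vouch for nothing else, so they never build up hub weight, and X’s authority shrinks toward zero while A, B and C, endorsed by the same two "experts," reinforce each other.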
Once we have the results ranked by subject-specific popularity, we sort them into community clusters. This is the really interesting thing about Teoma 2.0. Every community uses words differently; each has its own rules for how the language is used. Community members even search for things differently using the same words. Say you type in “Apple.” You might get results for Apple Computer, apple butter, apple picking, and so on. These results will be grouped by what the word apple means to each community, so users can narrow their search to the community they belong to.
So the basic premise is that what is relevant to one member of a community is relevant to other members, and that is how community clusters enhance relevance?
Members of the same community share a common language; they use the same expressions, jargon and whatnot. If you recognize this, you can use community to find exactly what you mean, the exact match, for certain words. I call this fuzzy matching: these three words are common in this community but not in that one, and so on. My dream is this: we can take search to the next level by exploiting one of the Web’s strengths, which is how sites are interlinked in community clusters. If we use this, it becomes much easier for users to narrow their search from community to subcommunity to the exact meaning in the context it was intended.
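The interview does not say how Teoma forms its community clusters. As a rough stand-in, the sketch below groups the results for a query into clusters of pages that link to one another, using a union-find (disjoint-set) structure to compute connected components. The function name, the hypothetical hostnames, and the simplification that a community equals a connected component of the result pages are all assumptions made for illustration.

```python
def community_clusters(results: list[str], links: dict[str, list[str]]) -> list[list[str]]:
    """Group result pages into clusters of pages that link to one another.

    Links are treated as undirected, and only links between two result pages
    count. Each cluster is a connected component found with union-find.
    """
    parent = {url: url for url in results}

    def find(url: str) -> str:
        # Walk up to the cluster's representative, halving the path as we go.
        while parent[url] != url:
            parent[url] = parent[parent[url]]
            url = parent[url]
        return url

    for src in results:
        for dst in links.get(src, []):
            if dst in parent:  # ignore links that leave the result set
                parent[find(src)] = find(dst)

    clusters: dict[str, list[str]] = {}
    for url in results:
        clusters.setdefault(find(url), []).append(url)
    # Largest cluster first; ties and members sorted alphabetically.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Hypothetical results for the query "apple".
results = [
    "mac-news.example", "apple-store.example", "mac-forum.example",
    "butter-recipes.example", "pie-recipes.example",
    "orchard.example", "u-pick.example",
]
links = {
    "mac-forum.example": ["mac-news.example"],
    "mac-news.example": ["apple-store.example", "offsite.example"],
    "pie-recipes.example": ["butter-recipes.example"],
    "u-pick.example": ["orchard.example"],
}
for cluster in community_clusters(results, links):
    print(cluster)
```

The three printed clusters line up with the computer, cooking and apple-picking meanings of "apple"; a user who picks one cluster has narrowed the search to that community.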
As these interlinked communities grow within the Teoma index and as users discover the power of Teoma, we’re talking about search on a massive scale. How do you manage this infrastructure?
We use a distributed Linux architecture that harnesses massively parallel computing. The beauty of this is that it scales as we need it: we can add more and more nodes to the cluster to maintain peak performance as our traffic grows. Search Engine Watch recently ranked us third, behind Google and MSN. But we’re catching up. We’re talking about millions of queries per day from Ask.com, Lycos, MetaCrawler, InfoSpace, and Excite, all driven by Teoma. We have 500 million pages in our index, and it’s growing as people discover the power of community. But as the system has scaled up, we have not seen performance slow down: the average page loads in 250 milliseconds, and that’s been constant.
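Gerasoulis says only that Teoma runs on a distributed Linux cluster that scales by adding nodes. A common way to build such a system is a document-partitioned index with scatter-gather queries. It is sketched here as an assumption, not as Teoma’s documented design: each node holds a slice of the pages and answers a query against its own slice, and a front end merges the partial top-k lists. The hashing scheme, the single-term scoring, and all names are hypothetical.

```python
import hashlib
import heapq


def shard_for(url: str, num_shards: int) -> int:
    # Stable hash, so the same URL always lands on the same node.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def build_shards(pages: dict[str, str], num_shards: int) -> list[dict[str, str]]:
    # Document partitioning: each node stores a disjoint subset of the pages.
    shards: list[dict[str, str]] = [{} for _ in range(num_shards)]
    for url, text in pages.items():
        shards[shard_for(url, num_shards)][url] = text
    return shards


def search_shard(shard: dict[str, str], term: str, k: int) -> list[tuple[int, str]]:
    # Each node scores only its own pages (by lowercase term count) and returns its local top k.
    scored = [(text.lower().split().count(term), url) for url, text in shard.items()]
    return heapq.nlargest(k, [pair for pair in scored if pair[0] > 0])


def scatter_gather(shards: list[dict[str, str]], term: str, k: int = 3) -> list[str]:
    # A real cluster would query the shards in parallel; a loop keeps the sketch simple.
    partial: list[tuple[int, str]] = []
    for shard in shards:
        partial.extend(search_shard(shard, term, k))
    return [url for _, url in heapq.nlargest(k, partial)]


# Hypothetical pages.
pages = {
    "p1": "linux linux linux",
    "p2": "linux cluster",
    "p3": "linux linux",
    "p4": "search engine",
    "p5": "linux linux linux linux",
}
for n in (1, 4):
    print(n, scatter_gather(build_shards(pages, n), "linux"))
```

Because every node returns its own top k, merging those lists always yields the global top k, whatever the node count. Plain modulo hashing, however, reshuffles most pages whenever nodes are added; a system that grows on the fly would more likely use consistent hashing or explicit shard assignment.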
How did you discover the formula for Teoma’s success?
I have a computational background. In applied mathematics, you realize that everything comes down to averages; computation is a matter of degree. The key to Teoma’s success is that you can compute averages much faster than you can compute exact answers. The problem I confronted when I started working on Teoma at Rutgers in the early ’90s was how to find a site that matched a given search phrase. I approached it as I had approached other computational problems, in terms of averages. Once you have an approximate answer, you narrow your search down.
This is the way Teoma is built. You get an approximate answer with the textual relevance, then you narrow it down with subject-specific popularity, then you finally get an exact match with community matching. Other search engines tried to get exact textual matches, but language is too flexible to do that. That is why Teoma succeeds where others have failed.
Vital statistics: Apostolos Gerasoulis
Apostolos Gerasoulis founded Teoma Technologies in 2000, where he served as chief executive officer and chief technology officer and was a member of the board. Gerasoulis now serves as Teoma’s Vice President of Research & Development. He holds a doctorate in applied mathematics from the State University of New York at Stony Brook and has been a professor of Computer Science at Rutgers University since 1979.
The business side of searching
Q&A with Steve Berkowitz, president of Ask Jeeves Web properties.
Ask Jeeves is profitable and growing. How did you achieve this success?
There are lots of factors. The biggest was buying Teoma.com and making it the search engine on Ask.com. That’s resulted in a 28 percent jump in traffic since we made the switch in January [this interview was conducted in February]. But we’ve done lots of other things. Our new spell-check product really helps users find information even when they misspell words. And we’ve removed banners from the site and now rely entirely on sponsored links, which are clearly set apart from regular search results for a better user experience.
We’ve also added direct answers to users’ questions. A small team of editors matches queries with answers to questions such as “What’s the capital of Minnesota?” Ask Jeeves has always had its natural language interface, but behind the scenes that meant a lot of query rewriting, and the results for direct questions were always lists of sites where users could find the answer. Now, in about 2 percent of cases, we can give them the answer ourselves, and that percentage is growing.
I presume your editors do other work, like qualifying sites and weeding out link farms.
Yes. They analyze queries and use technology to find the best sites out there on certain topics.
What has Teoma brought Ask Jeeves?
Teoma 2.0 has driven our relevancy up, and relevancy is the linchpin that holds everything on the search page together. Our relevancy has improved dramatically.
What do you mean by relevancy?
We can measure it with several metrics. For one, Teoma has a larger index: 500 million pages. That’s one of the largest indexes on the Web, and it’s growing fast. In terms of how we measure the quality of the user experience, Teoma has vastly improved our offering. Click-through rates are up; people are clicking on our search results much more often, which measures how well our results match queries. We also measure how often users abandon a page as soon as they click on it: if a site obviously isn’t what they want, they abandon it and go back to the search results page. The abandonment rate went from 40 percent in December 2002 to 19 percent in January 2003, and we expect it to go down even further as we continue to refine the interface. Finally, we measure time on the site, which has increased dramatically as users click on more search results per query. This is also an indicator of the quality of those results: as users drill down into communities, they want to explore each link in that community cluster.
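The abandonment figures are easy to misread, so here is the arithmetic. The sketch assumes a click counts as abandoned when the user returns to the results page within a fixed dwell time; the 10-second threshold and the function names are hypothetical, since the interview does not say how Ask Jeeves defines the metric. A drop from 40 percent to 19 percent is 21 percentage points, or a relative reduction of 52.5 percent.

```python
def abandonment_rate(dwell_seconds: list[float], threshold: float = 10.0) -> float:
    # A click counts as abandoned if the user came back to the results page
    # in less than `threshold` seconds (a hypothetical cutoff).
    if not dwell_seconds:
        return 0.0
    abandoned = sum(1 for d in dwell_seconds if d < threshold)
    return abandoned / len(dwell_seconds)


def relative_change(before: float, after: float) -> float:
    # Fractional change relative to the starting value.
    return (after - before) / before


# Hypothetical dwell times (seconds) for five clicks on search results.
print(abandonment_rate([2, 5, 30, 120, 8]))   # 0.6
# The reported drop from 40 percent to 19 percent:
print(round(0.40 - 0.19, 2))                  # 0.21, i.e. 21 percentage points
print(round(relative_change(0.40, 0.19), 3))  # -0.525, a 52.5 percent relative drop
```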