Enterprise Pursuits, 3/20/01
Filter Foibles
The problem with words
By Nelson King
I know that corporations suffer spam right along with their employees. It ties up bandwidth, clogs servers, distracts workers, and is generally something more than a nuisance. The prescribed solution, offered in a wide variety of software, has almost always been filtering. This “technology” is supposed to catch and eject unwanted material in all kinds of situations, from e-mail to Web surfing. But does it work?
In the sense that filtering rejects, ejects, or avoids certain content, yes, it works. However, ever since commercial filtering programs were introduced, people have been warning that they could also exclude information we might want or need.
One of the most frequently repeated stories comes from the outfits that issue virus warnings. The recent “Naked Wife” virus warning was a classic. Almost without exception, filtering programs keyed on the word “naked” and either trashed the message or bounced it. The bounced traffic alerted the issuing companies to the problem.
The underlying problem of filtering is familiar to database management people, and should be familiar to anyone who does extensive searches on the Internet. Cast a few crumbs upon the waters and you get a whole bakery in return. One search for the word “filtering” nets 12,365 hits. That’s because the word occurs in many contexts: cars, coffee, water purification, and so forth. People who are skillful with searches (and database management) learn to add related words that qualify the search; for example, “software filtering” might eliminate references to coffee and oil filters.
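To make the idea of qualifying a search concrete, here is a minimal Python sketch. The four document titles are invented for illustration, and the helper `search` is hypothetical; it treats a multi-word query as requiring every word to appear, which is one common way (not the only way) that search tools qualify a query.

```python
# Hypothetical mini-corpus: titles invented to mirror the contexts named above.
DOCUMENTS = [
    "Coffee filtering: paper versus metal",
    "Filtering engine oil in older cars",
    "Water purification by sand filtering",
    "Software filtering of spam at the mail server",
]


def search(documents, *terms):
    """Return the documents that contain every term (case-insensitive substring match)."""
    wanted = [t.lower() for t in terms]
    return [d for d in documents if all(t in d.lower() for t in wanted)]


broad = search(DOCUMENTS, "filtering")              # the whole bakery
narrow = search(DOCUMENTS, "software", "filtering")  # qualified with a related word
print(len(broad), len(narrow))  # prints: 4 1
```

Adding the single qualifying word cuts four hits down to the one that was actually wanted; on a real index the same move trims thousands of hits rather than three.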
The filtering placed on e-mail or Internet browsers is like a search in reverse: the filter passes only those items that don’t fit the criteria. The principle is the same. Whether searching or filtering, the basic operation is called, in computer jargon, “string matching.” This simply means that if you specify a word such as “filtering,” the software looks for occurrences of that exact string of characters in the document or Web site.
This is about as sophisticated as deducing somebody’s personality from their name. For the most part, string matching does not take context into account. If the matches can’t be qualified adequately, the results may become virtually useless. This certainly happens with Internet search engines, and it can easily happen with filtering programs.
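String matching of this kind, and its blindness to context, can be sketched in a few lines of Python. This is not how any particular commercial product works: the blocklist `BLOCKED_WORDS` and the functions `is_blocked` and `filter_messages` are hypothetical, and the sample messages are invented. The filter is simply a search in reverse, keeping only the messages that do not match.

```python
# Hypothetical one-word blocklist; real products ship much larger, vendor-specific lists.
BLOCKED_WORDS = ("naked",)


def is_blocked(message, blocked_words=BLOCKED_WORDS):
    """Plain string matching: block if any listed word appears anywhere, ignoring case."""
    text = message.lower()
    return any(word in text for word in blocked_words)


def filter_messages(messages, blocked_words=BLOCKED_WORDS):
    """A search in reverse: keep only the messages that do NOT match."""
    return [m for m in messages if not is_blocked(m, blocked_words)]


inbox = [
    "VIRUS ALERT: Do not open the Naked Wife attachment",
    "Quarterly budget review moved to Friday",
]
print(filter_messages(inbox))  # prints: ['Quarterly budget review moved to Friday']
```

The virus warning is discarded exactly as the “Naked Wife” alerts were, because the filter sees only the string, not what the message is about. Since the test is a bare substring match, it would also block an innocent sentence such as “the cable snaked behind the desk.”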
Then there is the problem of images, which make up the vast bulk of pornography. If a Web site or e-mail contained no forbidden words but instead said something like “Interesting gallery” followed by a string of prurient pictures, no filtering program that relies on string matching would catch it.
The dilemma caused by filtering has a couple of aspects worth noting. How can a company quantify the advantages of removing temptation and spam versus the loss of useful or even critical information? The other aspect is the role of a technology fix: Sure, the current filtering techniques are basic, but won’t artificial intelligence provide solutions?
Well, dear reader, serious AI has been around for a couple of decades, but you don’t see it being applied very effectively to Web searches. Even though there’s a lot at stake, including big money, the truth is that competent searching and filtering will take better hardware and AI than we can currently muster. How many years will that take? Nobody is predicting.
In the meantime, what does it take to overcome the lack of technology and increase the effectiveness of searches and filtering? If you intuited “people,” you are probably not an administrator, economist, or accountant. The fact is, if you want good filtering, meaning removing the unwanted material while leaving things that are similar but wanted, then a company must invest in the best filtering software available and assign capable staff to configure and maintain it.
There is no 100 percent solution either way. Some things will slip through that shouldn’t, and you’ll miss some things you wish you hadn’t. However, by drawing on the intelligence of your IT people, you might be able to cobble together a system that reduces the risks. Of course, you could also just rely on the intelligence of your employees to do their own filtering. Ah, sheer folly.
Editor at Large Nelson King also writes Pursuits monthly for ComputerUser magazine.