the importance of fricking full-text search

by phil on Saturday Apr 30, 2005 10:28 AM

like omg, if you make a fricking search engine, why don't you just make it full-text, and have it search everything?

iTunes search? why doesn't it also search the file names?

Outlook search? why doesn't it let me search every single piece of text by default? and why does it default to a subfolder? if I knew what folder the email was in, would I have to be searching?

Firefox bookmark search? why doesn't it search the description field of the bookmark? what use is the description field if I can't search it?

I have to put my bookmarks in Gmail now, and label them as bookmarks, just because Gmail search just works. it's comprehensive.

I bet Outlook's search engine was just one employee's side project that he made "just good enough" so that his higher-ups wouldn't bother him.

seriously, you know how long it took for the Internet to get full-text search? Sure, there were web search engines since 1994 (WebCrawler), but many, many search engines up until Google in 2001 actually wouldn't do full-text searches! they'd instead use some fancy "relevance matching algorithm" that figured out what you wanted!

In fact, my primary interest in using Google in 2001 was not PageRank, but the guarantee that the search terms I was typing would all be on the pages that I went to.

Before then, if I wanted to guarantee that my search terms would be on my search results, I would use opentext.com or something.

here are the golden rules for searches:

1. Every search term must be found on each result. (intersection searches)
2. Every searchable term should be visible to the search engine. (comprehensive searches)

1. In intersection searches, entering more terms shouldn't give you more results. If I wanted more results, I'd run more queries with separate keywords, not combine them. To combine terms means to create a filter list.

2. Comprehensiveness. I didn't spend my entire year emailing people, typing in all those email addresses, and full names, and subjects, and bodies, from different email accounts, to have only some of that information searchable. Every field that is not included in your search is like a Library of Congress's worth of text that is just trashed.
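To make the two rules concrete, here's a rough sketch of a tiny inverted index in Python. This is not how Gmail or any real product does it; the function names (`tokenize`, `build_index`, `search`) and the sample emails are all made up for illustration. The only point is that every field gets indexed, and every query term has to match.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of doc ids that contain it in ANY field."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for value in fields.values():       # rule 2: no field gets skipped
            for term in tokenize(value):
                index[term].add(doc_id)
    return index

def search(index, query):
    """Rule 1: return only the docs that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())    # intersect, never union
    return result

# Hypothetical sample data, just for the demo.
emails = {
    1: {"from": "alice@example.com", "subject": "lunch friday",
        "body": "pizza place downtown?"},
    2: {"from": "bob@example.com", "subject": "friday deadline",
        "body": "send the report to alice"},
    3: {"from": "carol@example.com", "subject": "vacation",
        "body": "back on monday"},
}
index = build_index(emails)

print(search(index, "friday"))        # matches emails 1 and 2
print(search(index, "friday pizza"))  # narrows to email 1
print(search(index, "alice"))         # sender field AND body both count
```

Notice that adding "pizza" to the query shrinks the result set instead of growing it, and that "alice" finds email 2 even though she only shows up in the body there.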

If it can't be searched, it doesn't exist. So all you search engine designers, make all fields searchable, and allow me to drill down. You may not have PageRank because your data doesn't link to each other, but this is the minimum decent practice.
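"Search everything by default, then drill down" could look something like the sketch below. Again, this is just my guess at a minimal version: `matches`, the `field` parameter, and the bookmark records are all hypothetical, not any real browser's API.

```python
import re

def words(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(doc, terms, field=None):
    """True if every term is in the doc.

    With field=None the whole record is searched (the default).
    Passing a field name restricts the match to that one field.
    """
    if field is None:
        haystack = " ".join(doc.values())
    else:
        haystack = doc.get(field, "")
    found = words(haystack)
    return all(term.lower() in found for term in terms)

# Hypothetical bookmarks, just for the demo.
bookmarks = [
    {"title": "Python docs", "url": "https://docs.python.org/3/",
     "description": "official reference and tutorial"},
    {"title": "Regex cheatsheet", "url": "https://example.com/regex",
     "description": "python regex tutorial notes"},
    {"title": "Cooking blog", "url": "https://example.com/food",
     "description": "weeknight recipes"},
]

# Step 1: search everything, including the description field.
hits = [b for b in bookmarks if matches(b, ["python"])]

# Step 2: drill down by filtering the hits on one specific field.
narrowed = [b for b in hits if matches(b, ["regex"], field="title")]

print([b["title"] for b in hits])      # both Python-related bookmarks
print([b["title"] for b in narrowed])  # only the regex cheatsheet
```

The default is the wide net; the field filter is something I opt into after the fact, not a restriction I have to guess at up front.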

You'd think this was common sense, even among smarty-pants programmers, but 9 out of 10 programs I use don't pass this basic test. wtf.


Creative Commons License