Today I searched on Google for the words "spider identification" and got a site about identifying brown spiders.
I meant to search for "search engine spider identification", but I was so sure that a "spider" is a computer program that I forgot this word is a homonym, a word that has multiple meanings. It quite amused me, and I thought it would be nice to collect false positives on this web page and to update the page whenever I find new ones. Readers are invited to add false positives in the comments form.
On Wikipedia I found an example of a false positive right in the definition of the term "false positive".
In computer database searching, false positives are documents that are retrieved by a search despite their irrelevance to the search question. False positives are common in full text searching, in which the search algorithm examines all of the text in all of the stored documents in an attempt to match one or more search terms supplied by the user.
Most false positives can be attributed to the deficiencies of natural language, which is often ambiguous: the term "home," for example, may mean "a person's dwelling" or "the main or top-level page in a Web site." The false positive rate can be reduced by using a controlled vocabulary, but this solution is expensive because the vocabulary must be developed by an expert and applied to documents by trained indexers.
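To make the "home" example concrete, here is a minimal sketch of a naive full text search over a toy corpus, next to a search restricted by a controlled-vocabulary subject tag. The documents, the subject labels, and the function names are all made up for illustration; a real indexing system would be far more involved.

```python
# Hypothetical toy corpus: each document has free text plus a subject tag
# of the kind a trained indexer would assign from a controlled vocabulary.
DOCS = [
    {"id": 1, "text": "How to insulate your home before winter", "subject": "dwelling"},
    {"id": 2, "text": "Click Home to return to the top-level page", "subject": "web-navigation"},
    {"id": 3, "text": "Buying a first home on a small budget", "subject": "dwelling"},
    {"id": 4, "text": "Spider identification guide for gardeners", "subject": "arachnids"},
]

def full_text_search(term):
    """Return ids of documents whose text contains the term as a whole word."""
    term = term.lower()
    return [d["id"] for d in DOCS if term in d["text"].lower().split()]

def subject_search(term, subject):
    """Full text match, restricted to one controlled-vocabulary subject."""
    hits = set(full_text_search(term))
    return [d["id"] for d in DOCS if d["id"] in hits and d["subject"] == subject]

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are actually relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

relevant = [1, 3]                             # the searcher wants pages about dwellings
plain = full_text_search("home")              # also retrieves document 2
filtered = subject_search("home", "dwelling")  # drops the false positive
```

In this toy setup the plain search returns documents 1, 2 and 3, so one hit in three is a false positive, while the subject-restricted search returns only 1 and 3. The price is exactly the one the quote names: somebody had to assign those subject tags by hand.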
3. Pray and 'Bare feet'
Beware of homonyms:
Two words are homonyms if they are pronounced or spelled the same way but have different meanings. A good example is 'pray' and 'prey'. If you look up information on a 'praying mantis', you'll find facts about a religious insect rather than one that seeks out and eats others. 'Bare feet' and 'bear feet' are two very different things! If you use the wrong word to describe your search you will find interesting, but wrong, results.
If you enter the word "apple" into Google search while looking for the tree or the fruit of that tree, you'll have to scan a few hundred results about the company by that name before you find what you asked for.
5. organization and color
Aside from cultural differences there are spelling differences as well. American spellings differ from British ones; you might miss your answer by searching only for organisation (and not organization) or only for color (and not colour).
Capitalization can affect your search: China/china or Polish/polish.
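Both effects, spelling variants and capitalization, can be sketched in a few lines. The variant table, the sample documents, and the function names below are my own assumptions for illustration, not how any real search engine works.

```python
# Hypothetical table of British/American spelling pairs; a real system
# would need a much larger list.
SPELLING_VARIANTS = {
    "organisation": "organization",
    "colour": "color",
}
# Make the mapping work in both directions.
VARIANTS = {**SPELLING_VARIANTS, **{v: k for k, v in SPELLING_VARIANTS.items()}}

def expand_query(term):
    """Return the term plus its other spelling, if one is known."""
    variants = [term]
    other = VARIANTS.get(term.lower())
    if other:
        variants.append(other)
    return variants

def search(term, docs, case_sensitive=False):
    """Return the documents that contain any spelling variant as a whole word."""
    # When the search is case-insensitive, lowercase both sides before comparing.
    fold = (lambda s: s) if case_sensitive else str.lower
    wanted = {fold(t) for t in expand_query(term)}
    return [doc for doc in docs if wanted & {fold(w) for w in doc.split()}]

docs = [
    "Travel tips for visiting China in spring",
    "How to clean fine china without scratching it",
    "Pick a colour for the kitchen wall",
    "Pick a color for the kitchen wall",
]
```

With these sample documents, `search("China", docs)` returns both the travel page and the dinnerware page, while `search("China", docs, case_sensitive=True)` returns only the travel page. `search("colour", docs)` finds both kitchen pages because the query is expanded to the other spelling.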
If you type in "police", you get a lot of pages about the rock group.
Full text searching is likely to retrieve many documents that are not relevant to the search question. Such documents are called false positives. The retrieval of irrelevant documents is often caused by the inherent ambiguity of natural language; for example, in the United States, football refers to what is called American football outside the U.S.; throughout the rest of the world, football refers to what Americans call soccer. A search for football may retrieve documents that are about two completely different sports.
Jan Pedersen, Chief Scientist, Yahoo! Search, wrote on 19 September 2005 about "The Four Dimensions of Search Engine Quality" (http://www.sims.berkeley.edu/courses/is141/f05/lectures/jpedersen.pdf) and, in the chapter about Handling Ambiguity, brought nine pictures of different things called "cobra" (snake, car, helicopter etc.).
Quickly finding documents is indeed easy. Finding relevant documents, however, is a challenge that information retrieval (IR) researchers have been addressing for more than 40 years. The numerous ambiguities inherent in natural language make this search problem incredibly difficult. For example, a query about "China" can refer to either a country or dinnerware.
An intelligent search program can sift through all the pages of people whose name is "Cook" (sidestepping all the pages relating to cooks, cooking, the Cook Islands and so forth),
Phenomena that may cause problems include polysemy, words which have multiple related meanings (a window can be a hole or a sheet of glass); synonymy, multiple words with the same or similar meanings (tv and television, or Netherlands/Holland/Dutch); and plural words (cat and cats).
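Synonyms and plurals cause the opposite problem from homonyms: relevant pages get missed. A minimal sketch of how a search could compensate, assuming a hand-made synonym list built from the examples above and a deliberately naive plural rule (both are my own illustration, not a real stemmer or thesaurus):

```python
# Hypothetical synonym groups taken from the examples in the text.
SYNONYM_GROUPS = [
    {"tv", "television"},
    {"netherlands", "holland", "dutch"},
]

def singular(word):
    """Very naive plural stripping: 'cats' -> 'cat'. It will mangle words
    such as 'this'; a real system would use a proper stemmer."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def expand(term):
    """Return the set of word forms that should count as a match for term."""
    low = term.lower()
    # Check the synonym groups first, so 'Netherlands' is not stripped.
    for group in SYNONYM_GROUPS:
        if low in group:
            return set(group)
    return {singular(low)}

def matches(term, text):
    """True if any word of the text matches the term, a synonym, or its singular."""
    wanted = expand(term)
    # Splitting on whitespace ignores punctuation handling on purpose.
    return any(w in wanted or singular(w) in wanted for w in text.lower().split())
```

For example, `matches("tv", "A new television series")` and `matches("cat", "Two cats on a roof")` are both true here. Polysemy is not addressed at all: a query for "window" would still match both the hole and the sheet of glass.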