Tuesday, February 28, 2006

Concept Search

While preparing my previous posting about keyword search, subject search, and automatic tagging, I stumbled upon the term "concept search", which was unfamiliar to me. Here are my findings about this kind of search. My first impression is that Concept Search brings even more results than keyword search, which already brings too many. The fact that Excite stopped using Concept Search is suspicious. It seems that users prefer keyword search no matter how good the alternatives are, and keyword search is not good enough for me: when you're looking for a monkey in a certain zoo you usually get many other animals and many other zoos.



A search for documents related conceptually to a word, rather than specifically containing the word itself.



Concept searches solve the term mismatch problem in the sense that a concept search will return documents that relate to the same concept as the query word, irrespective of the specific word chosen by the user and the specific words in the documents. At the same time, concept searches contribute to the other fundamental goal of information retrieval systems: increasing coverage.

By returning all documents that relate to the same concept, a concept search tremendously decreases the risk of missing important documents that do not contain the exact word selected by the user but pertain to the same topic.



Excite used to be the best-known general-purpose search engine site on the Web that relied on concept-based searching. It is now effectively extinct.

Unlike keyword search systems, concept-based search systems try to determine what you mean, not just what you say.  In the best circumstances, a concept-based search returns hits on documents that are "about" the subject/theme you're exploring, even if the words in the document don't precisely match the words you enter into the query.  

How did this method work? There are various methods of building clustering systems, some of which are highly complex, relying on sophisticated linguistic and artificial intelligence theory that we won't even attempt to go into here. Excite used a numerical approach. Excite's software determines meaning by calculating the frequency with which certain important words appear. When several words or phrases that are tagged to signal a particular concept appear close to each other in a text, the search engine concludes, by statistical analysis, that the piece is "about" a certain subject.
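
To make the "tagged words appearing close to each other" idea concrete, here is a minimal Python sketch. It is not Excite's actual algorithm, which the description above does not spell out. The cue-word table, the function name is_about, and the window and min_hits parameters are all assumptions made for illustration.

```python
# Hypothetical cue words tagged as signalling one concept (not Excite's real data).
CONCEPT_CUES = {"zoo animals": {"monkey", "zoo", "cage", "keeper"}}


def is_about(text, cues, window=10, min_hits=2):
    """Return True if at least `min_hits` distinct cue words fall within
    some run of `window` consecutive tokens of the text."""
    tokens = text.lower().split()
    for start in range(len(tokens)):
        nearby = set(tokens[start:start + window])
        if len(cues & nearby) >= min_hits:
            return True
    return False


cues = CONCEPT_CUES["zoo animals"]
print(is_about("the keeper fed the monkey at noon", cues))
```

A real system would presumably weight words by frequency and handle punctuation; this sketch only checks proximity of distinct cue words.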



Some search engines use automatic concept searching as a default. Many advanced online researchers are accustomed to keyword searching, where the exact string of characters typed in is searched. Thus, an advanced researcher who unwittingly uses a search engine with a concept searching default can become frustrated. Concept searching occurs when the engine not only searches for the exact character string, but also for word forms, and even synonyms and other words that statistically appear with the typed word.
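
The difference between the two defaults can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synonym table is hand-made and hypothetical, and the word-form rule is a deliberately naive suffix list, not what any particular engine does.

```python
# Hypothetical, hand-built synonym table used only for this example.
SYNONYMS = {"car": {"automobile", "vehicle"}}


def word_forms(word):
    """Very naive inflection: the word plus a few common English endings."""
    return {word, word + "s", word + "es", word + "ed", word + "ing"}


def expand(term):
    """Return the set of strings a concept-style search might look for
    when the user types a single term."""
    term = term.lower()
    expanded = set()
    for word in {term} | SYNONYMS.get(term, set()):
        expanded |= word_forms(word)
    return expanded


def keyword_match(term, text):
    """Exact-string search: only the typed characters count."""
    return term.lower() in text.lower().split()


def concept_match(term, text):
    """Expanded search: any word form or synonym counts."""
    return bool(expand(term) & set(text.lower().split()))
```

With these definitions, a page that only says "used automobiles for sale" is a miss for keyword_match("car", ...) but a hit for concept_match("car", ...), which is exactly the kind of surprise that can frustrate a researcher expecting exact matching.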



Concept search does what we naturally do in conversation with each other: account for the individual differences in the way we express similar ideas. Using Convera’s unique and powerful Semantic Networks, RetrievalWare can "understand the meaning" of the query terms and expressions instead of merely treating them as keywords. This understanding results in expansion of the query terms to other relevant concepts rather than searching for exact keyword matches.



Rather than formulating an all-encompassing query, you can perform concept searching. This is done by clicking the Concept Search radio button, entering a simple query describing your area of interest, and clicking the Get Results button.

In a concept search, PsycCrawler first generates a list of terms that are statistically related to the words in your query. This list is similar to the operation of the Relate Advisor. Those words that have a significant degree of co-occurrence with your query words are deemed related within the context of the current database.

After generating the aforementioned list, the concept search operation then performs a conventional search using the original query words as well as the related terms. You will find that many of the records retrieved, while perhaps not having occurrences of your original query words, will nonetheless contain information that is relevant to your search interests.
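
The two steps described above (build a list of co-occurring terms, then run a conventional search with the enlarged query) can be sketched roughly as follows. This is an assumption-laden toy, not PsycCrawler's implementation: the function names, the min_cooccurrence threshold, and the sample documents are all invented for illustration, and "significant degree of co-occurrence" is reduced to a simple document count.

```python
from collections import Counter

# Invented sample "database" for the example.
DOCS = [
    "monkey primate zoo",
    "monkey primate banana",
    "primate gorilla habitat",
    "car engine repair",
]


def related_terms(query_words, documents, min_cooccurrence=2):
    """Terms that share a document with any query word at least
    `min_cooccurrence` times."""
    query = {w.lower() for w in query_words}
    counts = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        if words & query:
            counts.update(words - query)
    return {term for term, n in counts.items() if n >= min_cooccurrence}


def concept_search(query_words, documents, min_cooccurrence=2):
    """Conventional search using the original words plus the related terms."""
    query = {w.lower() for w in query_words}
    expanded = query | related_terms(query_words, documents, min_cooccurrence)
    return [doc for doc in documents if set(doc.lower().split()) & expanded]


print(concept_search(["monkey"], DOCS))
```

In this toy database "primate" co-occurs with "monkey" twice, so the third document is retrieved even though it never mentions "monkey". The same mechanism also explains why a concept search tends to return more results than a keyword search.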



Advanced natural language Claims/Concept Search returns patent documents corresponding to the concept of your search query even if there are no matching keywords. See Concept Search Tips for further details and suggestions on successful Concept searching.

Select the Use Most Recent Concept Query checkbox to use the last concept entered during a Claims/Concept search. This will only display if a concept query was previously done during the current session.



The concept search can be a powerful tool in that the concept itself can be a Boolean expression. For example, Intel used the name of a river, Merced, as the development name for its 64-bit microprocessor chip. You could define a concept, "Merced", so that any time you searched for Merced, you are really searching for "Merced and not Merced River".
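
A concept defined as a Boolean expression could look something like the following Python sketch. The CONCEPTS table and the matches function are hypothetical names chosen for this example; the source does not say how any product stores such definitions.

```python
# Hypothetical concept table: each concept name maps to a Boolean test on the text.
CONCEPTS = {
    # "Merced and not Merced River"
    "merced": lambda text: "merced" in text and "merced river" not in text,
}


def matches(query, text):
    """Apply the concept's Boolean rule if one is defined,
    otherwise fall back to a plain substring search."""
    text = text.lower()
    rule = CONCEPTS.get(query.lower())
    if rule is not None:
        return rule(text)
    return query.lower() in text


print(matches("Merced", "Intel delays its Merced chip"))
print(matches("Merced", "Fishing on the Merced River"))
```

Note the trade-off in this simple rule: a document that discusses the chip and also happens to mention the Merced River would be excluded too.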



Concept search software looks at a portion of text, or perhaps an entire document, and characterizes it by certain words, their frequency, and synonyms to those words. Using that entire package of interpretation and various search algorithms, the software finds elsewhere what it deems conceptually similar passages.
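
One common way to "characterize a text by its words and their frequency" and then find similar passages is cosine similarity over word-count vectors. The sketch below uses that technique as a stand-in; the source does not name the algorithms these products use, and synonym handling is left out entirely. The function names are assumptions.

```python
import math
from collections import Counter


def profile(text):
    """Characterize a passage by its word frequencies."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-frequency profiles (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def most_similar(passage, candidates):
    """Return the candidate whose profile is closest to the passage's."""
    target = profile(passage)
    return max(candidates, key=lambda c: cosine(target, profile(c)))


candidates = ["patent claim dispute", "weather report today"]
print(most_similar("patent claim infringement", candidates))
```

Here the query passage is a whole stretch of text rather than a keyword, which is what distinguishes this style of search from the query-expansion approaches quoted earlier.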


Much more powerful than single word, multiple word, and Boolean word searches, concept searching appears most commonly in litigation support. As the techniques improve, concept searching will be of inestimable value more generally in law-department knowledge management and even helping clients.



Powered By Qumana
