Tuesday, February 28, 2006

Concept Search

While preparing my previous posting about keyword search, subject search, and automatic tagging, I stumbled upon the term "concept search", which was unfamiliar to me. Here are my findings about this kind of search. My first impression is that Concept Search returns even more results than the already too many results that keyword search returns. The fact that Excite stopped using Concept Search is suspicious. It seems that users prefer keyword search no matter how good the alternatives are, and keyword search is not good enough for me: when you're looking for a monkey in a certain zoo, you usually get many other animals and many other zoos.



A search for documents related conceptually to a word, rather than specifically containing the word itself.



Concept searches solve the term mismatch problem in the sense that a concept search will return documents that relate to the same concept as the query word, irrespective of the specific word chosen by the user and the specific words in the documents. At the same time, concept searches contribute to the other fundamental goal of information retrieval systems: increasing coverage.

By returning all documents that relate to the same concept, a concept search tremendously decreases the risk of missing important documents that do not contain the exact word selected by the user but pertain to the same topic.
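To make the term-mismatch point concrete, here is a minimal Python sketch, assuming a tiny hand-made concept table (`CONCEPTS` and both search functions are my own hypothetical illustrations, not any vendor's implementation). The keyword search misses the "automobile" document; the concept search does not:

```python
# Hypothetical concept table: each concept lists words that express it.
CONCEPTS = {
    "car": {"car", "automobile", "auto", "vehicle"},
    "monkey": {"monkey", "ape", "primate"},
}


def concept_words(word):
    """Return every word that shares a concept with `word` (or just the word)."""
    word = word.lower()
    for words in CONCEPTS.values():
        if word in words:
            return words
    return {word}


def keyword_search(word, documents):
    """Return documents that contain the exact query word."""
    word = word.lower()
    return [d for d in documents if word in d.lower().split()]


def concept_search(word, documents):
    """Return documents that contain any word belonging to the query's concept."""
    targets = concept_words(word)
    return [d for d in documents if targets & set(d.lower().split())]


docs = [
    "The automobile industry grew fast",
    "A car dealership opened downtown",
    "Bananas are yellow",
]
print(keyword_search("car", docs))  # ['A car dealership opened downtown']
print(concept_search("car", docs))  # the first two documents
```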



Excite used to be the best-known general-purpose search engine on the Web that relied on concept-based searching. It is now effectively extinct.

Unlike keyword search systems, concept-based search systems try to determine what you mean, not just what you say.  In the best circumstances, a concept-based search returns hits on documents that are "about" the subject/theme you're exploring, even if the words in the document don't precisely match the words you enter into the query.  

How did this method work? There are various methods of building clustering systems, some of which are highly complex, relying on sophisticated linguistic and artificial intelligence theory that we won't even attempt to go into here. Excite used a numerical approach. Excite's software determined meaning by calculating the frequency with which certain important words appeared. When several words or phrases that are tagged to signal a particular concept appear close to each other in a text, the search engine concludes, by statistical analysis, that the piece is "about" a certain subject.
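The statistical idea can be sketched in a few lines of Python. This is only a guess at the general technique, not Excite's actual algorithm: the term list, the `window` size and the `threshold` are all assumptions chosen for illustration.

```python
import re

# Hypothetical list of words tagged as signalling the concept "zoo animals".
CONCEPT_TERMS = {"monkey", "monkeys", "zoo", "cage", "keeper", "primate"}


def is_about(text, concept_terms, window=10, threshold=3):
    """Guess that `text` is "about" a concept when at least `threshold`
    distinct concept terms appear within some run of `window` words."""
    words = re.findall(r"[a-z]+", text.lower())
    for start in range(max(1, len(words) - window + 1)):
        if len(concept_terms.intersection(words[start:start + window])) >= threshold:
            return True
    return False


print(is_about("The keeper fed the monkey in its cage at the zoo", CONCEPT_TERMS))  # True
print(is_about("Monkey business on Wall Street", CONCEPT_TERMS))  # False
```

Requiring several concept words close together is what separates a zoo story from a passing mention of "monkey".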



Some search engines use automatic concept searching as a default. Many advanced online researchers are accustomed to keyword searching, where the exact string of characters typed in is searched. Thus, an advanced researcher who unwittingly uses a search engine with a concept searching default can become frustrated. Concept searching occurs when the engine not only searches for the exact character string, but also for word forms, and even synonyms and other words that statistically appear with the typed word.
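The word-forms part of that behaviour can be illustrated with a deliberately crude suffix stripper. This is a toy under my own assumptions (the `SUFFIXES` list and the three-letter minimum); real engines use proper stemmers such as Porter's algorithm.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # toy list; real stemmers are far more careful


def crude_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def exact_match(query, text):
    """Keyword behaviour: the exact word must appear."""
    return query.lower() in text.lower().split()


def word_form_match(query, text):
    """Concept-default behaviour: any form of the word counts."""
    stem = crude_stem(query)
    return any(crude_stem(w) == stem for w in text.lower().split())


text = "She searched the archives twice"
print(exact_match("search", text))      # False
print(word_form_match("search", text))  # True
```

A researcher who typed `search` expecting only that exact string would also get "searched", "searches" and "searching", which is the kind of surprise that causes the frustration described above.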



Concept search does what we naturally do in conversation with each other: account for the individual differences in the way we express similar ideas. Using Convera’s unique and powerful Semantic Networks, RetrievalWare can "understand the meaning" of the query terms and expressions instead of merely treating them as keywords. This understanding results in expansion of the query terms to other relevant concepts rather than searching for exact keyword matches.



Rather than formulating an all-encompassing query, you can perform concept searching. This is done by clicking the Concept Search radio button, entering a simple query describing your area of interest, and clicking the Get Results button.

In a concept search, PsycCrawler first generates a list of terms that are statistically related to the words in your query. This works much like the Relate Advisor. Words that have a significant degree of co-occurrence with your query words are deemed related within the context of the current database.

After generating the aforementioned list, the concept search operation then performs a conventional search using the original query words as well as the related terms. You will find that many of the records retrieved, while perhaps not having occurrences of your original query words, will nonetheless contain information that is relevant to your search interests.
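The two-step process above (find statistically related terms, then run a conventional search with the expanded query) can be sketched as follows. The stopword list, the sample records and the `min_cooccurrence` threshold are assumptions for illustration; PsycCrawler's own statistics are not described in the excerpt.

```python
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "with", "to", "on"}


def tokens(text):
    """Lower-cased words minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def related_terms(query_words, records, min_cooccurrence=2):
    """Step 1: terms that co-occur with a query word in enough records."""
    query = {w.lower() for w in query_words}
    counts = Counter()
    for record in records:
        words = tokens(record)
        if words & query:
            counts.update(words - query)
    return {term for term, n in counts.items() if n >= min_cooccurrence}


def concept_search(query_words, records, min_cooccurrence=2):
    """Step 2: conventional OR search over the original plus related terms."""
    expanded = {w.lower() for w in query_words}
    expanded |= related_terms(query_words, records, min_cooccurrence)
    return [r for r in records if tokens(r) & expanded]


records = [
    "anxiety and insomnia in college students",
    "treating anxiety and insomnia with therapy",
    "insomnia and sleep hygiene",
    "memory in older adults",
]
print(related_terms(["anxiety"], records))   # {'insomnia'}
print(concept_search(["anxiety"], records))  # first three records
```

The third record never mentions "anxiety" but is still retrieved, which is the effect the excerpt describes.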



Advanced natural language Claims/Concept Search returns patent documents corresponding to the concept of your search query even if there are no matching keywords. See Concept Search Tips for further details and suggestions on successful Concept searching.

Select the Use Most Recent Concept Query checkbox to use the last concept entered during a Claims/Concept search. This will only display if a concept query was previously done during the current session.



The concept search can be a powerful tool in that the concept itself can be a Boolean expression. E.g., Intel used the name of a river as the development name for its 64-bit microprocessor chip, Merced. You could define a concept "Merced" so that any time you searched for Merced, you would really be searching for "Merced and not Merced River".
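In code, such a concept is just a named Boolean predicate. A minimal sketch, with the `CONCEPTS` registry and the sample documents invented for illustration:

```python
import re


def has_phrase(text, phrase):
    """Case-insensitive whole-word phrase match."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE) is not None


# Hypothetical registry: a concept name maps to a Boolean rule over document text.
CONCEPTS = {
    "Merced": lambda text: has_phrase(text, "Merced") and not has_phrase(text, "Merced River"),
}


def search_concept(name, documents):
    """Return the documents that satisfy the named concept."""
    rule = CONCEPTS[name]
    return [d for d in documents if rule(d)]


docs = [
    "Intel's Merced chip slips again",
    "Rafting on the Merced River in Yosemite",
]
print(search_concept("Merced", docs))  # ["Intel's Merced chip slips again"]
```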



Concept search software looks at a portion of text, or perhaps an entire document, and characterizes it by certain words, their frequency, and synonyms to those words. Using that entire package of interpretation and various search algorithms, the software finds elsewhere what it deems conceptually similar passages.
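A minimal sketch of that "package", assuming simple word counts, a hand-made synonym table, and cosine similarity as the comparison (one common choice, not necessarily what any given product uses):

```python
import math
import re
from collections import Counter

# Hypothetical synonym table: map variants onto one canonical word.
SYNONYMS = {"automobile": "car", "auto": "car", "cars": "car"}


def profile(text):
    """Characterize text by word frequencies, folding synonyms together."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(SYNONYMS.get(w, w) for w in words)


def cosine(a, b):
    """Cosine similarity between two frequency profiles."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(passage, candidates):
    """Return the candidate passage whose profile is closest to `passage`."""
    target = profile(passage)
    return max(candidates, key=lambda c: cosine(target, profile(c)))


print(most_similar("cheap car insurance", ["automobile insurance quotes", "banana bread recipe"]))
```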


Much more powerful than single-word, multiple-word, and Boolean word searches, concept searching appears most commonly in litigation support. As the techniques improve, concept searching will be of inestimable value more generally in law-department knowledge management and even in helping clients.



Powered By Qumana


Monday, February 27, 2006

Keyword Searching, Subject Searching and Automatic Tagging


Keyword Searching is different from subject searching:


"Keyword Searching searches in any number of fields; any significant words can be considered keywords; number of items retrieved is potentially large; Keyword Searching may retrieve irrelevant items


Subject Searching searches only in the subject field; search terms must come from the database's thesaurus; number of items retrieved is potentially smaller; there's a high degree of relevancy".
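The difference can be sketched in Python, assuming a toy catalogue in which each record has a title, an abstract and subject headings drawn from a thesaurus (all names and data here are invented). Note how the keyword search finds a "monkey" that has nothing to do with zoos:

```python
# Hypothetical catalogue records and controlled vocabulary (thesaurus).
THESAURUS = {"Primates", "Zoos", "Stock exchanges"}
RECORDS = [
    {"title": "Primates of the city zoo",
     "abstract": "Monkeys, apes and their keepers.",
     "subjects": ["Primates", "Zoos"]},
    {"title": "Monkey business on the trading floor",
     "abstract": "Speculation in the 1990s.",
     "subjects": ["Stock exchanges"]},
]


def keyword_search(word, records):
    """Search every field for the word (substring match)."""
    word = word.lower()
    hits = []
    for r in records:
        text = " ".join([r["title"], r["abstract"], *r["subjects"]]).lower()
        if word in text:
            hits.append(r["title"])
    return hits


def subject_search(term, records):
    """Search only the subject field, and only with thesaurus terms."""
    if term not in THESAURUS:
        raise ValueError(f"{term!r} is not in the thesaurus")
    return [r["title"] for r in records if term in r["subjects"]]


print(keyword_search("monkey", RECORDS))    # both titles, one irrelevant
print(subject_search("Primates", RECORDS))  # ['Primates of the city zoo']
```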


My main interest is to get a high degree of relevancy from search engines. According to the above comparison, if we could replace Keyword Searching with Subject Searching, we could solve this problem; but Subject Searching needs manual processing, and the number of documents that need that kind of treatment is enormous and growing day by day. In fact, that's why Keyword Searching became more prevalent than Subject Searching: with Keyword Searching we can search millions of documents that were never processed.


So we had a high-quality search system, moved on to a low-quality one, and now we are stuck with the fruits of our choice.


In Hebrew, the relevancy of Keyword Searching is even worse, since "Almost every word in Hebrew script can be read as one of an average of three words".


I believe that Automatic Tagging is the way to solve this problem. Automatic Tagging will process millions of documents and enable their retrieval by tags, which are in fact Subject Searching tools.
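A very small sketch of what I mean, assuming hand-written trigger words for each thesaurus subject (`TAG_RULES` is hypothetical; real automatic taggers use much richer statistics). Once documents carry such tags, they can be retrieved by subject without anyone tagging them by hand:

```python
import re

# Hypothetical rules: thesaurus subject -> words that trigger it.
TAG_RULES = {
    "Primates": {"monkey", "monkeys", "ape", "apes", "primate", "primates"},
    "Zoos": {"zoo", "zoos", "keeper", "keepers", "enclosure"},
}


def auto_tag(text, rules=TAG_RULES, min_hits=1):
    """Assign every subject whose trigger words occur at least `min_hits` times."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted(
        subject for subject, triggers in rules.items()
        if sum(w in triggers for w in words) >= min_hits
    )


def subject_index(documents, **kwargs):
    """Build subject -> list of documents from automatically assigned tags."""
    index = {}
    for doc in documents:
        for subject in auto_tag(doc, **kwargs):
            index.setdefault(subject, []).append(doc)
    return index


docs = ["The keeper fed the monkeys at the zoo", "Monkey business on Wall Street"]
print(subject_index(docs)["Primates"])          # both documents
print(subject_index(docs, min_hits=2)["Zoos"])  # ['The keeper fed the monkeys at the zoo']
```

The second document shows the limits of the idea: with a threshold of one hit, "Monkey business" is filed under Primates.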



Sunday, February 26, 2006

Emily Chang

Today QTSaver was added to Emily Chang's eHub. I applied three times over the last few months, and I guess it was accepted this time because I added a link to a newspaper that published a press release about QT/mobile.


Conclusion: You need the press in order to impress.


I'm so grateful that I decided to dedicate this QTSaver retrieval to Emily Chang's eHub:


Thanks, Emily!

Search results for Emily Chang ehub



Emily Chang - eHub: eHub is a resource by Emily Chang, an award-winning strategic designer and...



Hub Web 2.0 Emily Chang's eHub is a constantly updated list of next generation web applications.



Emily Chang and partners at eHub are not only collecting new web applications but are also now doing short interviews with the principals of alpha, beta and new online applications.



My Tickler File has been listed on Emily Chang’s eHub. eHub is her personal resource list where she posts information about web apps, services, blogs, etc that she finds useful or interesting.



September 16, 2005: eHub cited by Stowe Boyd in Corante. In today’s Get Real column at Corante, Stowe Boyd, President/COO of Corante, the world’s first blog media company, writes about discovering new web apps at eHub, a new resource by co-founder and Ideacodes principal Emily Chang.

...eHub Ranked in Del.icio.us and Blogged Around the World: Emily Chang’s eHub web 2.0 software resource has risen to the del.icio.us popular page since Tuesday, Sept 13, with over 460 bookmarks, has been blogged and linked by bloggers in the United States, Spain, Germany, Japan, China, Sweden, Hungary, Italy, and Portugal, and has received over 5000 unique global visitors to the site since its launch on Monday, September 12.



Emily Chang has a nice roundup of some of the ehub interview answers she's collected over the past 4 months, trying to fixate on some common design...

I saw that Emily Chang's eHub mentioned Veetro, a new project management, billing, time management, CRM, "jeepers-creepers, is there anything they...

If there’s a better list of Web 2.0 applications than Emily Chang’s eHub then I haven’t seen it. Good place to go if you're desperate for round corners...









Saturday, February 25, 2006

Tag Cloud

Tag clouds are much more meaningful to their creators than to casual visitors. I know this from my own experience. My tag cloud on Blinklist clearly illustrates my main interests, but I will show you that the tags are subjective and reveal more to me than they do to you. This is a problem that tag architects should consider.

  1. I divide my Blog postings using two tags:
    A. "QTSaver" - my Blog postings about QTSaver.

    B. "Blog" - my postings about my blog.
  2. "Free web School" is a collection of links about free study resources that I made when I was dealing with Education for All.
  3. "Publishing" is a tag I give to every site that mentions QTSaver. Each time, my Site Meter shows me search engines I didn't know about.
  4. "Micro-content" is my main field of research.
  5. "Research" is a tag I gave mainly to links I collected for a research project about gambling, which was meant to explore the power of QTSaver to learn a new subject.
  6. "Comment" is a tag for links to comments I left on other bloggers' sites.

And here's MyMicroPedia about the term "Tag Cloud":


  1. A tag cloud on the popular photo sharing site Flickr.
  2. A tag cloud (more traditionally known as a weighted list in the field of visual design) is a visual depiction of content tags used on a website. Often, more frequently used tags are depicted in a larger font or otherwise emphasized, while the displayed order is generally alphabetical. Thus both finding a tag by alphabet and by popularity is possible.
  3. Selecting a single tag within a tag cloud will generally lead to a collection of items that are associated with that tag.
  4. The first tag cloud appeared on Flickr, the photo sharing site. That implementation was based on Jim Flanagan's Search Referral Zeitgeist, a visualization of web site referrers. Tag clouds have also been popularized by Technorati, among others.
  5. VZ Local Search - Verizon's tag cloud based on popularity of user's local search terms
  6. Pacificepoch.com - enhanced tag cloud with related tags highlighting, and shades to indicate relationship strength
  7. Tagrolls generates HTML code to display your del.icio.us tag cloud
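The weighted-list idea in item 2 can be sketched like this. The logarithmic scaling, the pixel range and the sample counts are my own assumptions, not how Flickr or Blinklist actually size their tags:

```python
import html
import math


def tag_cloud(counts, min_px=10, max_px=28):
    """Return (tag, font size in px) pairs in alphabetical order.

    Sizes scale with the logarithm of each tag's count, so one very popular
    tag does not shrink all the others to the minimum size. Counts must be >= 1.
    """
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    spread = (hi - lo) or 1.0  # avoid dividing by zero when all counts are equal
    return [
        (tag, round(min_px + (math.log(counts[tag]) - lo) / spread * (max_px - min_px)))
        for tag in sorted(counts, key=str.lower)
    ]


def to_html(cloud):
    """Render the cloud as inline-styled spans."""
    return " ".join(
        f'<span style="font-size:{px}px">{html.escape(tag)}</span>' for tag, px in cloud
    )


counts = {"python": 40, "search": 12, "tagging": 5, "zoo": 1}  # invented counts
print(to_html(tag_cloud(counts)))
```

Alphabetical order lets visitors find a tag by name, while font size shows its popularity, which is the combination the quoted definition describes.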


  1. I actually find my tag cloud quite handy because it lists all my tags on one page, and I can easily see which topics I post about most frequently. I also use it as a way to see which tags I have already used, so I can be consistent when tagging posts.
  2. At this point you may be wondering what's a tag cloud?
  3. In my tag cloud I list all my tags, but if you have a lot of tags you may want to limit the minimum number of occurrences using a HAVING clause.
  4. You can define the distribution to be more granular if you like by dividing by a larger number, and using more font sizes below. You will probably need to play with this to get your tag cloud to look good.
  5. There are probably lots of different ways to build a tag cloud, but this is the first method that came to mind.
  6. Tag Cloud for AVBlog (Andrea Veggiani - Personal Blog)
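The quoted tips (drop rare tags with a `HAVING` clause, then divide the range of counts into a number of font-size classes) can be tried out with Python's built-in `sqlite3` module. The `post_tags` table, its columns and the sample rows are hypothetical:

```python
import sqlite3


def tag_counts(conn, min_occurrences=2):
    """Count how often each tag is used, keeping only tags used often enough."""
    return conn.execute(
        """
        SELECT tag, COUNT(*) AS n
        FROM post_tags
        GROUP BY tag
        HAVING COUNT(*) >= ?
        ORDER BY tag
        """,
        (min_occurrences,),
    ).fetchall()


def size_class(count, lo, hi, classes=5):
    """Map a count to a class 1..classes; a larger `classes` gives a finer distribution."""
    if hi == lo:
        return 1
    step = (hi - lo) / classes
    return min(classes, 1 + int((count - lo) / step))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post_tags (post_id INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO post_tags VALUES (?, ?)",
    [(1, "sql"), (2, "sql"), (3, "sql"), (1, "python"), (4, "python"), (5, "misc")],
)
rows = tag_counts(conn)  # "misc" is dropped by HAVING
lo, hi = min(n for _, n in rows), max(n for _, n in rows)
for tag, n in rows:
    print(tag, n, f"size-{size_class(n, lo, hi)}")
```

Each `size-N` label would map to a CSS class with its own font size; raising `classes` and adding more sizes is the "more granular" distribution mentioned in item 4.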


  1. Chris Gemignani at Juice Analytics has a much better treatment of tag cloud animation than the one I came up with the other day.


  1. Businesses are shoveling them into interface makeovers, with predictably mixed success. Thus Lulu, a company that helps people publish their own books, CDs, and other products, offers a half-hearted tag cloud to help customers browse categories.


  1. Here is a "tag cloud" for all of the folksonomy tags used so far on EchoChamberProject.com.
  2. The first tag cloud is ordered by frequency and the second is alphabetized:
  3. Below is the alphabetized tag cloud...
  4. I'm going to pass this link along to some Drupal developers to see if anyone is interested in coding this type of tag cloud feature into a Drupal module -- I think it'd be a relatively simple thing to automate.
  5. UPDATE: Greg Heller pointed me to Development Seed's tag cloud, and says that it's probably the "pop tags" Drupal module. So it looks like this module may already create this type of tag cloud. I have other ideas for what I want to do with this type of feature, and Greg suggests that it might be possible to build on top of this module.
  6. UPDATE: The developer of http://drupal.org/project/tagadelic for Drupal actually dropped by my site to see the tag cloud I hacked together and left a comment saying that he's interested in potentially collaborating on what I've come up with. So there you go -- I throw a proof-of-concept together and the ball has already started rolling to modify an existing solution that I didn't even know existed before this afternoon.



Friday, February 24, 2006

My Tag Cloud

I copied my Tag Cloud from Blinklist to Photoshop and added a cloud filter.

Thursday, February 23, 2006

That Canadian Girl

For those of you bloggers who want to increase the traffic to your sites,
here's a great tip that I got straight from my Site Meter:
Early in the morning I commented on "google gets it right again"
and late at night I saw on my Site Meter that 20 visitors
were referred to my Blog from That Canadian Girl's Blog.
20 visitors are 40 percent of my average daily traffic!
Conclusion: commenting on the right Blogs delivers.
Recommendation: Comment!



Wednesday, February 22, 2006

Last fm

Over the last few days I have been preoccupied with automatic tagging. That's how I stumbled upon Last fm, which tags songs automatically.

So I registered with Last fm, downloaded the plug-ins in five minutes and started listening to Bob Dylan. It's too soon for me to review it, but I decided to let those who want to know about Last fm enjoy the power of QTSaver to gather lots of relevant information on one page.



Last.fm is a streaming radio station with a built-in collaborative filter that attempts to learn its listeners' likes and dislikes. Based on data gathered, the station delivers a personalized radio stream to each of its listeners.

Here's how it works for Last.fm: Users can either fill out a profile or just begin listening. If a song plays to the end, the system logs this as a thumbs up. But if the user doesn't like a song and hits the Change button in the Last.fm player, it's marked as a thumbs down.

"It is all very intuitive," said one of Last.fm's co-founders, Michael Breidenbruecker. "If you don't like what you hear, press the Change button. It's like flipping radio channels, or zapping TV."

Technology pundit Clay Shirky has predicted that services like Last.fm will be "revolutionary."
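The rule quoted above (a track played to the end counts as a thumbs up, pressing Change counts as a thumbs down) turns listening behaviour into ratings without asking the user anything. A minimal sketch, assuming a hypothetical `PlayEvent` record; scoring a track that was neither finished nor skipped as neutral is my own assumption, since the excerpt only describes the two clear cases:

```python
from dataclasses import dataclass


@dataclass
class PlayEvent:
    track: str
    seconds_played: float
    track_length: float
    pressed_change: bool  # the listener hit the Change button


def implicit_rating(event):
    """+1 for a track played to the end, -1 for a skipped one, 0 otherwise."""
    if event.pressed_change:
        return -1
    if event.seconds_played >= event.track_length:
        return 1
    return 0  # assumption: e.g. the player was closed mid-track


def build_profile(events):
    """Sum the implicit ratings per track."""
    profile = {}
    for event in events:
        profile[event.track] = profile.get(event.track, 0) + implicit_rating(event)
    return profile


events = [
    PlayEvent("Song A", 240, 240, False),
    PlayEvent("Song B", 20, 200, True),
    PlayEvent("Song A", 240, 240, False),
]
print(build_profile(events))  # {'Song A': 2, 'Song B': -1}
```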

Last.fm is not the first project to apply collaborative filtering to music. In the late 1990s, about half a dozen companies, including the high-profile Firefly, tried to build music-recommendation systems based on lists of users' preferred songs or bands.

Like Last.fm, some streaming radio stations -- such as Yahoo's Launchcast streaming radio service -- have used collaborative filtering to match streams to individual listeners' tastes.

However, unlike Last.fm, but in line with past efforts, Launchcast users must manually rate songs to build their preference profiles. With Last.fm, listeners' preferences are automatically inferred from their listening behavior.

Since its public launch around Christmas, Last.fm has grown to about 6,000 registered users. The service offers about 30,000 tracks from all kinds of musical genres, ranging from classical to avant-garde electronica.



I started out using Last.fm (in its incarnation as audioscrobbler) to spy on my music listening habits and report them to me and others. You will see some of that data on the left sidebar of my blog in the big red badges. They show what I listened to the most last week and the music I have listened to the most since I started using Last.fm.

I found some friends of mine in the service and connected to them. I check out what they are listening to via Last.fm. And I've found people I've never met in Last.fm who have similar taste in music to me.

For some reason, I was never compelled to download the Last.fm player. I have iTunes, Rhapsody, and eMusic, and that gives me a fair amount of leeway to sample whatever music I want to check out on my computer. But several weeks ago, I downloaded the Last.fm player.

First, that you listen to a lot of music on your computer. If you don't, there is no way for Last.fm to capture your music listening data. Second, that you download the plugin so that they can in fact capture your music listening data. And third, that you are interested in an online social experience for discovering music.


Pandora is a lot simpler and maybe that's why people like it so much. But for me, Last.fm is a lot better. I don't want a computer recommending music to me. I want other people, people who share my taste in music, recommending music to me.



September 15, 2005 Company: Last.fm Location: London Launched: 2002, redesigned in 2005

Last FM (now merged with the Audioscrobbler project) allows you to generate a profile of your musical taste based on what you like or listen to the most.

This information is used to create a personal radio station and to find users who are similar to you. Last.fm can even play you new artists and songs you might like.

Last.fm, like Pandora, suggests bands and tracks based on your current taste, but it is Last.fm’s social network-based approach that makes it interesting. Last.fm generates recommendations from your musical tastes by compiling a list of your musical neighbours (people who listen to the same things you do) and suggests bands they also play and that you don’t. This organically built ecosystem of relationships between people and their musical tastes is what makes Last.fm stand out from the competition.
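The "musical neighbours" idea can be sketched in Python. The Jaccard overlap used to find neighbours, the `k` parameter and the sample libraries are my assumptions; the excerpt does not say how Last.fm actually measures similarity:

```python
def jaccard(a, b):
    """Overlap between two sets of artists: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, libraries, k=2):
    """Suggest artists that the k most similar listeners play and `user` does not."""
    mine = libraries[user]
    others = [u for u in libraries if u != user]
    neighbours = sorted(others, key=lambda u: jaccard(mine, libraries[u]), reverse=True)[:k]
    scores = {}
    for u in neighbours:
        weight = jaccard(mine, libraries[u])
        if weight == 0:
            continue  # a listener with nothing in common is not a neighbour
        for artist in libraries[u] - mine:
            scores[artist] = scores.get(artist, 0.0) + weight
    return sorted(scores, key=lambda a: (-scores[a], a))


libraries = {  # hypothetical listening histories
    "me": {"Bob Dylan", "Neil Young", "Leonard Cohen"},
    "ann": {"Bob Dylan", "Neil Young", "Townes Van Zandt"},
    "ben": {"Bob Dylan", "Leonard Cohen", "Nick Drake", "Townes Van Zandt"},
    "cat": {"Britney Spears", "Madonna"},
}
print(recommend("me", libraries))  # ['Townes Van Zandt', 'Nick Drake']
```

Nothing here looks at the music itself: "Townes Van Zandt" ranks first only because both of my closest neighbours play it.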

People who are curious about the architectural differences between Last.fm and Pandora can read this post on my personal blog, that talks about just that.

The Last.fm Player is also really neat.

One of the best sites for music: all the genres and all the music you can handle. If it’s out there, Last.fm has it. Give it a shot; I bet you will love it.



Last.fm users collaboratively build stations by 'tagging' music they like with keywords. You can then tune in to these stations.

All you need to tune in is the Last.fm Player, a small, free, open source application that connects to our network to play music just for you.




Last.fm has done a re-design and has fully integrated with Audioscrobbler. You can tag music now too.

I'm also somewhat annoyed at having to install another proprietary player (which hasn't even been made available yet). Lots of potential for Last.fm, but it seems to be unrealized as of yet.

I have been a very early adopter of last.fm but because it is so well integrated did not visit the page that often. Seeing the redesign was like finding a present on your pillow, looks fantastic!

I like most. last.fm watches what I listen to and then figures out other artists that I might like. Because I listen to very esoteric stuff, finding more music that I like is difficult. It took three tries before I found an artist that Pandora knew anything about.



Pandora's recommendations are based on the inherent qualities of the music. Give Pandora an artist or song, and it will find similar music in terms of melody, harmony, lyrics, orchestration, vocal character and so on.
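By contrast, a content-based recommender only compares descriptions of the songs themselves. A sketch, assuming each song has been scored by hand on a few qualities (the feature names, scores and Euclidean distance are illustrative; Pandora's real attributes are not listed in the excerpt):

```python
import math

# Hypothetical hand-assigned scores between 0 and 1.
SONGS = {
    "Song A": {"melody": 0.9, "harmony": 0.7, "acoustic": 0.8, "vocal_grit": 0.6},
    "Song B": {"melody": 0.8, "harmony": 0.7, "acoustic": 0.9, "vocal_grit": 0.5},
    "Song C": {"melody": 0.1, "harmony": 0.2, "acoustic": 0.0, "vocal_grit": 0.9},
    "Song D": {"melody": 0.85, "harmony": 0.65, "acoustic": 0.8, "vocal_grit": 0.6},
}


def distance(a, b):
    """Euclidean distance between two feature dictionaries with the same keys."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in a))


def similar_songs(seed, songs=SONGS, n=2):
    """The n songs whose qualities are closest to the seed song."""
    others = [s for s in songs if s != seed]
    return sorted(others, key=lambda s: distance(songs[seed], songs[s]))[:n]


print(similar_songs("Song A"))  # ['Song D', 'Song B']
```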


On the nurture side (as in, it's all about the people around you), Last.fm is a social recommender. It knows little about songs' inherent qualities. It just assumes that if you and a group of other people enjoy many of the same artists, you will probably enjoy other artists popular with that group.


Like Last.fm, most music-discovery systems have been social recommenders, also known as collaborative filters. Although much of the academic work in the area has focused on improving the matching algorithms, Last.fm's innovation has been in improving the data the algorithms work on. Last.fm does so by providing users an optional plug-in that automatically monitors your media-player software so that whatever you listen to—whether it came from Last.fm or not—can be incorporated into your Last.fm profile and thus be used as the basis for recommendations. Compared to relying on users to manually provide preferences, this automatic and comprehensive data capture leads to far better grist for the data mill.


Pandora and Last.fm are both about helping people discover new music, so let's consider their approaches in terms of discovering truly "new" music—that is, artists who are just appearing on the music scene. If we assume that both services put new artists into their database at the same rate, Last.fm will be slower in surfacing them as recommendations. This is due to the "cold start" problem that afflicts social recommenders: Before something new can become recommendable, it needs time to accumulate enough popularity to rise above the system's noise level. In contrast, because Pandora is only comparing songs' inherent qualities—not who they're popular with—it should be able to recommend a new artist the first day that artist is in the system.
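A back-of-the-envelope version of the cold-start argument, assuming (purely for illustration) that a social recommender ignores artists with fewer than 50 plays and that a new artist gains a steady number of plays per day:

```python
import math


def days_until_recommendable(plays_per_day, noise_floor=50, starting_plays=0):
    """Days a new artist needs before a popularity-based recommender can surface it."""
    if starting_plays >= noise_floor:
        return 0
    if plays_per_day <= 0:
        raise ValueError("an artist nobody plays never clears the noise floor")
    return math.ceil((noise_floor - starting_plays) / plays_per_day)


for rate in (5, 7, 25):
    print(rate, "plays/day ->", days_until_recommendable(rate), "days")
# A content-based recommender, by contrast, can use the artist on day 0.
```

The less popular the new artist, the longer the wait, which is exactly the kind of artist a discovery service most wants to surface.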



Internet radio stations have long been popular because of the wide variety of music they offer and the relative lack of commercials. But for those who crave musical playlists tailored to their personal tastes, it might be difficult to find a service more useful than Last.fm.

Last.fm is an online radio site -- but with a twist. It works hand-in-hand with Audioscrobbler, a small software plug-in that works with popular software music players like Winamp and iTunes. The plug-in scrutinizes the music files on users' computers and sends the information to a server.

From that, Last.fm creates a personalized Internet radio station based on each user's taste.

"It's ideally suited for lazy people who like music," said Last.fm technology chief Richard Jones. "Even when you're not listening to Last.fm, the Audioscrobbler plug-in is helping build your profile without you doing anything. So next time you come back to Last.fm, the radio is even better."

In addition, Last.fm lets users sample friends' musical choices. Listeners can "ban" friends' songs they don't like and designate others they love, and, in that way, diversify their musical preferences.

He said Last.fm, which has a collection of more than 100,000 songs of its own, works closely with record labels and is fully legal. He explained that the service, which is free to users, brings in revenue by offering promotional and market research services to labels and indie artists. It then turns around and pays for a worldwide streaming online-radio license from the MCPS-PRS Alliance, the U.


Currently, Last.fm and Audioscrobbler have 40,000 subscribers combined, said Jones. Many of them have very large music profiles, including one member with more than 30,000 songs.


While much of the attraction of Last.fm is the ability to listen to songs from one's music collection anywhere there is an Internet connection, some prefer the ability to learn about music outside what they already know.



Tuesday, February 21, 2006


 Usually in order to succeed in making people tag a picture, a link, or any other microcontent – you need the support of a company. A company means more people, more publicity, and more connections. That's how Flickr, Google Maps, Del.icio.us etc. made it.


My initiative to create skype-yellow-pages as a peer production is doomed to failure simply because it is a private initiative, unless a company adopts it…


That's the reason I chose to collect the following info for MyMicroPedia:



Folksonomy, a portmanteau word combining "folk" and "taxonomy," refers to the collaborative but unsophisticated way in which information is being categorized on the web. Instead of using a centralized form of classification, users are encouraged to assign freely chosen keywords (called tags) to pieces of information or data, a process known as tagging. Examples of web services that use tagging include those designed to allow users to publish and share photographs (Flickr), bookmarks (del.icio.us), social software generally, and most blog software, which permits authors to assign tags to each entry.

Folksonomy and the Semantic Web

A combination of the words folk (or folks) and taxonomy, the term folksonomy has been attributed to Thomas Vander Wal. "Taxonomy" is from the Greek taxis and nomos. Taxis means "classification", and nomos (or nomia) means "management".

"Folk" is from the Old English folc, meaning people. So "folksonomy" literally means "people's classification management". The features that would later be termed "folksonomy" appeared in del.icio.us in late 2003 and were quickly replicated in other social software. Thomas Vander Wal has stated that folksonomy is a subset of tagging and it is "tagging that works".

Folksonomy may hold the key to developing a Semantic Web, in which every Web page contains machine-readable metadata that describes its content. Such metadata would dramatically improve the precision (the percentage of relevant documents) in search engine retrieval lists. However, it is difficult to see how the large and varied community of Web page authors could be persuaded to add metadata to their pages in a consistent, reliable way; Web authors who wish to do so experience high entry costs because metadata systems are time-consuming to learn and use.
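Precision, as defined above, is simple to compute. A minimal sketch with invented document IDs; recall, the complementary measure of how many of the relevant documents were found at all, is included because it corresponds roughly to the "coverage" goal in the concept-search excerpts:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0
    return sum(doc in relevant for doc in retrieved) / len(retrieved)


def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
print(precision(retrieved, relevant))  # 0.5
print(recall(retrieved, relevant))     # 0.6666666666666666
```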

For this reason, few Web authors make use of the simple Dublin Core metadata system, even though the use of Dublin Core meta tags could increase their pages' prominence in search engine retrieval lists. In contrast to top-down controlled vocabularies such as Dublin Core, folksonomy is a distributed classification system with low entry costs. If folksonomy capabilities were built into the Web protocols, it is possible that the Semantic Web would develop more quickly.
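For a sense of what those meta tags look like, here is a small generator for the common "DC." HTML convention. The convention and the fifteen element names are real; the helper function and the sample values are my own illustration:

```python
from html import escape

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language", "relation",
    "coverage", "rights",
}


def dublin_core_meta(fields):
    """Return HTML <head> lines declaring the DC schema and one meta tag per field."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"{element!r} is not a Dublin Core element")
        lines.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(lines)


print(dublin_core_meta({"title": "Concept Search", "date": "2006-02-28", "language": "en"}))
```

Even this short header has to be written and kept accurate for every page, which is the "entry cost" the excerpt contrasts with typing a tag into a folksonomy.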

Since folksonomies are user-generated and therefore inexpensive to implement, advocates of folksonomy believe that it provides a useful low-cost alternative to more traditional, institutionally supported taxonomies or controlled vocabularies. An employee-generated folksonomy could therefore be seen as an "emergent enterprise taxonomy". Some folksonomy advocates believe that it is useful in facilitating workplace democracy and the distribution of management tasks among people actually doing the work.

Jordan Willms on Gardened hierarchical folksonomy

Folksonomies - Cooperative Classification and Communication Through Shared Metadata by Adam Mathes Widely praised paper on folksonomy

Bruce Sterling article on folksonomy from Wired

Freetag, a generalized open source folksonomy implementation for PHP / MySQL applications

The Hive Mind: Folksonomies and User-Based Tagging by Ellyssa Kroski from InfoTangle

Wikipedia:Articles for deletion/Folksonomy : http://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Folksonomy

Keep and prune example list. From my brief review of the Google results it looks like "folksonomy" has caught on.

Keep: tagging is just taking off. While I'm not fond of the word "folksonomy" to describe tagging, more and more people do use it.

Keep: While I hate neologisms like blogosphere and folksonomy, the concept is certainly relevant to a significant web population and should remain as an article. The list of examples is too messy, in my opinion (though I cleaned it up a bit).

Strong Keep: The term Folksonomy has risen quickly into the lime-light. 

Wikipedia:Articles for deletion/Gardened hierarchical folksonomy : http://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Gardened_hierarchical_folksonomy

Folksonomy is a close call, making this a clear delete. 

Wikipedia:Wikipedia Signpost/2005-07-18/Folksonomy and GNAA : http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2005-07-18/Folksonomy_and_GNAA

The latter article is Folksonomy, which was nominated for deletion by an anonymous user who said, "Just because some self-proclaimed, vain 'online journalists' repeat a meme on their web-site in every post doesn't mean it is fit for inclusion in an encyclopedia." The term, a neologism for collaborative categorization that has gained considerable usage, is often defined even in other sources by reference to the Wikipedia article.


A recent neologism, folksonomy, should not be confused with Folk Taxonomy (though it is obviously a contraction of the two words). Those who support scientific taxonomies have recently criticized folksonomies by dubbing them fauxonomies.


Some sites offer a buddy system, as well as virtual checking out of items for borrowing among friends. Folksonomy is implemented on most sites. Examples include discogs.com for music and bibliophil.org for books.


Tags are descriptors that individuals assign to objects, in the practice of collaborative categorization known as Folksonomy. 


With the August 2005 relaunch, Last.fm supports end-user tagging of artists, albums, and tracks to create a sitewide Folksonomy of music. Users can browse via tags, but the most important benefit is tag radio, permitting users to play music that has been tagged a certain way. This tagging can be by genre ("garage rock"), mood ("chill"), artist characteristic ("baritone"), or any other form of user-defined classification ("singers Sarah would like").
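Tag radio can be sketched as an index from each tag to the tracks it was applied to, ranked by how many different users applied it. The data structures, the case-folding of tags and the ranking rule are assumptions for illustration, not Last.fm's actual implementation:

```python
from collections import defaultdict


def build_tag_index(taggings):
    """Map tag -> track -> set of users who applied that tag to that track."""
    index = defaultdict(lambda: defaultdict(set))
    for user, track, tag in taggings:
        index[tag.strip().lower()][track].add(user)
    return index


def tag_radio(index, tag, length=3):
    """Return up to `length` tracks for a tag, most widely tagged first."""
    tracks = index.get(tag.strip().lower(), {})
    return sorted(tracks, key=lambda t: (-len(tracks[t]), t))[:length]


taggings = [
    ("ann", "Track 1", "chill"),
    ("ben", "Track 1", "Chill"),
    ("ann", "Track 1", "chill"),  # a repeat by the same user counts once
    ("ann", "Track 2", "chill"),
    ("cat", "Track 3", "garage rock"),
]
index = build_tag_index(taggings)
print(tag_radio(index, "chill"))        # ['Track 1', 'Track 2']
print(tag_radio(index, "garage rock"))  # ['Track 3']
```

Counting distinct users rather than raw taggings keeps one enthusiastic listener from dominating a station.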
