Common ways to determine if my results set is good


What are some common ways to determine if my results set is good? - Ask Dr. Search

Background

When you are in a field, common wisdom is you don't wave red flags at a bull. It makes the bull angry, he may chase you, and at least one of the possible outcomes is not desirable.

But in the search business, we wave red flags at our users all the time. And when they get angry, or complain about how bad search is, or start asking "Why don't we just use Google", then you may have some undesirable outcomes as well.

Our red flags are not just in the form of poor relevance, but in many more subtle ways that draw attention to the shortcomings of most current enterprise search implementations. This month we'll cover the top 5 result list red flags; and we've added a free bonus red flag for your consideration. We know you don't do any of these, but perhaps you know someone who does: pass this on to them.

1. Too many hits

Try this on a web search engine like Google or Yahoo: search for 'java' or another common term and the engine reports something like 'Showing 1 of approximately 24,346,783 results' - your mileage may vary.

You may think this problem is limited to internet search, but chances are your intranet is getting pretty big - and what frustrates an end user more than seeing that his or her search has 25,000 related documents? Put yourself in the user's head for a moment: "The darned thing can find tens of thousands of results and the pearl I KNOW is there is nowhere to be found on the first result page!"

Best practices suggest that, if you feel you must show off how much content you have, you report "Showing results 1-20 of more than 200" - some number that is more comprehensible to the average human. Do you really think that anyone will scroll through thousands of result pages? Nope - they might go to the second page if they are really desperate for the document they want.
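A minimal sketch of that capped count line, assuming your engine hands you the true hit total. The function name and the cap of 200 are illustrative choices, not settings from any particular product:

```python
def hit_summary(first, last, total, cap=200):
    """Build the result-count line, capping the reported total at `cap`."""
    if total == 0:
        return "No results found"
    # Never claim to show more results than actually exist.
    last = min(last, total)
    if total > cap:
        return f"Showing results {first}-{last} of more than {cap}"
    return f"Showing results {first}-{last} of {total}"


print(hit_summary(1, 20, 24_346_783))
print(hit_summary(1, 20, 7))
```

Small result sets still show their exact size; only the intimidating totals get rounded down to something a human can take in.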

2. Cute but useless relevance scores

Humans know relevance; machines know algorithms.

When you include a relevance score in the result list, you are inviting the user to critique your search engine. Don't do it.

Imagine a site where a user does a search, and the best document, the one that is the definitive answer for the search, is second in the result list. [Yes, Virginia, sometimes the right result DOES show up in the top two or three].

Imagine this happening on a site where the result list displays no relevance scores. [It's easy - do a search on the Google public web site and observe]. The user sees his or her answer in the top two or three results and thinks "Wow, it's not perfect but it's pretty good AND I have my answer."

Now imagine that same search result list, but the top ranked result shows a score of 92.235% - but the pearl, the right document in second place, has a relevance of 91.157%. Trust me, your user is thinking "What a screwy system, it's OBVIOUS that the top document is wrong. Lucky I found my answer!"

The same applies to the cute little bar graphics that seem to do nothing but make a "Search 1.0" result list look more animated. Avoid numeric and graphical relevance scores like the plague.

Search best practices suggest that users know relevance - machines know algorithms. Don't give your users a reason to nitpick about your search results. They know it's not perfect and they are generally willing to cut you some slack - when you don't flaunt it.

3. Top relevance: 2.542%

If you are jumping into the list at item 3 and you still think relevance scores are a good idea, how about when you do a search for your company CEO's name and find the top document has a relevance of 2.542%? He runs the company, there are press releases all over your site, and the best you can do is generate a relevance that looks like an earned run average. And do you suppose he's pleased with that score when he checks his own name on your search engine?

Search best practices suggest you do not display relevance scores (see above). But if you insist that your users want to see scores, at least build a histogram of the top scores returned for your most important search terms, and make sure your tuned relevance algorithm spreads those scores across the full numeric range rather than bunching them at the low end.
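One way to build such a histogram, sketched under the assumption that you can collect the top-ranked document's score (as a percentage from 0 to 100) for each important query. The query list and scores here are made-up sample data:

```python
from collections import Counter


def score_histogram(top_scores, bucket_width=10):
    """Count top-result scores (0-100) per bucket of `bucket_width` points."""
    buckets = Counter()
    for score in top_scores.values():
        low = int(score // bucket_width) * bucket_width
        # Fold a perfect 100 into the last bucket instead of opening a new one.
        low = min(low, 100 - bucket_width)
        buckets[low] += 1
    return dict(sorted(buckets.items()))


# Hypothetical sample: query -> score of its top-ranked document, in percent.
top_scores = {
    "ceo name": 2.5,
    "benefits": 88.0,
    "401k": 91.2,
    "expense report": 47.9,
}

for low, count in score_histogram(top_scores).items():
    print(f"{low:3d}-{low + 10:3d}%: {'#' * count}")
```

If most of your important queries pile up in the bottom buckets, that is the cue to retune before anyone searches for the CEO.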

4. Navigator links to no hits

Navigators are a great way to facilitate and encourage follow-up or drill-down searches and to counter the fallacy of "single shot searching". Lots of companies now offer advanced navigators in the form of parametric or faceted search, but technologies that look similar at first glance can differ significantly.

Perhaps the most frustrating result list red flag is when a company uses the "out of the box" result list and navigators and, either because of the engine or the company's taxonomy, a user clicks on a result list navigator and gets back "no results found".

There's no reason for this.

Best practice here dictates that you need to ensure every navigator you provide to help the user improve the search results returns better results - not no results. This includes spelling suggestions, taxonomy suggestions, and parametric value links. If you can't guarantee results, I can promise someday your users will grumble.
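A rough sketch of the simplest safeguard: before rendering navigator links, drop any value that would lead to an empty result list. It assumes your engine can report, for each navigator value, how many documents in the current result set match it. The dictionary shape and the function name are illustrative:

```python
def usable_navigators(facet_counts):
    """Keep only the navigator values that would return at least one result."""
    return {value: count for value, count in facet_counts.items() if count > 0}


# Hypothetical counts for the current query, keyed by navigator value.
facet_counts = {"PDF": 12, "Press release": 3, "Video": 0}

print(usable_navigators(facet_counts))
```

The same check applies to spelling and taxonomy suggestions: run the suggested query first and only offer it if it comes back with hits.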

5. Users cannot find one of your products when they search by name

This one is more a search issue than a result list issue - but it makes your search engine look awful.

Try this: turn over your laptop computer and find the part number on the underside. It may be a simple model name like "T60", or a complex string like "PCG8902B". But I promise you, when you go to the vendor site and search for that exact string, you expect to find results. Our experience with many companies - especially laptop computer vendors - is that entering a valid part number returns zero results most of the time.

What about your site search and your product names? Are you sure that your product names return good results? Pick your top two product numbers or names and try a search - is your site working? Best practices - and business sense - say it had better be!
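That spot check is easy to automate. The sketch below assumes you can wrap your engine in a function that takes a query string and returns a list of hits; `fake_search` is a stand-in for that wrapper, not a real API:

```python
def missing_products(search, product_names):
    """Return the product names for which `search` comes back empty."""
    return [name for name in product_names if not search(name)]


def fake_search(query):
    """Stand-in for a real search call; swap in your own engine's client."""
    index = {"T60": ["T60 product page"]}
    return index.get(query, [])


print(missing_products(fake_search, ["T60", "PCG8902B"]))
```

Run it against your full product catalog on a schedule, and you will hear about a missing part number before your customers do.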

6. Bonus: Two queries that appear the same should behave the same

Consider these two terms: '401(K)' and '401K'. Now, you and I know that special characters are sometimes treated differently by search engines - it's not always easy to make your search engine treat these two terms as equivalent.

Nonetheless, I can promise you that your users see no difference whatsoever in these two terms. Search best practices dictate they should return the same result list. Make sure they do.
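One possible approach, offered as a sketch rather than a recipe for any specific engine: strip punctuation and fold case on each term, and apply the same normalization at index time and at query time so both forms land on the same token. The function name is an illustrative choice:

```python
import re


def normalize_term(term):
    """Drop punctuation and whitespace, then fold case, so variants compare equal."""
    return re.sub(r"[\W_]+", "", term).casefold()


print(normalize_term("401(K)"))
print(normalize_term("401K"))
print(normalize_term("401(K)") == normalize_term("401K"))
```

Be aware that this is a blunt instrument: it also collapses terms where the punctuation matters (a search for "C++" would become "c"), so you may want to apply it only to a reviewed list of terms or pair it with exceptions.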

In Summary:

When you frustrate your search users with easily avoided red flag items, you invite them to become critical of the good work you are doing on your site search. Avoid the big ones, and your users will forgive the small ones.

We hope this has been useful to you; feel free to contact Dr. Search directly if you have any follow-up or additional questions. Remember to send your enterprise search questions to Dr. Search. Every entry (with name and address) gets a free cup and a pen, and the thanks of Dr. Search and his readers.