Search Analytics Corner: Tracking 'Matching Documents'

Last Updated Feb 2009

By: Mark Bennett, New Idea Engineering, Inc. - Volume 3 Number 2 - February - March 2006

What is Search Analytics:

Search Analytics is the study of what users are searching for on your site.  It focuses on the actual search terms that were typed in, which shows exactly what they were looking for; this is in contrast to the older "Web Analytics" which only tracked "clicks", and would then try to infer what those clicks meant.

Search Analytics eliminates guessing, and therefore tells you a lot more about your users and site; Search Analytics quickly identifies important problems on your site!

In our last installment of Search Analytics Corner we said:

* Make sure to log the original search terms that the user typed in.  Some systems modify or "cook" the query, which is fine, but in your search analytics logging it's important to still maintain what they originally typed in as well – maintaining both versions in separate fields is certainly OK.

* And we double checked that you are, in fact, capturing this priceless data – you ARE doing that, right!?  If not, please do so immediately, or seek professional help, ASAP!

This Month's Subject: Log how many matching documents each search found

Like last month, our topic this month isn't rocket science.  We're still covering the basics of what data needs to be captured.  If you don't capture the proper data when searches are performed, you won't be able to run critical reports in your Search Analytics tool.

When a search is performed, the search engine reports how many documents / web pages matched that search.  You may or may not be displaying this information to the user, but internally most search engines have this data.

As an example, a search on Google for a term like "java" will report back with a message like "Results 1 - 10 of about 342,000,000" – so about 342 million documents were about Java.

This may not be a very interesting example for you – your focus is likely your own search engine, not Google's public search, and you may not even care about Java.  But the point is that YOUR SEARCH ENGINE should be giving back a similar statistic for each search – you need to LOOK FOR THIS and make sure that it is being recorded.

More importantly, when NO MATCHES are found on your search engine, then you'll really want to know about it.

Checking your Search Engine's "number of matches" statistic:

Does your engine display the statistic we want?

The first step is to see if your engine shows the statistics at the top of the results list.

Why is this important?

If your engine does display it, then you should write down the exact number of matching documents; we want this number so that, when we check the reports or logs, we can compare it to the number they have, and make sure the numbers match.

Even if your search engine does not display this data in the results, you should still keep reading; it may still be logging this data but just not showing it.  The advantage of having it display is that, when we look at the log files, we can compare the displayed number to what we find in the logs.

Trying it on your Search Engine

At this point you should go to your own enterprise search engine and search for something.  A good candidate for a search term is "budget"; it will usually return some documents, and we will use it in our examples.  Examine the results and look for a message with statistics.

You should make sure to do a search that does bring back some results.  If "budget" doesn't work, you might try the name of your company.  Or a common term like "error" or "sale".

To be thorough, you might want to try more than one search.  Maybe make a table of search term, number of results, and the time you ran it.  If you do see some searches with no results, it's also a good idea to make a note of them as well; we'll need that information later on.

What to look for in your Web Browser

Below are some examples of how this might be worded – sometimes these messages contain more than one number, so it's important to understand which number is the most important.

In these examples, keep in mind that I'm pretending that you had 35,712 matching documents related to the term "budget".  Your engine will have different numbers of course, and you might have used a different search.

Examples of how your engine might display statistics for the search "budget":

Your search for "budget" matched 35,712 documents

Showing results 1-10 of 35,712

35712 matches for "budget"

"budget" matched 35,712 of 1,450,812 documents

Found 35,712 of 1,450,812 documents

Searched 1,450,812 documents and found 35,712

Notice how the "35,712" keeps jumping around!  The last two examples say essentially the same thing but transpose the numbers; this can be confusing.  Don't worry that some versions include other numbers – we'll discuss those in a future column.  Other variations are whether the large numbers include commas or not, and whether the original query is included in the message – although interesting, don't let those differences distract you.

Your search engine should give a message similar to one of those shown above; your exact numbers will be different, but what is the overall format?

Which type of message did your engine come back with?

If it gave you back a number, that's great.  Make a note of it.  You should also jot down the date and time.

Don't worry if it looks messy.  In most cases you don't need to actually "parse" this message; you're just making a note of the number for your test search to compare with what we find later.

What if it's NOT displayed? Please keep reading!

If you didn't get back any such message, even for searches that did produce some results, you still might be OK but we'll have to do some additional checking.

If it's not displayed, there's still a good chance that the search engine does have the number, and is just not displaying it.

Having the number just makes the next step more certain.  Whether you saw the number of results or not, you still need to see if it's being logged.  One doesn't imply the other.

Checking the Reports or Logs on your system

Where to Look

There are typically three places to look for your test search:

  1. The best place, if you have it set up, is your Search Analytics reporting console.
  2. The second best place is the log files created by your search engine.
  3. The third place, the least desirable, is in your web server's log files; this will probably not have the statistic we're looking for.

Looking for the Entry

Check these sources for your test search (the term "budget" in our example).  See if that entry (either in the report or in the log file) has the number you noted above.  If not, you might see if there are other entries for that period of time; sometimes a search can show up more than once, depending on the system, so don't give up if the first line doesn't show it.

If you can't find your test search anywhere, then something strange is happening.  Fixing it is beyond the scope of this column; if you get stuck do drop us an email!

Assuming you find the test search

The next step, after having found your test search, is to see if it has the number we noted earlier, the number of matching documents for this search.

Does the entry have the Correct Number?

No matter what format of report or log you have, the line you find with your test query is likely to have numbers – but check them against the number you noted earlier.

If you did find your test search in the reports or logs, does it have the number of matching documents displayed?
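If you are checking a raw log file rather than a report, the comparison can be scripted.  Below is a minimal Python sketch, not a definitive tool: the log lines and field names (query=, matches=) are made up for illustration, and your engine's log format will almost certainly differ.  It simply looks for lines that mention the test term and checks whether the number you wrote down appears anywhere on that line.

```python
import re

def find_test_search(log_lines, term, expected_count):
    """Return (line, has_expected_count) for every log line mentioning the term."""
    hits = []
    for line in log_lines:
        if term.lower() not in line.lower():
            continue
        # Collect every number on the line, ignoring thousands separators.
        numbers = {int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", line)}
        hits.append((line, expected_count in numbers))
    return hits

# Hypothetical log lines (assumption: your engine's format will differ).
sample_log = [
    "2006-02-14 10:31:02 query=budget matches=35712 time_ms=48",
    "2006-02-14 10:31:40 query=java matches=120 time_ms=15",
    "2006-02-14 10:32:05 query=budget page=2",
]

results = find_test_search(sample_log, "budget", 35712)
for line, has_count in results:
    print("HAS COUNT" if has_count else "NO COUNT ", line)
```

Note that the second "budget" line is found but does not carry the count, which is why it pays to look at every entry for the time period and not just the first one.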

Assuming it does have the CORRECT Number

That's great!

Your Search Analytics or Search Engine server logs seem to be recording the right data and have passed the test for this month!

What if you did not find the correct number?

If not, and you're using a Search Analytics package, you might see if there is another report you can run that has more details.

If you're looking at the search engine log, you may need to reconfigure the amount of details it records; most search engines that do real logging should be recording this number.

And if you're looking at your web server's main logs then there's not much that you can do.  These log files rarely contain enough data to run thorough Search Analytics against.  Using only web logs, you will probably not be able to fix this – we really suggest you consider an alternative solution.  See the next section below: "What to do if you did NOT find your test search?"

What to do if you did NOT find your test search?

If you didn't find any search activity anywhere, or the statistics were wrong, this can probably still be fixed.  The specific course of action is very system specific; you or your search engine administrator will need to take some action to fix it.

Some general ideas about fixing this problem:

If you have a Search Analytics package, then somehow your data isn't getting logged.  This is probably a configuration issue, either with your analytics package or with the host search engine.  You will need to contact the vendor or administrator, or bring in some outside help.

If you're not using a Search Analytics package, have you considered upgrading to one?  You may be able to get this from the search engine vendor, or you can look at 3rd party solutions that do search analytics for all the engines (like NIE's SearchTrack product).

Another approach for those not using a full analytics package is to reconfigure the Search Engine itself to log more data.  Most modern engines have a web based Administration console.  There may be some settings for controlling what level of search activity details are logged.

If you are just surviving by parsing through your main servers' web log files, you really should consider a technology upgrade.  It's going to be difficult to get the information you need from these simple log files.

A quick "web logs" fix that some folks have tried:

This one is a bit technical, and doesn't fix all the problems anyway.  So if you don't understand it, don't worry, it's not a great idea anyway!

Most web servers will not log the data from search forms that use the "POST" method of the HTTP protocol. 

The "clever" workaround is to change the submit method from a "POST" to a "GET", in the hopes that longer URLs will be recorded in the web server log files, and that those longer URLs will contain the extra data that can then be parsed out.

There are two main problems with this workaround:

  1. It still will not log the "number of matching results" that this article has been talking about.  Yes, it can be used to record what the user typed in for a search, and that is better than nothing, but it won't capture the number of matches because that data isn't submitted by the web form – that data is calculated later after the form was submitted.
  2. Even if you could settle for that limited amount of data, parsing the log files is a hassle.  Some folks can do it; we've even used this method in the past.
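To give a feel for what that parsing involves, here is a minimal Python sketch.  It rests on two assumptions that are not from your system: the web server writes Common Log Format style lines with the request in the first quoted field, and the search form submits its terms in a parameter named "q".  Both will need adjusting for a real site.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms_from_log_line(line, param="q"):
    """Extract the search terms from one access-log line, or None if absent."""
    # Assumed layout: ... "GET /path?query HTTP/1.1" status bytes
    try:
        request = line.split('"')[1]
        method, url, _protocol = request.split(" ", 2)
    except (IndexError, ValueError):
        return None
    if method != "GET":
        return None  # POST bodies are not in the log, so there is nothing to recover
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None

# Hypothetical log lines for illustration.
get_line = '10.0.0.5 - - [14/Feb/2006:10:31:02 -0800] "GET /search?q=annual+budget&page=1 HTTP/1.1" 200 5120'
post_line = '10.0.0.5 - - [14/Feb/2006:10:31:09 -0800] "POST /search HTTP/1.1" 200 5120'

print(search_terms_from_log_line(get_line))
print(search_terms_from_log_line(post_line))
```

As the first problem in the list says, this recovers only what the user typed; the number of matching documents never appears in the URL, so no amount of parsing will find it there.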

If you decide to go with this workaround anyway, you could theoretically run the queries "offline" and back-populate the search statistics. Again, more complex coding; if you're that advanced, and have that much time, you should be writing this article instead of reading it.  :-)

A web consultant or after market product may also be able to accomplish this.

Another potential workaround:

This workaround is also a bit technical, but it does leverage some of the third party Search Analytics products, such as SearchTrack.  If this method sounds interesting to you and you'd like to discuss it, please drop us an email.

Many enterprise search engines display their results with the help of a templating language, such as JSP, ASP/.net, Cold Fusion or even Verity Search97's old SearchScript .hts files.

These templating languages can usually "call out" to another web process or script.  In this case they would call out to a "direct logging" script or process.  They would need to pass that external process all of the variables that need to be logged, including the number of matching documents.  There are several variations of this method.

NIE's SearchTrack does accept this type of "direct logging" data injection, as do some NOC software packages such as Ganglia.

This is a viable method, if somewhat involved.  Please let us know if you need more info.
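As a rough illustration of the "call out" idea, the Python sketch below builds the request that a results template might send to a logging process.  Everything specific in it is hypothetical: the endpoint URL and the parameter names (q_original, q_cooked, matches, timestamp) are invented for this example and are not the interface of SearchTrack or any other product.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical endpoint; substitute whatever your logging process listens on.
LOGGING_URL = "http://localhost:8080/log-search"

def build_logging_request(original_query, cooked_query, num_matches, when=None):
    """Build the URL a results template could call to log one search."""
    when = when or datetime.now(timezone.utc)
    params = {
        "q_original": original_query,  # exactly what the user typed
        "q_cooked": cooked_query,      # the query after the engine modified it
        "matches": num_matches,        # logged even when it is 0
        "timestamp": when.isoformat(),
    }
    return LOGGING_URL + "?" + urlencode(params)

url = build_logging_request(
    "budget", "budget*", 35712,
    when=datetime(2006, 2, 14, 10, 31, 2, tzinfo=timezone.utc),
)
print(url)
```

The template would then fetch this URL (or hand the same values to a script) each time it renders a results page.  The key point is that the match count is available at that moment, so it can be passed along with the query.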

A third potential workaround:

The third workaround is only mentioned here for completeness.  Though used by some vendors and programmers (including NIE), we do not suggest you try coding this yourself unless you are a pretty good programmer and have a bit of time; and if you really do want this type of solution, you can buy it instead of coding it, and save a lot of time.

The general idea is to try and parse or "scrape" the data from the results list itself.  Among other complexities, this involves acting as a "middle man" for search engine traffic, talking to the web server, the search engine, and a database all at the same time. This also implies full use of threading on a busy server.  In addition, remember way back when we said the format of the displayed statistics message didn't matter?  In this workaround, it does.   Your parser would need to do that parsing as well, and be adjusted should that pattern ever change.
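To show just the message-parsing piece (not the middle-man plumbing), here is a minimal Python sketch that pulls the match count out of the six example messages shown earlier in this article.  The patterns are assumptions tailored to those sample wordings; a real engine's message would need its own pattern, and the pattern would need to be updated whenever the wording changes.

```python
import re

NUM = r"(\d[\d,]*)"  # a number with optional thousands separators

# Ordered patterns; each captures the "matching documents" number.
PATTERNS = [
    re.compile(r"matched " + NUM, re.I),   # "... matched 35,712 documents"
    re.compile(r"results \d+\s*-\s*\d+ of (?:about )?" + NUM, re.I),  # "Showing results 1-10 of 35,712"
    re.compile(NUM + r" matches\b", re.I), # "35712 matches for ..."
    re.compile(r"\bfound " + NUM, re.I),   # "Found 35,712 of ..." / "... and found 35,712"
]

def match_count(message):
    """Return the number of matching documents in a stats message, or None."""
    for pattern in PATTERNS:
        m = pattern.search(message)
        if m:
            return int(m.group(1).replace(",", ""))
    return None

examples = [
    'Your search for "budget" matched 35,712 documents',
    "Showing results 1-10 of 35,712",
    '35712 matches for "budget"',
    '"budget" matched 35,712 of 1,450,812 documents',
    "Found 35,712 of 1,450,812 documents",
    "Searched 1,450,812 documents and found 35,712",
]
counts = [match_count(msg) for msg in examples]
print(counts)
```

Notice that the last two messages need the word "found" as an anchor; simply taking the first or the largest number on the line would pick up the total collection size instead.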

How does your Search Engine handle "NO Results" !?!?

If a user types a search, and there are no matching documents, then most programmers would say that "the 'number of matching docs' was zero".  Logically, this is correct: "no matches" = "zero matches", that certainly seems reasonable.

However, there are three reasons we have called out this specific situation here:

  1. This is one of the most important Search Analytics reports a system can give you.  If you aren't familiar with it, you should be!  We'll revisit it in future columns, but if your system is up and running you should run this report ASAP.
  2. Some systems behave quite differently when there are no results.  On the surface, they may display different text to the user.  It's a good idea to check what your system displays.  Also, it's considered bad design to just say "0 results" and leave the user with a blank result screen – most sites try to put up some search hints, or perhaps offer an email address to send inquiries to.  In a few rare cases, some systems even "crash" or give nasty errors when no matches are found.  Our point is that you should certainly check this!
  3. Behind the scenes, some systems may log "no results" as a different class of event, or perhaps not log it at all!  Ideally, it would just log it the same way it does all other searches, and just record a "0" for the number of matches – but again, given the importance of this particular information, it's worth double checking.
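If your logs do carry the match count, a basic "no results" report is not hard to produce.  The Python sketch below is only an illustration under assumed inputs: it takes (query, match count) pairs that you have already extracted from your logs, with None standing in for searches where no count was recorded at all.

```python
from collections import Counter

def zero_result_report(records):
    """Tally queries that found nothing, and queries whose count was never logged.

    records: iterable of (query, match_count); match_count is None if not logged.
    """
    zero = Counter()
    unlogged = Counter()
    for query, matches in records:
        key = query.strip().lower()  # treat "Budjet" and "budjet" as the same search
        if matches is None:
            unlogged[key] += 1
        elif matches == 0:
            zero[key] += 1
    return zero.most_common(), unlogged.most_common()

# Hypothetical records for illustration.
sample_records = [
    ("budget", 35712),
    ("Budjet", 0),
    ("budjet", 0),
    ("401k form", 0),
    ("vacation policy", None),
]

zero_hits, missing_counts = zero_result_report(sample_records)
print("No results:", zero_hits)
print("Count not logged:", missing_counts)
```

Keeping the "count not logged" searches in a separate list matters: a system that skips logging for empty result sets would otherwise look like it never has a failed search.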

In Summary

Most search engines can provide a count of how many documents matched each search.  Even if not displayed to the users, this data can usually be logged.  You should check that this data is in fact being captured; if you find that it isn't being captured, you should fix it!

If you are still surviving by just looking at generic web server logs or "click based" reports, or just waiting for problems to be reported, it's probably time for a software upgrade and/or some consulting.  You are not fully capturing vital business data, and your site is likely suffering as a result.  It is not safe to assume that a lack of complaints means that everything is OK; it's far more likely that frustrated users simply give up and abandon search altogether.

In our next installment we will start talking about some of the first Search Analytics reports you should become familiar with.  And as always, if you have any questions, please do write us!  support@ideaeng.com