First Look: IBM OmniFind Yahoo! Edition Search

Images are unfortunately no longer available for this article.

You may have read elsewhere about the partnership between IBM and Yahoo! and the release last December of the free version of the IBM OmniFind Discovery Edition, officially known as the IBM® OmniFind™ Yahoo!® Edition. The first release, Version 8.4.0, is based on the popular iPhrase search technology acquired by IBM in 2005 and marketed as the entry product in the OmniFind family.

You might expect that the free version of a product IBM sells for tens of thousands of dollars - or more - has to be a stripped-down, bare-bones engine. On that issue, you would be wrong. And you'd be even more wrong when it comes to the newest release, Version 8.4.1, now available on the IBM download site.

The product does have license limitations, but that's really not the right word to use when dealing with this product. Briefly, the OmniFind-Yahoo! Search (called OY!S internally) license "restrictions" include:

No more than 500,000 documents per instance of the software
You cannot redistribute the code - it must be downloaded from the OY!S site
The bundled Stellent Transformation Server can only be used in conjunction with the OY!S software

It is this Stellent Outside In Transformation Server that provides support for the hundreds of different file formats supported by OY!S. And the same Stellent technology allows OY!S to support search in more than 30 languages; IBM provides the interface in 15 languages, including many Asian and other multi-byte languages.

When you consider what is included with the software, you can really get excited. You get the ability to:

crawl web sites and file systems
customize the search result page
customize the result list page
tune results based on metadata in your documents
take full control of advanced search
highlight search terms in the result list
define best bets
define synonyms
view limited query reports

And using the REST-based API, you can

return results in both ATOM/RSS and HTML snippets
add and remove documents using the REST HTTP API
use sample command line Java, PHP and XSL programs to search, insert and delete documents

Overall, not a bad start. We'd call it pretty darn good search dial-tone.

Installation

As the release note points out, the new version still installs with only three clicks (on Windows, at least). It takes a pretty decent system to index any meaningful amount of content. OY!S is not a desktop search client - it's a full search engine that would work fine for most departments or small companies. The suggested configurations start with a 1.5GHz system with 1GB of memory and 80GB of free disk space; IBM suggests this not be installed on a shared server.

Indexing

Adding a web site to spider (shown in Figure 1) is simple. It has the simplicity and feel of Ultraseek, with a starting URL along with URL patterns to include and exclude. The on-screen help (not shown) is nice but minimal; just about everywhere in OY!S, though, you'll find a help link to the local HTML documentation. You don't have the same control over spider revisiting, which defaults to a 36-hour initial revisit period that evolves over time.
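
To make the include and exclude idea concrete, here is a minimal sketch of how a spider might decide whether to fetch a URL. It assumes plain prefix matching and lets exclusions win over inclusions; the article does not describe OY!S's actual pattern syntax or precedence, so shouldCrawl and the example.com addresses are purely hypothetical.

```javascript
// Hypothetical helper: decide whether a crawler should fetch a URL.
// Plain prefix matching is an assumption; OY!S's real pattern rules may differ.
function shouldCrawl(url, includePrefixes, excludePrefixes) {
  var startsWithAny = function (prefixes) {
    return prefixes.some(function (prefix) {
      return url.indexOf(prefix) === 0;
    });
  };
  // In this sketch an exclude pattern always overrides an include pattern.
  if (startsWithAny(excludePrefixes)) {
    return false;
  }
  return startsWithAny(includePrefixes);
}

var include = ["http://www.example.com/"];
var exclude = ["http://www.example.com/private/"];

console.log(shouldCrawl("http://www.example.com/entsrch/index.html", include, exclude)); // true
console.log(shouldCrawl("http://www.example.com/private/notes.html", include, exclude)); // false
console.log(shouldCrawl("http://other.example.org/", include, exclude));                 // false
```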

Figure 1 - Add Web Site Form

You can force a new update if you change a few parameters (proxy server, include or exclude directories, etc.), but otherwise you pretty much get what you get. For example, the documentation indicates that it may take a while for documents to be removed. There is a screen that lets you query the status of a particular URL, which at least lets you know what's really in there.

Indexing a file system is even easier - simply provide a starting directory path, and any directories to be excluded. On one of our 3GHz systems with 2GB of memory, we saw an index rate of between 250 and 700 documents per minute; but to be honest the disk drive is not particularly robust, and we violated IBM's guidelines to use OY!S on a dedicated server.
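
For a rough sense of scale, the arithmetic below estimates how long a full index would take at the rates we observed if a collection reached the 500,000-document license limit. It is a back-of-the-envelope sketch that assumes the rate stays constant for the whole run, which real crawls rarely manage; hoursToIndex is just an illustrative name.

```javascript
// Figures from the article: 500,000-document license cap,
// observed indexing rates of 250 to 700 documents per minute.
var DOCUMENT_LIMIT = 500000;

// Convert a document count and a per-minute rate into hours of indexing.
function hoursToIndex(documentCount, docsPerMinute) {
  return documentCount / docsPerMinute / 60;
}

var slowest = hoursToIndex(DOCUMENT_LIMIT, 250);
var fastest = hoursToIndex(DOCUMENT_LIMIT, 700);

console.log(slowest.toFixed(1) + " hours at 250 docs/min"); // 33.3 hours at 250 docs/min
console.log(fastest.toFixed(1) + " hours at 700 docs/min"); // 11.9 hours at 700 docs/min
```

In other words, on hardware like ours a full-size collection is a job of somewhere between half a day and a day and a half.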

The only downside to the file crawling capabilities is the lack of any useful log reports. It would be nice to know which documents were successfully indexed and those that were not. And it would be nice if there were a way to map a file directory to a URL root, so OY!S could index web content by crawling a file system and mapping the view link to a valid URL.

Security

OY!S supports the ability to crawl password protected sites, both with basic challenge-response authentication and with more complex forms-based authentication. Here, too, the spider control looks very much like that found in Ultraseek, and it should serve well for most sites that use authentication.

Figure 2 - Index Security Options

Finally, for those companies that opt to write custom applications using the API, OY!S will provide a 'key' that the application must specify in order to access the content, which gives indices a bit more protection against unauthorized programmatic access.

Relevance

The documentation describes how OY!S handles relevance ranking:

'Search results are returned based on four ranking factors: document modification date, URL or directory path depth, Web links analysis, and keyword match. Keyword match carries the most weight and cannot be disabled. The search engine uses a preset ranking that is appropriate for most Web site and file directory searches.'

The keyword match component, which cannot be disabled, is based on both word frequency and on query term proximity for multi-term queries. Newer documents rank higher by default, a bias we have long maintained is beneficial to good relevance. The interesting twist is path depth: the further "below" a document is from the top URL, the less important it is. This probably makes sense for most sites, but use care if an important part of your site is located far down the directory path. (In that case, you might create two collections: one that includes the entire site except the deep content, and a second that starts at the top of the deep pages.) In the same way, link popularity tends to favor the pages that have the largest number of links pointing to them.
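
If you want to check whether important pages sit deep enough to be penalized, a quick way is to count path segments below the site root. The sketch below uses one plausible definition of depth; the documentation quoted above does not say exactly how OY!S measures it, and pathDepth and the example.com URLs are our own illustrative names.

```javascript
// Count how many path segments a URL sits below the site root.
// This is only one plausible definition of "depth", not OY!S's documented one.
function pathDepth(url) {
  // Strip the scheme and host, then drop any query string or fragment.
  var path = url.replace(/^[a-z]+:\/\/[^\/]+/i, "").split(/[?#]/)[0];
  // Empty segments (leading or trailing slashes) do not count.
  return path.split("/").filter(function (segment) {
    return segment.length > 0;
  }).length;
}

console.log(pathDepth("http://www.example.com/"));                       // 0
console.log(pathDepth("http://www.example.com/entsrch/index.html"));     // 2
console.log(pathDepth("http://www.example.com/a/b/c/d/deep-page.html")); // 5
```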

If the default relevance doesn't work for you, the good news is that the newest release provides a way for site owners to exclude date, depth, or link analysis.

Search Look and Feel

The default search and result formats are illustrated in Figure 3.

Figure 3 - Default Search Box with Branding

Unlike many other free and low cost services, OY!S lets you completely remove any branding from both the search screen and from the results list. You can also add your own graphics if you want in place of the IBM and Yahoo! images. Figure 4 shows a brand-free search and result list.

Figure 4 - Unbranded Search Box and Results

The result list is professional, with the option to show document summaries, search term highlighting, title, URLs and just about any other property you could want. What is particularly interesting is how easy it is with OY!S to get excellent document summaries, and that term highlighting takes no special effort. Because of the Stellent Transformation Server, there is even a 'View as HTML' link that works pretty darned well.

Advanced Features

OY!S has some excellent advanced features including featured links (best bets), synonym creation, and limited search activity reporting. These are good to have and quite easy to implement. Honestly, we're a little disappointed with the limited search activity reporting; but then we think full search analytics are critical to running web search properly. There is a good spider track-back page, shown in Figure 5, that displays the full interaction between the spider and the web site.

Figure 5 - Track-Back Reporting

Documentation

The documentation with OY!S is clear, easy to read, and compact. The Administration and Troubleshooting Guide is by far the largest of the four documents that come with the product and weighs in at a whopping 77 pages with cover sheet, table of contents, index and the works. The API guide adds another 28 pages full of coding examples; and the installation guide, at 13 pages, may be the smallest we've ever seen for an enterprise-class search engine.

In the actual product, help in the form of the manuals presented as HTML documents is available from just about every page. The documentation is well hyperlinked for easy reading and it is easy to find what you are looking for.

The only drawback of the documentation set is that it leaves unanswered some questions that naturally come to mind for anyone implementing the product in more than a trivial way. For example, since OY!S only supports a single index, how can you have two or more different web sites in the index and make sure you can search only one site at a time?

The good news is there is a reasonably active discussion forum, and some of the IBM engineers are active there and in their own blogs (see Resources below).

Sample Code

The product ships with sample command line code in Java and PHP and a sample XSL template to use for serving search results as an ATOM feed. We found the Java example would not compile using Java 1.4, and in fact had problems getting it to run on Java 1.6 (the release notes indicate this version is based on Java 1.5.4).

We did implement a custom search form that uses some of the more advanced search options new to OY!S Version 8.4.1 to provide a simple 'sub-site' search. The code, listed in Figure 6, lets us search only the part of our web site that holds the enterprise search newsletter - the pattern is that the string "/entsrch/" is in the URL. We've based this code on sample work provided by Todd Leyba of IBM, whose blog is among the resources listed below.

<html>
    <head>
    <title>Test Newsletter Search</title>
    </head>
    <body>
    <b>Find results only on the newsletter section of the site</b>
    <p>
    <p>

    <form method="get">
    <input type="text" name="Query" value="" size="25">
    <input type="button" value="Search" onclick="runSearch()">
    </form>
    
    <script type="text/javascript">
    function runSearch()
      {
        // Search endpoint of the local OY!S instance
        var dest    = "http://localhost:8080/search?";
        var params  = "index=Default&start=0&results=10&query=";
        // Encode the user's terms, then restrict hits to URLs containing /entsrch/
        var userQuery = encodeURIComponent(document.forms[0].Query.value);
        userQuery = userQuery + "+url" + encodeURIComponent(":") + "/entsrch/";
        var request = dest + params + userQuery;
        alert("user query is " + userQuery);

        // window.open expects the window features as one comma-separated string
        var features = "toolbar=1,"      // toolbar provides back/fwd
                     + "resizable=1,"    // allow them to resize window
                     + "scrollbars=1,"   // and to scroll as well
                     + "height=500,"     // and I like smaller windows
                     + "width=400,"      // of this size and position
                     + "left=80,top=80";

        window.open(request,                   // complete search url
                    "OmniFindSearchResults",   // window name (a target, not a title; avoid spaces)
                    features);
      }
    </script>
    
    </body>
</html>

Figure 6 - Sub-Site Search Form

With a few minor changes, this may work for your sub-site search needs.
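
One such change is to pull the URL construction out of the form so the same logic can serve other sections of a site. The sketch below reuses the endpoint and parameter names from Figure 6; buildSubSiteSearchUrl is a hypothetical helper of our own, and it assumes the URL pattern is a plain path such as "/entsrch/" that needs no further encoding.

```javascript
// Build an OY!S search URL restricted to one section of a site.
// The endpoint and parameter names come from Figure 6;
// buildSubSiteSearchUrl itself is a hypothetical helper.
function buildSubSiteSearchUrl(baseUrl, userQuery, urlPattern) {
  var params = "index=Default&start=0&results=10&query=";
  // The "+" is sent literally and the ":" percent-encoded, as in Figure 6.
  var filter = "+url" + encodeURIComponent(":") + urlPattern;
  return baseUrl + "?" + params + encodeURIComponent(userQuery) + filter;
}

console.log(buildSubSiteSearchUrl("http://localhost:8080/search", "omnifind", "/entsrch/"));
// http://localhost:8080/search?index=Default&start=0&results=10&query=omnifind+url%3A/entsrch/
```

In a page, runSearch() would then only need to read the form field, call this function, and hand the result to window.open.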

Resources

Some of the resources you might want to use to learn more about implementing OY!S:

The IBM OmniFind Yahoo! Edition Forum
The IBM OmniFind Yahoo! Edition Blog
IBM engineer Todd Leyba's Search Blog
The Independent Search Developers User Group

And of course we have the Enterprise Search newsletter, the Enterprise Search Blog, or you can e-mail Dr. Search with your question.

Summing It Up

The IBM® OmniFind™ Yahoo!® Edition, for having all those special characters in its name, is a great little search engine that will work as a good solution for many departments and companies. Its introduction is perhaps a shot at the Google Search Appliance - Google did, after all, add new capabilities and lower the price of their "mini" not long after the OY!S was announced. However, it's also an indication that the age of high-price, high-margin search technologies may be coming to an end.