Enterprise Search Newsletter, Issue 7, Article 3

Ask Dr. Search

This month's question: How do you create your own thesaurus file in K2?

The Verity K2 engine supports a user-defined thesaurus that lets you define synonyms for specific terms. Once the thesaurus is enabled, a search that uses the <THESAURUS> operator expands to include documents containing the synonyms defined in the custom thesaurus.

Creating a Thesaurus File

Creating and using a thesaurus for Verity K2 is a multi-step process:

1. Identify the equivalent terms and create the file
2. Compile the source file
3. Move the compiled source file into the Verity directory structure
Let's look at how to implement these steps.

1. Identify the equivalent terms and create the file

Verity's thesaurus provides a way to define terms that are essentially equivalent. You create the association between synonymous terms and enter them into a 'source thesaurus file'. The format for each detail line of the source file is:
In this case, when the user does a search for "

Note that if a term appears in multiple circular lists, the terms in both lists become synonymous:
A thesaurus search for "
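As a rough illustration of what happens when lists overlap, the following Python sketch models synonym lists as plain sets. It is not Verity's implementation and does not use the thesaurus source syntax. The vocabulary and the names `build_index` and `expand` are hypothetical. It assumes the simplest reading of the rule above: a term expands to the union of every list that contains it.

```python
from collections import defaultdict


def build_index(lists):
    """Map each term to the set of all terms that share a list with it."""
    index = defaultdict(set)
    for terms in lists:
        group = {t.strip().lower() for t in terms}
        for term in group:
            # A term that appears in several lists accumulates all of them.
            index[term] |= group
    return index


def expand(term, index):
    """Return the sorted terms a thesaurus search for `term` would cover."""
    key = term.strip().lower()
    # Terms that are not in any list expand only to themselves.
    return sorted(index.get(key, {key}))


# Hypothetical vocabulary, not taken from any Verity thesaurus file.
lists = [
    ["car", "auto", "automobile"],
    ["auto", "automatic"],
]
index = build_index(lists)

print(expand("auto", index))  # ['auto', 'automatic', 'automobile', 'car']
print(expand("car", index))   # ['auto', 'automobile', 'car']
```

Because "auto" sits in both lists, a search for it pulls in "automatic" as well, which is probably not what a user looking for cars wants. That is the practical reason for keeping lists self-contained.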
By default, all of the terms on a line are circular: each term is treated as a synonym of every other term on that line. You can specify a 'one way' relationship for the terms on a given line by using the 'key' operator:
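The difference between circular and one-way entries can be sketched in Python. This models only the behaviour, not the thesaurus source syntax. It assumes that a one-way entry lets a search for the key term expand to the listed terms, while a search for a listed term does not expand back to the key. The function name, the data layout, and the vocabulary are all hypothetical.

```python
def expand_term(term, circular_lists, one_way):
    """Return the sorted search terms for `term`.

    circular_lists: lists whose members all expand to one another.
    one_way: dict mapping a key term to the terms it expands to
             (assumed: the listed terms do not expand back to the key).
    """
    key = term.strip().lower()
    result = {key}
    for terms in circular_lists:
        if key in terms:
            result.update(terms)
    result.update(one_way.get(key, ()))
    return sorted(result)


# Hypothetical vocabulary for illustration only.
circular_lists = [["laptop", "notebook"]]
one_way = {"computer": ["laptop", "desktop"]}

print(expand_term("computer", circular_lists, one_way))  # ['computer', 'desktop', 'laptop']
print(expand_term("laptop", circular_lists, one_way))    # ['laptop', 'notebook']
print(expand_term("desktop", circular_lists, one_way))   # ['desktop']
```

In this model a search for "desktop" stays narrow, while a search for the broader key "computer" also picks up the specific terms.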
Once this thesaurus file is active, searches for "

Generally, we recommend using the thesaurus as a synonym file with each item being equivalent. We also suggest that lists be self-contained, with no terms spanning multiple lists unless necessary.

2. Compile the source file

Once you have created a thesaurus source file, you are ready to compile it. Open a command window (cmd.exe on Windows servers) and change to the working directory where you saved the source thesaurus file. Verify that the Verity binary directory is in your path, and enter:
3. Move the compiled source file into the Verity directory structure

Once the SYD file is compiled with no errors or warnings, you can copy the binary file (vdk30.syd) into the active Verity directory. Normally, this is the verity\k2\common\english directory.

Before you copy the file, be sure to make a backup copy of the existing vdk30.syd file in the Verity directory structure. Because Verity keys on the SYD file extension, it is best either to move the existing SYD file into a different directory or to rename it with a different extension. We generally recommend renaming older SYD files with a version number prepended to the SYD extension. For example, we would name the backup of our original vdk30.syd as vdk30.v0syd.

Once you have saved the previous version of the file, simply copy the new binary file into the verity\k2\common\english directory. Any <THESAURUS> searches will now use the new file.

Un-compiling Thesaurus Files

Verity supports the ability to extract the original source from an existing binary SYD file. To do so, open a command window and, with the Verity binary directory in your path, run:
You can edit the source, adding or deleting terms as you see fit. We recommend against using or extending the standard Verity English thesaurus, because most sites have their own specific vocabulary that belongs in the thesaurus file. For example, one entry in the standard English file is a line that includes:
Thus any search that included large (as in "

Remember, thesaurus terms only affect your results if you use the <THESAURUS> operator in the user query. Since very few of your users will want to do that, be sure your search script adds the <THESAURUS> operator to the user's query terms. See our article on Intelligent Query Pre-Processing in our August/September Enterprise Search newsletter last year.
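A search script could add the operator along the following lines. This is a minimal Python sketch, not code from the pre-processing article. The function name `add_thesaurus` is hypothetical. Joining multiple words with <AND> and leaving queries that already contain an operator untouched are assumptions you may want to change for your site.

```python
def add_thesaurus(query, joiner=" <AND> "):
    """Prefix each word of a plain user query with the <THESAURUS> operator.

    Assumption: a query that already contains '<' was written with explicit
    operators by the user, so it is passed through unchanged.
    """
    if "<" in query:
        return query
    words = query.split()
    return joiner.join(f"<THESAURUS>{word}" for word in words)


print(add_thesaurus("large"))          # <THESAURUS>large
print(add_thesaurus("large truck"))    # <THESAURUS>large <AND> <THESAURUS>truck
print(add_thesaurus("<MANY>truck"))    # <MANY>truck
```

Doing this in the script keeps the search form simple for users while still applying the custom synonyms to every plain query.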
Write us with your Enterprise Search question at support@ideaeng.com.