
Collection Mirroring with Verity K2 Enterprise 4.5.1

Last Updated Mar 2009

By: Mark Bennett, NIE Enterprise Search: Issue 3 - July, 2003

Verity K2 Enterprise includes a feature for high-availability and load-balancing deployments called collection mirroring, which lets identical collections be accessed on two Verity K2 servers simultaneously. To mirror collections you need at least two K2 servers in your search engine architecture, and the collections must be named identically on both servers.

On large collections we found that indexing each copy separately produced a different number of documents in each collection, so the collections were out of sync and returned different results for some queries. Using RCADMIN, a command line tool that comes with Verity K2, and a series of batch files, we can synchronize the collections fairly easily.
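
One simple way to confirm that a mirrored copy really matches the original is to compare the two collection directories on disk. The sketch below is written in Python rather than batch and is only an illustration: the function names are hypothetical, it does not talk to K2 at all, and it compares relative file paths and sizes only, not document counts or file contents.

```python
import os

def snapshot(root):
    """Map each file's path (relative to root) to its size in bytes."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            files[os.path.relpath(full, root)] = os.path.getsize(full)
    return files

def diff_collections(primary_dir, mirror_dir):
    """Return (missing_in_mirror, extra_in_mirror, size_mismatch) as sorted lists."""
    primary = snapshot(primary_dir)
    mirror = snapshot(mirror_dir)
    missing = sorted(set(primary) - set(mirror))
    extra = sorted(set(mirror) - set(primary))
    mismatch = sorted(
        path for path in set(primary) & set(mirror)
        if primary[path] != mirror[path]
    )
    return missing, extra, mismatch
```

Three empty lists mean the two directories hold the same files with the same sizes. Run a check like this while both collections are turned off, so that neither directory changes during the comparison.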

Before writing your batch files, log in to your K2 servers with the RCADMIN command line utility and note which commands will need to go into the text file that is piped to RCADMIN. After logging in you will get a prompt: enter “indexstateset”, then enter the name of your server. Select “C”, then enter the name of your collection. Select the state you want the collection to be in, and exit the utility. Refer to your K2 Enterprise documentation for further RCADMIN commands.
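
The text file piped to RCADMIN is simply the sequence of responses you typed at the prompts, one per line. The Python sketch below assembles such a file from the sequence described above. It is an assumption-laden illustration, not documented RCADMIN syntax: the function name is hypothetical, and the login responses, the value that selects the state, and the command that exits the utility are not given in this article, so they are left as parameters to be filled in from your own session notes.

```python
def build_rcadmin_input(login_lines, server, collection, state_choice, exit_line):
    """Return the text to pipe into rcadmin, one response per line.

    login_lines, state_choice and exit_line are placeholders: use exactly
    what you typed in your own interactive RCADMIN session.
    """
    responses = list(login_lines)  # whatever the login prompts required
    responses += [
        "indexstateset",  # command named in the article
        server,           # name of the K2 server
        "C",              # selection made before naming the collection
        collection,       # name of the collection
        state_choice,     # your noted choice for the target state
        exit_line,        # whatever you typed to leave the utility
    ]
    return "\n".join(responses) + "\n"
```

The returned string can be saved as a file such as offline1.txt, with a second file using the other state choice for online1.txt.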

To automate this process completely, a series of commands needs to be run from a single batch file. Here is a breakdown of the commands used to synchronize the mirrored collections:

  1. First, the batch file uses RCADMIN to turn off the collection on the first K2 server, so that the collection’s directory can be copied cleanly and without errors to the other K2 server. The RCADMIN commands are piped in from a text file.
  2. Next, index the first collection, which is now turned off, preferably with the VSPIDER command line tool.
  3. Delete the contents of the temporary directory so that it is empty and ready to receive a copy of the first collection’s directory.
  4. With the collection still turned off, copy the first collection to the empty temporary directory using XCOPY or CP, depending on your operating system.
  5. When the copy is complete, turn the collection on the first K2 server back on with RCADMIN.
  6. Again using RCADMIN, turn off the collection on the second K2 server.
  7. Delete the files in the collection on the second K2 server.
  8. Once this is complete, copy the contents of the temporary directory on the first K2 server to the collection on the second K2 server.
  9. Finally, when the copy is complete, turn on the second collection with RCADMIN so that both collections are back online.
  10. If you have caching enabled on your servers, you may want to automate the restart of the K2 servers and brokers to refresh the cache.
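
The delete-then-copy pairs in steps 3 and 4 and in steps 7 and 8 follow the same pattern: empty the target directory, then copy a whole tree into it. A minimal Python sketch of that pattern is shown here for readers who prefer a script to XCOPY or CP. The function names are hypothetical, and `dirs_exist_ok` requires Python 3.8 or later. Unlike a plain file delete, this version also removes subdirectories left in the target.

```python
import os
import shutil

def empty_directory(path):
    """Remove everything inside path but keep the directory itself."""
    os.makedirs(path, exist_ok=True)
    for entry in os.listdir(path):
        full = os.path.join(path, entry)
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)   # real subdirectory: remove the whole tree
        else:
            os.remove(full)       # regular file or link

def replace_contents(source, destination):
    """Empty destination, then copy the whole tree under source into it."""
    empty_directory(destination)
    shutil.copytree(source, destination, dirs_exist_ok=True)
```

Calling `replace_contents` once with the collection and temporary directories, and once with the temporary directory and the second server's collection directory, would cover both copy stages. The collection involved should be turned off for the duration of each call.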

Example of using RCADMIN in a batch file:

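REM Step 1: turn off the collection on the first K2 server
rcadmin < offline1.txt

REM Step 2: index the first collection
vspider -cmdfile d:\spider.cmd

REM Step 3: empty the temporary directory
del \\server1\data\colls\first_coll_temp\*.* /S /Q

REM Step 4: copy the first collection to the temporary directory
xcopy \\server1\data\colls\first_coll \\server1\data\colls\first_coll_temp /S /E /Y

REM Step 5: turn the first collection back on
rcadmin < online1.txt

REM Step 6: turn off the collection on the second K2 server
rcadmin < offline2.txt

REM Step 7: delete the files in the second collection
del \\server2\data\colls\second_coll\*.* /S /Q

REM Step 8: copy the temporary directory to the second collection
xcopy \\server1\data\colls\first_coll_temp \\server2\data\colls\second_coll /S /E /Y

REM Step 9: turn the second collection back on
rcadmin < online2.txt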
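
The batch file above runs every line regardless of whether the previous one succeeded, so a failed index or copy would still be followed by the delete on the second server. The Python sketch below shows one possible way to stop at the first failure. It assumes that each tool reports failure through a non-zero exit code, which this article does not confirm for RCADMIN or VSPIDER, and the function names and step labels are hypothetical.

```python
import subprocess

def run_command(argv, stdin_path=None):
    """Run one command, optionally feeding it a file on stdin; return its exit code."""
    if stdin_path is None:
        return subprocess.run(argv).returncode
    with open(stdin_path, "rb") as stdin_file:
        return subprocess.run(argv, stdin=stdin_file).returncode

def run_steps(steps, runner=run_command):
    """Run (label, argv, stdin_path) steps in order; stop at the first non-zero exit.

    Returns (completed_labels, failed_label); failed_label is None on success.
    """
    completed = []
    for label, argv, stdin_path in steps:
        if runner(argv, stdin_path) != 0:
            return completed, label
        completed.append(label)
    return completed, None

# The first two lines of the batch file expressed as steps (illustrative only).
STEPS = [
    ("collection 1 offline", ["rcadmin"], "offline1.txt"),
    ("index collection 1", ["vspider", "-cmdfile", r"d:\spider.cmd"], None),
]
```

Note that stopping early can leave a collection turned off, so a real script would also need to decide whether to turn it back on before exiting.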