new idea ENGINEERING         Home  | Products  | Services  | Newsletter  | Resources  | About Us | Contact Info | Privacy Policy        

  Specializing in Enterprise Search since 1996 - including FAST, Autonomy IDOL, K2, and Ultraseek, OmniFind and Lucene

Locator: NIE Home / Publications / Enterprise Search Newsletter / Volume 2 Number 5 / Ask Doctor Search

Ask Doctor Search
Volume 2 Number 5 - April 2005

This month's question comes from a customer who is using K2 in an enterprise search application.

Question: In previous versions of K2, we could rebuild a collection from scratch simply by taking the collection off-line, running mkvdk with the "-purge" option, and bringing the collection back online again.

We're now using the K2 Spider. When we use our old scripts, the collection does come up empty, but the K2 Spider seems to have its own database of indexed documents. When we restart the spidering job, no new documents are indexed.

How can we reset the spider to tell it to spider everything again?

Dr. Search answers: Your assumption that the K2 Spider maintains an independent list of indexed documents is correct. VSpider does something similar, and you need to use vsdb to maintain the VSpider database. As long as you use K2 Spider jobs to manage your collection, though, it is easier to manage the K2 Spider database with rcadmin.

As the script in Figure 1 shows, all you need are the indexstateset and jobpurge commands. In the example, the server hosting the collection is named bean_server, the collection is test_coll, and the job that builds the collection is build_test_coll_job.

	# rcadmin purge script created $fileDate
	# login
	# username
	# password
	#
	indexstateset
	bean_server
	c
	test_coll
	0
	y

	jobpurge
	build_test_coll_job
	bean_server
	y
	
	indexstateset
	bean_server
	c
	test_coll
	2
	y
	quit
	
Figure 1: rcadmin script to clear a collection and the K2 Spider cache
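
The "$fileDate" placeholder in the header comment suggests the script is generated from a template. If you maintain several collections, a small generator keeps the copies consistent. The following is only a sketch in Python under that assumption: the function name build_purge_script and its parameters are hypothetical, and the output simply reproduces the line order of Figure 1 with your own server, collection, and job names substituted. How you pass the resulting text to rcadmin depends on your installation and is not covered here.

```python
from datetime import date


def build_purge_script(server, collection, job, file_date=None):
    """Return rcadmin input text that mirrors the layout of Figure 1.

    Hypothetical helper: it only assembles text and does not call rcadmin.
    The prompts and answers (c, 0, 2, y) are copied verbatim from Figure 1.
    """
    stamp = file_date or date.today().isoformat()
    lines = [
        f"# rcadmin purge script created {stamp}",
        "# login",      # left commented out, as in Figure 1
        "# username",
        "# password",
        "#",
        # First indexstateset block from Figure 1 (state 0)
        "indexstateset", server, "c", collection, "0", "y",
        "",
        # Purge the spider job's record of indexed documents
        "jobpurge", job, server, "y",
        "",
        # Second indexstateset block from Figure 1 (state 2)
        "indexstateset", server, "c", collection, "2", "y",
        "quit",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces Figure 1 with the example names used in the article.
    print(build_purge_script("bean_server", "test_coll", "build_test_coll_job"), end="")
```

Calling the function with a different collection and job name yields a script with the same structure, so one template can serve every collection you rebuild this way.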

Note that the login is commented out in this script because we use automated login on our servers. If you do not, you will need to either set up automatic login, or include the login, user name, and domain in the script.

You can start this script in a K2 job, and even chain it to the indexing job to ensure that your collection is clean before you begin indexing new documents.

Ask Dr. Search

Remember, Dr. Search is here to solve your technical problems with your search engine. Don't hesitate to email us any time, or contact us. We're here for you!

Copyright New Idea Engineering, Inc 1996 - 2008