new idea ENGINEERING         Home  | Products  | Services  | Newsletter  | Resources  | About Us | Contact Info | Privacy Policy        

  Specializing in Enterprise Search since 1996 - including FAST, Autonomy, Google, Endeca, Dieselpoint and Lucene

Command Line K2: rcvdk

By Miles Kehoe - Enterprise Search Newsletter, Volume 3 Number 4 - Summer 2006

When we work with new customers, one of the first tools we tend to use is the powerful K2 utility rcvdk. Customers are sometimes amazed that, in this day of graphical dashboards, Java, and JSP, we continue to rely on a command-line retrieval client. Sure enough, over time, our clients start to rely on rcvdk as well. Why?

The rcvdk tool provides direct, low-level command-line access to a K2 collection at the file system level. It opens the collection directly, not through brokers, servers, or ticket servers, which makes it the best way to confirm that a collection is valid at the lowest possible level.

You can also use rcvdk to understand how query tuning will affect your results. Enter any valid VQL query, and rcvdk shows you the relevance scores K2 will produce when you move upstream into the server/broker environment. And until you actually ask to view documents, ticket server security will not get in your way as you experiment to create the best possible result-ranking algorithms for your site.

Besides being a great tool for verifying collection contents and testing relevance, rcvdk is incredibly easy to script and use in automated collection 'sanity checks', whether you have updated existing content or added new content to your collection. You can redirect input from a file into rcvdk and redirect its output to a file or, even better, pipe it into a Perl script that parses it for success.
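
As a sketch of that kind of automated check, here is a small Python equivalent of the Perl approach (Python is used only for illustration). It assumes rcvdk is on your PATH and reads its RC> commands from standard input when input is redirected. The function names are our own, and the checks simply look for the 'Successfully attached' message and any '>> Error' lines of the kind shown in the transcripts later in this article.

```python
import subprocess

def run_rcvdk(collection, commands, locale=None, rcvdk="rcvdk"):
    """Pipe RC> commands into rcvdk on stdin and return everything it printed."""
    args = [rcvdk]
    if locale:
        args += ["-locale", locale]
    args.append(collection)
    # Always finish with 'quit' so rcvdk exits once the commands run out.
    script = "\n".join(list(commands) + ["quit"]) + "\n"
    result = subprocess.run(args, input=script, capture_output=True, text=True)
    # We don't know which stream VDK errors go to, so return both.
    return result.stdout + result.stderr

def sanity_problems(output):
    """Return a list of problems found in rcvdk output; an empty list means it passed."""
    problems = []
    if "Successfully attached" not in output:
        problems.append("collection did not attach")
    for line in output.splitlines():
        if line.lstrip().startswith(">> Error"):
            problems.append(line.strip())
    return problems
```

For example, sanity_problems(run_rcvdk("niedocs", ["s search track"])) should return an empty list when the collection attaches cleanly and no VDK errors are reported.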

Just Enough rcvdk

We've recently added a detailed write-up of rcvdk and its most popular options. In this article, we show you the most useful rcvdk options and commands so you can start using this tool in your collection management. For a simple example of rcvdk in action, see our earlier article, "Rediscover the Poor Man's Entity Extraction in K2," in the Fall 2005 issue of Enterprise Search.

Starting rcvdk

Because rcvdk is a command-line utility, start a telnet/ssh session on your Unix/Linux system or open a CMD window on Windows. In both environments, the K2 binary directory must be in your PATH; on Unix/Linux, your LD_LIBRARY_PATH must also point to the binary directory.
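
If you drive rcvdk from a script rather than an interactive shell, the script has to set up the same environment. Here is a minimal Python sketch, assuming you know where your K2 binary directory is; the path in the usage note is hypothetical.

```python
import os

def _prepend(directory, existing):
    """Put directory in front of an os.pathsep-separated list (which may be empty)."""
    return directory + os.pathsep + existing if existing else directory

def k2_environment(k2_bin_dir, base_env=None):
    """Return a copy of the environment with the K2 binary directory on PATH,
    and on Unix/Linux on LD_LIBRARY_PATH as well."""
    env = dict(os.environ if base_env is None else base_env)
    env["PATH"] = _prepend(k2_bin_dir, env.get("PATH"))
    if os.name != "nt":  # LD_LIBRARY_PATH only matters on Unix/Linux
        env["LD_LIBRARY_PATH"] = _prepend(k2_bin_dir, env.get("LD_LIBRARY_PATH"))
    return env
```

Passing env=k2_environment("/opt/k2/bin") to subprocess.run should then let the child process find both the rcvdk executable and, on Unix/Linux, its shared libraries. Prepending (rather than appending) makes the K2 directory win if another copy of the libraries is also on the path.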

Change to the directory that contains your K2 collection and start rcvdk:

	D:\colls>rcvdk niedocs
	rcvdk  Verity, Inc. Version 5.5.0
	Attaching to collection: niedocs
	Successfully attached to 1 collection.
	Type 'help' for a list of commands.
	RC>

Depending on how your collection was built, you may need to specify the locale. Fortunately, rcvdk's error message tells you which locale to specify:

	D:\colls>rcvdk niedocs
	rcvdk  Verity, Inc. Version 5.5.0
	Attaching to collection: niedocs
	>> Error   E3-0036 (VDK): Collection's locale(englishx) not compatible with Session's locale(uni)
	Error attaching to collection: niedocs
	Type 'help' for a list of commands.
	RC> quit
	D:\colls>rcvdk -locale englishx niedocs
	rcvdk  Verity, Inc. Version 5.5.0
	Attaching to collection: niedocs
	Successfully attached to 1 collection.
	Type 'help' for a list of commands.
	RC>
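
A script can handle the locale error the same way you would by hand: read the locale out of the E3-0036 message and start rcvdk again with -locale. Here is a minimal Python sketch; the regular expression is based only on the message shown above, so other K2 versions may word it differently.

```python
import re

# Based on: ">> Error   E3-0036 (VDK): Collection's locale(englishx) not compatible ..."
LOCALE_ERROR = re.compile(r"Collection's locale\((?P<locale>[^)]+)\) not compatible")

def required_locale(output):
    """Return the collection's locale from a locale-mismatch error, or None."""
    match = LOCALE_ERROR.search(output)
    return match.group("locale") if match else None

def retry_args(collection, first_output):
    """Build the rcvdk command line for a second attempt at attaching."""
    locale = required_locale(first_output)
    if locale:
        return ["rcvdk", "-locale", locale, collection]
    return ["rcvdk", collection]
```
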

Once you've started the program, you can enter 'help' or '?' for a list of valid commands. Note that rcvdk has two modes: Novice (the default) and Expert. The commands you see on the help screen depend on which mode you are in. Personally, I almost always use Expert mode: I invariably need one or two of the expert commands, and I find it less of a hassle to turn it on at the start of each session. Use the 'x' command to toggle between Expert and Novice modes:

	Novice Mode Commands
		Available commands:
		search       s  Search documents.
		results      r  Display search results.
		clusters     c  Display clustered search results.
		view         v  View document.
		summarize    z  Summarize documents.
		attach       a  Attach to one or more collections.
		detach       d  Detach from one or more collections.
		quit         q  Leave application.
		about           Display VDK 'About' info.
		help         ?  Display help text; 'help help' for details.
		expert       x  Toggle expert mode on/off.
		user         u  Set user. username[:password][:domain][:mailbox]

	Expert Mode adds the following commands:
		source          Set default source.
		sort            Set default sort.
		disable         Disable/enable collections.
		debug           Toggle internal debug flag.
		fields          Set fields to display.
		highlight       Set highlight display.
		hlmode          Toggle index-based/stream-based highlighting.
		markup          Toggle markup display on/off.
		qparser         Select/List K2 Query Parsers.
		history         Show query history.
		precision       Set score precision.
		time         t  Toggle display of search execution time.
		checkid         Check document access with a list of VdkDocIDs.
		checkkey        Check document access with single or a list of VdkDocKeys.
		pbs             Configure passage-based summary.
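
Because Novice mode is the default, a command file that uses any of the Expert-only commands listed above needs to turn Expert mode on first. Here is a small Python sketch that builds such a file and sends 'x' only when it is needed; since 'x' toggles, sending it twice would switch Expert mode back off. The command names come straight from the help listing above, and the function name is our own.

```python
# Expert-only commands from the rcvdk help listing ('t' is the short form of 'time').
EXPERT_COMMANDS = {
    "source", "sort", "disable", "debug", "fields", "highlight", "hlmode",
    "markup", "qparser", "history", "precision", "time", "t",
    "checkid", "checkkey", "pbs",
}

def build_session(commands):
    """Return the text of an rcvdk command file for the given RC> commands."""
    commands = [c.strip() for c in commands if c.strip()]
    needs_expert = any(c.split()[0].lower() in EXPERT_COMMANDS for c in commands)
    # Turn on Expert mode once at the top, and always end with 'quit'.
    lines = (["x"] if needs_expert else []) + commands + ["quit"]
    return "\n".join(lines) + "\n"
```

Save the result to a file (session.txt is just an example name) and redirect it into rcvdk with rcvdk niedocs < session.txt. Don't include 'x' in the command list yourself, or the toggle will cancel out.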

There are a number of these commands you may never use; let's look at the ones you'll want to know from the start.

Searching and Viewing Results

Typically, you will use rcvdk to confirm the number of results. Use the 's' command to search and the 'r' command to review the results:
	RC> s
	Search update: finished (100%).  Retrieved: 500(4735)/4735.
	RC> s search track
	Search update: finished (100%).  Retrieved: 47(47)/4735.
	RC> r
	Retrieved: 47(47)/4735
	Number  SCORE   VdkVgwKey
	1:      0.9771  /Newidea/Thumb/NIE/pdf/Search_Tracking.pdf
	2:      0.9771  /Newidea/Marketing/Web Site/NIE/pdf/Search_Tracking.pdf
	3:      0.9771  /Newidea/Marketing/Web Site/Arc/2005_Dec_31/NIE/pdf/Search_T
	4:      0.8169  /Newidea/Dev/niesrv126/README.txt
	5:      0.7967  /Newidea/Thumb/NIE/pdf/Search_Tuning.pdf
	6:      0.7967  /Newidea/Marketing/Web Site/NIE/pdf/Search_Tuning.pdf

The numbers rcvdk reports for the second search tell us that the search is complete (100%), that rcvdk has retrieved 47 of the 47 documents that match the query, and that the collection contains 4735 indexed documents. The first search, a null search, shows that rcvdk never retrieves more than 500 documents, no matter how many the collection holds: it reports 500 retrieved out of 4735 matches.
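
In a scripted sanity check, the status line is usually what you want to test. Here is a minimal Python sketch that pulls the three numbers out of the last Retrieved: line, using the reading given above (retrieved, matching, indexed), and flags searches that stopped at the 500-document limit. The function names are our own.

```python
import re

# "Retrieved: 47(47)/4735" -> (retrieved, matching, indexed)
RETRIEVED = re.compile(r"Retrieved:\s*(\d+)\((\d+)\)/(\d+)")

def retrieval_counts(output):
    """Return (retrieved, matching, indexed) from the last Retrieved: line, or None."""
    found = RETRIEVED.findall(output)
    if not found:
        return None
    return tuple(int(n) for n in found[-1])

def hit_cap(counts, cap=500):
    """True when rcvdk stopped at its retrieval limit before reaching every match."""
    retrieved, matching, _indexed = counts
    return retrieved >= cap and matching > retrieved
```

A sanity check might then compare the indexed count against the number of documents you expect the collection to hold after an update.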

Seeing the VdkVgwKey is nice, but sometimes you want to see other field values. Enter Expert Mode and use the 'fields' command. For each field you want to see, specify the K2 field name and the width of its display column:

	RC> fields title 30 author 10
	RC> r
	Retrieved: 47(47)/4735
	Number  title                          author
	1:      Search Tracking                Scarlet
	2:      Search Tracking                Scarlet
	3:      Search Tracking                Scarlet
	4:      Upcoming newsletter articles   miles
	5:      Search Tuning                  Scarlet
	6:      Search Tuning                  Scarlet
	7:      Search Tuning                  Scarlet
	8:      datasheet.PDF                  Administra
	9:      Complete the behavioral pictur Theresa Md
	10:     PowerPoint Presentation        Mark

Note that rcvdk lets you specify undefined fields, so before you decide your fields are not being populated correctly, be sure to check your spelling! The 'fields' command always displays the current result list in the specified format, so you generally do not need to rerun a search just to see different fields.
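
Because each field is padded to the width you give the 'fields' command, a script can split result rows by column position rather than by whitespace, which matters because titles contain spaces. This Python sketch assumes the layout in the sample output above: an 8-character result-number column, then each field padded to its width and followed by one separating space. That layout is inferred from the samples only, so check it against your own rcvdk output before relying on it.

```python
def parse_result_row(line, widths, number_width=8):
    """Split one rcvdk result row into (result number, [field values]).

    Assumes (from the sample output) an 8-character number column such as
    "9:      ", then each field padded to its 'fields' width plus one space.
    Returns None for header lines, status lines, and prompts.
    """
    head = line[:number_width].strip()
    if not head.endswith(":") or not head[:-1].isdigit():
        return None
    values, pos = [], number_width
    for width in widths:
        # rcvdk truncates long values to the column width, so these may be truncated too.
        values.append(line[pos:pos + width].strip())
        pos += width + 1
    return int(head[:-1]), values
```

For the 'fields title 30 author 10' display above, call parse_result_row(line, [30, 10]) on each row.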

If you have more than 25 results, you can view results beyond the initial page by following the 'r' command with the number of the result where you want the listing to start. The listing below comes from the null search, which retrieved 500 documents:

	RC> r 40
	Retrieved: 500(4735)/4735
	Number  title                          author
	40:     Microsoft Word - PM Q3_final_e geramac
	41:     Fact Sheet Q3 Internet.xls     RFlohr
	42:     FASTTaxExp.book                dambrosio
	43:     If you're looking for a way to Kevin
	44:     SS-Price list.xls              Kevin
	45:     Project_History.PDF            Miles
	46:     Searchbutton Features          Carl Grimm
	47:     PRIVACY STATEMENT?             carol
	48:     Case Study Format              Tracey
	49:     SEARCHBUTTON                   darshini
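
To dump every retrieved result from a script, generate one 'r' command per page. Here is a Python sketch, assuming the 25-result page size mentioned above; set page_size to match what your rcvdk actually shows.

```python
def page_commands(retrieved, page_size=25):
    """Return the 'r' commands needed to list every retrieved result.

    The first page comes from a bare 'r'; later pages start at result
    page_size + 1, 2 * page_size + 1, and so on.
    """
    if retrieved <= 0:
        return []
    starts = range(page_size + 1, retrieved + 1, page_size)
    return ["r"] + [f"r {start}" for start in starts]
```

Feed it the first number from the Retrieved: status line; for the 47-result search above it produces 'r' and 'r 26'.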

Checking Search Syntax

We also use rcvdk to test the VQL queries we plan to use when we improve results with 'query cooking'. The program accepts any valid VQL statement, so you can test the queries you will use to normalize the relevancy curve for your top queries. If you modify the display fields, be sure to include the field name Score so you can see each document's relevance:
	RC> fields score 5 title 30 author 10
	RC> s query tuning
	Search update: finished (100%).  Retrieved: 25(25)/4735.
	RC> r
	Retrieved: 25(25)/4735
	Number  Score Title                          Author
	1:      0.835 Proposal for Stanford GSB      Miles Kehoe
	2:      0.816 Executive Summary              Miles Kehoe
	3:      0.796 Microsoft Word - Response_2006

	RC> s ( [0.85]titletuning, [0.90]
	RC> r
	Retrieved: 12(12)/4735
	Number  Score Title                          Author
	1:      0.900                                miles
	2:      0.850 Search Tuning                  Scarlet
	3:      0.850 This article discusses relevan Miles Kehoe
	4:      0.850 This article discusses relevan Miles Kehoe

As these results show, you can fine-tune your weighting until you get a reasonable relevancy distribution, which generally provides better discrimination between results.
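
When you try many weightings, it helps to generate the queries rather than type them. This Python sketch builds a weighted query from (weight, clause) pairs and summarizes a list of scores so you can compare runs. It assumes VQL's <ACCRUE> operator, bracketed weights like the [0.85] in the query above, and <CONTAINS> for field searches; the weight range check and the 'relevance' term in the usage note are our own assumptions, so confirm the syntax against the VQL reference for your K2 version.

```python
def weighted_accrue(clauses):
    """Build a VQL <ACCRUE> query from (weight, clause) pairs.

    Assumes bracketed weights written directly before each clause, e.g.
    "[0.85]title <CONTAINS> tuning". The (0, 1] range check is an assumption.
    """
    parts = []
    for weight, clause in clauses:
        if not 0 < weight <= 1:
            raise ValueError(f"weight {weight} must be in (0, 1]")
        parts.append(f"[{weight:.2f}]{clause}")
    return "<ACCRUE> ( " + ", ".join(parts) + " )"

def score_spread(scores):
    """Rough summary of a relevancy curve: (top score, bottom score, distinct scores)."""
    return max(scores), min(scores), len(set(scores))
```

For example, weighted_accrue([(0.85, "title <CONTAINS> tuning"), (0.9, "relevance")]) yields a string you can paste after 's' at the RC> prompt. Comparing score_spread for the Score column of each run is one quick, informal way to see whether a new weighting spreads results out or bunches them together.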

In Summary

You have seen a few of the features of rcvdk, a very useful tool for verifying the contents of a K2 collection. You can go far beyond the simple capabilities we've shown here, including viewing documents with highlights, viewing dynamic summaries, and using other powerful K2 capabilities, all from the comfort of your command line!

As always, if you have any questions about rcvdk or any other technical search task, feel free to email us any time!


Copyright New Idea Engineering, Inc 1996 - 2008