new idea ENGINEERING         Home  | Products  | Services  | Newsletter  | Resources  | About Us | Contact Info | Privacy Policy        

  Specializing in Enterprise Search since 1996 - including FAST, Autonomy, Google, Endeca, Dieselpoint and Lucene

An Introduction to the Autonomy/Verity K2 RCVDK Tool

As part of our packaged and custom training classes and seminars, we create application notes and tutorials. We are happy to share these with the Verity community, and this application note represents our first contribution: a write-up on the useful utility, rcvdk.

Using rcvdk for Collection Diagnostics


rcvdk is a handy utility that Verity ships with most, if not all, of its products. Although the tool has minimal documentation, you can use it to perform almost any kind of retrieval operation on an existing collection, which makes it a great way to test your query syntax, the values within fields, or even the validity of a collection. rcvdk is located in the platform 'bin' directory, which is normally in your system path if you use Search 97, K2, Cold Fusion or other products that bundle Verity products.

Run-Time Options

When you run rcvdk, you can specify a number of command line flags which can be useful in some situations. However, many of the command line options are provided for specific and uncommon fringe cases, and should generally not be used unless you really know what you want to do. We've included these options at the end of the command line section in italics. Use at your own risk!


  -expert Start rcvdk in expert mode; command line equivalent to the 'x' or 'expert' command. Expert mode provides additional commands, listed in the 'Commands' section below.

  -debug Enable internal debugging; command line equivalent to the 'debug' command. Internal debugging produces additional console output during some operations (such as 'attach') and additional output in any logfile (see 'logfile' and 'loglevel').

  -datapath Define an alternate path in which the actual data files for a collection reside; use this option if the data files indexed in a collection are now located in a different directory than when they were initially indexed. This parameter can be a relative path or a fully qualified path, which will be prepended to the vdkvgwkey value.

Example:

 rcvdk -datapath d:/docs/newdir testcoll 
Be sure to specify all parameters before providing the collection name.
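If you launch rcvdk from a script, the ordering rule above is easy to enforce in one place. The following is a minimal Python sketch; build_rcvdk_argv is a hypothetical helper name (not part of any Verity tooling), and it only assembles an argument list from flags described in this note.

```python
def build_rcvdk_argv(collection, options=None, flags=(), executable="rcvdk"):
    """Assemble an rcvdk argument list with every option before the collection.

    `options` maps flag names (without the leading dash) to values, for
    example {"datapath": "d:/docs/newdir"}; `flags` lists value-less switches
    such as "expert". This helper is hypothetical and purely illustrative.
    """
    argv = [executable]
    for name in flags:
        argv.append("-" + name)
    for name, value in (options or {}).items():
        argv.extend(["-" + name, str(value)])
    argv.append(collection)  # the collection name always goes last
    return argv


if __name__ == "__main__":
    # Reproduces the -datapath example from the text above.
    print(" ".join(build_rcvdk_argv("testcoll", {"datapath": "d:/docs/newdir"})))
```

The resulting list could be handed to subprocess.run on a machine where rcvdk is installed; the sketch itself never invokes the tool.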

  -maxfiles Historically, Verity applications request and attempt to allocate every free file handle on the system up to a limit of 100.

Sometimes you don't want to let Verity have all these handles, although it is less important now that DOS is, for the most part, not an active platform.

To limit the number of open files that rcvdk will attempt to allocate, specify a number here. The fewer you provide, the slower searches will generally be.

Example:

 rcvdk -maxfiles 50 testcoll 

  -numpages rcvdk allocates memory for caching index files (not data files as the command line implies). You can specify how much memory to allocate, in 1K blocks, by using this switch. As with the number of files, the less memory you provide, the slower the searches. There is probably an effective max as well, beyond which more space doesn't significantly improve search, but that varies widely from system to system and collection to collection.

Example:

 rcvdk -numpages 1024 
This example allocates 1MB of memory - 1024 pages of 1024 bytes.
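The page arithmetic can be sketched in a few lines of Python. This assumes only what the text states (one page is a 1K block); the function names are made up for illustration.

```python
PAGE_SIZE_BYTES = 1024  # per this note, -numpages counts 1K blocks


def numpages_for_megabytes(megabytes):
    """Return the -numpages value corresponding to a cache of `megabytes` MB."""
    return (megabytes * 1024 * 1024) // PAGE_SIZE_BYTES


def cache_bytes(numpages):
    """Return the cache size in bytes implied by a -numpages value."""
    return numpages * PAGE_SIZE_BYTES


if __name__ == "__main__":
    print(numpages_for_megabytes(1))  # pages needed for a 1MB cache
    print(cache_bytes(1024))          # bytes allocated by -numpages 1024
```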

  -logfile Specify the name of a text file to which rcvdk (and the VDK engine) will log status and messages, depending on the loglevel defined.

rcvdk will append to a log file rather than overwrite it, so use care to specify a non-existent logfile name unless you really do want to append the new messages to the old log file.
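Because rcvdk appends, a wrapper script may want to pick a log name that does not exist yet. Here is a minimal Python sketch; the helper name and the ".1", ".2" suffix scheme are assumptions for illustration, not rcvdk behavior.

```python
import os


def fresh_logfile_name(base):
    """Return `base` if no such file exists, else the first unused base.N."""
    if not os.path.exists(base):
        return base
    n = 1
    while os.path.exists(f"{base}.{n}"):
        n += 1
    return f"{base}.{n}"


if __name__ == "__main__":
    # The chosen name would then be passed to rcvdk via -logfile.
    print(fresh_logfile_name("rcvdk.log"))
```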


  -loglevel Provide a decimal numeric value which corresponds to the level of log detail you want from all rcvdk operations. The Verity collection building manual discusses the log levels; some are included here:
	Level		Meaning
	--------	-------------------------------
	1		fatal errors are logged
	2		error messages are logged
	4		warning messages are logged
	8		status messages are logged
	16		info messages are logged
	32		verbose mode is used
	64		debug messages are logged

	Add the values to increase logging levels. For example,
	for fatal and error message logging only, use 3 (1+2):

			rcvdk -loglevel 3 testcoll

	For the maximum logging, use 127 (1+2+4+8+16+32+64).

Note that mkvdk allows you to use English words which map to various log levels (i.e., verbose, debug, etc.), but these do not appear to work properly in rcvdk.
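Since rcvdk wants the numeric value, it can help to compute it from names in a script. The sketch below is plain Python and uses only the values from the table above; the short labels ("fatal", "error", and so on) are our own shorthand for the table rows, not keywords that rcvdk accepts.

```python
# Bit values taken from the log level table; the keys are illustrative labels.
LOG_LEVELS = {
    "fatal": 1,
    "error": 2,
    "warning": 4,
    "status": 8,
    "info": 16,
    "verbose": 32,
    "debug": 64,
}


def loglevel(*names):
    """Combine the named levels into the decimal value for -loglevel."""
    value = 0
    for name in names:
        value |= LOG_LEVELS[name]
    return value


def describe(value):
    """List the labels whose bits are set in a -loglevel value."""
    return [name for name, bit in LOG_LEVELS.items() if value & bit]


if __name__ == "__main__":
    print(loglevel("fatal", "error"))  # fatal and error logging only
    print(loglevel(*LOG_LEVELS))       # every level combined
    print(describe(12))
```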

  -vdkhome Specify the Verity install directory where rcvdk can find the critical Verity system files (license, messages, etc).

Normally, the Verity code expects to locate these files two levels above the Verity 'bin' directory in a directory called 'common'. If you don't want to include Verity's bin directory in your system path, you can provide a location using this option.

This can be helpful if you maintain more than one version of Verity on your system and run the Verity utilities by specifying the full path name to the executable.

Example:

 /u/mydir/rcvdk   -vdkhome   /u/search97	
Note this is the Verity 'home directory', in which common, and the other support files, are found.

  -topicset Used to specify a topic set directory created using the mktopic utility. If you want to do any topic searching in rcvdk, you need to specify the topic set here.

Example:

	mktopic /user/mydata/mytopics -otl myfile.otl 
	rcvdk -topicset /user/mydata/mytopics mycoll 

  -startupKB Used to specify the location of a Verity 'knowledge base' file which can contain simple topics or even a thesaurus.

Full discussion of KBs versus topic sets is beyond the scope of this paper.


  -persist Specifies that the VDK engine should remain 'resident' and perform its optimization in the background when rcvdk is not using bandwidth.

This isn't a useful option since most sites running in a production environment where performance is critical will be using a Verity server like K2 to perform the ongoing optimization.


  -locale Specify the localization information to be used by rcvdk during this session. The default is English.

This lets you specify the subdirectory within the 'locale' directory that contains the localized message files so local language messages are displayed as appropriate.

Example:

	rcvdk -locale deutsch 

  -charmap string Specify the character map (charmap) that VDK should use.

  -dynamichl Specifying this flag enables stream-based highlighting, which means that any search term highlighting is inserted into the stream at display time rather than having the Verity kernel use the offsets it calculated at index time.

Normally, this option isn't very useful; but if search terms are not highlighting properly in K2, you can use this option to help diagnose the problem. Most likely, the problem is that the documents may have changed since the last index run; if so, highlighting should show properly in dynamichl mode.


  -nomarkup When viewing HTML indexed files, extract and do not display the HTML tags.

  -noexit Prompt before rcvdk exits (see the 'quit' command below).

Both command line flags result in the display of a brief summary of valid command line options.


RCVDK Commands

Each valid rcvdk command is listed below along with its meaning and usage. Some of the commands are available only in 'expert' mode; these are marked with a plus character (+).

  search The search, or s, command is used to perform a search of all currently attached collections. You can use any valid Verity query syntax; and in fact, rcvdk is a good way to test your understanding of search syntax.

Using the search command with no parameters performs a null query which returns all documents; this is a good way to see how many documents are actually indexed in a collection.

Example:

	s 'cats'		
		search for the term 'cats' and all valid 
		stemmed variants of the word (i.e., 'cat' 
		will also match)

	s "cats"
		search for documents that contain the exact
		word 'cats'. Documents with only the word 
		'cat' will not be returned.

	s "packard bell"
		search for documents which have the
		phrase "packard bell"

	s title"cat"
		search for documents which contain an exact
		match for the word cat in the title field 
The syntax supported by the search command depends on the active 'query parser' selected using the qparser expert mode command. The actual Verity syntax is beyond the scope of this document.

  results The results, or r, command displays the result list for the most recent search.

By default, the results command displays field information on up to 25 documents in the result list, sorted by relevance. In the default result list, the fields shown are relevance and the file name (actually, the vdkvgwkey field). You can use the expert mode 'fields' command to change the layout of the result list display.

When you first display results, you see the first 25 results. If you want to see the next set of up to 25 results, specify the starting result row number on the command line. For example, to see the third screen of results (i.e., starting at row 50), use the command "r 50".

Example:

	s 'cat'
	r
		show the first (most relevant) documents
		that contain the word 'cat' or any of its
		stemmed variants (e.g., cats)

	s "dog"
	r 10
		show up to 25 result rows, starting at the 10th,
		in relevance order, of documents that contain the
		word dog. (Note: If only 20 documents are returned,
		this sequence will show the 10th through the 20th
		document.) 

  clusters The clusters, or c, command displays results in cluster order.

  view The view, or v, command displays the streamed text of the current document, or of the specified document if one is given.

Example:

	s "dog"
	view
		display the first (most relevant) document
		in the result list returned by the search for
		the word dog. The search term (here, 'dog'),
		will be highlighted.

	s "cat"
	v 3
		display the third document in the result list
		returned by the search for the word cat. The
		term will be highlighted.  
Note that you can also specify just a single integer, and rcvdk will view that (result list) document number.

  summarize The summarize, or z, command summarizes a document using the Verity summarization technology. It works much like the view command, in that it operates on the current document if no result list document number is given, or on the document number specified.

Example:

	s "dog"
	summarize
		display the document summary for the first 
		(most relevant) document in the result list 
		returned by the search for the word dog. 

	s "cat"
	z 3
		display the document summary for the third 
		document in the result list returned by the 
		search for the word cat.  

  attach The attach, or a, command, is used to add a collection to the rcvdk search.

If you specify a collection on the command line, the attach command lets you specify additional collections to search.

If you did not specify a collection on the command line, the attach command will specify the collection against which all searches are done. Note that, until you have attached to at least one collection on the command line or with the 'attach' command, you cannot use most of the other commands.

Example:

	attach c:/data/mycoll
	attach rootcoll
		Adds two collections to the search set.
		(Note that, with Verity, forward slashes are
		usually preferable to backslash, even on 
		Windows platforms.) 

  detach The detach, or d, command, lets you remove a collection from the search set.

Example:

	detach rootcoll 
Subsequent searches will not include any documents from the 'rootcoll' collection.

Note that if you detach all collections, many of the commands cannot be used.


  quit The quit, or q, command exits rcvdk.

If you invoke rcvdk with the 'noexit' command line option, you are prompted before rcvdk exits.


  about The about command provides version and copyright information.

  help The help, or ?, command, displays help text, including the command list.

If the appropriate help file is in the current directory, you can also use any of the commands as a parameter to help, and learn more about specific commands.

The commands that help displays are a function of the setting of the 'expert' flag. In standard mode, only standard commands are listed. Once in 'expert' mode, all commands are listed.

Example:

	help help
	help fields
	help attach 
Again, the help file must be in the current directory.

  expert The expert, or x, command, toggles between expert mode and standard mode. In expert mode, a number of additional commands are available, including those marked with a plus (+) in this tutorial.

  source The source command defines a special query which 'pre-filters' all results before applying any user defined search. In K2 parlance, this defines the 'sourceQueryText' that causes subsequent search commands to only include documents meeting this source query.

For example, assume you have a field associated with each document in your collection that indicates the department permitted to view that document. Thus, documents which can be viewed by Marketing have the text "MKT" in the 'DEPT' field. Further, let's assume that you have 1000 documents indexed; but only 100 should be viewed by marketing.

When a Marketing department user submits a query, you might define the sourceQueryText (or source) as:

	DEPTMKT 
Once you have done this, the Verity engine will automatically AND this source query with whatever the user enters for a query.

Thus, if this Marketing user performs a search for "*" (all documents), the result list will report '100 of 100 found'; the user is not even aware that an additional 900 documents are available in the full collection.

To reset the source query, specify "-r".

Example:

	source DEPTMKT
	s	
		(only 100 docs returned)
	source -r
	s
		(all 1000 docs are returned) 
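The effect of a source query can be mimicked with a tiny in-memory model. The Python sketch below is not how the Verity engine is implemented; it simply ANDs a pre-filter with the user's query over made-up documents (the "ENG" department value and all function names are hypothetical).

```python
def search(docs, user_query, source_query=None):
    """Return docs that satisfy the source query (if set) AND the user query."""
    hits = []
    for doc in docs:
        if source_query is not None and not source_query(doc):
            continue  # filtered out before the user's query is even considered
        if user_query(doc):
            hits.append(doc)
    return hits


def match_all(doc):
    """Stand-in for the null query, which returns every document."""
    return True


def is_marketing(doc):
    """Stand-in for a source query restricting results to DEPT = MKT."""
    return doc["DEPT"] == "MKT"


# 1000 documents, of which the first 100 belong to Marketing.
docs = [{"id": i, "DEPT": "MKT" if i < 100 else "ENG"} for i in range(1000)]

if __name__ == "__main__":
    print(len(search(docs, match_all, source_query=is_marketing)))  # source set
    print(len(search(docs, match_all)))                             # source reset
```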

  sort The sort command lets you define different sort order for document result ordering. By default, the results to a search are listed in relevance order. However, you can specify any other field in the collection as the primary sort field, in either ascending or descending order.

When you specify a sort order, you need to specify the field followed by 'asc' or 'desc' for ascending or descending, respectively.

To reset the sort order, use the "-r" option.

Example:

	s
	sort title asc
	r
		the results are listed sorted by title in
		ascending order

	s
	sort author desc
	r
		the results are listed sorted by author
		in descending order

	s
	sort author desc title asc
	r
		display the results sorted by author in
		descending order and by title in ascending 
		order 
Remember to reset the sort order:
 sort -r 
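To make the multi-field ordering concrete, here is a minimal Python sketch of the same idea. It assumes the first field listed is the primary key and later fields break ties; the function name and the sample rows are invented for illustration and say nothing about rcvdk internals.

```python
def sort_results(results, sort_spec):
    """Order rows by a list of (field, direction) pairs, e.g.
    [("author", "desc"), ("title", "asc")], primary key first."""
    ordered = list(results)
    # Sort by the least significant key first; Python's sort is stable,
    # so earlier passes survive as tie-breakers for later ones.
    for field, direction in reversed(sort_spec):
        ordered.sort(key=lambda row: row[field], reverse=(direction == "desc"))
    return ordered


if __name__ == "__main__":
    rows = [
        {"author": "Adams", "title": "B"},
        {"author": "Zed", "title": "B"},
        {"author": "Zed", "title": "A"},
        {"author": "Adams", "title": "A"},
    ]
    for row in sort_results(rows, [("author", "desc"), ("title", "asc")]):
        print(row["author"], row["title"])
```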

  disable The disable command lets you disable or re-enable collections that are part of a metacollection (or defined within a collection map file).

This is much like the 'detach' command; however, if a collection is defined within a collection map file, and you have attached to the clm collection, you can use disable to disallow searching on that collection.


  debug The debug command provides additional information either on the console or in the logfile.

This command corresponds to the '-debug' command line option; the parameters are used in the same way the parameters are used there.


  fields The fields command is an expert mode command that lets you define which fields are to be displayed as part of the results command list. In the fields command, you specify one or more pairs of field name and column width for the fields to be included.

Example:

	fields relevance 5 title 30 vdkvgwkey 35
	r
		the result list display shows relevance in the
		first column in a field 5 character positions in 
		width; followed by the document title in a field
		30 characters in width; and the Verity document 
		key, normally the filename, in a field 35 characters
		in width. 
Note that if the number of display columns exceeds 79, you will see some wrap-around, which will make the result list more difficult to read.

If a field's contents are wider than its column, rcvdk will truncate the field display.
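If you want to check a layout before typing it into rcvdk, the width arithmetic is easy to sketch. The Python below parses the same name/width pairs and mimics truncation and padding; the single-space column separator and all function names are assumptions, not documented rcvdk behavior.

```python
def parse_fields(command_args):
    """Parse 'relevance 5 title 30 vdkvgwkey 35' into [(name, width), ...]."""
    tokens = command_args.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected fieldname/width pairs")
    return [(tokens[i], int(tokens[i + 1])) for i in range(0, len(tokens), 2)]


def total_width(layout):
    """Columns used by a layout, assuming one space between fields."""
    return sum(width for _, width in layout) + len(layout) - 1


def format_row(row, layout):
    """Render one result row: truncate each value to its width, then pad."""
    cells = []
    for name, width in layout:
        text = str(row.get(name, ""))
        cells.append(text[:width].ljust(width))
    return " ".join(cells)


if __name__ == "__main__":
    layout = parse_fields("relevance 5 title 30 vdkvgwkey 35")
    print(total_width(layout), "columns; wraps" if total_width(layout) > 79 else "columns; fits")
    print(format_row({"relevance": "0.98", "title": "A very long document title that will be cut off",
                      "vdkvgwkey": "c:/data/docs/a.txt"}, layout))
```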


  highlight The highlight command defines the characters to be used when marking search terms when viewing documents using the view command.

  hlmode The hlmode command corresponds to the 'dynamichl' command line option, and is used to toggle between stream-based and index-based term highlighting.

In stream based highlighting, the highlight characters are added to the output stream dynamically as the characters are displayed. In index-based highlighting, the Verity engine notes the offset of search terms in the document, and inserts highlighting characters around the appropriate terms.

Normally, both modes will produce identical results. However, if a document has changed since the Verity index was last updated, index-based highlighting may cause the wrong words to be highlighted.

For example, if the original file was only the line:

 This is a test file 
and you search on the word 'file', either hlmode should produce the same result when the document is displayed:
 This is a test >>file<< 
However, if you modify the file to read:
 This is a new test file 
and search again in index-based mode, you will see:
 This is a new >>test<< file 
This happens because, in index-based highlighting mode, the display engine uses the information that the word 'file' is the fifth word in the file, and the fifth word is now 'test'. In stream-based highlighting, the correct word will be highlighted because the highlighting is inserted based on matching the search term at display time.

Because of the way these two modes work, this capability is a handy way to identify documents that may have changed since the collection was created. If index-based highlighting is not working properly, perhaps the documents have changed and should be re-indexed.
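The difference between the two modes can be reproduced with a toy model. The Python sketch below stores word positions at "index time" and compares that with matching the term at display time; it is a simplified illustration with invented function names, not the Verity implementation (which may track offsets differently).

```python
def index_offsets(text, term):
    """'Index time': record the word positions at which `term` occurs."""
    return [i for i, word in enumerate(text.split()) if word == term]


def highlight_index_based(text, offsets):
    """Highlight whatever words now sit at the stored positions."""
    words = text.split()
    return " ".join(">>" + w + "<<" if i in offsets else w
                    for i, w in enumerate(words))


def highlight_stream_based(text, term):
    """Highlight by matching the term against the text at display time."""
    return " ".join(">>" + w + "<<" if w == term else w for w in text.split())


if __name__ == "__main__":
    original = "This is a test file"
    modified = "This is a new test file"
    offsets = index_offsets(original, "file")  # computed before the edit

    print(highlight_index_based(original, offsets))
    print(highlight_index_based(modified, offsets))  # stale offsets, wrong word
    print(highlight_stream_based(modified, "file"))  # matches at display time
```

Running it shows the stale position landing on 'test' in the modified file, while the display-time match still marks 'file'.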


  markup The markup command corresponds to the 'nomarkup' command line flag, and is used to toggle between viewing and hiding HTML tags in HTML documents.

  qparser The qparser command lets you select which of the supported Verity query parsers (and hence which search syntax) you want to use. The available query parsers are:
	  Name              Description
	  ----              -----------
	  Simple            Simple Query Syntax
	  BoolPlus          BooleanPlus Query Syntax
	  FreeText          Natural Language Query Syntax
	  Internet          Popular web search query syntax
A discussion of the differences between the parsers is beyond the scope of this document.

  history The history command displays the last several search commands that were executed.

This capability is informational only; there doesn't seem to be any way to recall previous commands.

Note: In Windows 95 and Windows NT, the keyboard buffer maintains the complete command history; using the up arrow and down arrow keys, you can scroll through the previous commands just as you can at the cmd shell.

This concludes the brief overview of rcvdk syntax. Look for other notes which describe useful applications for this handy tool!


Copyright New Idea Engineering, Inc 1996 - 2008