Enterprise Search Blog
« NIE Newsletter

How can we assign a display URL in K2 so clicking on a link jumps to the right page, even though the Verity K2 VdkVgwKey is a fully qualified file name? - Ask Dr. Search

Last Updated Mar 2009

By: Mark Bennett, Volume 3 Number 5 - Summer 2006

Dear Dr. Search:

Our web content includes links to other parts of our company as well as to external partner web sites. To get full control over what gets indexed for search on our site, we have decided to crawl our file system rather than try to set spider rules and depths that vary by the site being indexed. How can we assign a display URL in K2 so clicking on a link jumps to the right page, even though the Verity K2 VdkVgwKey is a fully qualified file name?

Dr. Search answers:

A few methods come to mind immediately. One uses the standard indexing tool mkvdk; the other uses the command-line tool vspider with a map file, a feature that first appeared at about the same time ColdFusion started bundling K2. Of course, you could simply modify your application: as you read the VdkVgwKey, replace the base file path with the base URL. This requires you to hard-code path information in your application, or to store the path map in your application properties file. We'd suggest you stick with one of the indexing options, since they are so straightforward.


When you crawl a file system, the K2 VdkVgwKey field typically contains the fully qualified file name. For example, on our web site, this top-level page is actually /usr/local/data/index.html. To reach that file from the web, you type (Note: because of web server tricks, you can often omit the actual file name for index pages; we're just showing the fully qualified URL here for clarity). With the file name in both the VdkVgwKey field and the doc_fn field, you need to do something to make sure the link on the search results page doesn't try to open a file on your user's computer.

Typically, what you want to do is replace the root file path with the top-level URL; that is, for every record/document in K2, replace /usr/local/data/ with "". As mentioned, you can do this under your web application's control, but a better solution is probably to solve the problem at index time so K2 always knows the right URL. This also ensures that K2 command-line tools like rcvdk and rck2 know how to view the document.
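If you do take the application-side route, the substitution amounts to swapping one prefix for another as you read each VdkVgwKey. The Python sketch below assumes the file-system root /usr/local/data/ from the example above and uses http://localhost/ purely as a placeholder base URL; the function name key_to_url is hypothetical and is not part of any K2 API.

```python
from typing import Optional
from urllib.parse import quote

# Assumed values for illustration only: the file-system root from the
# article's example and a placeholder base URL for your site.
FS_ROOT = "/usr/local/data/"
BASE_URL = "http://localhost/"


def key_to_url(vdkvgwkey: str, fs_root: str = FS_ROOT,
               base_url: str = BASE_URL) -> Optional[str]:
    """Map a fully qualified file name (as stored in VdkVgwKey) to a URL.

    Returns None when the key is not under fs_root, so the caller can
    decide how to treat documents that live outside the web tree.
    """
    if not vdkvgwkey.startswith(fs_root):
        return None
    relative = vdkvgwkey[len(fs_root):]
    # Normalize Windows-style separators, then percent-encode characters
    # such as spaces ('/' is left alone by quote's default safe set).
    return base_url + quote(relative.replace("\\", "/"))


if __name__ == "__main__":
    print(key_to_url("/usr/local/data/pubs/figures.pdf"))
```

Keeping FS_ROOT and BASE_URL in a properties file rather than in code is the less brittle of the two application-side options mentioned earlier.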

Using a bulk insert file and mkvdk

The brute-force method involves creating a bulk insert file for mkvdk, in which you provide the URL field as well as the VdkVgwKey. Start by creating a list of all the files you want to index, using the Unix find command or the Windows DIR command. The important thing in each case is to include the full path and file name. Figure 1 shows the commands for both Unix and Windows.

Unix Command:

find /usr/local/data -type f -print > filelist

Windows CMD Command:

dir c:\inetpub\wwwroot\* /s /b /a-d > filelist

Figure 1: Obtaining a List of All Files Under a Starting Directory
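If you want one script that produces the same kind of list on either platform, Python's os.walk can do the traversal. This is a sketch rather than a drop-in tool: it writes one fully qualified path per regular file, like the find and dir commands in Figure 1, and the command-line usage shown in the comment is hypothetical.

```python
import os
import sys
from typing import Iterator


def list_files(root: str) -> Iterator[str]:
    """Yield the absolute path of every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.abspath(os.path.join(dirpath, name))


def write_file_list(root: str, out_path: str) -> int:
    """Write one path per line to out_path; return how many were written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in list_files(root):
            out.write(path + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical usage: python make_filelist.py /usr/local/data filelist
    if len(sys.argv) == 3:
        print(f"wrote {write_file_list(sys.argv[1], sys.argv[2])} paths")
```

Write the output file outside the directory being listed; otherwise the list can end up containing itself.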

Once you have the file list, it's a fairly simple matter of writing a shell or Perl script to convert your file list into a bulk insert file like the one in Figure 2.

Sample Bulk Insert File

vdkvgwkey: /usr/local/data/index.html
vdkvgwkey: /usr/local/data/about.html
vdkvgwkey: /usr/local/data/pubs/figures.pdf

Figure 2: Sample Bulk Insert File

Someone comfortable with shell scripting and regular expressions can probably do it with a command or two!
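For those who prefer a script, here is a Python sketch of that conversion. It reads a file list and emits one record per file with both a vdkvgwkey line and a url line, in the field: value layout of Figure 2. Several things are assumptions to check against the mkvdk documentation for your K2 release: the <<EOD>> end-of-record marker, the lowercase url field name (borrowed from the map file and rcvdk examples in this article), and the placeholder root and base URL.

```python
from typing import Iterable, List
from urllib.parse import quote

# Placeholders for illustration: adjust to your own tree and site.
FS_ROOT = "/usr/local/data/"
BASE_URL = "http://localhost/"
# Assumed end-of-record marker; confirm the exact syntax for your release.
RECORD_END = "<<EOD>>"


def bif_records(paths: Iterable[str], fs_root: str = FS_ROOT,
                base_url: str = BASE_URL) -> List[str]:
    """Build bulk insert records pairing each file name with its URL."""
    records = []
    for raw in paths:
        path = raw.rstrip("\r\n")
        if not path.startswith(fs_root):
            continue  # skips blank lines and files outside the web tree
        url = base_url + quote(path[len(fs_root):].replace("\\", "/"))
        records.append(f"vdkvgwkey: {path}\nurl: {url}\n{RECORD_END}\n")
    return records


def convert(filelist: str, bif_path: str) -> int:
    """Read a file list (one path per line) and write a bulk insert file."""
    with open(filelist, encoding="utf-8") as src:
        records = bif_records(src)
    with open(bif_path, "w", encoding="utf-8") as out:
        out.writelines(records)
    return len(records)
```

Calling convert("filelist", "website.bif") produces a file you can then hand to mkvdk. Because the script is already visiting every file, this is also a convenient place to add other metadata fields.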

Using vspider and a mapfile

While the bulk insert file method is fine, and may make sense when you have additional metadata to add, you may find the easiest solution is to use vspider and the map file, which has been supported in K2 for years but is little known and even less understood.

The map file lets you specify an automatic copy from one field to another, with a substitution applied as the value is copied. For example, consider the map file shown in Figure 3:

Sample map file

URL vdkvgwkey /usr/

Figure 3: Sample Map File for vspider

This will cause vspider to copy the contents of the VdkVgwKey field into the URL field during indexing, replacing the string /usr/local/data with anywhere it occurs in the original field.
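To make the copy-and-substitute behavior concrete, here is a small Python model of what the map asks vspider to do for each document. It is not vspider's code. It assumes the four-column layout used in the mapfile example later in this article (source field, source prefix, destination field, destination prefix), and, as the -prefixmap option name suggests, it replaces only a leading prefix rather than every occurrence of the string.

```python
from typing import Dict, NamedTuple


class PrefixMap(NamedTuple):
    src_field: str
    src_prefix: str
    dst_field: str
    dst_prefix: str


def parse_map_line(line: str) -> PrefixMap:
    """Parse 'srcfield srcprefix dstfield dstprefix' (whitespace separated).

    Assumes no prefix contains embedded spaces.
    """
    parts = line.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 columns, got {len(parts)}: {line!r}")
    return PrefixMap(*parts)


def apply_map(record: Dict[str, str], m: PrefixMap) -> Dict[str, str]:
    """Copy src_field into dst_field, swapping src_prefix for dst_prefix.

    Returns a copy with lower-cased field names; no destination field is
    added when the source value does not start with src_prefix.
    """
    fields = {k.lower(): v for k, v in record.items()}
    value = fields.get(m.src_field.lower())
    if value is None or not value.startswith(m.src_prefix):
        return fields
    fields[m.dst_field.lower()] = m.dst_prefix + value[len(m.src_prefix):]
    return fields
```

Note that this model leaves any remaining backslashes in the copied path untouched; it does not try to predict how vspider treats them.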

Now, when you are ready to index, use two additional vspider options: -prefixmap, to specify the file containing the map, and -abspath, to direct vspider to use fully qualified file paths, as follows:

vspider Command Line

vspider -collection website -start c:\inetpub\wwwroot\ -prefixmap /usr/local/config/mapfile.txt -abspath

Mapfile for vspider

vdkvgwkey C:\inetpub\wwwroot\ url http://localhost/

Note that you must use -abspath. Also, while some documentation says you must use double backslashes ('escaped', in Unix terms), we found that not to be the case. Finally, be sure to include the trailing backslash (or forward slash on Unix) at the end of both patterns.
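Because a missing trailing separator is an easy mistake to make, you might generate the map file line and the command line from one place. The Python sketch below enforces the trailing slash or backslash on both prefixes and assembles an argument list from the options in the command line above plus -abspath; the helper names are hypothetical, and the script only builds the command rather than running it.

```python
from typing import List


def with_trailing_sep(prefix: str) -> str:
    """Append a separator matching the prefix's style if it is missing."""
    if prefix.endswith(("/", "\\")):
        return prefix
    # Treat a prefix containing backslashes as a Windows path.
    return prefix + ("\\" if "\\" in prefix else "/")


def map_line(src_field: str, src_prefix: str,
             dst_field: str, dst_prefix: str) -> str:
    """One map file line in the four-column layout of the article's example."""
    return " ".join([src_field, with_trailing_sep(src_prefix),
                     dst_field, with_trailing_sep(dst_prefix)])


def vspider_args(collection: str, start: str, mapfile: str) -> List[str]:
    """Argument vector using only the options discussed in the article."""
    return ["vspider", "-collection", collection, "-start", start,
            "-prefixmap", mapfile, "-abspath"]


if __name__ == "__main__":
    print(map_line("vdkvgwkey", r"C:\inetpub\wwwroot", "url", "http://localhost"))
    print(" ".join(vspider_args("website", "c:\\inetpub\\wwwroot\\", "mapfile.txt")))
```

The first printed line reproduces the mapfile entry shown above, even though both prefixes were passed in without their trailing separators.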

When you have completed the index, you can view the contents of your fields using rcvdk, a technique we describe elsewhere in this issue of Enterprise Search.

rcvdk
x
fields url 35 vdkvgwkey 35
s
r

As you can see, the prefix map file is a powerful tool, but be sure you understand that indexing a file system for the web comes with some risks. First, you may find that your web file directories hold multiple versions of documents or, even worse, pre-release versions that you are not quite ready to publish. The vspider tool traverses an entire directory unless you specify an exclude pattern, so use care.
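Before crawling, it can be worth auditing the tree for files that look like drafts or stale copies. The Python sketch below checks a file list (for example, one produced by the commands in Figure 1) against a set of hypothetical name patterns; it only reports candidates for you to review or turn into exclude patterns, and does not itself change what vspider indexes.

```python
import fnmatch
import os
from typing import Iterable, List, Sequence

# Hypothetical patterns for files you may not want exposed; tailor these
# to your own naming conventions before relying on them.
SUSPECT_PATTERNS = ("*draft*", "*.bak", "*~", "*_old.*", "*prerelease*")


def suspect_files(paths: Iterable[str],
                  patterns: Sequence[str] = SUSPECT_PATTERNS) -> List[str]:
    """Return paths whose base name matches any pattern (case-insensitive)."""
    hits = []
    for path in paths:
        name = os.path.basename(path).lower()
        if any(fnmatch.fnmatchcase(name, p.lower()) for p in patterns):
            hits.append(path)
    return hits
```

For example, suspect_files(line.rstrip("\n") for line in open("filelist")) lists the files worth a second look before they become searchable.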

In Summary:

K2 has a number of options that provide powerful capabilities when you need them. You can use any of several tools to populate fields; whether you build a bulk insert file for mkvdk by hand or use the vspider -prefixmap option, the goal is the same: store the correct display URL at index time.

We hope this has been of some use to you; feel free to contact me directly if you have any follow-up or additional questions. Remember to send your enterprise search questions to Dr. Search. Every entry (with name and address) gets a free cup and a pen, and the eternal thanks of Dr. Search and his readers.