
Subscriber Interview: Ruth McDunn of SLAC

Last Updated Mar 2009

By: Mark Bennett & Sean Murphy, New Idea Engineering, Inc., Volume 2 - Issue 2 - July / August 2004

I caught up with Ruth McDunn and Bebo White of the Stanford Linear Accelerator Center at the Verity User Group in San Francisco earlier this year and conducted a brief interview via E-mail that I thought would be of interest to many of our subscribers. In particular:
  • the What's New in Searching the SLAC Web page she mentions is viewable by the public and contains a number of search tips worth stealing ("amateurs plagiarize, professionals steal") for your own help pages.
  • a little known fact for your next trivia contest: SLAC was the first web site in the United States and the 5th in the world.
  • Ruth was the lead organizer for Interlab 2003 which has more than three dozen presentations available on-line that address a range of topics including: standards, accessibility, open source, blogs/wikis, portals, and content management.

Please let me know what you think of this new feature and feel free to email me at sean@ideaeng.com if you would like to be interviewed. Ruth has two specific questions, on Microsoft SharePoint and Ultraseek security, that she would appreciate insight on from our readers; see below for the details.

Q: What changes have you effected in the last year?

Upgraded to Verity Ultraseek 5.2 (from an old Inktomi version) and swapped platforms (from Unix to Windows). This upgrade allowed us to index material on our intranet - which was previously excluded.

Redefined collections to minimize overlap and maximize value. One collaboration wants everything indexed, with the thought that their users can dig through the results to find what they need. For our public collections, we've tweaked the collection definitions to try to maximize return of "good" results. In some of our collections we allow indexing of cgi-scripts and query strings, in others we don't - it depends on the nature of the material being indexed. For example, our "SLAC Today" site (http://today.slac.stanford.edu/) is almost entirely dynamically driven and is updated several times a day, so this collection allows following query strings and is indexed nightly.
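As a rough illustration of the per-collection choice described above, the sketch below shows how a crawl filter might decide whether to index a URL depending on whether the collection allows query strings and CGI scripts. The function name, the collection names, and the example URL paths are hypothetical and are not taken from Ultraseek's configuration.

```python
from urllib.parse import urlsplit

# Hypothetical per-collection policy: does the collection follow dynamic URLs?
COLLECTION_ALLOWS_DYNAMIC = {
    "slac-today": True,   # dynamically driven site, so query strings are followed
    "public": False,      # static material only
}


def should_index(url: str, allow_dynamic: bool) -> bool:
    """Return True if a URL passes the collection's dynamic-content policy."""
    parts = urlsplit(url)
    # Treat a URL as dynamic if it carries a query string or lives under cgi-bin.
    is_dynamic = bool(parts.query) or "/cgi-bin/" in parts.path
    return allow_dynamic or not is_dynamic


if __name__ == "__main__":
    url = "http://today.slac.stanford.edu/example?id=1"  # invented path
    for name, allow in COLLECTION_ALLOWS_DYNAMIC.items():
        print(name, should_index(url, allow))
```

A real product would express this through its own collection filters; the point is only that the same URL can be accepted by one collection and rejected by another.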

Modified the search entry forms/pages and the results. We are now explicit about what collections the form/page will search - public, private, limited to a specific area, or everything. We have implemented a few custom search results pages, but I'm still debating the value of these pages relative to the time it takes to set them up and maintain them. The results pages have been reformatted from the default and the tips customized to SLAC-specific suggestions.

After reviewing the Verity Ultraseek log files, I started implementing Quick Links based on the most requested documents. We now have about 25 Quick Links in place (try "cafe" or just about any spelling of the word). I continue to monitor the results weekly and add keywords to the Quick Link definitions as needed. I've also added Quick Links to the more useful sites on our intranet. Since I can define the description that will appear, I can make sure the results are appropriate for public disclosure and indicate that authentication is required to access the link.
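The idea of a quick link keyed on several spellings can be sketched as a simple keyword lookup. This is only an illustration of the concept, not how Ultraseek stores its definitions; the entry, its URL, its description, and the list of spellings are all made up for the example.

```python
from typing import Optional

# Hypothetical quick link table: each entry lists the keywords (including
# common misspellings) that should surface it above the regular results.
QUICK_LINKS = [
    {
        "title": "Cafe (example entry)",
        "url": "http://www.example.org/cafe",
        "description": "Hours and menu. Illustrative text only.",
        "keywords": {"cafe", "café", "caffe", "cafeteria", "cafateria"},
    },
]


def find_quick_link(query: str) -> Optional[dict]:
    """Return the first quick link whose keyword set contains the query."""
    term = query.strip().lower()
    for link in QUICK_LINKS:
        if term in link["keywords"]:
            return link
    return None


if __name__ == "__main__":
    hit = find_quick_link("  Cafateria ")
    print(hit["title"] if hit else "no quick link")
```

Adding a keyword after a weekly log review then amounts to adding one more string to the relevant set.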

Developed "What's New In Search" and continue to publicize the upgrade and changes. (See http://www-group.slac.stanford.edu/wim/search/whatsnew.html for details). I've given presentations to several groups and at managers' meetings. I offer monthly training sessions for web authors and I've been pushing tips on how they can improve indexing and searching of their pages. One big step forward was with our BaBar collaboration website. This is a large site (100K pages) with 600+ collaborators who all have write access to the entire site (yes, it is a potential nightmare). The site was set up with a wrapper that surrounds almost all pages in the web with a top and left navigation system - to keep at least some consistency in the site. When we ran some queries, we noticed that the navigation was showing up very frequently as the description in the results display. Since they "rewrap" the site every night, we had them add the stop/start index comments around the navigation - worked like a charm. Now the passage-based descriptions have value - rather than just reiterating the navigational elements.
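To make the stop/start index technique concrete, here is a minimal sketch of a nightly wrapper that surrounds the navigation with marker comments, plus a function showing what an indexer honoring those markers would be left with. The marker spelling used here is an assumption (check the product documentation for the exact form), and the function names and sample markup are invented.

```python
import re

# Assumed marker spelling; the exact comment syntax should be confirmed
# against the search engine's documentation.
STOP = "<!-- stopindex -->"
START = "<!-- startindex -->"

# Matches everything from a stop marker through the next start marker.
_SKIPPED = re.compile(re.escape(STOP) + r".*?" + re.escape(START), re.DOTALL)


def wrap_navigation(nav_html: str) -> str:
    """Surround navigation markup so an indexer can skip it."""
    return f"{STOP}{nav_html}{START}"


def indexable_text(html: str) -> str:
    """Return the page with all stop/start regions removed."""
    return _SKIPPED.sub("", html)


page = (
    wrap_navigation("<ul><li>Home</li><li>Detector</li></ul>")
    + "<p>Run schedule for the week.</p>"
)

if __name__ == "__main__":
    print(indexable_text(page))
```

With the navigation excluded, a passage-based description can only be drawn from the page's own content.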

Q: What are some key "lessons learned" from the last 12 months: what went well and what went poorly (aka "The Good, The Bad, and The Ugly")?

Back up the configuration files. Our server was compromised before we went public with the upgraded product, and it was taken off the network. Once the server was fixed, I had to reconstruct the entire set of collections and indexes. Learned a lot about the interface, but this is not the most efficient method.

Good search takes resources and constant vigilance. A group of four have been meeting just about every week for the past year, initially to define our search requirements, evaluate products, purchase Verity Ultraseek, and then to work out implementation details. The group consists of the technical staff and managers and has proven to be a valuable resource.

During the product evaluation process, we did not fully understand what it would take to implement secure searching. We decided on the Windows platform with the understanding that we could integrate search with existing authentication procedures - but that has not been as easy as it sounds. The Security module allows us to index SSL web sites, but without implementing XPA or a proxy system, we've had to make a somewhat kludgey interface to the public and private collections. More about this in challenges.

Q: What are some of the significant challenges/opportunities that still remain? What's your biggest "pain point" today?

Our computer security group is concerned about possible "leakage" of information when displayed in the search result descriptions. Our fix right now is to only provide forms that include our private collections on private web sites - but this is not terribly secure and it reduces usability for the end users. In the best of all worlds, there would be one search box that would recognize who you are (by IP or authentication) and then display only the results you are entitled to see. Another solution would be to not display the description for "private" results. We have not found a solution to this problem that won't cost programming time/money that we just don't have available to us right now. I had hoped that the addition of web services to the 5.2 version would provide some built-in functionality to solve this problem - but it hasn't. In addition, at the user conference we asked the Verity folks if this problem was on the radar to be fixed - but no, it wasn't.
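The two ideas mentioned above, showing only the results a user is entitled to see or suppressing the description of private results, could be sketched as a post-filter over a result list. This is a hypothetical illustration of the logic only; it is not an Ultraseek feature, and the `Result` type, the `trim_results` function, and the placeholder text are invented for the example.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Result:
    title: str
    url: str
    description: str
    private: bool


def trim_results(
    results: List[Result],
    authenticated: bool,
    hide_private_descriptions: bool = False,
) -> List[Result]:
    """Filter results for the current user.

    Public results always pass. Private results pass unchanged for
    authenticated users. For everyone else they are either dropped or,
    if hide_private_descriptions is set, kept with a blanked description.
    """
    trimmed = []
    for result in results:
        if not result.private or authenticated:
            trimmed.append(result)
        elif hide_private_descriptions:
            trimmed.append(replace(result, description="(sign-in required)"))
    return trimmed


if __name__ == "__main__":
    sample = [
        Result("Public page", "http://example.org/a", "Open text", False),
        Result("Private page", "https://example.org/b", "Sensitive text", True),
    ]
    for r in trim_results(sample, authenticated=False, hide_private_descriptions=True):
        print(r.title, "-", r.description)
```

How `authenticated` is determined (by IP range or by login) is left out here, and that is exactly the integration work the answer describes as costly.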

Q: What sources do you regularly consult for practical information on search, navigation, and content management? (list URLs, newsgroups / Yahoo groups / e-mail lists, conferences, user groups, magazines, etc.)

We have an annual conference of DOE Webmasters (called InterLab) and this has been a valuable resource for us. Although conference attendance is restricted, our proceedings are public. You might want to check out the web site for the conference in October (https://public.ornl.gov/interlab/). There is a link to previous conference proceedings. We hosted the event in 2003 (http://www-conf.slac.stanford.edu/interlab03/program.html).

Q: Is there a question or issue you would like readers of the Enterprise Search Newsletter's assistance with?

We have started evaluating Share Point and Share Point Portal Server to provide collaborative work space. Some think we will be able to expand the search feature in these products to all of our web site and eventually eliminate Ultraseek. I'd be curious if anyone has experience with these products, and especially the search features.

It seems that we are not unique in having a search that includes public and private information. It would be interesting to hear how others have solved the issue of information "leakage."

Note: please reply directly to Ruth (mcdunn at slac.stanford.edu) on these two questions if you can offer her some insight.

Ruth also offered some background on her configuration and usage statistics to provide a perspective on her answers:

SLAC Web Statistics

  • Our web site averages over 6 million page views a month.
  • For the five full months since we upgraded Verity Ultraseek, we average per month:
    • 11,178 - Queries
    • 8,903 - Unique Queries
    • 13,097 - Pages Viewed
    • 9,779 - Clicks

Background questions / context

Q: Do you manage or focus on an Internet, intranet, or extranet site? (If you have responsibility for more than one then please pick the one that has the most interesting or important (for you) problems).

I work on all areas of our web site. Being the first web site in the US (and about 5th in the world - see http://www.slac.stanford.edu/history/earlyweb/), it is big and disorganized. As a public institution, we are required to share our information with the world, but in the past few years (especially since 9/11) we've had to move more and more information about our people and how we do business into our intranet.

Q: Who is your user community or typical visitor mix? What are the "key transactions" you hope to facilitate with search?

Our site serves many audiences. We have a large international community of collaborators who need our site to facilitate the collaborative process. We have onsite scientists and engineers who need the site to collaborate and to just do business. We have our non-scientific staff involved in keeping the operation running. And we serve the public in education and outreach.

Q: What Application / Solution are you trying to build?

Corporate / Intranet / HR portal for employees

Q: What are the goals for your site that search is key in enabling/supporting? How do you track your performance:

Qualitatively and quantitatively. (e.g. user surveys, search analytics, web traffic analysis, ...). We have been using NetGenesis for several years to analyze our web server log statistics. I am in the process of pulling the search server logs in for analysis. We hope to do more user surveys, but right now our only feedback is anecdotal.

Q: Can you briefly describe your current search, navigation, and content management infrastructure:

a) Search Engine(s): Verity Ultraseek 5.2
b) Portal or Personalization tools (if any): exploring Share Point Portal
c) Content management and/or revision control system(s): None
d) Taxonomy tools (if any): Playing with Verity Ultraseek CCE
e) Database derived content (e.g. cold fusion)
f) Interoperability issues? Officially we support Windows, Linux, and Unix, but there are still a few Macs on site.
g) Network/Hardware configuration. We have both Unix (Apache) and Windows (IIS) web server farms.