new idea ENGINEERING

  Specializing in Enterprise Search since 1996, including FAST, Autonomy IDOL, K2, Ultraseek, OmniFind and Lucene

NIE Enterprise Search Terms Glossary

Definitions:

Access Control List
Synonyms:  ACL
Related Terms:  SSO, ACL, document level security
A set of permissions attached to a specific file or piece of data. ACLs typically list the individuals and groups of people who can have access to the data, and those who should specifically be denied access. An ACL can also specify the level of access, such as "read only" or "modify". ACLs can be useful when implementing document level security.
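As a rough illustration of the definition above, an ACL can be modeled as a list of grants plus a list of explicit denials. This is only a sketch: the class name, the `permits` method and the rule that a denial always overrides a grant are assumptions made for the example, not the behavior of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControlList:
    # Maps a user or group name to the set of access levels it is granted.
    allow: dict = field(default_factory=dict)
    # Users or groups that are explicitly denied, regardless of any grants.
    deny: set = field(default_factory=set)

    def permits(self, user, groups, level):
        principals = {user, *groups}
        if principals & self.deny:
            return False  # assumption: an explicit denial always wins
        return any(level in self.allow.get(p, set()) for p in principals)


acl = AccessControlList(
    allow={"engineering": {"read only"}, "alice": {"read only", "modify"}},
    deny={"contractors"},
)
print(acl.permits("bob", ["engineering"], "read only"))
```

A user who belongs to both "engineering" and "contractors" is refused here, because the denial is checked before any grant.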
[back to top]
 
accessibility
See Main Definition:  Americans with Disabilities Act
[back to top]
 
ACL
See Main Definition:  Access Control List
[back to top]
 
Active Directory
Synonyms:  Microsoft Active Directory, AD
Related Terms:  SSO, ACL, document level security
Software sold by Microsoft to store information about company resources such as people, machines and data. AD can be used as part of a system for doing document level security.
[back to top]
 
Active Server Pages
Synonyms:  ASP
Related Terms:  web server, JSP ("Java Server Pages")
A Microsoft server-side scripting environment for building interactive web sites. ASP stands for "Active Server Pages". It allows a programmer to easily embed computer programs inside web pages. Though they have similar names, Microsoft ASP and Sun JSP are generally not compatible with each other.
[back to top]
 
AD
See Main Definition:  Active Directory
[back to top]
 
ADA
See Main Definition:  Americans with Disabilities Act
[back to top]
 
Adobe Acrobat
See Main Definition:  Portable Document Format
[back to top]
 
Adobe PDF
See Main Definition:  Portable Document Format
[back to top]
 
advanced search
Allowing a power user to further refine their search by including additional search options. These options often cause additional relational field operators to be added to the search, creating a hybrid filtered search. Though powerful, few users were actually inclined to visit a page labeled "Advanced Search", so this has been generally replaced by Parametric Search, which allows for similar functionality, but does so interactively after the first search results have been displayed.
[back to top]
 
agents
Related Terms:  automated document classification, saved searches
With regard to the search engine industry, the term "Agents" usually means a saved search that keeps running; it checks each new document that is added to the system, and takes some predefined action when it gets a match. Usage: "agents" was a very hot marketing term in the 1990s; when search engine vendors talked about agents, they were typically referring to their automated document classification products ("agents" sounded cooler).
[back to top]
 
alternate term suggestions
Synonyms:  alternate terms, alternative terms, related terms
Related Terms:  content promotion, "Did you mean?"
A search engine suggests other words a user might be interested in, based on the search they just issued. Sometimes the alternate term might be a different or corrected spelling of a word; for example, a user typing "Arkansaw" might get the suggestion "Arkansas". Google suggests such corrections with the "Did you mean:" links in its results list. Other alternate terms might be subjects that are directly related; for example, somebody looking to buy a flashlight might also get a suggestion for the term "batteries".
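Both kinds of suggestion can be sketched with Python's standard `difflib` module. The vocabulary, the related-terms table and the 0.8 similarity cutoff are made-up values for illustration; production engines generally derive suggestions from their index and query logs instead.

```python
import difflib

# Hypothetical vocabulary of known terms and a hand-built related-terms table.
VOCABULARY = ["arkansas", "kansas", "alaska", "flashlight", "batteries"]
RELATED = {"flashlight": ["batteries"]}


def suggest(term):
    term = term.lower()
    if term in VOCABULARY:
        # Known word: offer directly related subjects, if any.
        return RELATED.get(term, [])
    # Unknown word: offer the closest spelling from the vocabulary.
    return difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.8)


print(suggest("Arkansaw"))
print(suggest("flashlight"))
```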
[back to top]
 
Americans with Disabilities Act
Synonyms:  ADA, site accessibility
Laws that mandate web sites be usable by persons with disabilities who may not be able to use technologies like Flash or JavaScript. The changes have also helped web sites be more accessible to search engine spiders.
[back to top]
 
AMHS
See Main Definition:  Automated Message Handling System
[back to top]
 
analog operators
See Main Definition:  weighted operators
[back to top]
 
analytics
See Main Definition:  Search Analytics
[back to top]
 
Anti Terrorism Financing
Synonyms:  ATF
Related Terms:  Money Services Business
ATF/Anti-Terrorism Financing is the process of automatically monitoring financial transactions to detect suspicious activity that may be related to terrorism or money laundering, or other types of fraud.
[back to top]
 
API
See Main Definition:  Application Programming Interface
[back to top]
 
Application Programming Interface
Synonyms:  API
Related Terms:  embedded search engine
A software library that allows computer programmers to add features to existing software products. API stands for "Application Programming Interface", and is usually a set of documentation and sample computer code showing how to extend the product. Search engine APIs allow developers to embed a search engine into other applications, for example to add search capabilities to an email program.
[back to top]
 
Application Service Provider
Synonyms:  ASP
Related Terms:  hosted search, FreeFind
A form of software that does not need to be installed on the web server that is using it. Transparently to the site visitor, the visitor's browser is directed to talk to the ASP server for specific tasks. An example of an ASP is FreeFind; they provide search functionality for web sites without the need for each web site to install software.
[back to top]
 
ASP ("Active Server Pages")
See Main Definition:  Active Server Pages
[back to top]
 
ASP ("Application Service Provider")
See Main Definition:  Application Service Provider
[back to top]
 
ATF
See Main Definition:  Anti Terrorism Financing
[back to top]
 
attributes
See Main Definition:  Meta Data
[back to top]
 
audio mining
Related Terms:  Autonomy
Automatically extracting the words and phrases from a recorded conversation and converting them to text; the system will also typically record a time code representing when each word was spoken. Later, a search engine can take user search terms and find the appropriate audio or video clip containing those words. This technique is useful to some businesses, but technical barriers have kept it from widespread corporate usage.
[back to top]
 
automated document classification
Synonyms:  agents, AMHS ("Automated Message Handling System"), profiling
Related Terms:  Verity Real-Time, automated document profiling
A system that is preloaded with a set of searches and then watches for new documents that match each search; when a match is found, an action is taken. Common actions include adding a meta tag to the document to flag it as being part of a particular category of documents, or automatically forwarding the document to the person interested in that search.
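The matching step described above can be sketched as follows. The saved searches are reduced to plain sets of keywords, and the action taken is simply returning the matching category names; both are simplifying assumptions, since real products support full query syntax and configurable actions.

```python
# Hypothetical saved searches: category name -> keywords that trigger it.
SAVED_SEARCHES = {
    "finance": {"invoice", "budget"},
    "security": {"firewall", "password"},
}


def classify(document_text):
    """Return the categories whose saved search matches the new document."""
    words = set(document_text.lower().split())
    return sorted(name for name, terms in SAVED_SEARCHES.items() if words & terms)


print(classify("Budget for the firewall upgrade"))
```

The returned names could then be written into the document as meta tags, or used to decide who should receive a copy.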
[back to top]
 
automated document profiling
See Main Definition:  automated document classification
[back to top]
 
Automated Message Handling System
See Main Definition:  automated document classification
Related Terms:  Verity Real-Time, agents, profiling
Usage: AMHS was the preferred term used by the government in the 1990s for automated document profiling and distribution. Their "documents" tended to be government intelligence reports, aka "messages".
[back to top]
 
Autonomy
Related Terms:  Autonomy IDOL, Autonomy K2, Autonomy Ultraseek
A publicly held search engine company headquartered in England. In 2005 Autonomy acquired Verity, Inc. Autonomy now owns three of the well established enterprise search engine brands: IDOL, K2 (formerly Verity K2), and Ultraseek.
[back to top]
 
Autonomy IDOL
Related Terms:  Autonomy
The core search engine technology created by Autonomy.
[back to top]
 
Autonomy K2
Synonyms:  K2, Verity K2
K2 was the core technology of many search engine products developed and sold by Verity, Inc. starting in the mid 1990s. Verity was acquired by Autonomy in 2005, and the K2 brand name was extended to also include some of Autonomy's products. K2 v6.x was the last version based on the Verity core technology. As of 2006, Autonomy has combined its IDOL core engine with the K2 interface and re-released it as K2 v7, which uses the Autonomy IDOL engine as its core.
[back to top]
 
Autonomy Ultraseek
Synonyms:  Ultraseek, Verity Ultraseek, Inktomi Ultraseek, Internet search syntax, Infoseek
A very popular commercial search engine currently sold by Autonomy; Autonomy is the fourth owner of this product line. It was originally developed at Infoseek in the 1990s, was then briefly owned by Inktomi, and was then acquired by Verity. Verity worked to integrate its K2 product line with Ultraseek, though the two search engines were originally developed independently. Autonomy has also started integrating some of Ultraseek's functionality for use with its own IDOL product line.
[back to top]
 
batch file
Related Terms:  script file, script, shell script
A series of Microsoft Windows commands stored in a text file usually having a .bat extension.
[back to top]
 
batch-mode spidering
Related Terms:  spider
A spider that completely revisits every page on a web site when it wants to respider the site. This is the older, simpler design of a web spider, but it is not practical for sites with large amounts of content.
[back to top]
 
Behavior Based Taxonomy
Related Terms:  taxonomy
Unlike a generalized taxonomy, a Behavior-Based Taxonomy is a list of the search terms your site visitors use when they perform searches on your own site search engine. A Behavior-Based Taxonomy is a great start for a relevant generalized taxonomy, as well as a great source of information about how your visitors ask for content on your site. See http://ideaeng.com/pub/entsrch/issue06/article01.html
[back to top]
 
best bets
See Main Definition:  content promotion
Usage: Ultraseek name for rule based content promotion.
[back to top]
 
binary files
Files stored on a computer hard drive that contain seemingly random bytes of data, not easily intelligible by a human reader; the file looks like "gibberish". The contents of binary files can usually only be understood by the program that created them, or by other compatible software packages. The advantage of binary files is that, for programs that can understand their contents, they are more efficient in terms of space and/or access speed.
[back to top]
 
binary indices
See Main Definition:  index (noun)
[back to top]
 
blob
See Main Definition:  zone
Usage note: The term "blob" is used more frequently by people with a relational database background; search engines typically refer to zones. "zone" tends to be associated with Verity's vocabulary.
[back to top]
 
block mode terminals
Related Terms:  web browser
A method of communication between a remote client computer and a central server where data is transmitted in chunks or discrete transactions, instead of sending the data one character at a time. The modern web browser and the HTTP protocol can be viewed as a similar system, but implemented with more modern technology and graphics.
[back to top]
 
Boolean operators
In relation to search engines, a search syntax that only supports "yes" or "no" logic, and allows parts of a query to be joined together with the logical AND, OR and NOT operators.
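The three operators map naturally onto set operations over an inverted index. The tiny index below is invented for illustration; each term points at the set of document IDs that contain it.

```python
# Hypothetical inverted index: term -> IDs of the documents containing it.
INDEX = {
    "search": {1, 2, 3},
    "engine": {2, 3},
    "spider": {3, 4},
}


def docs(term):
    return INDEX.get(term, set())


and_result = docs("search") & docs("engine")   # search AND engine
or_result = docs("engine") | docs("spider")    # engine OR spider
not_result = docs("search") - docs("spider")   # search AND NOT spider

print(and_result, or_result, not_result)
```

Each document is either in the result set or not, which is the "yes or no" logic the definition refers to; no relevancy score is involved.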
[back to top]
 
boost
A search syntax that allows for certain search terms to be given more weight in relevancy calculations.
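A minimal sketch of the idea, assuming a very simple relevancy formula in which each occurrence of a term adds that term's weight to the score. Real engines use far more elaborate calculations; the weights and documents here are made up.

```python
def score(document_text, weighted_terms):
    """Sum each query term's weight once per occurrence in the document."""
    words = document_text.lower().split()
    return sum(weight * words.count(term) for term, weight in weighted_terms.items())


PLAIN = {"verity": 1.0, "search": 1.0}
BOOSTED = {"verity": 3.0, "search": 1.0}  # "verity" is given extra weight

doc_a = "Verity search"
doc_b = "search search search"

print(score(doc_a, PLAIN), score(doc_b, PLAIN))
print(score(doc_a, BOOSTED), score(doc_b, BOOSTED))
```

Without the boost, doc_b outranks doc_a; with "verity" boosted, the order is reversed.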
[back to top]
 
bot
See Main Definition:  spider
Synonyms:  'bot, robot
[back to top]
 
brokered indexing
Related Terms:  Autonomy K2 spider, FAST Search and Transfer
A series of cooperating software modules that can index vast amounts of data into a search engine, by efficiently dividing up the many different indexing tasks.
[back to top]
 
brokered search
A user's search is received by one search engine, which then forwards the request on to other engines and combines the results. This is similar to federated search, except that all the remote search engines are typically from the same vendor, so query syntax and relevancy are the same.
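Because the remote engines share the same relevancy scale, combining their results can be as simple as sorting all hits by score. In this sketch each engine is a stand-in function returning (score, URL) pairs; the function names and URLs are hypothetical.

```python
def engine_a(query):
    # Stand-in for a remote engine; returns (relevancy score, URL) pairs.
    return [(0.9, "a/1"), (0.4, "a/2")]


def engine_b(query):
    return [(0.7, "b/1")]


def brokered_search(query, engines):
    """Forward the query to every engine and merge the hits by score."""
    hits = []
    for engine in engines:
        hits.extend(engine(query))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


print(brokered_search("enterprise search", [engine_a, engine_b]))
```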
[back to top]
 
case insensitive
Search terms will match words in a document without regard to upper and lowercase letter differences. This is the default behavior of most search engines.
[back to top]
 
case sensitive
Search terms must match words in a document exactly with regard to upper and lowercase letters. This type of matching can be helpful when looking for proper names or abbreviations.
[back to top]
 
catalog
See Main Definition:  search indices
Synonyms:  document catalog
[back to top]
 
CGI ("Common Gateway Interface")
Related Terms:  web page form, dynamic content, URL
The means by which software running on a web server interacts with visitors. For example, when you submit a search form on a web site, the query is sent via CGI. A link to a CGI will sometimes have a question mark in the URL.
[back to top]
 
CGI field
Within the scope of search engines, a CGI field is a piece of data submitted to the search engine from a web page search form. It may contain the text that the user typed in, or it may represent various check boxes or items selected from a dropdown list. CGI fields are the key interface between a search form and the underlying search engine. These are not the same as regular "fields" in a search engine or database.
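To show what these fields look like once a form has been submitted, the sketch below parses a search URL with Python's standard `urllib.parse` module. The URL and the field names ("query", "collection", "sort") are invented for the example; each search engine defines its own.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL produced by submitting a search form.
url = (
    "http://www.example.com/search.cgi"
    "?query=enterprise+search&collection=public&collection=intranet&sort=date"
)

# Each CGI field maps to a list, since a field such as a group of
# check boxes can be submitted more than once.
fields = parse_qs(urlsplit(url).query)
print(fields)
```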
[back to top]
 
Click Tracking
A report showing what parts of a web site a visitor looked at. This information is gathered by keeping track of which links the visitor clicked on, or by looking in the web site's log files. Click tracking does not provide as much insight into a visitor's intent as the newer "Search Analytics" products do (defined below).
[back to top]
 
Click-through
Related Terms:  search analytics
A report showing which specific link a user clicked on when looking at the results of a search. Since searches often bring back many pages, it can be useful to see which page the users think is relevant.
[back to top]
 
closed network
See Main Definition:  Intranet
[back to top]
 
clustering
Related Terms:  Search 2.0, results list visualization, noun phrase extraction, n-gram
Grouping together similar documents in a search results list. There are many different techniques for doing this, most using statistics to analyze the words in the document.
[back to top]
 
CMS
See Main Definition:  content management system
[back to top]
 
collection
See Main Definition:  search indices
Usage: Usually associated with Verity K2 terminology
[back to top]
 
collection level security
Related Terms:  collection
Controlling access to sensitive documents at a high level by grouping similar documents together into specific collections, and then allowing users to have access to only certain collections. For example, all users can search the public content, but only employees can search both public and "Intranet" collections. This is the most common form of search engine security because it is relatively easy to implement; however it is not flexible enough for more complex security requirements.
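The public/Intranet example above can be sketched as a lookup table from collection to permitted roles. The role names and the table itself are assumptions for illustration.

```python
# Hypothetical mapping: collection name -> roles allowed to search it.
COLLECTION_ACCESS = {
    "public": {"anonymous", "employee"},
    "intranet": {"employee"},
}


def searchable_collections(role):
    """Return the collections a user with this role may search."""
    return sorted(name for name, roles in COLLECTION_ACCESS.items() if role in roles)


print(searchable_collections("anonymous"))
print(searchable_collections("employee"))
```

The search is then run only against the returned collections, which is why this approach is easy to implement but cannot express rules about individual documents.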
[back to top]
 
command line
Synonyms:  command line arguments, command line options
Traditionally, software was started by typing in commands to the computer. These commands had options that could control the details of what the software would do. On Microsoft Windows, these command line options usually start with a forward slash (/). On Unix, these options usually start with a single or double hyphen (- or --). Many search engines include tools that are run from the command line. This allows them to be run from a script file or "cron job".
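As an example of Unix-style options, here is how a hypothetical spider launcher might read its command line using Python's standard `argparse` module. The option names (`--depth`, `--collection`) are invented and do not correspond to any vendor's tool.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Hypothetical spider launcher")
    parser.add_argument("-d", "--depth", type=int, default=3,
                        help="how many links deep to follow")
    parser.add_argument("--collection", default="public",
                        help="collection to index the pages into")
    parser.add_argument("start_url", help="page where spidering begins")
    return parser.parse_args(argv)


# Equivalent to typing: spider --depth 5 http://www.example.com/
args = parse_args(["--depth", "5", "http://www.example.com/"])
print(args.depth, args.collection, args.start_url)
```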
[back to top]
 
company portal
See Main Definition:  enterprise portal
Usage: the specific phrase "company portal" may imply that the enterprise portal is available to the general public, or at least to customers and partners.
[back to top]
 
compliance
Related Terms:  Sarbanes-Oxley Act, "SOX"
In this context, ensuring that 100% of data is represented and searchable in a vertical application. For example, making sure that a search for a particular client's name will always reliably bring back all pertinent records. Vocabulary: Sarbanes-Oxley Act, AKA "SOX": compliance regulations relating to what information companies must maintain and provide. SOX compliance is often related to Knowledge Management Systems and related search technology. See http://www.sarbanes-oxley-forum.com
[back to top]
 
compound document
See Main Definition:  sub-document indexing
[back to top]
 
content
Related Terms:  document
In the context of search engines, "content" is a general term referring to the data that is to be indexed and searched by the search engine. It might include web pages, files, database records, or other textual data that needs to be searched.
[back to top]
 
content management system
Synonyms:  CMS, document management system
Related Terms:  embedded search engine
Software that manages corporate documents and other important data. A CMS often includes document version tracking, document security enforcement and workflow automation, and often has an embedded search engine to allow users to search through all the documents quickly.
[back to top]
 
content mining
Related Terms:  ETL, legacy data
The process of extracting valuable data that is stored in a normally inaccessible format. For example, many companies have textual data in Word, PDF or PowerPoint files that they would like to load into a database. Content mining software can go in and parse out the bits of data that are needed. See also "Legacy Data".
[back to top]
 
content promotion
Synonyms:  directed results, best bets, quick links
Related Terms:  alternate term suggestions, "Did you mean?", SearchTrack
A system to allow more precise control over which documents are returned as the result of a search; some systems allow an informed employee to suggest specific web pages that will best answer specific questions. For example, many pages on a web site might contain the term "support", but content promotion allows the main Tech Support home page to be suggested, above all other matches, when somebody searches for support.
[back to top]
 
content scraper
See Main Definition:  scraper
[back to top]
 
context
Synonyms:  adding context
Related Terms:  social network, subject domain disambiguation
In regards to search engines, context is a way of improving search relevancy by considering factors beyond what the user actually typed in; in other words, the engine adds in additional data or assumptions to the search to get better results. There are many forms of "additional data" that search engines might consider. For example, the system may consider social networking data to boost relevancy of popular documents. Or the system may limit the scope of search to a particular subject domain; for example, if a computer technician searches for "sun", the system might assume they are referring to the computer company Sun Microsystems, whereas an elementary student may have been referring to the Sun at the center of our solar system.
[back to top]
 
corporate network
Related Terms:  private network, firewall, Intranet
The secure network that links together computers at a company. Traffic and visitors from the outside global Internet are kept out by a network isolation filter called a firewall.
[back to top]
 
corporate portal
See Main Definition:  enterprise portal
[back to top]
 
crawler
See Main Definition:  spider
Usage: Some vendors do make a distinction between a "crawler" and a "spider". The different terms sometimes involve the decoupling of downloading web pages and creating the actual search indices.
[back to top]
 
cron job
Related Terms:  script file, script
A way of scheduling and automatically starting Unix shell scripts at regular time intervals. For example, many sites use a cron job to run their search engine spider at night when the network is not being heavily used.
[back to top]
 
cross vendor
Related Terms:  third party vendor, NIE, SearchTrack
A product, service or tool that works with multiple search engines. Most larger companies actually use more than one search engine, but the tools each search vendor provides tend to work only with that vendor's engine. Third party vendors can offer tools that work with multiple search engines. For example, when a Marketing department is looking at search activity for different parts of the site, they probably don't care what specific search vendor was used; they just want to know what visitors searched for. A third party tool could offer search analytics across all the search engines in use on the site and thus provide this type of global view.
[back to top]
 
Data Mining
Related Terms:  ETL
Analysis of large volumes of relatively simple data to extract important trends and new, higher level information. For example, a data mining program might analyze millions of product orders to determine trends among top-spending customers, such as their likelihood to purchase again, or their likelihood to switch to a different vendor.
[back to top]
 
data quality
See Main Definition:  search engine data quality
[back to top]
 
data silo
A system containing a set of documents or data. Silo sometimes also implies a rather standalone, self-contained system, which includes its own data storage and an embedded search engine. A silo that includes an advanced fulltext search engine and is used primarily for that purpose may be referred to as a search appliance. In many cases a silo may have its own embedded search engine and also have its content indexed and searched by an external search engine.
[back to top]
 
database
See Main Definition:  relational database
Related Terms:  collection, document index, catalog
Usage: database is a very broad term, usually requiring more context to define precisely.
[back to top]
 
database gateway
Related Terms:  relational database, index (verb)
A means of hooking up a search engine to a relational database, so that the database's records can be searched by the search engine.
[back to top]
 
database index
Synonyms:  index (noun)
Related Terms:  relational database
The binary files associated with a traditional database that hold the actual data; typically stored on a hard drive.
[back to top]
 
database offloading
Related Terms:  search engine, relational database, zero term search
Using a search engine for tasks that were traditionally handled by relational databases. For example, a search engine might generate a report about products that have been sold this week, instead of using Oracle.
[back to top]
 
deep web
Related Terms:  scraper, spider, dynamic content, web page form
A way to more deeply spider a web site, beyond simply following links. For example, a deep web spider often fills in web page forms with a range of terms, submits the form repeatedly, and then captures the various results. Many simple web spiders will miss content that is only accessible by searching with forms.
[back to top]
 
deferred search
Related Terms:  repository database, federated search
This can be thought of as an extended form of federated search. Some remote systems may not accept distributed searches from a federated search engine. As a workaround, these remote systems are described in a repository database, and references to that remote system are returned. For example, a company may choose to not include HR payroll information in the Intranet federated search system. An employee searching for "salaries" could instead be given a notice telling them to visit the HR Payroll system to find salaries. If the employee has a login to that separate system, they can go there and do the search. In this way, highly sensitive data can still be located by those who need it, but not accidentally included in casual federated search results.
[back to top]
 
Did you mean?, "Did you mean?"
See Main Definition:  alternate term suggestions
Usage note: this particular phrasing was popularized by Google.
[back to top]
 
directed results
See Main Definition:  content promotion
[back to top]
 
DMOZ
Synonyms:  dmoz.org
An open source Internet taxonomy which attempts to catalog and organize all the web sites on the Internet. It is maintained by volunteers and used as a data source for many popular web portals including Google. It is sometimes said to be the open source competitor to Yahoo's ontology of web sites.
[back to top]
 
document
Related Terms:  record, hit, page, web page, URL, result, content
A unit of data indexed and searched by a search engine; typically each document is equivalent to a web page on a web site, or perhaps a Microsoft Word or Adobe PDF file, or a record in a database.
[back to top]
 
document attributes
See Main Definition:  Meta Data
[back to top]
 
document catalog
See Main Definition:  search indices
[back to top]
 
document fields
See Main Definition:  Meta Data
[back to top]
 
document frequency
Related Terms:  inverse document frequency
The number of documents in a system that contain a particular word. The assumption is that if a word appears in many documents, it is LESS LIKELY to help in relevancy calculations. This count is often inverted so that larger numbers indicate more relevancy (Inverse Document Frequency).
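A small worked example may help. The sketch below counts document frequency and then inverts it using log(N / df), where N is the total number of documents. That formula is one common textbook formulation and is an assumption here; individual engines use their own variations.

```python
import math

DOCS = ["the search engine", "the spider", "the index"]


def document_frequency(term, documents):
    """Count the documents that contain the term at least once."""
    return sum(1 for doc in documents if term in doc.lower().split())


def inverse_document_frequency(term, documents):
    df = document_frequency(term, documents)
    # Assumed formulation: log(N / df); 0.0 for terms that never occur.
    return math.log(len(documents) / df) if df else 0.0


print(document_frequency("the", DOCS), inverse_document_frequency("the", DOCS))
print(document_frequency("spider", DOCS), inverse_document_frequency("spider", DOCS))
```

The word "the" appears in every document, so its inverted value is zero and it contributes nothing to relevancy, while the rarer word "spider" gets a larger value.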
[back to top]
 
document highlighting
The practice of visually highlighting the users' search terms in a matching document when the user opens it up to read it. This is sometimes confused with the highlighting of document summaries in the results list.
[back to top]
 
document index
See Main Definition:  search indices
[back to top]
 
document indexer
See Main Definition:  indexer
[back to top]
 
document level security
Related Terms:  collection level security, sub-document level security
Controlling access to sensitive content on a document by document basis.
[back to top]
 
document management system
See Main Definition:  content management system
[back to top]
 
document meta data
See Main Definition:  meta data
[back to top]
 
document pipeline
Related Terms:  document indexing pipeline
A set of processes that a document passes through while being indexed. Each process is designed to modify the document in a certain way. For example, a process may look for dates within the text of the document, and add any such dates to the document's meta tags.
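A pipeline of this kind can be sketched as a list of functions applied in order. The two stages shown (a date finder, as in the example above, and a word counter) and the dictionary layout of the document are assumptions for illustration; the date finder only recognizes dates already written as YYYY-MM-DD.

```python
import re


def extract_dates(doc):
    # Find YYYY-MM-DD dates in the text and record them as meta data.
    doc["meta"]["dates"] = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", doc["text"])
    return doc


def count_words(doc):
    doc["meta"]["word_count"] = len(doc["text"].split())
    return doc


PIPELINE = [extract_dates, count_words]


def run_pipeline(doc):
    """Pass the document through each stage in order."""
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


doc = run_pipeline({"text": "Released 2006-01-01 and revised 2006-03-15", "meta": {}})
print(doc["meta"])
```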
[back to top]
 
document profiling
See Main Definition:  automated document classification
[back to top]
 
document tagging
Synonyms:  meta tagging, tagging
Related Terms:  Meta Data, automated document profiling, scope of search, taxonomy
When documents are fed through an automated document profiler, meta tags can be added to the document to reflect which profiles matched. Later, that meta data can be used to limit the scope of the search.
[back to top]
 
Documentum
Related Terms:  Content Management System
A popular content management system.
[back to top]
 
DPump
Related Terms:  XPump, XML, API
An enhancement to NIE's XPump language that allows Java programmers to add new features into XPump. DPump is the "API" for XPump.
[back to top]
 
drill-down
Related Terms:  Search 2.0, results list navigation
Providing clickable choices in a results list so that the user can further refine the search results. For example, on a shopping site, a search for "plasma tv" might provide drill down links for various price ranges, various manufacturers, and links for particular stores where the product is sold. Clicking on any of these links will narrow the search to just those matches.
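The "plasma tv" example can be sketched by counting the values of a field across the current results, and then filtering when the user clicks one. The result records and field names below are invented.

```python
from collections import Counter

# Hypothetical results for the search "plasma tv".
RESULTS = [
    {"title": "Plasma TV 42in", "maker": "Acme", "price": 900},
    {"title": "Plasma TV 50in", "maker": "Acme", "price": 1500},
    {"title": "Plasma TV 60in", "maker": "Globex", "price": 2400},
]


def drill_down_links(results, field):
    """Count how many results fall under each value of the field."""
    return Counter(r[field] for r in results)


def narrow(results, field, value):
    """Keep only the results matching the link the user clicked."""
    return [r for r in results if r[field] == value]


print(drill_down_links(RESULTS, "maker"))
print(narrow(RESULTS, "maker", "Globex"))
```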
[back to top]
 
dynamic content
Synonyms:  dynamically generated content, dynamic web pages
Related Terms:  URL, CGI, spider, relational database, CMS, static content
Web pages on a web site that are generated dynamically whenever a visitor requests them. A simple example is a web page that includes an advertisement that changes each time a different visitor views the page. A more elaborate example would be a web based content management system (CMS) where each document is actually stored in a relational database and is looked up and shown whenever needed. Some spiders have trouble indexing dynamic content.
[back to top]
 
dynamic summaries
A textual summary of a document is often displayed in the results list under the title of a document. Dynamic summaries show portions of the document that contain the specific search terms entered by the user; the exact terms are often highlighted or bolded in the summary. Dynamic summaries are very popular.
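A minimal sketch of the technique: find the first occurrence of the search term and return a few words on either side, with the term marked. The function name, the square-bracket marker and the fallback of returning the opening words are assumptions; real engines usually handle multiple terms and multiple passages.

```python
def dynamic_summary(text, term, width=3):
    """Return the words around the first match, with the match in brackets."""
    words = text.split()
    for i, word in enumerate(words):
        if word.lower().strip(".,;:!?") == term.lower():
            start = max(0, i - width)
            snippet = words[start:i] + ["[" + word + "]"] + words[i + 1:i + 1 + width]
            return " ".join(snippet)
    # No match: fall back to the opening words of the document.
    return " ".join(words[:2 * width + 1])


text = "Our spider visits every page on the site each night."
print(dynamic_summary(text, "page", width=2))
```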
[back to top]
 
early binding security
An efficient method of providing document level security. A user's search terms are augmented by field level operators that set up a filtered search based on which documents that user can see. This change to the query happens before it is submitted to the search engine, so that the search engine only returns documents that the user can see.
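The query rewriting step can be sketched as simple string manipulation. The `acl:` field name and the AND/OR syntax are hypothetical; each engine has its own field operators, and the sketch assumes every indexed document carries a field listing the groups allowed to see it.

```python
def add_security_filter(user_query, user_groups):
    """Append a filter so only documents visible to the user can match."""
    if not user_groups:
        raise ValueError("user belongs to no groups; nothing can be searched")
    group_clause = " OR ".join(f'acl:"{group}"' for group in sorted(user_groups))
    return f"({user_query}) AND ({group_clause})"


print(add_security_filter("salaries", {"hr", "staff"}))
```

The rewritten query is what gets submitted to the engine, so no filtering of the results list is needed afterwards.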
[back to top]
 
embedded search engine
Related Terms:  search engine, API
When a search engine is included as part of a larger software application. For example, many content management systems allow users to search through all the documents in the system; the search engine has been embedded in the CMS via the search engine's API. Many email programs also have embedded search engines, to help users find old emails by keyword.
[back to top]
 
Endeca
Related Terms:  taxonomy, parametric search
A search engine vendor with excellent parametric search technology. See http://endeca.com
[back to top]
 
enterprise portal
Synonyms:  company portal, corporate portal
Related Terms:  portal site
A portal site that is specific to one company. Usually the portal will be inside the company's secure Intranet and only be accessible to employees. It will usually include an enterprise search engine as an important component.
[back to top]
 
enterprise search engine
Related Terms:  search engine, Intranet, search engine vendors
A search engine that indexes and searches content within a company's Intranet. Unlike a local site search engine, enterprise engines typically index the content of multiple web servers on their own local Intranet. Usage: The adjective "Enterprise" also sometimes implies handling a very large amount of data.
[back to top]
 
Enterprise Search Newsletter
See Main Definition:  NIE Enterprise Search Newsletter
[back to top]
 
entity
Related Terms:  entity extraction
A piece of data of a known type, such as a date or amount of money or a reference to a particular city. Entities are often normalized to a common format, such as representing a date in YYYY-MM-DD hh:mm:ss format, regardless of how it was originally written in the source document.
[back to top]
 
entity extraction
Synonyms:  entity extractor
Related Terms:  entity, ETL
Automatically identifying and extracting specific patterns of text and treating them as a specific data type. For example, the phrases "Jan-01-2006", "January 1st, 2006" and "New Year's Day '06" all refer to the same date; an entity extraction system would understand this, and store all three as 2006-01-01. Entity extraction is useful for capturing dates, times, geographic locations, amounts of money, the names of people and companies, addresses, phone numbers, etc. By recognizing and properly storing these entities, a system can properly match user searches to the source documents, even though no actual words will match. For example, a user searching for "fifty dollars" could match a document with "$50.00".
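The date normalization described above can be sketched with Python's standard `datetime` module. Only two written formats are handled, and English month names are assumed; a phrase such as "New Year's Day '06" would need a separate table of named holidays, which this sketch does not attempt.

```python
import re
from datetime import datetime

# Assumed input formats: "Jan-01-2006" and "January 1, 2006".
DATE_FORMATS = ["%b-%d-%Y", "%B %d, %Y"]


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if it is not recognized."""
    # Drop ordinal suffixes so "1st" parses the same as "1".
    cleaned = re.sub(r"(\d)(st|nd|rd|th)\b", r"\1", text.strip())
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


print(normalize_date("Jan-01-2006"))
print(normalize_date("January 1st, 2006"))
```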
[back to top]
 
ETL
See Main Definition:  Extract, Transform and Load
[back to top]
 
explicit summaries
A textual summary of a document is often displayed in the results list under the title of a document. Many document formats allow the author to specifically create a summary. This is a very common practice in HTML documents. The summary may not contain the specific key words the user typed in.
[back to top]
 
Extensible Markup Language
Synonyms:  XML
Related Terms:  XPump, DPump
A very useful standard format for computer data, which makes it easy to move data between different computer programs and systems. XML has become widely accepted in the past few years. Officially XML stands for "Extensible Markup Language".
[back to top]
 
external Meta Data
Synonyms:  overlay Meta Data, overlaid Meta Data
Related Terms:  Meta Data, CMS
Meta data for a document is usually stored inside the document file. However, in some cases, meta data can be assigned to a document after it was created and not be stored directly inside the document. An example of this is when a document is uploaded into a Content Management System; the user can assign additional document properties in the CMS. Special indexing procedures may be required to ensure that the external Meta Data is properly associated with the contents of the actual document inside the search engine index.
[back to top]
 
Extract, Transform and Load
Synonyms:  ETL
Related Terms:  content mining, meta tagging, scraping
A process of gathering, converting and storing data, often from many locations. The data is often converted from one format to another in the process. Officially, ETL is an abbreviation for "Extract, Transform and Load".
[back to top]
 
Extranet
Related Terms:  Intranet
A semi-private controlled network run by a company for the benefit of its customers and partners. Enterprise search engines are often used to index content on the company's Extranet.
[back to top]
 
faceted search
Related Terms:  parametric search, hybrid search, scope of search, taxonomy
Faceted search is an extension to parametric search where the additional suggested searches are not limited to just well defined document meta data groups, and instead may be automatically derived using statistical methods. More importantly, faceted search engines do not blindly suggest choices that won't match any documents (later parametric engines fixed this as well). Also, faceted search engines are a bit more dynamic in how they break up the range of data in a particular field; for example, if all matches were in the same city, then it would not bother to offer city as a choice. Conversely, if matches were scattered among thousands of cities, the faceted engine might choose to suggest searches by state. See http://ideaeng.com/pub/entsrch/v2n6/article03.html
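The city and state behavior described above can be sketched in a few lines of Python. The function name suggest_facets, the field names and the max_choices cutoff are assumptions for illustration only, not any vendor's actual algorithm.

```python
from collections import Counter

def suggest_facets(matches, max_choices=10):
    """Pick facet fields and value counts worth offering for a result set.

    matches: list of dicts of document meta data, e.g. {"city": ..., "state": ...}.
    """
    facets = {}
    for field in ("city", "state"):
        # Only values present in the matches are counted, so a choice that
        # would return zero documents is never suggested.
        counts = Counter(doc[field] for doc in matches if field in doc)
        if len(counts) < 2:
            continue  # every match shares one value, so the choice is pointless
        if len(counts) > max_choices:
            continue  # too scattered; a broader field may still qualify
        facets[field] = dict(counts)
    return facets

scattered = [{"city": f"City{i}", "state": "TX" if i % 2 else "CA"} for i in range(12)]
print(suggest_facets(scattered))  # city is dropped, state is offered
```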
[back to top]
 
fact extraction
Related Terms:  entity extraction
An automated process of extracting specific facts from the text of many different documents. These systems usually do not use true artificial intelligence; they usually rely on simpler statistical analysis of words, phrases and entities. Sentences using ambiguous language or pronouns will usually not result in an extracted fact. If a fact appears consistently in many documents, it may be displayed in the results list.
[back to top]
 
FAQ
See Main Definition:  Frequently Asked Question
[back to top]
 
FAST Search and Transfer
Synonyms:  FAST
One of the high end vendors of enterprise search software. FAST stakes their reputation on searching incredibly large amounts of content very quickly. See http://fastsearch.com
[back to top]
 
feature vector
A set of interesting words, phrases or entities that are of statistical significance within a document. These specific items may be useful in finding other related documents, which should have a similar set of features.
[back to top]
 
federated indexing
Related Terms:  federated search
In contrast to federated search, federated indexing allows a single search engine to index content among many distributed systems, often crossing organizational boundaries. This allows a single search engine to search all of the distributed content.
[back to top]
 
federated search
Synonyms:  heterogeneous search
Related Terms:  federated indexing
Taking a user's search and sending it to multiple search engines, then combining the results back together. This approach is sometimes preferred as it doesn't require any single engine to index all of the content. Disadvantages can include different query languages for each engine, combining relevancy scores that use different scales, duplicate content, and timeout issues.
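Two of the disadvantages listed above, mismatched score scales and duplicate content, can be illustrated with a short Python sketch. It assumes each engine has already returned (url, raw_score) pairs; the function name merge_results and the choice to divide by each engine's top score are assumptions for this example, and query translation and timeouts are not covered.

```python
def merge_results(result_lists):
    """Combine result lists from several engines into one ranked list.

    result_lists: one list per engine of (url, raw_score) tuples.
    Scores are rescaled by each engine's top score so they share a 0..1 scale.
    """
    best = {}
    for results in result_lists:
        if not results:
            continue
        top = max(score for _, score in results)
        for url, score in results:
            scaled = score / top if top > 0 else 0.0
            # Duplicate content: keep only the best score seen for a URL.
            if scaled > best.get(url, -1.0):
                best[url] = scaled
    # Highest score first; ties are broken by URL for a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))

engine_a = [("http://a/1", 900), ("http://a/2", 450)]
engine_b = [("http://a/2", 0.8), ("http://b/1", 0.4)]
for url, score in merge_results([engine_a, engine_b]):
    print(url, round(score, 2))
```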
[back to top]
 
field
Related Terms:  Meta Data, zone, hybrid search
In traditional relational databases, fields were the pieces of data stored for each record in the database. Search engines have a similar concept, but tend to refer to these well defined pieces of data as Meta Data or document fields. Search engines also allow for large amounts of unstructured data, sometimes referred to as zones, which act more like database blobs. Having both types of data allows for hybrid searches.
[back to top]
 
fielded search
See Main Definition:  hybrid search
[back to top]
 
fielded search operator
Synonyms:  fielded search
Related Terms:  fulltext search operator
Search engines can perform searches that are very similar to traditional database searches, for example using the equals operator, or <=, >=, etc. When fielded search is combined with fulltext search operators, this is sometimes referred to as a filtered search or hybrid search.
[back to top]
 
field-level security
Synonyms:  sub-document security
Controlling access to specific parts of a document, such that different users can see different parts of the document. For example, a technical support person might be able to see most of the data for a customer, but not specific financial information.
[back to top]
 
file transfer protocol
Synonyms:  FTP
A network protocol for transferring files over the Internet. Some search engine spiders are able to retrieve and index documents stored on FTP servers.
[back to top]
 
filelist.txt
Synonyms:  sitelist.txt
Related Terms:  Autonomy Ultraseek
A file format stored on a web server that summarizes the recent changes to the site's pages. The Ultraseek spider can read this file and efficiently reindex only the pages that have been added or changed, without the need to respider the entire site. This file format is an open standard and is very easy to parse, although at this time only Ultraseek and Ultraspider support it.
[back to top]
 
filtered search
See Main Definition:  hybrid search
Related Terms:  scope of search
[back to top]
 
filtered search
Synonyms:  source query text
Related Terms:  fielded search operator
A portion of the query sent to the search engine that limits the scope of the search, but is not used for highlighting or relevancy calculations.
[back to top]
 
Firefox
Related Terms:  Mozilla
An open source web browser based on the Mozilla code base.
[back to top]
 
firewall
Related Terms:  Intranet, corporate network, private network
A device that separates a company's or institution's private Intranet from the public Internet and only lets carefully selected data cross between the two.
[back to top]
 
fixed summaries
See Main Definition:  static summaries
[back to top]
 
Flash
A software plugin for web browsers that allows for rich animation and highly interactive web pages. FLASH is generally not supported by most search engine spiders, and sites that require FLASH for navigation will typically have trouble being indexed.
[back to top]
 
form
See Main Definition:  web page form
[back to top]
 
FreeFind
Related Terms:  search engine, site search engine, ASP ("Application Service Provider")
An excellent low cost search engine for small to mid sized public web sites. FreeFind is an ASP, and therefore web sites using their service do not have to install any software on their local web server. See http://freefind.com
[back to top]
 
Frequently Asked Question
Synonyms:  FAQ, FAQs (plural)
Related Terms:  compound documents
A section of a web site that lists questions that are frequently asked, and the answers to those questions. Many sites also allow visitors to search over all these questions and answers with their search engine.
[back to top]
 
FTP
See Main Definition:  file transfer protocol
[back to top]
 
fulltext operator
Search engine query syntax that is specific to word and phrase matching, vs. more traditional field operators like =, <=, etc.
[back to top]
 
Full-Text search engine
See Main Definition:  search engine
Synonyms:  fulltext search engine
Usage: in this form the "Full-Text" prefix is used to emphasize the fact that these searches work on unstructured textual data, versus traditional databases' emphasis on structured data.
[back to top]
 
Full-Text search index
See Main Definition:  search indices
Synonyms:  fulltext index
Usage: in this form the "Full-Text" prefix is used to emphasize the fact that these searches work on unstructured textual data, versus traditional databases' emphasis on structured data.
[back to top]
 
fuzzy matching
Synonyms:  fuzzy search
Related Terms:  wildcard, typo, stemming, soundex
Allowing search terms to match a wide variety of words found in a document. There are many types of fuzzy matching, such as stemming, wildcard matching, thesaurus, and common misspellings or typos. Users may become frustrated if fuzzy matching brings back too many seemingly irrelevant matches, or if fuzzy matches are allowed to swamp exact matches.
[back to top]
 
Gartner Magic Quadrants
Synonyms:  Gartner Quadrants, Magic Quadrants
A yearly ranking of search engine vendors by Gartner, Inc. The report includes a graph based on two primary factors: vision and ability to execute; the upper right hand quarter of the graph indicates strength in both areas, and is the preferred ("magic") quadrant to be in. Many large companies use this report to help select which vendors to seriously evaluate. They publish similar reports for other industries as well.
[back to top]
 
generalized taxonomy
See Main Definition:  taxonomy
Usage: Just "taxonomy" is normally sufficient. The prefix "generalized" is used to distinguish a regular taxonomy from the new Behavior Based Taxonomies.
[back to top]
 
geocoded search
Synonyms:  location aware search, location sensitive search
Searchable data that includes longitude and latitude in its meta tags. Users can then restrict the scope of their search to content related to their local geographic area.
[back to top]
 
Google
Related Terms:  web search engine, Internet search syntax
The world's best known web search engine. http://google.com
[back to top]
 
Google Appliance
Related Terms:  enterprise search engine, Google
Google has packaged their web search engine into an actual computer case that can be installed at companies to provide enterprise search for their private network.
[back to top]
 
GUI ("Graphical User Interface")
Related Terms:  UI, Web UI
A more modern User Interface which allows the user to control software with a mouse and keyboard, and is graphically displayed as a set of windows, menus and icons. When the human user clicks or types, the UI then takes the appropriate action. Examples include Microsoft Windows, Mac OS, GNOME and KDE. Enterprise search vendors have added GUIs to their products over the years to make them easier to use and administer.
[back to top]
 
heterogeneous search
See Main Definition:  federated search
[back to top]
 
hierarchical data
Related Terms:  taxonomy
A way of organizing and storing information, where specific details are nested inside broader and broader categories of data. The broadest level of data can be thought of as the "root" of a "tree", while smaller and smaller levels of detail can be thought of as "branches". For example, the World has Countries. Countries have States. States have cities. Cities have streets. Etc. This data could be nested, such that the "World" would be the broadest item of information; the "root" of a hierarchical database storing geographical data. XML is a particular format of hierarchical data.
[back to top]
 
highlighting
See Main Definition:  search highlighting
[back to top]
 
histogram
Related Terms:  relevance histogram, search activity histogram
A graph of tabulated data items, where each bar represents the number of times that particular item appeared. When the bars are sorted from most frequent to least frequent, the graph slopes from the upper left to the lower right. The right side of the graph, where the counts trail off, is often called the long tail. Histograms are used in search engines for many things, including relevancy histograms for query tuning and search activity histograms that show the most popular searches.
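A search activity histogram can be tabulated from a query log in a few lines of Python. This is only a sketch; the function name search_histogram and the sample queries are made up for illustration.

```python
from collections import Counter

def search_histogram(queries, top=5):
    """Count how often each query was typed, most frequent first."""
    # Normalize case and surrounding spaces so trivially different
    # spellings of the same search are counted together.
    counts = Counter(query.strip().lower() for query in queries)
    return counts.most_common(top)

log = ["Vacation Policy", "vacation policy", "401k",
       "vacation policy ", "401k", "expense report"]
for query, count in search_histogram(log):
    print(f"{query:<16} {'#' * count}")
```

The bars printed for the least frequent queries at the bottom correspond to the long tail.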
[back to top]
 
hit
See Main Definition:  result list entry
[back to top]
 
hosted search
Synonyms:  hosted search engine
Related Terms:  ASP ("Application Service Provider"), local site search engine, search engine, FreeFind
A search engine that is packaged as an ASP; the advantage is that search can be easily added to a web site, without the need to install software locally. An example of hosted search is FreeFind (http://freefind.com).
[back to top]
 
Htdig
An early type of Internet search engine.
[back to top]
 
HTML ("Hypertext Markup Language")
Related Terms:  web page, World Wide Web, XML
The most common format used to create web pages on the World Wide Web. HTML looks somewhat similar to XML. Document files in this format often have an extension of .html or .htm.
[back to top]
 
HTML form
See Main Definition:  web page form
[back to top]
 
html scraper
See Main Definition:  web page scraper
[back to top]
 
HTTP
See Main Definition:  Hypertext Transfer Protocol
[back to top]
 
hybrid search
Synonyms:  fielded search, filtered search
Related Terms:  taxonomy, parametric search, faceted search, scope of search
A search that includes both fulltext and traditional database search criteria. For example, a tech support person could look for "installation errors" (full-text) within a particular product line (more like a traditional database field search). By combining together the additional criteria of "product='accounting software'", the tech support person gets a more targeted scope of search, and is more likely to find the installation error they were looking for.
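The tech support example above can be sketched in Python as follows. The function name hybrid_search, the document layout and the naive word matching are assumptions for illustration; real engines use their indices rather than scanning every document.

```python
def hybrid_search(documents, text_query, **field_filters):
    """Return documents whose body contains every query word and whose
    fields equal every filter value."""
    words = text_query.lower().split()
    hits = []
    for doc in documents:
        # Fielded part: behaves like a traditional database field search.
        if any(doc.get(field) != value for field, value in field_filters.items()):
            continue
        # Full-text part: naive word match against the unstructured text.
        body_words = set(doc["body"].lower().split())
        if all(word in body_words for word in words):
            hits.append(doc)
    return hits

docs = [
    {"id": 1, "product": "accounting software", "body": "Installation errors on startup"},
    {"id": 2, "product": "payroll software", "body": "Installation errors when printing"},
    {"id": 3, "product": "accounting software", "body": "Printing is slow"},
]
print(hybrid_search(docs, "installation errors", product="accounting software"))
```

Only document 1 satisfies both the full-text criteria and the product filter.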
[back to top]
 
Hypertext Transfer Protocol
Synonyms:  HTTP
A network protocol used by web browsers to talk to web servers.
[back to top]
 
IBM OmniFind
Related Terms:  search engine vendors
A brand name for several search engines offered by IBM. OmniFind Discovery Edition now includes iPhrase, a technology that IBM acquired.
[back to top]
 
IDF
See Main Definition:  inverse document frequency
[back to top]
 
IDOL
See Main Definition:  Autonomy IDOL
[back to top]
 
import/export
Related Terms:  relational database, indexer, spider
In traditional databases, data needed to be imported into the database system; when moving data from one system to another, data would be exported from the source system, and then imported into the destination system. Most full-text search engines do not offer robust import and export capabilities; some do offer import-only tools. Instead, search engines use the process of "indexing" or "spidering" to index the documents, and the original source documents are left where they were. Some third party vendors do offer limited import and export tools to move data between search engines.
[back to top]
 
incremental spidering
Related Terms:  spider
A method of spidering a web site that attempts to only download pages that are new or have changed. Over time, incremental spiders create a database of individual page URLs and track how often they change; they use this data to guess which pages need to be refetched and when. This form of spidering may delay the reindexing of pages that have recently changed, but that have historically been static. This method of spidering may also allow for "orphaned pages".
[back to top]
 
index (noun)
Synonyms:  collection, database index, word index, search indices, word inversion, binary indices
Related Terms:  indexer
Typically refers to a set of large binary data files stored on a disk.
[back to top]
 
index (verb)
Related Terms:  index (noun), indexer, spider, search indices, word index, database gateway
The tabulating and storing of data into the binary indices. The term has substantial technical differences when applied to search engines vs. traditional relational databases.
[back to top]
 
indexer
Synonyms:  document indexer
Related Terms:  search indices, spider, import/export, index (noun), index (verb)
Before a search engine can quickly search through documents, it must first create search indices that list every word in every document, along with information about each document's Meta Data. The program that performs this task is often referred to as an indexer. Usage: "indexer" is an older term, and is typically used when the process of indexing will be fairly simple and can be run from the command line; for more complicated web crawling the term "spider" is preferred.
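A very small version of such a word index can be sketched in Python. The function name build_index and the whitespace tokenizing are assumptions for illustration; production indexers also store word positions, Meta Data, and use compact binary files.

```python
from collections import defaultdict

def build_index(documents):
    """Map each word to the sorted ids of the documents that contain it.

    documents: dict of document id -> text.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

index = build_index({1: "annual sales report", 2: "sales forecast"})
print(index["sales"])   # documents containing the word "sales"
```

At search time the engine looks a word up in this structure instead of rereading every document.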
[back to top]
 
indexing pipeline
See Main Definition:  document pipeline
[back to top]
 
indigenous search engine
Synonyms:  native search engine
Related Terms:  embedded search engine, data silo
The built-in search engine inside of a system, such as the built-in search capability in a document management system, data silo or search appliance.
[back to top]
 
inference
A statistical method used by search engines to find relevant documents even if they do not contain the words in the user's query. This technology is not always accurate.
[back to top]
 
Infoseek
See Main Definition:  Autonomy Ultraseek
Infoseek was the original creator of Ultraseek. They were also an early web search engine.
[back to top]
 
Inktomi
See Main Definition:  Autonomy Ultraseek
Usage: Inktomi briefly owned Ultraseek. They bought it from Infoseek and then eventually sold it to Verity.
[back to top]
 
interactive results list
See Main Definition:  results list navigation
[back to top]
 
interactive search
See Main Definition:  results list navigation
[back to top]
 
Internet
Synonyms:  "The Net"
Related Terms:  World Wide Web
(vs. Intranet and Extranet) The global public computer network. Usage: Sometimes people are actually referring to the World Wide Web, which is a subset of the entire Internet. Examples of the Internet that are outside the scope of the World Wide Web include email, ftp, instant messaging, file sharing, etc. The Internet predates the World Wide Web by many years. Indexing and searching the entire public Internet is very different from handling Enterprise data on a private Intranet and Extranet.
[back to top]
 
Internet Explorer
Related Terms:  Firefox, Mozilla
The very popular web browser that ships on all Microsoft Windows machines.
[back to top]
 
Internet query syntax
See Main Definition:  Internet search syntax
[back to top]
 
internet search engine
See Main Definition:  web search engine
[back to top]
 
Internet search syntax
Synonyms:  Internet query syntax
Related Terms:  Google, Verity Ultraseek, web search engine, SQL
An informal set of syntax rules for expressing advanced searches in modern search engines. The most common attribute is the use of a plus sign ("+") to mean that a term is required, and a hyphen or minus sign ("-") to exclude a word from the search. Unlike the SQL used in relational databases, search engines do not have a universally accepted cross vendor syntax. Internet search syntax also often recognizes quotation marks to demark exact phrases, and ()'s to convey precedence.
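The plus, minus and quotation mark conventions can be illustrated with a small Python parser. This is a sketch only: the function name parse_query and the three output buckets are assumptions, parentheses are not handled, and each engine's actual rules differ.

```python
import re

# An optional + or - sign, followed by either a quoted phrase or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def parse_query(query):
    """Split a query into required, excluded and optional terms."""
    parsed = {"required": [], "excluded": [], "optional": []}
    for sign, phrase, word in TOKEN.findall(query):
        term = phrase or word
        if not term:
            continue  # ignore an empty pair of quotes
        bucket = {"+": "required", "-": "excluded"}.get(sign, "optional")
        parsed[bucket].append(term)
    return parsed

print(parse_query('+apple -pie "fresh baked" tart'))
```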
[back to top]
 
Intranet
Synonyms:  corporate network, enterprise network, secure network, private network, closed network
Related Terms:  firewall, enterprise search engine
The secured network connecting all the computers of a particular company or institution. Intranets are usually shielded from the public Internet via a device called a firewall.
[back to top]
 
inverse document frequency
Synonyms:  IDF
Related Terms:  document frequency
A popular mathematical technique used to calculate a document's relevancy to a particular search term. A term that appears in FEWER documents is assumed to be more important than a common word appearing in many documents.
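One common formulation is the logarithm of the total number of documents divided by the number of documents containing the term. The Python sketch below uses that formulation; the function name and the whitespace tokenizing are assumptions, and individual engines use their own variants.

```python
import math

def inverse_document_frequency(term, documents):
    """IDF as log(N / df), where df is the number of documents containing term."""
    containing = sum(1 for text in documents if term in text.lower().split())
    if containing == 0:
        return 0.0  # term never appears; avoid dividing by zero
    return math.log(len(documents) / containing)

docs = ["the quick fox", "the lazy dog", "the end"]
print(inverse_document_frequency("the", docs))  # common word, lowest weight
print(inverse_document_frequency("fox", docs))  # rare word, higher weight
```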
[back to top]
 
iPhrase
Related Terms:  OmniFind, interactive search
Originally a privately held search technology company, which has since been acquired by IBM and is now part of the OmniFind product line. iPhrase uses advanced techniques for recognizing and analyzing common phrases and abbreviations, and presenting them to users in an innovative way, which facilitates interactive search.
[back to top]
 
iterative search
See Main Definition:  results list navigation
[back to top]
 
JavaScript
A programming language for adding interactivity to web sites. Many search engine spiders do not understand JavaScript; if a site requires JavaScript for navigation, many spiders will not be able to index the site.
[back to top]
 
join
Related Terms:  relational database, table, SQL
In traditional databases, a "join" is a SQL query that pulls records from multiple tables and connects the records via common fields. For example, a join between the employee table and the department table could show the names of each department, and the names of each employee in that department. Full-text engines do not usually do "joins" at search time; if database data of that sort is to be searched then it would be joined at index time, not search time.
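Joining at index time amounts to flattening the related rows into one record per document before handing them to the indexer. A minimal Python sketch, assuming hypothetical employee and department rows held as dicts:

```python
def join_for_indexing(employees, departments):
    """Flatten employee and department rows into one searchable record each."""
    names = {dept["id"]: dept["name"] for dept in departments}
    return [
        # The department name is copied into every employee record, so no
        # join is needed when the record is searched later.
        {"employee": emp["name"], "department": names.get(emp["dept_id"], "")}
        for emp in employees
    ]

departments = [{"id": 10, "name": "Support"}, {"id": 20, "name": "Sales"}]
employees = [{"name": "Ann", "dept_id": 10}, {"name": "Bob", "dept_id": 20}]
print(join_for_indexing(employees, departments))
```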
[back to top]
 
JSP ("Java Server Pages")
Related Terms:  web server, ASP ("Active Server Pages")
A Sun programming language and environment for building interactive web sites. JSP stands for "Java Server Pages". Allows a programmer to easily embed computer programs inside of web pages; the computer programs are written in Sun's Java programming language. Though they have similar names, Microsoft ASP and Sun JSP are generally not compatible with each other.
[back to top]
 
K2
See Main Definition:  Autonomy K2
[back to top]
 
k2 spider
Related Terms:  Autonomy K2
A brokered spider used to build K2 collections.
[back to top]
 
KeyView
Synonyms:  KeyView filters, Key View
Related Terms:  Autonomy, Verity
A set of filters used to interpret various document types including Microsoft Word, Excel and PowerPoint. KeyView was bought by Verity, and is now owned by Autonomy.
[back to top]
 
knowledge worker
A white collar employee who works primarily with information, such as legal or medical documents, technical data, financial data, etc. These users often make heavy use of search engines to do their jobs, and benefit greatly from improvements to their search engine. They are often search power users.
[back to top]
 
late binding security
A less efficient method of providing document level security. A user's search is submitted directly to the search engine, so that the search engine returns all matching documents regardless of whether the user can see them or not. Then every single document is checked against the security system that controls access to documents, to verify whether the user can see it or not. Only allowed documents are then displayed to the user. This method is easier to implement than early-binding security, but puts a heavy load on the corporate security system since every single document in every results list needs to be checked. Further, if the user only has access to a small percentage of total documents, the system may need to screen hundreds or thousands of documents just to find 10 documents that the user is allowed to see on one page of results.
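The cost described above can be seen in a short Python sketch. The function name late_binding_page and the can_read callback are assumptions; can_read stands in for a call to whatever security system controls access to the documents.

```python
def late_binding_page(ranked_hits, user, can_read, page_size=10):
    """Filter an already ranked result list down to one page the user may see.

    Returns the visible page and the number of access checks that were needed.
    """
    page, checks = [], 0
    for doc_id in ranked_hits:
        checks += 1  # one round trip to the security system per document
        if can_read(user, doc_id):
            page.append(doc_id)
            if len(page) == page_size:
                break
    return page, checks

# Hypothetical user who may read only 1 document in 50.
hits = list(range(1000))
page, checks = late_binding_page(hits, "pat", lambda user, doc_id: doc_id % 50 == 0)
print(len(page), "documents shown after", checks, "access checks")
```

With access to only one document in fifty, filling a single page of 10 results takes 451 checks in this example.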
[back to top]
 
LDAP
See Main Definition:  Lightweight Directory Access Protocol
[back to top]
 
legacy data
Related Terms:  content mining
Information that is stored in a format that is not easy to work with using modern computer software. Previously "Legacy Data" often referred to reports and text that only existed in paper format, and could not be accessed by a computer. More recently, enterprise data published on web pages, or in PDF and Word documents, has become difficult for modern software to access and process in an automated way.
[back to top]
 
legacy data
Related Terms:  XPump, PDF
Important data that is stored in a format that cannot be easily indexed by search engines, or that presents other technical challenges. Paper documents are often cited as an example of legacy data. However, these days even content that is stored in some electronic formats such as HTML, Microsoft Word