Anatomy of a Search Engine
At the most basic level, search engines share these three logical components:
Each of these systems depends on the one before it. A search engine can't run searches without a fulltext index, and there can be no fulltext index if the documents were never fetched and indexed.
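The dependency chain can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular engine is built: the function names (fetch_documents, build_index, search) and the three sample documents are assumptions made up for the example, and the "fetch" stage is a stand-in that returns hard-coded text instead of crawling anything.

```python
from collections import defaultdict


def fetch_documents():
    """Data prep stage: hypothetical stand-in for a crawler, returns {doc_id: text}."""
    return {
        1: "the quick brown fox",
        2: "the lazy dog",
        3: "quick dog tricks",
    }


def build_index(docs):
    """Index stage: map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Search stage: return ids of documents that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Each stage consumes the output of the previous one.
docs = fetch_documents()
index = build_index(docs)
print(search(index, "quick dog"))
```

Note that search() never touches the documents themselves. It can only answer from the index, which in turn only exists because the documents were fetched first.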
Modern search engines have further subdivided the data prep, index, and search functions into additional subsystems to achieve better modularity and extreme scalability.
A fully exploded component view might look like:
Cross Page Links Database
Determine Mime Type
Determine Document Language
Word Inversion Index
Accept initial Query from the User
Even this outline is oversimplified for larger, more complex engines.
Traditional Monolithic Search
What's NOT a Search Engine
Note that it is technically possible to search in just one step by scanning the source material line by line every time a search term is entered. Because every query has to re-read all of the source material, this approach is very slow and inefficient, and we do not consider such systems to be true search engines.
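For comparison, a linear scan can be sketched as follows. The function name linear_scan and the sample lines are assumptions for illustration only. The point is that there is no index at all, so the cost of each query grows with the total size of the source material.

```python
def linear_scan(lines, term):
    """Return (line_number, line) for every line containing the term.

    No index is built: the whole input is re-read on every call.
    Matching is a case-insensitive substring test.
    """
    term = term.lower()
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if term in line.lower()
    ]


source = ["The quick brown fox", "jumps over", "the lazy dog"]
hits = linear_scan(source, "the")
print(hits)
```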
Examples of these linear-scan-based "pseudo-search-engines" include:
In addition to being very slow (relative to designs based on a fulltext index), these simpler pseudo-engines typically lack advanced capabilities such as stemming or thesaurus support.
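To give a rough idea of what stemming and thesaurus support mean in practice, here is a toy sketch. Everything in it is an assumption made for illustration: the suffix list is deliberately crude (real engines use far more careful algorithms), and the SYNONYMS table is a hypothetical one-entry thesaurus.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # illustrative only, not a real stemming algorithm
SYNONYMS = {"car": {"automobile"}}   # hypothetical one-entry thesaurus


def stem(word):
    """Strip one common suffix, keeping at least three leading characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(term):
    """Return the set of stems to look up in the index for one query term."""
    base = stem(term)
    return {base} | {stem(synonym) for synonym in SYNONYMS.get(base, set())}


print(stem("searching"), stem("searched"), stem("searches"))
print(expand_query("cars"))
```

With this kind of normalization applied both when indexing and when querying, "searching", "searched", and "searches" all resolve to the same index entry, and a query for "cars" can also match documents that only mention "automobile".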