Creating a Search Catalog
Defining a Catalog
Once you have installed Site Server Search on your NT server and defined
the host information, you can begin the process of defining a catalog.
A catalog must be defined before you can populate or query it.
Each catalog has the following properties:
- The catalog's name
From your ASP search script you can tell the Query object which catalog
to use by giving this name. This is useful when a single script has to
search more than one site.
- The catalog build type
This can be a Crawl (an Internet, intranet or file system crawl; this is
the most common method and is the one used throughout this example),
a Notification, or an ODBC database.
- Where to propagate the resulting catalog
Propagation is the process of copying the indexed catalog to the search
host server(s).
In addition, Crawl-based catalogs have the following properties:
- The Start Addresses
The location from which to start a crawl. This could be the
root directory of a domain, such as http://www.devguru.com/ for a Web crawl,
a UNC path or file path (for file system crawls), or a Microsoft Exchange
public folder.
A catalog definition can have multiple start addresses.
- Site Rules
These define how the crawler handles access to specific parts of a site.
- Document Types to Crawl
Allows you to specify which types of documents should be included in
the catalog, for example including HTML and ASP pages but excluding XLS files.
- Access Paths
Documents are accessed by the Crawler from one location, but can be
displayed to site visitors from a different location.
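To make these properties more concrete, the Python sketch below models a catalog definition as a small data structure. It is purely illustrative: Site Server keeps catalog definitions internally and you configure them through MMC, so the CatalogDefinition class, its field names, the host name SEARCHHOST1 and the http://buildserver/ address are all assumptions made up for this example and are not part of any Site Server API. The sketch shows one way a document type filter and an access path mapping might behave.

```python
import posixpath
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class CatalogDefinition:
    """Hypothetical model of a crawl catalog definition (not a Site Server API)."""
    name: str
    build_type: str          # "crawl", "notification" or "odbc"
    search_hosts: list       # hosts the finished catalog is propagated to
    start_addresses: list = field(default_factory=list)
    include_extensions: set = field(default_factory=set)
    # Maps the prefix the crawler reads from to the prefix shown to visitors.
    access_paths: dict = field(default_factory=dict)

    def wants(self, address):
        """Return True if the document's file extension is an included type."""
        path = urlparse(address).path
        extension = posixpath.splitext(path)[1].lower()
        return extension in self.include_extensions

    def display_address(self, crawled):
        """Rewrite a crawled address to the address visitors should see."""
        for crawl_prefix, display_prefix in self.access_paths.items():
            if crawled.startswith(crawl_prefix):
                return display_prefix + crawled[len(crawl_prefix):]
        return crawled


catalog = CatalogDefinition(
    name="devguru",
    build_type="crawl",
    search_hosts=["SEARCHHOST1"],  # hypothetical host name
    start_addresses=["http://www.devguru.com/"],
    include_extensions={".htm", ".html", ".asp"},
    # Hypothetical: crawl an internal server, show the public address.
    access_paths={"http://buildserver/": "http://www.devguru.com/"},
)

print(catalog.wants("http://www.devguru.com/index.asp"))   # True
print(catalog.wants("http://www.devguru.com/prices.xls"))  # False
print(catalog.display_address("http://buildserver/index.asp"))
# http://www.devguru.com/index.asp
```

Keeping the access path as a simple prefix substitution mirrors the idea that the crawler and the site visitor reach the same document through different locations.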
Creating a Catalog
For our sample ASP Site Search application we need to create a Crawl
catalog using the MMC Catalog Definition Wizard.
To create a Crawl catalog, start MMC. Under Search, double-click
a host to expand it and select Catalog Build Server. From
the Action menu, select New and then click Catalog
Definition with a Wizard. The Wizard is the easiest way to define
a catalog.
The following steps must be completed in order to define a search catalog:
- Catalog Name
As mentioned above, this name is used in your ASP code to specify
which catalog to search. For example, you may require three
site searches that each search a specific domain. These could also
be merged into a single overall site search catalog that searches
all domains.
In this case we are creating a single site search, so we will enter 'devguru'.
- Specify the Crawl Type
Web link - follows the hyperlinks within each document.
File crawl - crawls all files in a directory and its subdirectories. The
file name is used as the document path, e.g. 'c:\info\index.htm'.
Exchange crawl - crawls all messages in an MS Exchange public folder.
For our site search application, select Web link crawl.
- Start Address
You must specify at least one address to start the crawl from. As
this is a site search we will enter the URL of our home page (http://www.devguru.com).
- Search Hosts
This is the name of the host machine(s) to which you want the completed
catalog to be propagated. Choose at least one host from the list.
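As a rough illustration of the File crawl type described in the steps above, the following Python sketch walks a directory and its subdirectories and reports each file's full path, which is what ends up as the document path. This is not how Site Server itself is implemented; the file_crawl function and the temporary directory tree are assumptions used only to demonstrate the idea.

```python
import os
import tempfile


def file_crawl(start_dir):
    """Yield the full path of every file under start_dir, including subdirectories."""
    for folder, _subfolders, filenames in os.walk(start_dir):
        for filename in sorted(filenames):
            yield os.path.join(folder, filename)


# Build a small throwaway directory tree to crawl (a stand-in for a real folder).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "docs"))
for relative in ("index.htm", os.path.join("docs", "faq.htm")):
    with open(os.path.join(root, relative), "w") as handle:
        handle.write("<html></html>")

# Each document is identified by its file path rather than by a URL.
document_paths = sorted(file_crawl(root))
for path in document_paths:
    print(path)
```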
On the final screen you can choose to start the build immediately. If you
select this option, Site Server Search will begin crawling through
your site to build its catalog of documents.
After you have pressed Finish, you will see the new catalog
listed under the Catalog Build Server section.
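To give a feel for what a Web link crawl does once the build starts, here is a minimal Python sketch that begins at a start address and follows hyperlinks, staying on the same host. It is a simplified illustration under stated assumptions, not Site Server's actual crawler: the web_link_crawl function, the LinkExtractor class and the in-memory PAGES dictionary (which stands in for a real site so that no network access is needed) are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def web_link_crawl(start_address, fetch):
    """Breadth-first crawl that stays on the start address's host.

    fetch(url) returns the page HTML, or None if the page does not exist.
    """
    host = urlparse(start_address).netloc
    seen = {start_address}
    queue = deque([start_address])
    cataloged = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        cataloged.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            target = urljoin(url, link)  # resolve relative links
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return cataloged


# Hypothetical pages standing in for a real site; no network access is made.
PAGES = {
    "http://www.devguru.com/":
        '<a href="/about.asp">About</a> <a href="http://example.com/">Out</a>',
    "http://www.devguru.com/about.asp":
        '<a href="/">Home</a> <a href="contact.asp">Contact</a>',
    "http://www.devguru.com/contact.asp": "<p>No links here</p>",
}

result = web_link_crawl("http://www.devguru.com/", PAGES.get)
for address in result:
    print(address)
```

The link to example.com is skipped because it points to a different host, and the link back to the home page is ignored because that address has already been seen.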