ColdFusion
Administration
ColdFusion® 5
Macromedia® Incorporated
Copyright Notice
About This Book
Contents
• Intended Audience
• New Features
• Developer Resources
• About ColdFusion Documentation
• Getting Answers
• Contacting Macromedia
Intended Audience
Advanced ColdFusion Administration is intended for anyone who needs to perform
ColdFusion server management tasks, such as configuring advanced security or
managing clustered servers.
New Features
The following table lists the new features in ColdFusion 5:
Developer Resources
Macromedia Corporation is committed to setting the standard for customer support
in developer education, technical support, and professional services. The Web site is
designed to give you quick access to the entire range of online resources, as the
following table describes.
Book Description
Installing and Configuring ColdFusion Server: Describes system installation and basic configuration for Windows NT, Windows 2000, Solaris, and Linux.
Advanced ColdFusion Administration: Describes how to connect your data sources to the ColdFusion Server, configure security for your applications, and use ClusterCATS to manage scalability, clustering, and load-balancing for your site.
Developing ColdFusion Applications: Describes how to use ColdFusion Server to develop your dynamic Web applications, including retrieving and updating your data, using structures, and forms.
CFML Reference: The online-only ColdFusion Reference provides descriptions, syntax, usage, and code examples for all ColdFusion tags, functions, and variables.
CFML Quick Reference: A brief guide that shows the syntax of ColdFusion tags, functions, and variables.
Getting Answers
One of the best ways to solve particular programming problems is to tap into the vast
expertise of the ColdFusion developer communities on the ColdFusion Forums.
Other developers on the forum can help you figure out how to do just about anything
with ColdFusion. The search facility can also help you search messages from the
previous 12 months, allowing you to learn how others have solved a problem that
you might be facing. The Forums are a great resource for learning ColdFusion, and they are
also a great place to see the ColdFusion developer community in action.
Contacting Macromedia
This chapter describes how to create and configure ColdFusion data sources for
several databases using ODBC, OLE DB, and native drivers. It also describes how to
use ColdFusion to create a database file in a cfquery and how to use connection
string options.
For basic information on data sources and for information on how to connect to SQL
Server, Access, and Oracle databases, see Installing and Configuring ColdFusion
Server.
Contents
• About ColdFusion database drivers
• Using ColdFusion to Create a Data Source (UNIX only)
• Using Connection String Options
• Connecting to DB2 Databases
• Connecting to dBASE/FoxPro Databases
• Connecting to Excel Databases
• Connecting to Informix Databases
• Connecting to Sybase Databases
• Connecting to Text Databases
• Connecting to Visual FoxPro Databases
4 Chapter 1 Advanced Data Source Management
About OLE DB
OLE DB is a Microsoft specification for a set of interfaces designed to access data.
Although ODBC is primarily used to access SQL data in a platform-independent
manner, OLE DB is designed to access SQL and non-SQL data in an OLE Component
Object Model (COM) environment.
Note
OLE DB is available only on Windows NT/2000.
ColdFusion developers can access a range of data stores through Microsoft OLE DB,
including:
• MAPI-based data stores such as Microsoft Exchange and Lotus Mail
• Nonrelational data stores, such as Lotus Notes
• LDAP 2.0 data
• Data from OLE applications like word processors and spreadsheets
• Mainframe data
• HTML and text files, flat-file data
For more information, including a list of provider vendors, visit the Microsoft OLE
DB site at http://www.microsoft.com/data/oledb/.
Note
Before you install MDAC, stop all unnecessary services, such as Web servers, virus
scanning programs, or mail servers.
You should be aware of the following characteristics in how ColdFusion handles OLE
DB:
• The initial driver drop-down list box does not display all of the installed OLE DB
providers. If you are creating a data source using a provider other than
SQLOLEDB or Jet, such as MSDASQL or a MERANT OLE DB driver, you must
select other from the drop-down list box.
• No matter which provider you select from the drop-down list box, you must still
retype its name in the Provider field.
• When using MSDASQL, you must have an ODBC data source already defined for
the database. Enter this ODBC DSN in the ProviderDSN text box.
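Once an OLE DB data source is defined in the Administrator, a query can reference it by name. The following is a minimal sketch rather than a definitive implementation: the data source name MyOleDbSource and the authors table with its columns are hypothetical, and it assumes that the cfquery dbtype attribute is set to OLEDB for such data sources.

```cfml
<!--- Hypothetical names: MyOleDbSource (data source), authors (table). --->
<cfquery name="GetAuthors"
         datasource="MyOleDbSource"
         dbtype="OLEDB">
    SELECT au_lname, au_fname FROM authors
</cfquery>

<!--- Print one line per returned row. --->
<cfoutput query="GetAuthors">
    #au_lname#, #au_fname#<br>
</cfoutput>
```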
3 Enter a name for the new data source and select an OLE DB Provider from the
drop-down list.
Note
Do not name a ColdFusion data source Registry or Cookie, as these words are
reserved for use by ColdFusion.
4 Click Add.
The Create OLE DB Interface Data Source page displays:
Note
For the Server field, if the database is a local SQL Server database, enclose the
word local in parentheses: (local).
• If you are using another provider, enter its name as the Provider. Be aware
that MSDASQL requires a predefined ODBC data source for the database to
which you will connect. Enter the name of the ODBC data source in the
Provider DSN field.
Note
The omission of required username and password information is a common
reason why a data source fails to verify.
If ColdFusion cannot verify the data source, the Status displays as Failed. You can
run a cfquery against the failed data source to get more detailed information
about the problem. You also can try embedding a username and password into
the cfquery tag to see if the query works.
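A diagnostic query of this kind might look like the following sketch. The data source name myFailedDSN, the credentials, and the authors table are hypothetical placeholders. The cftry/cfcatch wrapper is an addition here, used only to print the database error details instead of the default error page.

```cfml
<!--- Sketch only: myFailedDSN, dbuser, dbpass, and authors are placeholders. --->
<cftry>
    <!--- Embed the credentials directly to rule out a missing login. --->
    <cfquery name="VerifyTest"
             datasource="myFailedDSN"
             username="dbuser"
             password="dbpass">
        SELECT * FROM authors
    </cfquery>
    <cfoutput>Query succeeded: #VerifyTest.RecordCount# rows returned.</cfoutput>

    <!--- On failure, show what the driver reported. --->
    <cfcatch type="Database">
        <cfoutput>
            <p>#CFCATCH.Message#</p>
            <p>#CFCATCH.Detail#</p>
            <p>Native error code: #CFCATCH.NativeErrorCode#, SQLSTATE: #CFCATCH.SQLState#</p>
        </cfoutput>
    </cfcatch>
</cftry>
```

If the query succeeds with embedded credentials but the data source still fails to verify, the missing username and password in the data source definition is the likely cause.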
About ColdFusion database drivers 9
If you are creating a UNIX data source, you might need to set environment
variables for your database client library by editing the ColdFusion start script in
<installdir>/coldfusion/bin. For detailed information about editing the
ColdFusion start script for your particular database, see the section about your
database.
Note
See the MERANT DataDirect ODBC Reference for details about SQL statements used
for flat-file drivers. The default location of this reference on UNIX machines is:
<installdir>/coldfusion/odbc/doc/odbcref.pdf. On Win32 machines, the default
location is: <installdir>/cfusion/bin/odbcref.pdf.
<HTML>
<HEAD>
<TITLE>dBASE Table Setup</TITLE>
</HEAD>
<BODY>
<!---
Before running this code, you need to create the
newtable data source in the ColdFusion Administrator,
specifying the MERANT dBASE/FoxPro ODBC driver.
--->
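<!---
Assumption: the table name Beans1 and the Bean_ID and Name
column definitions are illustrative; adjust them to your schema.
--->
<cfquery name="CreateTable" datasource="newtable">
CREATE TABLE Beans1 (
Bean_ID numeric(6),
Name char(50),
Date date,
Descript char(254))
</cfquery>
<cfquery name="QueryTest2" datasource="newtable">
SELECT * FROM Beans1
</cfquery>
<cfoutput QUERY="QueryTest2">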
#Bean_ID# #Name#<br>
</cfoutput>
</BODY>
</HTML>
The APP and WSID values are readily available when you run the above query. A SQL
Server DBA can use Profiler to view this information in a trace:
Using Connection String Options 13
Example
The following code uses a dynamic connection; no data source is defined in the
odbc.ini settings.
<cfquery name = "DATELIST"
dbtype=dynamic
blockfactor=100
connectstring="DRIVER={SQL SERVER};
SERVER=(local);
UID=sa;
PWD=;
DATABASE=pubs">
SELECT * FROM authors
</cfquery>
For dynamic connections, the ColdFusion Administrator Maintain Connect default
value is enabled. If you need to change this, you must use regedit to add a pseudo
__DYNAMIC__ key in the ColdFusion/CurrentVersion/DataSources Registry key and
specify a MaintainConnect value of 0.
Connecting to DB2 Databases 15
Option Description
Data Source Name A name for your data source.
Description Descriptive information about the data source.
Database Alias The DB2 database name.
Note
Although native driver performance is usually superior to ODBC performance, you
can connect to DB2 via ODBC on Windows. To do so, create the data source in the
Windows ODBC Data Source Administrator, using the IBM ODBC driver. In the
ColdFusion Administrator, configure any ColdFusion-specific settings, such as a
username and password.
Option Description
Data Source Name A name for your ODBC data source.
Description Descriptive information about the data source.
Database Name The name of the DB2/6000 database.
Cursors Preserve cursors at the end of each transaction. Select this
option if you want cursors to be held at the current position
when the transaction ends. Doing so can impact the
performance of your database operations.
When you create a database, it is automatically cataloged on the server with the database
alias (database_alias) the same as the database name (database_name). The client
uses the information in the database directory, along with the information in the
node directory, to establish a connection to the remote database.
3 Place the dll file generated in step 2 into the appropriate directory on the server.
For example, put the file on a server called DB2SERVER into the
C:\sqllib\function\ folder. You could also put it into the
C:\sqllib\function\unfenced\ folder.
4 Run a CREATE PROCEDURE statement to register your stored procedure.
• The CREATE PROCEDURE statement creates a row in the database catalog
(syscat.procedures table), making it visible to client applications, including
ColdFusion Server.
• The stored procedure’s name is what you called it in your SQC file. The
following example calls the stored procedure outsrv.
• The create procedure statement looks like this:
CREATE PROCEDURE server1
(OUT sal double, IN salind integer)
EXTERNAL NAME 'outsrv!outsrv'
LANGUAGE C
DETERMINISTIC
PARAMETER STYLE DB2DARI;
5 Grant users who need to run the stored procedure permission to execute it:
GRANT EXECUTE ON PACKAGE server1 TO PUBLIC;
Example
The following example demonstrates a CFSTOREDPROC tag that calls the stored
procedure named outsrv. The actual stored procedure name and the password
parameter are case sensitive.
<CFSTOREDPROC PROCEDURE="outsrv"
DATASOURCE="DB2SERVER"
USERNAME="DB2"
PASSWORD="DB2">
<CFPROCPARAM TYPE="OUT"
CFSQLTYPE="CF_SQL_DOUBLE"
VARIABLE="FOO" NULL="NO">
<CFPROCPARAM TYPE="IN"
CFSQLTYPE="CF_SQL_INTEGER"
VALUE="0"
NULL="NO">
</CFSTOREDPROC>
<CFOUTPUT>#FOO#</CFOUTPUT>
Connecting to dBASE/FoxPro Databases 21
Note
Because dBASE and FoxPro databases are configured identically in the ColdFusion
Administrator, they are discussed together in this section. For information on
connecting to Visual FoxPro databases, see “Connecting to Visual FoxPro Databases”
on page 37.
Option Description
Data Source Name A name for your ODBC data source.
Description Descriptive information about the data source.
Database Directory The path to the dBASE database that you want to use as an ODBC
data source.
Database Version Enter the version number of the dBASE or FoxPro database
that you want to use: dBASE versions III, IV, and 5.0 and
FoxPro versions 2.0, 2.5, and 2.6.
Driver Settings • Collating Sequence Determines the sequence in which
the fields sort.
• Page Timeout Specifies the period of time, in tenths of a
second, that an unused page remains in the buffer before
being removed.
Option Description
Data Source Name A name for your ODBC data source.
Description A short description of the data source.
Database Directory The name, including the complete path, of the database file
that you want to use as the ODBC data source.
Database Version The version number of the dBASE/FoxPro database that you
want to use: Clipper, dBASE versions III, IV, V, and FoxPro
versions 2.5, 3.0.
Data File Extension The file extension to use for data files. The default setting is
DBF. The setting cannot be more than three characters, and
it cannot be one the driver already uses, such as MDX or
CDX. The Data File Extension setting is used for all Create
Table statements.
• Use international collating sequence Determines the
order in which records display when you issue a Select
statement with an Order By clause.
If you do not select this option, the driver automatically
uses the ASCII sort order. This order sorts items
alphabetically with uppercase letters preceding lowercase
letters. For example, “A, b, C” sorts as “A, C, b.”
If you select this option, the driver uses the international
sort order as defined by your operating system. This sort
order is always alphabetic, regardless of case; the letters
from the previous example would sort as “A, b, C.”
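The difference between the two orders comes down to whether case takes part in the comparison. As a rough analogy only, the following sketch uses the CFML Compare and CompareNoCase functions (not the driver's own collation logic) to show why “C” sorts before “b” in ASCII order but after it when case is ignored.

```cfml
<cfoutput>
<!--- Case-sensitive: "C" (ASCII 67) is less than "b" (ASCII 98), so the result is -1. --->
Case-sensitive: #Compare("C", "b")#<br>
<!--- Case-insensitive: "c" follows "b", so the result is 1. --->
Case-insensitive: #CompareNoCase("C", "b")#<br>
</cfoutput>
```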
Option Description
Data Source Name A name for your ODBC data source.
Description A short description of the data source.
Database Directory The name, including the complete path, of the database file
that you want to use as the ODBC data source.
Database Version The version number of the dBASE/FoxPro database that you
want to use. ColdFusion supports dBASE V, IV, and FoxPro
v3.0.
Driver Settings • Use lowercase file extension (.dbf) Specifies whether
lowercase file extensions are accepted. Select this option
to accept lowercase extensions. Clear this option to accept
only uppercase extensions.
• Use international collating sequence Determines the
order in which records display when you issue a Select
statement with an Order By clause.
If you do not select this option, the driver automatically
uses the ASCII sort order. This order sorts items
alphabetically with uppercase letters preceding lowercase
letters. For example, “A, b, C” sorts as “A, C, b.”
If you select this option, the driver uses the international
sort order as defined by your operating system. This sort
order is always alphabetic, regardless of case; the letters
from the previous example would sort as “A, b, C.”
Option Description
Data Source Name A name for your ODBC data source.
Description Descriptive information about the data source.
Workbook/Directory The path and filename of the Excel workbook that you want
to use as the ODBC data source.
Version Enter the version number of the Excel workbook that you
want to use. The ColdFusion Administrator supports Excel
versions 3, 4, 5, 97, and 2000.
Driver Settings Rows to Scan The number of rows to scan to determine
the data type of each column. The data type is the one that occurs
most often among the scanned rows. If a value does not
match the data type guessed for the column, it is
returned as a NULL value.
Enter a number from 1 to 16 for the rows to scan. The default
value is 16. If this setting is 0, all rows are scanned. A
number outside the limit returns an error.
Connecting to Excel Databases 25
Option Description
Data Source Name A name for your data source.
Description Descriptive information about the data source.
Database Workbook A name that identifies the workbook file containing the Excel
database.
• International sort Determines the order in which
records display when you issue a Select statement with an
Order By clause.
If you do not select this option, the driver automatically
uses the ASCII sort order. This order sorts items
alphabetically with uppercase letters preceding lowercase
letters. For example, “A, b, C” sorts as “A, C, b.”
If you select this option, the driver uses the international
sort order as defined by your operating system. This sort
order is always alphabetic, regardless of case; the letters
from the previous example would sort as “A, b, C.”
Option Description
Data Source Name A name for your ODBC data source.
Description Descriptive information about the data source.
Database Name The name of the database to which you want to connect.
Host Name • The name of the machine on which the Informix server
resides.
• Use Informix registry for Logon ID and
Password Determines whether the server reads the
Logon ID and Password directly from the Informix
registry.
Server Port Number (Informix Dynamic ODBC Server Driver only) The number of the server port. This will match the number entered in the services file for the Informix server.
Service (Informix 7.x/9.x Driver only) The network services file. On Windows NT, the services file is located in C:\winnt40\system32\drivers\etc. On UNIX, the file is located in /etc.
Server Name The name of the Informix server as it appears in the sqlhosts file.
Protocol (Informix 7.x/9.x Driver only) The network protocol.
Connecting to Informix Databases 27
Option Description
Data Source Name A name for your data source.
Description Descriptive information about the data source.
Default Database The name of the database to which you want to connect by
default.
Server The name of the Informix server, including the full path.
Host The name of the machine on which the Informix server
resides.
Service The network services file.
On Windows NT, the services file is located in
C:\winnt40\system32\drivers\etc. On UNIX, the file is located
in /etc.
Protocol The network protocol.
Client Locale Specifies the language, territory, and code set that the client
application (ColdFusion) uses to perform operations that read
or write to the database.
Database Locale Specifies the language, territory, and code set that the
Informix server needs to interpret locale-sensitive data types.
Translation DLL Leave blank.
2 You must uncompress and/or untar this file into a separate subdirectory on your
server; for example: /opt/isdk.
This is the directory that you point to in the start script as INFORMIXDIR.
3 Run the script installclientsdk to install the client SDK.
4 Before you continue, verify that you can connect to the Informix server from a
client other than ColdFusion or with a utility such as iconnect.
Code Description
dbserver This name matches the value in your Informix server /etc/onconfig
file, and also matches the INFORMIXSERVER environment
variable in your /coldfusion/bin/start script.
nettype Determines what kind of network protocol to connect with.
hostname The host name of the server that hosts the database. You can enter
either the IP address or the host name.
service name The entry in the /etc/services or master NIS file for the port that
Informix listens on. This can also be the port number itself,
such as 1526.
Note
If necessary, check with your system administrator for the name of the service.
Option Description
Data Source Name A name for your ODBC data source.
Description Descriptive information about the data source.
Database Name The name of the database to which you want to connect.
Server Name The name of the server containing the Sybase tables that you
want to access. If not supplied, the initial default is the server
name in the DSQUERY environment variable. On UNIX, enter the
name of a server from your $SYBASE/interfaces file.
Server Port The port number that the Sybase server monitors for
requests. The default value is 5000.
Network Library (Windows only) The name of the network library. This specifies which network
protocol to use (Winsock or NamedPipes). The default is
Winsock. This option has no effect on UNIX; on UNIX, TCP/IP
is used.
Performance • Row Limit (Fetch Array Size on Windows) The number of
rows the driver retrieves from the server for a fetch. Selecting
this option can increase performance by reducing network
traffic.
• Create stored procedures (UNIX only) Determines
whether stored procedures are created on the server for
every call to SQLPrepare.
When enabled, stored procedures are created for every call
to SQLPrepare. This setting can result in poor performance
when processing static statements.
When disabled, the driver does not create stored procedures.
• Disable database cursors for Select statements
Determines whether database cursors are used for Select
statements. In some cases performance degradation can
occur when performing large numbers of sequential Select
statements because of the amount of overhead associated
with creating database cursors.
Connecting to Sybase Databases 33
Option Description
Data Source Name A name for your ODBC data source.
Description Descriptive information about the data source.
Server Enter the name of the server hosting the Sybase System 11
database.
Default Database Enter the name of the default database to use on the
specified server.
Enable RAISERROR Select to obtain user-defined errors
from stored procedures and triggers.
Note
If the Sybase database is on the same server as ColdFusion, make sure the $SYBASE
environment variable that you set up in the ColdFusion start script is pointing to the
Sybase client directory and not the Sybase server directory. Both of these directories
contain an interfaces file.
CFHOME=/opt/coldfusion
CFUSER=nobody
SYBASE=/work/sybclient11.1;export SYBASE
#II_SYSTEM=/home
# Set library search path
# NOTE: Add your database client library directory to the FRONT
# of this list
# Example:
# LD_LIBRARY_PATH=$SYBASE/lib:/usr/dt/lib:/lib:/usr/openwin/lib:
# $CFHOME/lib
LD_LIBRARY_PATH=$SYBASE/lib:/usr/dt/lib:/lib:/usr/openwin/lib:$CFHOME/lib
After you complete all the steps in this section, you must stop and restart ColdFusion
services to reload the odbc.ini file.
Connecting to Text Databases 35
Option Description
Data Source Name A name for your ODBC data source.
Description Descriptive information about the data source.
Database Directory The directory that contains the text files.
Extensions List Lists the filename extensions of the text files on the data
source. To use all files in the directory, enter *.*. To use only
files with specific extensions, add each extension that you
want to use.
Option Description
Data Source Name A name for your data source.
Description Descriptive information about the data source.
Database Directory The directory that contains the text files.
Extensions List Lists the filename extensions of the text files on the data
source. To use all files in the directory, enter *.*. To use only
files with specific extensions, add each extension that you
want to use.
Option Description
Table Type Select the default type of text file. ColdFusion supports
comma-separated, tab-separated, character-separated, fixed
length, and stream table types. The default type is used when
creating a new table and opening an undefined table.
• Column Names in First Line Select this check box to
use the first row of data in the text file as column names.
• International Sort Determines the order in which
records display when you issue a Select statement with an
Order By clause.
If you do not select this option, the driver automatically
uses the ASCII sort order. This order sorts items
alphabetically with uppercase letters preceding lowercase
letters. For example, “A, b, C” sorts as “A, C, b.”
If you select this option, the driver uses the international
sort order as defined by your operating system. This sort
order is always alphabetic, regardless of case; the letters
from the previous example would sort as “A, b, C.”
Connecting to Visual FoxPro Databases 37
Option Description
Data Source Name A name for your ODBC data source.
Description A short description of the data source.
Database Info • Path The name, including the full path, of the
database to which you want to connect.
• Visual FoxPro Database Connect to a Visual FoxPro
database (dbc file) and to all the tables and local views
in the database.
• Free Table Directory Connect to a directory of free
tables, that is, tables not associated with any particular
dbc file.
Driver Settings • Collating Sequence Select the collating sequence
that you want to use. The collating sequence determines
the sequence in which the fields sort.
• Exclusive Select this check box so that the driver
opens the Visual FoxPro database exclusively when you
access data using this data source. Other users cannot
access the database or the tables in the database while
the database is opened exclusively. Tables within the
exclusively opened database are opened as shared.
This option is not valid when you select the Free Table
Directory option.
• Fetch data in background Select this check box to
fetch records in the background (progressive fetching).
Otherwise, ColdFusion waits until all records in the
result set are fetched.
Chapter 2
Administrator Tools
The tools provided with ColdFusion Administrator make it easy for you to share Web
site files, analyze log files, and monitor Web site performance. This chapter
introduces the Administrator Tools included with ColdFusion Server 5 and their
benefits. The ColdFusion Administrator online Help provides additional information
about how to use these tools.
Contents
• Accessing the Administrator Tools
• Features on the Tools Tab
40 Chapter 2 Administrator Tools
Navigation bar
The left navigation bar lists the tools provided with ColdFusion Administrator. Note
that some of the tools provided are limited to the ColdFusion Server 5 Enterprise
Edition.
Features on the Tools Tab 41
Logging Settings
Use the Logging Settings page in the ColdFusion Administrator to specify where you
want to store your log files and which log file format you prefer to use when viewing
your log files. To access the Logging Settings page in the ColdFusion Administrator,
click Tools > Logging Settings.
Callouts on the Logging Settings page: Help button, Submit Change button, default logging directory.
On the Logging Settings page, you can accept the defaults or change them as needed.
Each time you make a change, you must apply the change by clicking Submit
Change.
By default, log files are stored in the CFusion\log directory and all log files are
saved using the ColdFusion 5 format. To learn more about the log settings and the
differences between the log file formats, click Help on the Logging Settings page.
Log Files
The Log Files page in ColdFusion Administrator enables you to view a list of all
generated log files from a single display. On this page, you can search and filter the
content of log files, store log files for future use, and remove log files that are no
longer needed. To access the Log Files page in ColdFusion Administrator, click
Tools > Log Files.
Callouts on the Log Files page: Help button, check boxes for viewing single or multiple log files, controls.
You can view single or multiple log files by checking the log files you want to view and
clicking View Log Files.
Use the individual controls when you want to search and filter log files, remove log
files, store log files for future reference, and/or schedule the storage of log files.
To learn more about the log files and their settings, click Help on the Log Files page.
Server Reports
The Server Reports supplied with ColdFusion Server 5 Enterprise Edition provide
instantaneous statistics about the performance of your ColdFusion Server. In
addition, some of these reports provide information that you can use to track server
configuration changes and view current configuration settings.
To access the Server Reports in the ColdFusion Administrator, click Tools > Server
Reports. The following table provides a brief overview of each report type.
For additional information about the Server Reports, click Help on the Server Reports
page.
Note
If ClusterCATS is installed on your machine, all ColdFusion System Monitoring
features appear in the ClusterCATS application and do not appear in the ColdFusion
Administrator. To learn how to use the System Monitoring features in ClusterCATS,
see the sections later in this book.
Help button
The easy-to-read tabular form on the Server Configuration page lists the names and
status of the Web servers configured on your local system along with the status of
each threshold setting and monitoring device configured. To learn more about the
information and management controls provided on this page, click Help on the
Server Configuration page.
Note
A monitoring device in ColdFusion can include Server Probes and/or a third-party
hardware load balancing device. The status for these monitoring devices only
appears on the Server Management page after each device is configured in
ColdFusion using the Server Probes page or Hardware Integration page. For more
information about the configuration options required for these monitoring devices
and their benefits, see the sections in this chapter on Server Probes and Hardware
Integration.
Server Probes
The Server Probes tool in the ColdFusion Administrator enables you to actively test
the health and operation of your local Web sites. Specifically, ColdFusion offers two
types of probes for monitoring your Web site environment:
• Default probes The default probes let you test the availability of the
ColdFusion Server or a specific URL.
• Custom probes The custom probes let you specify a test program to run as a
probe. Depending on the program executable that you specify, you can use a
custom probe to verify the availability of almost any part of your Web site such as
a database.
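One way to watch a dependency such as a database with a default URL probe is to point the probe at a small page that exercises that dependency. The following is a minimal sketch under stated assumptions: the page name healthcheck.cfm, the data source mySite, and the authors table are hypothetical, the status strings are arbitrary, and it assumes the probe only needs to request the page and read its response.

```cfml
<!--- healthcheck.cfm: hypothetical page that a URL probe could request. --->
<cftry>
    <!--- A cheap query that fails if the database is unreachable. --->
    <cfquery name="Ping" datasource="mySite">
        SELECT COUNT(*) AS n FROM authors
    </cfquery>
    <cfoutput>STATUS OK</cfoutput>

    <!--- Report the failure in the response body instead of an error page. --->
    <cfcatch type="Database">
        <cfoutput>STATUS FAILED: #CFCATCH.Message#</cfoutput>
    </cfcatch>
</cftry>
```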
You can easily configure a default or custom probe from the Server Probes page in
the ColdFusion Administrator. To access this page, click Tools > System Probes.
Callouts on the Server Probes pages: Help button, probe management controls, probe type setting, required Web server user-defined setting, optional user-defined settings.
The tabular form on the Server Probes page identifies the names and status of each
probe configured in ColdFusion along with the name of the Web server that the
probe is monitoring. The probe management controls let you suspend the operation
of a configured probe and/or create, edit, and remove probe configurations.
The Server Probe Setup page lets you configure the settings required to set up a
default or custom probe in ColdFusion. Use the Type drop-down list box to select the
type of probe you want to configure. For more information about how to configure a
default or custom probe in ColdFusion, click Help on the Server Probe Setup page.
Alarms
The Alarm Email Notification page in ColdFusion Administrator lets you set up alarm
notifications for critical events on your Web site. You
can choose to notify yourself or others when one of the following events occurs: Web
server failure, Web server busy, load balancing device unreachable, or system
probe failure.
To access the Alarm Email Notification page in ColdFusion Administrator, click Tools
> Alarms.
[Screenshot callouts: Help button; required user-defined notification fields.]
On the Alarm Email Notification page, you can choose to set up alarm notifications
for one or all events. To notify someone of an event, enter that person's e-mail address in the
Notification Recipient field. To learn more about how to configure alarm
notifications in ColdFusion, click Help on the Alarm Email Notification page.
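To make the idea of an event-driven notification concrete, the following Python sketch builds an e-mail message for one of the four alarm events listed in this section. It is only an illustration: the event identifiers, the subject format, and the function name are assumptions, and ColdFusion composes and sends its own alarm messages without any code on your part.

```python
from email.message import EmailMessage

# The four event descriptions come from this section; the dictionary keys
# are hypothetical identifiers chosen for the example.
ALARM_EVENTS = {
    "web_server_failure": "Web server failure",
    "web_server_busy": "Web server busy",
    "load_balancer_unreachable": "Load balancing device is unreachable",
    "probe_failed": "System probe failed",
}


def build_alarm_message(event: str, recipient: str, sender: str,
                        detail: str = "") -> EmailMessage:
    """Build a notification message for one alarm event."""
    if event not in ALARM_EVENTS:
        raise ValueError(f"unknown alarm event: {event}")
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"[ALARM] {ALARM_EVENTS[event]}"
    # Fall back to the event description when no detail text is supplied.
    message.set_content(detail or ALARM_EVENTS[event])
    return message
```

A message built this way could be handed to the send_message method of smtplib.SMTP in the Python standard library.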
[Screenshot callouts: Help button; required user-defined fields.]
To configure ColdFusion to work with Cisco Local Director, you must specify the
DNS name and IP address of the Local Director box and the DFP Port that the
ColdFusion Server uses to communicate with the Local Director box. For more
information about configuring Cisco Local Director with ColdFusion, click Help on
the Setting Up Load Balancing Hardware page.
The Archive and Deploy tools group in the ColdFusion Administrator includes the
following features: Archive Settings, Create Archive, Deploy Archive, and Archive
Security. A description of each of these features follows.
Archive Settings
The Archive Settings page in the ColdFusion Administrator lets you configure various
archive system settings that apply to all archive and deploy operations. To access the
Archive Settings page in ColdFusion Administrator, click Tools > Archive Settings.
[Screenshot callouts: Help button; archive working directory; controls for defining archive variables.]
The following table provides a brief description of the features presented on the
Archive Settings and Variable Definition page:
Feature                  Description

Archive working          The archive working directory text box lets you specify the
directory                directory where all archive and restore temporary files and
                         log files are written.
                         By default, the archive temporary files and log files are
                         written to the Cfusion\cfam\car\temp directory.

Save log files           The save log file controls let you specify when ColdFusion
                         writes archive events to a log file.
                         ColdFusion, by default, logs events to the archive log file
                         each time you create or restore an archive.

Controls for defining    The archive variable controls let you add, edit, and view
archive variables        archive variables in ColdFusion. Archive variables define
                         locations that you commonly archive and restore on your
                         system. The variable acts as an alias, saving you from
                         typing long paths to files you want to archive or restore.
                         The tabular form on the Archive Settings page identifies all
                         the archive variables supplied with ColdFusion plus all the
                         user-defined archive variables. You can click Add Variables
                         to define new variables or click a variable name shown in
                         the tabular form to edit the definition of an existing
                         variable.
                         All variable definitions in the ColdFusion Administrator are
                         defined and edited using the Variable Definition page. In
                         the Variable Definition page you must provide a name for the
                         variable definition and a full path to the file(s) that you
                         often archive and restore.

Default settings         You can use the default settings provided on the Archive
                         Settings page or change them as needed. Each time you make a
                         change on the Archive Settings page, you need to apply that
                         change by clicking Submit Changes.
To learn more about the archive settings and archive variables in ColdFusion, click
Help.
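The alias behavior of archive variables can be pictured with a short Python sketch. Everything here is hypothetical: the $name$ placeholder syntax, the variable names, and the paths are assumptions made for illustration and are not taken from the ColdFusion Administrator.

```python
# Hypothetical variable table: variable name -> full path it stands for.
ARCHIVE_VARIABLES = {
    "webroot": "C:/Inetpub/wwwroot",
    "cfroot": "C:/CFusion",
}


def expand_archive_path(path: str, variables: dict[str, str]) -> str:
    """Replace a leading $name$ alias with the path defined for that name."""
    if path.startswith("$"):
        end = path.find("$", 1)          # position of the closing marker
        if end > 1:
            name = path[1:end]
            if name not in variables:
                raise KeyError(f"undefined archive variable: {name}")
            return variables[name] + path[end + 1:]
    return path                          # no alias present; use the path as typed
```

With the table above, expand_archive_path("$webroot$/store/index.cfm", ARCHIVE_VARIABLES) returns "C:/Inetpub/wwwroot/store/index.cfm", which is the saving in typing that an alias provides.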
Create Archive
The Create Archive page in ColdFusion Administrator lets you create and edit
archive definitions and build archive files. To access the Create Archive page in
ColdFusion, click Tools > Create Archive.
[Screenshot callouts: Help button; Build Archive control; navigation bar to specify the items to archive.]
Use the controls on the Create ColdFusion Archive page to add, edit, and view
archive definitions. The tabular form on this page identifies all user-defined
archive definitions in ColdFusion. You can click Create Archive Definition to define
new archive definitions or click any definition name shown in the tabular form to
view and edit the settings of an existing definition.
All archive definitions are defined and edited using the Archive Definition page. Use
the navigation bar on the Archive Definition page to define the items you want to
archive and restore. Each time you make a change in the Archive Definition page you
must click Apply. You can remove items in the archive definition by clicking Delete.
After you create your archive definition, you can click Build Archive on the Create
ColdFusion Archive page. The Build Archive control creates a compressed archive file
(.car file extension) of your definition.
To learn more about creating archive files in ColdFusion, click Help on the Create
ColdFusion Archive page or the Archive Definition page.
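The general idea of building one compressed file from the items in a definition can be sketched in Python. The sketch uses an ordinary ZIP container as a stand-in; this guide does not describe the internal layout of a .car file, so the format, the function name, and the file names below are assumptions, and the result is not claimed to be deployable by ColdFusion.

```python
import zipfile
from pathlib import Path


def build_archive(archive_path: str, source_dir: str) -> list[str]:
    """Compress every file under source_dir into one archive file.

    Returns the relative names stored in the archive. The archive should be
    written outside source_dir so that it does not try to include itself.
    """
    root = Path(source_dir)
    stored = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(root.rglob("*")):
            if file.is_file():
                name = file.relative_to(root).as_posix()
                archive.write(file, arcname=name)
                stored.append(name)
    return stored
```

Storing names relative to the source directory keeps the archive independent of where the files originally lived, which matters when the archive is later restored to a different location.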
Note
After you build an archive file (car), you can deploy that archive file on your system or
securely send it electronically to another system. For more information about how to
deploy an archive file or securely send an archive file electronically, see the following
sections in this chapter on Deploy Archive and Archive Security.
Deploy Archive
The Deploy Archive page in ColdFusion lets you restore an existing archive file (.car
file) to either a location on your system or to a mapped network location.
To access the Deploy Archive page in ColdFusion Administrator, click Tools > Deploy
Archive.
The archive file retrieval control lets you specify the retrieval method required to
obtain the archive file (car file) you want to deploy. You can select one of three
controls: local, http, or ftp. Use local when the archive file is on your system or on a
mapped network drive. Use http if the archive file is posted on a Web site. Use ftp if
the archive file is posted on an FTP site. Alternatively, if you specify local as the
retrieval method, you can click Browse Server to specify the archive file’s location on
your system. After you specify the retrieval method and location of the archive file,
click Next on this page to specify the location to which to restore the file.
To learn more about how to deploy archive files in ColdFusion, click Help on the
Deploy Archive page.
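The choice among the three retrieval methods follows from where the archive file lives. The following Python sketch classifies a location string accordingly; the function name is hypothetical, and treating https locations as http is an assumption, since this guide mentions only local, http, and ftp.

```python
from urllib.parse import urlparse


def retrieval_method(location: str) -> str:
    """Classify an archive location as 'local', 'http', or 'ftp'."""
    scheme = urlparse(location).scheme.lower()
    if scheme in ("http", "https"):   # https handling is an assumption
        return "http"
    if scheme == "ftp":
        return "ftp"
    # Everything else, including drive letters such as "C:" and mapped
    # network paths, is treated as a file on the local system.
    return "local"
```

For example, a location on a mapped drive such as F:\archives\site.car is classified as local, while ftp://ftp.example.com/site.car is classified as ftp.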
Archive Security
The Archive Security page lets you digitally sign and/or encrypt your ColdFusion
archive files. With these features you can securely send and receive archive files
electronically.
By signing an archive file, you notify the recipient of the archive file that the file
actually came from you and has not been forged or tampered with. By encrypting an
archive file, you can help protect the contents of the archive file from intruders.
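The tamper-detection half of signing can be illustrated with a keyed hash. The Python sketch below uses an HMAC from the standard library, which relies on a secret shared by sender and recipient; it is a simpler stand-in and not the certificate-based signature that ColdFusion applies to archive files. The function names are hypothetical.

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> str:
    """Return a hex digest that depends on both the data and the secret key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, key: bytes, signature: str) -> bool:
    """Return True only if the data still matches the signature."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sign(data, key), signature)
```

Changing even one byte of the data causes verify to return False, which is the property that lets a recipient detect a forged or altered file. A certificate-based signature adds proof of who signed the file without requiring a shared secret.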
After you sign or encrypt an archive file in ColdFusion, you can then securely
exchange this file electronically by using any of the following transport methods:
• E-mail program Use an e-mail program, such as Microsoft Outlook, to
exchange secure archive files.
• FTP site Exchange secure archive files by posting the secure file on an FTP
(File Transfer Protocol) site.
• Web site Exchange secure archive files by posting the secure file on a
Web site.
• Shared file system Exchange secure archive files by posting the secure file to a
shared local or remote network location.
To sign or encrypt files in ColdFusion Administrator use the Archive Security page.
To access this page, click Tools > Archive Security.
Click the names of the settings in the navigation bar to import a security certificate,
sign an archive file, verify the signature of an archive file, encrypt an archive file, or
decrypt an archive file.
Note
Certificates are required to digitally sign a ColdFusion archive file or to verify the
signature of an archive file. You can obtain a certificate from a Certificate Authority
such as VeriSign, Inc., or you can generate a certificate using the Key Tool utility
provided with the Sun Microsystems JDK 1.3.
For details on how to import a certificate, sign an archive file, verify the signature of
an archive file, or encrypt and decrypt an archive file, click Help on the Archive
Security page in the ColdFusion Administrator.
Part II
ColdFusion Security
ColdFusion Security
This chapter introduces ColdFusion Server Basic and Advanced security features that
allow you to protect a wide variety of ColdFusion resources.
Contents
• Why Is ColdFusion Security Important?, page 60
• Choosing a Level of ColdFusion Security, page 62
• To Learn More About Security, page 67
60 Chapter 3 ColdFusion Security
Basic security
Basic security is the initial default security framework for ColdFusion and lets you
secure the ColdFusion server with password access:
• Application development Secure access to data sources and files with password
protection. Block access to several sensitive ColdFusion tags.
• Application deployment Prevent applications from executing several
ColdFusion tags that could be used to upload, delete, or otherwise manipulate
server files.
• Administrative Access Secure access to ColdFusion administrative functions
with password protection.
All editions of ColdFusion Server include Basic Security features. When you install
ColdFusion Server, Basic Security is automatically activated.
Advanced security
ColdFusion Server Professional and Enterprise editions include Advanced Security
features that provide scalable, granular security for building and deploying your
ColdFusion applications:
• Application development Control access to files, data sources and
administration for each developer on your team. Coordinate team development
on shared servers with the assurance that sensitive data and applications are
secure.
• Application deployment Create complex rules to programmatically control
access to functionality within applications. Provide multiple levels of user access
from within an application. Confine applications to secure areas that can flexibly
restrict the access applications have to directories, components, databases or
other resources on the server.
• Administrative access Assign different degrees of administrative access to
specified users.
Data encryption
Both Basic and Advanced security support the Secure Sockets Layer (SSL) protocol,
which encrypts Internet application protocols (like HTTP) with public key
cryptography. SSL protects against snooping, eavesdropping, or any sort of message
tampering when information is passed between clients and servers. Most Web
servers support SSL. The server administrator installs a private key that is used to
decrypt inbound data and encrypt outbound data. Once the key is installed, the Web
server automatically encrypts or decrypts data as it is received or transmitted.
If your Web server connections are encrypted with SSL, all communications,
including ColdFusion transmissions, are automatically encrypted. You do not have
to do anything from within ColdFusion to activate data encryption.
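Because encryption happens at the connection level, a client sees it without any involvement from the application. The following Python sketch, which uses the standard ssl module, shows a client opening an encrypted connection and reporting the negotiated protocol version. The function names are hypothetical, and current Python versions negotiate TLS, the successor to the SSL protocol named in this section.

```python
import socket
import ssl
from typing import Optional


def make_client_context() -> ssl.SSLContext:
    """Create a client context that verifies the server certificate and host name."""
    return ssl.create_default_context()


def negotiated_protocol(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Connect to host:port over an encrypted channel and return the protocol version."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # wrap_socket performs the handshake; all later traffic is encrypted.
        with context.wrap_socket(raw, server_hostname=host) as secured:
            return secured.version()
```

Nothing in the sketch refers to ColdFusion: any page served by the Web server, including ColdFusion pages, travels over the same encrypted channel.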
Note
If you turn off both Basic and Advanced security, all ColdFusion resources and server
administration functions become available to anyone who has access to the server.
When you install ColdFusion Server, leave Basic security passwords in place until you
have finalized your security plan and are ready to implement it.
As you begin to think about how you will secure your Web applications, keep these
important points in mind:
• Security is never absolute. Technology is fast-evolving and the Web is, by nature,
an environment that favors openness and access over privacy and security. You
should regularly review your security plans to make sure your company hasn’t
outgrown them.
• No single security model is perfect for every application or development
environment. For example, an intranet deployed only to employees from a server
behind your company’s firewall and an e-commerce site on the Web would have
very different security plans. When they plan applications, ColdFusion
developers must weigh the costs and benefits of the various security alternatives
in the context of the project requirements.
• Trust is perhaps the most important concept to consider when you are planning
any security strategy. Whether users decide to download something
from the Web usually depends on whether they trust the site. The site can engender
trust in any number of ways, by providing a digital certificate, for instance.
Similarly, how open you choose to make your ColdFusion environment depends
on whether or not all your users are trusted. Generally speaking, the level of trust
is inversely proportional to the level of security you need to implement. If trust is
high—for example, if your development group consists of five people and they all
access the ColdFusion server over a LAN—then you can probably manage with a
less secure environment. However, if trust is lower—for example, if you're an
Internet Service Provider (ISP) hosting a development site—then you will need to
implement a more complex and restrictive security plan. The more public the
application or development environment, the lower the level of trust.
Choosing a Level of ColdFusion Security 63
Basic security covers all phases of application development and deployment. Basic
security is a good solution for trusted users because it offers them a single access
level—complete control. Consider implementing Basic security if you have legacy
systems or other security models in place.
Basic security also requires very little support from the ColdFusion Server
administrator: You’ll want to choose a password that can’t be easily guessed and
change it regularly, but aside from that, Basic security won’t require much of your
time. Developers, on the other hand, will need to spend more time writing their
applications; granular run-time access security is possible with Basic security, but
involves custom development.
Advanced Security, on the other hand, allows you a great deal of flexibility and
control, but requires more time and greater effort to set up and maintain than Basic
security. Depending on how you implement it, Advanced Security can also affect
performance when developers try to access resources from ColdFusion Studio or
when users try to run ColdFusion applications.
The following sections examine the effects of Basic and Advanced security on
application development and deployment, and on administrative access to
ColdFusion Server. Remember that when you select Basic or Advanced security,
you’re making a global choice that affects all aspects of ColdFusion. You can’t, for
instance, select Basic security for server administration and Advanced security for
RDS. This section is organized by major task simply to help you prioritize your
security concerns and then select the type of ColdFusion security that best meets the
majority of your needs.
Developing applications
Basic and Advanced security both restrict access to ColdFusion servers from
ColdFusion Studio. You can restrict access by developers who connect to ColdFusion
servers over a local area network as well as by developers who use RDS to access
ColdFusion servers.
Deploying applications
Web applications present new security challenges for IT managers, administrators,
and application developers. Basic security leaves the bulk of runtime security
implementation to application developers. Advanced security makes it easier for
developers to authenticate users and authorize application access, because
Advanced security separates group membership and user logon maintenance from
security policy specification.
Note
You can access the ColdFusion Administrator either locally or remotely. Because the
ColdFusion Administrator is a Web-based interface, it inherits the level of encryption
you set on the Web server on which ColdFusion is installed. If the Administrator is
installed on a Web server that encrypts Web connections, information sent to the
server during remote server administration is automatically encrypted.
Contents
• About Basic Security, page 72
• Configuring Remote Development Security (RDS), page 73
• ColdFusion Remote Development Services (RDS), page 74
• Using a Password to Restrict Access to RDS, page 76
• Configuring Basic Runtime Security, page 77
72 Chapter 4 Configuring Basic Security
Installation defaults
The ColdFusion Administrator installs with secure access enabled. The password you
enter as part of the setup is saved as the default, so that when you open the
Administrator for the first time, you are prompted to enter the password. We
recommend that you continue to use Administrator security until you complete the
ColdFusion server configuration. Once you’ve determined your security
requirements, you may decide to set up Advanced security. For more information,
see Chapter 5, “Configuring Advanced Security” on page 79.
By using a LAN-based file access model and restricting developer data source
access to the local workstation, you can achieve a very secure development
environment.
Note
Password protection is enabled by default at server installation time. If you have not
explicitly disabled password access, then security is already configured for your
server.
Note
Whenever you make a change to Basic security settings, you need to stop and restart
the ColdFusion RDS service using the Services Control Panel in Windows or the stop
and start scripts on Solaris.
5 To specify a directory from which otherwise blocked tags can be executed, enter a
fully qualified path (using forward slashes) in the Unsecured Tags Directory field.
By default, this is the directory in which the ColdFusion Administrator is
installed.
ColdFusion displays an error message when it encounters a restricted tag in an
application. For more information about these tags, see the CFML Reference.
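If you keep directory locations in Windows form, a small helper can produce the fully qualified, forward-slash form that the Unsecured Tags Directory field expects. The following Python sketch is an illustration only; the function name and the sample path are hypothetical, and the check for a fully qualified path applies to Windows-style paths with a drive letter.

```python
from pathlib import PureWindowsPath


def to_forward_slash_path(path: str) -> str:
    """Convert a fully qualified Windows path to forward-slash form."""
    windows_path = PureWindowsPath(path)
    if not windows_path.is_absolute():
        # Relative paths are rejected because the field needs a full path.
        raise ValueError(f"not a fully qualified path: {path}")
    return windows_path.as_posix()
```

For example, to_forward_slash_path(r"C:\CFusion\unsecured") returns "C:/CFusion/unsecured".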
Chapter 5
Configuring Advanced Security
This chapter describes how to set up and configure ColdFusion Server advanced
security. Advanced security, which is based on Netegrity SiteMinder v. 4.11, lets you
protect a wide variety of ColdFusion resources.
Contents
• What is Advanced Security?, page 80
• Advanced Security Basics, page 81
• Advanced Security Implementations, page 84
• Creating an Advanced Security Framework, page 88
• Setting Up a Security Server, page 89
• Caching Advanced Security Information, page 91
• Defining User Directories, page 92
• Defining a Security Context, page 95
• Specifying Resources to Protect, page 96
• Implementing ColdFusion RDS Security, page 98
• Implementing User Security, page 99
• Implementing Server Sandbox Security, page 100
• Securing the ColdFusion Administrator, page 102
• Viewing a Map of your Security Framework, page 103
• An Example of ColdFusion Studio Security, page 104
• Advanced Security Single Sign-On, page 109
• Undocumented Tags and Functions, page 110
80 Chapter 5 Configuring Advanced Security
Note
If you have not already read Chapter 3, “ColdFusion Security” on page 59, take a few
minutes now to do so. That chapter discusses the differences between Basic and
Advanced security and helps you decide which type of security is best for your
ColdFusion environment.
Advanced Security Basics 81
User directories
User directories provide a listing of user information, such as the user’s name, login
password, and the names of any groups to which the user belongs. ColdFusion
Advanced Security lets you incorporate any of the following industry-standard user
directories:
• Lightweight Directory Access Protocol (LDAP) directory
• Windows NT domain
• ODBC data source
A user directory authenticates users by verifying that their credentials match those in
the directory. It tells you if someone is a valid user of the system. When you create a
security context, you select users and groups from a user directory and then
individually assign them access rights to ColdFusion resources. ColdFusion
developers then include code in their applications that checks if a user has rights to a
resource.
Because ColdFusion uses your existing LDAP directories, NT domains, or data
sources, you don’t have to create and maintain redundant user directories just to
develop or deploy ColdFusion applications. Using existing NT or LDAP provides an
added bonus: User groups to whom you assign security privileges automatically
inherit changes to group membership; no additional maintenance is required. For
example, suppose your company’s NT Domain contains a user group called BigDev.
You’ve used Advanced Security to give the BigDev group access to a number of
custom tags. Your company hires a new developer to work in the BigDev group.
When the new developer is added to the BigDev group in your company’s NT
domain, she’s automatically granted access to the custom tags because of her user
group affiliation.
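The BigDev example can be expressed as a short Python sketch. Access is granted to the group rather than to individuals, so a change in group membership changes access with no other maintenance. The data structures, the user names, and the custom tag name are hypothetical.

```python
# Hypothetical directory data: group name -> set of user logins.
directory_groups = {"BigDev": {"alice", "bob"}}

# Hypothetical grant: protected resource -> groups allowed to use it.
resource_grants = {"custom_tag:CF_Report": {"BigDev"}}


def has_access(user: str, resource: str) -> bool:
    """Return True if the user belongs to any group granted the resource."""
    allowed_groups = resource_grants.get(resource, set())
    return any(user in directory_groups.get(group, set())
               for group in allowed_groups)
```

Adding a new login to directory_groups["BigDev"] is enough for has_access to return True for that user; the grant itself is never edited.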
Resource types
The ColdFusion resources that you want to protect are the core of Advanced security.
Selecting a resource to protect doesn’t specify how to protect it or which users can
access it; you’re simply telling ColdFusion the name and, if applicable, the action of
the resource you intend to secure. For example, you can control:
• Write access to all the files in a specified directory
• Which actions of a specified CFML tag are restricted
• Inserts and updates for a specific ColdFusion data source
Resources are not secured until you specifically choose to protect them. You can
secure the following types of resources:
• Applications
• Verity Collections
• Components
• ColdFusion Tags
• ColdFusion Functions
• Custom Tags
• Data Sources
• Files and Directories
• User Objects
• Users
Policies
After you specify a resource to protect, you need to create a policy that gives a set of
users access rights to that resource. A policy binds resources to users or user groups,
that is, it grants a group of users access to specified resources.
For example, you can create a policy that gives members of a team complete access
to three data sources that the team uses regularly. You could also create a policy that
specifies the system administrator as the only user who can use the cffile tag’s
write action.
If you specify a resource to protect but do not include it in any policy, the resource is
fully protected within the security context. In other words, no users have access to
that resource.
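The relationship between protected resources and policies can be summarized in a Python sketch: a resource that was never protected is unrestricted, and a protected resource is available only to members of a policy that includes it. The class, the function, and the sample names are hypothetical; they model the rule described above and are not ColdFusion code.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    name: str
    resources: set = field(default_factory=set)   # (resource, action) pairs
    members: set = field(default_factory=set)     # user or group names


def is_allowed(user, resource, action, protected, policies) -> bool:
    """Decide whether a user may perform an action on a resource."""
    key = (resource, action)
    if key not in protected:
        return True    # never selected for protection, so not restricted
    # Protected: allowed only through a policy that names both the
    # resource and the user. With no such policy, nobody has access.
    return any(key in policy.resources and user in policy.members
               for policy in policies)
```

The last case is the one that surprises people: protecting the write action of cffile without adding it to any policy locks out every user, administrators included.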
Security contexts
A security context is a container for logically related groups of policies.
You can create and implement as many security contexts as your application or
development environment requires:
• You can reuse a single security context, implementing it across several
applications.
• If you are deploying a more complex application, you may need to create more
than one security context for that application alone.
• If you’re managing a fairly small, homogeneous group of developers, you can use
a single security context for an entire ColdFusion application server.
• You can create a separate security context for each of your development groups.
This approach is recommended if you administer a hosted development
environment or if your developers access ColdFusion resources remotely.
Note
This chapter describes the steps necessary to install Advanced security features and set
up the security framework in the ColdFusion Administrator. Once you’ve put the
security framework in place, developers must code security features into their
ColdFusion applications. For information about coding secure applications, see
Developing Web Applications with ColdFusion.
Advanced Security Implementations 85
Note
The security sandbox feature is only available in the Enterprise edition of ColdFusion
Server.
Implementation summary
The details of your ColdFusion Server Advanced Security implementation depend
largely on your platform and how you decide to store security policy information.
Security policy information can be stored in one of three ways:
• Using the Access database file supplied by default with ColdFusion Server
(Windows only)
• Using the ODBC data source of your choice
• Using an LDAP directory server. LDAP is the only option on UNIX.
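The platform constraint on policy storage can be written down as a simple check. The following Python sketch encodes only what the list above states; the function names and the short store identifiers are hypothetical.

```python
def policy_store_options(platform: str) -> list[str]:
    """List the policy store types that the text allows for a platform."""
    if platform.lower() == "windows":
        # Access database (Windows only), ODBC data source, or LDAP.
        return ["access", "odbc", "ldap"]
    # The text states that LDAP is the only option on UNIX.
    return ["ldap"]


def validate_policy_store(platform: str, store: str) -> None:
    """Raise ValueError if the store type is not available on the platform."""
    if store.lower() not in policy_store_options(platform):
        raise ValueError(f"{store} policy store is not available on {platform}")
```

A check like this is useful in a deployment script that targets both platforms, because a plan built around the default Access database cannot be carried over to a UNIX server.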
Once you have decided on a method of storing security policy information, the
implementation details are essentially the same regardless of platform and storage
type. ColdFusion Advanced Security is implemented by defining the following
elements in order:
1 A security server.
2 A user directory, in the form of an NT domain, an LDAP directory, or an ODBC
data source.
3 A security context, with specific resource types to protect.
4 Specific ColdFusion rules to protect resources of a type supported by the
security context.
5 Policies that bind users and groups to rules for a security context.
Setting Up a Security Server 89
Note
It’s a good idea to take the ColdFusion server offline while you’re configuring
Advanced security.
2 Select the Use Advanced Server Security check box. This enables you to set up a
security context with policies, rules, and users. Click Submit Changes.
3 In the configuration page that appears, enter information for the following
advanced security configuration areas:
• Security Server Connection Settings
• Security Server Caching Settings
5 Enter a username and password if the domain, directory, or data source requires
one. You can leave these fields blank if ColdFusion Server is running under
Administrator access.
6 Select the Secure Connect check box to implement encrypted transmission of
authentication information. Secure Connect must be enabled when accessing an
LDAP server over Secure Sockets Layer (SSL).
7 Leave the Add User Directory to Existing Security Context check box selected to
add users from this user directory to existing security contexts automatically. If
you disable this option, you must manually associate users with each security
context you create.
8 If your user directory is an NT Domain or ODBC data source, click Add to define
the directory. If your user directory is an LDAP directory, complete the steps that
follow to set LDAP directory options.
Note
ColdFusion 5 introduces a new Resource View in Advanced security. This view
provides an easy-to-use, graphical way to specify resources you want to protect and
add them to policies. Once you’ve specified user directories and created security
contexts, you can configure all Advanced security settings in the new Resource View.
To protect resources:
1 In the Advanced Server Security page, click Resources.
You see the Resource View page.
2 Select a security context from the Current Security Context drop-down box.
In the Resource Browser, any resource type you selected when you created the
current security context appears next to an icon that depicts a closed lock. This
icon indicates that you can protect individual resources of this type. Resource
types you did not select when you created the current context appear next to an
icon that depicts an open lock.
3 In the Resource Browser, select a resource type and then click the Add Resource
button at the bottom of the page.
You see the Add Resource dialog. The contents of this dialog are different for each
resource type. For example, if you select CFML Tags, you see a drop-down list that
contains all the ColdFusion tags; if you select Files and Directories, you see a text
box where you enter the name of the file or path to protect.
4 Specify the resource to protect and click OK.
You see the Resource View page again. At the bottom of the page, you see the
Policy Editor for the resource you just specified.
5 Click Add Policy.
6 Enter a name for the new policy and click OK.
For example, you could create a top-level security policy, called Platinum, to
grant certain users broad access to protected resources.
7 Write a description of the policy and click OK.
Specifying Resources to Protect 97
You see the Resource View page again, showing the policy you just created. Other
available policies appear in a drop-down box at the bottom of the page.
8 Select the check boxes that correspond to the actions you want to protect.
Now you can add users to the policy.
Note
Only groups are displayed when you add users to a policy. To enter an individual
user, you must know the user login and enter it in the Enter User box. Displaying a
list of all possible individual users, which could easily number in the thousands,
would be a very impractical means of adding individual users to a policy.
The users you have added to the security policy are now matched to the resources
that you have also defined and added to the policy.
Note
If both user security and server sandbox security are enabled, sandbox security takes
precedence.
In order to implement server sandbox security, you must use the ColdFusion
Administrator to:
1 Set up the security server. See “Setting Up a Security Server” on page 89 for more
information.
2 Set up user directories to authenticate against an NT domain, an LDAP directory,
or an ODBC data source. See “Defining User Directories” on page 92 for more
information.
3 Create a security context for the application. See “Defining a Security Context” on
page 95 for more information.
4 Specify individual resources to protect and set up policies that match secured
resources with authorized users and groups. See “Specifying Resources to
Protect” on page 96 for more information.
5 On the ColdFusion Administrator’s Advanced Server Security page, select the Use
Security Sandbox Settings check box and then click the Security Sandboxes
button at the bottom of the page.
You see the Registered Security Sandboxes page.
6 In the Security Sandbox box, enter a fully qualified path (using forward slashes)
for the directory whose contents you want to protect.
7 Select the type of sandbox to create from the Type drop-down:
• Choosing Operating System protects OS-level resources based on privileges
assigned through a Windows NT domain.
• Choosing Security Context protects ColdFusion resources based on privileges
assigned through a security context.
8 Click Add.
You see the New Sandbox page, with the path you entered in step 6 already in the
Location box.
9 Specify a Windows NT Domain or a security context:
• If you chose Operating System in step 7, enter the NT Domain to authenticate
against in the NT Domain box.
Implementing Server Sandbox Security 101
Note
Before you can configure ColdFusion Administrator security, you must know how to
create a user directory. If you don’t know how to create a user directory, see “Defining
User Directories” on page 92.
2 Enter the server name or a TCP/IP address for the LDAP option. If you specify an
LDAP directory, you can fill out the Lookup Start field with uid= and the Lookup
End field with ,ou=ou_name,o=org_name. If you leave the Lookup fields blank,
the ColdFusion Studio user will have to enter the entire distinguished
name rather than just the user name.
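The effect of the Lookup Start and Lookup End fields can be pictured as simple string wrapping. The following Python sketch assumes that the server places the Lookup Start string before the user name and the Lookup End string after it; the function name and the user name jsmith are hypothetical and used only for illustration.

```python
def build_distinguished_name(user_name, lookup_start, lookup_end):
    """Wrap a short user name in the Lookup Start and Lookup End strings.

    Assumption: the two Lookup fields are concatenated around the name
    exactly as entered. This is an illustration, not ColdFusion code.
    """
    return f"{lookup_start}{user_name}{lookup_end}"


# "jsmith" is a hypothetical user; ou_name and org_name are the
# placeholders used in the step above.
full_dn = build_distinguished_name("jsmith", "uid=", ",ou=ou_name,o=org_name")
print(full_dn)
```

With both Lookup fields blank, the same function returns only what the user typed, which is why the user would then have to type the full distinguished name.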
Adding policies
Now that you’ve selected the resources to protect, add two policies, one named
MARS and one named VENUS. At the bottom of the Resource View page, you see the
Policy Editor for the resource you just specified.
To add policies:
1 Click Add Policy.
2 Enter MARS as the name for the new policy and click OK.
3 Write a description of the policy and click OK.
You see the Resource View page again, showing the policy you just created.
4 Select all the check boxes to protect all actions.
Now you can add users to the policy.
• C_R_FILE
• C_W_FILE
• C_DEVELOPMENT_R_FILE
• C_DEVELOPMENT_W_FILE.
Now the MARS policy has access rights to the mars_dsn and all files in the
c:\development\mars directory and sub directories.
Notice that we did not add any of the wildcard rules named ALL_, which protect all data
sources and files. The policies have access only to the resources explicitly defined in
their member rules. The policies now have rules, but users still don’t have access.
The next step is assigning users and groups to the policies.
Administrative Functions
In addition to standard CFML functions, the ColdFusion 5 Administrator uses the
following undocumented functions:
• CF_SETDATASOURCEUSERNAME() Sets the default user name for a
ColdFusion data source
• CF_SETDATASOURCEPASSWORD() Sets the default password for the
ColdFusion data source
• CF_ISCOLDFUSIONDATASOURCE() Verifies a connection to a ColdFusion data
source
• CF_GETDATASOURCEUSERNAME() Gets the default user name for a
ColdFusion data source
• CFUSION_VERIFYMAIL() Verifies the connection to the default ColdFusion
SMTP mail server
• CFUSION_GETODBCINI() Gets ODBC data source information from the
Registry
• CFUSION_SETODBCINI() Sets ODBC data source information in the Registry
• CFUSION_GETODBCDSN() Gets the ODBC data source names from the
Registry
Undocumented Tags and Functions 111
Administrative Tags
In addition to standard CFML tags, the ColdFusion 5 Administrator uses the
following undocumented tags:
• CFINTERNALDEBUG Used for internal ColdFusion debugging by product
development and to compile templates to PCode without executing them (used by the
CFML Syntax Checker).
• CFSECURITYADMIN Used for updates to Advanced Security information.
Part III
Advanced Verity Tools
This part describes a number of Verity tools and utilities you can use
for configuring the Verity K2 Server search engine, as well as creating,
managing, and troubleshooting Verity collections. The following
chapters are included:
Configuring Verity K2 Server............................................................ 115
Indexing XML Documents ................................................................137
Verity Spider .....................................................................................145
Managing Verity Collections with the mkvdk Utility ..........................185
Verity Troubleshooting Utilities .........................................................199
Chapter 6
Configuring Verity K2 Server
This section provides information about setting up and configuring the Verity K2
server, which is installed with ColdFusion Server.
Contents
• Overview .................................................................................................................. 116
• About K2 Server ....................................................................................................... 118
• Starting K2 Server .................................................................................................... 120
• Stopping K2 Server .................................................................................................. 122
• Editing the k2server.ini File .................................................................................... 124
• k2server.ini Parameter Reference .......................................................................... 127
• Using the rck2 Utility to Search K2 Documents.................................................... 131
• Error Messages ........................................................................................................ 132
116 Chapter 6 Configuring Verity K2 Server
Overview
ColdFusion Server 5 includes an OEM-restricted version of the Verity K2 Server,
which incorporates a highly scalable search server architecture. K2 supports
simultaneous indexing of distributed enterprise repositories and handles hundreds
of concurrent queries and users. You will see considerable performance
improvements when using K2 Server to search Verity collections.
The version of K2 Server that is part of ColdFusion 5 is restricted in the following
areas:
• For ColdFusion Professional, K2 Server can search a maximum of 125,000
documents.
• For ColdFusion Enterprise, K2 Server can search a maximum of 250,000
documents.
Make sure that the k2server.exe is running on the host you specify in the Verity
Server hostname field. Also, the port number you enter must correspond with the
port number you specify in the k2server.ini file. The default port number value in
the k2server.ini file is 9901.
About K2 Server
K2 Server is a search engine designed to process searches quickly in a
high-performance, distributed system. The K2 search system uses a client/server
model. K2 client applications, such as ColdFusion applications, provide users access
to document indexes stored in Verity collections.
K2 Server is a multi-threaded application built around the Verity search engine,
providing access to Verity collections and tracking any changes made by indexing
applications.
The K2 search system is designed to take advantage of the latest advances in
hardware and software technology and provides the following features:
• Multi-threaded architecture
• Support for Verity knowledge retrieval features, including topics
• Continuous operation support
• Incremental squeeze
• High scalability
Installation details
K2 is installed by default with ColdFusion Server, but it is activated manually by
invoking a command file executable.
• The K2 Server installed with ColdFusion is a restricted version. ColdFusion is
allowed to interact with only one K2 Server.
• If you install a fully licensed version of Verity K2 Server and configure ColdFusion
to use the K2 broker, ColdFusion will not restrict document searches.
• The restricted version of K2 Server installed with ColdFusion has document
search limits as follows: 125,000 documents (ColdFusion Professional) and
250,000 documents (ColdFusion Enterprise). Macromedia Spectra sites have a
limit of 750,000 documents.
Note
To use the K2 mode, you must edit the server registration file k2server.ini,
configure ColdFusion to use K2 Server, and restart the K2 Server executable,
k2server.exe.
Starting K2 Server
The ColdFusion installer places the K2 files into the following directories:
• Windows platforms: cfusion\bin
• UNIX: opt/coldfusion/verity/<platform>/bin
The K2 Server is started from the command line or from a script in the UNIX
environment and can be integrated as a service within the Windows NT
environment. The server is designed to run with a minimum of intervention. Most
configuration parameters are set in a configuration file, which can be given a
user-assigned name (the default file name is k2server.ini).
Command-line arguments include the name of the configuration file, the TCP port
for incoming connections and the verbosity level for informational messages. The K2
Server has a warm restart capability, designed to keep the server’s well-known TCP
port open in case of a crash and to allow changes in the configuration file to be
initialized without killing the primary server process.
The K2 Server is started by using the following command:
k2server [<option1> <option2> ...]
The options available for this command are summarized in the following table:
exit 0
Stopping K2 Server
You can run K2 Server either as a Windows service or in a command window, as an
ordinary application. Unless you use the -ntService 1 option when starting K2
Server, K2 runs in the command window.
fi
fi
}
exit 0
The k2server.ini file contains a large number of parameters that you probably won’t
need to change. To get started quickly, focus on the following sections in the
k2server.ini file:
• vdkHome (line 33 in the k2server.ini file listing on page 125)
• The Coll-n sections of k2server.ini (beginning at line 66 in the k2server.ini
file listing on page 125)
In the file listing for k2server.ini, the collection section is on lines
66-78.
For complete details on k2server.ini parameters, refer to “k2server.ini Parameter
Reference” on page 127.
Server section
The following table describes the keywords that can be used in the [server] section of
the server configuration file. A sample configuration file (k2server.ini) is provided
with the K2 Server executable.
The server section parameters are as follows:
Parameter Description
serverAlias An arbitrary name used to identify the server.
numThreads Default number of search threads to be started in the server process. If too
many threads exist, the system can run out of memory; if too few threads exist,
then searches will be blocked and forced to wait for a Verity engine thread to
become free. The value of numThreads is based on hardware resources and
system needs.
maxFiles The maximum number of file handles that can be opened by a specific search
thread. The default value for maxFiles is dependent on the limits of the OS
used. The maxFiles value affects how file handles are shared between the
operating system and the search engine. The maxFiles and numThreads
values together can be used to tune system performance.
These values can be set for a server:
[server]
numThreads=4
maxFiles=100
The above entries for a K2 Server cause the system to support a maximum of
4 concurrent searches, with 100 file handles allocated for each search thread.
The search engine determines default values per operating system. For large
or fragmented collections, it is recommended that you explicitly set a value for
maxFiles.
portNo TCP port number for client connections. The value of portNo is the same value
assigned to portNo in the k2broker.ini file that identifies the broker referring to
this server.
numListeners Maximum number of clients that can connect to the server at one time. The
numListeners value must be equal to or greater than the sum of all
numThreads values specified by all K2 Brokers in the K2 search system. The
numThreads value is set for a K2 Broker in the k2broker.ini file.
broker(n) Brokers to ping on startup. Multiple brokers may be specified. For example:
broker(1)=machinea:9900
broker(2)=machineb:9901
maxColSize The maximum width of the fields to return to the results list, in bytes. Default is
2048 bytes.
Keyword Description
vdkHome Directory containing Verity resources.
vdkSortingFlag A flag indicating whether the Verity engine will sort at the collection
level. Valid values are:
• NO or False or 0 to not perform sorting at the collection level
(default)
• YES or True or 1 to perform sorting at the collection level.
To implement sorting at the collection level you must set
vdkSortingFlag to YES in the k2server.ini file (in the [server]
section) and the k2broker.ini file (in the [broker] section).
sortTruncDocs Maximum number of documents to consider when sorting.
accessProfile Security Access Profile specified in the form of a query expression.
The security access profile represents the access question that a
document must pass in order for users to have access to it.
topicSet Default path name to a directory for the default topic set, which is an
indexed set of topics. The value of topicSet identifies the default topic
set to make available to clients at start-up by every search service.
knowledgeBase Default path name to a knowledgebase map file, which identifies
numerous topic sets (indexed topics). The value of knowledgeBase
identifies the topic sets (multiple) to make available to clients at
start-up for every search service.
charMap A string that names the character set to use for strings that are sent
into the server, and are generated by the server. This string must
correspond to the name of a .cs file in the root of the common
directory that configures a character set and its mappings. For
example, if your application should use character set 8859 for all of its
interactions with the server, then set this charMap to the string 8859.
Valid values include, but are not limited to, the character sets supplied
by Verity: 850 (default) for code page 850; 8859 for code page 8859.
locale The name of the locale (combination of language, dialect, and
character set) to use for all internal Verity engine operations. This
name must correspond to a subdirectory in the common directory
where the configuration file for the locale is found and where the
message database and other locale-specific files are located. Leaving
this keyword null means the server will use the default internal locale,
which is “english” written in the “850” character set.
resultCacheTimeout Timeout in milliseconds for the result cache. Timeout occurs after 60
seconds or when the cache overflows based on
resultCacheQuota.
resultCacheQuota The number of slots per segment for the result cache. The result
cache is composed of 16 segments, each of which has a number of
slots for caching items in: K2SearchNew, K2SearchRecv,
K2DocReadBatch. Timeout occurs after resultCacheQuota
value * 16.
If resultCacheQuota=10, each of the segments has 10 slots. Note
that since a search operation involves a call to K2SearchNew and a
call to K2SearchRecv, an additional slot is used.
resultCacheEnabled A flag indicating whether the result cache is enabled. Valid values are:
• Yes or True or 1 enables the result cache.
• No or False or 0 disables the result cache (default).
By default, the cache is not enabled.
resultCacheMaxInBytes Amount of memory, in bytes, to use for the cache.
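The arithmetic behind several [server] keywords can be checked with a short script. The following Python sketch assumes that k2server.ini can be read as plain INI text; the sample values, the function names, and the broker thread counts are illustrative assumptions, not recommended settings.

```python
import configparser

# Sample [server] section that uses only keywords described in the tables above.
# The values are illustrative.
SAMPLE_INI = """\
[server]
numThreads=4
maxFiles=100
numListeners=8
resultCacheQuota=10
"""

RESULT_CACHE_SEGMENTS = 16  # the result cache is composed of 16 segments


def load_server_section(ini_text):
    """Parse INI text and return the [server] section with key case preserved."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # configparser lowercases keys unless told otherwise
    parser.read_string(ini_text)
    return parser["server"]


def summarize(server, broker_threads):
    """Derive the totals implied by the [server] settings.

    broker_threads lists the numThreads value of each K2 Broker. It is
    supplied by hand here; this sketch does not read k2broker.ini files.
    """
    threads = server.getint("numThreads")
    return {
        # each search thread can open up to maxFiles file handles
        "file_handles": threads * server.getint("maxFiles"),
        # each of the 16 segments has resultCacheQuota slots
        "cache_slots": server.getint("resultCacheQuota") * RESULT_CACHE_SEGMENTS,
        # numListeners must be at least the sum of the brokers' numThreads
        "listeners_ok": server.getint("numListeners") >= sum(broker_threads),
    }


summary = summarize(load_server_section(SAMPLE_INI), broker_threads=[4, 4])
print(summary)
```

For the sample values, 4 search threads with 100 file handles each give 400 handles in total, and a resultCacheQuota of 10 gives 160 cache slots across the 16 segments.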
Collection sections
The K2 Server initializes a separate search service for each collection that you
identify in the server configuration file. To add one or more collections to the
configuration file, enter a separate block of keywords for each collection in the
following format:
[Coll-n]
collPath=<pathname>
topicSet=<topicset>
knowledgeBase=<knowledgeBase>
numThreads=<value>
maxFiles=<value>
onLine=<value>
maxColSize=<value>
locale=<language>
charmap=<charmap>
inputDateFormat=<format>
Increment the block label for each collection that you configure, starting with Coll-0.
The following table lists the keywords used to configure each collection and search
service:
Keyword Description
collPath The path name identifying the collection home directory.
collAlias An arbitrary name used to identify the collection.
topicSet The path name to a directory for the default topic set, which is an indexed
set of topics. The value of topicSet identifies the default topic set to make
available to clients at start-up by every search service. If not specified, the
value of topicSet from the [server] section is used.
knowledgeBase The path name to a knowledgebase map file, which identifies numerous
topic sets (indexed topics). The value of knowledgeBase identifies the topic
sets (multiple) to make available to clients at start-up for every search
service. If not specified, the value of knowledgeBase from the [server]
section is used.
numThreads The number of concurrent searches for the collection. If not specified, the
value of numThreads from the [server] section is used.
maxFiles The maximum number of files that can be opened by a specific search
thread for a collection. If not specified, the value of maxFiles from the
[server] section is used. The maxFiles and numThreads values together can
be used to tune system performance. These values can be set for a
collection:
[Coll-0]
numThreads=4
maxFiles=100
The above entries for collection 0 cause K2 to support a maximum of 4
concurrent searches, with 100 file handles allocated for each search thread.
onLine A flag indicating whether the server starts up with the collection on-line.
Valid values are:
• 0 to start the server with the collection off-line;
• 1 to start the server with the collection in a hidden state;
• 2 to start the server with the collection on-line (default).
In the hidden state, collections can be primed and tested, but are not yet
available for searching by users. When collections are set off-line, any
queries currently running complete using these resources; subsequent
queries do not see the resource.
maxColSize The maximum width of the fields to return to the results list, in bytes. If not
specified, the value of maxColSize from the [server] section is used.
charMap A string that names the character set to use for strings that are sent into the
server, and are generated by the server. This string must correspond to the
name of a .cs file in the root of the common directory that configures a
character set and its mappings. If not specified, the value of charMap from
the [server] section is used.
For example, if your application should use character set 8859 for all of its
interactions with the server, then set this charMap to the string 8859. Valid
values include, but are not limited to, the character sets supplied by Verity:
850 (default) for code page 850; 8859 for code page 8859.
locale The name of the locale (combination of language, dialect, and character
set) to use for all internal Verity engine operations. This name must
correspond to a subdirectory in the common directory where the
configuration file for the locale is found and where the message database
and other locale-specific files are located. If not specified, the value of locale
from the [server] section is used.
inputDateFormat The input date format to be used. If there is no specified value for
inputDateFormat, the default is MDY (Month-Day-Year), a numeric format.
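When a server hosts several collections, the numbered [Coll-n] blocks can be generated rather than typed by hand. The following Python sketch shows one way to do that; the function name and the collection paths are hypothetical, and only keywords listed in the table above are used.

```python
def render_collection_blocks(collections):
    """Render one [Coll-n] block per collection, numbering from Coll-0.

    collections is a list of dicts whose keys are keywords from the
    collection table (collPath, collAlias, numThreads, and so on).
    """
    blocks = []
    for n, keywords in enumerate(collections):
        lines = [f"[Coll-{n}]"]
        lines.extend(f"{key}={value}" for key, value in keywords.items())
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Hypothetical collection paths, for illustration only.
ini_text = render_collection_blocks([
    {"collPath": r"c:\collections\docs", "collAlias": "docs"},
    {"collPath": r"c:\collections\faq", "numThreads": 4, "maxFiles": 100},
])
print(ini_text)
```

Keywords that are left out of a block, such as numThreads in the first collection here, fall back to the values in the [server] section as described in the table.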
Using the rck2 Utility to Search K2 Documents 131
rck2 syntax
The syntax used to start rck2 from the command line is:
rck2 -server <servername> -port <portno>
For example:
c:\cfusion\bin\rck2 -server localhost -port 9901
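If you start rck2 from a script, it can help to build the argument list in one place. The following Python sketch only assembles the arguments shown in the syntax line; the function name is hypothetical, and the defaults (localhost and port 9901) are taken from the example and the default port mentioned earlier in this chapter.

```python
def rck2_command(server="localhost", port=9901, executable="rck2"):
    """Return the rck2 command line as an argument list.

    Only the -server and -port options shown in the syntax line are used.
    """
    return [executable, "-server", server, "-port", str(port)]


command = rck2_command()
print(" ".join(command))
```

On a machine where rck2 is installed, the returned list can be passed to a process launcher such as subprocess.run.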
Error Messages
All K2 Client API functions return an error code, and K2Success is the successful
return value. A complete listing of API error codes follows.
Warnings
Error Code No. Description
K2Warning_CollectionDown (10) The collection was down when it was opened.
K2Warning_QueryComplex (11) Too many matching words.
K2Warning_LowMemory (12) Memory is low for indexing.
K2Warning_CollectionReadOnly (13) The collection is read-only.
K2Warning_DriverNotFound (14) Couldn’t locate specified driver.
K2Warning_LargeToken (15) Returned a token greater than maxSize.
K2Warning_ArgTooLarge (16) Argument too large.
K2Warning_DataSrcNotAvail (17) Cannot locate collection data.
K2Warning_SearchRestricted (18) Searching subset of collection.
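Client code that logs K2 return values often needs to turn a numeric code back into a readable message. The following Python sketch mirrors the warning table above; the class name, member names, and helper function are hypothetical Python names, not part of the K2 Client API.

```python
from enum import IntEnum


class K2Warning(IntEnum):
    """Warning codes from the table above (Python names are illustrative)."""
    COLLECTION_DOWN = 10
    QUERY_COMPLEX = 11
    LOW_MEMORY = 12
    COLLECTION_READ_ONLY = 13
    DRIVER_NOT_FOUND = 14
    LARGE_TOKEN = 15
    ARG_TOO_LARGE = 16
    DATA_SRC_NOT_AVAIL = 17
    SEARCH_RESTRICTED = 18


DESCRIPTIONS = {
    K2Warning.COLLECTION_DOWN: "The collection was down when it was opened.",
    K2Warning.QUERY_COMPLEX: "Too many matching words.",
    K2Warning.LOW_MEMORY: "Memory is low for indexing.",
    K2Warning.COLLECTION_READ_ONLY: "The collection is read-only.",
    K2Warning.DRIVER_NOT_FOUND: "Couldn't locate specified driver.",
    K2Warning.LARGE_TOKEN: "Returned a token greater than maxSize.",
    K2Warning.ARG_TOO_LARGE: "Argument too large.",
    K2Warning.DATA_SRC_NOT_AVAIL: "Cannot locate collection data.",
    K2Warning.SEARCH_RESTRICTED: "Searching subset of collection.",
}


def describe_warning(code):
    """Return the description for a warning code, or a fallback message."""
    try:
        return DESCRIPTIONS[K2Warning(code)]
    except ValueError:
        return f"Unknown warning code {code}"


print(describe_warning(13))
```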
Chapter 7
Indexing XML Documents
This chapter provides an overview of the process of configuring Verity for indexing
XML files.
Contents
• Indexing Overview .................................................................................................. 138
• Style Files ................................................................................................................. 139
• Indexing XML Documents...................................................................................... 143
138 Chapter 7 Indexing XML Documents
Indexing Overview
The addition of Verity K2 to ColdFusion 5 includes the ability to index and search
XML documents. To be properly indexed, XML data files must be well-formed XML
documents, as specified in the Extensible Markup Language Recommendation
(http://www.w3.org/TR/REC-xml).
Briefly stated, a well-formed XML document contains elements that begin with a
start tag and terminate with an end tag. One element, which is called the root or
document element, cannot appear in the content of another element. For all other
elements, if the start tag is in the content of another element, the end tag is also in
the content of the same element.
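Before you index a batch of XML files, you may want to confirm that each one is well formed. The following Python sketch uses the standard library XML parser for that check; it is independent of Verity, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One root element, properly nested start and end tags.
print(is_well_formed("<doc><title>Report</title></doc>"))
# End tags out of order: the title element is not closed inside doc.
print(is_well_formed("<doc><title>Report</doc></title>"))
# Two top-level elements: there is no single root element.
print(is_well_formed("<doc/><doc/>"))
```

The second and third samples fail for the two reasons described above: improperly nested tags and more than one root element.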
The XML data files must have a .xml extension if the universal filter is used. If
documents do not have a .xml extension, you can index XML documents into an
XML-only collection by specifying the XML filter in the style.dft file.
Implementation summary
Verity support for XML documents is implemented by an XML filter file and
controlled using a number of style files. The style files can be found in the following
locations:
• cfusion\verity\Common\style (Windows)
• opt/coldfusion/verity/common/style (UNIX)
• cfusion\verity\common\style\file (Windows)
• cfusion\verity\common\style\custom (Windows)
• opt/coldfusion/verity/common/style/file (UNIX)
• opt/coldfusion/verity/common/style/custom (UNIX)
Style Files
The following style files are required to enable indexing of XML files. Default style
files are installed in the cfusion\verity\common\style directory (Windows)
and the opt/coldfusion/verity/common/style directory (Linux and UNIX).
style.uni file
To index XML documents, the style.uni file must include the following lines:
type: "text/xml"
/format-filter = "flt_xml"
/charset= guess
/def-charset = 8859
<?note:
<?note:
? "preserve" indexes xmltag as zone with the presence of
? <ignore xmltag="*" />
?>
<?next 1 sample line commented out:
<preserve xmltag="section_3" />
?>
<?note:
? "suppress" will suppress every xmltag embedded within
?>
<?next 2 sample lines commented out:
<suppress xmltag="region_1" />
<suppress xmltag="region_3" />
?>
<?note:
? "field" will further index content between the beginning
? and end of this pair of xmltags as field values
?>
<?next 1 sample line commented out:
<field xmltag="column_1" />
?>
<?note:
? if attribute "fieldname" is present, above content will
? be indexed into VDK field under the value of fieldname
? instead of the field under the name of xmltag
?>
<?next 1 sample line commented out:
<field xmltag="column_2" fieldname="vdk_field_2" />
?>
<?note:
? if attribute "index" is set to "override", above content
? will be indexed into VDK field overriding values read in
? from bulk insert file, if any
?>
<?next 1 sample line commented out:
<field xmltag="column_3" index="override" />
?>
<?note:
? fieldname & index attributes could both exist
?>
</style.xml>
Command Description
field Indexes the content between the pair of specified XML tags as field
values. By default, the field name is the same as the xmltag value,
unless otherwise specified by the fieldname attribute.
Attributes:
• xmltag
• fieldname
• index
ignore Skips indexing of xmltag but indexes the content between the pair of
specified XML tags.
Attribute:
• xmltag
preserve Indexes specified xmltag as a zone if preceded by ignore
xmltag = "*".
Attribute:
• xmltag
suppress Suppresses every xmltag embedded within the specified xmltag.
Attribute:
• xmltag
The following command indexes the content between the start and end tags of the
specified xmltag as a field, which is given the same name as xmltag:
<field xmltag = "column_1"/>
The following command indexes the content between the start and end tags of the
specified xmltag as a field, which is given the name specified in the fieldname
attribute:
<field xmltag = "column_2" fieldname = "vdk_field_2"/>
The following command indexes the content between the start and end tags of the
specified xmltag as a field, overriding any existing value of the field:
<field xmltag = "column_2" index = "override"/>
Note
Both fieldname and index attributes can be used in a field command.
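The naming rule for the field command can be illustrated outside Verity. The following Python sketch collects the text between the start and end tags of each listed xmltag and stores it under the fieldname value when one is given, or under the xmltag name otherwise. It is a simplified model, not the XML filter itself: the function name and sample document are hypothetical, and the index attribute is not modeled.

```python
import xml.etree.ElementTree as ET


def extract_fields(xml_text, field_commands):
    """Map field names to the text found inside the matching XML elements.

    field_commands is a list of dicts with an "xmltag" key and an
    optional "fieldname" key, mirroring the field command attributes.
    """
    root = ET.fromstring(xml_text)
    fields = {}
    for command in field_commands:
        tag = command["xmltag"]
        name = command.get("fieldname", tag)  # default: field named after the tag
        for element in root.iter(tag):
            text = "".join(element.itertext()).strip()
            fields.setdefault(name, []).append(text)
    return fields


# Hypothetical document that uses the tag names from the examples above.
sample = "<row><column_1>alpha</column_1><column_2>beta</column_2></row>"
fields = extract_fields(sample, [
    {"xmltag": "column_1"},
    {"xmltag": "column_2", "fieldname": "vdk_field_2"},
])
print(fields)
```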
style.ufl file
If administrators have defined custom fields to be populated in the style.xml file,
the fields must also be defined in the style.ufl file or style.sfl file, using standard
syntax.
style.dft file
To create a collection that contains only XML documents, administrators can modify
the style.dft file to invoke the XML filter directly. In this case, the XML documents
do not need a .xml extension.
The style.dft file must include the following lines:
$control: 1
dft:
{
field: DOC
filter="flt_xml"
}
Indexing XML Documents 143
Verity Spider
This chapter contains basic Verity Spider documentation, explaining how to index
documents on your Web site.
Contents
• Overview .................................................................................................................. 146
• Verity Spider Syntax ................................................................................................ 148
• Core Options............................................................................................................ 151
• Processing Options ................................................................................................. 153
• Networking Options................................................................................................ 159
• Paths and URLs Options ......................................................................................... 163
• Content Options...................................................................................................... 168
• Locale Options......................................................................................................... 176
• Logging Options ...................................................................................................... 178
• Maintenance Options ............................................................................................. 180
• Setting MIME Types ................................................................................................ 181
146 Chapter 8 Verity Spider
Overview
The Verity Spider enables you to index Web-based and file system documents
throughout the enterprise. Verity Spider works in conjunction with the Verity
KeyView document filtering technology so that more than two hundred of the most
popular application document formats can be indexed, including Office 2000 and
WordPerfect, ASCII text, HTML, SGML, XML and PDF (Adobe Acrobat) documents.
Restart capability
When an indexing job fails, or for some reason the Verity Spider cannot index a
significant number or type of URLs, you can now restart the indexing job to update
the collection. Only those URLs which were not successfully indexed previously will
be processed.
Performance
With low memory requirements, flow control and the help of multithreading and
efficient Domain Name System (DNS) lookups, spidering performance is greatly
improved over previous versions.
Flow control
When indexing Web sites, Verity Spider distributes requests to Web servers in a
round-robin manner. This means one URL is fetched from each Web server in turn.
With flow control, it is possible that a faster Web site will finish before a slower one.
Regardless, Verity Spider optimizes indexing for every Web server.
Verity Spider V3.7 adjusts the number of connections per server depending on the
download bandwidth. When the download bandwidth from a Web server falls below
a certain value, Verity Spider will automatically scale back the number of
connections to that Web server. There will always be at least one connection to a Web
server. When the download bandwidth increases to an acceptable level, Verity Spider
reallocates connections (per the value of the -connections option, which is 4 by
default). You can turn off flow control with the -noflowctrl option.
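The round-robin distribution described at the start of this section can be sketched in a few lines. The following Python example takes one URL from each Web server in turn until every queue is empty; the function name and host names are hypothetical, and the sketch does not model the bandwidth-based scaling of connections.

```python
from collections import deque


def round_robin(urls_by_server):
    """Yield one URL per server in turn until all queues are empty.

    urls_by_server maps a server name to its list of pending URLs.
    """
    pending = deque(
        (server, deque(urls)) for server, urls in urls_by_server.items() if urls
    )
    while pending:
        server, urls = pending.popleft()
        yield urls.popleft()
        if urls:
            # The server still has work, so it rejoins the end of the rotation.
            pending.append((server, urls))


# Hypothetical servers and URLs, for illustration only.
order = list(round_robin({
    "www.example.com": ["/a1", "/a2", "/a3"],
    "docs.example.com": ["/b1"],
}))
print(order)
```

A server with few URLs simply drops out of the rotation early, which corresponds to one Web site finishing before another.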
Multithreading
Since version 3.1, the Verity Spider has separated the gathering and indexing jobs
into multiple threads for concurrency. Verity Spider V3.7 can create concurrent
connections to Web servers for fetching documents, and have concurrent indexing
threads for maximum utilization. This translates to an overall improvement in
throughput. In previous releases, work was done in a round-robin manner, so that at
any given time, only one job was running. Spider attends to the Web sites within an
indexing job in a round-robin manner.
Overview
Before you create an indexing task for a new collection, you should make copies of
the relevant default style files to ensure that you have a set of template style files in a
known, stable state.
Keep in mind that running multiple simultaneous Verity Spider jobs on the
Information Server host may cause performance problems for searches. This does
not mean you should never run indexing jobs when users may be searching, because
your collections are available for searching even while indexing jobs are running.
With an eye toward optimizing performance, you should try staggering your indexing
jobs to avoid overloading your server.
-start
A starting point for an indexing job. You can specify multiple instances, or use
multiple values in a single instance.
When you execute an indexing job from a command-line and you do not use a
command file (with -cmdfile), you must URL-escape any special characters in the
starting point. To URL-escape a special character, use
"%hex-ASCII-character-number" in place of the character. For example, you would
use /time%26/ instead of /time&/. This allows the operating system to properly
process the command string.
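URL-escaping can be done with a standard library call instead of looking up character codes by hand. The following Python sketch escapes the path portion of a starting point; the function name is hypothetical, and the sketch assumes that only the path, not the scheme and host, is passed in.

```python
from urllib.parse import quote


def escape_start_path(path):
    """Percent-encode special characters in a URL path, keeping slashes."""
    return quote(path, safe="/")


print(escape_start_path("/time&/"))
```

For the example above, the ampersand (hexadecimal ASCII value 26) is replaced so that /time&/ becomes /time%26/. Apply the function to the path only, because it would also escape the colon in a prefix such as http://.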
In the event an indexing task halts, you can re-run the task as-is. The persistent store
for the specified collection is read and only those candidate URLs that are in the
queue but not yet processed are parsed. Candidate URLs correspond to URLs of the
following status as reported by vsdb:
cand, used, inse, upda, dele, fail.
Note
By using -start with -refresh, you provide a starting point for Verity Spider and
therefore do not need to use -host, -domain, -nofollow, or
-unlimited.
-refresh
Used for updating a collection, specifies that Verity Spider process only those
documents which qualify as follows:
• They are new documents in the repository, and they qualify for indexing under
the criteria.
• They exist in the collection and are recorded in the Verity Spider persistent store
with a status of done. If Verity Spider determines that these indexed documents
have been updated in the repository, then they are retrieved again to be reparsed
and reindexed. Note that the document VdkVgwKey values do not change.
• They are deleted in the collection. If Verity Spider determines that documents
have been deleted from the repository, then they are also deleted from the
persistent store and the collection. The exception to this rule is when you use
-nooptimize with -refresh. In this case, any document deleted from the
repository is marked for deletion in the collection. It will be removed from the
collection and the persistent store when the next indexing task is run for the
collection.
When you re-run an existing indexing job, Verity Spider will automatically refresh the
collection. If you add or remove any of the starting points, however, you must
manually specify -refresh in order to refresh existing documents.
Note
You can also use -start to provide a starting point for Verity Spider. If you do not use
-start, then you should use at least one of -host, -domain, or -nofollow. For further
control, also see -refreshtime. If you do not use any constraint criteria, Verity Spider
will operate without limits and will likely index far more than you intended.
Core Options
-cmdfile
Syntax
-cmdfile path_and_filename
Specifies that Verity Spider reads command-line syntax from a file in addition to the
options passed on the command line. This option takes the path name of the file
containing the command-line syntax. The -cmdfile option circumvents
command-line length limits.
The syntax for the command file is:
option optional_parameters
For better readability, you should put each option and any parameters on a single
line. Verity Spider will be able to properly parse the lines.
Note
It is highly recommended that you take advantage of the abstraction offered by this
option. It greatly reduces the risk of erroneously including or omitting options in
subsequent indexing jobs.
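The following Python sketch shows one way to generate a command file with each option and its parameters on a single line. The function name, the collection name, and the chosen options are illustrative assumptions; only option names that appear in this chapter are used.

```python
def build_command_file(options):
    """Render (option, parameters) pairs as command-file text, one option per line."""
    lines = []
    for option, params in options:
        lines.append(" ".join([option, *params]))
    return "\n".join(lines) + "\n"


# Hypothetical indexing job: collname and the URL are placeholders.
job = [
    ("-collection", ["collname"]),
    ("-start", ["http://web.verity.com"]),
    ("-indinclude", ["*search*"]),
    ("-jumps", ["3"]),
]

text = build_command_file(job)
print(text)
```

Saving this text to a file and running vspider -cmdfile filename would then submit the job. Quotes around the wildcard expression are omitted because they are not necessary within a command file.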
-collection
-help
Displays Verity Spider syntax options.
-jobpath
Syntax
-jobpath path
Specifies the location of the Verity Spider databases and the indexing job-related files
and directories.
The job-related directories and their contents are:
• log All Verity Spider log files. See -loglevel for descriptions of the log files.
• bif Bulk insert files.
• temp Web pages cached for indexing.
You can also specify the temp directory by using the -temp option.
• admin Files created by the Information Server Admin Tool.
These directories are created for you beneath the last directory specified in path.
You must make sure that path values are unique for all indexing jobs. If you do not
use -jobpath, Verity Spider will create a /spider/job directory within the collection.
For multiple-collection tasks, the first collection specified will be used.
Warning
You cannot use multiple job paths for multiple simultaneous indexing tasks for the
same collection. Only one indexing task at a time can run for a given collection.
-style
Syntax
-style path
Specifies the path to the style files to use when creating a new collection.
If -style is not specified, Verity Spider uses the default style files in:
verity/prdname/common/style
Where verity/prdname is the user-definable portion of the installation directory.
Note
You can safely omit -style when resubmitting an indexing job as the style information
will already be part of the collection. If you are using -cmdfile, you can leave it there.
Processing Options
-abspath
Type: File system only
Generates absolute paths for files. Use this option when the document locations are
not going to change, but the collection might be moved around.
When you index a Web server’s contents through the file system, you should use
-prefixmap with -abspath to map the absolute filepaths to URLs.
-detectdupfile
Type: File system only
Enables checksum-based detection of duplicates when indexing file systems.
By default, a document checksum is not computed on indexed files. By using
-detectdupfile, a checksum is computed based on the CRC-32 algorithm. The
checksum combined with the document size is used to determine if the document is
a duplicate.
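The idea of combining a CRC-32 checksum with the document size can be sketched in Python as follows. This is an illustration of the technique only, not Verity Spider's implementation; the function names and sample documents are assumptions.

```python
import zlib


def fingerprint(data: bytes):
    """Return the (CRC-32 checksum, size) pair for a document's bytes."""
    return (zlib.crc32(data) & 0xFFFFFFFF, len(data))


def find_duplicates(documents):
    """Map each duplicate's name to the first document with the same fingerprint."""
    seen = {}
    duplicates = {}
    for name, data in documents:
        key = fingerprint(data)
        if key in seen:
            duplicates[name] = seen[key]
        else:
            seen[key] = name
    return duplicates


docs = [("a.txt", b"hello"), ("b.txt", b"world"), ("copy_of_a.txt", b"hello")]
print(find_duplicates(docs))  # {'copy_of_a.txt': 'a.txt'}
```

Pairing the checksum with the size reduces the chance that two different documents with the same CRC-32 value are treated as duplicates.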
-indexers
Syntax: -indexers num_indexers
Specifies the maximum number of indexing threads to run on a collection.
The default value is 2. Note that increasing the value for -indexers requires additional
CPU and memory resources.
See also -maxindmem.
-license
Syntax: -license path_and_filename
Specifies the license file to use. By default, ind.lic is used, from:
verity/prdname/platform/admin/
Where verity/prdname is the user-definable portion of the installation directory,
and platform represents the platform directory.
-maxindmem
Syntax: -maxindmem kilobytes
Specifies the maximum amount of memory, in kilobytes, used by each indexing
thread. The number of threads is specified with -indexers.
By default, each indexing thread uses as much memory as is available from the
system.
-maxnumdoc
Syntax: -maxnumdoc num_docs
Specifies the maximum number of documents to be downloaded or submitted for
indexing. The value for num_docs does not necessarily correspond exactly to the
number of documents indexed. The following factors affect the actual number:
• Whether the value of num_docs falls within a block of documents dictated by
-submitsize. If it does, the entire block of documents must be processed.
• Whether retrieved documents are skipped rather than indexed because they are
invalid or corrupt.
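The interaction with -submitsize can be illustrated with a small calculation. The sketch below assumes that blocks are aligned to multiples of the -submitsize value (default 128), which this chapter does not state explicitly; the function name is hypothetical.

```python
import math


def max_documents_processed(max_num_doc, submit_size=128):
    """Round max_num_doc up to the end of the submit block it falls in."""
    return math.ceil(max_num_doc / submit_size) * submit_size


print(max_documents_processed(200))  # 256
print(max_documents_processed(256))  # 256
```

Under that assumption, a num_docs value of 200 falls inside the second block of 128 documents, so up to 256 documents could be processed.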
-mimemap
Syntax: -mimemap path_and_filename
Specifies a control file (simple ASCII text) that maps file extensions to MIME-types.
This allows you to make custom associations and override defaults.
The format for the control file is:
#file_ext_no_dot mime-type
abc application/word
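A control file in this format can be read with a few lines of Python. The sketch below assumes that lines beginning with # are comments, as the header line in the format above suggests; the function name is hypothetical.

```python
def parse_mimemap(text):
    """Parse 'extension mime-type' lines; lines starting with '#' are skipped."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        extension, mime_type = line.split(None, 1)
        mapping[extension.lower()] = mime_type.strip()
    return mapping


control_file = "#file_ext_no_dot mime-type\nabc application/word\n"
print(parse_mimemap(control_file))  # {'abc': 'application/word'}
```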
-nocache
Type: Web crawling only
Used with -noindex or -nosubmit, this option disables the caching of files during
Web site indexing. This has the effect of decreasing the demands on your disk space.
Normally, Verity Spider records document URLs in a bulk insert file and downloads
the documents themselves. During indexing, once -submitsize has been reached,
the cached files are indexed and then deleted. If you use -noindex, the bulk insert
file is submitted but not processed by Verity Spider, so the documents are not
deleted until another indexing process takes over. This will usually be mkvdk or
collsvc, or you can subsequently use Verity Spider again with the
-processbif option.
By using -nocache in conjunction with -noindex or -nosubmit, you avoid storing
files locally at all. Files are downloaded only when indexing actually occurs.
See also -noindex.
-nodupdetect
Type: Web crawling only.
Disables checksum-based detection of duplicates when indexing Web sites.
URL-based duplicate detection is still performed.
-noindex
Specifies that the Verity Spider gathers document locations without indexing them.
The document locations are stored in a bulk insert file (BIF), which is then submitted
to the collection. This option is typically used in conjunction with a separate
indexing process, such as mkvdk or collection servicers (collsvc). The BIF will be
processed by the next indexing process run for the collection, whether it is the Verity
Spider, mkvdk or collection servicers (collsvc).
Do not try to start both the Verity Spider and another process at the same time. You
must allow Verity Spider enough time to generate enough work for the secondary
indexing process to act upon. If you are using mkvdk, you can run it in persistent
mode to ensure it will act upon work generated by Verity Spider.
Note
When you execute an indexing job for a collection and you use -noindex, the
persistent store for the collection is not updated.
-nosubmit
Specifies that the Verity Spider gathers document locations without indexing them.
The document locations are stored in a bulk insert file (BIF), which is not submitted
to the collection. This option is typically used in conjunction with a separate
indexing process, such as mkvdk or collection servicers (collsvc). You can also use
Verity Spider again with the -processbif option. Note that with an indexing process
other than Verity Spider, you must specify the name and path for the BIF because the
collection has no record of it.
-persist
Syntax: -persist num_seconds
Enables the Verity Spider to run in persistent mode, checking for updates every
num_seconds seconds until it is stopped.
While the Verity Spider is running in persistent mode, there is no optimization. Once
the Verity Spider is taken out of persistent mode, you will need to perform
optimization on the collection. For more information about using mkvdk, see Chapter 9,
“Managing Verity Collections with the mkvdk Utility” on page 185.
Note
You should not run more than one Verity Spider process in persistent mode. As the
Verity Spider is a resource intensive process, you should only run it in persistent
mode with an interval of less than one day. For time intervals greater than twelve
hours, you should use some form of scheduling. Some examples are cron jobs for
UNIX, and the AT command for Windows NT Server.
-preferred
Syntax: -preferred exp_1 [exp_n] ...
Type: Web crawling only
Specifies a list of hosts or domains which are to be preferred when retrieving
documents for viewing. You can use wildcard expressions, where the asterisk ( * ) is
for text strings and the question mark ( ? ) is for single characters. To use regular
expressions, also specify the -regexp option. Use this option when you leave
duplicate detection enabled and do not specify -nodupdetect.
When indexing, you may encounter a non-preferred host first. In that case,
documents are parsed and followed and stored as candidates. When duplicates are
encountered on another server, which is preferred, the duplicate documents from
the non-preferred server are skipped. When documents are requested for viewing,
they will be retrieved from the preferred server.
On Windows NT, you should include double quotes around the argument to protect
the special characters such as (*). On UNIX, you should use single quotes. Note that
this is only required when you run the indexing job from a command line. Quotes are
not necessary within a command file (-cmdfile).
See Also -regexp
-prefixmap
Syntax: -prefixmap path_and_filename
Type: File system only
Specifies a control file (simple ASCII text) that maps file system paths to Web aliases.
In conjunction with -abspath, this option is typically used to create an URL field that
is the Web equivalent of a file system path. File system indexing is faster than Web
crawling over the network. If you use -prefixmap to replace the file system path with
the Web URL, relative hyperlinks in the HTML pages are kept intact when viewed
through Information Server.
The format for the control file is:
src_field src_prefix dest_field dest_prefix
If you use backslashes, you must double them so they are properly escaped. For
example:
C:\\test\\docs\\path
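The effect of a prefix mapping can be sketched in Python as follows. This is a simplified model rather than Verity Spider's own logic: the field names VdkVgwKey and URL, the server name, and both function names are assumptions chosen for illustration, and paths containing spaces are not handled.

```python
def parse_prefixmap_line(line):
    """Split one control-file line into four fields and undouble backslashes."""
    src_field, src_prefix, dest_field, dest_prefix = line.split()

    def undouble(value):
        return value.replace("\\\\", "\\")

    return src_field, undouble(src_prefix), dest_field, undouble(dest_prefix)


def apply_prefixmap(record, rule):
    """Return a copy of record with dest_field built from the mapped prefix."""
    src_field, src_prefix, dest_field, dest_prefix = rule
    value = record.get(src_field, "")
    if not value.startswith(src_prefix):
        return dict(record)
    rest = value[len(src_prefix):].replace("\\", "/")
    return dict(record, **{dest_field: dest_prefix + rest})


# Hypothetical rule: note the doubled backslashes inside the control file line.
rule = parse_prefixmap_line(r"VdkVgwKey C:\\test\\docs URL http://server.com/docs")
doc = {"VdkVgwKey": r"C:\test\docs\path\index.html"}
print(apply_prefixmap(doc, rule)["URL"])  # http://server.com/docs/path/index.html
```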
-processbif
Syntax: -processbif 'command_string !*'
Specifies a command string in which you can call a program or script which operates
on BIFs generated by Verity Spider.
Due to the use of special characters, which represent the bulk insert file (BIF), you
must run Verity Spider with a command file using the -cmdfile option.
For example, if you want to use a script called fix_bif to add customized
information to BIF files, use the following command:
vspider -cmdfile filename
Where filename is the text-only command file which contains the following (among
any other necessary options):
-processbif 'fix_bif !*'
-regexp
Specifies the use of regular expressions rather than the default wildcard expressions
for the following options: -exclude, -indexclude, -include, -indinclude,
-skip, -indskip, -preferred, and -nofollow.
Wildcard expressions allow the use of the asterisk ( * ) for text strings, and the
question mark ( ? ) for single characters.
Regular expressions allow for more powerful and flexible means for matching
alphanumeric strings. For example, to match "ab11" or "ab34" but not "abcd" or
"ab11cd", you could use the following regular expression:
^ab[0-9][0-9]$
The full extent to which regular expressions can be employed is beyond the scope of
this description. For more information on regular expressions, refer to a book
devoted to the subject.
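The example expression can be tried out in Python, whose re module accepts the same pattern. This only demonstrates the pattern itself; the regular expression dialect used by Verity Spider may differ from Python's in other respects.

```python
import re

pattern = re.compile(r"^ab[0-9][0-9]$")

candidates = ["ab11", "ab34", "abcd", "ab11cd"]
results = {text: bool(pattern.match(text)) for text in candidates}
print(results)
# {'ab11': True, 'ab34': True, 'abcd': False, 'ab11cd': False}
```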
-submitsize
Syntax: -submitsize num_documents
Specifies the number of documents submitted for indexing at one time. The default
value is 128. The upper limit is 64,000.
Note
Although larger values mean more efficient processing by the indexer, smaller values
will allow more parallelism on multi-CPU systems. Furthermore, in the event of a
halt during indexing, a smaller value means fewer documents will be lost.
-temp
Syntax: -temp path
Specifies the directory for temporary files (disk cache). By default, the temp directory
is contained within the job directory (optionally specified with the -jobpath option).
If you do not specify a value for this option, Verity Spider will create a /spider/temp
directory within the collection. For multiple-collection tasks, the first collection
specified will be used.
Note
Make sure the location you specify contains enough disk space to handle the
documents which are downloaded and held before indexing. The documents are
deleted from the harddisk after they are indexed.
See also -jobpath, for specifying the location of all indexing job directories and files,
one of which is the temp directory.
Networking Options
-agentname
Syntax: -agentname string
Type: Web crawling only.
Specifies the value for the agent name field that is part of the HTTP request. Since
Web servers can be configured to return different versions of the same page
depending on the requesting agent, you can use -agentname to impersonate a
browser client.
Use double-quotes if the name contains a space. Use -cmdfile if the agent name you
want to use contains forbidden characters such as slashes or backslashes.
-connections
Syntax: -connections num_connections
Specifies the maximum number of simultaneous socket connections to make
to Web sites for indexing. Each connection implies a separate thread.
The default value is 6.
Note
Verity Spider’s dynamic flow control makes the most use of all available connections
when indexing Web sites. If you are indexing multiple sites, you may want to increase
this number. Note that increasing the number of connections may not always help
because of such dependencies as your network connection and the capabilities of
the remote hosts.
-delay
Syntax: -delay num_milliseconds
Type: Web crawling only.
Specifies the minimum time between HTTP requests in milliseconds. The
default value is 0 milliseconds for no delay.
-header
Syntax: -header string
Type: Web crawling only
Specifies an HTTP header to be added to the spidering request. For example:
-header "Referer: http://www.verity.com/"
Verity Spider sends some predefined headers, such as Accept and User-Agent among
others, by default. Special headers are sometimes necessary to correctly index a site.
For example, previous versions of Verity Spider did not support the "Host" header,
which is needed for Virtual Host indexing. Also, a "Proxy-authentication" header was
needed to pass a username and password to a proxy server.
In Verity Spider V3.7, the "Host" header is supported by default, and the -proxyauth
option is available for proxy server authentication. Therefore the -header option is
maintained only for backwards compatibility and possible future enhancements.
Note
Misuse of this option will cause spider failure. In the event that this happens, re-run
the indexing task with modified -header values.
-hostcache
Syntax: -hostcache num_hostnames
Specifies the number of hostnames to cache to avoid DNS lookups. Without this
option, the host cache will continue to grow.
The default value is 256.
-noflowctrl
Type: Web crawling only.
Disables round-robin indexing of Web sites with network flow control.
By default, Verity Spider uses round-robin indexing of Web sites to avoid
overwhelming a Web server and to improve indexing performance. Verity Spider
connects to each Web server in a round-robin manner, using up to the value for
-connections. This means one URL is fetched from each Web server in turn.
Note
Using -noflowctrl may result in a significant drop in performance.
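Round-robin ordering, in which one URL is taken from each Web server in turn, can be sketched in Python as shown below. This is a simplified model for illustration only: it ignores the -connections limit and any timing, and the function name and URLs are assumptions.

```python
from collections import deque
from urllib.parse import urlsplit


def round_robin(urls):
    """Yield URLs so that consecutive fetches rotate across hosts."""
    queues = {}
    for url in urls:
        queues.setdefault(urlsplit(url).netloc, deque()).append(url)
    hosts = deque(queues)
    while hosts:
        host = hosts.popleft()
        yield queues[host].popleft()
        if queues[host]:
            hosts.append(host)  # host still has work; rejoin the rotation


pending = [
    "http://a.com/1", "http://a.com/2", "http://b.com/1",
    "http://a.com/3", "http://b.com/2",
]
order = list(round_robin(pending))
print(order)
```

With these sample URLs, requests alternate between a.com and b.com until b.com has no more pending URLs.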
-noproxy
Syntax: -noproxy name_1 [name_n] ...
Type: Web crawling only.
Used in conjunction with -proxy, -noproxy specifies that the Verity Spider directly
access the hosts whose names match those specified. By default, when -proxy is
specified, the Verity Spider first tries to access every host with the proxy information.
To improve performance, use -noproxy for those hosts you know can be accessed
without a proxy host. For the name variable, you can use the asterisk ( * ) wildcard for
text strings. For example:
'*.verity.com'
You cannot use the question mark ( ? ) wildcard, and the -regexp option does not
allow you to use regular expressions.
On Windows NT, you should include double quotes around the argument to protect
the special character ( * ). On UNIX, you should use single quotes. Note that this is
only required when you run the indexing job from a command line. Quotes are not
necessary within a command file (-cmdfile).
Note
You must have valid Verity Spider licensing capability to use this option.
-proxy
Syntax: -proxy proxyhost:port
Type: Web crawling only.
Specifies host and port for proxy server.
Note
You must have valid Verity Spider licensing capability to use this option.
See also -proxyauth for proxy servers that require authentication, and -noproxy for
hosts which you know are accessible without having to go through a proxy server.
-proxyauth
Syntax: -proxyauth login:password
Type: Web crawling only.
Specifies login information for proxy server connections that require authorization
to get outside the firewall. Used in conjunction with -proxy.
Note
You must have valid Verity Spider licensing capability to use this option. Information
Server V3.7 does not support retrieving documents for viewing through secure proxy
servers. Do not use -proxyauth for indexing documents which are to be viewed
through Information Server V3.7.
-retry
Syntax: -retry num_retries
Type: Web crawling only.
Specifies the number of times the Verity Spider should attempt to access an URL. You
should use -retry when it is likely that an unstable network connection will give
false rejections.
The default value is 4.
-timeout
Syntax: -timeout num_seconds
Type: Web crawling only.
Specifies the time period, in seconds, that the Verity Spider should wait before timing
out on a network connection and on accessing data. The data access value is
automatically twice the value you specify for the network connection timeout.
The default value for the network connection timeout is 30 seconds, and therefore
the value for the data access timeout is 60 seconds.
Paths and URLs Options
Note
There must be a corresponding "Authfile=" entry in the Information Server
configuration file, inetsrch.ini, so that documents can be accessed for viewing.
Both -auth and Authfile= must point to the same file.
-cgiok
Type: Web crawling only.
Allows indexing of URLs containing the ? symbol. This typically means the URL leads
to a CGI or other such processing program.
The return document produced by the Web server is indexed and parsed for
document links which are followed and in turn indexed and parsed. However, if the
Web server does not return a page, perhaps because the URL is missing parameters
which are required for processing in order to produce a page, then nothing happens.
There is no page to index and parse.
Example
A URL without parameters is:
http://server.com/cgi-bin/program?
If you include parameters in the URL to be indexed, as specified with the -start
option, then those parameters are processed and any resulting pages are indexed
and parsed.
By default, URLs with ? symbols are skipped.
-domain
Syntax: -domain name_1 [name_n] ...
Type: Web crawling only.
Limits indexing to the specified domain(s). You must use only complete text strings
for domains. You may not use wildcard expressions. URLs not in the specified
domain(s) will not be downloaded or parsed.
You may list multiple domains by separating each one with a single space.
Note
You must have the appropriate Verity Spider licensing capability to use this option.
-followdup
Specifies that Verity Spider follows links within duplicate documents, although only
the first instance of any duplicate documents will be indexed.
You may find this option useful if you use the same home page on multiple sites. By
default, only the first instance of the document is indexed, while subsequent
instances are skipped. If you have different secondary documents on the different
sites, using -followdup will allow you to get to them for indexing, while still indexing
the common home page only once.
-followsymlink
Type: File system only.
Specifies that Verity Spider follows symbolic links when indexing UNIX file systems.
-host
Syntax: -host name_1 [name_n] ...
Type: Web crawling only.
Limits indexing to the specified host or hosts. You must use only complete text
strings for hosts. You may not use wildcard expressions.
You may list multiple hosts by separating each one with a single space. URLs not on
the specified host(s) will not be downloaded or parsed.
-https
Type: Web crawling only.
Allows the indexing of SSL-enabled Web sites.
Note
You must have the Verity SSL Option Pack installed to use -https. The Verity SSL
Option Pack is a Verity Spider add-on available separately from a Verity salesperson.
-jumps
Syntax: -jumps num_jumps
Type: Web crawling only.
Specifies the maximum number of levels deep an indexing job can go from the
starting URL. Specify a number between 0 and 254.
The default value is unlimited. If you see extremely large numbers of documents in a
collection where you do not expect them, you should consider experimenting with
this option, in conjunction with the Content options, to pare down your collection.
-nodocrobo
Specifies that ROBOT META tag directives are to be ignored.
In HTML 3.0 and earlier, robot directives could only be given as the file robots.txt
under the root directory of a Web site. In HTML 4.0, every document can have robot
directives embedded in the META field. Use this option to ignore them. This option
should, of course, be used with discretion.
See Also -norobo and http://www.w3c.org/TR/REC-html40/html40.txt.
-nofollow
Syntax: -nofollow "exp"
Type: Web crawling only.
Specifies that Verity Spider does not follow any URLs which match the expression exp.
If you do not specify an exp value for -nofollow, then Verity Spider assumes a value
of "*", so that no documents are followed.
You can use wildcard expressions, where the asterisk ( * ) is for text strings and the
question mark ( ? ) is for single characters. You should always encapsulate the exp
values in double quotes to ensure they are properly interpreted.
If you use backslashes, you must double them so they are properly escaped. For
example:
C:\\test\\docs\\path
To use regular expressions, also specify the -regexp option.
Previous versions of the Verity Spider did not allow the use of an expression. This
meant that for each starting point URL, only the first document would be indexed.
With the addition of the expression functionality, you can now selectively skip URLs
even within documents.
See also -regexp
-norobo
Type: Web crawling only.
Specifies that any robots.txt files encountered are ignored. The robots.txt file is
used on many Web sites to specify what parts of the site indexers should avoid. The
default is to honor any robots.txt files.
If you are re-indexing a site and robots.txt has changed, the Verity Spider will
delete documents that have been newly disallowed by robots.txt.
This option should, of course, be used with discretion and extreme care, especially in
conjunction with -cgiok.
See Also -nodocrobo and http://info.webcrawler.com/mak/projects/robots/norobots.html.
-pathlen
Syntax: -pathlen num_pathsegments
Limits indexing to the specified number of path segments in the URL or file system
path. The path length is determined as follows:
• The host name and drive letter are not included. For example, neither
www.spider.com:80/ nor C:\ would be included in determining the path length.
• All elements following the host name are included.
• The actual filename, if present, is included. For example, /world.html would be
included in determining the path length.
• Any directory paths between the host and the actual filename are included.
Example
For the following URL, the path length would be 4:
http://www.spider:80/comics/fun/funny/world.html
                     <-1--> <2> <-3-> <---4---->
For the following file system path, the path length would be 3:
C:\files\docs\datasheets
   <-1-> <-2> <---3---->
The default value is 100 path segments.
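The counting rules above can be expressed as a short Python function. This is a sketch of the rules as described in this section, not Verity Spider's own code; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def path_length(location):
    """Count path segments, ignoring the host name, port, and drive letter."""
    if "://" in location:
        path = urlsplit(location).path
    else:
        # Drop a Windows drive letter such as "C:" before counting.
        path = location[2:] if location[1:2] == ":" else location
        path = path.replace("\\", "/")
    return len([segment for segment in path.split("/") if segment])


print(path_length("http://www.spider:80/comics/fun/funny/world.html"))  # 4
print(path_length(r"C:\files\docs\datasheets"))  # 3
```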
-refreshtime
Syntax: -refreshtime timeunits
Specifies that any documents which have been indexed since the timeunits value
began are not to be refreshed.
The syntax for timeunits is:
n day n hour n min n sec
Where n is a positive integer. Note that there must be spaces, and since only the first
three letters of each time unit are parsed, you can use the singular or plural form.
If you specify:
-refreshtime 1 day 6 hours
Only those documents which were last indexed at least 30 hours and 1 second ago
will be refreshed.
Note
This option is valid only with the -refresh option. When you use vsdb -recreate, the
last indexed date is cleared.
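The timeunits syntax can be modeled in Python as follows. The sketch relies on the rule stated above that only the first three letters of each unit are significant; the function and table names are hypothetical, and the error handling is an assumption.

```python
from datetime import timedelta

# First three letters of each unit mapped to timedelta keyword names.
UNITS = {"day": "days", "hou": "hours", "min": "minutes", "sec": "seconds"}


def parse_timeunits(text):
    """Convert a string such as '1 day 6 hours' into a timedelta."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected pairs of 'n unit'")
    total = timedelta()
    for number, unit in zip(tokens[0::2], tokens[1::2]):
        key = unit[:3].lower()
        if key not in UNITS:
            raise ValueError(f"unknown time unit: {unit}")
        total += timedelta(**{UNITS[key]: int(number)})
    return total


window = parse_timeunits("1 day 6 hours")
print(window.total_seconds() / 3600)  # 30.0
```

Because only the first three letters are compared, "1 day 6 hour" and "1 days 6 hours" produce the same 30-hour window.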
-reparse
Type: Web crawling only.
Forces parsing of all HTML documents already in the collection. You must specify a
starting point with the -start option when you use -reparse.
You can use -reparse when you want to include paths and documents which were
previously skipped due to exclusion or inclusion criteria. Remember to change the
criteria, else there will be little for the Verity Spider to do. This can be easy to overlook
when you are using -cmdfile.
-unlimited
Specifies no limits to be placed on Verity Spider if neither -host nor -domain is
specified. The default is to limit based on the host of the first starting point listed.
-virtualhost
Syntax: -virtualhost name_1 [name_n] ...
Specifies that DNS lookups are avoided for the hosts listed. You must use only
complete text strings for hosts. You may not use wildcard expressions. This allows
you to index by alias, such as when multiple Web servers are running on the same
host. You can use regular expressions.
Normally, when Verity Spider resolves host names, it uses DNS lookups to convert
the names to canonical names, of which there can be only one per machine. This
allows for the detection of duplicate documents, to prevent results from being
diluted. In the case of multiple aliased hosts, however, duplication is not a barrier as
documents can be referred to by more than one alias, and yet remain distinct
because of the different alias names.
Example
You may have both marketing.verity.com and sales.verity.com running on the same
host. Each alias has a different document root, although document names such as
index.htm may occur for both. With -virtualhost, both server aliases can be
indexed as distinct sites. Without -virtualhost, they would both be resolved to the
same host name and only the first document encountered from any duplicate pair
would be indexed.
Warning! If you are using Netscape Enterprise Server, and you have specified only the
host name as a virtual host, then Verity Spider will not be able to index the virtual
host site. This is because the Verity Spider always adds the domain name to the
document key.
Content Options
-casesen
Makes processing case-sensitive: keys that differ only in case are processed as
separate keys. Use only for indexing UNIX servers.
-exclude
Syntax: -exclude exp_1 [exp_n] ...
Files, paths and URLs matching the specified expression(s) will not be followed. If
you use backslashes, you must double them so they are properly escaped. For
example:
C:\\test\\docs\\path
You can use wildcard expressions, where the asterisk ( * ) is for text strings and the
question mark ( ? ) is for single characters. For example:
'/my_doc*/year199?'
On Windows NT, you should include double quotes around the argument to protect
the special characters such as (*). On UNIX, you should use single quotes. Note that
this is only required when you run the indexing job from a command line. Quotes are
not necessary within a command file (-cmdfile).
To use regular expressions, also specify the -regexp option.
To specify a file, path or URL which you want followed but not indexed, use
-indexclude. For document types, use -mimeexclude instead. For example, specify
-mimeexclude application/pdf rather than -exclude *.pdf.
Note
When specifying an URL, you must use full, absolute paths using the same format as
appears in the HTML hyperlink. If the link is relative, you must change it to absolute
to use it with -exclude.
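The wildcard expression shown above can be explored with Python's fnmatch module, which also uses * for text strings and ? for single characters. This is only an approximation: fnmatch lets * match across / characters, and Verity Spider's matching rules may differ in such details. The sample paths are assumptions.

```python
from fnmatch import fnmatchcase

pattern = "/my_doc*/year199?"
paths = [
    "/my_docs/year1998",
    "/my_doc/year1990",
    "/my_docs/year2001",
    "/my_docs/year19",
]

matches = {path: fnmatchcase(path, pattern) for path in paths}
for path, matched in matches.items():
    print(path, matched)
```

The first two paths match the expression, while the last two do not.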
-include
Only those files, paths and URLs which match the specified expression or
expressions will be followed. If you use backslashes, you must double them so they
are properly escaped. For example:
C:\\test\\docs\\path
You can use wildcard expressions, where the asterisk ( * ) is for text strings and the
question mark ( ? ) is for single characters. For example:
'/my_doc*/year199?'
On Windows NT, you should include double quotes around the argument to protect
the special characters such as (*). On UNIX, you should use single quotes. Note that
this is only required when you run the indexing job from a command line. Quotes are
not necessary within a command file (-cmdfile).
To use regular expressions, also specify the -regexp option.
Keep in mind that if your starting points do not match the specified -include
expressions, nothing will be indexed. The -include option prevents Verity Spider
from even following anything which does not match the specified expressions. You
may want to use -indinclude instead, which allows Verity Spider to follow anything
while indexing only that which matches the specified expressions.
For document types, use -mimeinclude instead. For example, specify -mimeinclude
text/html rather than -include *.htm.
Note
When specifying an URL, you must use full, absolute paths using the same format as
appears in the HTML hyperlink. If the link is relative, you must change it to absolute
to use it with -include.
-indexclude
Syntax: -indexclude exp_1 [exp_n] ...
Specifies that the files and paths in URLs which match the expressions are not
indexed. They are, however, still followed. If you use backslashes, you must double
them so they are properly escaped. For example:
C:\\test\\docs\\path
You can use wildcard expressions, where the asterisk ( * ) is for text strings and the
question mark ( ? ) is for single characters. For example:
'/my_doc*/year199?'
On Windows NT, you should include double quotes around the argument to protect
the special characters such as (*). On UNIX, you should use single quotes. Note that
this is only required when you run the indexing job from a command line. Quotes are
not necessary within a command file (-cmdfile).
To use regular expressions, also specify the -regexp option.
You would use this option to gather some documents, such as HTML tables of
contents, to gain access to other documents for indexing.
Where the -exclude option prevents Verity Spider from even following anything
which matches the specified expressions, -indexclude allows Verity Spider to follow
anything while only skipping that which matches the specified expressions.
For document types, use -indmimeexclude instead.
Note
When specifying an URL, you must use full, absolute paths using the same format as
appears in the HTML hyperlink. If the link is relative, you must change it to absolute
to use it with -indexclude.
-indinclude
Syntax: -indinclude exp_1 [exp_n] ...
Specifies that only those files and paths in URLs which match the expressions be
followed and indexed. If you use backslashes, you must double them so they are
properly escaped. For example:
C:\\test\\docs\\path
You can use wildcard expressions, where the asterisk ( * ) is for text strings and the
question mark ( ? ) is for single characters. For example:
'/my_doc*/year199?'
On Windows NT, you should include double quotes around the argument to protect
the special characters such as (*). On UNIX, you should use single quotes. Note that
this is only required when you run the indexing job from a command line. Quotes are
not necessary within a command file (-cmdfile).
To use regular expressions, also specify the -regexp option.
Where the -include option prevents Verity Spider from even following anything
which does not match the specified expressions, -indinclude allows Verity Spider to
follow anything while only indexing that which matches the specified expressions.
Example
If you want to index all documents that include "search" in the URL at http://
web.verity.com, you cannot use:
vspider -collection collname -start http://web.verity.com
-include '*search*'
This is because the starting point does not match the -include criteria. Instead, use
-indinclude to follow all documents (unless, of course, you have specified any of the
exclude options) and index only those documents that match your criteria. Simply
replace -include with -indinclude in the above example.
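The difference between the two options can be modeled in a few lines of Python. This is only an illustrative sketch of the behavior described above, not Verity Spider code: the decide function is hypothetical, the second URL is made up for the example, and Python's fnmatchcase is assumed to be a close enough stand-in for the asterisk and question mark wildcards.

```python
from fnmatch import fnmatchcase


def decide(url, include=None, indinclude=None):
    """Return (followed, indexed) for one URL under the described semantics."""
    # -include: anything that does not match is not even followed.
    followed = include is None or any(fnmatchcase(url, p) for p in include)
    # -indinclude: everything is followed, only matches are indexed.
    indexed = followed and (
        indinclude is None or any(fnmatchcase(url, p) for p in indinclude)
    )
    return followed, indexed


start = "http://web.verity.com"
page = "http://web.verity.com/search/help.html"  # hypothetical URL

print(decide(start, include=["*search*"]))     # (False, False)
print(decide(start, indinclude=["*search*"]))  # (True, False)
print(decide(page, indinclude=["*search*"]))   # (True, True)
```

With -include the starting point itself fails the test, so the crawl never begins; with -indinclude the starting point is followed and only the matching page is indexed.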
Note
When specifying a URL, you must use full, absolute paths using the same format as
appears in the HTML hyperlink. If the link is relative, you must change it to absolute
to use it with -indinclude.
-indmimeexclude
Syntax: -indmimeexclude mime_1 [mime_n] ...
Specifies that MIME types which match the expressions are followed but not
indexed.
On Windows NT, you should include double quotes around the argument to protect
the special characters such as (*). On UNIX, you should use single quotes. Note that
this is only required when you run the indexing job from a command line. Quotes are
not necessary within a command file (-cmdfile).
Use this option to gather some documents, such as HTML tables of contents, to gain
access to other documents for indexing. The -mimeexclude option, on the other
hand, prevents specified documents from being followed at all. For the mime
variable, you can include the asterisk ( * ) wildcard for text strings. For example:
'text/*'
You cannot use the question mark ( ? ) wildcard, and the -regexp option does not
allow you to use regular expressions.
-indmimeinclude
Syntax: -indmimeinclude mime_1 [mime_n] ...
Specifies that only those MIME types which match the expressions be followed and
indexed.
The -mimeinclude option would not allow you to index desired documents if the
starting URL is not followed. For the mime variable, you can include the asterisk ( * )
wildcard for text strings. For example:
'text/*'
On Windows NT, you should include double quotes around the argument to protect
the special character (*). On UNIX, you should use single quotes. Note that this is
only required when you run the indexing job from a command line. Quotes are not
necessary within a command file (-cmdfile).
You cannot use the question mark ( ? ) wildcard, and the -regexp option does not
allow you to use regular expressions.
Example
If you want to index all Word documents at http://web.verity.com, you cannot use:
vspider -collection collname -style style_dir -start
http://web.verity.com -mimeinclude 'application/msword'
This is because the starting point does not match the -mimeinclude criteria. Now,
you can use -indmimeinclude to follow all documents (unless, of course, you have
specified any of the exclude options) and index only those documents that match
your criteria. Simply replace -mimeinclude with -indmimeinclude in the above
example.
-indskip
Syntax: -indskip HTML_tag "exp"
Type: Web crawling only.
Specifies that Verity Spider is to follow and parse links in, but not index, any HTML
document which contains the text of exp within the given HTML_tag. For multiple
HTML_tag and exp combinations, use multiple instances of the -indskip option.
You can use wildcard expressions, where the asterisk ( * ) is for text strings and the
question mark ( ? ) is for single characters. For example:
'/my_doc*/year199?'
On Windows NT, you should include double quotes around the argument to protect
the special characters such as (*). On UNIX, you should use single quotes. Note that
this is only required when you run the indexing job from a command line. Quotes are
not necessary within a command file (-cmdfile).
If you use backslashes, you must double them so they are properly escaped. For
example:
C:\\test\\docs\\path
To use regular expressions, also specify the -regexp option.
Example
To skip all HTML documents which contain the word "personnel" in the Title
element, while still parsing those documents for links to other documents, use the
following:
-indskip title "personnel"
Example
To avoid indexing directory listing pages, while still parsing the document and path
links except for the link up to the parent directory, use one of the following,
depending on the Web server being indexed:
For Netscape Web servers, use the following:
-indskip title "*Index of*"
-nofollow "*parent directory*"
For Microsoft Internet Information Server, use the following:
-indskip a "*to parent directory*"
-nofollow "*parent directory*"
-maxdocsize
Syntax: -maxdocsize integer
Specifies the maximum size, in kilobytes, for documents to be indexed. Any
documents larger than the value specified by maxdocsize will be ignored.
The default is to index documents of any size.
Content Options 173
-metafile
Syntax: -metafile path_and_filename
Type: Web crawling only.
Allows you to use a text file to map custom meta tags to valid HTTP header fields. If
you use backslashes, you must double them so they are properly escaped. For
example: C:\\test\\docs\\path.
This means you are able to use your own meta tag, in the document, to replace what
is returned by the Web server, or to insert it if nothing is returned. Currently, the only
header fields of real value are "Last-Modified" and "Content-Length." Note, however,
that future enhancements could allow for much greater variety.
The syntax for entries in the text file is:
name Last-Modified y|n
or
name Content-Length y|n
Where y|n is an override flag which can be either yes or no.
Example
A mapping file for -metafile might include:
Doc_Last_Touched Last-Modified n
Doc_Size Content-Length y
If you use the y override flag, the value for the custom meta tag overrides the value for
the valid field, even if both values are present and differ. This can be useful when the
valid field value is always sent, but you want to specify your own value with a custom
meta tag.
If you use the n override flag, then the value for the custom meta tag will be used only
if there is no value for the valid field returned by the server. If a value for the valid
field exists, then that is given precedence.
Warning! If you have several entries mapping to the same valid field, only the last
entry will take effect.
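The override rules can be summarized in a short Python sketch. It is an illustration of the precedence described above, not the Spider's implementation: the function names and the sample values are hypothetical, and the flag is assumed to be written as y or n (or yes or no).

```python
def parse_mapping(text):
    """Parse 'name field y|n' lines into {field: (meta_name, override)}."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, field, flag = line.split()
        # A later entry for the same field replaces an earlier one.
        mapping[field] = (name, flag.lower().startswith("y"))
    return mapping


def resolve(field, server_headers, meta_tags, mapping):
    """Pick the effective value for one header field."""
    server_value = server_headers.get(field)
    if field not in mapping:
        return server_value
    name, override = mapping[field]
    meta_value = meta_tags.get(name)
    if meta_value is None:
        return server_value
    # 'y' always prefers the meta tag; 'n' uses it only as a fallback.
    if override or server_value is None:
        return meta_value
    return server_value


mapping = parse_mapping(
    "Doc_Last_Touched Last-Modified n\n"
    "Doc_Size Content-Length y\n"
)
server = {"Last-Modified": "server-date", "Content-Length": "1000"}
meta = {"Doc_Last_Touched": "meta-date", "Doc_Size": "2048"}

print(resolve("Last-Modified", server, meta, mapping))   # server-date
print(resolve("Content-Length", server, meta, mapping))  # 2048
```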
-mimeexclude
Syntax: -mimeexclude mime_1 [mime_n] ...
Specifies MIME types which are neither followed nor indexed.
On Windows NT, you should include double quotes around the argument to protect
the special characters such as (*). On UNIX, you should use single quotes. Note that
this is only required when you run the indexing job from a command line. Quotes are
not necessary within a command file (-cmdfile).
The default is to include all MIME types. For the mime variable, you can include the
asterisk ( * ) wildcard for text strings. For example:
'text/*'
You cannot use the question mark ( ? ) wildcard, and the -regexp option does not
allow you to use regular expressions.
Use -indmimeexclude to allow the Verity Spider to follow documents, without
indexing them, to gain access to other desirable document types.
-mimeinclude
Syntax: -mimeinclude mime_1 [mime_n] ...
Specifies MIME types to be included.
On Windows NT, you should include double quotes around the argument to protect
the special characters such as (*). On UNIX, you should use single quotes. Note that
this is only required when you run the indexing job from a command line. Quotes are
not necessary within a command file (-cmdfile).
The default is to include all MIME types. For the mime variable, you can include the
asterisk ( * ) wildcard for text strings. For example:
'text/*'
You cannot use the question mark ( ? ) wildcard, and the -regexp option does not
allow you to use regular expressions.
-mindocsize
Syntax: -mindocsize integer
Specifies the minimum size, in kilobytes, for documents to be indexed. Any
documents smaller than the value specified by mindocsize will be ignored.
The default is to index documents of any size.
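Taken together, -mindocsize and -maxdocsize define a size window. The following Python sketch shows the check as described; the function name is hypothetical, and it assumes that one kilobyte is 1024 bytes and that a document exactly at a limit is still indexed.

```python
def within_size_limits(size_bytes, mindocsize=None, maxdocsize=None):
    """Return True if a document of size_bytes would be indexed."""
    size_kb = size_bytes / 1024  # assumption: 1 KB = 1024 bytes
    if mindocsize is not None and size_kb < mindocsize:
        return False  # smaller than -mindocsize: ignored
    if maxdocsize is not None and size_kb > maxdocsize:
        return False  # larger than -maxdocsize: ignored
    return True


print(within_size_limits(512, mindocsize=1))          # False
print(within_size_limits(50 * 1024, maxdocsize=100))  # True
print(within_size_limits(5_000_000))                  # True
```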
-skip
Syntax: -skip HTML_tag "exp"
Type: Web crawling only
Specifies that Verity Spider is not to index any HTML document which contains the text of
exp within the given HTML_tag. For multiple HTML_tag and exp combinations, use
multiple instances of the -skip option.
You can use wildcard expressions, where the asterisk ( * ) is for text strings and the
question mark ( ? ) is for single characters. For example:
'/my_doc*/year199?'
On Windows NT, you should include double quotes around the argument to protect
the special characters such as (*). On UNIX, you should use single quotes. Note that
this is only required when you run the indexing job from a command line. Quotes are
not necessary within a command file (-cmdfile).
If you use backslashes, you must double them so they are properly escaped. For
example:
C:\\test\\docs\\path
To use regular expressions, also specify the -regexp option.
Example 1
To skip all HTML documents which contain the word "personnel" in the Title
element, use the following:
-skip title "personnel"
Example 2
To skip all HTML documents which contain both the word "personnel" in the Title
element and the phrase "internal use" in any paragraph element, use the following:
-skip title "personnel"
-skip p "*internal use*"
See also -regexp.
Locale Options
-charmap
Syntax: -charmap name
Specifies the character map to use. Valid values are 8859 or 850. The default value is
8859.
-common
Specifies the path to the Verity home directory, verity/prdname/common, where verity/
prdname is the user-definable portion of the installation directory.
Note
This option is typically not needed, as long as the PATH environment variable is set
correctly.
-datefmt
Syntax: -datefmt format
Specifies the Verity import date format to use. Valid values are MDY, DMY, YMD, USA
and EUR. The default value is MDY.
-language
Syntax: -language name
Specifies the Verity locale to use in indexing. This option is being replaced by the
semantically consistent -locale, but is still supported for backwards compatibility.
-locale
Syntax: -locale name
Specifies the Verity locale to use in indexing, such as German (deutsch) or French
(francais). The default is English (english). This option is identical to -language.
-msgdb
Syntax: -msgdb path
Specifies the path to the ind.msg message database file.
If the Verity Spider was installed properly, this option should be unnecessary. By
default, the ind.msg message database is read from:
verity/prdname/platform/admin
Logging Options
-loglevel
Syntax: -loglevel [nostdout] argument
Specifies the types of messages to log. By default, messages are written to standard
output and to various log files in the subdirectory named /log beneath the Verity
Spider job directory. If you add nostdout to the loglevel argument, messages will not
be written to standard output. Log files, however, will still be created.
Valid message types are described in the following table:
Choose one of the following arguments to determine which message types are
logged.
Maintenance Options
-nooptimize
Prevents the Verity Spider from optimizing the collection, thus reducing processing
overhead during the indexing job. Use this option sparingly, as it leaves the collection
in less than optimum shape. Some examples of when you might want to use this
option are:
• You want to manually perform custom optimization of the collection, using
mkvdk. By default the Verity Spider optimization mimics the mkvdk actions of
maxmerge and vdbopt. For more information on mkvdk, see the Verity Collection
Building Guide.
• You are running multiple indexing jobs against a collection, and want to wait
until they are all finished to optimize.
Generally, you should not leave a collection unoptimized for too long, as search
times can slow significantly.
In brief, optimizing a collection means creating a small number of large partitions,
which can greatly reduce search times.
-purge
Deletes document tables and index files in the collection, and cleans up the
collection’s persistent store. The collection is then "fresh" with its original style files,
and is not deleted from the file system.
-repair
Specifies a failure-recovery mode for the collection, where the goal is to determine
the causes of any errors, repair the errors (if possible), and bring a collection back up.
Although the Verity indexing engine always leaves the collection in a consistent,
usable state, and no data can be lost or corrupted due to machine failures, it is
possible for a process or event external to the Verity engine to corrupt one or more
collections.
You can use -repair for constant failure-recovery operation, or you can run it
selectively on collections that are "down."
Setting MIME Types
Syntax restrictions
When you specify MIME type criteria, keep in mind the following restrictions.
When you encounter MIME Types being dropped, make sure the Web server you are
indexing has the necessary MIME Type information. See the documentation for your
Web server for information about specifying MIME Types.
You can examine the indexing job’s log files for indications that files are being
skipped due to MIME Types. For example, a typical ASCII file you might want
indexed is a log file (filename.log). Unless the Web server understands that files with
.LOG extensions are ASCII text, of MIME Type text/plain, you will see in the indexing
job log file that .LOG files are skipped because of MIME Type even if you use:
-mimeinclude 'text/*'
When you encounter MIME Types being dropped, check if the Verity Spider
recognizes that particular MIME Type. See the table, “Known MIME types for file
system indexing” on page 183 for more details.
You can examine the indexing job’s log files for indications that files are being
skipped due to MIME types. For example, a typical ASCII file you might want indexed
is a log file (filename.log). Since the Verity Spider does not understand that files
with .LOG extensions are ASCII text, of MIME Type text/plain, you will see in the
indexing job log file that .LOG files are skipped because of MIME Type even if you
use:
-mimeinclude 'text/*'
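The effect can be pictured with a small Python model. It is a sketch only: the KNOWN_TYPES table and the function name are hypothetical stand-ins for whatever extension-to-type information the Web server or the Spider holds, and fnmatchcase is assumed to approximate the asterisk wildcard.

```python
import os
from fnmatch import fnmatchcase

# Hypothetical extension-to-MIME-type table; note there is no ".log" entry.
KNOWN_TYPES = {".txt": "text/plain", ".htm": "text/html", ".html": "text/html"}


def passes_mimeinclude(filename, patterns, known_types=KNOWN_TYPES):
    """Return True if the file's MIME type is known and matches a pattern."""
    ext = os.path.splitext(filename)[1].lower()
    mime = known_types.get(ext)
    if mime is None:
        return False  # unknown type: nothing for 'text/*' to match against
    return any(fnmatchcase(mime, p) for p in patterns)


print(passes_mimeinclude("notes.txt", ["text/*"]))     # True
print(passes_mimeinclude("filename.log", ["text/*"]))  # False

# Once the extension is mapped to text/plain, the same criteria match.
extended = dict(KNOWN_TYPES, **{".log": "text/plain"})
print(passes_mimeinclude("filename.log", ["text/*"], extended))  # True
```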
Furthermore, you should also use inclusion and exclusion criteria to finely control
what is indexed.
• If your list of file types to index is rather long, use one of the exclusion criteria:
(-exclude, -indexclude, -mimeexclude, or -indmimeexclude) to exclude
extensions you know you do not want to index. For example:
-exclude '*.exe' '*.com'
• If the list of file types you want to index is relatively small, use one of the inclusion
criteria (-include, -indinclude, -mimeinclude, or -indmimeinclude) to specify
them. For example:
-include '*.txt' '*.1st' '*.log'
mkvdk is a command-line utility installed with ColdFusion that you can use to
perform maintenance operations on Verity collections, which are the primary data
type for building searching/indexing functionality into your ColdFusion application
pages.
Contents
• Overview of the Verity mkvdk Utility ..................................................................... 186
• Getting Started with the Verity mkvdk Utility ....................................................... 187
• Bulk Submit Options............................................................................................... 194
• Collection Maintenance Options........................................................................... 195
186 Chapter 9 Managing Verity Collections with the mkvdk Utility
mkvdk syntax
The following is the basic syntax of the command:
mkvdk -collection path [option] [dockey]
Multiple options and dockeys can be included, as needed. If dockey is a list of files, it
should consist of an at-sign (@) followed by the filename that contains a simple list of
files, as in @filelist. The options for mkvdk are described in the option tables later in this chapter.
The following operations occur when you use mkvdk to create a new collection:
1 New collection directories are created and the specified style files are copied to
the style subdirectory.
2 The style file settings are read and the required information is passed to the Verity
search engine.
3 The gateway is used to open the document files, which are parsed according to
the settings in various style files.
4 A new partition is created, which includes an index and an attribute table.
5 Assist data is generated, which may include a spanning word list.
When problems occur during an operation, mkvdk writes error messages to the
system log file (sysinfo.log). You can direct error and other messages to the console
by using mkvdk with the -outlevel option. You can direct messages to a file of your
choice by using the -loglevel and -logfile options.
The format of the log file is shown below:
You can use the log file to view details about what happens during the collection
building process. Use the mkvdk -loglevel command and specify the numeric
identifier for the message level you want, as summarized in the following table:
Type       Number
Fatal      1
Error      2
Warning    4
Status     8
Info       16
Verbose    32
Debug      64
To calculate the numeric parameter, add up the numbers for the message types you
want to include. The default for both -outlevel and -loglevel is 15, which selects
fatal, error, warning, and status messages (1+2+4+8).
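Because each message type is a distinct power of two, the level is a simple bit mask. The following Python sketch computes and decodes such values; the helper names are hypothetical, and only the numbers come from the table above.

```python
# Message type numbers from the table above.
MESSAGE_TYPES = {
    "fatal": 1,
    "error": 2,
    "warning": 4,
    "status": 8,
    "info": 16,
    "verbose": 32,
    "debug": 64,
}


def level_for(*names):
    """Add up the numbers for the requested message types."""
    return sum(MESSAGE_TYPES[name] for name in set(names))


def names_in(level):
    """List the message types selected by a numeric level."""
    return [name for name, bit in MESSAGE_TYPES.items() if level & bit]


print(level_for("fatal", "error", "warning", "status"))  # 15
print(names_in(63))  # every type except debug
```

The value returned by level_for can be passed as the argument to -outlevel or -loglevel.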
Alternatively, you can set up a collection and insert documents in one mkvdk
command, using this syntax:
mkvdk -create -collection collectionname -bulk -insert filespec
Note
The -create option can be used only once to create the collection directory
structure. After a collection directory structure has been created, do not use the
-create option to update the collection.
Option Description
-create This option creates a collection in the specified -collection directory. It
creates the directory structure, determines the index contents and sets
up the documents table schema according to the style files used. If the
specified collection already exists, mkvdk exits rather than overwriting
the existing collection.
-style dir This option specifies the style directory that contains the style files to
use in creating a collection. This option can only be used with the
-create option. If you do not specify this option when you use mkvdk to
create a collection, mkvdk uses the style files in the common/style
directory.
-description desc This option sets the collection’s description. Enter any alphanumeric text
you like, such as “This collection contains electronic mail from ABC
Company.” Include the quotation marks.
-words This option builds the word list for all partitions in the collection.
Option Description
-collection path This option specifies the path of the collection to create or open. This is
required to execute mkvdk.
-nolock This option turns off file locking. Locking is on by default.
-synch This option performs work immediately. If this option is not used, indexing
work is done in the background, as time permits.
-about This option shows information about the collection, such as its description and
the date when it was last modified.
-datapath path This option specifies the datapath to use to find documents being added to the
specified collection. All relative document paths will be relative to this setting.
If you do not set this option, mkvdk looks for documents next to the collection
directory.
-topicset path This option creates a topic index for the collection based on the specified topic
set and stores it in the collection directory. This facilitates quick and efficient
searches over the collection data when using topics.
-mode mode This option sets the indexing mode. Values are case insensitive. Valid settings
are:
• Generic
• FastSearch
• NewsfeedIdx
• NewsfeedOpt
• BulkLoad
• ReadOnly
• Any custom mode defined in the style.plc file. The default is Generic
mode.
-common This option specifies the path of the Verity common directory. If you do not use
this option, the Verity engine looks for the common directory in the directory
containing the mkvdk executable, and then along the executable search path.
The executable search path is determined by your operating system
environment settings. It is the path used by the OS to find the programs you
run.
-help This option displays mkvdk syntax options.
-debug This option runs mkvdk in debugging mode.
-nooptimize This option prevents optimization by this instance of mkvdk. Using this option
turns off the service level VdkServiceType_Optimize. The service types
determine what type of work the Verity engine and its self-administration
features will execute on a collection.
-nohousekeep This option prevents housekeeping by this instance of mkvdk. Housekeeping
includes deleting files that are no longer needed. Using this option turns off the
service level VdkServiceType_DBA. (Service types are described under
nooptimize.)
-noindex This option prevents indexing by this instance of mkvdk. Documents will not
be inserted or deleted. Using this option turns off the service level
VdkServiceType_Index. (Service types are described under nooptimize.)
-charmap name The name of the character set that you would like all strings mapped to for
your application. You should set this to name a character set that your system
can display properly. Using the search engine with the English locale, the
character set that any version of Windows displays is 8859, the character set
that a Macintosh computer would display is mac or mac1. Note that this is
NOT the name of the character set of documents being indexed, it is only the
name of the character set that your display can handle properly. (The
character set of the document is set in the style.dft file using the /charmap
option, which is described in Chapter 9.)
Valid options are 850, 8859, mac. The default is no mapping.
-locale name The name of the Verity locale to be used by mkvdk. The locale name must
correspond to the name of an existing locale directory which must exist in
install_dir/common/locale. Valid options are english, deutsch, and francais.
The default is english.
-datefmt format This option is used to convert a date field value into Verity’s internal data
representation, and can be used in conjunction with the mkvdk options
-extract (for the field extraction feature) and -bulk (for the bulk submit feature).
The named format string identifies to the date parsing routines as to what
order dates are written in when the date string only consists of a sequence of
numbers (for example, 03/03/96). Valid options are described in “Date format
options” on page 191. The default is MDY.
-servlev level Service level. The specifier, level, is a string consisting of keywords separated
by hyphens, such as search-index-optimize. Valid keywords are described in
the service-level keyword table later in this section.
Servicing only
The following command performs servicing only. Use this command if you only want
to index submitted documents and service the collection.
mkvdk -collection path
Keyword     Description
search      Enable search and retrieval
insert      Enable adding and updating documents
optimize    Enable opportunistic collection optimization
assist      Enable building of word list
housekeep   Enable housekeeping of unneeded files
delete      Enable document deletion (see Chapter 3)
backup      Enable backup
purge       Enable background purging
repair      Enable collection repair
dataprep    Same as search-index-optimize-assist-housekeep
index       Same as insert-delete
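Since dataprep and index are shorthand for other keywords, a service-level string can be expanded to its basic keywords before use. The Python sketch below does this for the keywords in the table; the function name is hypothetical and the expansion order is an assumption made for illustration.

```python
# Shorthand keywords and their expansions, taken from the table above.
ALIASES = {
    "dataprep": ["search", "index", "optimize", "assist", "housekeep"],
    "index": ["insert", "delete"],
}
BASIC = {
    "search", "insert", "optimize", "assist", "housekeep",
    "delete", "backup", "purge", "repair",
}


def expand_service_level(spec):
    """Expand a hyphen-separated service level into basic keywords."""
    result = []
    pending = spec.split("-")
    while pending:
        keyword = pending.pop(0)
        if keyword in ALIASES:
            pending = ALIASES[keyword] + pending  # expand in place
        elif keyword not in BASIC:
            raise ValueError(f"unknown service-level keyword: {keyword}")
        elif keyword not in result:
            result.append(keyword)
    return result


print(expand_service_level("search-index-optimize"))
# ['search', 'insert', 'delete', 'optimize']
```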
Messaging options
mkvdk provides a variety of messaging options, described in the following table:
Option Description
-quiet This option displays only fatal and error messages to the console. It overrides
the -outlevel setting. For a list of message types, refer to “Message Types.”
-outlevel (num) This option indicates which message types to display to the console. Valid
values are determined by adding numbers together that correspond to the
desired message types. The default value is 15. For more information, refer to
“Message Types.”
-logfile file name This option saves messages in the specified file.
-loglevel (num) This option indicates which message types to route to the optional log file.
Valid values are determined by adding numbers together that correspond to
the desired message types. The default value is 15. For more information,
refer to “Message Types.”
Message types
Message types and their corresponding numbers are listed in the table below. To set
the -outlevel or -loglevel option, add up the numbers for the message types you want
to include. For example, to tell mkvdk to display all messages except debug messages,
set -outlevel to 1+2+4+8+16+32=63. The default for both -outlevel and -loglevel is 15,
which selects fatal, error, warning, and status messages (15=1+2+4+8).
Type       Number
Fatal      1
Error      2
Warning    4
Status     8
Info       16
Verbose    32
Debug      64
Option Description
-extract This option extracts field values from documents, using the field extraction rules
specified in the style.tde file. For more information, refer to Chapter 9.
-insert This option adds documents to the collection. This is the default option for mkvdk.
-update This option adds documents to the collection by replacing all previous information
about the specified documents.
-delete This option marks the specified documents as deleted and makes them unavailable
for searches. To actually remove deleted documents from the collection’s internal
documents table and word indexes, use the squeeze keyword.
-nosave Specifies that a work list, which is generated by mkvdk automatically when the
-extract option is used, will not be saved in the collection directory in a file called
worklist (in the Verity bulk submit file format). By default, mkvdk saves the work list
in the worklist file.
-nosubmit Specifies that a work list, which is generated by mkvdk automatically when the
-extract option is used, will not be submitted to the indexing engine and will be
saved in the collection directory in a file called worklist (in the Verity bulk submit file
format). This option allows mkvdk to process field extraction separately from other
indexing tasks.
Option Description
-bulk This option tells mkvdk to interpret filespec as a bulk submit file. The option
can be used with -insert, -update, and -delete.
-offset num This option specifies the offset into a bulk submit file or files. Note that if you
specify multiple bulk submit files and use the -offset option, the offset is
applied to all of the bulk submit files.
-numdocs num This option specifies the number of documents to insert or delete from the
bulk insert file or files. Note that if you specify multiple bulk insert or delete
files and use the -numdocs option, the -numdocs setting is applied to all of
the bulk insert or delete files.
-autodel This option deletes the bulk submit file or files when the bulk submission
work is finished.
Option Description
-backup dir This option backs up the collection into the specified directory. Note that the
backup will not include the tde subdirectory. The tde subdirectory is created by
and for Topic Document Entry if Topic Document Entry is used to create or
maintain the collection.
-repair This option repairs the collection, performed by an API call.
-purge This option waits for the period specified by the -purgewait option and then
deletes all documents in the collection, but not the collection itself; it leaves
the collection directory structure intact.
To specify a different wait period, use the -purgewait option instead of -purge.
If you do not use purgewait, the default is 600 seconds.
-purgeback This option, used with the -purge option, performs a purge in the background.
-purgewait sec This option specifies to the -purge option how many seconds to wait. If you do
not specify sec, the default is 600.
-noservice This option prevents collection servicing (servicing includes indexing) by this
instance of mkvdk, performed by an API call.
-persist This option services the collection repeatedly, at default intervals of 30
seconds. Use the -sleeptime option to set a different interval.
-sleeptime sec This option specifies the interval between service calls when mkvdk is run
with the -persist option.
-optimize spec This option performs various optimizations on the collection, depending on the
value of spec. The specifier, spec, is a string consisting of keywords separated
by hyphens, such as maxmerge-squeeze-readonly. Valid keywords are
described under “Optimization Keywords.”
-noexit Windows only. This option causes the I/O window to remain after the program
is finished. By default, the window closes and the program exits so that scripts
calling mkvdk will not hang.
Backing up a collection
The following command backs up a collection to the specified directory.
mkvdk -backup path_1 -collection path_2
Deleting a collection
To delete a collection, use the appropriate command for your operating system. For
example, to remove the collection directory structure and control files on a UNIX
system, use the following command.
rm -r collection_path
Purging a collection
The following command deletes all documents from a collection, but does not delete
the collection itself.
mkvdk -purge -collection path
Persistent service
The following command runs mkvdk as a persistent process, so that servicing is
performed repeatedly after num idle seconds.
mkvdk -persist -sleeptime num -collection path
Deleting a Collection
Note that -purge deletes all documents in a collection, but does not delete the
collection itself. To delete a collection, use operating system commands such as the
rm command on UNIX to remove the collection directory structure and control files.
Optimization Keywords
Optimization keywords for the -optimize option are described below.
Keyword Description
maxclean This keyword performs the most comprehensive housekeeping possible, and
removes out-of-date collection files. This optimization is recommended only
when you are preparing an isolated collection for publication. Note that when
using this type, if the collection is being searched, sometimes files get deleted
too early and this affects search results.
maxmerge This keyword performs maximal merging on the partitions to create partitions that
are as large as possible. This creates partitions that can have up to 64000
documents in them.
readonly This keyword makes the collection read only. When used, mkvdk marks the
collection as read-only and unchanging after the function call is done. This is
appropriate for CD-ROM collections.
spanword This keyword creates a spanning word list across all the collection’s partitions. A
collection consists of numerous smaller units called partitions each of which
includes a word list. Optionally, a spanning word list can be built with an ngram
index.
ngramindex This keyword builds an ngram index for the collection. An ngram index is
designed to improve the search performance for queries with the <TYPO> and/or
<WILDCARD> operators. An ngram index cannot be built without a spanning
word list. You can build a spanning word list and ngram index in the same
command, for example:
mkvdk -collection collname -optimize spanword-ngramindex
squeeze This keyword squeezes deleted documents from the collection. Squeezing
deleted documents recovers space in a collection, and improves search
performance. Using this option invalidates the search results.
vdbopt Each collection consists of smaller units called Verity databases (VDBs). The
vdbopt keyword configures the collection’s VDBs. This keyword has the effect of
linearizing the data in a VDB, and making the collection metadata contained in
the VDB more streamlined. It also allows the VDB to grow to a much larger size.
tuneup This keyword is a convenience keyword that includes maxmerge, vdbopt, and
spanword.
publish This keyword is a convenience keyword that includes all of the optimization
types. Use this keyword to optimize the collection for the best possible retrieval
performance, such as for publication to a network on a server or on a CD-ROM.
Option Description
-maxfiles num This option sets the maximum number of files that mkvdk can have open
at once. The default is 50.
-diskcache num This option sets the size of the mkvdk disk cache in kbytes. The default is
128.
Chapter 10
Verity Troubleshooting
Utilities
This chapter provides information about using a variety of Verity utilities for
troubleshooting Verity collections.
Contents
• Overview of Verity Utilities ..................................................................................... 200
• Using the Verity rcvdk Utility.................................................................................. 201
• Attaching to a Collection Using rcvdk ................................................................... 202
• Viewing Results of the rcvdk Utility ....................................................................... 203
• Using the Verity didump Utility ............................................................................. 206
• Using the Verity browse Utility............................................................................... 209
• Using the Verity merge Utility ................................................................................ 211
• Verity VDK Error Messages ..................................................................................... 213
200 Chapter 10 Verity Troubleshooting Utilities
Starting rcvdk
To start rcvdk on most systems, type the path and executable name at a command
prompt. The examples shown below assume you have set your PATH variable, so
you need only enter rcvdk at a command prompt to run it.
For example:
c:\cfusion\bin\rcvdk /common = c:\cfusion\verity\common
When you start rcvdk with no arguments, you get the message below followed by the
rcvdk prompt.
Type ‘help’ for a list of commands.
RC>
The help command produces the following list of available commands:
RC> help
Available commands:
search s Search documents.
results r Display search results.
clusters c Display clustered search results.
view v View document.
summarize z Summarize documents.
attach a Attach to one or more collections.
detach d Detach from one or more collections.
quit q Leave application.
about Display VDK ‘About’ info
help ? Display help text; ‘help help’ for details.
expert x Toggle expert mode on/off.
RC>
At any time, you can enter “q” at the RC> prompt to quit the application.
Basic searching
To retrieve all documents, use the s command without arguments. After you press
return, a search update message is produced, as shown below.
RC>s
Search update: finished (100%). Retrieved: 85(85)/85.
RC>
The search results indicate that 85 of the 85 documents in the collection were
retrieved. If you specify a query argument, such as “universal filter”, only the
documents in the collection that contain the specified string are retrieved.
RC>s universal filter
Search update: finished (100%). Retrieved: 18(18)/85.
RC>
In the message returned for the search above, rcvdk indicates that 18 documents
matched the query. More elaborate queries using the Verity query language can be
performed, as shown in this example:
RC>s universal filter <OR> filter
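If you drive rcvdk from a script, you may want to read the counts out of the search update message. The following Python sketch assumes the message layout shown in the examples above. The function name parse_search_update is hypothetical and is not part of any Verity tool, and the value in parentheses is returned unchanged because this chapter does not define it.

```python
import re

# Matches lines such as:
#   Search update: finished (100%). Retrieved: 18(18)/85.
# Assumption: rcvdk always prints the message in this layout.
_SEARCH_UPDATE = re.compile(
    r"Search update: finished \((\d+)%\)\. Retrieved: (\d+)\((\d+)\)/(\d+)\."
)


def parse_search_update(line):
    """Return the counts from a search update line, or None if it does not match."""
    match = _SEARCH_UPDATE.search(line)
    if match is None:
        return None
    percent, retrieved, in_parens, total = (int(g) for g in match.groups())
    return {
        "percent": percent,
        "retrieved": retrieved,
        "in_parens": in_parens,  # meaning not documented here; kept as printed
        "total": total,
    }
```

A caller can compare the "retrieved" and "total" values to see how selective a query was.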
Viewing Results of the rcvdk Utility 203
Option Description
r Displays the results list, starting with the first document. A maximum of 24
documents will be displayed.
r n Displays the results list, starting with the nth document. A maximum of 24
documents will be displayed.
v Displays the first or next document in the results list. Highlights are
indicated using reverse video, if possible. If not, double angle brackets are
used, as in:
>>universal<< >>filter<<
To exit the document display, enter “q”.
v n Displays the nth document in the results list. To exit the document display,
enter “q”.
The results list for the “universal filter” search is shown below. For each document,
these fields are displayed by default: Number, Score, and VdkVgwKey.
RC> r
Retrieved: 18(18)/85
Number SCORE VdkVgwKey
1: 1.00 d:\search97\s97is\locale\english\doc\collbldg\08_cbg3.htm
2: 0.97 d:\search97\s97is\locale\english\doc\collbldg\11_cbg2.htm
3: 0.97 d:\search97\s97is\locale\english\doc\collbldg\08_cbg7.htm
4: 0.97 d:\search97\s97is\locale\english\doc\collbldg\08_cbg1.htm
5: 0.95 d:\search97\s97is\locale\english\doc\collbldg\cbgtoc.htm
6: 0.95 d:\search97\s97is\locale\english\doc\collbldg\08_cbg4.htm
7: 0.93 d:\search97\s97is\locale\english\doc\collbldg\cbgix.htm
8: 0.92 d:\search97\s97is\locale\english\doc\collbldg\08_cbg6.htm
9: 0.90 d:\search97\s97is\locale\english\doc\collbldg\08_cbg.htm
10: 0.90 d:\search97\s97is\locale\english\doc\collbldg\04_cbg1.htm
11: 0.90 d:\search97\s97is\locale\english\doc\collbldg\01_cbg1.htm
12: 0.87 d:\search97\s97is\locale\english\doc\collbldg\f_cbg.htm
13: 0.87 d:\search97\s97is\locale\english\doc\collbldg\08_cbg2.htm
14: 0.84 d:\search97\s97is\locale\english\doc\collbldg\06_cbg1.htm
15: 0.80 d:\search97\s97is\locale\english\doc\collbldg\part4.htm
16: 0.80 d:\search97\s97is\locale\english\doc\collbldg\f_cbg1.htm
17: 0.80 d:\search97\s97is\locale\english\doc\collbldg\11_cbg5.htm
18: 0.80 d:\search97\s97is\locale\english\doc\collbldg\08_cbg5.htm
RC>
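The default results list can also be read by a script. The following Python sketch assumes rows in the Number, Score, VdkVgwKey layout shown above. The name parse_result_row is hypothetical, and the pattern does not handle rows in which a field is blank.

```python
import re

# One results row looks like:
#   1: 1.00 d:\search97\s97is\locale\english\doc\collbldg\08_cbg3.htm
# Assumption: number, score, and key are separated by whitespace.
_ROW = re.compile(r"^\s*(\d+):\s+(\d+\.\d+)\s+(\S.*)$")


def parse_result_row(line):
    """Return (number, score, key) for a results row, or None for other lines."""
    match = _ROW.match(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), match.group(3).rstrip()
```

Header lines, the "Retrieved:" line, and the RC> prompt do not match the pattern, so the function returns None for them.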
All fields in a column will be blank if the field is not defined for the collection’s
schema in the documents table (in style.ddd, style.sfl, or style.ufl). A field in a
document’s row will be blank if the field was not populated by a gateway, bulk submit
action, or filter.
For example:
c:\cfusion\bin\didump /common = c:\cfusion\verity\common -pattern llama
c:\new\parts\00000001.did
To view the occurrences of a specific word or pattern, enter a command using the
-pattern option, as in the following example:
didump -pattern acronym 00000003.did
The didump utility will display information about the number of occurrences of the
word “acronym.” You can display the individual occurrences of a word using the
verbose (-verbose) option.
For example:
c:\cfusion\bin\browse /common = c:\cfusion\verity\common
c:\new\parts\0000001.ddd
Displaying fields
There are several options that can be used to control the display of field information.
To display all the document fields, follow these steps:
1 At the Action prompt, enter ##
2 Press return 2 times to display the fields for the first document record
3 Press return to view the document fields for the next sequential record
The following partial display of the results of the browse command includes internal
fields, used by the Verity search engine. An internal field name starts with an
underscore (_) character.
50 Created FIX-date ( 4) = 12-Jan-1998 01:52:27 pm
51 Modified FIX-date ( 4) = 24-Sep-1997 02:40:26 pm
52 Size FIX-unsg ( 4) = 5381
53 DOC_OF FIX-unsg ( 4) = 0
54 DOC_SZ FIX-unsg ( 4) = 4294967295
55 DOC_FN_OF FIX-unsg ( 4) = 436
56 DOC_FN_SZ FIX-unsg ( 2) = 58
57 _CACHE_FN_OF FIX-unsg ( 4) = 2922
58 _CACHE_FN_SZ FIX-unsg ( 2) = 0
59 _ParentID_OF FIX-unsg ( 4) = 354
60 _ParentID_SZ FIX-unsg ( 2) = 46
61 Title_OF FIX-unsg ( 4) = 2481
62 Title_SZ FIX-unsg ( 2) = 15
You can hide the internal fields. To do this, type the underscore character (_), then
press return. To display the internal fields again, type the underscore character
and press return a second time.
Using the Verity merge Utility 211
Note
The Verity merge utility is available only on Windows platforms.
Collections can be merged only if they have identical schemas, that is, exactly the
same set of style files (and style file entries).
Breaking up a large collection helps to optimize search performance, because it
allows many applications to perform multiple concurrent search requests over the
different collections. After breaking up a large collection, you can also discard older
collections to reclaim limited disk storage space.
merge can be found in the ColdFusion bin directory: cfusion\bin.
To obtain help for the merge utility, enter the following command:
merge -help
Note
After running the merge utility, you must optimize the collection, using the mkvdk
-optimize option.
For example:
c:\cfusion\bin\merge /common = c:\cfusion\verity\common
Splitting collections
The following is the syntax for using the merge utility to split a single large collection
into smaller collections.
merge -split <srcCollection> <newCollection1> <newCollection2>
[-number]
The utility reads srcCollection and splits it into roughly equal-sized pieces, using the
file names given for newCollection1 and so on.
If you want to split a very large collection into a large number of new collections, you
can use the following option instead of explicitly naming each new collection:
merge -split -number newCollection srcCollection
The utility reads the collection identified by srcCollection and splits it into the
number of segments specified by the -number option. The name of the first new
collection is generated by appending the first two letters in the alphabet (aa) to the
directory name given for newCollection. Each subsequent file name is generated by
incrementing one of the appended letters (up to zz) for a maximum of 676 partitions.
For example, if the value of -number is 3, and the value of newCollection is
Collection1, the collections are named Collection1aa, Collection1ab, and
Collection1ac.
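The naming scheme described above can be sketched in Python as follows. The function name split_collection_names is hypothetical. The sketch also assumes that the second letter advances first and rolls over into the first letter (aa through az, then ba), which this section does not state explicitly.

```python
import string


def split_collection_names(new_collection, number):
    """List the collection names for a split into `number` pieces."""
    letters = string.ascii_lowercase
    max_parts = len(letters) ** 2  # 26 * 26 = 676 two-letter suffixes
    if not 1 <= number <= max_parts:
        raise ValueError(f"number must be between 1 and {max_parts}")
    names = []
    for index in range(number):
        # Assumed rollover order: aa, ab, ..., az, ba, ..., zz
        first, second = divmod(index, len(letters))
        names.append(new_collection + letters[first] + letters[second])
    return names
```

Calling split_collection_names("Collection1", 3) returns the three names given in the example above.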
Note
The maximum length of the directory name given for newCollection is 2 characters
less than the length allowed by the file system.
Verity VDK Error Messages 213
Warnings
Error Code No. Description
VdkWarning_CollectionDown (10) The collection was down when it was opened.
VdkWarning_QueryComplex (11) Too many matching words.
VdkWarning_LowMemory (12) Memory is low for indexing.
VdkWarning_CollectionReadOnly (13) The collection is read-only.
VdkWarning_DriverNotFound (14) Couldn’t locate specified driver.
VdkWarning_LargeToken (15) Returned a token greater than maxSize.
VdkWarning_ArgTooLarge (16) Argument too large.
VdkWarning_DataSrcNotAvail (17) Cannot locate collection data.
VdkWarning_SearchRestricted (18) Search restricted to a subset of the collection.
Part IV
ColdFusion
High-Availability
This chapter describes the concepts involved in achieving scalable and highly
available Web applications.
Contents
• What is Scalability?.................................................................................................. 222
• Issues Affecting Successful Scalability Implementations .................................... 225
• What is Web Site Availability? ................................................................................. 234
• Techniques for Creating Scalable and Highly Available Sites .............................. 239
222 Chapter 11 Scalability and Availability Overview
What is Scalability?
As an administrator, it’s likely that you often hear about the importance of having
Web servers that scale well, but what exactly is scalability? Simply, scalability is a Web
server’s ability to maintain a site’s availability, reliability, and performance as the
amount of simultaneous Web traffic, or load, hitting the Web server increases.
The major issues that affect Web site scalability include:
• “Performance” on page 222
• “Load management” on page 224
Performance
Performance refers to how efficiently a site responds to browser requests according
to defined benchmarks. Application performance can be designed, tuned, and
measured. It can also be affected by many complex factors, including application
design and construction, database connectivity, network capacity and bandwidth,
back office services (such as mail, proxy, and security services), and hardware server
resources.
Web application architects and developers must design and code an application with
performance in mind. Once the application is built, various administrators can tune
performance by setting specific flags and options on the database, the operating
system, and often the application itself to achieve peak performance. Following the
construction and tuning efforts, quality assurance testers should test and measure
an application’s performance prior to deployment to establish acceptable quality
benchmarks. If all of these efforts are performed well, you are better able to
diagnose whether the Web site is operating within established parameters when
you review the statistics generated by Web server monitoring and logging programs.
Depending on the size and complexity of your Web application, you may be able to
handle anywhere from 10 to thousands of concurrent users. The number of
concurrent connections to your Web server(s) will ultimately have a direct impact on
your site’s performance. Therefore, your performance objectives must include two
dimensions:
• the speed of a single user’s transaction
• the amount of performance degradation related to the increasing number of
concurrent users on your Web servers
Thus, you must establish desired response benchmarks for your site and then
determine the highest number of concurrent users that your site can support at the
desired response rates. By doing so, you can determine a rough number of
concurrent users for each Web server and then scale your Web site by adding
servers.
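As a rough illustration of this sizing step, the following Python sketch estimates how many Web servers a target load requires. It assumes that every server supports the same number of concurrent users at the desired response rate. The function name and the spare_servers parameter are hypothetical, and the figures in the usage note are invented for illustration only.

```python
def servers_needed(target_users, users_per_server, spare_servers=0):
    """Estimate the server count for a target number of concurrent users."""
    if target_users < 0 or users_per_server <= 0:
        raise ValueError("target_users must be >= 0 and users_per_server > 0")
    # Integer ceiling division: round up so that no users are left unserved.
    base = -(-target_users // users_per_server)
    return base + spare_servers
```

For example, if testing suggests that one server supports 300 concurrent users at the desired response rate, servers_needed(1000, 300) gives 4 servers.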
Once your site runs on multiple Web servers, you will need to monitor and manage
the traffic and load across the group of servers. See “Hardware planning” on page 237
and “Techniques for Creating Scalable and Highly Available Sites” on page 239 to
learn about the ways you can do this.
Linear scalability
Perfect scalability—excluding cache initializations—is linear. Linear scalability,
relative to load, means that with fixed resources, performance decreases at a
constant rate relative to load increases. Linear scalability, relative to resources,
means that with a constant load, performance improves at a constant rate relative to
additional resources.
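One way to check measurements against this definition is to compare the rate of change between consecutive samples. The following Python sketch is an illustration only. The function names, the 5 percent tolerance, and the sample figures in the usage note are assumptions, not values from this book.

```python
def rates_of_change(samples):
    """Return the slope between consecutive (x, y) samples.

    x is load (or resources) and y is a performance measure.
    Samples must be ordered with strictly increasing x.
    """
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(samples, samples[1:])
    ]


def is_linear(samples, tolerance=0.05):
    """Report whether performance changes at a roughly constant rate."""
    rates = rates_of_change(samples)
    if len(rates) < 2:
        return True  # too few samples to show any change in rate
    reference = rates[0]
    allowed = tolerance * abs(reference)
    return all(abs(rate - reference) <= allowed for rate in rates[1:])
```

For example, samples of (concurrent users, requests served per second) such as (100, 50.0), (200, 40.0), and (300, 30.0) fall at a constant rate, so is_linear returns True for them.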
Load management
Load management refers to the method by which simultaneous user requests are
distributed and balanced among multiple servers (Web, ColdFusion, DBMS, file, and
search servers). Effectively balancing load across your servers ensures that they do
not become overloaded and eventually unavailable.
There are several different methods that you can use to achieve load management:
• Hardware-based solutions
• Software-based solutions, including round-robin Internet DNS or third-party
clustering packages
• Hardware and software combinations
Each option has its own distinct merits.
Most load balancing solutions today manage traffic based on IP packet flow. This
approach effectively handles non-application-centric sites. However, to effectively
manage ColdFusion Web application traffic, it is important to implement a
mechanism that monitors and balances load based on specific ColdFusion Web
application load. ColdFusion relies on a leading software-based clustering
technology, ClusterCATS, to ensure that the ColdFusion Web servers, the Web server,
and other servers on which your ColdFusion Web applications depend remain highly
available.
To learn more about different hardware and software load management solutions,
see “Techniques for Creating Scalable and Highly Available Sites” on page 239.
Issues Affecting Successful Scalability Implementations 225
Note
Storing session data on the server requires that a simple identifier be stored on
the client, such as a cookie.
What is DNS?
DNS is a set of protocols and services on a TCP/IP network that allows network users
to use hierarchical natural language names rather than computer IP addresses when
searching for other computer hosts (servers) on the network. DNS is used extensively
on the Internet as well as on private enterprise networks, including LANs and WANs.
The primary capability contained within DNS is its ability to map host names to IP
addresses, and vice-versa. For example, suppose the Web server at Allaire has an IP
address of 157.55.100.1. Most people would connect to this server by entering the
domain name (www.allaire.com) and not the less friendly IP address. Besides being
easier to remember, the name is more reliable because the numeric address could
change for a variety of reasons, but the name can always be reserved.
• Translate the natural language names to server IP address mappings so that users
can find the site.
• If you have enabled round-robin distribution for multi-server load balancing, it
can distribute the load among each server in a rote, sequential distribution
manner.
However, if a spike in user activity occurs and causes servers to overload or fail,
round-robin DNS will keep distributing the requests among all of the servers, even if
some of them are no longer operational.
In short, Internet DNS is limited in its capabilities, and its round-robin distribution
mechanism does not contain any intelligence that allows it to monitor, manage, and
react to overloaded or failed servers. Consequently, DNS by itself is not a sound load
balancing or failover solution for your business-critical sites. The load balancing and
failover technology that ColdFusion Enterprise provides, ClusterCATS, compensates
for DNS limitations and allows you to create highly available, reliable, and scalable
ColdFusion Web applications.
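The limitation described above can be illustrated with a short Python sketch. It is not how DNS or ClusterCATS is implemented. The server names, the is_up callback, and both function names are hypothetical.

```python
from itertools import cycle, islice


def round_robin(servers):
    """Rotate through every server in order, with no health check."""
    return cycle(servers)


def health_aware(servers, is_up):
    """Rotate only through servers that is_up() reports as available.

    is_up is a hypothetical callback supplied by the caller.
    """
    while True:
        live = [server for server in servers if is_up(server)]
        if not live:
            raise RuntimeError("no servers available")
        yield from live


servers = ["web1", "web2", "web3"]
failed = {"web2"}  # pretend that web2 has crashed

# Plain rotation still sends every third request to the failed server.
blind = list(islice(round_robin(servers), 6))

# Health-aware rotation skips the failed server.
aware = list(islice(health_aware(servers, lambda s: s not in failed), 4))
```

In the plain rotation, web2 continues to appear in the sequence even though it is down. In the health-aware rotation, only web1 and web3 receive requests.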
[Figure: the allaire.com domain name space, with entries such as dev, ftp, and ntserver, divided into the allaire.com zone and the dev.allaire.com zone]
DNS servers store information about the domain name space and are referred to as
name servers. Name servers typically have one or more zones for which they are
responsible. The name server has authority for those zones and is aware of all the
other DNS name servers that are in the same domain.
On the Windows platform, you make DNS entries using the Domain Name Service
Manager utility.
On UNIX platforms, you make these DNS entries in the name.db file, which is read by
the DNS server’s Berkeley Internet Name Daemon (BIND).
Note
ClusterCATS for ColdFusion uses the HTTP protocol to redirect packets of data from
a failed server to an available server. Therefore, it is important to verify that your load
testing tool can handle HTTP redirections properly before you initiate load testing.
Common failures
Following are typical types of failures that can negatively impact your Web
application’s availability and reliability:
• Hardware failures While less common than software failures, hardware failures
do occur and may include crashed hard drives, blown processors, and corrupted
network cards. Diagnosing and fixing these kinds of issues can be a lengthy
endeavor because of time spent procuring the parts and performing the labor. If
your Web application is mission-critical, you should ensure a sound hardware
redundancy strategy to avoid costly downtime. A sound strategy includes a
minimum of two Web servers but preferably three.
• Software failures The types of software failures that will most likely affect a Web
application involve the Web server’s operating system, the Web server software
itself, or the Web application software. If the operating system crashes or
becomes corrupt, the Web server cannot function properly (or perhaps at all),
causing your Web application’s availability, reliability, and performance to be
compromised. Similarly, if the Web server software crashes or acts erratically, it
will likely cause the Web server to stop running when you didn’t intend it to. It’s
hard to prepare for software failures, but if you have mirrored secondary
hardware systems in place to account for failures, you’ll minimize your Web
application’s downtime.
• Server failures In addition to the Web server, other servers on which your Web
application depends can also fail, causing either downtime or diminished
capabilities on your site. For example, for distributed applications, a proxy server
may go down, causing requests for your Web application’s services to go
unanswered. Or, the database server can crash, making it impossible for users to
submit or retrieve information from your database. Or, a mail server can go
down, making it impossible for your users to successfully send mail to you.
Ensure that your organization’s IT architecture includes network monitoring and
notification software that can quickly report on the general health of your
network and alert you about any failed servers.
Failover considerations
The ability to fail over servers that have become unavailable to redundant servers is a
cornerstone of any mission-critical application, one that ensures an application’s
continuous and reliable operation. Such disaster planning and recovery can be
broken down into:
• “Hardware planning” on page 237
• “Systems monitoring” on page 238
• “Corrective actions” on page 238
Review the following considerations to ensure that you have a sound failover strategy
in place—one that guarantees your Web site’s availability.
Hardware planning
As illustrated in the availability example above, it’s important to acquire all of the
necessary hardware and configure it before you deploy the application. All Web sites
have different requirements, feature sets, purposes, audiences, and budgets, so you
must determine the hardware needs that are appropriate for your site. However, if your site is a
business-critical system that affects your company’s bottom line, you must ensure an
appropriate redundancy strategy by having two or more redundant systems in place.
In fact, Allaire recommends that you use a minimum of three servers to support any
critical Web site so that you can take one server offline to perform update and
maintenance tasks while maintaining at least two servers in production at all times.
This scheme provides administrative flexibility while simultaneously protecting your
site from hardware or software failures.
The two predominant redundancy models used today are:
• Primary/Backup Servers
An example of this model would be an important Web application that receives
relatively little traffic, such as a corporate intranet. Typically, this
redundancy model uses an expensive, high-capacity server for the primary server
and uses an inexpensive, lower quality server for the backup server in case the
primary server fails.
• Parallel Servers
This model is known as a classic load balancing/redundancy model and is used
most often for business-critical applications. Unlike the primary/backup
scheme discussed above, the multiple servers used in a parallel scheme are
considered peers and are grouped together as a single entity to support one or
more applications.
You can use identical cloned hardware for creating your server clusters, or you
can mix hardware sizes and models. Cloned, higher capacity, higher-end
hardware may have greater up-front hardware costs but will help minimize
administration costs down the line. Conversely, mixing hardware models and
capacities may be less expensive up-front but can add administrative costs later
on.
If you plan to use a parallel model, Allaire recommends that you use many mid-range
servers rather than a few high-end ones or many inexpensive ones. Servers
that provide adequate capacity and are moderately priced can generally
accommodate all your needs just as well as expensive ones at a fraction of the
cost.
Systems monitoring
In addition to redundant hardware, you should ensure that your network and the
mission-critical sites that reside on its servers are supported by systems monitoring
software. This type of software actively and continuously monitors an application’s
availability and its service levels. These monitoring programs must not only be able
to detect problems, but they must also be able to route alerts to the correct
administrators for immediate notification of problems.
Corrective actions
The third major failover consideration is the corrective actions that need to occur if a
failure causes a server to become unavailable. Generally speaking, if a server goes
down and causes your site to become unavailable, some level of human interaction is
usually required to effectively diagnose and correct the problem.
However, before the analysis and repair can happen, the administrator needs to be
notified. Whatever failover system you put in place, it should include an automated
notification system that can route alerts via your telecommunications infrastructure
(e-mail, pagers, real time web-based alerts, etc.) to the appropriate administrator for
prompt attention.
Besides notifying the administrator that a problem has occurred, you also want your
failover solution to automatically redirect traffic intended for the unavailable server
to other available servers until the unavailable server is fixed. This crucial corrective
action is what keeps your Web site up and available to your users even if one of the
servers supporting it is experiencing problems.
Techniques for Creating Scalable and Highly Available Sites 239
What is clustering?
Clustering is a technique in which two or more Web servers supporting one or more
domains (www.yourcompany.com) are grouped together as a cluster of servers to
collectively accommodate increases in load and provide system redundancy.
The following figure shows an example of a server cluster for a sample Web site:
Clustering for scalability works by distributing load among each server in the cluster
(load balancing) using either an unintelligent-but-regular distribution sequence
(round-robin DNS and routers) or a predefined threshold or algorithm that you
specify and can adjust for each server in the cluster (specialized clustering software).
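The threshold-based approach can be sketched in Python as follows. This is an illustration of the general idea only, not the algorithm used by any particular clustering product. The function name, the load figures, and the choice to prefer the server with the most headroom are all assumptions.

```python
def pick_server(loads, thresholds):
    """Choose the server with the most headroom below its own threshold.

    loads and thresholds map a server name to a number on the same scale
    (for example, a load percentage). Returns None if every server is at
    or above its threshold.
    """
    headroom = {
        name: thresholds[name] - load
        for name, load in loads.items()
        if load < thresholds[name]
    }
    if not headroom:
        return None
    return max(headroom, key=headroom.get)


# Hypothetical figures: each server has its own adjustable threshold.
loads = {"web1": 80, "web2": 40, "web3": 95}
thresholds = {"web1": 90, "web2": 70, "web3": 90}
chosen = pick_server(loads, thresholds)
```

With these figures, web3 is excluded because it has exceeded its threshold, and web2 is chosen because it has more headroom than web1.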
Advantages
A hardware-based clustering solution, such as a router, is an attractive solution for
the following reasons:
• Proven technology
• Relatively low complexity
• No recurrent licensing fees
• Semi-intelligent
Routers can load balance in a round-robin fashion, detect failures, redirect traffic
and remove failed servers from a cluster.
Note
Not all load-balancing devices have the same features or offer the same capabilities.
Considerations
Carefully evaluate the following issues against a router’s attributes:
• Expense
Hardware devices can be expensive relative to some software solutions, even
without yearly licensing fees.
• Single point of failure
If a problem develops on the load-balancing device itself and it fails, your load
balancing and failover strategies are no longer working. Although some
load-balancing devices come with secondary systems for just this reason, this
additional equipment is often what inflates the overall price of a hardware
solution.
• Not application-aware
The device cannot be tuned for particular types of Web applications (static vs.
dynamic sites) or for the development tools used to build them (scriptlets vs. JSP
vs. CGI vs. ASP and so on). Consequently, a router cannot measure the
performance of a Web application server.
• Limited intelligence
The device does not allow you to configure individual load and redirection
thresholds for each server in a cluster, and therefore, it is unable to effectively
manage load to prevent failures.
Advantages
The following benefits make a software-based clustering solution attractive:
• Relatively low expense
Compared to the cost of hardware devices, such as routers or switches,
software-based clustering solutions are relatively inexpensive. In fact, you can
cheaply implement Internet DNS on UNIX and Windows platforms for initial
load balancing needs and augment it with third-party clustering software.
• Flexibility
Some clustering software can augment existing hardware devices, thereby
providing a more robust load balancing and failover solution. Additionally, by
integrating hardware with software, you diminish, if not eliminate, losses on
capital expenditures that your organization has already made. See “Combining
hardware and software clustering solutions” on page 244 and “Load-Balancing
Devices” on page 290 for more information about how hardware and software
solutions can be integrated.
• Intelligence
Some software solutions provide a level of intelligence that enables preventive
load balancing measures that actually minimize the chance of servers becoming
unavailable. In the event that a server does become overloaded or actually fails,
some software can automatically detect the problem and reroute HTTP requests
to available servers in the cluster.
• No single point of failure
By distributing the load balancing and failover capabilities among multiple
servers in a cluster or multiple clusters, as opposed to relying on only a single
device, no individual server failure can disable your application.
Considerations
Consider the following issues when evaluating software-based solutions for your
environment:
• Differences among feature sets
Not all software-based clustering solutions are the same in terms of capabilities
and features. For instance, some have no automatic failure detection,
notification, or IP address assumption, and others have significantly delayed
detection. Some let you configure load thresholds to enable preventive measures,
some don’t. Determine your scalability and failover needs in advance and pick
your solution accordingly.
• Platform constraints
Determine if the software solution you are considering will be available on your
platform or operate with your preferred Web server. When reviewing data sheets and
other marketing collateral from vendors, make sure that the robust features you
want are available on the platform you need.
• Level of complexity
Some software-based clustering solutions have relatively low complexity. Others
introduce a higher level of complexity because of the features offered, the
amount of initial configuration and subsequent administration, or the amount of
integration that needs to occur between other systems and devices.
Configuring ColdFusion
Clusters
Once you have configured your Web site and installed ClusterCATS, use the
procedures in this chapter to create and configure your clusters.
Contents
• Introduction to ClusterCATS Administration ....................................................... 246
• Creating Clusters ..................................................................................................... 252
• Removing Clusters .................................................................................................. 263
• Adding Cluster Members ........................................................................................ 264
• Removing Cluster Members ................................................................................... 266
• Server Load Thresholds .......................................................................................... 268
• Session-Aware Load Balancing .............................................................................. 276
• Load-Balancing Devices ......................................................................................... 290
• Administrator Alarm Notifications ........................................................................ 296
• Administrator E-mail Options................................................................................ 299
• Administrating Security .......................................................................................... 302
246 Chapter 12 Configuring ColdFusion Clusters
Note
Read the description of each component that is relevant to your installation in the
sections that follow. These sections contain important configuration information.
ClusterCATS Server
The ClusterCATS Server is the heart of the clustering and load balancing of
ClusterCATS. It must be installed on each server in your cluster. The server monitors
the status of all other Web servers in a cluster and tracks application and transaction
resource availability. ClusterCATS Server runs on Windows NT, Sun Solaris, and
Linux platforms. To administer the ClusterCATS Server, use the ClusterCATS Server
Administrator (Windows) or the btadmin utility (UNIX).
Each ClusterCATS Server component performs the following functions:
• Intelligently manages HTTP load across Web servers
• Proactively manages ColdFusion server load
• Provides failover support for every server in your cluster
• Proactively monitors ColdFusion servers and ColdFusion Web applications
Note
You can run the ClusterCATS Explorer from any server in the cluster, or you can run it
remotely. This flexibility lets administrators in different geographic locations
administer distributed clusters. You can also use ClusterCATS Explorer to
administer UNIX clusters from a single Windows machine. Multiple clusters can be
viewed from a single Explorer.
The ClusterCATS Explorer presents a view of your cluster in much the same manner
as the Windows Explorer presents a view of the files and directories that reside on a
PC, as the following figure shows:
Note
ClusterCATS for ColdFusion installs ClusterCATS Web Explorer only on UNIX servers,
but you can access it from any computer with an Internet browser.
The Web Explorer, like its Windows counterpart, is quite robust and lets you
configure and administer clusters easily. However, it does not contain the identical
functionality provided by the Windows-based ClusterCATS Explorer. The Web
Explorer does not let you do the following:
• Install the ClusterCATS Web Explorer on an NT server; it runs only from UNIX
servers.
• Create and administer NT servers that have security enabled.
• Set or modify load thresholds via a graphical display.
• Monitor the amount of load hitting the server via a graphical display; the server’s
load statistics are only displayed textually on the Cluster Member List and Server
Properties pages.
If you require any of these capabilities, you should obtain a Windows machine and
use the Windows-based ClusterCATS Explorer for your cluster administration.
Note
For availability and security reasons, be sure to only allow access to the ClusterCATS
Web Explorer from a separate IP-based virtual host server on a port other than 80 and
password protect access to it.
Netscape considerations
By default, Netscape Enterprise Server assigns your Web server a random, six-digit
communication port number. You can either use this assigned number or change it
to something easier to remember, like port 81.
If you are not familiar with configuring your Web server’s communications ports, see
the Netscape Enterprise Server Administrator online help for instructions.
Introduction to ClusterCATS Administration 249
Apache considerations
Make the following changes to the Apache Web server’s httpd.conf file to enable the
ClusterCATS Web Explorer (btweb). Replace the IP address specified in the example
below (192.168.96.71) and the port (2222) with one appropriate for your system
and enable authentication for the virtual directory.
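###
### BTWeb Administration
###
# Listen on a dedicated address and port (replace both with your own values).
Listen 192.168.96.71:2222
<VirtualHost 192.168.96.71:2222>
    ServerAdmin root@localhost
    DocumentRoot /usr/lib/btcats/btweb
    DirectoryIndex default.htm
    ServerName btweb
    ErrorLog logs/btweb_error_log
    CustomLog logs/btweb_access_log combined
    ### BTWeb stuff ###
    # Run the Web Explorer's .exe files as CGI programs.
    AddHandler cgi-script .exe
    <Directory "/usr/lib/btcats/btweb/">
        # List both options on one line; a second Options directive
        # would replace the first instead of adding to it.
        Options FollowSymLinks ExecCGI
        AllowOverride None
        Order allow,deny
        Allow from all
        # Require a Basic authentication login for the admin user.
        AuthName "btcats admin tools"
        AuthType Basic
        AuthUserFile /usr/local/apache/conf/users
        require user admin
    </Directory>
</VirtualHost>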
Once you have configured your server, restart Apache. To access the Web Explorer,
point your browser to the IP address you entered as the VirtualHost.
For information on using the htpasswd utility to create and manage your
authentication file list, refer to the Apache documentation.
For Apache:
http://<virtual_host>:<admin-port>/default.html
servername or virtual_host is the name of the Web server on which you
installed ClusterCATS, and <admin-port> is the communication port number on which
the Web server or virtual host has been configured to listen for HTTP requests.
The Enter Network Password dialog box appears:
3 Enter your user name and password in the appropriate fields and click OK.
Note
The default user name and password is admin.
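As a rough illustration of the URL form and Basic authentication described above, the following sketch builds (but does not send) a request for the Web Explorer start page. The helper name `build_explorer_request` is hypothetical, and the host, port, and credentials are the example values from this section. The page name is a parameter because this section shows both default.htm (in the Apache configuration) and default.html (in the URL).

```python
import base64
import urllib.request


def build_explorer_request(host: str, port: int, user: str, password: str,
                           page: str = "default.html") -> urllib.request.Request:
    """Build an HTTP request carrying a Basic authentication header."""
    url = f"http://{host}:{port}/{page}"
    # Basic authentication sends "user:password" encoded as base64.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


req = build_explorer_request("192.168.96.71", 2222, "admin", "admin")
print(req.full_url)  # prints "http://192.168.96.71:2222/default.html"
```

Passing the result to `urllib.request.urlopen` would perform the actual request. Because Basic authentication is only an encoding, not encryption, restricting access by address and port as recommended earlier remains important.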
btadmin
btadmin is a scriptable utility that lets you perform server-specific maintenance
activities for each server in a cluster. btadmin is available on both UNIX and
Windows servers.
Unlike the ClusterCATS Web Explorer, which lets you administer your entire cluster
from a single, central computer, you must use btadmin from each server in your
cluster. btadmin allows you to:
• Add and remove the ClusterCATS filter from the Web server service
• Stop and start the ClusterCATS service
• Place a cluster member in maintenance mode
• Reset a clustered server’s configuration to its pre-clustered state
For more information on btadmin, refer to “Using btadmin” on page 322.
Creating Clusters
If you have successfully installed ClusterCATS, you are ready to create server clusters.
This section explains the following:
• “Creating clusters in Windows” on page 252
• “Creating clusters in UNIX” on page 261
2 Select Configure > Cluster Setup Wizard. Alternatively, you can click the Cluster
3 Enter a name for your cluster, enter GoColdFusion in the License Key field, and
click Next.
Note
The License Key field is case-sensitive, so be sure to enter the key exactly as
shown in this step.
Make your cluster names logically consistent with their purpose. For example,
Sales Web, Customer Support Web, and so on.
The List of Web Servers dialog box appears:
5 Enter the fully qualified host name of a Web server in the New Web Server Name
field (for example, doc.allaire.com).
6 If you are using the ClusterCATS dynamic IP addressing scheme AND you do not
have the maintenance IP address bound to your NIC, select ClusterCATS
Maintenance Support.
If you are not configuring this Web server for offline maintenance support, go to
step 8.
Note
You can only set the maintenance support option when creating a cluster or
adding a cluster member to a cluster. You cannot configure or modify this option
after you have created and added the cluster member to the cluster.
Enabling maintenance support for clusters requires that you configure your
cluster for ClusterCATS dynamic IP addressing. For more information, see
“ClusterCATS Dynamic IP Addressing (Windows only)” on page 334.
7 Enter the fully qualified host name of the maintenance address (for example,
serv1.yourcompany.com) in the Maintenance Address field.
8 Click OK.
9 Repeat steps 4 through 8 for each Web server you want to add to the cluster and
then click Next to proceed.
The Load Management dialog box appears:
10 If you want to use the default load threshold settings, click Next and go to step 13.
However, if you do not want to use the defaults, select the server and click
Configure to configure new peak and gradual redirect load thresholds for that
cluster member.
The Load Thresholds dialog box appears:
11 Enter new numerical values (not higher than 100%) in the Peak Load Threshold
and Gradual Redirect fields and click OK.
Be sure to keep your Peak load threshold below 100% to accommodate
ColdFusion’s processing needs. Set your Gradual Redirection threshold to be
lower than your peak threshold.
12 Click Next.
The Alert Notification dialog box appears:
13 Enter the name of your outbound SMTP mail server in the SMTP Mail Server field
and the e-mail address for a recipient of cluster alerts in the E-mail Address field.
If multiple people will receive different alerts for different types of notification
events, go to step 14. Otherwise, click Next and proceed to step 16.
15 Select an alert event and enter the e-mail address of the recipient.
If you want the same person to receive the majority of alerts, click Propagate to
automatically fill each event’s Recipient column with the same e-mail address.
You can then manually change the few recipients that are different. If there are
multiple recipients for the same alert event, separate your e-mail address entries
with commas. Click OK to return to the Alarm Notifications dialog box and then
click Next to proceed.
The Session State Management dialog box appears:
16 If your server cluster supports a site that needs to maintain persistent state on the
same Web server during a user session, select Yes to enable session-aware load
balancing. Otherwise, select No and click Next.
The Load Balancing Device dialog box appears:
2 Select Cluster Manager > New Cluster. Alternatively, you can right-click the
Cluster Manager icon and select New Cluster or click the New Cluster button in
the toolbar.
The Create New Cluster dialog box appears:
3 Add a new cluster using the fields as described in the following table:
Field Description
Cluster Name Enter a unique name for the cluster.
Make your cluster names logically consistent with their
purpose. For example, Sales Web, Customer Support Web,
and so on.
License Key Enter GoColdFusion. This field is case-sensitive, so be sure
to enter the key exactly as shown.
Web Server Name Enter the fully qualified host name (for example,
doc.allaire.com) for the first server you want to be a
member of this cluster.
You cannot create an empty cluster; you must specify a Web
server that will be part of the cluster. If this is the first server
that you have added to the cluster, it is known as the Admin
Manager. The remaining steps guide you in configuring the
Admin Manager.
Bring Up in Passive Mode Select this checkbox to bring the Admin Manager up in
Passive mode. If you do not select this checkbox, the server
will be brought up in Active mode.
For more information on passive/active modes, refer to
“Changing Active/Passive Settings” on page 309.
ClusterCATS Maintenance Support Select the ClusterCATS Maintenance Support check box to
enable support for offline maintenance. The Admin Manager
must be configured with a maintenance IP address.
Using maintenance support requires that your cluster
support ClusterCATS dynamic IP addressing. For more
information, refer to “ClusterCATS Dynamic IP Addressing
(Windows only)” on page 334.
Offline maintenance support is only available on Windows
NT server clusters. You can only set the maintenance
support option when creating a cluster or adding a cluster
member to a cluster. You cannot configure or modify this
option after you have created and added the cluster member
to the cluster.
Maintenance Address Enter the fully qualified host name of the maintenance
address (for example, serv1.yourcompany.com). This field
is only accessible if you selected ClusterCATS Maintenance
Support.
4 Click OK.
Your cluster appears below the Cluster Manager icon in the ClusterCATS Explorer
left pane. To manually add additional cluster members to your new cluster, see
“Adding Cluster Members” on page 264.
3 Add a new cluster using the fields as described in the following table:
Field Description
Cluster Name Enter a unique name for the cluster.
Make your cluster names logically consistent with their purpose.
For example, Sales Web, Customer Support Web, and so on.
Web Server Name Enter the fully qualified host name (for example,
doc.allaire.com) for the first server you want to be a member
of this cluster.
You cannot create an empty cluster; you must specify a Web
server that will be part of the cluster. If this is the first server that
you have added to the cluster, it is known as the Admin
Manager.
License Key Enter GoColdFusionGoJava.
The License Key field is case-sensitive, so be sure to enter the
key exactly as shown in this step.
4 Click OK.
ClusterCATS creates the cluster and displays its members on the Cluster Member
List page.
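Both cluster creation procedures ask for a fully qualified host name. A rough pre-check such as the following sketch can catch obvious mistakes before you enter a name. The function name and the rules it applies (at least two dot-separated labels made of letters, digits, and hyphens) are simplifying assumptions, not the validation that ClusterCATS itself performs.

```python
import re

# One DNS label: 1 to 63 letters, digits, or hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_fully_qualified(host: str) -> bool:
    """Rough check that a host name has at least two valid dot-separated labels."""
    labels = host.rstrip(".").split(".")
    return len(labels) >= 2 and all(_LABEL.match(label) for label in labels)


print(is_fully_qualified("doc.allaire.com"))  # prints True
print(is_fully_qualified("doc"))              # prints False
```

The check says nothing about whether the name resolves; it only rejects names that are clearly not fully qualified.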
Removing Clusters
To delete an entire cluster, you must delete each cluster member from the cluster
individually, using the procedure described in “Removing Cluster Members” on page
266.
Note
When deleting cluster members, you must delete the Admin Manager (Windows) or
the Admin Agent (UNIX) last. This server is the first server you added to the cluster.
When the last cluster member has been removed, the cluster itself is deleted.
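The ordering rule in the note above can be sketched as a small helper. It assumes you keep your own list of members in the order they were added to the cluster; the function name and the host names are hypothetical.

```python
def removal_order(members_in_add_order):
    """Return members ordered so that the first-added server is removed last."""
    if not members_in_add_order:
        return []
    first, *rest = members_in_add_order
    return rest + [first]


members = ["web1.example.com", "web2.example.com", "web3.example.com"]
print(removal_order(members))
# prints ['web2.example.com', 'web3.example.com', 'web1.example.com']
```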
3 In the Web Server Name field, enter the fully qualified host name of the Web
server (for example, ckatz.allaire.com).
4 If you are using the ClusterCATS dynamic IP addressing scheme AND you do not
have the maintenance IP address bound to your NIC, select ClusterCATS
Maintenance Support.
If you are not configuring this Web server for offline maintenance support, go to
step 6.
Note
You can only set the maintenance support option when creating a cluster or
adding a cluster member to a cluster. You cannot configure or modify this option
after you have created and added the cluster member to the cluster.
Adding Cluster Members 265
Enabling maintenance support for clusters requires that you configure your
cluster for ClusterCATS dynamic IP addressing. For more information, see
“ClusterCATS Dynamic IP Addressing (Windows only)” on page 334.
5 Enter the fully qualified host name of the maintenance address (for example,
serv1.yourcompany.com) in the Maintenance Address field.
6 Click OK.
7 Repeat steps 2 through 6 to add additional servers to the cluster manually.
3 Enter the fully qualified host name (for example, doc.allaire.com) in the
Web Server Name field.
4 Click OK to add the cluster member to the existing cluster.
3 Select the cluster member you want to delete from the Web Server Name
drop-down box.
A message appears telling you that the selected server has been deleted.
Note
If you delete the last cluster member in a cluster, the cluster is also deleted and
you are returned to the default page of the ClusterCATS Web Explorer.
4 Click OK.
4 Enter a new numeric value (less than 100%) in the first Load Management field.
This is referred to as the Peak load threshold. In the example above, the Peak load
threshold is set to 90.
5 Enable the Gradual Redirection check box.
6 Enter a new value in the Gradual Redirection field. This value must be lower than
the Peak load threshold.
7 Click OK to apply your new threshold settings.
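The two constraints stated in these steps (a Peak load threshold below 100%, and a Gradual Redirection threshold lower than the Peak threshold) can be checked before you enter the values. This is a minimal sketch; the function name is hypothetical and the lower bound of 0 is an assumption.

```python
from typing import List, Optional


def validate_thresholds(peak: float, gradual: Optional[float] = None) -> List[str]:
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    if not 0 < peak < 100:
        problems.append("Peak load threshold should be above 0 and below 100.")
    # gradual is None when the Gradual Redirection check box is not enabled.
    if gradual is not None and not 0 < gradual < peak:
        problems.append("Gradual Redirection threshold should be above 0 "
                        "and below the Peak load threshold.")
    return problems


print(validate_thresholds(90, 80))  # prints []
print(len(validate_thresholds(90, 95)))  # prints 1
```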
To configure load threshold settings using the Server Load dialog box:
1 Open the ClusterCATS Explorer and select a server.
2 Select Monitor > Load. Alternatively, you can right-click the server and select
Monitor > Load.
The Server Load dialog box appears:
3 Use your mouse to drag the Peak load threshold (red) up or down.
As you move the line, the Peak load threshold percentage changes.
4 Enable gradual redirection by selecting the Gradual Redirection check box.
5 Drag the Gradual Redirection load threshold (yellow) to adjust it accordingly.
6 Close the dialog box to apply the load threshold settings you configured.
3 Enter the fully qualified host name of a server in the Web Server Name field.
Server Load Thresholds 273
4 Click OK.
The Cluster Member List page appears, as the following figure shows. If you get
an "Error: Server <cluster_member_name> could not be found" message, make
sure you used the correct, fully qualified server name and that the server is
running.
6 Select the server you want to connect to from the Web Server Name listbox.
7 Click OK.
The selected server’s Server Properties page appears:
9 To change the Peak load threshold, enter a new numeric value (less than 100%) in
the Standard Load Threshold field.
10 Enable the Gradual Redirection check box if it is not already enabled.
11 To change the Gradual Redirection load threshold, enter a new numeric value in
the Gradual Load Threshold field. This value must be lower than the Standard
Load Threshold.
12 Click OK to apply your new load threshold settings.
Note
Session-aware load balancing may not work if you use absolute hyperlinks in your
Web pages. Absolute links route the HTTP request back to the cluster entry point and
redirect according to the current load threshold without regard to the state of the
requesting client. To avoid this inadvertent loss of state, be sure to use only relative
linking in your Web pages.
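One way to audit pages for the absolute hyperlinks that this note warns about is to scan the generated HTML. The sketch below uses only the Python standard library; the class and function names are hypothetical, and it treats any `href` that names a host as absolute.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AbsoluteLinkFinder(HTMLParser):
    """Collect href values of <a> tags that name a host (absolute links)."""

    def __init__(self):
        super().__init__()
        self.absolute_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # A non-empty network location means the link names a host.
            if name == "href" and value and urlparse(value).netloc:
                self.absolute_links.append(value)


def find_absolute_links(html: str):
    finder = AbsoluteLinkFinder()
    finder.feed(html)
    return finder.absolute_links


page = '<a href="http://www.example.com/cart.cfm">Cart</a> <a href="checkout.cfm">Pay</a>'
print(find_absolute_links(page))  # prints ['http://www.example.com/cart.cfm']
```

Links reported by this scan are candidates for rewriting as relative links so that a user's requests stay on the same cluster member.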
3 Enter the fully qualified host name of the server for which you want to configure
session-aware load balancing in the Web Server Name field.
Session-Aware Load Balancing 279
4 Click OK.
The Cluster Member List page appears:
Note
The ColdFusion service must be running on your server to add a probe.
3 Enter a name you want to assign to this probe’s monitor in the Name field on the
New Monitor dialog box and click OK.
The monitor’s Properties dialog box appears:
Field Description
Web Server Select the name of the server from the drop-down list.
Pathname Enter the absolute path to the ColdFusion probe. Do not
change the default selection unless you installed
ColdFusion to a directory other than the default
installation directory.
Working directory Enter the absolute path to the probe’s working directory.
Do not change the default selection unless you installed
ColdFusion to a directory other than the default
installation directory.
Startup Parameters Replace <URL> with the actual URL of the site you
want the probe to access, and replace <success string>
with a text string that appears on a page on the site you
are probing.
Tips:
• Be sure to include a space between the URL and the
success string that you specify. The success string
must be enclosed in quotation marks.
• Do not modify the RESTART explicit parameter if you
want the probe to automatically restart the ColdFusion
Server upon detecting a failure. However, if you do not
want ClusterCATS to automatically restart the
ColdFusion Server upon detecting a failure, replace
RESTART with NORESTART.
Timeout (sec) Enter a time, in seconds, to indicate how long
ClusterCATS should wait before a ColdFusion server
failure is registered.
Do not set this value to less than 60 seconds because
ClusterCATS may restart the ColdFusion server
inadvertently (due to network congestion, for example),
rather than detect an actual failure on the ColdFusion
server.
Frequency (sec) Enter a time, in seconds, to indicate how often the probe
checks the ColdFusion server.
Probes that restart Web applications should be
configured to run no more frequently than the time it
takes to stop and restart ColdFusion. This time is highly
site-specific, because it depends on the system
resources available on the servers and the volume of
traffic at the site.
For probes that do not restart the Web application, the
Frequency depends on how long you can reasonably
afford to have your Web application off-line. A minimum
Frequency of 15 seconds is recommended.
Return Value Enter 0 so that the probe succeeds on a successful
probing of the page. Enter a non-zero number to have the
probe succeed on a failure.
The default is 0. Only under rare circumstances would
you change this to a non-zero number.
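The check that an application probe performs, as the table describes it, amounts to requesting a URL and looking for a success string within a timeout. The following sketch illustrates that idea only; it is not the ColdFusion probe program. The names `run_probe` and `http_fetch`, the example URL, and the use of 1 as the failure value are assumptions.

```python
import urllib.request
from typing import Callable


def http_fetch(url: str, timeout: float) -> str:
    """Fetch a page over HTTP and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def run_probe(fetch: Callable[[str, float], str], url: str,
              success_string: str, timeout: float = 60.0) -> int:
    """Return 0 if the page contains the success string, 1 otherwise."""
    try:
        body = fetch(url, timeout)
    except OSError:  # covers connection errors and timeouts raised by urllib
        return 1
    return 0 if success_string in body else 1


# A stand-in fetcher keeps this example runnable without a network connection.
def fake_fetch(url: str, timeout: float) -> str:
    return "<html><body>Probe OK</body></html>"


print(run_probe(fake_fetch, "http://www.example.com/cfprobe.cfm", "Probe OK"))  # prints 0
```

Passing `http_fetch` instead of `fake_fetch` would test a live page. A return value of 0 on success matches the default Return Value described in the table.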
5 Configure the application probe settings as described in the table on page 282.
9 To create a new probe, click New.
If this is the first probe for this server or you clicked New to add another probe,
the ColdFusion Application Probe page appears:
Field Description
Status This is an informational field. If the probe is not registered, the
Status displays Not registered. If the probe is registered,
the Status displays Succeeding.
Pathname Enter the path to the ColdFusion probe. Do not change the
default selection unless you installed ClusterCATS for
ColdFusion to a directory other than the default installation
directory.
Working directory Enter the path to the probe’s working directory. Do not change
the default selection unless you installed ClusterCATS for
ColdFusion to a directory other than the default installation
directory.
Startup Parameters Enter the actual URL of the site you want the probe to access,
followed by a text string that appears on a page within the site
you are probing (cfprobe.cfm in the screen shown in step 9).
Note: Do not modify the RESTART explicit parameter if you
want the probe to automatically restart the ColdFusion Server
upon detecting a failure. However, if you do not want
ClusterCATS to automatically restart the ColdFusion Server
upon detecting a failure, replace RESTART with
NORESTART.
Timeout (sec) Enter a time, in seconds, to indicate how long ClusterCATS
should wait before a ColdFusion server failure is registered.
Do not set this value to less than 60 seconds because
ClusterCATS may restart the ColdFusion server inadvertently
(due to network congestion, for example), rather than detect
an actual failure on the ColdFusion server.
Frequency (sec) Enter a time, in seconds, to indicate how often the probe
checks the ColdFusion server.
Probes that restart Web applications should be configured to
run no more frequently than the time it takes to stop and
restart ColdFusion. This time is highly site-specific, because it
depends on the system resources available on the servers
and the volume of traffic at the site.
For probes that do not restart the Web application, the
Frequency depends on how long you can reasonably afford to
have your Web application off-line. A minimum Frequency of
15 seconds is recommended.
Return value Enter 0 so that the probe succeeds on a successful probing of
the page. Enter a non-zero number to have the probe succeed
on a failure.
The default is 0. Only under rare circumstances would you
change this to a non-zero number.
11 Click Register to create the probe. ClusterCATS begins to test the selected server
immediately.
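The Timeout and Frequency guidance in the probe tables can be summarized as a small settings check. This is a sketch under stated assumptions: the function name is hypothetical, and `restart_duration_s` stands for your own measurement of how long ColdFusion takes to stop and restart, which the documentation describes as highly site-specific.

```python
from typing import List


def check_probe_settings(timeout_s: float, frequency_s: float,
                         restarts: bool, restart_duration_s: float = 0.0) -> List[str]:
    """Return warnings for probe settings that go against the documented guidance."""
    warnings = []
    if timeout_s < 60:
        warnings.append("Timeout below 60 seconds may cause inadvertent restarts.")
    if restarts:
        # Restarting probes should not run more often than a restart takes.
        if frequency_s < restart_duration_s:
            warnings.append("Frequency is shorter than the time needed to restart ColdFusion.")
    elif frequency_s < 15:
        warnings.append("A minimum Frequency of 15 seconds is recommended.")
    return warnings


print(check_probe_settings(60, 15, restarts=False))  # prints []
print(len(check_probe_settings(30, 45, restarts=True, restart_duration_s=90)))  # prints 2
```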
Load-Balancing Devices
You can configure ClusterCATS to work in conjunction with a third-party hardware
load balancing device or load balancing software product to provide comprehensive
load balancing and failover support for your server clusters.
This section describes the following:
• “Using Cisco LocalDirector” on page 290
• “Using third-party load balancing devices in Windows” on page 294
• “Using third-party load balancing devices in UNIX” on page 295
Note
You must use Cisco LocalDirector Version 3.1.4 software or later.
Before configuring ClusterCATS with the LocalDirector, you must configure the
LocalDirector to manage your Web servers. For more information, refer to the Cisco
documentation.
LocalDirector considerations
You must be aware of the following when using ClusterCATS with Cisco
LocalDirector:
• When load balancing with the LocalDirector, ClusterCATS sets the state of each
cluster member to Passive mode. For more information about Passive mode, refer
to “Changing Active/Passive Settings” on page 309.
• Do not use round-robin DNS.
• Turn off ClusterCATS’ Gradual Redirection load threshold. See “Server Load
Thresholds” on page 268 for information on turning off gradual redirection.
• Do not use ClusterCATS’ dynamic IP addressing feature. If ClusterCATS performs
dynamic IP failover, the LocalDirector will not be able to recover the failed-over
IP address. For more information on ClusterCATS’ server failover features, refer to
“ClusterCATS Dynamic IP Addressing (Windows only)” on page 334.
• If two or more Web servers on the same system are in clusters using Cisco
LocalDirector load balancing, then each cluster must have the same DFP Agent
Listen Port number configured. The ClusterCATS DFP agent can only listen on
one port.
Note
Do not use the dynamic-feedback-pw command. ClusterCATS does not support
secure DFP hosts.
dynamic-feedback -timeout
Use the dynamic-feedback -timeout option to set timeout to a value larger than
the update frequency so that the LocalDirector does not prematurely terminate
the connection with the cluster because of inactivity. Allaire recommends that
you set the value to at least two times the update frequency.
dynamic-feedback -retry
Use the dynamic-feedback -retry option to set the retry value to zero (0) to
ensure that the LocalDirector will continue connection attempts to the
ClusterCATS DFP agent in the event of a lengthy period of system unavailability.
For more information on using the LocalDirector dynamic-feedback command,
refer to Cisco’s LocalDirector Command Reference.
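The timeout and retry recommendations above reduce to simple arithmetic on the update frequency. The sketch below only computes the suggested values; it does not generate LocalDirector commands. The function name is hypothetical, and the 120-second upper limit is taken from the Update Frequency field description for the Load Balance tab.

```python
def recommended_dfp_settings(update_frequency_s: int) -> dict:
    """Suggest dynamic-feedback timeout and retry values for an update frequency."""
    if not 0 < update_frequency_s <= 120:
        raise ValueError("Update frequency should be between 1 and 120 seconds.")
    return {
        "timeout": 2 * update_frequency_s,  # at least two times the update frequency
        "retry": 0,                          # zero keeps connection attempts going
    }


print(recommended_dfp_settings(30))  # prints {'timeout': 60, 'retry': 0}
```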
8 Select the Load Balance tab and choose Cisco LocalDirector from the Load
Balancing Product drop-down list.
Field Description
Website Alias Enter the name of the virtual server
(www.yourcompany.com) you created in step 3.
LocalDirector IP Address Enter the IP address of the Cisco LocalDirector.
DFP Agent Listen Port Enter the port number on which the cluster’s DFP agent
should listen for incoming LocalDirector connection
requests. This port should be the same port specified in
the LocalDirector dynamic-feedback command as described in
step 5.
Update Frequency Enter the frequency, in seconds, that you want
ClusterCATS to update the LocalDirector with availability
data.
This is typically a value between 5 and 30 seconds. You
can lengthen it up to 120 seconds.
Set a longer time as you add greater numbers of Web
servers to the cluster. This minimizes the overhead of
traffic to the LocalDirector.
HTTP Port Enter the port number on which each cluster member
listens for unsecured HTTP requests. Enter 0 if not
applicable.
HTTPS Port Enter the port number on which each cluster member
listens for secured HTTP requests. Enter 0 if not
applicable.
Bind ID Enter the same Bind ID specified for the explicit (real)
servers on the LocalDirector in step 4. In order for the
ClusterCATS/LocalDirector integration to work as
intended, the server name, port number, and bind ID
combination must be the same on this ClusterCATS
Load Balance tab as it is on the LocalDirector box.
10 Click OK.
Once configured, ClusterCATS automatically sets the state of each cluster member to
Passive and provides the load balancing and high availability data it acquires to the
LocalDirector. The LocalDirector then actively manages HTTP traffic across the
cluster.
3 Select Configure > Administration. Alternatively, you can right-click the cluster
and select Configure > Configure. The Cluster Properties dialog box appears:
4 Select the Load Balance tab.
The selection in the Load Balancing Product drop-down list indicates how
ClusterCATS will actively load balance HTTP traffic across the cluster.
5 Enter the name of the Web site in the Website Alias field.
6 Click OK to apply your changes.
Note
You cannot take advantage of ClusterCATS’ support of Cisco LocalDirector using the
ClusterCATS Web Explorer. This capability is only available in the Windows-based
ClusterCATS Explorer. You can, however, configure Cisco LocalDirector as a
third-party load balancing device to work with ClusterCATS.
6 In the Load Balancing Product field, enter the URL of the Web site for which the
load balancing product has been set up to manage HTTP traffic.
7 Click OK to apply your changes.
3 Select the event for which you want to trigger an alarm and enter the e-mail
address of the person you want to receive an e-mail notification of the event.
If you want multiple people to receive an e-mail notification about the same
event, add more e-mail addresses to the field and separate each e-mail address
with a comma.
4 Repeat step 3 for each event you want to be notified about.
To send all notifications to the same e-mail address, enter the e-mail address
once and click Propagate.
5 Enter the name of the default SMTP mail server to which your mail is delivered in
the Default SMTP Host field.
6 Click OK.
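The comma-separated recipient format and the effect of the Propagate button can be modeled with two small helpers. This sketch is illustrative only: the function names and the event names are hypothetical, not the identifiers that ClusterCATS uses.

```python
from typing import Dict, Iterable, List


def parse_recipients(field: str) -> List[str]:
    """Split a comma-separated recipient field into trimmed e-mail addresses."""
    return [address.strip() for address in field.split(",") if address.strip()]


def propagate(events: Iterable[str], address: str) -> Dict[str, str]:
    """Assign the same recipient to every event, as the Propagate button does."""
    return {event: address for event in events}


print(parse_recipients("ops@example.com, oncall@example.com"))
# prints ['ops@example.com', 'oncall@example.com']

alerts = propagate(["server_unavailable", "server_overloaded"], "ops@example.com")
alerts["server_overloaded"] = "ops@example.com,perf@example.com"  # override one event
print(alerts["server_unavailable"])  # prints ops@example.com
```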
6 Enter the e-mail address of the person you want to be notified about the
occurrence of an event in that event’s corresponding field.
If you want multiple people to receive an e-mail notification about the same
event, add more e-mail addresses to the field and separate each e-mail address
with a comma.
7 Enter the name of the default SMTP mail server to which your mail is delivered in
the SMTP Host field.
8 Click OK to apply your changes.
Administrator E-mail Options 299
Field Description
SMTP Gateway Enter the name of the server through which outgoing e-mail will
be sent.
Support E-mail Enter the e-mail address of the person at your organization that
should receive a copy of the nightly technical support e-mail. If
more than one person should receive the e-mail, separate the
e-mail addresses with commas.
You do not have to enter an Allaire technical support address.
That is implicit.
Report E-mail Enter the e-mail address of the person at your organization that
should receive daily reports about your clusters. If more than
one person should receive the e-mail, separate the e-mail
addresses with commas.
3 Enter the fully qualified host name of a server for which you want to configure
administrator e-mail support in the Web Server Name field.
4 Click OK. The Cluster Member List page appears.
5 Click the Support link. The Cluster Support page appears:
Field Description
SMTP Gateway Enter the name of the server through which outgoing e-mail will
be sent.
Support e-mail Enter the e-mail address of the person at your organization that
should receive a copy of the nightly technical support e-mail. If
more than one person should receive the e-mail, separate the
e-mail addresses with commas.
You do not have to enter an Allaire technical support address.
That is implicit.
Report e-mail Enter the e-mail address of the person at your organization that
should receive daily reports about your clusters. If more than one
person should receive the e-mail, separate the e-mail addresses
with commas.
Administrating Security
When you enable ClusterCATS administration security for a specific cluster, only
authorized users are able to access and administer that cluster using their
ClusterCATS Explorer (Windows) or the ClusterCATS Web Explorer (UNIX).
ClusterCATS provides three administration security settings for securing your server
cluster environment:
• Disabled Authentication
This is the default setting. It provides no security challenge, and therefore anyone
can access the server cluster with a ClusterCATS administration tool or even a
Web browser and modify your cluster environment.
• Local User Authentication
This is the recommended security setting for most clusters residing in small to
mid-sized organizations that have only a few administrators. This setting
provides a security challenge for anyone accessing the server. The authentication
is based on administrative privileges that you define for specific users on each
server in the cluster.
• Windows NT Domain Authentication (Windows NT Only)
You may want to use this security setting if your organization is fairly large and
contains many distributed administrator groups that need to access your server
clusters. To use this setting, you must define your global administrators’ group in
the form “BT_clustername”, where clustername is the exact name of the cluster
you created with the ClusterCATS Explorer. The global administrators group must
exist within the same domain as the clustered servers.
This section describes the following:
• “Configuring authentication on Windows” on page 302
• “Configuring authentication on UNIX” on page 306
Note
If only one person will administer all cluster members in the cluster, be sure to
create the same user account (identical user name and password) on each cluster
member. The ClusterCATS Explorer will consequently prompt you only once for a
user name and password. However, if multiple, different administrator accounts
are created on each server, ClusterCATS Explorer will display user name and
password prompts upon each attempt to access the servers from the ClusterCATS
Explorer.
Note
ClusterCATS requires you to enter a valid user name and password after selecting
the type of authentication you are using so that you do not inadvertently lock
yourself out of the cluster.
6 Click OK to enable local user authentication for the selected cluster. Only
administrators who have accounts on each secured server can access and
administer those cluster members using ClusterCATS Explorer.
Note
This authentication mode can only be used on NT servers.
Before you can enable NT domain authentication on any specific cluster, you must
create an NT global user group within the domain you want to secure. You can do
this using the standard Windows NT User Manager for Domains utility. After you
create a user group, add users to it, and enable the NT Domain authentication mode
from the ClusterCATS Explorer, all users you add to that group are automatically
authenticated to view and change the cluster. All servers in the cluster must reside in
the same Windows NT domain unless a trusted relationship is set up between two or
more domains.
A global group must exist in the domain from which the ClusterCATS Explorer is
executed. Cluster members in other domains need only the trust relationship.
ClusterCATS Explorer determines what servers exist in which NT domain by
communicating with any Windows NT domain controller for the domain. The list of
servers that exist in the Windows NT domain can be viewed by looking at the
Network Neighborhood Windows NT utility. If no trust relationship exists, then
cluster members must be from the same Windows NT domain.
Note
ClusterCATS requires you to enter a valid user name and password after selecting
the type of authentication you are using so that you do not inadvertently lock
yourself out of the cluster.
Disabling authentication
Disabling authentication lets any user use the ClusterCATS Explorer to create,
configure, or administer clusters. Once the cluster is added, administrators have
unrestricted access to the content in that cluster. Therefore, you should only choose
Disabled mode if security is not a concern (for example, in a development or QA
environment).
By default, ClusterCATS administrator security is disabled. However, if you have
previously configured the security mode for your cluster and now want to turn it off,
perform the following procedure.
To disable authentication:
1 Open the ClusterCATS Explorer and select a cluster with authentication enabled.
2 Select Configure > Authentication or select Cluster > Properties. Both menu
selections display the Properties dialog box. Alternatively, you can right-click the
cluster and select Configure > Administration.
3 Select Disabled from the Mode drop-down box.
4 Click OK to apply your changes.
306 Chapter 12 Configuring ColdFusion Clusters
6 Select Local User from the Authentication drop-down box to enable local-user
authentication.
7 Select Disabled to disable authentication.
8 If using local user authentication, enter a valid user name and password and click
OK.
ClusterCATS requires you to enter a valid user name and password after selecting
the type of authentication you are using so that you do not inadvertently lock
yourself out of the cluster.
Chapter 13
After you have created your clusters, added servers to those clusters, and configured
them with load balancing and high availability features, they will likely run
inconspicuously in your environment for quite some time. However, at some point
you may need to update software and content or perform general maintenance tasks
that are beyond the typical cluster creation and configuration activities.
Contents
• Understanding ClusterCATS Server Modes .......................................................... 308
• Changing Active/Passive Settings .......................................................................... 309
• Changing Restricted/Unrestricted Settings .......................................................... 311
• Using Maintenance Mode (Windows only) .......................................................... 313
• Updating an Existing Cluster Member (Windows only) ...................................... 317
• Resetting Cluster Members .................................................................................... 319
308 Chapter 13 Maintaining Cluster Members
Mode Description
Active/Passive Setting Turns on and off the ClusterCATS Server. In Active state,
the ClusterCATS Server intercepts HTTP requests and
processes them for load balancing and availability. In
Passive state, all HTTP requests are passed directly to the
Web server without the ClusterCATS Server intercepting
them.
For more information on Activating/Deactivating
ClusterCATS Servers, refer to “Changing Active/Passive
Settings” on page 309.
Restricted/Unrestricted Determines whether Active cluster members receive any
Setting HTTP traffic. Restricted ClusterCATS Servers do not
receive any HTTP traffic. Unrestricted ClusterCATS
Servers are sent traffic as normal.
For more information on setting ClusterCATS Servers to
Restricted or Unrestricted mode, refer to “Changing
Restricted/Unrestricted Settings” on page 311.
Maintenance Mode Allows you to gracefully remove a server from a cluster by
draining off all users without cutting connections. This is
typically used when you want to upgrade a server or
remove it entirely from the cluster.
For more information on putting clusters in and out of
Maintenance mode, refer to “Using Maintenance Mode
(Windows only)” on page 313.
Note that only Windows cluster members can be put in
Maintenance mode.
Changing Active/Passive Settings 309
3 To have the ClusterCATS Server ignore incoming HTTP requests and pass them
directly to the Web server, select the Passive Member option.
4 To have ClusterCATS Servers intercept requests to your Web resources, select the
Active Member option.
5 Click OK to apply your changes.
If you selected Passive Member, the cluster member’s icon in the ClusterCATS
Explorer turns white, indicating that the member is passive.
6 Repeat steps 1 through 5 to change other members in the cluster.
3 Select the Active Member option if the server has been in passive state.
4 To ensure that HTTP requests sent explicitly to this cluster member are redirected
to another server within the cluster, select Restricted in the Server Access area.
6 Click OK.
9 To ensure that HTTP requests sent explicitly to this cluster member are redirected
to another server within the cluster, select Restricted from the Restriction Status
drop-down box.
Using Maintenance Mode (Windows only) 313
Note
Allaire recommends that you set up your clusters with ClusterCATS dynamic IP
addressing for using Maintenance mode. For more information, see “Using Server
Failover” on page 340.
3 Change the Peak load threshold to 0% so that any additional HTTP requests will
be redirected to other servers in the cluster.
4 Click OK.
5 Physically go to the server you selected in step 1 and open the ClusterCATS Server
Administrator utility on this server by selecting
Start > Programs > ColdFusion 3.0 > ClusterCATS Server Administrator
The ClusterCATS Server Administrator appears:
6 Click the Service Status window button to display the Manage ClusterCATS
Services dialog box.
7 Select the Stopped option to stop the ClusterCATS service and enter a value, in
minutes, in the Drain Down Period field. This allows current users to conclude
their sessions within the time indicated.
8 Click OK.
When the drain-down period expires, the server will fail over to another server in
the cluster.
7 Select Running.
ClusterCATS will add the cluster member back into the cluster.
8 To initially limit the amount of HTTP traffic sent to the server, return to the
ClusterCATS Explorer and reconfigure the cluster member’s Peak Load threshold
to a low value such as 10%.
9 Click OK.
10 Within the ClusterCATS Explorer, right-click the cluster member and select
Monitor > Load.
The Server Load Monitor appears:
11 Observe your cluster member at low usage levels until you are satisfied that your
new changes are working properly.
12 When you are certain that the updates you made have not adversely affected the
server’s operation, set the Peak and Gradual Redirection load thresholds back to
their original values.
Resetting Cluster Members 319
ClusterCATS Utilities
Contents
• Using btadmin ......................................................................................................... 322
• Using bt-start-server and bt-stop-server (UNIX only) ......................................... 325
• Using btcfgchk ......................................................................................................... 325
• Using hostinfo ......................................................................................................... 328
• Using sniff ................................................................................................................ 329
322 Chapter 14 ClusterCATS Utilities
Using btadmin
btadmin is a scriptable utility installed on each server in the cluster. It provides most of
the functionality of the Windows-based ClusterCATS Server Administrator so that
UNIX and Windows administrators can include calls in automated scripts.
This section describes the following:
• “Using btadmin on UNIX” on page 322
• “Using btadmin on Windows” on page 324
Daemon Description
ccmgr Application manager daemon.
dfp Cisco LocalDirector’s Dynamic Feedback Protocol daemon.
failover The failover daemon.
ipaliasd The ClusterCATS failover daemon.
ns-httpd The HTTP daemon.
wsprobe Web server probe daemon.
Note
Stopping and starting some daemons may result in multiple daemons being stopped
or started.
Following are examples of how you start and stop daemons with the btadmin utility:
btadmin start appmgr
btadmin stop failover
btadmin restart ns-httpd
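Because btadmin is intended to be called from automated scripts, the daemon commands above can be wrapped in a small helper. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that btadmin is on the PATH and accepts only the "btadmin <action> <daemon>" form shown in the examples above.

```python
import subprocess

# Actions that appear in the btadmin examples above.
DAEMON_ACTIONS = ("start", "stop", "restart")


def btadmin_daemon_args(action, daemon):
    """Build the argument list for a btadmin daemon command.

    Hypothetical helper: only the "btadmin <action> <daemon>" form
    shown in the examples above is assumed.
    """
    if action not in DAEMON_ACTIONS:
        raise ValueError("unsupported action: %r" % (action,))
    if not daemon or " " in daemon:
        raise ValueError("daemon name must be a single word")
    return ["btadmin", action, daemon]


def run_btadmin(action, daemon):
    """Run the command and raise if it exits with a non-zero status.

    Assumes btadmin is on the PATH, which is an assumption of this sketch.
    """
    return subprocess.run(btadmin_daemon_args(action, daemon), check=True)
```

Building the argument list separately from running it lets a script log or review the exact command before it touches a production cluster member.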
[enable | disable | add | delete |
config <option> _ <Web_server_instance>]
The following table describes the btadmin options for changing the ClusterCATS
settings:
Option Description
enable Enable the specified option for a Web server instance.
disable Disable the specified option for a Web server instance.
add Add a new Web server instance.
delete Delete an existing Web server instance.
config Configure a specified option for an instance. btadmin
prompts you for additional information when using the
config option.
For Netscape Web servers, enter the Web server instance as https-<server>. For
Apache Web servers enter https-<hostname>.
You can enable, disable and configure the following ClusterCATS options using the
btadmin utility:
Option Description
btcats Configures the ClusterCATS Server.
dfp Configures Cisco LocalDirector’s Dynamic Feedback Protocol.
failover Configures the ClusterCATS failover (ipaliasd) support.
load Configures the load balancing preferences.
wsroot Configures a Web server root directory in case you upgrade your
installation or move the root directory.
wsprobe Configures the Web server probes.
[show]
Use the show option to display the currently enabled ClusterCATS configuration
settings.
[reset]
Use the reset option to reinitialize your cluster configuration settings on the current
server. For more information on the effects of resetting a cluster member, refer to
“Resetting Cluster Members” on page 319.
[help]
Use the help option to get a list of the btadmin utility’s features and syntax.
Option Description
btadmin Displays btadmin online help.
btadmin -v Displays the current version of Microsoft’s IIS if it is bound
to the ClusterCATS Server.
btadmin -f Removes the ClusterCATS Web server filter and all virtual
directories.
btadmin +f Adds the ClusterCATS filter to your Web server.
btadmin -b Stops all ClusterCATS services.
btadmin +b Starts all ClusterCATS services.
btadmin +m Reconfigures all ClusterCATS services to Manual start
mode.
btadmin -m Reconfigures all ClusterCATS services to Automatic start
mode.
btadmin -r Removes all servers and deletes the database files and
registry keys related to those servers.
btadmin -s <seconds> Puts server into Maintenance mode after a set delay (in
seconds). This shuts down all ClusterCATS services. For
more information on using Maintenance mode, refer to
“Using Maintenance Mode (Windows only)” on page 313.
btadmin can be invoked with more than one option. For example, to stop and restart
ClusterCATS services, enter btadmin -b +b.
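A script that combines several switches can check them against the table above before calling btadmin. The following Python sketch is a hedged illustration: the function name and its maintenance_delay_seconds parameter are hypothetical, and only the switches listed in the table are accepted.

```python
# Switches listed in the Windows btadmin table above.
DOCUMENTED_SWITCHES = {"-v", "-f", "+f", "-b", "+b", "+m", "-m", "-r"}


def btadmin_windows_args(*switches, maintenance_delay_seconds=None):
    """Build a btadmin argument list from documented switches.

    Hypothetical helper. The optional delay is passed as
    "-s <seconds>", as described in the table above.
    """
    args = ["btadmin"]
    for switch in switches:
        if switch not in DOCUMENTED_SWITCHES:
            raise ValueError("undocumented switch: %r" % (switch,))
        args.append(switch)
    if maintenance_delay_seconds is not None:
        if maintenance_delay_seconds < 0:
            raise ValueError("delay must not be negative")
        args.extend(["-s", str(int(maintenance_delay_seconds))])
    return args


# A five-minute drain-down period expressed in seconds.
drain_args = btadmin_windows_args(maintenance_delay_seconds=5 * 60)
```

The drain-down period in the ClusterCATS Server Administrator is entered in minutes, while the -s switch takes seconds, so a script should convert explicitly as shown.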
Using bt-start-server and bt-stop-server (UNIX only) 325
Using btcfgchk
The btcfgchk utility is a network management tool that displays information about
your IP and DNS configurations. Use it to analyze and troubleshoot your servers and
network.
Syntax
Invoke btcfgchk from the command line in the <CC_install_directory>/program
directory using the following syntax:
btcfgchk
Sample output
The following sample output shows how btcfgchk displays configuration
information for a system with one network adapter and two IP addresses:
btcfgchk FQHN is hartford.brighttiger.com
El90x1 [PRIMARY]:
hartford.brighttiger.com 192.168.0.31
255.255.255.0
hartford.brighttiger.com
hartford1.brighttiger.com 192.168.0.32
255.255.255.0
hartford1.brighttiger.com
Error Description
Host name does not map to The main host name for this system is not mapping to
a single IP address one IP address. Possible problems are:
• The main host name of the system could not be
resolved to any IP address.
Your fully qualified host name is the combination of the
host name and the domain name. Make sure no typos
appear in these names in your DNS definitions, both on
the DNS server and on each cluster member’s DNS
definition.
To verify that the host name is correct, enter nslookup
<FQHN> at a command-line prompt.
• The host name is a round-robin DNS name. Run the
ClusterCATS hostinfo utility to see if more than one
IP address is configured for the domain. For more
information on using hostinfo, see “Using hostinfo”
on page 328.
No adapter associated with btcfgchk is unable to find the primary network adapter.
host name found The primary network adapter should be the network
adapter containing the IP address of the main host name.
Duplicate Primary Adapter btcfgchk found two network adapters with the same IP
address. Use the ifconfig -a command to see
information about your adapter.
Name lookup for btcfgchk was not able to determine the IP address for
<hostname> failed the specified host. Your DNS server may be down. Use
nslookup to see if it can contact your DNS server.
<IP_address1> reverse btcfgchk did a lookup on <IP_address1> and found a
maps to <hostname> which host name to which it is mapped. It then attempted to
then forward maps to verify that this host name maps back to the IP address
<IP_address2> specified, and the verification failed.
There is likely an issue with your DNS configuration. Use
the ClusterCATS hostinfo utility to gather more
information on how the names and IP addresses are
configured. For more information on using hostinfo,
refer to “Using hostinfo” on page 328.
Error looking up ClusterCATS could not resolve the given host name to an
<hostname> by name IP address. Use nslookup to look up the host name in
DNS.
Host name a round-robin The host name maps to more than one IP address
name, or does not map to (round-robin DNS) or maps to an IP address not found
configured IP address on this machine. Use the ClusterCATS hostinfo utility to
check the host name DNS configuration:
hostinfo <hostname>
If you see more than one IP address listed, then
round-robin DNS is being used. If you see one IP
address, check to see if that address is configured on
this machine. You can use the ipconfig /all command
to view all IP addresses on this machine.
Host name not found in any For each IP address found on the system, an attempt
reverse mapping was made to find the corresponding host name. None of
Probable forward mapping the IP addresses on the system reverse mapped to the
misconfiguration for system’s main fully qualified host name. The problem is
<hostname> either:
• The host name maps to the wrong IP address.
• The IP address that the host name maps to does not
have an entry in the DNS table for the reverse map.
Consequently, nslookup does not return the
hostname.
Probable round robin The host name does not map to a single IP address. Use
configuration for the hostinfo tool to determine to which IP address it
<hostname> maps. For more information on using hostinfo, refer to
“Using hostinfo” on page 328.
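Several of the errors above come down to one check: an IP address reverse maps to a host name, and that host name should forward map back to the same single IP address. The following Python sketch reproduces that check with the standard socket module. It is not how btcfgchk is implemented; the function name and the injectable reverse and forward parameters are hypothetical and exist only so the logic can be exercised without a live DNS server.

```python
import socket


def check_reverse_forward(ip_address, reverse=None, forward=None):
    """Return (hostname, forward_ips, consistent) for an IP address.

    reverse: callable mapping an IP address to a host name.
    forward: callable mapping a host name to a list of IP addresses.
    Both default to lookups through the system resolver.
    """
    if reverse is None:
        reverse = lambda ip: socket.gethostbyaddr(ip)[0]
    if forward is None:
        forward = lambda name: socket.gethostbyname_ex(name)[2]
    hostname = reverse(ip_address)
    forward_ips = list(forward(hostname))
    # Consistent only if the forward lookup leads back to the same address.
    return hostname, forward_ips, ip_address in forward_ips


def looks_round_robin(forward_ips):
    """More than one forward address suggests round-robin DNS."""
    return len(forward_ips) > 1
```

For example, check_reverse_forward("192.168.0.31") queries the system resolver; a result whose last element is False corresponds to the reverse and forward mapping mismatch described in the table.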
Using hostinfo
The hostinfo utility is a network management tool that displays information about a
specified domain name. Use it to analyze and troubleshoot problems you are having
with DNS mappings to a particular domain.
Syntax
Invoke hostinfo from the command line in the <CC_install_directory>/program
directory using the following syntax:
hostinfo [fully_qualified_host_name]
Specifying a fully qualified host name is optional. If you do not specify one, then
hostinfo returns information about the current host.
Sample output
The following sample output from the hostinfo utility provides information about a
set of round-robin DNS host names.
>hostinfo allaire.com
Information for host ’allaire.com’:
FQHN: allaire.com
Primary Address: 0.0.0.0
Domain: .com
Aliases:
allaire.com
www1.allaire.com
www2.allaire.com
www3.allaire.com
Addresses:
205.181.25.81
205.181.25.82
205.181.25.83
The hostinfo utility displays the domain name, the primary IP address, and any IP
aliases. If the primary IP address is set to 0.0.0.0, the domain is using round-robin
DNS. The round robin names appear under the Alias section of the DNS table and
the round-robin addresses appear under the Addresses section.
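When hostinfo is run from a script, its output can be parsed to flag round-robin DNS automatically. The following Python sketch assumes the output layout matches the sample above (key and value pairs followed by Aliases: and Addresses: sections); the function names are hypothetical, and real output may differ in spacing or fields.

```python
SAMPLE_OUTPUT = """\
Information for host 'allaire.com':
FQHN: allaire.com
Primary Address: 0.0.0.0
Domain: .com
Aliases:
allaire.com
www1.allaire.com
www2.allaire.com
www3.allaire.com
Addresses:
205.181.25.81
205.181.25.82
205.181.25.83
"""


def parse_hostinfo(text):
    """Parse hostinfo output laid out like the sample above."""
    info = {"aliases": [], "addresses": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and the echoed command
        if line.startswith("Information for host"):
            continue  # banner line carries no extra data
        if line == "Aliases:":
            section = "aliases"
        elif line == "Addresses:":
            section = "addresses"
        elif section is None and ":" in line:
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
        elif section is not None:
            info[section].append(line)
    return info


def uses_round_robin(info):
    """A primary address of 0.0.0.0 indicates round-robin DNS."""
    return info.get("Primary Address") == "0.0.0.0"
```

Parsing SAMPLE_OUTPUT this way yields three addresses and four aliases, and uses_round_robin reports that the domain is using round-robin DNS.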
Using sniff
The sniff utility is a network management tool that displays the packets that a
specific Network Interface Card (NIC) is hearing.
Syntax
Invoke sniff from the command line in the <CC_install_directory>/program
directory using the following syntax:
sniff
Sample output
Below is sample output from the sniff utility:
Mail Test Environment Variables:
BTMailHost, BTSender, BTRecipients, BTSubject, BTText
Packet Test Environment Variables:
BTPort, BTMcastTTL, BTUcastCount, BTBcastCount, BTMcastCount
BTSendInterval, BTDoLocalBind, BTUcastAddress, BTBcastAddress
BTMcastAddress, BTLocalAddress, BTSendSize, BTRecvSize
BTConsole, BTLogFile, BTSystem
Optimizing ClusterCATS
Contents
• ClusterCATS Dynamic IP Addressing (Windows only) ........................................ 334
• Using Server Failover............................................................................................... 340
• Configuring Load-Balancing Metrics .................................................................... 341
334 Chapter 15 Optimizing ClusterCATS
Note
All computer names associated with the ClusterCATS dynamic IP addresses must
have fully qualified host names (FQHNs) in DNS and DNS forward and reverse
entries.
Note
You must have at least two IP addresses available for a machine in order to use one
for a maintenance IP address.
This section shows you how to add a maintenance address that will support
ClusterCATS dynamic IP addressing. If your server has only one static address that
corresponds to both the computer name and the Web site, you must reconfigure it to
allow for a maintenance address.
Note
This procedure must be performed on each system in the cluster and must be done
before installing ClusterCATS.
6 Select the machine’s primary NIC in the Adapter field. Add the new IP address in
the IP Addresses region. You will use this address as the maintenance address and
machine address. Make a note of all IP addresses on the NIC.
7 Click OK and OK again and select the Identification tab. Click Change.
8 Enter a new name for the computer in the Computer Name field. This name
corresponds to the new IP address that you just added. Do not change the
Domain field on this tab.
Note
The Computer Name on the Identification tab should only be a NetBIOS name,
not a fully-qualified host name (FQHN). For example, support1.allaire.com is a
possible FQHN. The first portion of this FQHN (support1) can be a NetBIOS
name. support1 would also appear as the host name under the DNS tab in
Protocols. The domain under the DNS tab in this case would be allaire.com.
The Domain field on the Identification tab is different; it has nothing to do with
DNS but only corresponds to your NT domain.
Note
Do not create any clusters at this time.
Note
Do not specify a maintenance address when adding cluster members. Since the
IP addresses for the cluster members are still bound to their NICs, there is no
need to do this. For more information about creating clusters, refer to “Creating
clusters with the Cluster Setup Wizard” on page 252.
7 Unbind the IP addresses from the Web server’s NIC by selecting each IP address
in the IP Addresses region and clicking Remove. This removes the IP addresses
corresponding to the Web Site.
8 Click OK three times.
9 Simultaneously reboot all the systems in the cluster. Do not reboot them one at a
time, or they will fail over.
ClusterCATS assigns the IP addresses dynamically to your Web servers.
Overview of metrics
The ColdFusion server records the time each JSP page and servlet request takes to be
processed and can return metrics derived from this timing data upon request. These
metrics are:
• Average Request Time This metric reflects the average processing time of all
requests that fall within a one-minute moving window. The use of an average
smooths the effects of brief spikes in request volume and of a mixture of short-
and long-running requests.
• Last Request Time This metric reflects the time it took to process the last
request to the server. Because it is a single, undiluted snapshot of request time, it
will immediately reflect peaks and troughs in request processing time.
For these time-based metrics to be translated into a single load value for the Web
server, they must be weighed against a more subjective measure of server
performance—a maximum acceptable response time. This maximum reflects the
upper threshold of performance at which a server should be declared "busy" for
load-balancing purposes. Once a server reaches this critical busy threshold, the
ClusterCATS software will redirect further service requests away from the server until
it becomes more responsive to its clients.
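The two time-based metrics and the busy threshold can be sketched in a few lines of Python. This is an illustration of the idea only, not the ColdFusion or ClusterCATS implementation; the class and function names are hypothetical, and timestamps are assumed to be in seconds and non-decreasing.

```python
from collections import deque


class RequestTimeWindow:
    """Tracks request durations over a moving time window."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, duration_ms) pairs
        self.last_request_ms = None  # the Last Request Time metric

    def record(self, timestamp, duration_ms):
        """Store one completed request and drop expired samples."""
        self._samples.append((timestamp, duration_ms))
        self.last_request_ms = duration_ms
        self._expire(timestamp)

    def _expire(self, now):
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()

    def average_ms(self, now):
        """The Average Request Time metric over the current window."""
        self._expire(now)
        if not self._samples:
            return 0.0
        return sum(d for _, d in self._samples) / len(self._samples)


def is_busy(average_ms, max_acceptable_ms):
    """A server counts as busy once it reaches the maximum acceptable time."""
    return average_ms >= max_acceptable_ms
```

A single slow request moves last_request_ms immediately, while average_ms changes only in proportion to the other samples still inside the window, which is the smoothing effect described above.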
ClusterCATS provides a further load-balancing enhancement. Every five seconds, a
ClusterCATS agent process probes a special JSP page, getsimpleload.jsp, and records
the round-trip time (RTT) for each request. From this data, it computes its own
average RTT over a one-minute moving window.
This external view of request time accounts for the processing time of the JSP page
request itself, but, more importantly, for other system overhead involved in reaching
the Web server and receiving an acceptable response back again. By factoring in
external influences on Web server responsiveness—such as network load, scheduling
load, and disk I/O load—the ClusterCATS probe agent can adjust the load reported
by the ColdFusion engine to create a more realistic picture overall of the Web server's
performance for its clients.
For example, if the ColdFusion server is reporting a light load of requests, but the
probe agent is seeing significant round-trip times to and from the Web server, then it
will report a proportionally higher load for the server than ColdFusion reported.
Load types
The probed JSP page is located at <CC_install_directory>/btauxdir/
getsimpleload.jsp. The probe agent responds to output generated by this page and
uses it to calculate the overall load based on the weighting of the two available
metrics set in the LOADTYPE variable:
• AVG_REQ_TIME
AVG_REQ_TIME calculates load based on the average service request time. The
load is derived by dividing the request time by the maximum acceptable request
time. This is the default metric.
• ROUND_TRIP_TIME
ROUND_TRIP_TIME calculates load based on the round trip time for the request.
This metric leaves all load calculation in the hands of the probe agent.
For servers that process database-intensive requests, ROUND_TRIP_TIME is not a good
indication of load because ColdFusion processes the threads that calculate
ROUND_TRIP_TIME differently than queued database connection requests. With this
in mind, if you have a Web server that uses many concurrent connections to a
database, either use AVG_REQ_TIME rather than ROUND_TRIP_TIME as your load type,
or include a database call in getsimpleload.jsp to make this load type’s results
more indicative of actual conditions.
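The AVG_REQ_TIME calculation described above, request time divided by the maximum acceptable request time, can be written out as follows. This Python sketch is not the code in getsimpleload.jsp: the function name is hypothetical, the 8000 ms maximum is used only as an example value, and capping the result at 100 percent is an assumption of the sketch.

```python
def load_percent(avg_request_ms, max_acceptable_ms, cap=100.0):
    """Express average request time as a percentage of the maximum.

    The cap at 100 percent is an assumption of this sketch.
    """
    if max_acceptable_ms <= 0:
        raise ValueError("maximum acceptable time must be positive")
    load = 100.0 * avg_request_ms / max_acceptable_ms
    return min(load, cap)


# Same 2-second average, two different maximum acceptable times.
relaxed = load_percent(2000, 8000)
strict = load_percent(2000, 4000)
```

With the same 2000 ms average, relaxed is 25.0 and strict is 50.0, so halving the maximum acceptable time doubles the reported load.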
Output variables
During processing, getsimpleload.jsp generates three significant output variables
that are sent in response to the probe agent's HTTP query. This section describes
these variables.
• CCLOADVALUE
CCLOADVALUE is the load calculated by getsimpleload.jsp using one of the
available load metrics. The load value identifies how busy the server is as a
percentage of its total capacity.
• CCLOADMAX
CCLOADMAX is the maximum acceptable time (in milliseconds) for a request to
complete and marks the "busy threshold" for this server. In other words, this is
the basis upon which a load percentage is calculated given the results of the
AVG_REQ_TIME metric. The default maximum is 8 seconds (8000 ms), but this
value is arbitrary and should be customized to fit the capacity and expectations
of a particular Web site.
CCLOADMAX is one of two variables that you would typically change in
getsimpleload.jsp to customize your server’s load metrics. If you increase the
value of CCLOADMAX, then the server can take longer for each request (on average)
before the server is declared busy. If you decrease CCLOADMAX, then the server's
average request must be shorter before the server is declared busy.
• CCRTTPercent
CCRTTPercent represents the percentage of the calculated average
ROUND_TRIP_TIME that the probe agent should apply to the load metric supplied
by CCLOADVALUE.
CCRTTPercent is the second variable that you might change in
getsimpleload.jsp to customize your server’s load metrics. It acts as a tuning
knob to determine how much external influence on server performance should
be calculated into the server's overall load value.
For example, increase CCRTTPercent to apply a greater weighting to the
ROUND_TRIP_TIME metric in the overall load calculations. The default value of
CCRTTPercent is 0 (disabled). If you change the load type to ROUND_TRIP_TIME,
then the default value of CCRTTPercent is 100, which gives ROUND_TRIP_TIME the
maximum weighting.
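This manual does not give the formula that the probe agent uses to apply CCRTTPercent. Purely as an illustration of how such a tuning knob could behave, the following Python sketch assumes a linear blend: at 0 only CCLOADVALUE counts, and at 100 only the round-trip-time load counts, which matches the two end points described above. The function name and the blend between those end points are assumptions, not documented behavior.

```python
def blended_load(cc_load_value, avg_rtt_ms, cc_load_max_ms, cc_rtt_percent):
    """Hypothetical linear blend of CCLOADVALUE and RTT-based load.

    cc_load_value:  load percentage reported by getsimpleload.jsp
    avg_rtt_ms:     average round-trip time seen by the probe agent
    cc_load_max_ms: maximum acceptable request time (CCLOADMAX)
    cc_rtt_percent: weighting for the RTT metric, 0 to 100
    """
    if not 0 <= cc_rtt_percent <= 100:
        raise ValueError("CCRTTPercent must be between 0 and 100")
    if cc_load_max_ms <= 0:
        raise ValueError("CCLOADMAX must be positive")
    rtt_load = 100.0 * avg_rtt_ms / cc_load_max_ms
    weight = cc_rtt_percent / 100.0
    blended = (1.0 - weight) * cc_load_value + weight * rtt_load
    # Keep the result within 0 to 100 percent (an assumption of this sketch).
    return max(0.0, min(blended, 100.0))
```

Under this assumed blend, a server reporting 20 percent load with a 4000 ms average round trip and an 8000 ms maximum would report 20 percent at a weighting of 0, 35 percent at 50, and 50 percent at 100.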
Sybase client software 9 Verity rcvdk utility, viewing results -noproxy 160
syntax, mkvdk 186 of 203 -proxy 161
System and services files 16 Verity Spider -proxyauth 161
systems monitoring for DNS lookups 147 -retry 161
failover 238 flow control 147 -timeout 161
multithreading 147 Verity Spider paths & URL options
T overview 146 -auth 163
technical support performance 146 -cgiok 163
e-mail support 299 proxy handling 147 -domain 163
testing Web site load 232 restart capability 146 -followdup 164
text databases state maintenance via persistent -followsymlink 164
connecting 35 store 146 -host 164
third-party load balancing Web standards support 146 -https 164
devices 294 Verity Spider content options -jumps 164
using in UNIX 295 -casesen 168 -nodocrobo 165
using in Windows 294 -exclude 168 -nofollow 165
thresholds 268 -include 168 -norobo 165
gradual redirection 268 -indexclude 169 -pathlen 166
training. See Allaire -indinclude 170 -refreshtime 166
troubleshooting -indmimeexclude 171 -reparse 167
e-mail support 299 -indmimeinclude 171 -unlimited 167
load-balancing metrics 343 -indskip 172 -virtualhost 167
using sniff 329 -maxdocsize 172 Verity Spider processing options
troubleshooting DNS -metafile 173 -abspath 153
troubleshooting with -mimeexclude 173 -detectdupfile 153
btcfgchk 325 -mimeinclude 174 -indexers 153
with hostinfo 328 -mindocsize 174 -license 153
-skip 174 -maxindmem 153
U Verity Spider core options -maxnumdoc 154
Unrestricted mode 308 -cmdfile 151 -mimemap 154
Unsecured tags directory 77 -collection 151 -nocache 154
updating cluster members 317 -help 151 -nodupdetect 154
upgrading servers 313 -jobpath 152 -noindex 155
usage error codes 213 -style 152 -nosubmit 155
User directories 81 Verity Spider locale options -persist 155
identifying 92 -charmap 176 -preferred 156
LDAP 92 -common 176 -prefixmap 156
NT domains 92 -datefmt 176 -processbif 157
ODBC 92 -language 176 -regexp 157
User directories, identifying 92 -locale 176 -submitsize 158
User security -msgdb 176 -temp 158
components 99 Verity Spider logging options Verity Spider setting MIME types
implementing 99 -loglevel 178 indexing unknown MIME
runtime 99 Verity Spider maintenance options types 182
Using bulk insert and delete 194 -nooptimize 180 known MIME types for file
utilities, overview of Verity 200 -purge 180 system indexing 183
-repair 180 MIME types and file system
V Verity Spider networking options indexing 182
VDK error messages 213 -agentname 159 MIME types and Web
Verity browse utility, using 209 -connections 159 crawling 181
Verity didump utility, using 206 -delay 159 multiple parameter values 181
Verity error codes, warnings 217 -header 159 syntax restrictions 181
Verity merge utility, using 211 -hostcache 160 using the wildcard character
Verity rcvdk utility, using 201 -noflowctrl 160 (*) 181
W
warnings, Verity error codes 217
Web applications
database locking
mechanisms 226
load testing 231
managing state 225
scalability bottlenecks 227
Web Explorer
Apache considerations 249
configuring com port on Web
server 248
limitations 248
Netscape considerations 248
opening 249
Web server failover
alarm notification 296
Web servers
configuring com port via Web
Explorer 248
determining
responsiveness 341
DNS concerns 228
stopping and starting 325
Web site availability & reliability
defined 234
example 236
failover considerations 237
Web site scalability
defined 222