Excite, Inc. Excite for Web Servers Help

Preferences and Customization

Introduction

There are a number of different customizations one can make to the search/retrieval behavior of Excite for Web Servers. These customizations affect such things as how the query results look, whether certain features are enabled, and (if enabled) how those features behave. All customizations affect the query-results and summarization pages.

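Perhaps the most important thing to understand about these customizations is that they fall into two classes: generation-time options, which take effect only in query-results pages generated after the change, and query-time options, which affect the display of every query-results page, regardless of when it was generated.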

For a better idea of the possible customizations available (and which above-mentioned category they fall into), here's a complete breakdown:

The bold-face notes in the outline above indicate the perl variables or subroutines in a particular script that you must change to effect each customization. That's right: until we have time to build a forms-based interface for these preferences, you'll have to do a little hacking to make them work to your liking.

A description of how to do all this customization follows directly.

afeatures.pl

This file is located in the perllib directory (a subdirectory of the directory in which you installed this software). It is the file you'll modify to make your customizations. By changing the values of certain variables and the return values of certain subroutines in this file, you can tailor the behavior of Excite for Web Servers to your needs. Keep reading...
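As a rough illustration, the fragment below shows what a handful of settings in afeatures.pl might look like. It uses the default values stated on this page, though the exact layout of the shipped file may differ. The value 500 for $maximum_summary_length and the value 1 for $index_html_comments are illustrative assumptions only. Both lines are commented out, which matches their documented defaults.

```perl
# Sketch of a few afeatures.pl settings, shown at the defaults described
# in this help page. The real file's layout may differ.
$inline_summaries            = 1;    # summaries appear below each title
$number_of_summary_sentences = 5;    # maximum sentences per summary
#$maximum_summary_length     = 500;  # commented out: no character limit (500 is illustrative)
$number_of_subject_groups    = 6;    # maximum groups for subject grouping
$max_docs_to_return          = 20;   # upper limit on documents per query
$maximum_query_time          = 60;   # seconds a search may run
$stem_by_default             = 1;    # index term roots ("smiles" -> "smile")
#$index_html_comments        = 1;    # commented out: HTML comment text not indexed (1 is assumed)
```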

Generation-Time Options

The options listed in this section affect the generation of query-results pages.

Remember that query-results pages generated before you modify these preferences are not affected by the modifications. If you want the new preferences to apply to older query-results pages as well, you must regenerate those pages.

$show_legend

There are three options for showing the legend:

$subject_group_mode

The value of $subject_group_mode determines whether a special graphic will appear at the top of the query-results page allowing one to group the results by subject as well as by confidence. Options:

Query-Time Options

The options listed in this section affect the display of all query-results pages (and summary results), regardless of generation time.

$graphic_relevance_mode

In addition to numeric scores, Excite for Web Servers uses either a color-coded graphic or a '+'/'-' character which indicates the relevance of a particular document to a query (and also serves as the query-by-example link, if $query_by_example_mode == 1). The value of the variable $graphic_relevance_mode determines whether the graphic or the character is used:

$query_by_example_mode

When Query By Example is enabled, the relevance indicator -- either a black/red graphic or a '+'/'-' character -- is also a query-by-example link. By clicking this link, one can submit an entire document as a query -- "give me other documents like this one." The value of variable $query_by_example_mode determines whether or not Query By Example is enabled:
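The section on $graphic_relevance_mode says Query By Example is active when $query_by_example_mode == 1. Enabling it in afeatures.pl should therefore look something like this, assuming the usual assignment syntax used elsewhere in the file:

```perl
# In afeatures.pl: make the relevance indicator a query-by-example link.
$query_by_example_mode = 1;
```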

$inline_summaries

The default value of $inline_summaries is 1, 'On'. In that case, document summaries appear directly below each document's title in the results list instead of as a separate '(summary)' link, regardless of the $summary_mode setting (see below). Turning this variable on also adds a Summary Mode option to the collection configuration forms interface. That option lets the user choose, for the documents in a collection, between a fast summary (the first two lines of a document) and a slower, higher-quality computed summary. Refer to the Summary Mode section in the Using The Forms-Based Administration Tools documentation for further information.

$summary_mode

The value of the variable $summary_mode determines whether or not Automatic Summarization is enabled:

If Automatic Summarization is enabled and the variable $inline_summaries is set to 0, 'Off', then the text '(summary)' is displayed to the right of each document title in a results list. By clicking this link, one can request a short summarization of the document. If both of the variables $inline_summaries and $summary_mode are set to 0, 'Off', then neither the '(summary)' link nor inline document summaries will appear on the results list.
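The interaction between $inline_summaries and $summary_mode reduces to a small decision rule. The helper below is hypothetical; it is not a function in Excite for Web Servers. It assumes that any non-zero value means 'On' and only restates the behavior described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Excite for Web Servers) that models the
# documented interaction between $inline_summaries and $summary_mode.
# Assumes any non-zero value means 'On'.
sub summary_display {
    my ($inline_summaries, $summary_mode) = @_;
    return "inline" if $inline_summaries;   # inline wins regardless of $summary_mode
    return "link"   if $summary_mode;       # '(summary)' link beside each title
    return "none";                          # neither link nor inline summary
}

print summary_display(1, 0), "\n";   # inline
print summary_display(0, 1), "\n";   # link
print summary_display(0, 0), "\n";   # none
```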

$summary_link_mode

This variable is normally 'on'. It determines whether or not a link to the original document is available on the summary page.

$number_of_summary_sentences

If Automatic Summarization is enabled, one can specify the maximum number of sentences used to create summaries by setting the value of $number_of_summary_sentences.

The default is 5 sentences. If Automatic Summarization is disabled, this variable has no effect.

$maximum_summary_length

If Automatic Summarization is enabled, one can also specify the maximum number of characters used to create summaries. This limit takes precedence over $number_of_summary_sentences, reducing the number of sentences displayed if necessary.

To set a maximum summary length, set the value of $maximum_summary_length.

If this variable is unset (that is, commented out by preceding it with a '#'), no maximum limit is applied. This is the default.
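To make the precedence rule concrete, here is a hypothetical sketch of how a character limit can cut a summary short of the sentence limit. It is not the actual Excite for Web Servers summarizer. It assumes sentences are joined with single spaces and that an undefined length means no limit.

```perl
use strict;
use warnings;

# Hypothetical sketch (not Excite for Web Servers' summarizer): take
# sentences in order until either the sentence limit or the character
# limit would be exceeded. The character limit wins when both apply.
sub limit_summary {
    my ($sentences, $max_sentences, $max_length) = @_;   # undef $max_length = no limit
    my @kept;
    for my $s (@$sentences) {
        last if @kept >= $max_sentences;                  # sentence limit reached
        my $candidate = join(' ', @kept, $s);
        last if defined $max_length && length($candidate) > $max_length;
        push @kept, $s;
    }
    return join(' ', @kept);
}

my @s = ("First point.", "Second point.", "Third point.");
print limit_summary(\@s, 5, undef), "\n";   # all three sentences
print limit_summary(\@s, 5, 30), "\n";      # only the first two fit in 30 characters
```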

$number_of_subject_groups

If Automatic Subject Grouping is enabled, one can specify the maximum number of groups into which a set of query results will be divided by setting the value of $number_of_subject_groups.

Default: $number_of_subject_groups = 6;

(Note: Logically, this number should be much less than the number of documents returned from a query. Setting it higher than the number of returned documents will produce the same behavior as setting it equal to the number of returned documents.)
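The note above amounts to taking the smaller of the two numbers. The hypothetical helper below makes that explicit; it is not part of the product.

```perl
use strict;
use warnings;

# Hypothetical helper: requesting more groups than there are returned
# documents behaves like requesting exactly that many groups.
sub effective_group_count {
    my ($number_of_subject_groups, $docs_returned) = @_;
    return $number_of_subject_groups < $docs_returned
        ? $number_of_subject_groups
        : $docs_returned;
}

print effective_group_count(6, 20), "\n";   # 6
print effective_group_count(6, 4), "\n";    # 4
```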

$show_additional_docs_in_grouping

Automatic subject grouping works best when it has a large number of documents to sort into groups. By default, therefore, documents beyond those originally displayed in "Grouped by Confidence" mode are brought in for the groupings. However, some people find this confusing. With $show_additional_docs_in_grouping, you can control whether this happens:

$max_docs_to_return

The value of $max_docs_to_return sets the upper limit on the number of documents returned by a query. The default is 20.

$log_searches

Normally, this variable is 'off'. If you set it to a non-zero value, every search performed with Excite for Web Servers (EWS) is logged to a file named query.log in the install directory. You can change the name of the log file on a case-by-case basis by editing the generated CGI script. Changing the last argument in the call to

&ArchitextQuery'directQuery()

changes the file to which queries are logged.
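Turning logging on is a one-line change in afeatures.pl. A sketch, assuming the usual assignment syntax:

```perl
# In afeatures.pl: log every search to query.log in the install directory.
$log_searches = 1;   # any non-zero value turns logging on
```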

$maximum_query_time

You may wish to limit the amount of time, in seconds, that a search can last. The default is 60 seconds, which should always be more than sufficient. To adjust it, change the value of this variable.

$stem_by_default

This variable will affect how an index is generated. The default for this variable is set to 1, which causes only the roots of terms to be included in an index. Thus the keyword "smiles" would be indexed as "smile" and so on. Performing the query "smile" on a stemmed index would return the documents containing the term "smiles" as well as those containing "smile".

$index_html_comments

This variable also affects how an index is generated. By default it is commented out, which turns it off. If it is uncommented, text occurring between HTML comment tags is included in the index. Note that this variable does not affect document summarization in any way: the summarization algorithm may still use HTML comments in a document's summary.

Result List Customization

By default, Excite for Web Servers displays each document's score and title in the results lists for regular queries, with a link to the document itself. We think that's a sensible default (we chose it, after all), but you may want something else. Perhaps you'd prefer to display the document's first three lines, or to have the link invoke a CGI script that formats the document in a special way. Both are possible, and more. With a little perl programming, you can customize results lists, both regular and group-by-subject, to your liking.

That perl programming involves changing subroutines which determine the display of the results. Since there are two different types of results -- those for regular queries, and those for group-by-subject queries -- there is a subroutine for each. If "activated", the appropriate subroutine is called for each document in the list, and it specifies what should appear on the line for that document.

$customize_result_list
In order to activate the subroutines which produce the results-list lines, set the variable $customize_result_list to 1. The subroutines are described below.

customize_result_list_line
The subroutine customize_result_list_line is for specifying the format of lines on regular query-results pages. By default, this subroutine appears as follows:

  sub customize_result_list_line {
      local($collection_name, $file, $doc_root, $relevance_qbe, $score,
            $title, $summary, $original_line) = @_;
      return "$original_line";
  }
The variables $collection_name, $file, $doc_root, $relevance_qbe, $score, $title, $summary, and $original_line are the actual arguments to this subroutine. Here is a brief description of each:

The subroutine -- really a function -- simply returns the string to be used as the result line: in this case just the $original_line (that is, the default output).

You can use the information provided to you to create whatever string you like, then return that value for the result.
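The sketch below shows one possible customized line: it keeps the default output but appends the document's summary beneath it. It assumes that $summary holds plain summary text or is empty, which this page does not spell out. It also assumes that the HTML fragment is acceptable in your results pages.

```perl
# In afeatures.pl: turn on the customization subroutines.
$customize_result_list = 1;

# Example: keep the default line, and show the summary (if any) beneath it.
# Assumes $summary holds plain summary text or is empty/undefined.
sub customize_result_list_line {
    local($collection_name, $file, $doc_root, $relevance_qbe, $score,
          $title, $summary, $original_line) = @_;
    return "$original_line" unless defined($summary) && $summary ne "";
    return "$original_line<br><small>$summary</small>";
}
```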

customize_grouping_line

The subroutine customize_grouping_line is for specifying the format of lines on group-by-subject results pages. The default format of the lines displayed on group-by-subject results pages differs from that of the lines on regular results pages. (In particular, relevance scores are not displayed.) For this reason, and because of the added flexibility it provides, there is a separate subroutine for customizing subject-grouped lines.

By default, this subroutine is defined as follows:

  sub customize_grouping_line {
      local($collection_name, $file, $doc_root, $relevance_qbe,  
            $title, $summary, $original_line) = @_;
      return "$original_line";
  }
All of these arguments have the same values and meanings as those described above for the customize_result_list_line function. The only difference between the two routines is that customize_grouping_line has no $score argument, because scores are not displayed in the subject-grouping output.
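Following the earlier suggestion of linking through a formatting CGI script, this sketch builds each grouped line as a link to a hypothetical script, /cgi-bin/format-doc.cgi. It passes the document's $file value, percent-encoded, in the query string. Both the script name and the assumption that $file identifies the document in a form such a script can use are illustrative only.

```perl
# Example: link each grouped result to a (hypothetical) formatting script.
# Assumes $file identifies the document; /cgi-bin/format-doc.cgi is made up.
sub customize_grouping_line {
    local($collection_name, $file, $doc_root, $relevance_qbe,
          $title, $summary, $original_line) = @_;
    local($escaped) = $file;
    # Percent-encode characters that are not safe in a query-string value.
    $escaped =~ s/([^A-Za-z0-9\/._-])/sprintf("%%%02X", ord($1))/ge;
    return "<a href=\"/cgi-bin/format-doc.cgi?file=$escaped\">$title</a>";
}
```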

Command Line Applications

Documentation for accessing the functionality offered by the forms from the command line is also available.