|
Query Language
You can search for any
word or phrase on a Web site by typing the word or
phrase into a query form and clicking the button to
execute the query (for example, the Execute Query
button on the sample query form).
Searches produce a list
of files that contain the word or phrase no matter
where they appear in the text. This list gives the
rules for formulating queries:
- Consecutive words
are treated as a phrase; they must appear in the
same order within a matching document.
- Queries are
case-insensitive, so you can type your query in
uppercase or lowercase.
- You can search for
any word except for those in the exception list
(for English, this includes a, an,
and, as, and other common
words), which are ignored during a search.
- Words in the
exception list are treated as placeholders in
phrase and proximity queries. For example, if you
searched for “Word for Windows”, the results
could give you “Word for Windows” and “Word
and Windows”, because for is a noise
word and appears in the exception list.
- Punctuation marks
such as the period (.), colon (:), semicolon (;),
and comma (,) are ignored during a search.
- To use specially
treated characters such as &, |, ^, #, @, $,
(, ), in a query, enclose your query in quotation
marks (").
- To search for a word
or phrase containing quotation marks, enclose the
entire phrase in quotation marks and then double
the quotation marks around the word or words you
want to surround with quotes. For example,
"World-Wide Web or ""Web""" searches
for World-Wide Web or "Web".
- You can insert Boolean
operators (AND, OR,
and NOT) and the proximity
operator (NEAR) to specify
additional search information.
- The wildcard
character (*) can match words with a given
prefix. The query esc* matches the terms
“ESC,” “escape,” and so on.
- Free-text
queries can
be specified without regard to query syntax.
- Vector
space queries
can be specified.
- ActiveX™ (OLE) and
file attribute property
value queries can be issued.
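The quoting rule in the list above can be sketched in a few lines of Python. The helper name quote_phrase is hypothetical and not part of Index Server; it only builds the query string that you would type into a query form.

```python
def quote_phrase(phrase: str) -> str:
    """Wrap a phrase in quotation marks, doubling any embedded ones.

    Hypothetical helper: it follows the rule described in the text and
    does not call any Index Server API.
    """
    return '"' + phrase.replace('"', '""') + '"'


if __name__ == "__main__":
    # Reproduces the example from the text.
    print(quote_phrase('World-Wide Web or "Web"'))
```

The same helper also covers queries that contain specially treated characters such as & or |, because enclosing the query in quotation marks is the documented way to search for them literally.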
Boolean and proximity
operators can create a more precise query.
| To Search For | Example | Results |
| Both terms in the same page | access and basic -Or- access & basic | Pages with both the words “access” and “basic” |
| Either term in a page | cgi or isapi -Or- cgi | isapi | Pages with the words “cgi” or “isapi” |
| The first term without the second term | access and not basic -Or- access & ! basic | Pages with the word “access” but not “basic” |
| Pages not matching a property value | not @size = 100 -Or- ! @size = 100 | Pages that are not 100 bytes |
| Both terms in the same page, close together | excel near project -Or- excel ~ project | Pages with the word “excel” near the word “project” |
Hints:
- You can add
parentheses to nest expressions within a query.
The expressions in parentheses are evaluated
before the rest of the query.
- Use double quotes
(") to indicate that a Boolean or NEAR
operator keyword should be ignored in your query.
For example, "Abbott and Costello" will match
pages with the phrase, not pages that match the
Boolean expression. In addition to being an
operator, the word and is a noise word in
English.
- The NEAR
operator is similar to the AND
operator in that NEAR returns a
match if both words being searched for are in the
same page. However, the NEAR
operator differs from AND because
the rank assigned by NEAR depends
on the proximity of words. That is, the rank of a
page with the searched-for words closer together
is greater than or equal to the rank of a page
where the words are farther apart. If the
searched-for words are more than 50 words apart,
they are not considered near enough, and the page
is assigned a rank of zero.
- The NOT
operator can be used only after an AND
operator in content queries; it can be used only
to exclude pages that match a previous content
restriction. For property value queries, the NOT
operator can be used apart from the AND
operator.
- The AND
operator has a higher precedence than OR.
For example, the first three queries are
equivalent, but the fourth is not:
  a AND b OR c
  c OR a AND b
  c OR (a AND b)
  (c OR a) AND b
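The precedence rule in the last hint can be checked with a small truth-table sketch. It relies on the fact that Python's own and operator also binds more tightly than or, so the four queries can be written directly as Python expressions. The names QUERIES, truth_table, and TABLES are illustrative only.

```python
from itertools import product

# The four queries from the text, written as Python Boolean expressions.
# Python's "and" binds more tightly than "or", like AND and OR in the text.
QUERIES = {
    "a AND b OR c": lambda a, b, c: a and b or c,
    "c OR a AND b": lambda a, b, c: c or a and b,
    "c OR (a AND b)": lambda a, b, c: c or (a and b),
    "(c OR a) AND b": lambda a, b, c: (c or a) and b,
}


def truth_table(query):
    """Evaluate a query for every combination of a, b, and c."""
    return tuple(
        bool(query(a, b, c)) for a, b, c in product([False, True], repeat=3)
    )


TABLES = {text: truth_table(fn) for text, fn in QUERIES.items()}

if __name__ == "__main__":
    for text, table in TABLES.items():
        print(f"{text:16} {table}")
```

The first three rows print identical truth tables. The fourth differs, for example, when only c is true: the page matches the first three queries but not (c OR a) AND b.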
Note: The
symbols (&, |, !, ~) and the English keywords AND,
OR, NOT, and NEAR
work the same way in all languages supported by Index
Server. Localized keywords are also available when the
browser locale is set to one of the following six
languages:
| Language | Keywords |
| German | UND, ODER, NICHT, NAH |
| French | ET, OU, SANS, PRES |
| Spanish | Y, O, NO, CERCA |
| Dutch | EN, OF, NIET, NABIJ |
| Swedish | OCH, ELLER, INTE, NÄRA |
| Italian | E, O, NO, VICINO |
Note: The
NEAR operator can be applied only to words or phrases.
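As a rough illustration of the 50-word limit on NEAR, the following sketch tests whether two words fall within that window. It assumes that distance is the difference between word positions, which the text does not state, and it does not attempt to reproduce the rank that Index Server assigns. The function names are hypothetical.

```python
MAX_NEAR_DISTANCE = 50  # from the text: words farther apart than this rank zero


def nearest_distance(words, first, second):
    """Smallest gap, in word positions, between any occurrences of two terms.

    Returns None when either term does not occur. Measuring distance as a
    difference of positions is an assumption made for this sketch.
    """
    first_pos = [i for i, w in enumerate(words) if w.lower() == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w.lower() == second.lower()]
    if not first_pos or not second_pos:
        return None
    return min(abs(i - j) for i in first_pos for j in second_pos)


def is_near(words, first, second):
    """True when both terms occur no more than MAX_NEAR_DISTANCE words apart."""
    distance = nearest_distance(words, first, second)
    return distance is not None and distance <= MAX_NEAR_DISTANCE


if __name__ == "__main__":
    close = ["excel"] + ["filler"] * 10 + ["project"]
    far = ["excel"] + ["filler"] * 60 + ["project"]
    print(is_near(close, "excel", "project"), is_near(far, "excel", "project"))
```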
Wildcard
operators help you find pages containing words similar
to a given word.
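A minimal sketch of prefix matching with the wildcard character, assuming a trailing * and the case-insensitive comparison described in the query rules. The function name matches_wildcard is hypothetical.

```python
def matches_wildcard(term: str, query: str) -> bool:
    """Case-insensitive match of a term against a query word.

    A trailing * turns the query into a prefix match; without it the
    whole word must match. Illustrative only, not the Index Server matcher.
    """
    term, query = term.lower(), query.lower()
    if query.endswith("*"):
        return term.startswith(query[:-1])
    return term == query


if __name__ == "__main__":
    for word in ("ESC", "escape", "rescue"):
        print(word, matches_wildcard(word, "esc*"))
```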
The
query engine finds pages that best match the words and
phrases in a free-text query. This is done by
automatically finding pages that match the meaning,
not the exact wording, of the query. Boolean,
proximity, and wildcard operators are ignored within a
free-text query. Free-text queries are prefixed with
$contents.
The query engine
supports vector space queries. Vector queries return
pages that match a list of words and phrases. The rank
of each page indicates how well the page matched the
query.
| To Search For | Example | Results |
| Pages that contain specific words | light, bulb | Files with words that best match the words being searched for |
| Pages that contain weighted prefixes, words, and phrases | invent*, light[50], bulb[10], "light bulb"[400] | Files that contain words prefixed by “invent,” the words “light,” “bulb,” and the phrase “light bulb” (the terms are weighted) |
- Components in vector
queries are separated by commas.
- Components in vector
queries can be weighted by using the [weight]
syntax.
- Pages returned by
vector queries do not necessarily match every term
in the query.
- Vector queries work
best when the results are sorted by rank.
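The comma-separated, optionally weighted syntax described above can be assembled programmatically. The sketch below only builds the query text; vector_query is a hypothetical helper name, and a weight of None means the component is left unweighted.

```python
def vector_query(components):
    """Join (term, weight) pairs into a vector space query string.

    Components are separated by commas, and a weight is appended using the
    [weight] syntax. Illustrative helper, not an Index Server API.
    """
    parts = []
    for term, weight in components:
        parts.append(term if weight is None else f"{term}[{weight}]")
    return ", ".join(parts)


if __name__ == "__main__":
    # Reproduces the weighted example from the table.
    print(vector_query([
        ("invent*", None),
        ("light", 50),
        ("bulb", 10),
        ('"light bulb"', 400),
    ]))
```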
With property value
queries, you can find files that have property values
that match given criteria. The properties over which
you can query include basic file information like file
name and file size, and ActiveX properties including
the document summary (information) that is stored in
files created by ActiveX-aware applications.
There are two types of
property queries:
- Relational
property queries
consist of an “at” character (@), a property
name, a relational
operator, and a property
value. For example, to find all of the files
larger than one million bytes, issue the query
@size > 1000000.
- Regular
expression property queries
consist of a number sign (#), a property name, and
a regular expression
for the property value. For example, to find
all of the video (.avi) files, issue the
query #filename *.avi. Regular expressions will
never match the special properties contents
(#contents) and all (#all). Properties that are
not retrievable at query time cannot be used in #
queries. These include HTML META properties not
stored in the property cache.
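The two query forms can be sketched as small string builders. The function names are hypothetical, and the operator set is the one shown in the relational operator examples on this page; the sketch builds query text only and does not validate property names.

```python
RELATIONAL_OPERATORS = {"<", "<=", "=", "!=", ">=", ">"}

# The text states that regular expressions never match these properties.
REGEX_UNSUPPORTED = {"contents", "all"}


def relational_query(prop: str, operator: str, value) -> str:
    """Build a relational property query such as '@size > 1000000'."""
    if operator not in RELATIONAL_OPERATORS:
        raise ValueError(f"unknown relational operator: {operator!r}")
    return f"@{prop} {operator} {value}"


def regex_query(prop: str, pattern: str) -> str:
    """Build a regular expression property query such as '#filename *.avi'."""
    if prop.lower() in REGEX_UNSUPPORTED:
        raise ValueError(f"regular expressions never match {prop!r}")
    return f"#{prop} {pattern}"


if __name__ == "__main__":
    print(relational_query("size", ">", 1000000))
    print(regex_query("filename", "*.avi"))
```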
Property names are
preceded by either the “at” (@) or number sign (#)
character. Use @ for relational queries, and # for
regular expression queries.
If no property name is
specified, @contents is assumed.
Properties available
for all files include:
| Property Name | Description |
| All | Matches words, phrases, and any property |
| Contents | Words and phrases in the file |
| Filename | Name of the file |
| Size | File size |
| Write | Last time the file was modified |
ActiveX property values
can also be used in queries. Web sites with files
created by most ActiveX-aware applications can be
queried for these properties:
| Property Name | Description |
| DocTitle | Title of the document |
| DocSubject | Subject of the document |
| DocAuthor | The document’s author |
| DocKeywords | Keywords for the document |
| DocComments | Comments about the document |
For a complete list of
property names, see the List
of Property Names later on this page.
Relational operators
are used in relational property queries.
| To Search For | Example | Results |
| Property values in relation to a fixed value | @size < 100; @size <= 100; @size = 100; @size != 100; @size >= 100; @size > 100 | Files whose size matches the query |
| Property values with all of a set of bits on | @attrib ^a 0x820 | Compressed files with the archive bit on |
| Property values with some of a set of bits on | @attrib ^s 0x20 | Files with the archive bit on |
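The ^a (all of) and ^s (some of) bit tests in the table correspond to ordinary bitmask checks. The sketch below uses the Win32 values for the archive (0x20) and compressed (0x800) attributes, which together give the 0x820 mask in the example; the function names are illustrative.

```python
FILE_ATTRIBUTE_ARCHIVE = 0x20      # Win32 archive attribute
FILE_ATTRIBUTE_COMPRESSED = 0x800  # Win32 compressed attribute


def all_bits_on(value: int, mask: int) -> bool:
    """Equivalent of ^a: every bit in the mask is set in the value."""
    return value & mask == mask


def some_bits_on(value: int, mask: int) -> bool:
    """Equivalent of ^s: at least one bit in the mask is set in the value."""
    return value & mask != 0


if __name__ == "__main__":
    archived_only = FILE_ATTRIBUTE_ARCHIVE
    mask = FILE_ATTRIBUTE_ARCHIVE | FILE_ATTRIBUTE_COMPRESSED  # 0x820
    print(all_bits_on(archived_only, mask), some_bits_on(archived_only, mask))
```

A file with only the archive bit set fails the ^a 0x820 test, because the compressed bit is missing, but passes ^s 0x820 and ^s 0x20.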
| To Search For | Example | Results |
| A specific value | @DocAuthor = Bill Barnes | Files authored by “Bill Barnes” |
| Values beginning with a prefix | #DocAuthor George* | Files whose author property begins with “George” |
| Files with any of a set of extensions | #filename *.|(exe|,dll|,sys|) | Files with .exe, .dll, or .sys extensions |
| Files modified after a certain date | @write > 96/2/14 10:00:00 | Files modified after February 14, 1996 at 10:00 GMT |
| Files modified after a relative date | @write > -1d2h | Files modified in the last 26 hours |
| Vectors matching a vector | @vectorprop = { 10, 15, 20 } | ActiveX documents with a vectorprop value of { 10, 15, 20 } |
| Vectors where each value matches a criterion | @vectorprop >^a 15 | ActiveX documents with a vectorprop value in which all values in the vector are greater than 15 |
| Vectors where at least one value matches a criterion | @vectorprop =^s 15 | ActiveX documents with a vectorprop value in which at least one value is 15 |
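The vector comparisons in the last two rows behave like "all" and "any" tests over the values in the vector. The following sketch mirrors that reading; vector_matches is a hypothetical name, and the treatment of an empty vector (all of an empty vector is true in Python) is an artifact of the sketch, not documented behavior.

```python
import operator

COMPARATORS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}


def vector_matches(values, op: str, quantifier: str, target) -> bool:
    """Compare a single value against a vector.

    quantifier is "^a" (all values must satisfy the comparison) or
    "^s" (at least one value must satisfy it).
    """
    compare = COMPARATORS[op]
    results = [compare(value, target) for value in values]
    if quantifier == "^a":
        return all(results)
    if quantifier == "^s":
        return any(results)
    raise ValueError(f"unknown quantifier: {quantifier!r}")


if __name__ == "__main__":
    vectorprop = [10, 15, 20]
    print(vector_matches(vectorprop, ">", "^a", 15))  # @vectorprop >^a 15
    print(vector_matches(vectorprop, "=", "^s", 15))  # @vectorprop =^s 15
```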
- Be sure to use the
pound (#) character before the property name when
using a regular expression in a property value,
and an “at” (@) character otherwise. The equal
(=) relational operator is assumed for
regular-expression queries.
- File name
(#filename) is the only property that efficiently
supports regular expressions with wildcards to the
left of text.
- Date and time values
are of the form yyyy/mm/dd hh:mm:ss or yyyy-mm-dd
hh:mm:ss. The first two characters of the
year and the entire time can be omitted. If you
omit the first two characters of the year, a value
of 29 or less is interpreted as a year in the 2000s
(for example, 05 is 2005), and a value of 30 or
greater is interpreted as a year in the 1900s (for
example, 96 is 1996). All dates
and times are in Greenwich Mean Time (GMT).
- Dates and times
relative to the current time can be expressed with
a minus (-) character followed by zero or more
pairs of an integer and a time unit. Time units are
expressed as: (y) for years, (m) for months, (w)
for weeks, (d) for days, (h) for hours, (n) for
minutes, and (s) for seconds. A three-digit
millisecond value can be optionally specified
after the seconds value in date expressions. For
example, 1997/12/8 10:10:03:452.
- Currency values are
of the form x.y, where x is the
whole value amount and y is the
fractional amount. There is no assumption about
units.
- Boolean values are
(t) or (true) for TRUE and (f) or
(false) for FALSE.
- Vectors (VT_VECTOR)
are expressed as an opening brace ({), followed by
a comma-separated list of values, then a closing
brace (}).
- Single-value
expressions that are compared against vectors are
expressed as a relational
operator, then a (^a) for all of or a
(^s) for some of.
- Numeric values can
be in decimal or hexadecimal (preceded by 0x).
- The contents
property does not support relational operators. If
a relational operator is specified, no results
will be found. For example, @contents Microsoft
will find documents containing Microsoft, but
@contents=Microsoft will find
none.
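The date rules in the list above can be illustrated with a short sketch: one function expands a two-digit year, and another converts a relative expression such as -1d2h into a duration. Both function names are hypothetical. Years (y) and months (m) are deliberately left unimplemented, because their length depends on the calendar and the text does not say how Index Server computes them; lowercase unit letters are also an assumption.

```python
import re
from datetime import timedelta


def expand_year(two_digit_year: int) -> int:
    """Apply the documented rule: 00-29 means 20yy, 30-99 means 19yy."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a two-digit year")
    return (2000 if two_digit_year <= 29 else 1900) + two_digit_year


_UNIT_KEYWORDS = {"w": "weeks", "d": "days", "h": "hours",
                  "n": "minutes", "s": "seconds"}
_PAIR = re.compile(r"(\d+)([ymwdhns])")


def parse_relative(expr: str) -> timedelta:
    """Convert a relative date such as '-1d2h' into a timedelta."""
    if not expr.startswith("-"):
        raise ValueError("relative dates start with a minus character")
    body = expr[1:]
    if not re.fullmatch(r"(?:\d+[ymwdhns])*", body):
        raise ValueError(f"malformed relative date: {expr!r}")
    total = timedelta()
    for amount, unit in _PAIR.findall(body):
        if unit in "ym":
            # Calendar-dependent units are outside the scope of this sketch.
            raise NotImplementedError("years and months are not handled")
        total += timedelta(**{_UNIT_KEYWORDS[unit]: int(amount)})
    return total


if __name__ == "__main__":
    print(expand_year(96), parse_relative("-1d2h"))
```

With these rules, 96/2/14 refers to 1996, and -1d2h is an offset of 26 hours before the current time.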
Regular expressions in
property queries are defined as follows:
- Any character except
asterisk (*), period (.), question mark (?), and
vertical bar (|) defaults to matching just itself.
- Regular expressions
can be enclosed in matching quotes ("), and must
be enclosed in quotes if they contain a space ( )
or closing parenthesis ()).
- The characters *, .,
and ? behave as they behave in Windows; they match
any number of characters, match (.) or end of
string, and match any one character, respectively.
- The character | is
an escape character. After |, the following
characters have special meaning:
    ( opens a group. Must be followed by a matching ).
    ) closes a group. Must be preceded by a matching (.
    [ opens a character class. Must be followed by a matching (un-escaped) ].
    { opens a counted match. Must be followed by a matching }.
    } closes a counted match. Must be preceded by a matching {.
    , separates OR clauses.
    * matches zero or more occurrences of the preceding expression.
    ? matches zero or one occurrence of the preceding expression.
    + matches one or more occurrences of the preceding expression.
    Anything else, including |, matches itself.
- Between square
brackets ([]) the following characters have
special meaning:
    ^ matches everything but the following classes. Must be the first character.
    ] matches ]. May only be preceded by ^, otherwise it closes the class.
    - (hyphen) is the range operator. Must be preceded and followed by normal characters.
    Anything else matches itself (or begins or ends a range at itself).
- Between curly braces
({}) the following syntax applies:
    |{m|} matches exactly m occurrences of the preceding expression (0 < m < 256).
    |{m,|} matches at least m occurrences of the preceding expression (1 < m < 256).
    |{m,n|} matches between m and n occurrences of the preceding expression, inclusive (0 < m < 256, 0 < n < 256).
- To match *, ., and
?, enclose them in brackets (for example,
|[*]sample will match “*sample”).
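To make the escape rules more concrete, here is a minimal sketch that translates a subset of this syntax into a pattern for Python's re module. It covers the *, ., and ? wildcards, groups, OR clauses, and the escaped *, ?, and + quantifiers; character classes and counted matches are left out. The function names, the whole-value match, and the case-insensitive comparison are assumptions of the sketch, not documented Index Server behavior.

```python
import re


def to_python_regex(pattern: str) -> str:
    """Translate a subset of the property-query regex syntax to re syntax."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            out.append(".*")            # any number of characters
        elif ch == "?":
            out.append(".")             # any one character
        elif ch == ".":
            out.append(r"(?:\.|$)")     # a period or end of string
        elif ch == "|":
            i += 1                      # | escapes the next character
            if i == len(pattern):
                raise ValueError("pattern ends with a dangling |")
            nxt = pattern[i]
            if nxt == "(":
                out.append("(?:")       # opens a group
            elif nxt == ")":
                out.append(")")         # closes a group
            elif nxt == ",":
                out.append("|")         # separates OR clauses
            elif nxt in "*?+":
                out.append(nxt)         # quantifier on the preceding expression
            elif nxt in "[{}":
                raise NotImplementedError(
                    "character classes and counted matches are not covered")
            else:
                out.append(re.escape(nxt))  # anything else matches itself
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def matches(pattern: str, value: str) -> bool:
    """Match the whole value, ignoring case (both are assumptions)."""
    return re.fullmatch(to_python_regex(pattern), value, re.IGNORECASE) is not None


if __name__ == "__main__":
    for name in ("setup.exe", "driver.SYS", "notes.txt"):
        print(name, matches("*.|(exe|,dll|,sys|)", name))
```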
| Example | Results |
| @size > 1000000 | Pages larger than one million bytes |
| @write > 95/12/23 | Pages modified after the date |
| Apple tree | Pages with the phrase “apple tree” |
| "apple tree" | Same as above |
| @contents apple tree | Same as above |
| Microsoft and @size > 1000000 | Pages with the word “Microsoft” that are larger than one million bytes |
| "microsoft and @size > 1000000" | Pages with the phrase specified (not the same as above) |
| #filename *.avi | Video files (the # prefix is used because the query contains a regular expression) |
| @attrib ^s 32 | Pages with the archive attribute bit on |
| @docauthor = John Smith | Pages with the given author |
| $contents why is the sky blue? | Pages that match the query |
| @size < 100 & #filename *.gif | Graphics Interchange Format (GIF) files less than 100 bytes in size |
These properties are
always available for queries. Additional properties
may also be available depending on the configuration
of the Web server.
| Friendly
Name |
Datatype |
Property |
| A_HRef |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML HREF. This property name was created for
Microsoft® Site Server and corresponds with the
Index Server property name HtmlHRef. Can be
queried but not retrieved. |
| Access |
VT_FILETIME |
Last
time file was accessed. |
| All |
(not
applicable) |
Searches
every property for a string. Can be queried
but not retrieved. |
| AllocSize |
DBTYPE_I8 |
Size
of disk allocation for file. |
| Attrib |
DBTYPE_UI4 |
File
attributes. Documented in Win32 SDK. |
| ClassId |
DBTYPE_GUID |
Class
ID of object, for example, WordPerfect, Word,
and so on. |
| Characterization |
DBTYPE_WSTR
| DBTYPE_BYREF |
Characterization,
or abstract, of document. Computed by Index
Server. |
| Contents |
(not
applicable) |
Main
contents of file. Can be queried but not
retrieved. |
| Create |
VT_FILETIME |
Time
file was created. |
| Directory |
DBTYPE_WSTR
| DBTYPE_BYREF |
Physical
path to the file, not including the file name. |
| DocAppName |
DBTYPE_WSTR
| DBTYPE_BYREF |
Name
of application that created the file. |
| DocAuthor |
DBTYPE_WSTR
| DBTYPE_BYREF |
Author
of document. |
| DocByteCount |
DBTYPE_I4 |
Number of bytes in
a document. |
| DocCategory |
DBTYPE_STR |
DBTYPE_BYREF |
Type of document
such as a memo, schedule, or whitepaper. |
| DocCharCount |
DBTYPE_I4 |
Number
of characters in document. |
| DocComments |
DBTYPE_WSTR
| DBTYPE_BYREF |
Comments
about document. |
| DocCompany |
DBTYPE_STR |
DBTYPE_BYREF |
Name of the
company for which the document was written. |
| DocCreatedTm |
VT_FILETIME |
Time
document was created. |
| DocEditTime |
VT_FILETIME |
Total
time spent editing document. |
| DocHiddenCount |
DBTYPE_I4 |
Number of hidden
slides in a Microsoft® PowerPoint document. |
| DocKeywords |
DBTYPE_WSTR
| DBTYPE_BYREF |
Document
keywords. |
| DocLastAuthor |
DBTYPE_WSTR
| DBTYPE_BYREF |
Most
recent user who edited document. |
| DocLastPrinted |
VT_FILETIME |
Time
document was last printed. |
| DocLastSavedTm |
VT_FILETIME |
Time
document was last saved. |
| DocLineCount |
DBTYPE_I4 |
Number of lines
contained in a document. |
| DocManager |
DBTYPE_STR |
DBTYPE_BYREF |
Name of the
manager of the document’s author. |
| DocNoteCount |
DBTYPE_I4 |
Number of pages
with notes in a PowerPoint document. |
| DocPageCount |
DBTYPE_I4 |
Number
of pages in document. |
| DocParaCount |
DBTYPE_I4 |
Number of
paragraphs in a document. |
| DocPartTitles |
DBTYPE_STR |
DBTYPE_VECTOR |
Names of document
parts. For example, in Excel, part titles are the
names of spreadsheets; in PowerPoint, slide
titles; and in Word for Windows, the names of the
documents in the master document. |
| DocPresentationTarget |
DBTYPE_STR|DBTYPE_BYREF |
Target format
(35mm, printer, video, and so on) for a
presentation in PowerPoint. |
| DocRevNumber |
DBTYPE_WSTR
| DBTYPE_BYREF |
Current
version number of document. |
| DocSlideCount |
DBTYPE_I4 |
Number of slides
in a PowerPoint document. |
| DocSubject |
DBTYPE_WSTR
| DBTYPE_BYREF |
Subject
of document. |
| DocTemplate |
DBTYPE_WSTR
| DBTYPE_BYREF |
Name
of template for document. |
| DocTitle |
DBTYPE_WSTR
| DBTYPE_BYREF |
Title
of document. |
| DocWordCount |
DBTYPE_I4 |
Number
of words in document. |
| FileIndex |
DBTYPE_I8 |
Unique
ID of file. |
| FileName |
DBTYPE_WSTR
| DBTYPE_BYREF |
Name
of file. |
| HitCount |
DBTYPE_I4 |
Number
of hits (words matching query) in file. |
| HtmlHRef |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML HREF. Can be queried but not
retrieved. |
| HtmlHeading1 |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML document in style H1. Can be queried
but not retrieved. |
| HtmlHeading2 |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML document in style H2. Can be queried
but not retrieved. |
| HtmlHeading3 |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML document in style H3. Can be queried
but not retrieved. |
| HtmlHeading4 |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML document in style H4. Can be queried
but not retrieved. |
| HtmlHeading5 |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML document in style H5. Can be queried
but not retrieved. |
| HtmlHeading6 |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML document in style H6. Can be queried
but not retrieved. |
| Img_Alt |
DBTYPE_WSTR
| DBTYPE_BYREF |
Alternate
text for <IMG> tags. Can be queried
but not retrieved. |
| Path |
DBTYPE_WSTR
| DBTYPE_BYREF |
Full
physical path to file, including file name. |
| Rank |
DBTYPE_I4 |
Rank
of row. Ranges from 0 to 1000. Larger numbers
indicate better matches. |
| RankVector |
DBTYPE_I4
| DBTYPE_VECTOR |
Ranks
of individual components of a vector
query. |
| ShortFileName |
DBTYPE_WSTR
| DBTYPE_BYREF |
Short
(8.3) file name. |
| Size |
DBTYPE_I8 |
Size
of file, in bytes. |
| USN |
DBTYPE_I8 |
Update
Sequence Number. NTFS drives only. |
| VPath |
DBTYPE_WSTR
| DBTYPE_BYREF |
Full
virtual path to file, including file name. If
more than one possible path, then the best match
for the specific query is chosen. |
| WorkId |
DBTYPE_I4 |
Internal
ID for file. Used within Index Server. |
| Write |
VT_FILETIME |
Last
time file was written. |
To use properties
that are not in the previous list in a restriction,
sort specification, or as a retrieved column, you
must define them in a [Names] section in the .idq
file, using the following format:
[Names]
#Properties that are not in the standard list
Propertyname ( Datatype ) = GUID ["Name" | propid]
In the syntax, "Name"
is the property name ("Sales"
in the following example), and propid is the
property ID in hexadecimal. Note that you need to
surround the friendly name with quotation marks, but
the property ID does not take quotation marks.
For example, suppose
you want to define an HTML meta tag as a property name
that somebody can search for. The property you want to
define is Sales.
To define the Sales
property
- In the .idq file,
under the [Names] section, add the following line.
MetaDescription(DBTYPE_WSTR) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 "Sales"
The GUID number comes
from the MetaTagClsid parameter in
the registry, at the following location:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\HtmlFilter\MetaTagClsid
- Then, in the HTML
files where you want the tag to appear, define the
meta description.
For example, say you
want to search for all files that give sales
projections for the future:
In File1.htm:
<META NAME="Sales" CONTENT="Projections for 1998">
In File2.htm:
<META NAME="Sales" CONTENT="Projections for 1999">
In File3.htm:
<META NAME="Sales" CONTENT="Sales in 1997">
Note: Be
sure to add your META NAME tags between the
<head> and </head> HTML tags at the
beginning of the file.
You can now search for
all files that show sales projections. Send the
following query:
@metadescription projections
This query returns all
the files with the word projections in the
CONTENT field of the meta tag. In this example,
File1.htm and File2.htm are returned.
But suppose you want to
search for sales by year, for example a list of sales
in 1997. Send the following query:
@metadescription 1997
File3.htm is returned.
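The outcome of the two example queries can be reproduced outside Index Server with a short simulation. The sketch below parses the three example files with Python's standard html.parser module and looks for a word in the CONTENT of the Sales meta tag. The names FILES, MetaExtractor, and search_meta are hypothetical, and the whole-word, case-insensitive match is a simplification of how the query engine works.

```python
from html.parser import HTMLParser

# The three example files from the text, reduced to their <head> sections.
FILES = {
    "File1.htm": '<head><META NAME="Sales" CONTENT="Projections for 1998"></head>',
    "File2.htm": '<head><META NAME="Sales" CONTENT="Projections for 1999"></head>',
    "File3.htm": '<head><META NAME="Sales" CONTENT="Sales in 1997"></head>',
}


class MetaExtractor(HTMLParser):
    """Collect the CONTENT values of META tags with a given NAME."""

    def __init__(self, name: str):
        super().__init__()
        self.name = name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == self.name:
            self.values.append(attributes.get("content") or "")


def search_meta(files, name: str, word: str):
    """Return the files whose named meta tag contains the given word."""
    hits = []
    for filename, html in files.items():
        parser = MetaExtractor(name)
        parser.feed(html)
        words = " ".join(parser.values).lower().split()
        if word.lower() in words:
            hits.append(filename)
    return hits


if __name__ == "__main__":
    print(search_meta(FILES, "Sales", "projections"))
    print(search_meta(FILES, "Sales", "1997"))
```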
©
1997 by Microsoft Corporation. All rights reserved. |