
10.5.1- RegExp Properties

  by NT Community Manager.

Categorized as 10. The Scripting Objects.


RegExp Properties

The RegExp object has the following three properties:

 

  • Global: used to indicate whether every occurrence of the search string should be matched, or just the first one VBScript comes across
  • IgnoreCase: used to indicate whether or not the case should be taken into account when trying to match a search string
  • Pattern: used to set (or return) the sequence being searched for

 

The first two properties both take Boolean values, so they can only be set to True or False. Consider the following:

 

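Set objRegExp = New RegExp          ' create the RegExp object first
objRegExp.Pattern = "Hello Hello"   ' the sequence to search for
objRegExp.IgnoreCase = False        ' case-sensitive matching
objRegExp.Global = True             ' match every occurrence, not just the first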

In this example we are searching for the pattern "Hello Hello". Because we have set Global to True, we are saying that we want every occurrence of this sequence to be matched, not just the first one. Setting IgnoreCase to False makes the search case-sensitive, so strings such as "hello hello" or "Hello hello" would not be matched.
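The fragment above can be expanded into a complete script to see the effect of IgnoreCase. This is a minimal sketch, assuming it is saved as a .vbs file and run under the Windows Script Host with cscript.exe, so that WScript.Echo writes to the console. It uses the RegExp object's Test method, which returns True if the pattern is found anywhere in the string.

```vbscript
Option Explicit

Dim objRegExp
Set objRegExp = New RegExp          ' create the RegExp object
objRegExp.Pattern = "Hello Hello"
objRegExp.Global = True             ' has no effect on Test, but mirrors the example

' Case-sensitive search: only the exact capitalization matches
objRegExp.IgnoreCase = False
WScript.Echo objRegExp.Test("Hello Hello")   ' True
WScript.Echo objRegExp.Test("hello hello")   ' False
WScript.Echo objRegExp.Test("Hello hello")   ' False

' Case-insensitive search: the same strings now match
objRegExp.IgnoreCase = True
WScript.Echo objRegExp.Test("hello hello")   ' True
WScript.Echo objRegExp.Test("Hello hello")   ' True
```

Switching a single property is enough to change the result, which is why it is worth setting IgnoreCase explicitly rather than relying on its default.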

The Pattern Property

However, what gives RegExp its power is that you don't have to know exactly what you're looking for. The Pattern property can take the form of a regular expression: a sequence of special characters and wildcards that specifies which parts of the string are required for a match, and which parts we aren't interested in.

 

The regular expression is compared to the text we want to search character by character. If the RegExp object finds a match for the first character, it then looks to see if the following character matches the next character in the regular expression, and if so, it checks the next, and so on. The clever part is that we can tell the RegExp object to match one of a specific set of letters, or any letter, or a sequence of repeated letters. We do this using wildcards and special characters, and we'll take a look at those most commonly used now:

 

Character   Purpose
*           Matches zero or more occurrences of the preceding character. For example, "be*" would match "b", "be", "bee", "beee" and so on.
?           Matches the preceding character zero or one times. For example, "bottles?" would match both "bottle" and "bottles".
+           Matches the preceding character one or more times. For example, "to+" would match "to" and "too", but not "t" on its own.
.           Matches any single character except a newline. For example, "ba." would match "bat", "bag" and "ban", but not "ba" on its own, because the dot must stand for exactly one character.
(x|y)       Matches either the x pattern or the y pattern. For example, "(bat|hit)" would match both "bat" and "hit", but nothing else.
{x}         Matches the preceding character exactly x times, where x is a non-negative whole number. So "e{3}" is equivalent to the expression "eee".
[xyz]       Matches any one of the enclosed characters, so "b[aeu]d" would match "bad", "bed" or "bud".
[^xyz]      Matches any one character NOT contained in the set, so "[^sb]et" would match "get", but not "set" or "bet".
[a-z]       Matches any one character within the specified range. "[a-z]{3}", for example, matches any three consecutive lowercase letters: a three-letter word such as "fox", but also the first three letters of a longer word such as "quick".
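The patterns in the table can be tried out with a short script. This is a sketch, assuming the Windows Script Host (cscript.exe); IsMatch and FirstMatch are hypothetical helper names introduced only for this illustration. FirstMatch uses the Execute method to return the text of the first match, which makes it easy to see exactly what each pattern picked up.

```vbscript
Option Explicit

' Hypothetical helper: True if strPattern is found anywhere in strText
Function IsMatch(strPattern, strText)
    Dim re
    Set re = New RegExp
    re.Pattern = strPattern
    re.IgnoreCase = False
    IsMatch = re.Test(strText)
End Function

' Hypothetical helper: the text of the first match, or "" if there is none.
' Global is left at False, so Execute returns at most one match.
Function FirstMatch(strPattern, strText)
    Dim re, colMatches
    Set re = New RegExp
    re.Pattern = strPattern
    Set colMatches = re.Execute(strText)
    If colMatches.Count > 0 Then
        FirstMatch = colMatches(0).Value
    Else
        FirstMatch = ""
    End If
End Function

WScript.Echo FirstMatch("be*", "beee")         ' beee
WScript.Echo FirstMatch("bottles?", "bottle")  ' bottle
WScript.Echo IsMatch("to+", "t")               ' False
WScript.Echo FirstMatch("ba.", "bag")          ' bag
WScript.Echo FirstMatch("(bat|hit)", "hit")    ' hit
WScript.Echo IsMatch("e{3}", "ee")             ' False
WScript.Echo FirstMatch("b[aeu]d", "bud")      ' bud
WScript.Echo IsMatch("[^sb]et", "set")         ' False
WScript.Echo FirstMatch("[a-z]{3}", "quick")   ' qui
```

The last line shows the caveat from the table: without anchors, "[a-z]{3}" happily matches part of a longer word.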

 

Character   Purpose
^           Matches the start of the string being searched. For example, "^a" would match any string beginning with the letter "a".
$           Matches the end of the string being searched. For example, "e$" would match any string ending with the letter "e".
\           Allows us to match characters that otherwise have a special meaning in regular expressions, or that we can't type. For example, "\*" matches an asterisk character, "\t" matches a tab, and "\\" matches a backslash character.
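A similar sketch for the anchor and escape characters, again assuming cscript.exe and a hypothetical IsMatch helper. Note that VBScript string literals do not treat the backslash as an escape character, so a pattern is typed exactly as the regular expression should see it: "\\" in the script reaches the RegExp object as two backslashes, which it reads as one literal backslash.

```vbscript
Option Explicit

' Hypothetical helper: True if strPattern is found anywhere in strText
Function IsMatch(strPattern, strText)
    Dim re
    Set re = New RegExp
    re.Pattern = strPattern
    IsMatch = re.Test(strText)
End Function

' ^ and $ tie the pattern to the start and end of the string
WScript.Echo IsMatch("^a", "apple pie")          ' True
WScript.Echo IsMatch("^a", "banana")             ' False: "a" occurs, but not first
WScript.Echo IsMatch("e$", "apple pie")          ' True
WScript.Echo IsMatch("e$", "eggs")               ' False

' An unescaped * is a quantifier; \* is a literal asterisk
WScript.Echo IsMatch("5\*3", "5*3")              ' True
WScript.Echo IsMatch("5\*3", "53")               ' False
WScript.Echo IsMatch("5*3", "53")                ' True: "5*" means zero or more 5s

' \t matches a tab, \\ matches a single backslash
WScript.Echo IsMatch("a\tb", "a" & vbTab & "b")  ' True
WScript.Echo IsMatch("C:\\Temp", "C:\Temp")      ' True
```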

 

Let's take a string and look at whether some regular expressions will return a match.

 

The quick brown fox jumps over the lazy dog

 

First, remember that unless we specify the character "^" or "$", we'll get a match for our expression if it occurs anywhere within the string. So, the following regular expressions will all match:

 

quick

jumps

ver th

row

 

^The will match, since this word occurs at the start of the string, but ^lazy won't match.
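The matches listed above can be checked with a short loop. This is a sketch, assuming cscript.exe; it sets each expression as the Pattern in turn and calls Test on the sentence. Test returns True as soon as the pattern is found anywhere in the string, which is why fragments such as "ver th" and "row" succeed.

```vbscript
Option Explicit

Dim objRegExp, strText, arrPatterns, strPattern
strText = "The quick brown fox jumps over the lazy dog"
Set objRegExp = New RegExp

' Expected: True for the first five patterns, False for ^lazy
arrPatterns = Array("quick", "jumps", "ver th", "row", "^The", "^lazy")
For Each strPattern In arrPatterns
    objRegExp.Pattern = strPattern
    WScript.Echo strPattern & " -> " & objRegExp.Test(strText)
Next
```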

 

Now, let's try something a bit more complicated.

 

The expression [a-z]+ can be used to match any word consisting of one or more letters (it matches one or more occurrences of a character between a and z). We could use that as follows:

 

The quick [a-z]+ fox jumps over the [a-z]+ dog

 

This will match our string, but it will also match similar strings with differently colored foxes and differently tempered dogs. Why? When the RegExp object reaches the b in brown, it sees that the character is allowed, because it is in the range a to z. The same goes for the r, the o, the w and the n. When it reaches the next character, a space, it sees that the space is not allowed by the [a-z] expression, but that the next character in the pattern is a space, so it carries on matching the literal characters we specified.
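To see the pattern accept other foxes and dogs, and reject strings it should not match, here is a sketch, again assuming cscript.exe. Because IgnoreCase is left at its default of False, [a-z] only accepts lowercase letters, so a capitalized "Brown" is rejected; and because + requires at least one letter, two spaces in a row do not match either.

```vbscript
Option Explicit

Dim objRegExp, arrSentences, strSentence
Set objRegExp = New RegExp
objRegExp.Pattern = "The quick [a-z]+ fox jumps over the [a-z]+ dog"

arrSentences = Array( _
    "The quick brown fox jumps over the lazy dog", _
    "The quick red fox jumps over the sleepy dog", _
    "The quick  fox jumps over the lazy dog", _
    "The quick Brown fox jumps over the lazy dog")

' Expected: True, True, False, False
For Each strSentence In arrSentences
    WScript.Echo objRegExp.Test(strSentence) & ": " & strSentence
Next
```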

Here are some useful regular expression shorthands you might use:

 

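Sequence                 Purpose
.*                       Matches zero or more characters of any kind except a newline, including spaces, tabs and punctuation.
[a-z]{x}                 Matches x consecutive lowercase letters, such as an x-letter word.
[0-9]+                   Matches a number consisting of one or more digits.
[a-z]+@[a-z]+\.com       Matches a .com email address, assuming it contains only lowercase letters apart from the "@" and the ".com".
[0-2]?[0-9]:[0-5][0-9]   Matches a time such as "9:30" or "23:45", in either 12- or 24-hour format. It is not a strict check: it would also accept an impossible time such as "29:00".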
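The shorthands can be combined with the Global property to pull every match out of a piece of text. This is a sketch, assuming cscript.exe; AllMatches is a hypothetical helper that calls the Execute method with Global set to True and joins the matched text with commas.

```vbscript
Option Explicit

' Hypothetical helper: every match of strPattern in strText, joined by commas
Function AllMatches(strPattern, strText)
    Dim re, colMatches, objMatch, strResult
    Set re = New RegExp
    re.Pattern = strPattern
    re.Global = True                    ' return every match, not just the first
    Set colMatches = re.Execute(strText)
    strResult = ""
    For Each objMatch In colMatches
        If strResult <> "" Then strResult = strResult & ","
        strResult = strResult & objMatch.Value
    Next
    AllMatches = strResult
End Function

Dim strText
strText = "Mail bob@example.com or sue@test.com by 9:30 or 23:45; room 101, floor 7"

WScript.Echo AllMatches("[0-9]+", strText)                   ' 9,30,23,45,101,7
WScript.Echo AllMatches("[a-z]+@[a-z]+\.com", strText)       ' bob@example.com,sue@test.com
WScript.Echo AllMatches("[0-2]?[0-9]:[0-5][0-9]", strText)   ' 9:30,23:45
WScript.Echo AllMatches("[a-z]{4}", "fox lazy jumps")        ' lazy,jump
```

Note that "[0-9]+" also picks up the digits inside the times, and the last line shows "[a-z]{4}" matching the first four letters of "jumps".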


Copyright © 2003 by Wiley Publishing, Inc.
