Character classes
Character classes are an important component of regular expressions. A character class specifies which characters are acceptable at a particular position in the pattern and which are not. You can list the acceptable characters individually or give a range of allowable characters. You can also negate a class so that it matches any character except those listed. Some common forms of character class are described below:
1. Simple Classes ( [ ] ):~
The most basic form of a character class is to place a set of characters side-by-side within square brackets. For example, the regular expression “[bcr]at” will match the words "bat", "cat", or "rat" because it defines a character class (accepting either "b", "c", or "r") as its first character. Here “[bcr]” is a simple character class.
Enter your regex: [bcr]at
Enter input string to search: bat
I found the text "bat" starting at index 0 and ending at index 3.
Enter your regex: [bcr]at
Enter input string to search: cat
I found the text "cat" starting at index 0 and ending at index 3.
Enter your regex: [bcr]at
Enter input string to search: rat
I found the text "rat" starting at index 0 and ending at index 3.
Enter your regex: [bcr]at
Enter input string to search: hat
No match found.
In the above examples, the overall match succeeds only when the first letter matches one of the characters defined by the character class.
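The transcript above can be reproduced with a short program. This is a minimal sketch, assuming Java's java.util.regex package (the output format resembles a Java test harness, but the source does not name the language); the class name SimpleClassDemo and the helper report are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleClassDemo {
    // Hypothetical helper: describes the first match of regex in input,
    // using the same wording as the transcript above.
    static String report(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            // start() is the index of the first matched character;
            // end() is the index just past the last matched character.
            return String.format(
                "I found the text \"%s\" starting at index %d and ending at index %d.",
                m.group(), m.start(), m.end());
        }
        return "No match found.";
    }

    public static void main(String[] args) {
        for (String word : new String[] {"bat", "cat", "rat", "hat"}) {
            System.out.println(word + ": " + report("[bcr]at", word));
        }
    }
}
```

Note that the ending index is exclusive, which is why a three-letter match that starts at index 0 is reported as ending at index 3.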
2. Negation ( ^ ):~
Another widely used form is the negated character class. To match any character except those listed, insert the "^" metacharacter (a leading caret) at the beginning of the character class. This technique is known as negation. In the regular expression below, “[^bcr]” is a negated character class.
Enter your regex: [^bcr]at
Enter input string to search: bat
No match found.
Enter your regex: [^bcr]at
Enter input string to search: cat
No match found.
Enter your regex: [^bcr]at
Enter input string to search: rat
No match found.
Enter your regex: [^bcr]at
Enter input string to search: hat
I found the text "hat" starting at index 0 and ending at index 3.
In these examples the negation applies to “b”, “c”, and “r”. The regular expression engine accepts any character except those three in the first position. The search strings “bat”, “cat”, and “rat” therefore produce no match, while “hat” is found.
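One detail worth checking in code is that a negated class still has to match one character: it means "some character other than these", not "no character at all". The sketch below assumes Java's java.util.regex; the class name NegationDemo and the helper found are hypothetical.

```java
import java.util.regex.Pattern;

class NegationDemo {
    // Hypothetical helper: true if regex matches anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "[^bcr]at";
        // "bat", "cat", "rat" start with an excluded letter; "hat" does not.
        // "at" has no character in front of "at", so [^bcr] has nothing to match.
        for (String input : new String[] {"bat", "cat", "rat", "hat", "at"}) {
            System.out.println(input + " -> " + found(regex, input));
        }
    }
}
```

The input "at" is rejected even though it contains none of the excluded letters, because the pattern needs three characters in total.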
3. Ranges ( - ):~
Sometimes you'll want to define a character class that includes a range of values, such as the letters "a through h" or the numbers "1 through 5". To specify a range, simply insert the "-" metacharacter between the first and last character to be matched, such as [1-5] or [a-h]. You can also place different ranges beside each other within the class to further expand the match possibilities. For example, [a-zA-Z] will match any letter of the alphabet: a to z (lowercase) or A to Z (uppercase).
Here are some examples of ranges and negation:
Enter your regex: [a-c]
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
Enter your regex: [a-c]
Enter input string to search: b
I found the text "b" starting at index 0 and ending at index 1.
Enter your regex: [a-c]
Enter input string to search: c
I found the text "c" starting at index 0 and ending at index 1.
Enter your regex: [a-c]
Enter input string to search: d
No match found.
Enter your regex: foo[1-5]
Enter input string to search: foo1
I found the text "foo1" starting at index 0 and ending at index 4.
Enter your regex: foo[1-5]
Enter input string to search: foo5
I found the text "foo5" starting at index 0 and ending at index 4.
Enter your regex: foo[1-5]
Enter input string to search: foo6
No match found.
Enter your regex: foo[^1-5]
Enter input string to search: foo1
No match found.
Enter your regex: foo[^1-5]
Enter input string to search: foo6
I found the text "foo6" starting at index 0 and ending at index 4.
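The range examples can also be run against a longer input to see every match at once. This is a sketch under the assumption that Java's java.util.regex is the engine in use; RangeDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeDemo {
    // Hypothetical helper: collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "foo1 foo5 foo6";
        // A range accepts any character between its endpoints, inclusive.
        System.out.println(findAll("foo[1-5]", input));
        // A negated range accepts any character outside those endpoints.
        System.out.println(findAll("foo[^1-5]", input));
        // Two ranges side by side inside one class widen the accepted set.
        System.out.println(findAll("[a-zA-Z]", "a1B2"));
    }
}
```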
4. Union ( [ ][ ] ):~
You can also place character classes one after another to form more specific patterns. Each class in the sequence matches exactly one character at its position. For instance, consider the following regular expression:
Letter from 19[89][2-5]
With this pattern, any year whose third digit is 8 or 9 and whose final digit is between 2 and 5, inclusive, will be matched. Thus, these are potential matches for the previous regular expression pattern:
Letter from 1982
Letter from 1983
Letter from 1984
Letter from 1985
Letter from 1992
Letter from 1993
Letter from 1994
Letter from 1995
In every match, the third digit of the year is 8 or 9 and the final digit is between 2 and 5.
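To see exactly which years the year portion of this pattern accepts, one option is to test every candidate year in turn. The sketch below assumes Java's java.util.regex and checks only the year part, 19[89][2-5]; the class name YearPatternDemo and the helper matchingYears are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class YearPatternDemo {
    // Hypothetical helper: returns the years in [from, to] whose decimal
    // form is matched in full by regex.
    static List<Integer> matchingYears(String regex, int from, int to) {
        Pattern p = Pattern.compile(regex);
        List<Integer> years = new ArrayList<>();
        for (int year = from; year <= to; year++) {
            // matches() requires the whole string to fit the pattern.
            if (p.matcher(Integer.toString(year)).matches()) {
                years.add(year);
            }
        }
        return years;
    }

    public static void main(String[] args) {
        // Two choices for the third digit times four for the last digit.
        System.out.println(matchingYears("19[89][2-5]", 1900, 1999));
    }
}
```

Because each class contributes one position independently, the number of accepted years is the product of the class sizes: 2 x 4 = 8.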
Several shorthand forms for commonly used character classes are also available:
Sr. No   Symbol   Function
1.       \d       Any digit: [0-9]
2.       \D       Any non-digit: [^0-9]
3.       \w       Any word character (letter, digit, or underscore): [a-zA-Z0-9_]
4.       \W       Any non-word character: [^a-zA-Z0-9_]
5.       \s       Any whitespace character: [ \t\n\r\f]
6.       \S       Any non-whitespace character: [^ \t\n\r\f]
For example, \d can be used as follows:
Enter your regex: [\d]
Enter input string to search: 1
I found the text "1" starting at index 0 and ending at index 1.
And so on.
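The shorthand classes can be compared side by side by counting how many characters of one input each of them matches. This sketch assumes Java's java.util.regex in its default (non-Unicode) mode; ShorthandDemo and count are hypothetical names. In Java source code the backslash must be doubled inside a string literal, so \d is written as "\\d".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShorthandDemo {
    // Hypothetical helper: counts the matches of regex in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Six characters: a letter, a digit, a space, a letter,
        // an underscore, and a digit.
        String input = "a1 b_2";
        for (String regex : new String[] {"\\d", "\\D", "\\w", "\\W", "\\s", "\\S"}) {
            System.out.println(regex + " matches " + count(regex, input) + " character(s)");
        }
    }
}
```

Each uppercase shorthand is the complement of its lowercase counterpart, so the two counts in each pair add up to the length of the input.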