
Source: http://cflove.org/2012/10/simple-regular-expression-tutorial-for-coldfusion-developers.cfm

Let's start with the simplest. To match a letter or a word directly:
a - find a letter a
cake - find a word cake.
1234 - find 1234
- - find a hyphen
Match Unicode characters with \u followed by the character's hexadecimal code point:
\u0D85 - matches a letter from the Sinhala Unicode range
\u3042 - matches a letter from the Hiragana Unicode range
(Unicode is the standard for encoding characters from all writing systems.)
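The literal and \u matches above can be tried out with a short script. This is a minimal sketch that assumes Python's re module as a stand-in for the ColdFusion regex engine; the sample text and variable names are made up for illustration.

```python
import re

# Sample text containing a Sinhala letter (U+0D85) and a Hiragana letter (U+3042).
text = "a cake costs 1234 - \u0D85 \u3042"

# Plain letters, words, and digits match themselves.
word_hits = re.findall(r"cake", text)
number_hits = re.findall(r"1234", text)
hyphen_hits = re.findall(r"-", text)

# \uXXXX inside the pattern names one character by its hexadecimal code point.
sinhala_hits = re.findall(r"\u0D85", text)
hiragana_hits = re.findall(r"\u3042", text)

print(word_hits, number_hits, hyphen_hits, sinhala_hits, hiragana_hits)
```

Each list holds exactly one hit here, because each literal appears once in the sample text.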
Match digits in our document (0123456789):
\d - find all digits
Match non-digits, everything except 0123456789:
\D - find everything but digits
Match word characters:
\w - find all word characters
Word characters include letters, digits, and the underscore, but not symbols, punctuation, or whitespace. Can you guess the regex for "non-word characters"? It is the uppercase form of the same expression:
\W - find everything but word characters
Match whitespace. Whitespace includes space, tab, line feed, next line, etc.:
\s - find whitespace
Guess the regex for non-whitespace? It is:
\S - find everything but whitespace
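To see the six shorthand classes side by side, here is a small sketch using Python's re module (an assumption; the tutorial itself targets ColdFusion, whose engine may differ in details such as Unicode digits). The sample string is invented.

```python
import re

# One word with an underscore and a digit, a space, a word, a tab, a digit, a symbol.
sample = "Tea_4 two\t9!"

digits = re.findall(r"\d", sample)               # digit characters only
non_digits = "".join(re.findall(r"\D", sample))  # everything that is not a digit
word_chars = "".join(re.findall(r"\w", sample))  # letters, digits, underscore
non_word = re.findall(r"\W", sample)             # whitespace and punctuation
spaces = re.findall(r"\s", sample)               # the space and the tab
non_spaces = "".join(re.findall(r"\S", sample))  # everything but whitespace

print(digits, non_word, spaces)
```

Note how each uppercase class returns exactly the characters its lowercase partner leaves out.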
Repeat an expression {n} times. Match 4 digits:
\d{4} - find 4 digits
\d{4}-\d{4} - match 4 digits, a hyphen, and 4 digits immediately after it
Likewise, four word characters:
\w{4} - find 4 word characters
\w{4} \w{4} - match 4 word characters, a space, and 4 word characters immediately after it
Try matching a number of whitespace characters, non-word characters, and non-digits. We can use this syntax to repeat most regex expressions.
Let's match a minimum of 6 characters, but not more than 8:
\w{6,8} - find between 6 and 8 word characters
Match 6 or more characters:
\w{6,} - six or more word characters
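The repetition counts can be compared on one sample string. The sketch below assumes Python's re module and an invented sample text; it is meant only to show how {n}, {n,m}, and {n,} differ.

```python
import re

text = "pin 1234-5678, password, regularity"

four_digits = re.findall(r"\d{4}", text)          # every run of exactly 4 digits
digit_pair = re.findall(r"\d{4}-\d{4}", text)     # 4 digits, hyphen, 4 digits
two_words = re.findall(r"\w{4} \w{4}", "cake milk")

# {6,8} stops after 8 characters, so a 10-letter word is cut short.
six_to_eight = re.findall(r"\w{6,8}", text)
# {6,} has no upper limit, so the whole word is taken.
six_or_more = re.findall(r"\w{6,}", text)

print(six_to_eight)
print(six_or_more)
```

The short words and the 4-digit groups are skipped by the last two patterns because they have fewer than six word characters in a row.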
Match the beginning of a line with ^:
^\d{4} - 4 digits at the beginning of a line
^\D{4} - 4 non-digits at the beginning of a line
^\w{4} - 4 word characters at the beginning of a line
Match a whole line by matching both the beginning and the end:
^\w{8}$ - eight word characters bounded by the beginning and the end of a line
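A quick way to check the line anchors is to run them over a few lines of text. This sketch assumes Python's re module, where ^ and $ only match at every line (rather than just the start and end of the whole string) when the re.MULTILINE flag is passed; other engines, including ColdFusion's, may need a different switch. The sample lines are made up.

```python
import re

lines = "1234 cake\nmilk 5678\npassword"

# re.MULTILINE makes ^ and $ work per line instead of per string.
digit_starts = re.findall(r"^\d{4}", lines, re.MULTILINE)
non_digit_starts = re.findall(r"^\D{4}", lines, re.MULTILINE)
word_starts = re.findall(r"^\w{4}", lines, re.MULTILINE)

# Anchoring both ends matches only lines that are exactly 8 word characters.
whole_lines = re.findall(r"^\w{8}$", lines, re.MULTILINE)

print(digit_starts, non_digit_starts, word_starts, whole_lines)
```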
\b matches a word boundary. When it is in front, it matches the beginning of a word:
\b\w{4} - four word characters starting at the beginning of a word
When it is at the end, it matches the end of a word:
\w{4}\b - four word characters ending at the end of a word
Place it at both ends to match a whole word:
\b\w{4}\b - four word characters bounded by the beginning and the end of a word, meaning a four-character word
As always, the uppercase form of the same expression means the opposite:
\B\w{4} - four word characters that do not start at the beginning of a word
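The four boundary patterns behave quite differently on the same text. The following sketch assumes Python's re module and an invented three-word sample; it lists what each pattern picks out.

```python
import re

text = "cake cakes pancake"

# \b in front: the first four characters of every word.
at_word_start = re.findall(r"\b\w{4}", text)
# \b at the end: the last four characters of every word.
at_word_end = re.findall(r"\w{4}\b", text)
# \b on both sides: only words that are exactly four characters long.
whole_word = re.findall(r"\b\w{4}\b", text)
# \B in front: four characters that start somewhere inside a word.
inside_word = re.findall(r"\B\w{4}", text)

print(at_word_start, at_word_end, whole_word, inside_word)
```

Only "cake" survives the pattern with \b on both sides, since "cakes" and "pancake" are longer than four characters.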
. (period) matches any single character (in most engines, any character except a line break). Two of them match two characters:
r.d - letter 'r', followed by any character, followed by letter 'd'
c..e - letter 'c', followed by any two characters, followed by letter 'e'
The pipe | creates "OR" conditions:
cake|1234 - find 'cake' or '1234'
We can use parentheses to group "OR" conditions:
1234-(5678|milk) - find 1234 followed by a hyphen, followed by 5678 or milk
We can repeat groups too:
(cous){2} - find 'cous' two times
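The period, pipe, and grouping examples can be run together. This is a sketch assuming Python's re module; all_matches is a hypothetical helper (not part of any library) that returns the full text of every match, and the sample sentence is invented.

```python
import re

def all_matches(pattern, text):
    """Hypothetical helper: return the full matched text for every hit."""
    return [m.group(0) for m in re.finditer(pattern, text)]

text = "red rod r2d cake cure 1234-5678 1234-milk couscous"

any_middle = all_matches(r"r.d", text)             # 'r', any one character, 'd'
two_middle = all_matches(r"c..e", text)            # 'c', any two characters, 'e'
either = all_matches(r"cake|1234", text)           # 'cake' OR '1234'
grouped = all_matches(r"1234-(5678|milk)", text)   # OR limited to the group
repeated = all_matches(r"(cous){2}", text)         # the group twice in a row

print(any_middle, two_middle, either, grouped, repeated)
```

The helper uses finditer rather than findall because findall returns only the group contents when a pattern contains parentheses.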
? (a question mark) makes the preceding expression optional, so it matches once or zero times:
(1234-)(milk)? - must find '1234' and a hyphen; if 'milk' follows, match it too
(1234-)?(milk) - match '1234' and a hyphen if they appear immediately before 'milk'; must find 'milk'
A related idea is the "lazy" search. Placing ? after another quantifier (as in +? or *?) makes that quantifier lazy. Lazy searches are happy with the smallest match and give the next expression its chance, but "greedy" searches keep matching everything they can before letting the next expression proceed. The "lazy" kid eats a single ice cream, but the "greedy" kid eats everything.
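The optional group and the lazy-versus-greedy contrast are easier to see on real strings. The sketch below assumes Python's re module; all_matches is a hypothetical helper, the sample strings are invented, and the +? form is added here purely as an illustration of laziness.

```python
import re

def all_matches(pattern, text):
    """Hypothetical helper: return the full matched text for every hit."""
    return [m.group(0) for m in re.finditer(pattern, text)]

text = "1234-milk 1234- milk"

# 'milk' is optional: the second hit is just '1234-'.
optional_suffix = all_matches(r"(1234-)(milk)?", text)
# '1234-' is optional: the second hit is just 'milk'.
optional_prefix = all_matches(r"(1234-)?(milk)", text)

quoted = 'say "a" and "b"'
# Greedy + runs to the last quote in the text.
greedy = all_matches(r'".+"', quoted)
# Lazy +? stops at the first quote it can.
lazy = all_matches(r'".+?"', quoted)

print(optional_suffix, optional_prefix)
print(greedy, lazy)
```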
Here is a very important feature of regular expressions. We can refer back to the match found by a group by calling the position of that group:
(\w{3} )(cream )\1 - (find any three word characters and a space) ('cream' and a space) (find the match of the first group again)
Keep in mind, we are not referring to the group's pattern and we are not asking for the group to repeat; we are asking to find the text matched by that group again. We can call any number of groups:
(\w{3} )(cream )\1\2 - (find any three word characters and a space) ('cream' and a space) (find the match of the first group again) (find the match of the second group again)
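A back-reference repeats the matched text, not the pattern, and a short test makes that visible. This sketch assumes Python's re module and invented sample strings.

```python
import re

# \1 must repeat exactly what group 1 matched ('ice ').
first_again = re.search(r"(\w{3} )(cream )\1", "ice cream ice cream ")
# \1\2 repeats the text of both groups in order.
both_again = re.search(r"(\w{3} )(cream )\1\2", "ice cream ice cream ")

# 'hot ' fits the pattern \w{3} followed by a space, but it is not the same text as 'ice ',
# so the back-reference fails and there is no match.
different_text = re.search(r"(\w{3} )(cream )\1", "ice cream hot ")

print(first_again.group(0))
print(both_again.group(0))
print(different_text)
```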
How this works:
When each group satisfies a match, it keeps a record. We can recall that record by its position. Recording this history consumes some resources, so if there is no need to recall a group's match, we can ask the engine not to record it by placing ?: at the beginning of the group, which can improve performance:
(?:\w{4}) ( cous)\1 - (find any four word characters, but do not record them for future reference) a space (' cous') (find the match of the first recorded group, which is the second group here because we asked not to record the first)
Since we asked for the first group not to be captured, the second group takes the first position in the history records.
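The renumbering caused by ?: can be inspected directly. The sketch below assumes Python's re module and uses a slightly simplified pattern (without the extra space between the groups) on an invented sample string, so treat it as an illustration of the numbering rule rather than the exact example above.

```python
import re

text = "cous cous cous"

# With ?: the first group is not recorded, so ( cous) becomes group 1.
non_capturing = re.search(r"(?:\w{4})( cous)\1", text)

# Without ?: both groups are recorded, so ( cous) is group 2.
capturing = re.search(r"(\w{4})( cous)\2", text)

print(non_capturing.groups())
print(capturing.groups())
```

Both patterns match the same text; only the recorded groups, and therefore the back-reference number, differ.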
Match any one character from a list of characters:
[abc] - find letter 'a', 'b', or 'c'
[ABC] - find 'A', 'B', or 'C'. This is the same, but uppercase; regular expressions are case sensitive unless you ask them not to be.
['"] - find a single quote or a double quote
r[abcde]d - find letter 'r', then any one of 'a', 'b', 'c', 'd', or 'e', followed by letter 'd'
[a-z] - any lowercase character from a to z
[A-Z] - any uppercase character from A to Z
[A-Z0-9] - any uppercase character from A to Z or digit from 0 to 9
You can use multiple bracket expressions too:
m[d-j][d-l][ukt] - letter 'm', then any letter between 'd' and 'j', then any letter between 'd' and 'l', then any one of 'u', 'k', or 't'
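The bracket expressions above can be checked with a few one-liners. This is a sketch assuming Python's re module, where case-insensitive matching is requested with the re.IGNORECASE flag (other engines use their own switch); the sample strings are made up.

```python
import re

one_of = re.findall(r"r[abcde]d", "red rad rod rid")     # 'o' and 'i' are not listed
quotes = re.findall(r"['\"]", "it's \"ok\"")             # single or double quote
upper_or_digit = re.findall(r"[A-Z0-9]", "Cake 42")      # a range plus a range
chained = re.findall(r"m[d-j][d-l][ukt]", "milk melt mast")

# Classes are case sensitive unless the engine is told otherwise.
case_sensitive = re.findall(r"[abc]", "ABC")
case_insensitive = re.findall(r"[abc]", "ABC", re.IGNORECASE)

print(one_of, quotes, upper_or_digit, chained)
print(case_sensitive, case_insensitive)
```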
+ makes the preceding expression repeat greedily, one or more times, until it cannot find any more matches:
C[^\d]+e - [starts with "C"] [one or more characters that are not digits] [ending with "e"]
+ is a greedy match: it keeps eating all possible matches before giving the next expression a chance. Lazy matches stop as soon as they find a match and give the next expression a chance. The next example stays greedy but still stops at the closing quote, because [^"] cannot match it:
["][^"]+["] - [double quote] [one or more characters that are not double quotes] [ending with a double quote]
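Greedy + is easiest to understand by watching where a match ends. The sketch below assumes Python's re module and invented sample strings.

```python
import re

# [^\d]+ eats every non-digit up to the '2', then the engine backs up
# to the last 'e' it passed, so the first hit spans two words.
greedy_run = re.findall(r"C[^\d]+e", "Chocolate cake 2 Cheese")

# [^"]+ cannot cross a double quote, so each quoted part is matched separately.
quoted_parts = re.findall(r'["][^"]+["]', 'say "hello" and "bye"')

print(greedy_run)
print(quoted_parts)
```

The digit acts as a wall for the first pattern, and the double quote does the same for the second.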
* makes the preceding expression repeat greedily, zero or more times, until it cannot find any more matches, so it also makes the preceding expression optional:
re[123]*d - letters 're', then any run of the digits 1, 2, and 3 if one can be found (if not, that is OK), then letter 'd' at the end
When [123] matches nothing, * still succeeds and lets the next expression proceed, which is what makes [123] optional.
C[^\d]*e - [starts with "C"] [any number of characters that are not digits] [ending with "e"]
The non-digit match is repeated as far as it can go because * greedily matches everything it can, just like +; the engine then backs up to the last "e" it passed.
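The difference between * and + shows up when the repeated part is missing. This sketch assumes Python's re module and invented sample strings.

```python
import re

text = "red re1d re123d reXd"

# * accepts zero repeats, so plain 'red' matches too.
zero_or_more = re.findall(r"re[123]*d", text)
# + needs at least one of 1, 2, 3, so 'red' is skipped.
one_or_more = re.findall(r"re[123]+d", text)

# * is greedy: it runs up to the digit, then backs up to the last 'e'.
greedy_star = re.findall(r"C[^\d]*e", "Ce Cake C4e")

print(zero_or_more, one_or_more, greedy_star)
```

In the last line "C4e" is never matched, because the digit sits between "C" and "e" and [^\d] refuses it.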
?= is a positive look-ahead.
We used ^ to match the beginning of a line, $ to match the end of a line, and \b for a word boundary. Positive look-ahead helps you create boundaries of your own, and the boundary starts at the position of the look-ahead.
Match the word "ice" in "ice cream", but not in "ice coffee". We can create a boundary for "ice cream" and search for "ice":
(?=ice cream)ice - find the word 'ice' only in places where it starts 'ice cream'
(?=cake)c - find the letter 'c' only in places where it starts 'cake'
(?=cola)co - find the letters 'co' only in places where they start 'cola'
We can use this to define an end position also:
ice(?= coffee) - find the word 'ice' where it is followed by a space and 'coffee'
nothing(?= is) - find the word 'nothing' where it is followed by a space and 'is'
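Because a look-ahead checks text without consuming it, the match itself stays short. The sketch below assumes Python's re module; starts is a hypothetical helper that reports where each match begins, and the sample sentence is invented.

```python
import re

def starts(pattern, text):
    """Hypothetical helper: return the start index of every match."""
    return [m.start() for m in re.finditer(pattern, text)]

text = "ice cream, ice coffee"

# Look-ahead in front: only the 'ice' that begins 'ice cream' (index 0).
before = starts(r"(?=ice cream)ice", text)
# Look-ahead behind the word: only the 'ice' followed by ' coffee' (index 11).
after = starts(r"ice(?= coffee)", text)

# The look-ahead text is checked but not included in the match.
matched_text = re.search(r"ice(?= coffee)", text).group(0)

print(before, after, matched_text)
```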
?! is a negative look-ahead. This creates a boundary to avoid, and the boundary starts at the position of the look-ahead:
(?!nothing is)nothing - find the word 'nothing', but not in places where it starts 'nothing is'
It matches the word "nothing" as long as it is not in "nothing is". The same works at the end:
ice(?! coffee) - find the word 'ice', but not where it is followed by a space and 'coffee'
When combined with other expressions, this is a very powerful feature of regular expressions:
ice(?! \w{5} ) - match 'ice' not followed by a space, five word characters, and a space
1234(?!-\d{4}) - match 1234 not followed by a hyphen and four digits
\d{4}(?!-\d{4}) - match four digits not followed by a hyphen and four digits
\W{2} - match two non-word characters
Let's build that into:
\d{4}(?=\W{2}) - match four digits followed by two non-word characters
.*day - greedily match any characters up to and including the letters 'day'
Let's build that into:
say(?!.*day) - find the letters 'say' where 'day' does not appear later on the same line
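The negative look-ahead examples can be verified the same way. This is a sketch assuming Python's re module; starts is a hypothetical helper and the sample strings are invented.

```python
import re

def starts(pattern, text):
    """Hypothetical helper: return the start index of every match."""
    return [m.start() for m in re.finditer(pattern, text)]

# Only the 'ice' that is NOT followed by ' coffee' (index 0).
not_coffee = starts(r"ice(?! coffee)", "ice cream, ice coffee")

codes = "1234-5678 1234-milk"
# The first 1234 is followed by '-5678', so only the second one (index 10) matches.
not_before_digits = starts(r"1234(?!-\d{4})", codes)
# Any four digits not followed by '-dddd': '5678' and the second '1234'.
any_four = re.findall(r"\d{4}(?!-\d{4})", codes)

# Positive look-ahead for comparison: four digits followed by two non-word characters.
before_symbols = re.findall(r"\d{4}(?=\W{2})", "1234!! 5678")

# 'say' is rejected when 'day' appears later on the same line.
with_day = re.search(r"say(?!.*day)", "say it today")
without_day = re.search(r"say(?!.*day)", "say it now")

print(not_coffee, not_before_digits, any_four, before_symbols)
```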
We can use the escape character, a backslash, to search for the literal value of a regular expression. We use \w to match word characters, but when you want to search for a literal \w in your document, you can escape it, just like in JavaScript:
\\w - literally match '\w'
[a-z] matches characters between "a" and "z". But what if you want to match "a" or "-" or "z"?
[a\-z] - match 'a', a hyphen, or 'z'
^ matches the beginning of a line, but if you want to search for ^ in your document, you can escape it in your regular expression:
\^ - literally match the ^ character
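Escaping can be confirmed with a few literal searches. The sketch below assumes Python's re module and uses raw strings (the r prefix) so the backslashes reach the regex engine unchanged; re.escape is Python's own helper for escaping a whole string and is shown only as an aside, since other languages provide their own equivalent or none at all.

```python
import re

# \\w matches a literal backslash followed by the letter 'w'.
literal_w = re.findall(r"\\w", r"use \w here")
# Inside brackets, \- is a plain hyphen instead of a range marker.
a_hyphen_z = re.findall(r"[a\-z]", "a-z b")
# \^ is a plain caret instead of the start-of-line anchor.
caret = re.findall(r"\^", "2^8")

# re.escape adds the backslashes for us, so '+' is treated literally.
escaped_hit = re.search(re.escape("1+1"), "1+1=2")

print(literal_w, a_hyphen_z, caret, escaped_hit.group(0))
```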
