There are two parts to using Regular Expressions: learning the Regex syntax, and learning how to work with Regex in your programming language. This article introduces you to the Regular Expression syntax. Once you have learned the syntax you can use it in many different languages, as the syntax is fairly similar between languages. Microsoft's .NET Framework contains a set of classes for working with Regular Expressions in the System.Text.RegularExpressions namespace.
Backslash and an uppercase 'W' (\W) will match any non-word character.
Matching white-space
White-space can be matched using \s (backslash and 's'). The following Regular Expression matches the letter 'a' followed by two word characters then a white-space character.
Text: "abc anaconda ant"
Regex: a\w\w\s
Matches: "abc "
Note that "ant" was not matched as it is not followed by a white-space character. White-space is defined as the space character, new line (\n), form feed (\f), carriage return (\r), tab (\t) and vertical tab (\v). Be careful using \s as it can lead to unexpected behaviour by matching line breaks (\n and \r). Sometimes it is better to explicitly specify the characters to match instead of using \s, e.g. to match tab and space use [\t\x20].
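The white-space example can be tried out with a short C# sketch along the following lines. The class name WhitespaceDemo and the helper FindAll are illustrative names, not part of .NET, and the tab-or-space set is written here as [\t\x20].

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Returns the text of every match of pattern in text.
    public static string[] FindAll(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // 'a', two word characters, then one white-space character.
        foreach (string s in FindAll("abc anaconda ant", @"a\w\w\s"))
        {
            Console.WriteLine("[" + s + "]");   // prints "[abc ]"
        }

        // \s also matches a line break, so "ant\n" is a match here.
        Console.WriteLine(FindAll("ant\nabc", @"a\w\w\s").Length);        // 1

        // An explicit tab-or-space set does not match the line break.
        Console.WriteLine(FindAll("ant\nabc", @"a\w\w[\t\x20]").Length);  // 0
    }
}
```

The last two calls show why an explicit character set can be the safer choice when line breaks should not count as white-space.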
Matching digits
The digits zero to nine can be matched using \d (backslash and lowercase 'd'). For example, the following Regular Expression matches any three digits in a row.
Text: 123 12 843 8472
Regex: \d\d\d
Matches: "123" "843" "847"
Note that "12" is not matched because it has only two digits, and only the first three digits of "8472" are matched.
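A minimal C# sketch of the digit example follows. DigitDemo and ThreeDigitRuns are hypothetical names chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitDemo
{
    // Returns every run of exactly three consecutive digits found by \d\d\d.
    public static string[] ThreeDigitRuns(string text)
    {
        return Regex.Matches(text, @"\d\d\d")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Matches do not overlap: after "847" is consumed, the single
        // remaining digit "2" is too short to match again.
        Console.WriteLine(string.Join(",", ThreeDigitRuns("123 12 843 8472")));
        // prints "123,843,847"
    }
}
```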
The caret (^) can be added to the start of the set of characters to specify that none of the characters in the character set should be matched. The following Regular Expression matches any three characters where the first character is not 'd' and not 'a'.
Text: abc def ant cow
Regex: [^da]..
Matches: "bc " "ef " "nt " "cow"
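The negated set can be checked with a sketch like the one below. NegatedSetDemo and FindAll are illustrative names only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NegatedSetDemo
{
    // Returns the text of every match of pattern in text.
    public static string[] FindAll(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // [^da] accepts any character except 'd' and 'a' (including a space),
        // and each dot then accepts any character except a new line.
        string[] found = FindAll("abc def ant cow", "[^da]..");
        Console.WriteLine(string.Join("|", found));   // prints "bc |ef |nt |cow"
    }
}
```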
Ranges of characters can also be combined. The following Regular Expression matches any of the characters from 'a' to 'z' or any digit from '0' to '9', followed by two word characters.
Text: abc no 0aa i8i
Regex: [a-z0-9]\w\w
Matches: "abc" "0aa" "i8i"
The pattern could be written more simply as [a-z\d]\w\w.
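The following sketch runs both spellings of the range pattern over the sample text to show that they find the same matches for this input. RangeDemo and FindAll are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RangeDemo
{
    // Returns the text of every match of pattern in text.
    public static string[] FindAll(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string text = "abc no 0aa i8i";

        // Two ranges combined in one set, then two word characters.
        Console.WriteLine(string.Join(",", FindAll(text, @"[a-z0-9]\w\w")));  // abc,0aa,i8i

        // The same set written with the \d shorthand.
        Console.WriteLine(string.Join(",", FindAll(text, @"[a-z\d]\w\w")));   // abc,0aa,i8i
    }
}
```

"no" is skipped because it is followed by a space rather than a third word character.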
\t        Matches a tab
\r        Matches a carriage return
\n        Matches a new line
\uNNNN    Matches a Unicode character using hexadecimal representation. Exactly four digits must be specified (e.g. \u0020 for a space).
In this example, the Regular Expression pattern matches one or more word characters followed by a carriage return then a new line.
Text: "an anaconda ate" followed by a line break (\r\n), then "Anna Jones"
Regex: \w+\r\n
Match: "ate" together with the line break that follows it
Depending on your operating system you might have to combine the \r and \n character escapes to create the correct new line sequence for your platform. For Microsoft Windows systems you should generally use \r\n, which is a carriage return then a line feed (CRLF). To simply match the end of a line or string use the dollar sign ($).
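As a rough illustration of the line-break example, the sketch below matches the CRLF sequence explicitly and then uses the dollar sign instead. LineBreakDemo and FindAll are illustrative names. The second pattern relies on RegexOptions.Multiline so that $ applies to every line, and on an optional \r, because in .NET the dollar sign matches just before \n and so leaves the carriage return of a CRLF pair unconsumed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineBreakDemo
{
    // Returns the text of every match of pattern in text under the given options.
    public static string[] FindAll(string text, string pattern, RegexOptions options)
    {
        return Regex.Matches(text, pattern, options)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string text = "an anaconda ate\r\nAnna Jones";

        // Word characters followed by CRLF: the match value includes the line break.
        string[] crlf = FindAll(text, @"\w+\r\n", RegexOptions.None);
        Console.WriteLine(crlf.Length);         // 1
        Console.WriteLine(crlf[0].TrimEnd());   // ate

        // End-of-line matching with $: finds the last word on each line.
        string[] ends = FindAll(text, @"\w+\r?$", RegexOptions.Multiline);
        Console.WriteLine(ends.Length);         // 2
    }
}
```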
Match Grouping
Groups perform a few different functions. They allow the quantifiers (such as plus and star) to be applied to sections of the match instead of just individual characters. A group is specified by the round brackets ( and ). If you want to match the round bracket characters themselves you must use the escape character before the bracket, e.g. \( or \).
This regex matches 'http://' optionally followed by 'www.', then starts a group and matches one or more of any character that is not a full stop/period (.), closes the group, then matches '.com'.
Text: http://www.yahoo.com/index.html and http://yahoo.com
Regex: http://(www\.)?([^\.]+)\.com
Matches: "http://www.yahoo.com" "http://yahoo.com"
The question mark after the group (www\.) applies to the whole group, making it optional.
An example in C#
The regular expression classes are in the System.Text.RegularExpressions namespace.

    using System.Text.RegularExpressions;

The Regex class represents a regular expression. A regular expression pattern must be specified when creating a Regex object. The pattern cannot be changed afterwards.

    Regex exp = new Regex(
        @"http://(www\.)?([^\.]+)\.com",
        RegexOptions.IgnoreCase);
    string InputText = "http://www.yahoo.com/";

The MatchCollection class stores a list of successful matches found by applying the regular expression pattern to an input string.

    MatchCollection MatchList = exp.Matches(InputText);
    Match FirstMatch = MatchList[0];
    Console.WriteLine(FirstMatch.Value);

The Group class represents a group within the regex pattern. Each Match object has a Groups collection. Index 0 of that collection holds the whole match, so the loop starts at 1.

    Group GroupCurrent;
    for (int i = 1; i < FirstMatch.Groups.Count; i++)
    {
        GroupCurrent = FirstMatch.Groups[i];
        // The Success property on the group can be used to check
        // if the Group matched or not.
        if (GroupCurrent.Success)
        {
            Console.WriteLine("\tMatched:" + GroupCurrent.Value);
        }
        else
        {
            Console.WriteLine("\tGroup didn't match");
        }
    }

Groups within a Match can be referenced by number or by name (see below).

    if (MatchList.Count > 0)
    {
        // Groups[1] is the first group of the first match.
        if (MatchList[0].Groups[1].Success)
        {
            Console.WriteLine("Group 1 matched");
        }
    }

Groups also allow sections of the match to be used in replacement expressions when using Regex.Replace().
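As an illustration of using a group in a replacement expression, the sketch below reuses the pattern from this example and refers to the second group as $2 in the replacement string. The class name ReplaceDemo, the method name ToSecureOrg and the replacement text are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Rewrites each matched address, keeping only the text captured by group 2.
    public static string ToSecureOrg(string input)
    {
        Regex exp = new Regex(
            @"http://(www\.)?([^\.]+)\.com",
            RegexOptions.IgnoreCase);

        // $2 in the replacement string inserts whatever group 2 captured.
        return exp.Replace(input, "https://$2.org");
    }

    public static void Main()
    {
        // Only the matched part is replaced; the trailing "/" is left alone.
        Console.WriteLine(ToSecureOrg("http://www.yahoo.com/"));   // https://yahoo.org/
    }
}
```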
Named Groups
Groups can be named to allow easier identification with the following syntax.
(?<NameOfGroup>expression)
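A short sketch of named groups in C# follows, using the earlier pattern with names added. The group names "prefix" and "domain" and the class NamedGroupDemo are illustrative choices, not names from the original example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // The same pattern as before, with both groups given a name.
    private static readonly Regex UrlPattern = new Regex(
        @"http://(?<prefix>www\.)?(?<domain>[^\.]+)\.com",
        RegexOptions.IgnoreCase);

    // Returns the text captured by the "domain" group, or an empty string if nothing matched.
    public static string DomainOf(string input)
    {
        Match m = UrlPattern.Match(input);
        return m.Success ? m.Groups["domain"].Value : string.Empty;
    }

    // Reports whether the optional "prefix" group took part in the match.
    public static bool HasPrefix(string input)
    {
        return UrlPattern.Match(input).Groups["prefix"].Success;
    }

    public static void Main()
    {
        Console.WriteLine(DomainOf("http://www.yahoo.com/"));   // yahoo
        Console.WriteLine(HasPrefix("http://www.yahoo.com/"));  // True
        Console.WriteLine(HasPrefix("http://yahoo.com"));       // False
    }
}
```

Looking a group up by name keeps the code readable if groups are later added to or removed from the pattern, since the numbers of the remaining groups may shift.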
RegexOptions.Multiline - Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string.
RegexOptions.Singleline - Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).
RegexOptions.ExplicitCapture - Specifies that the only valid captures are groups that are explicitly named or numbered, in the form (?<name>...).
RegexOptions.IgnorePatternWhitespace - Eliminates unescaped white-space from the pattern and enables comments marked with the hash sign (#).
RegexOptions.Compiled - Specifies that the regular expression is compiled to an assembly. The regular expression will be faster to match but it takes more time to compile initially. This option (although tempting) should only be used when the expression will be used many times, e.g. inside a loop.
RegexOptions.ECMAScript - Enables ECMAScript-compliant behavior for the expression. This flag can be used only in conjunction with the IgnoreCase, Multiline, and Compiled flags. The use of this flag with any other flags results in an exception.
RegexOptions.RightToLeft - Specifies that the search will be from right to left instead of from left to right.
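The effect of a few of these options can be seen in a small sketch such as the one below. OptionsDemo, CountWordLines and DotMatches are hypothetical names, and the sample strings are invented for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // Counts the lines that consist only of word characters.
    public static int CountWordLines(string text, RegexOptions options)
    {
        return Regex.Matches(text, @"^\w+$", options).Count;
    }

    // Tests whether 'a', any one character, then 'b' occurs in the text.
    public static bool DotMatches(string text, RegexOptions options)
    {
        return Regex.IsMatch(text, "a.b", options);
    }

    public static void Main()
    {
        // Without Multiline, ^ and $ only apply to the string as a whole.
        Console.WriteLine(CountWordLines("one\ntwo", RegexOptions.None));       // 0
        Console.WriteLine(CountWordLines("one\ntwo", RegexOptions.Multiline));  // 2

        // Without Singleline, the dot refuses to match the new line.
        Console.WriteLine(DotMatches("a\nb", RegexOptions.None));        // False
        Console.WriteLine(DotMatches("a\nb", RegexOptions.Singleline));  // True

        // Options are flags, so they can be combined with the | operator.
        Regex commented = new Regex(@"
            HTTP://         # scheme, matched in any case
            (www\.)?        # optional prefix
            ([^\.]+)\.com   # domain name
            ", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
        Console.WriteLine(commented.IsMatch("http://yahoo.com"));        // True
    }
}
```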