Regular Expressions
A simple and powerful way to
match characters
Laurent Falquet, EPFL March, 2005
Swiss Institute of Bioinformatics
Swiss EMBnet node
Regular Expressions
What is a regular expression?
Literal (or normal characters)
Alphanumeric
abcABC0123...
Punctuation
-_ ,.;:=()/+ *%&{}[]?!$^|\<>"@#
Metacharacters
Ex: ls *.java
Flavors
awk, egrep, Emacs, grep, Perl,
POSIX, Tcl, PROSITE !
Pattern: <A-x-[ST](2)-x(0,1)-{V}
Perl Regexp: ^A.[ST]{2}.?[^V]
Text: The sequence must start with an alanine,
followed by any amino acid, followed by a serine
or a threonine, two times, followed by any amino
acid or nothing, followed by any amino acid
except a valine.
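The translation from the PROSITE pattern to the Perl regexp can be sketched in code. This is only an illustration: `prosite_to_perl` is a hypothetical helper that handles just the elements used in the pattern above, and the test sequences are made up. It emits `.{0,1}`, which is equivalent to the `.?` shown on the slide.

```perl
use strict;
use warnings;

# Hypothetical helper: converts a small subset of PROSITE syntax
# into a Perl regexp. Not a complete PROSITE parser.
sub prosite_to_perl {
    my ($p) = @_;
    $p =~ s/\{([A-Z]+)\}/[^$1]/g;        # {V}   -> [^V]   (forbidden residues)
    $p =~ s/\((\d+(?:,\d+)?)\)/{$1}/g;   # (2)   -> {2},   (0,1) -> {0,1}
    $p =~ s/x/./g;                       # x     -> .      (any amino acid)
    $p =~ s/-//g;                        # drop the element separators
    $p =~ s/\A</^/;                      # <     -> ^      (start of sequence)
    $p =~ s/>\z/\$/;                     # >     -> $      (end of sequence)
    return $p;
}

my $regexp = prosite_to_perl('<A-x-[ST](2)-x(0,1)-{V}');
print "$regexp\n";

# Made-up test sequences.
for my $seq (qw(ALSTGK ALSSK ALSTV GALSTK)) {
    printf "%-7s %s\n", $seq, ($seq =~ /$regexp/ ? 'match' : 'no match');
}
```

The first two sequences match; ALSTV fails because the only residue left after the serine/threonine pair is a valine, and GALSTK fails because it does not start with an alanine.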
Only the syntax differs
In Perl: //
^ start, $ end
[] or (|)
Match 0, 1 or more
. 1 of any
? 0 or 1
+ 1 or more
* 0 or more
{m,n} range
! negation (as in !~)
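The elements listed above can be tried out with a small table of patterns. This is a minimal sketch: `matches` is a hypothetical helper and all the strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $string matches $pattern, 0 otherwise.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ /$pattern/ ? 1 : 0;
}

# Each row: pattern, test string.
my @rows = (
    [ '^ab',      'abc'  ],   # ^ anchors at the start
    [ 'ab$',      'cab'  ],   # $ anchors at the end
    [ 'gr[ae]y',  'grey' ],   # [] one character out of a set
    [ 'gr(a|e)y', 'gray' ],   # (|) alternation
    [ 'a.c',      'ac'   ],   # . needs exactly one character
    [ 'ab?c',     'ac'   ],   # ? zero or one b
    [ 'ab+c',     'ac'   ],   # + at least one b
    [ 'ab*c',     'ac'   ],   # * zero or more b
    [ 'ab{2,3}c', 'abbc' ],   # {m,n} between 2 and 3 b
);

for my $row (@rows) {
    my ($pattern, $string) = @$row;
    printf "%-10s %-6s %s\n", $pattern, $string,
        matches($pattern, $string) ? 'match' : 'no match';
}

# Negation: !~ is true when the pattern does NOT match.
print "no digit found\n" if 'abc' !~ /\d/;
```

Only `a.c` and `ab+c` fail against `ac`, since both require at least one character between the `a` and the `c`.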
Examples
Match every instance of a
SwissProt AC
m/[OPQ][0-9][A-Z0-9]{3}[0-9]/;
m/[OPQ]\d[A-Z0-9]{3}\d/;
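To actually collect every accession number from a piece of text, the pattern can be combined with the /g modifier in list context. The sketch below uses a made-up input line, and the `\b` word boundaries are an addition that is not part of the pattern above.

```perl
use strict;
use warnings;

# Made-up input line; the accession numbers are only illustrative.
my $text = 'Hits: P12345, Q9Y6K9 and O00161; X12345 is not a valid AC.';

# In list context, m//g returns every match at once.
# The \b word boundaries are an assumption added here, so that a match
# cannot start or end in the middle of a longer word.
my @acs = $text =~ m/\b[OPQ]\d[A-Z0-9]{3}\d\b/g;

print join(', ', @acs), "\n";
```

X12345 is skipped because its first letter is not O, P or Q.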
Shorthand
\d digit [0-9]
\s whitespace [space\f\n\r\t]
\w word character [a-zA-Z0-9_]
\D\S\W complement of \d\s\w
Byte notation
\char or \num
Match operator
m//
$var =~ m/colou?r/;
$var !~ m/colou?r/;
Substitution operator
s///
$var =~ s/colou?r/couleur/;
Translate operator
tr///
$revcomp =~ tr/ACGT/tgca/;
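The three operators can be compared side by side in a short script. This is a sketch with hypothetical helper names (`has_colour`, `to_french`, `revcomp`). Note that tr/// on its own only complements the bases, so the sketch adds a `reverse` to obtain the reverse complement, and it keeps upper case where the slide maps to lower case.

```perl
use strict;
use warnings;

# Match operator: true if the pattern occurs in the string.
sub has_colour {
    my ($s) = @_;
    return $s =~ m/colou?r/ ? 1 : 0;
}

# Substitution operator: replaces the first occurrence.
sub to_french {
    my ($s) = @_;
    $s =~ s/colou?r/couleur/;
    return $s;
}

# Translate operator: maps each base to its complement, character by
# character; reverse then turns the complement into the reverse complement.
sub revcomp {
    my ($seq) = @_;
    (my $comp = $seq) =~ tr/ACGT/TGCA/;
    return scalar reverse $comp;
}

print has_colour('my color'), "\n";
print to_french('the colour red'), "\n";
print revcomp('AACG'), "\n";
```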
Modifiers //#
/i case insensitive
/g global match
Many others: /s, /m, /o, /x, ...
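The effect of the modifiers is easiest to see on a small example. The sequence below is made up, and the /x example reuses the SwissProt AC pattern with `^` and `$` anchors added for the illustration.

```perl
use strict;
use warnings;

my $seq = 'atgCATGcatg';   # made-up sequence

# /i ignores case, /g finds every match; "() =" forces list context
# so that the assignment yields the number of matches.
my $count = () = $seq =~ /atg/gi;

# Without /g, s/// replaces only the first occurrence; with /g, all of them.
(my $first_only = $seq) =~ s/atg/---/i;
(my $all        = $seq) =~ s/atg/---/gi;

# /x allows whitespace and comments inside the pattern.
my $is_ac = 'P12345' =~ / ^ [OPQ] \d      # first letter, then a digit
                          [A-Z0-9]{3}     # three alphanumerics
                          \d $            # final digit
                        /x ? 1 : 0;

print "$count\n$first_only\n$all\n$is_ac\n";
```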
Grouping
External reference
Exercises
Internal reference
Numbering
$1 to $9
$10 and beyond if needed...
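A short sketch of grouping and the two kinds of reference: `\1` refers back to a group inside the pattern itself, while `$1`, `$2`, ... are read after the match. The helper names and sample strings are hypothetical.

```perl
use strict;
use warnings;

# Internal reference: \1 inside the pattern repeats whatever group 1 matched.
# Here it finds the first doubled character of a word.
sub doubled_letter {
    my ($word) = @_;
    return $word =~ /(\w)\1/ ? $1 : undef;
}

# External reference: $1, $2, ... are available after a successful match.
# Groups are numbered by their opening parenthesis, from left to right.
sub split_id {
    my ($id) = @_;   # e.g. "sw:ASC_HUMAN"
    return $id =~ /^(\w+):(([A-Z0-9]+)_([A-Z0-9]+))$/ ? ($1, $2, $3, $4) : ();
}

print doubled_letter('address'), "\n";
print join(' | ', split_id('sw:ASC_HUMAN')), "\n";
```

In `split_id`, group 2 is the outer pair of parentheses, so it holds the whole entry name, while groups 3 and 4 hold its two halves.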
On sib-dea:
use visual_regexp-1.2.tcl to check
your regular expressions
(requires X-windows)
Solution RegExp
/(\d{1,3}\.){3}\d{1,3}/
/\w+\.\w+\@\w+\-?\w+\.[a-z]{2,4}/
s/\<(\/?)address\>/\<$1pre\>/
generalized:
address = \w+
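Patterns of this kind can be checked quickly in a script. The sketch below assumes the first solution targets dotted numeric addresses and the second e-mail addresses. It writes the repeated "digits plus dot" unit as a parenthesized group, writes the tag rewrite as an explicit s/// with /g added, and uses made-up sample strings.

```perl
use strict;
use warnings;

# Assumed intent: dotted numeric address and e-mail address.
my $ip_re    = qr/(\d{1,3}\.){3}\d{1,3}/;
my $email_re = qr/\w+\.\w+\@\w+\-?\w+\.[a-z]{2,4}/;

# Rewrites <address> and </address> tags as <pre> and </pre>.
sub address_to_pre {
    my ($html) = @_;
    # ${1} keeps the group number visually apart from the literal "pre".
    $html =~ s/<(\/?)address>/<${1}pre>/g;
    return $html;
}

print "ip ok\n"    if '192.168.0.1' =~ $ip_re;
print "email ok\n" if 'first.last@example.org' =~ $email_re;
print address_to_pre('<address>x</address>'), "\n";
```

The numeric pattern only checks the shape of the address: it also accepts values such as 999.999.999.999.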
Perl In-liners
Example:
In-liners: -n and -p
is equivalent to:
is equivalent to:
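As a rough sketch of what the two switches do: -n wraps the code in a loop over the input lines, and -p does the same but also prints each line after the code has run. The functions below (`like_n`, `like_p`) are hypothetical stand-ins that read from a string instead of a file, with made-up input in the style of database identifiers.

```perl
use strict;
use warnings;

# Made-up input.
my $input = "sw:ASC_HUMAN\nsw:RIK2_MOUSE\nsw:CARC_HUMAN\n";

# Roughly what  perl -ne 'print if /HUMAN/'  does:
# loop over the lines, output only when the code says so.
sub like_n {
    my ($text) = @_;
    my $out = '';
    open my $fh, '<', \$text or die "cannot read string: $!";
    local $_;
    while (<$fh>) {
        $out .= $_ if /HUMAN/;
    }
    close $fh;
    return $out;
}

# Roughly what  perl -pe 's/^sw://'  does:
# same loop, but every line is output after the code has run.
sub like_p {
    my ($text) = @_;
    my $out = '';
    open my $fh, '<', \$text or die "cannot read string: $!";
    local $_;
    while (<$fh>) {
        s/^sw://;
    } continue {
        $out .= $_;
    }
    close $fh;
    return $out;
}

print like_n($input);
print like_p($input);
```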
-1   5  prf:CARD    90     1  sw:ICEA_XENLA
-11  5  prf:CARD   513   435  sw:RIK2_MOUSE
-1   6  prf:CARD    88     1  sw:CARC_HUMAN
-1   7  prf:CARD  1463  1380  sw:NAL1_HUMAN
-2   7  prf:CARD   195   113  sw:ASC_HUMAN
-1   8  prf:CARD   430   347  sw:CAR8_HUMAN
-1   9  prf:CARD   218   134  sw:CARF_HUMAN
In-liners: examples
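my %hash;
for my $i (0 .. 99) {
    my $nb = int(rand(100));   # random integer between 0 and 99
    $hash{$nb} = 1;            # hash keys keep each value only once
}
# print the distinct values in numeric order, separated by spaces
print join(" ", sort { $a <=> $b } keys %hash), "\n";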