
public final class Pattern
extends Object
implements Serializable
A compiled representation of a regular expression.
A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.
A typical invocation sequence is thus

    Pattern p = Pattern.compile("a*b");
    Matcher m = p.matcher("aaaaab");
    boolean b = m.matches();
A matches method is defined by this class as a convenience for when a regular expression is used just once. This method compiles an expression and matches an input sequence against it in a single invocation. The statement

    boolean b = Pattern.matches("a*b", "aaaaab");

is equivalent to the three statements above, though for repeated matches it is less efficient since it does not allow the compiled pattern to be reused.
Instances of this class are immutable and are safe for use by multiple concurrent threads. Instances of the Matcher class are not safe for such use.
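The reuse advice above can be sketched as follows. This is a minimal illustration rather than part of the original documentation; the class name PatternReuseSketch and the method matchesWhole are hypothetical. The expression is compiled once into a shared constant, and every call creates its own Matcher because matchers carry per-match state.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternReuseSketch {
    // Compiled once. Pattern instances are immutable and may be shared across threads.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Each call creates a fresh Matcher, since matchers are not safe for concurrent use.
    static boolean matchesWhole(CharSequence input) {
        Matcher m = A_STAR_B.matcher(input);
        return m.matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("aaaaab"));
        System.out.println(matchesWhole("aabc"));
        // One-shot convenience form: compiles the expression on every call.
        System.out.println(Pattern.matches("a*b", "aaaaab"));
    }
}
```

For a pattern used only once, the one-shot Pattern.matches call is simpler; the shared constant pays off when the same expression is applied many times.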

Summary of regular-expression constructs

Construct             Matches

Characters
x                     The character x
\\                    The backslash character
\0n                   The character with octal value 0n (0 <= n <= 7)
\0nn                  The character with octal value 0nn (0 <= n <= 7)
\0mnn                 The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh                  The character with hexadecimal value 0xhh
\uhhhh                The character with hexadecimal value 0xhhhh
\x{h...h}             The character with hexadecimal value 0xh...h (Character.MIN_CODE_POINT <= 0xh...h <= Character.MAX_CODE_POINT)
\t                    The tab character ('\u0009')
\n                    The newline (line feed) character ('\u000A')
\r                    The carriage-return character ('\u000D')
\f                    The form-feed character ('\u000C')
\a                    The alert (bell) character ('\u0007')
\e                    The escape character ('\u001B')
\cx                   The control character corresponding to x

Character classes
[abc]                 a, b, or c (simple class)
[^abc]                Any character except a, b, or c (negation)
[a-zA-Z]              a through z or A through Z, inclusive (range)
[a-d[m-p]]            a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]          d, e, or f (intersection)
[a-z&&[^bc]]          a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]         a through z, and not m through p: [a-lq-z] (subtraction)
Predefined character classes
.                     Any character (may or may not match line terminators)
\d                    A digit: [0-9]
\D                    A non-digit: [^0-9]
\s                    A whitespace character: [ \t\n\x0B\f\r]
\S                    A non-whitespace character: [^\s]
\w                    A word character: [a-zA-Z_0-9]
\W                    A non-word character: [^\w]

POSIX character classes (US-ASCII only)
\p{Lower}             A lower-case alphabetic character: [a-z]
\p{Upper}             An upper-case alphabetic character: [A-Z]
\p{ASCII}             All ASCII: [\x00-\x7F]
\p{Alpha}             An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}             A decimal digit: [0-9]
\p{Alnum}             An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}             Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}             A visible character: [\p{Alnum}\p{Punct}]
\p{Print}             A printable character: [\p{Graph}\x20]
\p{Blank}             A space or a tab: [ \t]
\p{Cntrl}             A control character: [\x00-\x1F\x7F]
\p{XDigit}            A hexadecimal digit: [0-9a-fA-F]
\p{Space}             A whitespace character: [ \t\n\x0B\f\r]

java.lang.Character classes (simple java character type)
\p{javaLowerCase}     Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase}     Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace}    Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored}      Equivalent to java.lang.Character.isMirrored()

Classes for Unicode scripts, blocks, categories and binary properties
\p{IsLatin}           A Latin script character (script)
\p{InGreek}           A character in the Greek block (block)
\p{Lu}                An uppercase letter (category)
\p{IsAlphabetic}      An alphabetic character (binary property)
\p{Sc}                A currency symbol
\P{InGreek}           Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]]    Any letter except an uppercase letter (subtraction)

Boundary matchers
^                     The beginning of a line
$                     The end of a line
\b                    A word boundary
\B                    A non-word boundary
\A                    The beginning of the input
\G                    The end of the previous match
\Z                    The end of the input but for the final terminator, if any
\z                    The end of the input
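The difference between the input-boundary matchers is easiest to see on an input that ends with a line terminator. The following is a small sketch under that assumption; BoundarySketch and its found helper are hypothetical names introduced only for illustration.

```java
import java.util.regex.Pattern;

final class BoundarySketch {
    // Returns true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "abc\n"; // ends with a line terminator
        // \Z matches before the final terminator, so this search succeeds.
        System.out.println(found("abc\\Z", input));
        // \z matches only at the very end of the input, so this search fails.
        System.out.println(found("abc\\z", input));
        // \A anchors the search at the beginning of the input.
        System.out.println(found("\\Aabc", input));
        // \b requires a word boundary on both sides of "cat".
        System.out.println(found("\\bcat\\b", "concatenate"));
    }
}
```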

Greedy quantifiers
X?                    X, once or not at all
X*                    X, zero or more times
X+                    X, one or more times
X{n}                  X, exactly n times
X{n,}                 X, at least n times
X{n,m}                X, at least n but not more than m times

Reluctant quantifiers
X??                   X, once or not at all
X*?                   X, zero or more times
X+?                   X, one or more times
X{n}?                 X, exactly n times
X{n,}?                X, at least n times
X{n,m}?               X, at least n but not more than m times

Possessive quantifiers
X?+                   X, once or not at all
X*+                   X, zero or more times
X++                   X, one or more times
X{n}+                 X, exactly n times
X{n,}+                X, at least n times
X{n,m}+               X, at least n but not more than m times
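The three quantifier families describe the same repetition counts and differ only in how much input they try to consume and whether they give it back. A brief sketch of that difference follows; the class QuantifierSketch, the helper firstMatch and the sample input are illustrative choices, not part of the original text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuantifierSketch {
    // Returns the first substring matched by regex, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backs off just enough for the last "foo".
        System.out.println(firstMatch(".*foo", input));
        // Reluctant: .*? takes as little as possible, stopping at the first "foo".
        System.out.println(firstMatch(".*?foo", input));
        // Possessive: .*+ takes everything and never gives it back, so "foo" cannot match.
        System.out.println(firstMatch(".*+foo", input));
    }
}
```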

Logical operators
XY                    X followed by Y
X|Y                   Either X or Y
(X)                   X, as a capturing group

Back references
\n                    Whatever the nth capturing group matched
\k<name>              Whatever the named-capturing group "name" matched

Quotation
\                     Nothing, but quotes the following character
\Q                    Nothing, but quotes all characters until \E
\E                    Nothing, but ends quoting started by \Q
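A short sketch of back references and quotation, assuming a doubled-word search and a literal arithmetic string as examples. The class BackrefQuoteSketch and its helpers found and whole are hypothetical names.

```java
import java.util.regex.Pattern;

final class BackrefQuoteSketch {
    // True if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True if the regex matches the entire input.
    static boolean whole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Numbered back reference: \1 must repeat whatever group 1 captured.
        System.out.println(found("\\b(\\w+) \\1\\b", "it is is fine"));
        // Named back reference: \k<word> refers to the group named "word".
        System.out.println(found("\\b(?<word>\\w+) \\k<word>\\b", "it is is fine"));
        // Quotation: between \Q and \E the plus sign is an ordinary character.
        System.out.println(whole("\\Q1+1=2\\E", "1+1=2"));
        // Unquoted, the plus sign is a quantifier, so the same text no longer matches.
        System.out.println(whole("1+1=2", "1+1=2"));
    }
}
```

When the quoted text comes from user input, Pattern.quote(String) builds the quoted form and also handles input that itself contains \E.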

Special constructs (named-capturing and non-capturing)
(?<name>X)            X, as a named-capturing group
(?:X)                 X, as a non-capturing group
(?idmsuxU-idmsuxU)    Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X)    X, as a non-capturing group with the given flags i d m s u x on - off
(?=X)                 X, via zero-width positive lookahead
(?!X)                 X, via zero-width negative lookahead
(?<=X)                X, via zero-width positive lookbehind
(?<!X)                X, via zero-width negative lookbehind
(?>X)                 X, as an independent, non-capturing group
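A few of the special constructs in action, as a hedged sketch. SpecialConstructSketch, firstMatch and groupCount are hypothetical names, and the sample inputs are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpecialConstructSketch {
    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Number of capturing groups the expression declares.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Positive lookahead: digits that are followed by "px", without consuming "px".
        System.out.println(firstMatch("\\d+(?=px)", "width: 12px"));
        // Positive lookbehind: digits that are preceded by a dollar sign.
        System.out.println(firstMatch("(?<=\\$)\\d+", "cost $45"));
        // Negative lookahead: "foo" not followed by "bar".
        System.out.println(firstMatch("foo(?!bar)\\w*", "foobar foobaz"));
        // Embedded flag: (?i) turns on case-insensitive matching.
        System.out.println(firstMatch("(?i)hello", "Say HeLLo"));
        // Non-capturing group: only (c) counts towards the group total.
        System.out.println(groupCount("(?:ab)+(c)"));
        // Independent group: (?>a+) keeps every "a" it took, so the trailing "ab" cannot match.
        System.out.println(firstMatch("(?>a+)ab", "aaab"));
    }
}
```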

Backslashes, escapes, and quoting
The backslash character ('\') serves to introduce escaped constructs, as defined in the table above, as well as to quote characters that otherwise would be interpreted as unescaped constructs. Thus the expression \\ matches a single backslash and \{ matches a left brace.
It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.
Backslashes within string literals in Java source code are interpreted as required by The Java Language Specification as either Unicode escapes (section 3.3) or other character escapes (section 3.10.6). It is therefore necessary to double backslashes in string literals that represent regular expressions to protect them from interpretation by the Java bytecode compiler. The string literal "\b", for example, matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
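The doubling rule can be checked with a few string literals. This is a sketch only; BackslashSketch and found are hypothetical names, and the sample inputs are made up.

```java
import java.util.regex.Pattern;

final class BackslashSketch {
    // True if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // The literal "\\b" reaches the regex parser as \b, a word boundary.
        System.out.println(found("\\bcat\\b", "a cat sat"));
        // The literal "\b" is a single backspace character, matched literally.
        System.out.println(found("\b", "back\bspace"));
        // The literal "\\(hello\\)" reaches the parser as \(hello\) and matches the text (hello).
        System.out.println(found("\\(hello\\)", "say (hello)"));
        // Four backslashes in source are two in the regex, which match one in the input.
        System.out.println(found("\\\\", "C:\\temp"));
    }
}
```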

Character Classes
Character classes may appear within other character classes, and may be composed by the union operator (implicit) and the intersection operator (&&). The union operator denotes a class that contains every character that is in at least one of its operand classes. The intersection operator denotes a class that contains every character that is in both of its operand classes.
The precedence of character-class operators is as follows, from highest to lowest:

1    Literal escape    \x
2    Grouping          [...]
3    Range             a-z
4    Union             [a-e][i-u]
5    Intersection      [a-z&&[aeiou]]

Note that a different set of metacharacters is in effect inside a character class than outside a character class. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range-forming metacharacter.
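Union, intersection and subtraction can be tried one character at a time. The sketch below assumes a hypothetical helper is(regex, c) in a hypothetical class CharClassSketch; the expressions themselves come from the summary table.

```java
import java.util.regex.Pattern;

final class CharClassSketch {
    // True if the single character c is matched by the character class regex.
    static boolean is(String regex, char c) {
        return Pattern.matches(regex, String.valueOf(c));
    }

    public static void main(String[] args) {
        // Union: a through d, or m through p.
        System.out.println(is("[a-d[m-p]]", 'n'));
        // Intersection: only d, e, or f.
        System.out.println(is("[a-z&&[def]]", 'e'));
        // Subtraction: a through z, and not m through p.
        System.out.println(is("[a-z&&[^m-p]]", 'n'));
        // Inside a class the dot is an ordinary character.
        System.out.println(is("[.]", 'x'));
        // An escaped hyphen is literal instead of forming a range.
        System.out.println(is("[a\\-c]", 'b'));
    }
}
```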

Line terminators
A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:
  A newline (line feed) character ('\n'),
  A carriage-return character followed immediately by a newline character ("\r\n"),
  A standalone carriage-return character ('\r'),
  A next-line character ('\u0085'),
  A line-separator character ('\u2028'), or
  A paragraph-separator character ('\u2029').
If UNIX_LINES mode is activated, then the only line terminators recognized are newline characters.
The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.
By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input. When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence.
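The effect of the MULTILINE, DOTALL and UNIX_LINES flags can be sketched as below. The class LineModeSketch and both helper methods are hypothetical, and the sample text is invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineModeSketch {
    // Counts how many times a word is found at a position where ^ matches.
    static int countLineStarts(String text, int flags) {
        Matcher m = Pattern.compile("^\\w+", flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True if .* can span the entire text under the given flags.
    static boolean dotSpansAll(String text, int flags) {
        return Pattern.compile(".*", flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "one\ntwo\r\nthree";
        // Default: ^ matches only at the beginning of the whole input.
        System.out.println(countLineStarts(text, 0));
        // MULTILINE: ^ also matches after each line terminator.
        System.out.println(countLineStarts(text, Pattern.MULTILINE));
        // Default: the dot stops at a line terminator.
        System.out.println(dotSpansAll("a\nb", 0));
        // DOTALL: the dot matches line terminators as well.
        System.out.println(dotSpansAll("a\nb", Pattern.DOTALL));
        // UNIX_LINES: a carriage return is no longer a line terminator.
        System.out.println(dotSpansAll("a\rb", Pattern.UNIX_LINES));
    }
}
```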

Groups and capturing
Group number
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:

1    ((A)(B(C)))
2    (A)
3    (B(C))
4    (C)

Group zero always stands for the entire expression.
Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression, via a back reference, and may also be retrieved from the matcher once the match operation is complete.

Group name
A capturing group can also be assigned a "name", making it a named-capturing group, and can then be back-referenced later by the "name". Group names are composed of the following characters. The first character must be a letter.
  The uppercase letters 'A' through 'Z' ('\u0041' through '\u005a'),
  The lowercase letters 'a' through 'z' ('\u0061' through '\u007a'),
  The digits '0' through '9' ('\u0030' through '\u0039').
A named-capturing group is still numbered as described in Group number.
The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.
Groups beginning with (? are either pure, non-capturing groups that do not capture text and do not count towards the group total, or named-capturing groups.
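The numbering and naming rules can be sketched with the expressions used in this section. GroupSketch and its methods are hypothetical names, and the date expression with the groups year and month is an invented example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupSketch {
    // Returns groups 0..4 of ((A)(B(C))) for a matching input, or an empty array.
    static String[] numberedGroups(String input) {
        Matcher m = Pattern.compile("((A)(B(C)))").matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i); // group 0 is the entire match
        }
        return groups;
    }

    // Reads a named group. The same group is also reachable as group 1.
    static String yearOf(String input) {
        Matcher m = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})").matcher(input);
        return m.matches() ? m.group("year") : null;
    }

    // Group two keeps the value from its most recent successful evaluation.
    static String innerGroupAfterRepeat() {
        Matcher m = Pattern.compile("(a(b)?)+").matcher("aba");
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", numberedGroups("ABC")));
        System.out.println(yearOf("2006-08"));
        System.out.println(innerGroupAfterRepeat());
    }
}
```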

Unicode support
This class is in conformance with Level 1 of Unicode Technical Standard #18: Unicode Regular Expression, plus RL2.1 Canonical Equivalents.
Unicode escape sequences such as \u2014 in Java source code are processed as described in section 3.3 of The Java Language Specification. Such escape sequences are also implemented directly by the regular-expression parser so that Unicode escapes can be used in expressions that are read from files or from the keyboard. Thus the strings "\u2014" and "\\u2014", while not equal, compile into the same pattern, which matches the character with hexadecimal value 0x2014.
A Unicode character can also be represented in a regular expression by using its hex notation (hexadecimal code point value) directly as described in construct \x{...}; for example, a supplementary character U+2011F can be specified as \x{2011F}, instead of two consecutive Unicode escape sequences of the surrogate pair \uD840\uDD1F.
Unicode scripts, blocks, categories and binary properties are written with the \p and \P constructs as in Perl. \p{prop} matches if the input has the property prop, while \P{prop} does not match if the input has that property.
Scripts, blocks, categories and binary properties can be used both inside and outside of a character class.
Scripts are specified either with the prefix Is, as in IsHiragana, or by using the script keyword (or its short form sc) as in script=Hiragana or sc=Hiragana.

The script names supported by Pattern are the valid script names accepted and defined by UnicodeScript.forName.
Blocks are specified with the prefix In, as in InMongolian, or by using the keyword block (or its short form blk) as in block=Mongolian or blk=Mongolian.
The block names supported by Pattern are the valid block names accepted and defined by UnicodeBlock.forName.
Categories may be specified with the optional prefix Is: Both \p{L} and \p{IsL} denote the category of Unicode letters. Same as scripts and blocks, categories can also be specified by using the keyword general_category (or its short form gc) as in general_category=Lu or gc=Lu.
The supported categories are those of The Unicode Standard in the version specified by the Character class. The category names are those defined in the Standard, both normative and informative.
Binary properties are specified with the prefix Is, as in IsAlphabetic. The binary properties supported by Pattern are
  Alphabetic
  Ideographic
  Letter
  Lowercase
  Uppercase
  Titlecase
  Punctuation
  Control
  White_Space
  Digit
  Hex_Digit
  Noncharacter_Code_Point
  Assigned
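The property syntaxes described above can be exercised with single-character inputs. This is an illustrative sketch; UnicodePropertySketch and its helper is are hypothetical names, and the sample characters (Greek alpha, Hiragana A, Mongolian letter A) were chosen for this example.

```java
import java.util.regex.Pattern;

final class UnicodePropertySketch {
    // True if the whole string s is matched by regex.
    static boolean is(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        String alpha = "\u03B1";      // GREEK SMALL LETTER ALPHA
        String hiraganaA = "\u3042";  // HIRAGANA LETTER A
        String mongolianA = "\u1820"; // MONGOLIAN LETTER A
        String supplementary = new String(Character.toChars(0x2011F));

        System.out.println(is("\\p{InGreek}", alpha));              // block, prefix form
        System.out.println(is("\\p{script=Hiragana}", hiraganaA));  // script, keyword form
        System.out.println(is("\\p{sc=Hiragana}", hiraganaA));      // script, short keyword
        System.out.println(is("\\p{blk=Mongolian}", mongolianA));   // block, short keyword
        System.out.println(is("\\p{gc=Lu}", "A"));                  // general category
        System.out.println(is("\\p{IsAlphabetic}", "a"));           // binary property
        System.out.println(is("\\P{InGreek}", "a"));                // negation
        System.out.println(is("[\\p{L}&&[^\\p{Lu}]]", "a"));        // inside a character class
        System.out.println(is("\\x{2011F}", supplementary));        // code point by hex notation
    }
}
```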
Predefined character classes and POSIX character classes are in conformance with the recommendation of Annex C: Compatibility Properties of Unicode Regular Expression, when the UNICODE_CHARACTER_CLASS flag is specified.

Classes               Matches
\p{Lower}             A lowercase character: \p{IsLowercase}
\p{Upper}             An uppercase character: \p{IsUppercase}
\p{ASCII}             All ASCII: [\x00-\x7F]
\p{Alpha}             An alphabetic character: \p{IsAlphabetic}
\p{Digit}             A decimal digit character: \p{IsDigit}
\p{Alnum}             An alphanumeric character: [\p{IsAlphabetic}\p{IsDigit}]
\p{Punct}             A punctuation character: \p{IsPunctuation}
\p{Graph}             A visible character: [^\p{IsWhite_Space}\p{gc=Cc}\p{gc=Cs}\p{gc=Cn}]
\p{Print}             A printable character: [\p{Graph}\p{Blank}&&[^\p{Cntrl}]]
\p{Blank}             A space or a tab: [\p{IsWhite_Space}&&[^\p{gc=Zl}\p{gc=Zp}\x0a\x0b\x0c\x0d\x85]]
\p{Cntrl}             A control character: \p{gc=Cc}
\p{XDigit}            A hexadecimal digit: [\p{gc=Nd}\p{IsHex_Digit}]
\p{Space}             A whitespace character: \p{IsWhite_Space}
\d                    A digit: \p{IsDigit}
\D                    A non-digit: [^\d]
\s                    A whitespace character: \p{IsWhite_Space}
\S                    A non-whitespace character: [^\s]
\w                    A word character: [\p{Alpha}\p{gc=Mn}\p{gc=Me}\p{gc=Mc}\p{Digit}\p{gc=Pc}]
\W                    A non-word character: [^\w]

Categories that behave like the java.lang.Character boolean ismethodname methods (except for the deprecated ones) are available through the same \p{prop} syntax where the specified property has the name javamethodname.
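The effect of UNICODE_CHARACTER_CLASS on the predefined classes can be seen with one non-ASCII letter and one non-ASCII digit. A minimal sketch, with the hypothetical class UnicodeCharacterClassSketch and sample characters picked for illustration:

```java
import java.util.regex.Pattern;

final class UnicodeCharacterClassSketch {
    // Matches the whole input against regex compiled with the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9";      // LATIN SMALL LETTER E WITH ACUTE
        String arabicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE
        int unicode = Pattern.UNICODE_CHARACTER_CLASS;

        // Without the flag, \w and \d cover US-ASCII only.
        System.out.println(matches("\\w", 0, eAcute));
        System.out.println(matches("\\d", 0, arabicThree));
        // With the flag, they follow the Unicode property definitions in the table.
        System.out.println(matches("\\w", unicode, eAcute));
        System.out.println(matches("\\d", unicode, arabicThree));
        // The embedded flag (?U) enables the same mode from inside the expression.
        System.out.println(matches("(?U)\\w", 0, eAcute));
    }
}
```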

Comparison to Perl 5
The Pattern engine performs traditional NFA-based matching with ordered alternation as occurs in Perl 5.
Perl constructs not supported by this class:
  Predefined character classes (Unicode character)
    \h    A horizontal whitespace
    \H    A non horizontal whitespace
    \v    A vertical whitespace
    \V    A non vertical whitespace
    \R    Any Unicode linebreak sequence \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
    \X    Match Unicode extended grapheme cluster
  The backreference constructs, \g{n} for the nth capturing group and \g{name} for named-capturing group.
  The named character construct, \N{name} for a Unicode character by its name.
  The conditional constructs (?(condition)X) and (?(condition)X|Y),
  The embedded code constructs (?{code}) and (??{code}),
  The embedded comment syntax (?#comment), and
  The preprocessing operations \l \u, \L, and \U.
Constructs supported by this class but not by Perl:
  Character-class union and intersection as described above.
Notable differences from Perl:
  In Perl, \1 through \9 are always interpreted as back references; a backslash-escaped number greater than 9 is treated as a back reference if at least that many subexpressions exist, otherwise it is interpreted, if possible, as an octal escape. In this class octal escapes must always begin with a zero. In this class, \1 through \9 are always interpreted as back references, and a larger number is accepted as a back reference if at least that many subexpressions exist at that point in the regular expression, otherwise the parser will drop digits until the number is smaller or equal to the existing number of groups or it is one digit.
  Perl uses the g flag to request a match that resumes where the last match left off. This functionality is provided implicitly by the Matcher class: Repeated invocations of the find method will resume where the last match left off, unless the matcher is reset.
  In Perl, embedded flags at the top level of an expression affect the whole expression. In this class, embedded flags always take effect at the point at which they appear, whether they are at the top level or within a group; in the latter case, flags are restored at the end of the group just as in Perl.
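The three differences listed above can be sketched in Java as follows. The class PerlDifferenceSketch and its helpers findAll and whole are hypothetical names, and the sample expressions are illustrative assumptions rather than examples from the original text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PerlDifferenceSketch {
    // Collects every match; each find() call resumes where the previous match ended.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // True if the regex matches the entire input.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Repeated find() plays the role of Perl's g flag.
        System.out.println(findAll("\\d+", "a1b22c333"));
        // With one group, \11 is read as back reference \1 followed by a literal 1.
        System.out.println(whole("(a)\\11", "aa1"));
        // (?i) takes effect only from the point where it appears.
        System.out.println(whole("a(?i)b", "aB"));
        System.out.println(whole("a(?i)b", "AB"));
        // Inside a group, the flag is restored when the group ends.
        System.out.println(whole("a(?:(?i)b)c", "aBC"));
    }
}
```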
For a more precise description of the behavior of regular expression constructs, please see Mastering Regular Expressions, 3rd Edition, Jeffrey E. F. Friedl, O'Reilly and Associates, 2006.
