10/12/2017 python - Searching images files with regular expressions - Stack Overflow

Searching images files with regular expressions

I have a text file that looks like this:

[22/Nov/2011 12:57:58] "GET /media/js/jquery-1.4.3.min.js HTTP/1.1" 304 0


[22/Nov/2011 12:57:58] "GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/fancybox-y.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /ajax/pages/erlebnisse/ HTTP/1.1" 200 563
[22/Nov/2011 12:58:00] "GET /erlebnisse/alle-erlebnisse/ HTTP/1.1" 200 17114

I want to use regular expressions to get all the image files (.gif, .jpg, .png) that appear here. So the result from the text above should be:

['fancybox-x.png', 'fancybox-y.png', 'blank.gif']

What I did was:

re.findall('\w+\.(jpg|gif|png)', f.read())

So the pattern is:

1 or more word characters (\w+), followed by a dot (\.), and then 'jpg', 'gif' or 'png' (jpg|gif|png).

This actually works, but re.findall() treats the contents of the parentheses (which I'm using only for "grouping") as capture group 1 and returns just that group, so the result is:

['png', 'png', 'gif']

Which is right, but incomplete. In other words, I'm asking: how can I make re.findall() distinguish between parentheses used only for "grouping" and parentheses that define capture groups?

python regex

asked Nov 23 '11 at 0:52 by juliomalegria

3 Answers

You're looking for the non-capturing version of regular parentheses, (?:...). The description is
available in the re module docs.

s ='''[22/Nov/2011 12:57:58] "GET /media/js/jquery-1.4.3.min.js HTTP/1.1" 304 0


[22/Nov/2011 12:57:58] "GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/fancybox-y.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /ajax/pages/erlebnisse/ HTTP/1.1" 200 563
[22/Nov/2011 12:58:00] "GET /erlebnisse/alle-erlebnisse/ HTTP/1.1" 200 17114'''

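import re

# The outer parentheses capture the whole file name; (?:...) only groups
# the alternatives, so findall does not return the extension on its own.
for m in re.findall(r'([-\w]+\.(?:jpg|gif|png))', s):
    print(m)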
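To make the difference concrete, here is a minimal sketch that runs both variants side by side. The variable names and the two-line sample (copied from the log in the question) are only for illustration.

```python
import re

# Two sample lines taken from the log in the question.
log = (
    '[22/Nov/2011 12:57:58] "GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0\n'
    '[22/Nov/2011 12:57:59] "GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0\n'
)

# One capturing group: findall returns only that group's text.
capturing = re.findall(r'[-\w]+\.(jpg|gif|png)', log)

# Non-capturing group: the pattern has no groups, so findall returns
# the whole match.
non_capturing = re.findall(r'[-\w]+\.(?:jpg|gif|png)', log)

print(capturing)      # ['png', 'gif']
print(non_capturing)  # ['fancybox-x.png', 'blank.gif']
```

With no capturing group left in the pattern, the extra outer parentheses become optional; either form should give the full file names.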

answered Nov 23 '11 at 0:58 by Andrew Walker (edited Nov 23 '11 at 1:04)

yep! that was it, thank you @AndrewWalker! juliomalegria Nov 23 '11 at 1:08

I actually liked your answer more without the code... made me feel like a newbie :-( juliomalegria Nov 23 '11 at 1:09

You can just add another pair of parentheses around the whole file name and put ?: at the start of the inner pair:

re.findall('/([^/]+\.(?:jpg|gif|png))', f.read())

Note that \w won't match "-", so I would suggest [^/]+
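A quick sketch of how the three character classes behave on one line from the question's log. The variable names are made up for the example, and [-\w]+ is the class used in the other answer.

```python
import re

line = '"GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0'

# \w covers letters, digits and underscore only, so the match can only
# start after the hyphen.
word_only = re.findall(r'\w+\.(?:jpg|gif|png)', line)

# Adding "-" to the character class keeps hyphenated names intact.
with_hyphen = re.findall(r'[-\w]+\.(?:jpg|gif|png)', line)

# [^/]+ accepts any character except a slash; the leading "/" ties the
# match to the start of a path segment.
no_slash = re.findall(r'/([^/]+\.(?:jpg|gif|png))', line)

print(word_only)    # ['x.png']
print(with_hyphen)  # ['fancybox-x.png']
print(no_slash)     # ['fancybox-x.png']
```

Keep in mind that [^/] also matches spaces and quotes, so it is the more permissive choice; [-\w] is stricter but would miss file names containing other punctuation.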

answered Nov 23 '11 at 1:00 by Chen Xing

you're right about the "-". What's the [^/]+ for? same as [\w-]+ ? juliomalegria Nov 23 '11 at 1:13

@julio.alegria , [^/] will match all characters other than "/". Chen Xing Nov 23 '11 at 1:40

If you're looking for the entire match you should be able to find it in group 0, otherwise you can
add extra parentheses if you're looking for another part of the string.
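Group 0 is only reachable through match objects, which re.findall() does not return. One way to get at it, sketched here under the assumption that switching to re.finditer() is acceptable, is shown below; the sample lines come from the question's log and the [-\w]+ class is borrowed from the other answers.

```python
import re

log = (
    '[22/Nov/2011 12:57:59] "GET /media/js/fancybox/fancybox-y.png HTTP/1.1" 304 0\n'
    '[22/Nov/2011 12:57:59] "GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0\n'
)

# The capturing group is kept on purpose to show both group 0 and group 1.
pattern = re.compile(r'[-\w]+\.(jpg|gif|png)')

# finditer yields match objects: group(0) is the entire match,
# group(1) is the text of the first capturing group.
names = [m.group(0) for m in pattern.finditer(log)]
extensions = [m.group(1) for m in pattern.finditer(log)]

print(names)       # ['fancybox-y.png', 'blank.gif']
print(extensions)  # ['png', 'gif']
```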

answered Nov 23 '11 at 0:57 by Godwin

re.findall() returns a list, so there's no group 0 juliomalegria Nov 23 '11 at 1:00

if you have multiple groups in your re, findall returns a list of tuples, so you could do ["".join(groups) for groups in re.findall('(\w+\.)(gif|png|jpg)', my_data)] (note the new parentheses around \w+\.). The accepted answer is obviously a better solution here, but I can see this being potentially useful. ben author Nov 23 '11 at 1:14
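For reference, a small sketch of the tuple behaviour described in that last comment. The value of my_data is an assumed one-line sample taken from the question's log.

```python
import re

# Assumed sample input: one line from the log in the question.
my_data = '"GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0'

# Two capturing groups: findall returns one tuple per match.
pairs = re.findall(r'(\w+\.)(gif|png|jpg)', my_data)

# Joining each tuple rebuilds the file name.
names = ["".join(groups) for groups in pairs]

print(pairs)  # [('blank.', 'gif')]
print(names)  # ['blank.gif']
```

Because this pattern still uses \w+, hyphenated names such as fancybox-x.png would come back shortened.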

https://stackoverflow.com/questions/8236020/searching-images-files-with-regular-expressions