I want to use regular expressions to get all the image files (.gif, .jpg, .png) that appear here. So the result from the text above should be:
re.findall(r'\w+\.(jpg|gif|png)', f.read())
That is: 1 or more word characters (\w+), followed by a dot (\.), and then 'jpg', 'gif' or 'png' (jpg|gif|png).
This actually works, but it treats the content of the parentheses (which I'm using only for "grouping") as group(1), so the result is:
Which is right, but incomplete. In other words, I'm asking: how can I make re.findall() distinguish between parentheses used for "grouping" and
parentheses used to capture groups?
python regex
3 Answers
You're looking for the non-capturing version of regular parentheses, (?:...). The description is
available in the re module docs.
import re
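As a minimal sketch of the difference, assuming a made-up sample string (the question's actual input is not shown here), the capturing and non-capturing forms can be compared side by side. The names `sample`, `extensions` and `filenames` are illustrative only.

```python
import re

# Hypothetical sample text; the input from the question is not reproduced here.
sample = 'src="/img/logo.png" href="/files/photo.jpg" url(/static/anim.gif)'

# Capturing group: findall returns only what the parentheses captured.
extensions = re.findall(r'\w+\.(jpg|gif|png)', sample)
print(extensions)  # ['png', 'jpg', 'gif']

# Non-capturing group (?:...): no groups, so findall returns the whole match.
filenames = re.findall(r'\w+\.(?:jpg|gif|png)', sample)
print(filenames)  # ['logo.png', 'photo.jpg', 'anim.gif']
```

The only change between the two patterns is the `?:` right after the opening parenthesis.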
yep! that was it, thank you @AndrewWalker! juliomalegria Nov 23 '11 at 1:08
I actually liked more your answer without the code.. made me feel like a newbie :-( juliomalegria Nov 23 '11 at 1:09
https://stackoverflow.com/questions/8236020/searching-images-files-with-regular-expressions
10/12/2017 python - Searching images files with regular expressions - Stack Overflow
You can just add another pair of parentheses, and put ?: at the start of the inner pair:
re.findall(r'/([^/]+\.(?:jpg|gif|png))', f.read())
you're right about the "-". What's the [^/]+ for? same as [\w-]+ ? juliomalegria Nov 23 '11 at 1:13
@julio.alegria , [^/] will match all characters other than "/". Chen Xing Nov 23 '11 at 1:40
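To make the difference between \w+, [\w-]+ and [^/]+ concrete, here is a small sketch on a made-up string containing a hyphenated file name. The sample text and variable names are assumptions for illustration, not data from the question.

```python
import re

# Hypothetical sample; note the hyphen in the first file name.
sample = 'see /img/my-logo.png and /files/photo_1.jpg'

# \w does not match "-", so the first name is cut short.
with_word_chars = re.findall(r'\w+\.(?:jpg|gif|png)', sample)
print(with_word_chars)  # ['logo.png', 'photo_1.jpg']

# Adding "-" to the character class keeps the hyphenated name intact.
with_hyphen = re.findall(r'[\w-]+\.(?:jpg|gif|png)', sample)
print(with_hyphen)  # ['my-logo.png', 'photo_1.jpg']

# [^/]+ matches anything except "/"; the outer parentheses capture the
# file name without the leading slash, the inner (?:...) only groups.
with_non_slash = re.findall(r'/([^/]+\.(?:jpg|gif|png))', sample)
print(with_non_slash)  # ['my-logo.png', 'photo_1.jpg']
```

Note that [^/]+ is broader than [\w-]+: it also accepts spaces and punctuation, and it requires the name to be preceded by a slash.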
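If you're looking for the entire match, you should be able to find it in group 0 of each match object (for example, the ones
yielded by re.finditer(); re.findall() itself returns only the captured groups when the pattern has any). Otherwise, you can
add extra parentheses if you're looking for another part of the string.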
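One way to read group 0 is through match objects, for instance with re.finditer(). The following is only a sketch on a made-up string; `sample`, `whole_matches` and `extensions` are illustrative names.

```python
import re

# Hypothetical sample text.
sample = 'banner.gif, then icon.png'
pattern = re.compile(r'\w+\.(jpg|gif|png)')

# group(0) is the entire match, even though the pattern has a capturing group.
whole_matches = [m.group(0) for m in pattern.finditer(sample)]
print(whole_matches)  # ['banner.gif', 'icon.png']

# group(1) is what the parentheses captured.
extensions = [m.group(1) for m in pattern.finditer(sample)]
print(extensions)  # ['gif', 'png']
```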
1 if you have multiple groups in your re, findall returns a list of tuples, so you could do ["".join(groups) for
groups in re.findall('(\w+\.)(gif|png|jpg)', my_data)] -- note the new parenths around \w+\. --
accepted answer is obviously a better solution here but i can see this being potentially useful. ben author
Nov 23 '11 at 1:14
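The tuple-joining approach from the comment above can be sketched as follows, assuming a made-up value for `my_data`; the intermediate name `pairs` is added here only for illustration.

```python
import re

# Hypothetical stand-in for the data being searched.
my_data = 'banner.gif, then icon.png'

# With two capturing groups, findall returns one tuple per match.
pairs = re.findall(r'(\w+\.)(gif|png|jpg)', my_data)
print(pairs)  # [('banner.', 'gif'), ('icon.', 'png')]

# Joining each tuple rebuilds the full file name.
names = ["".join(groups) for groups in pairs]
print(names)  # ['banner.gif', 'icon.png']
```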