Regex that matches path, filename and extension
April 6, 2009
I was looking for a Python regular expression capable of matching a string containing a valid path, filename and extension. Finally, I arrived at the following solution:
^(.*/)?(?:$|(.+?)(?:(\.[^.]*$)|$))
Let me explain how I got this regular expression. Fortunately, Scott Carpenter has written an excellent article about a regular expression to match a filename with extension. Matching the filename extension is not trivial for all possible situations.
He proposes a very simple but elegant regular expression for recognizing a filename with an extension:
(.+?)(\.[^.]*$|$)
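To see how this pattern behaves in Python, here is a minimal sketch. The helper name split_name and the sample filenames are made up for illustration; they are not from Scott's article.

```python
import re

# Scott's pattern: a lazy name, followed by either a final ".ext" or just the end of the string.
SCOTT = re.compile(r"(.+?)(\.[^.]*$|$)")

def split_name(name):
    """Return the (name, extension) groups, or None if the pattern does not match."""
    m = SCOTT.match(name)
    return m.groups() if m else None

for sample in ["report.txt", "archive.tar.gz", "README"]:
    print(sample, "->", split_name(sample))
```

Only the last dot-separated part is treated as the extension, and a name without any dot yields an empty string in the second group.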
Following his advice, I worked out a more comprehensive regular expression that can also match filenames with a path.
First, I decided to add ^ at the beginning of the regular expression, to ensure that the string is matched from the beginning. However, depending on how you match the string, this might not be necessary: Python's re.match, for example, already anchors the match at the start of the string, while re.search does not.
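The effect of ^ can be sketched by comparing re.match and re.search. The input with a leading newline is a contrived, hypothetical example, chosen only because it is a case where the unanchored pattern cannot match at position 0.

```python
import re

UNANCHORED = r"(.+?)(\.[^.]*$|$)"
ANCHORED = "^" + UNANCHORED

# Contrived input: "." does not match a newline, so no match can start at position 0.
text = "\nnotes.txt"

match_unanchored = re.match(UNANCHORED, text)    # must start at position 0
search_unanchored = re.search(UNANCHORED, text)  # free to start later in the string
search_anchored = re.search(ANCHORED, text)      # ^ pins the match to position 0

print(match_unanchored)
print(search_unanchored.groups())
print(search_anchored)
```

With re.match, or with re.search plus ^, the string is rejected; re.search without ^ silently skips the leading newline and matches the rest.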
In Scott's regex, a capturing group is used to match the extension. That group may match nothing but the end of the line, which results in an empty string. Personally, I prefer getting None instead of an empty string to signal that the extension does not exist in the matched string.
I changed the pattern to wrap the two alternatives in non-capturing parentheses (?:…). The capturing parentheses are now used only to delimit the part that recognizes the extension, \.[^.]*$, as below:
^(.+?)(?:(\.[^.]*$)|$)
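The difference between the two variants can be checked with a short sketch. The names WITH_EMPTY, WITH_NONE and extension are illustrative choices, not part of either pattern's origin.

```python
import re

# Extension group contains both alternatives: it always participates in the match.
WITH_EMPTY = re.compile(r"^(.+?)(\.[^.]*$|$)")
# Extension group sits inside a non-capturing alternation: it may not participate at all.
WITH_NONE = re.compile(r"^(.+?)(?:(\.[^.]*$)|$)")

def extension(pattern, name):
    """Return the content of the extension group (group 2) for the given name."""
    m = pattern.match(name)
    return m.group(2) if m else None

print(repr(extension(WITH_EMPTY, "README")))
print(repr(extension(WITH_NONE, "README")))
print(repr(extension(WITH_NONE, "notes.txt")))
```

In Python, a group that did not take part in the match is reported as None, which is what makes this rewrite work.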
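Then, I added another group at the beginning of the pattern to match the path, assuming / as the path separator. The group is (.*/)?, which consumes all characters up to and including the last /. The group is optional, since the string might not contain any path information.
Since a string may also consist of a path only (ending in /), the filename part is wrapped in (?:$|…), so that the string may end right after the path. The resulting regular expression is: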
^(.*/)?(?:$|(.+?)(?:(\.[^.]*$)|$))
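A small sketch of the final pattern in use follows. The function name split_path and the sample strings are assumptions made for illustration.

```python
import re

PATH_RE = re.compile(r"^(.*/)?(?:$|(.+?)(?:(\.[^.]*$)|$))")

def split_path(text):
    """Return (path, filename, extension); parts that are absent come back as None."""
    m = PATH_RE.match(text)
    return m.groups() if m else None

samples = [
    "/home/user/file.txt",  # path, filename and extension
    "/home/user/",          # path only
    "file",                 # filename only
    "archive.tar.gz",       # only the last dot starts the extension
    ".bashrc",              # leading dot: treated as a filename without extension
]
for sample in samples:
    print(repr(sample), "->", split_path(sample))
```

Note that the pattern only splits the string; it does not check that the path exists or is valid on any particular file system.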