Regex that matches path, filename and extension

I was looking for a Python regular expression that matches a string containing a path, file name and extension, and captures each part separately. I eventually arrived at the following solution:

^(.*/)?(?:$|(.+?)(?:(\.[^.]*$)|$))

Let me explain how I got to this regular expression. Fortunately, Scott Carpenter has written an excellent article about a regular expression that matches a file name with its extension. Matching the extension correctly in every situation is not trivial: a file name may contain several dots, or none at all.

Scott proposes a simple but elegant regular expression that recognizes a file name with an extension:

(.+?)(\.[^.]*$|$)
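
To see what the two groups capture, here is a minimal sketch that applies Scott's pattern with Python's re module. The helper name split_name and the sample file names are illustrative assumptions, not part of Scott's article.

```python
import re

# Scott's pattern, copied verbatim from the text above.
SCOTT = re.compile(r"(.+?)(\.[^.]*$|$)")

def split_name(filename):
    """Return (stem, extension) for a bare file name, or None if nothing matches.

    The extension is an empty string when the name has no dot-suffix.
    """
    m = SCOTT.match(filename)
    return m.groups() if m else None

for name in ("notes.txt", "archive.tar.gz", "README"):
    print(name, "->", split_name(name))
# notes.txt -> ('notes', '.txt')
# archive.tar.gz -> ('archive.tar', '.gz')
# README -> ('README', '')
```

Only the last dot-suffix counts as the extension, because [^.]* cannot cross another dot before reaching the end of the string.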

Following his advice, I worked out a more comprehensive regular expression that can also match file names preceded by a path.

First, I added ^ at the beginning of the regular expression to ensure that the match starts at the beginning of the string. Depending on how you match the string, this might not be necessary: Python's re.match, for example, already anchors the match at the start.
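
The difference only shows up with re.search, which is allowed to start matching anywhere in the string. The following sketch uses a made-up input containing a newline (an assumption chosen purely to make the effect visible) and compares the anchored and unanchored forms of Scott's pattern.

```python
import re

# Hypothetical input: a stray first line followed by a file name.
text = "bad\nfile.txt"

unanchored = r"(.+?)(\.[^.]*$|$)"
anchored = r"^(.+?)(\.[^.]*$|$)"

# re.search may skip ahead, so the unanchored pattern finds "file.txt"
# after the newline.
found_by_search = re.search(unanchored, text)

# With ^ the match must start at position 0, where it cannot succeed.
search_anchored = re.search(anchored, text)

# re.match always starts at position 0, so ^ adds nothing here.
match_unanchored = re.match(unanchored, text)

print(found_by_search.groups())  # ('file', '.txt')
print(search_anchored)           # None
print(match_unanchored)          # None
```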

In Scott’s regex, a capturing group matches the extension. That group can also match just the end of the string, in which case it captures an empty string. Personally, I prefer getting None instead of an empty string to signal that the matched string has no extension.

I therefore wrapped the alternation in a non-capturing group (?:...) and kept a capturing group only around the pattern that recognizes the extension, \.[^.]*$, as below:

^(.+?)(?:(\.[^.]*$)|$)
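
A short sketch comparing the two variants side by side; the variable names and sample file names are illustrative assumptions. In Python, a capturing group that did not take part in the match is reported as None.

```python
import re

# Scott's original: the extension group itself can match the empty end of string.
with_empty = re.compile(r"^(.+?)(\.[^.]*$|$)")

# Modified: the extension group only participates when there is a real extension.
with_none = re.compile(r"^(.+?)(?:(\.[^.]*$)|$)")

for name in ("README", "notes.txt"):
    ext_a = with_empty.match(name).group(2)
    ext_b = with_none.match(name).group(2)
    print(name, repr(ext_a), repr(ext_b))
# README '' None
# notes.txt '.txt' '.txt'
```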

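Then I added another group at the beginning of the pattern to match the path, assuming / as the path separator. The group is (.*/)?, which consumes all characters up to and including the last /. The group is optional, since the string might not contain any path information. Because the string may also end right after the path (a directory such as /usr/local/, for example), the rest of the pattern is wrapped in (?:$|...) so that the file name becomes optional as well.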
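
The path group can be tried in isolation. This is a small sketch under the same / separator assumption; the helper name leading_path and the sample strings are hypothetical.

```python
import re

# Only the optional path group, anchored at the start of the string.
PATH_ONLY = re.compile(r"^(.*/)?")

def leading_path(s):
    """Return everything up to and including the last '/', or None if there is no '/'."""
    # The group is optional, so the match itself always succeeds.
    return PATH_ONLY.match(s).group(1)

print(leading_path("a/b/c.txt"))    # a/b/
print(leading_path("/usr/local/"))  # /usr/local/
print(leading_path("c.txt"))        # None
```

The greedy .* first grabs the whole string and then backtracks to the last /, which is why the group stops there rather than at the first separator.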

The resulting regular expression is:

^(.*/)?(?:$|(.+?)(?:(\.[^.]*$)|$))
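
As a usage sketch, the final pattern can be wrapped in a small helper that returns the three parts as a tuple. The name split_path and the sample inputs are assumptions made for illustration.

```python
import re

# The final pattern from the text above: path, file name, extension.
PATH_NAME_EXT = re.compile(r"^(.*/)?(?:$|(.+?)(?:(\.[^.]*$)|$))")

def split_path(s):
    """Return (path, name, extension); parts that are absent are None.

    Returns None when the pattern does not match at all, for example
    when the string contains a newline in the middle.
    """
    m = PATH_NAME_EXT.match(s)
    return m.groups() if m else None

samples = (
    "/usr/local/lib/libfoo.so",
    "dir.d/file",
    "/usr/local/",
    "notes.txt",
    ".bashrc",
)
for s in samples:
    print(repr(s), "->", split_path(s))
# '/usr/local/lib/libfoo.so' -> ('/usr/local/lib/', 'libfoo', '.so')
# 'dir.d/file' -> ('dir.d/', 'file', None)
# '/usr/local/' -> ('/usr/local/', None, None)
# 'notes.txt' -> (None, 'notes', '.txt')
# '.bashrc' -> (None, '.bashrc', None)
```

Note that a dot inside a directory name is not mistaken for an extension, and that a leading-dot name such as .bashrc is treated as a file name without extension, because .+? must consume at least one character before the extension group is tried.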

4 Responses to Regex that matches path, filename and extension

  1. ancore says:

    This is exactly what I was looking for. Thank you!

  2. Marirs says:

    ([\w\:\\\w /]+\w+\.\w+)

    will match:
    c:\documents and settings\users\administrator\desktop\foo.txt
    or
    c:\foo.txt
    or
    foo.txt

  3. Fran says:

    Thanks!

    @Marirs: \w only matches alphanumeric characters so it would fail with files like “this-file.txt”

  4. Gatien Cambas says:

    But it also matches …\foo.txt and nimp|*:,”.>< :(
