There's more than one way to...remove a file extension

Written by Chad Peters

This week I was working on a project that moves and archives file recordings and their accompanying XML meta data. I needed to remove the file extension from a file path stored as a string. The first thing I thought of was:

filename.Replace(".xml","")

That works, but it has to march through the whole string to the end to find what it needs to replace.

Substring seems more efficient:

filename.Substring(0, filename.Length - 4);

My Lead then reminded me that Path has a method for this very thing:

Path.GetFileNameWithoutExtension(filename)

At this point curiosity got the better of me. Which of these is actually more efficient? More importantly, which of these is more efficient for the strings I am working with? BenchmarkDotNet ftw! 🎉

Benchmark Results Fwiw, the strings I am working with are the 2nd result set with a length pretty close to 71.

The fact that Replace was the worst performing isn’t a surprise, but there were a few surprises. I was surprised that the difference between Replace and Substring decreased as the size of the string got larger. Path.GetFileNameWithoutExtension had a couple of interesting finds. Once the string gets to a certain length the performance and allocated memory remains consistent no matter the length of the string. It also uses significantly less memory than the other two methods.

I looked at the Path.GetFileNameWithoutExtension and in short it:

converts the path string to a ReadOnlySpan<char>
gets just the filename from the path
uses LastIndexOf(".") to find the period before the extension
uses Slice to get the filename without the extension
converts the filename without the extension back to a string

I found this paragraph in an article exploring spans that gives an example of when you might want to use a span. It helps shed light on the benchmark results.

Or take another example. You’re implementing an operation over System.String, such as a specialized parsing method. You’d likely expose a method that takes a string and provide an implementation that operates on strings. But what if you wanted to support operating over a subset of that string? String.Substring could be used to carve out just the piece that’s interesting to them, but that’s a relatively expensive operation, involving a string allocation and memory copy. You could, as mentioned in the array example, take an offset and a count, but then what if the caller doesn’t have a string but instead has a char[]? Or what if the caller has a char*, like one they created with stackalloc to use some space on the stack, or as the result of a call to native code? How could you write your parsing method in a way that didn’t force the caller to do any allocations or copies, and yet worked equally well with inputs of type string, char[] and char*?

That was a fun diversion and an informative peek under the hood. Happy coding!