There's more than one way to...remove a file extension

Written by Chad Peters

This week I was working on a project that moves and archives file recordings and their accompanying XML metadata. I needed to remove the file extension from a file path stored as a string. The first thing I thought of was:

filename.Replace(".xml", "")

That works, but it has to scan the whole string to find what it needs to replace, and it removes every occurrence of ".xml", not just the one at the end.

Substring seems more efficient, as long as the extension is always four characters (the period plus three letters):

filename.Substring(0, filename.Length - 4)

My lead then reminded me that Path has a method for this very thing:

Path.GetFileNameWithoutExtension(filename)
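The three options are not quite interchangeable, so here is a minimal side-by-side sketch. The class name, the method names, and the KeepDirectory helper are made up for illustration and are not part of the project described above. Path.ChangeExtension with a null extension is included only as a fourth option that keeps the directory.

```csharp
using System;
using System.IO;

public static class ExtensionRemoval
{
    // Removes every occurrence of ".xml", wherever it appears in the string.
    public static string WithReplace(string filename) =>
        filename.Replace(".xml", "");

    // Drops the last four characters, whatever they happen to be.
    public static string WithSubstring(string filename) =>
        filename.Substring(0, filename.Length - 4);

    // Drops the directory part as well as the extension.
    public static string WithPath(string filename) =>
        Path.GetFileNameWithoutExtension(filename);

    // Hypothetical helper: keeps the directory and removes only the final extension.
    public static string KeepDirectory(string filename) =>
        Path.ChangeExtension(filename, null);
}
```

For a made-up path such as "recordings/2024/call-001.xml", WithReplace, WithSubstring, and KeepDirectory all return "recordings/2024/call-001", while WithPath returns just "call-001". For "archive.xml.d/call-001.xml", WithReplace returns "archive.d/call-001" because it also strips the ".xml" inside the directory name.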

At this point curiosity got the better of me. Which of these is actually the most efficient? More importantly, which is the most efficient for the strings I am working with? BenchmarkDotNet ftw! 🎉
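For anyone who wants to try something similar, a benchmark along these lines might look like the sketch below. This is not the code behind the results in this post. The class name, the directory prefix, and the lengths 20 and 200 are assumptions picked for illustration; only 71 comes from the real data. It needs the BenchmarkDotNet NuGet package.

```csharp
using System.IO;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser] // adds an Allocated column to the results
public class RemoveExtensionBenchmarks
{
    // Hypothetical total path lengths; only 71 is taken from the post.
    [Params(20, 71, 200)]
    public int Length;

    private string filename = "";

    [GlobalSetup]
    public void Setup()
    {
        // Build a path of exactly Length characters that ends in ".xml".
        const string directory = "C:/recordings/"; // assumed prefix
        const string extension = ".xml";
        int padding = Length - directory.Length - extension.Length;
        filename = directory + new string('a', padding) + extension;
    }

    // Each method returns its result so the work cannot be optimized away.
    [Benchmark(Baseline = true)]
    public string StringReplace() => filename.Replace(".xml", "");

    [Benchmark]
    public string StringSubstring() => filename.Substring(0, filename.Length - 4);

    [Benchmark]
    public string PathGetFileNameWithoutExtension() =>
        Path.GetFileNameWithoutExtension(filename);
}
```

To run it, call BenchmarkRunner.Run<RemoveExtensionBenchmarks>() (from the BenchmarkDotNet.Running namespace) in Main and build in Release mode. In this sketch only the file name grows with Length, so the numbers will not necessarily line up with the results shown here.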

[Figure: Benchmark Results]

Fwiw, the strings I am working with are in the 2nd result set, with a length pretty close to 71.

The fact that Replace performed the worst isn’t a surprise, but a few other things were. I was surprised that the difference between Replace and Substring decreased as the string got longer. Path.GetFileNameWithoutExtension produced a couple of interesting results. Once the string reaches a certain length, its performance and allocated memory remain consistent no matter how long the string gets. It also uses significantly less memory than the other two methods.

I looked at the source for Path.GetFileNameWithoutExtension and, in short, it:

  • converts the path string to a ReadOnlySpan<char>
  • gets just the filename from the path
  • uses LastIndexOf('.') to find the period before the extension
  • uses Slice to get the filename without the extension
  • converts the filename without the extension back to a string
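Those steps can be sketched in a few lines. This is a rough approximation that follows the list above, not the actual library source; the class and method names are made up, and it assumes a runtime where the Path.GetFileName overload for ReadOnlySpan<char> is available (.NET Core 2.1 or later).

```csharp
using System;
using System.IO;

public static class SpanSketch
{
    public static string FileNameWithoutExtension(string path)
    {
        // View the string as a span: no copy, no allocation.
        ReadOnlySpan<char> span = path.AsSpan();

        // Narrow the view to the file name (still no allocation).
        ReadOnlySpan<char> fileName = Path.GetFileName(span);

        // Find the period before the extension, if there is one.
        int lastPeriod = fileName.LastIndexOf('.');

        // Slice off the extension; the result is another view over the same memory.
        ReadOnlySpan<char> result =
            lastPeriod < 0 ? fileName : fileName.Slice(0, lastPeriod);

        // The only allocation in this sketch: the final string.
        return result.ToString();
    }
}
```

In this sketch the single allocation is sized by the file name without its extension, not by the full path.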

I found this paragraph in an article exploring spans that gives an example of when you might want to use a span. It helps shed light on the benchmark results.

Or take another example. You’re implementing an operation over System.String, such as a specialized parsing method. You’d likely expose a method that takes a string and provide an implementation that operates on strings. But what if you wanted to support operating over a subset of that string? String.Substring could be used to carve out just the piece that’s interesting to them, but that’s a relatively expensive operation, involving a string allocation and memory copy. You could, as mentioned in the array example, take an offset and a count, but then what if the caller doesn’t have a string but instead has a char[]? Or what if the caller has a char*, like one they created with stackalloc to use some space on the stack, or as the result of a call to native code? How could you write your parsing method in a way that didn’t force the caller to do any allocations or copies, and yet worked equally well with inputs of type string, char[] and char*?
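To make that concrete, here is a small sketch of a method that takes a ReadOnlySpan<char> and works unchanged over part of a string, a char[], and stack-allocated memory. SumDigits and the wrapper methods are hypothetical stand-ins for the "specialized parsing method" in the quote, not code from the article.

```csharp
using System;

public static class SpanParsing
{
    // Hypothetical parser: adds up the decimal digits found in the input.
    public static int SumDigits(ReadOnlySpan<char> text)
    {
        int sum = 0;
        foreach (char c in text)
        {
            if (c >= '0' && c <= '9') sum += c - '0';
        }
        return sum;
    }

    // Part of a string: AsSpan(start, length) creates a view, not a Substring copy.
    public static int FromString(string s, int start, int length) =>
        SumDigits(s.AsSpan(start, length));

    // A char[] converts to ReadOnlySpan<char> implicitly.
    public static int FromArray(char[] chars) => SumDigits(chars);

    // Stack memory: nothing is allocated on the heap here.
    public static int FromStack()
    {
        Span<char> buffer = stackalloc char[3];
        buffer[0] = '4';
        buffer[1] = '-';
        buffer[2] = '2';
        return SumDigits(buffer);
    }
}
```

The caller decides where the characters live, and the parsing code never allocates or copies, whatever the source.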

That was a fun diversion and an informative peek under the hood. Happy coding!

Published October 09, 2024 by Chad Peters, Senior Application Developer