Mixed Base Formatter

Introduction

The .NET framework provides a variety of options for converting objects, enums and structs into string representations. The trusty ToString method is always handy and, of course, there are plenty of variants that allow for the possibility of using standard or custom format strings, culture specifiers, and more. But, as flexible as most .NET formatting schemes are, they could always be a little more flexible, right?

Way back around 2003 I found myself in need of a custom .NET formatter that could represent any number using a custom formatting mask, where the mask contained a token for each digit in the number. That wasn’t too complicated, but then I realized I also needed the ability to represent those numbers using more than one base at a time – possibly even a different base for each digit in the number. That’s a little weird, I admit, but it’s what they wanted from me. Then, to make things even more interesting, they asked me to allow literal tokens anywhere in the mask, which should be copied verbatim to the output.

So, for instance, I was asked to create a formatter that could take a number, say ‘201’, and convert that value using base 16 for the first digit, base 2 for the second digit, base 8 for the third digit, and base 10 for the final digit. In other words, I needed the ability to format the number ‘201’ and have the result come out as “1-04-1” by using a formatting mask of “H-BO-D”, where the two ‘-’ characters are literal tokens and the ‘H’, ‘B’, ‘O’, and ‘D’ characters are placeholders for base 16, base 2, base 8, and base 10 digits.
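A quick way to sanity-check the “1-04-1” result is to read its four digits back as a mixed-base number. The helper below is only an illustration and is not part of CG.Formatter; the class and method names are made up, and it assumes the leftmost digit is the most significant one.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration; not part of CG.Formatter.
public static class MixedRadixCheck
{
    // Rebuilds a value from digits written most-significant first,
    // where bases[i] is the base of digits[i] (Horner's scheme).
    public static BigInteger ToValue(int[] digits, int[] bases)
    {
        if (digits.Length != bases.Length)
            throw new ArgumentException("Each digit needs exactly one base.");

        BigInteger value = BigInteger.Zero;
        for (int i = 0; i < digits.Length; i++)
        {
            if (digits[i] < 0 || digits[i] >= bases[i])
                throw new ArgumentOutOfRangeException(nameof(digits));

            // Shift everything so far by the current base, then add the digit.
            value = value * bases[i] + digits[i];
        }
        return value;
    }
}
```

Calling MixedRadixCheck.ToValue(new[] { 1, 0, 4, 1 }, new[] { 16, 2, 8, 10 }) returns 201, because ((1 × 2 + 0) × 8 + 4) × 10 + 1 = 201.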

MaskedFormatter

The code I came up with is now part of my NuGet package called CG.Formatter. The class itself is called MaskedFormatter and it can be used just like any other kind of .NET formatting object. Here are a few examples of how the class works:

 

Usage rules

The only real rules for using the MaskedFormatter are as follows:

  1. You MUST use a mask for every operation.
  2. The mask MUST start and end with a “$” character.
  3. The mask MUST contain enough non-literal, non-escaped tokens to convert whatever value is supplied. For instance, to convert 345 to binary (101011001), the mask needs at least 9 tokens; to convert 345 to octal (531), it needs at least 3.
  4. The formatter can convert arguments of these types: sbyte, byte, short, int, long, ushort, uint, ulong, byte[], or Guid.
  5. The formatter can convert a number using any combination of the following mask tokens: D (decimal), B (binary), H (hex), O (octal), or Z (base 36). Any token can be escaped with a leading backslash.

Violate one or more of these rules and the formatter throws an exception.
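Rule 3 is the one that trips people up, so it can help to work out ahead of time how many tokens a value needs in a given base. Here is a small sketch of that calculation. The class and method names are hypothetical, it handles a single base only, and it assumes a non-negative value (with zero still taking one token).

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration; not part of CG.Formatter.
public static class MaskSizing
{
    // Number of digits needed to write a non-negative value in the given base.
    public static int DigitsNeeded(BigInteger value, int radix)
    {
        if (value.Sign < 0) throw new ArgumentOutOfRangeException(nameof(value));
        if (radix < 2) throw new ArgumentOutOfRangeException(nameof(radix));

        int count = 1; // assumption: zero still occupies one digit
        while (value >= radix)
        {
            value /= radix;
            count++;
        }
        return count;
    }
}
```

MaskSizing.DigitsNeeded(345, 2) returns 9 and MaskSizing.DigitsNeeded(345, 8) returns 3, matching the numbers in rule 3.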

The code

The code starts with a custom interface, which is shown below:

IMaskedFormatter defines a type that is a format provider AND a custom formatter. The corresponding class is called MaskedFormatter. Here is the trimmed down code for that class:

 

The code starts with a couple of arrays: TOKENS and BASE36SYMBOLS. TOKENS contains the collection of all valid formatting characters. BASE36SYMBOLS contains a lookup for converting a value to base 36. We’ll see more of these two arrays as we move along.
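To make the role of those two arrays concrete, here is a rough sketch of what such lookups might contain. The array names come from the class, but their contents, the helper methods, and the use of upper-case symbols are my assumptions for illustration.

```csharp
using System;

// Illustrative sketch; contents and helper names are assumptions.
public static class MaskTokens
{
    // Every character that counts as a digit placeholder in a mask.
    private static readonly char[] TOKENS = { 'D', 'B', 'H', 'O', 'Z' };

    // Symbol for each digit value 0..35. Because the smaller bases use a
    // prefix of this table, one lookup can serve every supported base.
    private static readonly char[] BASE36SYMBOLS =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray();

    public static bool IsToken(char c) => Array.IndexOf(TOKENS, c) >= 0;

    public static int BaseOf(char token) => token switch
    {
        'D' => 10,
        'B' => 2,
        'H' => 16,
        'O' => 8,
        'Z' => 36,
        _ => throw new ArgumentException($"'{token}' is not a mask token.")
    };

    public static char SymbolFor(int digit) => BASE36SYMBOLS[digit];
}
```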

One of the methods on the IFormatProvider interface is the GetFormat method. That method is implemented here and we use it to determine when the caller has asked for a type of ICustomFormatter. When that happens, we respond by returning an instance of our MaskedFormatter object.
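For anyone who hasn’t written a format provider before, the handshake looks roughly like the sketch below. DemoFormatter is a hypothetical stand-in, not the real MaskedFormatter: its Format method just echoes the mask and the argument so the call path is visible, and returning `this` from GetFormat is an assumption about how the instance is supplied.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for illustration; not the real MaskedFormatter.
public sealed class DemoFormatter : IFormatProvider, ICustomFormatter
{
    // string.Format asks the provider for an ICustomFormatter before it
    // formats each argument; every other request gets null.
    public object GetFormat(Type formatType) =>
        formatType == typeof(ICustomFormatter) ? this : null;

    // Placeholder logic: echo the mask and the default text of the argument.
    // A real formatter would interpret the mask here instead.
    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        string text = arg is IFormattable formattable
            ? formattable.ToString(null, CultureInfo.InvariantCulture)
            : arg?.ToString() ?? string.Empty;
        return $"[{format}:{text}]";
    }
}
```

With that in place, string.Format(new DemoFormatter(), "{0:$HBOD$}", 201) returns "[$HBOD$:201]", which shows that everything after the colon in the format item arrives in Format as the mask.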

The ICustomFormatter interface defines a method named Format. That method is implemented here and we use it to perform the conversion operations. The first thing we do is trim the mask so that leading or trailing white space can’t interfere with the conversion process. The next thing we do is create a temporary BigInteger, which we’ll use during the conversion process. Afterwards, we do some basic validation of the mask. If the mask isn’t valid we fall back and return the results of default .NET formatting.

Assuming the mask is valid, we then wrap the incoming argument into a BigInteger object. We do that so we don’t have to worry about overflowing anything and, also, so we can work with a single type during the rest of the operation. If the input argument is an unknown type we throw an exception.
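The wrapping step can be sketched along these lines. The class and method names are hypothetical, and how byte[] and Guid values are interpreted (unsigned, little-endian here) is my assumption rather than something the package guarantees. The isUnsigned constructor overload used below requires .NET Core 2.1 or later.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration; not part of CG.Formatter.
public static class ArgumentWrapper
{
    public static BigInteger ToBigInteger(object arg)
    {
        switch (arg)
        {
            // Each integer type converts implicitly to BigInteger.
            case sbyte v: return v;
            case byte v: return v;
            case short v: return v;
            case ushort v: return v;
            case int v: return v;
            case uint v: return v;
            case long v: return v;
            case ulong v: return v;

            // Assumption: raw bytes are read as an unsigned, little-endian number.
            case byte[] v: return new BigInteger(v, isUnsigned: true);
            case Guid v: return new BigInteger(v.ToByteArray(), isUnsigned: true);

            case null:
                throw new ArgumentNullException(nameof(arg));
            default:
                throw new ArgumentException(
                    $"Unsupported argument type: {arg.GetType().Name}", nameof(arg));
        }
    }
}
```

A 128-bit Guid or a ulong near its maximum would not fit in a long, which is one practical reason to settle on BigInteger as the single working type.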

Once we’ve verified the mask and wrapped the input argument, we strip off the leading and trailing ‘$’ symbols, since we don’t actually need those for our processing. Then we start iterating through the mask, one character at a time, using the contents of the mask to drive the conversion process. Each non-escaped, non-literal character in the mask represents a digit in the converted number, so each time we move to another mask character, we’re really converting the next digit in the formatted number. For each digit, we insert the converted value into a temporary StringBuilder instance, then we divide the number being formatted by whatever base corresponds to the mask character: base 10 for ‘D’, base 2 for ‘B’, and so on.

After we’ve iterated through all the non-escaped, non-literal characters in the mask, we’re left with a value that should be zero. That is to say, if the mask had enough non-escaped, non-literal characters in it then we should have a zero value left over. If not, that indicates that we ran out of mask characters before we completely formatted the input argument. Since that condition represents a form of numeric overflow, we check for it and throw an exception if we encounter that situation.
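A simplified sketch of that digit pass follows. It is not the package’s actual code: the names are hypothetical, it assumes a non-negative value, it assumes the mask is walked from right to left so the last token receives the least significant digit, and it does not handle an escaped backslash.

```csharp
using System;
using System.Numerics;
using System.Text;

// Simplified sketch for illustration; not the actual MaskedFormatter code.
public static class DigitPass
{
    private const string Symbols = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Returns 0 for any character that is not a mask token.
    private static int BaseOf(char c) => c switch
    {
        'D' => 10,
        'B' => 2,
        'H' => 16,
        'O' => 8,
        'Z' => 36,
        _ => 0
    };

    // mask: the mask with the enclosing '$' characters already removed.
    public static string ConvertDigits(BigInteger value, string mask)
    {
        if (value.Sign < 0)
            throw new ArgumentOutOfRangeException(nameof(value)); // sketch covers non-negative values only

        var digits = new StringBuilder();
        for (int i = mask.Length - 1; i >= 0; i--)
        {
            bool escaped = i > 0 && mask[i - 1] == '\\';
            if (escaped) { i--; continue; }   // skip the escaped character and its backslash

            int radix = BaseOf(mask[i]);
            if (radix == 0) continue;         // literal: handled in a later pass

            // Peel off one digit in this token's base and keep the quotient.
            value = BigInteger.DivRem(value, radix, out BigInteger remainder);
            digits.Insert(0, Symbols[(int)remainder]);
        }

        if (!value.IsZero)
            throw new OverflowException("The mask has too few tokens for the value.");
        return digits.ToString();
    }
}
```

DigitPass.ConvertDigits(201, "H-BO-D") returns "1041", while DigitPass.ConvertDigits(3450, "DD") throws an OverflowException because 34 is still left over after both tokens are used.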

Assuming we didn’t overflow, then the StringBuilder contains the converted value of the input argument, formatted using the mask as a guide – except, we haven’t taken into account any literal characters in the mask. We do that with the next loop, where we iterate through the mask again, this time adding any escaped or literal characters into their proper place in the converted value.

When we’re done with the second iteration, we’re also done with the formatting operation. The only thing left to do is return the value of the StringBuilder object, which contains our formatted value, complete with any literals.
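The literal pass can be sketched as follows. Again, the names are hypothetical and this is a simplified illustration rather than the package’s code. It expects exactly one converted symbol per token in the mask.

```csharp
using System;
using System.Text;

// Simplified sketch for illustration; not the actual MaskedFormatter code.
public static class LiteralPass
{
    private static bool IsToken(char c) => "DBHOZ".IndexOf(c) >= 0;

    // digits: one converted symbol per token in the mask, left to right.
    // mask:   the mask with the enclosing '$' characters already removed.
    public static string MergeLiterals(string digits, string mask)
    {
        var output = new StringBuilder(digits);
        int position = 0; // output index that the current mask character maps to

        for (int i = 0; i < mask.Length; i++)
        {
            char c = mask[i];
            if (c == '\\' && i + 1 < mask.Length)
            {
                output.Insert(position, mask[++i]); // escaped character is copied as-is
            }
            else if (!IsToken(c))
            {
                output.Insert(position, c);         // plain literal
            }
            // Otherwise it is a token, and its digit already sits at this position.
            position++;
        }
        return output.ToString();
    }
}
```

LiteralPass.MergeLiterals("1041", "H-BO-D") returns "1-04-1", and an escaped token such as the middle one in the mask D\DD comes through as a literal: MergeLiterals("12", "D\\DD") returns "1D2".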

 

Final thoughts

With every bit of code I’ve ever written it seems like there’s always a little bit of room for improvement. This class is no exception to that rule. My biggest complaint, as I’ve used this formatter in various projects, is that the mask MUST have enough digits to completely convert the argument or it results in an exception. There are several ways I could try to address that situation:

  1. I could ignore left over values during a conversion and only worry about whatever the caller specified in the mask. For instance, if the argument is “3450” and the mask is: “DD”, I could simply return the value “34” and forget that I’ve left the remaining bit, “50”, unconverted. My only problem with that approach is, “3450” is NOT “34”, even if the mask is “DD”. So, I have to get my head around accepting incomplete conversions and I’m just not there yet.
  2. I could assume that any left over value in the argument should be converted using whatever the last token was, in the mask. For instance, if the argument is “3450” and the mask is: “DD”, then the result would be “3450”, since we would continue to convert the left over value using base 10, since the last non-escaped, non-literal mask token was “D”. But, how intuitive is that approach when the mask has tokens with several bases in it? In that case, does it still make sense to keep converting any left over value using the last mask token? Does it really matter if the mask contains multiple tokens? I’m still not sure but this approach seems better, to me, than approach #1.
  3. I could combine approaches #1 and #2, using one or more parameters, and let the caller decide what they want the behaviour to be. I like that approach best and I might go implement that when I get the chance to make that change and test everything.
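To give a feel for that last option, here is one possible shape for a caller-selected policy. Nothing in this sketch exists in CG.Formatter today; the enum, the class, and the method are all hypothetical. It converts from the least significant digit, so its Ignore option keeps the low-order digits, and its RepeatLastBase option reuses the base of the token converted last (the leftmost one). Keeping the high-order digits instead, or repeating the rightmost token’s base, would need different rules.

```csharp
using System;
using System.Numerics;
using System.Text;

// Hypothetical names; nothing here exists in CG.Formatter today.
public enum LeftoverPolicy { Throw, Ignore, RepeatLastBase }

public static class LeftoverDemo
{
    private const string Symbols = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // bases: one entry (2..36) per mask token, left to right, e.g. {10, 10} for "DD".
    public static string FormatValue(BigInteger value, int[] bases, LeftoverPolicy policy)
    {
        if (value.Sign < 0) throw new ArgumentOutOfRangeException(nameof(value));
        if (bases.Length == 0) throw new ArgumentException("At least one base is required.", nameof(bases));

        var output = new StringBuilder();
        for (int i = bases.Length - 1; i >= 0; i--)
        {
            value = BigInteger.DivRem(value, bases[i], out BigInteger digit);
            output.Insert(0, Symbols[(int)digit]);
        }

        if (!value.IsZero)
        {
            switch (policy)
            {
                case LeftoverPolicy.Throw:
                    throw new OverflowException("The mask has too few tokens for the value.");

                case LeftoverPolicy.RepeatLastBase:
                    // Keep converting with the base of the token handled last.
                    while (!value.IsZero)
                    {
                        value = BigInteger.DivRem(value, bases[0], out BigInteger extra);
                        output.Insert(0, Symbols[(int)extra]);
                    }
                    break;

                case LeftoverPolicy.Ignore:
                    break; // discard whatever is left
            }
        }
        return output.ToString();
    }
}
```

With the value 3450 and bases { 10, 10 }, the Throw policy raises an OverflowException, RepeatLastBase returns "3450", and Ignore returns "50".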

 

So that’s about it. I hope someone benefits from the MaskedFormatter class. Have fun with the code!

 

The code for this article is part of my NuGet package CG.Formatter, which can be downloaded for free at https://www.nuget.org/packages/CG.Formatter

The source code for the CG.Formatter project lives on GitHub and can be obtained for free at https://github.com/CodeGator/CG.Formatter