It is almost universally known among experienced programmers that “magic numbers” (i.e. specific numeric values hard-coded in multiple different locations in code) are generally highly damaging to code quality, readability, and maintainability.
It is a lot harder to change code that uses a lot of magic numbers, because every time you need to change one of those numbers you not only have to remember (or find) all the places where that number is, you have to also make sure that each instance of that number that you do find is actually related to the other ones that you are changing and not merely coincidentally the same. This can quickly become nightmarishly tedious, error prone, time consuming, and inflexible. Thus, it is common to instead use named constants so that it is easy to change all the related number instances at once in a much more foolproof way.
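As a quick reminder of what that looks like in practice, here is a minimal sketch. The retry-limit scenario and every name in it (`should_retry_magic`, `max_retries`, and so on) are hypothetical, made up purely for illustration:

```cpp
#include <cassert>

// With magic numbers: the literal 3 appears twice, and nothing in the code
// says whether the two 3s are related or only coincidentally equal.
bool should_retry_magic(int attempts) { return attempts < 3; }
int remaining_magic(int attempts) { return 3 - attempts; }

// With a named constant: both uses are explicitly tied to one definition,
// so changing the limit means editing exactly one line.
constexpr int max_retries = 3;  // hypothetical name for this sketch
bool should_retry_named(int attempts) { return attempts < max_retries; }
int remaining_named(int attempts) { return max_retries - attempts; }
```

Both versions behave identically today; the difference only shows up on the day the limit has to change.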
One thing I’ve noticed for many years now though is that it is surprisingly common for programmers not to realize that hard-coding types in multiple locations often has similarly negative effects on code quality, clarity, and productivity. Programmers often write the same type name in multiple places in related code, seemingly unaware that, without explicit connections between those instances of the type, they are essentially creating the same problem as magic numbers, except for types instead.
Interestingly though, even though this is a pretty basic idea, I’ve never been able to find any evidence that this type of code flaw has an established name in popular circulation. Indeed, if you search for info about it on the internet, it is often not easy to find even related discussions, the lack of terminology aside. You can also see from reading many different tutorial pages, codebases, and books that a huge proportion of existing code shows little or no real awareness of the problem.
So, as of this article, I am apparently coining a new term for this phenomenon: magic typing.
A concrete example should ideally be given, as usual in programming, so I’ve written a very short and simple program in plain old C that demonstrates the difference between the same function written with magic typing and written without it. I compiled the code with clang and verified that it does indeed work (Test your code whenever possible! I see too many errors and uncompilable code in tutorials and books!):
#include <assert.h>
#include <stdlib.h>

#define ARRAY_SIZE(array) (sizeof(array) / sizeof(array[0]))
//Don't forget about array decay by the way, if you use this.

void func_with_magic_typing(void) {
    const double num_array[] = {1.0, 2.0, 3.0};
    double num_array_sum = 0.0;
    for (size_t i = 0; i < ARRAY_SIZE(num_array); ++i) {
        num_array_sum += num_array[i];
    }
    assert(num_array_sum == 6.0);
}

void func_without_magic_typing(void) {
    typedef double num_t;
    const num_t num_array[] = {1.0, 2.0, 3.0};
    num_t num_array_sum = 0.0;
    for (size_t i = 0; i < ARRAY_SIZE(num_array); ++i) {
        num_array_sum += num_array[i];
    }
    assert(num_array_sum == 6.0);
}

typedef int exit_status_t;

exit_status_t main(void) {
    func_with_magic_typing();
    func_without_magic_typing();
    return EXIT_SUCCESS;
}
Notice that func_without_magic_typing is using C’s typedef system for abstracting over types to make the connection between the fact that num_array is an array of doubles and num_array_sum is a double sum explicit, instead of leaving the connection implicit and error prone.
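To see why the implicit connection is error prone, here is a small sketch of one way it can go wrong, written as C++ so that it can sit alongside the later C++ examples. The scenario is an assumption of mine: someone changed the array from int to double and missed the accumulator. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Magic typing gone wrong: the array's element type was changed to double,
// but the accumulator's separately hard-coded type was missed. This still
// compiles (some compilers warn only at higher warning levels), yet every
// += silently truncates back to int.
int sum_with_stale_type() {
    const double values[] = {0.5, 0.5, 0.5, 0.5};
    int sum = 0;  // stale: should have been updated along with the array
    for (std::size_t i = 0; i < sizeof(values) / sizeof(values[0]); ++i) {
        sum += values[i];  // computed as double, then converted back to int
    }
    return sum;
}

// Connected types: change num_t once and every declaration follows.
typedef double num_t;

num_t sum_with_connected_type() {
    const num_t values[] = {0.5, 0.5, 0.5, 0.5};
    num_t sum = 0;
    for (std::size_t i = 0; i < sizeof(values) / sizeof(values[0]); ++i) {
        sum += values[i];
    }
    return sum;
}
```

The first function returns 0 and the second returns 2.0, even though the loops are character-for-character identical.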
Also, it is important for understanding this code to realize that typedef in C never defines a new type, despite what its name would seem to imply. It only introduces another name for an existing type, a bit like how #define introduces another name for a piece of text, although typedef operates on types rather than on raw text, so the two don’t always behave identically. In fact, a better name for typedef would have been type_alias or type_synonym really.
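Both halves of that can be checked directly. The following is a minimal sketch (compiled as C++ so that static_assert and std::is_same are available); `char_ptr_t`, `CHAR_PTR_MACRO`, and `read_through_double_ptr` are hypothetical names invented for the demonstration:

```cpp
#include <cassert>
#include <type_traits>

typedef double num_t;

// An alias names the very same type, so this holds at compile time.
static_assert(std::is_same<num_t, double>::value,
              "num_t is just another name for double");

// Where the text-substitution intuition breaks down:
typedef char* char_ptr_t;     // hypothetical alias
#define CHAR_PTR_MACRO char*  // hypothetical macro

// const applies to the alias as a whole: a const pointer to mutable char.
static_assert(std::is_same<const char_ptr_t, char* const>::value,
              "const char_ptr_t is char* const");
// The macro expands textually to `const char*`: a pointer to const char.
static_assert(std::is_same<const CHAR_PTR_MACRO, const char*>::value,
              "const CHAR_PTR_MACRO is const char*");

// Runtime view of the same fact: no cast is needed between num_t* and double*.
double read_through_double_ptr() {
    num_t value = 1.5;
    double* p = &value;  // num_t* and double* are the same type
    assert(*p == 1.5);
    return *p;
}
```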
This is also why my choice to declare the main function as exit_status_t main(void), rather than as the much more common but conceptually inferior int main(void), works here. The fact that typedef never defines a new type, but only ever an alias or synonym for a type, is why this is guaranteed to work on any conforming C compiler.
Self-describing code is better than opaque code generally, and typedef provides a great way to retroactively cover up some poor naming choices in old or 3rd party code in order to make things much more readable. This trick only works because typedef is an alias/synonym though. Macros and other kinds of wrappers would be another approach. Pros and cons though, as always in programming.
Also, for those who aren’t aware, the convention in C of suffixing type names with _t is intended to mean “type” (the “t” is short for “type”). The purpose of this is to make it clearer that a name stands not for a value but for a type, which would otherwise sometimes be less clear.
For example, the identifier exit_status (if we used that instead of exit_status_t) would sound more like it could refer to a specific named value and not a type. This is a small and subjective matter though. It’s fine if you ignore that convention in C and C++. Pragmatism and value creation for the end user are what matter most, not getting bent out of shape over every little detail.
Also, notice that I use assertions instead of print statements to test things. Assertions are generally a vastly faster way of testing that things are behaving as intended than print statements are. With assertions you never have to spend any time reading any text output and slowly (especially compared to a computer) mentally checking that things ended up being what they were intended to be.
In fact, for a long time now I’ve thought that programming should be introduced with assertions first, instead of with those banal “hello world” programs that are so overused within the community. Assertions and unit tests, not print statements, should be the bulk of how you test code.
Maybe I should write an article about that too at some point (i.e. about the overuse of print statements and how assertions are so much better in most cases, etc).
Also, the exact comparison of a double here is actually completely fine in this case, because we know that every value involved (1.0, 2.0, 3.0, and their sum 6.0) is represented exactly in a double, just like any integer of a sufficiently small magnitude (and the limit, 2^53 for a typical IEEE 754 double, is very large actually, and indeed elegantly reusable for a very broad range of things… ain’t that right, Lua?). Don’t let “best practices” cause you to program too rigidly and ritualistically. If you know something is safe then it’s fine. Pragmatism is better than toxic perfectionism and pedantry. Too much fear of judgement and social posturing tends to damage creativity, intellectual honesty, and personal growth, but unfortunately such things are far too common within programming culture, especially on the internet. There should be more kindness-based programming and less fear-based programming.
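To make the exactness claim about doubles concrete, here is a small sketch. It assumes that double is an IEEE 754 64-bit type with arithmetic carried out in double precision (true on most current platforms, and checked as far as possible by the static_assert). The function names are hypothetical.

```cpp
#include <cassert>
#include <limits>

// Assumption for this sketch: double is IEEE 754 binary64 (53-bit significand).
static_assert(std::numeric_limits<double>::is_iec559 &&
                  std::numeric_limits<double>::digits == 53,
              "this sketch assumes IEEE 754 binary64 doubles");

// Small integers are exactly representable, so sums of them are exact too.
bool small_integer_sum_is_exact() {
    const double sum = 1.0 + 2.0 + 3.0;
    return sum == 6.0;
}

// Every integer up to 2^53 is exact; 2^53 + 1 is the first one that is not.
bool integers_exact_up_to_2_pow_53() {
    const double limit = 9007199254740992.0;  // 2^53
    return (limit - 1.0) + 1.0 == limit       // still exact just below the limit
        && limit + 1.0 == limit;              // 2^53 + 1 rounds back to 2^53
}

// Contrast: 0.1 and 0.2 have no exact binary representation,
// which is where the usual "never compare doubles with ==" advice comes from.
bool decimal_fractions_are_not_exact() {
    const double sum = 0.1 + 0.2;
    return sum != 0.3;
}
```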
Anyway though, I got a bit side-tracked there for a few points. I next want to briefly talk a bit about C++, which has some additional useful things for connecting types together in more diverse, expressive, or syntactically clean ways.
Specifically, I just wanted to mention that C++ has three especially useful additional features for types for the purposes of this discussion: (1) type aliases declared with using, (2) decltype, and (3) the STL’s value_type convention.
Regarding item (1):
C++ has a more refined and more capable version of typedef: type alias declarations written with the using keyword. Unlike a typedef, a using alias can itself be a template (an “alias template”), and the using keyword also has other jobs (e.g. bringing in names from namespaces). Using aliases are also just more syntactically clean, since typedef has a slightly odd syntax owing to the fact that it reuses the same syntax as other declarations in C (i.e. typedef old_type_name new_type_name; has the same underlying syntax structure as a variable declaration, like const int identifier; or volatile size_t identifier; for example).
So, for example, for a side by side comparison:
typedef old_type_name new_type_name;
using new_type_name = old_type_name;
Notice how the C++ using syntax mirrors the same syntax structure you’d use for assignments, which makes it look more uniform when you mix it in with a bunch of variable definitions. For example, compare this:
typedef double num_t;
num_t accum = 0.0;
… with this:
using num_t = double;
num_t accum = 0.0;
… and see that the C++ version makes it so that abstracting over types in the same sense that you’d abstract over values feels a bit more syntactically clean. It is admittedly subjective though.
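Beyond the cleaner syntax, the main thing a using alias can do that typedef cannot is take template parameters of its own. A minimal sketch, where `vec_of` and `sum_all` are hypothetical names chosen for the example:

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// An alias template: a whole family of aliases parameterized on T.
// typedef has no direct equivalent of this.
template <typename T>
using vec_of = std::vector<T>;  // hypothetical name

using num_t = double;

// Like typedef, a using alias never creates a new type.
static_assert(std::is_same<vec_of<num_t>, std::vector<double>>::value,
              "vec_of<num_t> is just another name for std::vector<double>");

// Element type, container type, and accumulator type all hang off num_t.
num_t sum_all(const vec_of<num_t>& values) {
    num_t total = 0;
    for (const num_t& v : values) {
        total += v;
    }
    return total;
}
```

Changing `num_t` to, say, `long` would update the container, the accumulator, and the return type in one edit.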
Regarding item (2):
C++ also provides the decltype keyword, which is a really awesome keyword that lets you ask the compiler to substitute in whatever type it deduces a given expression to have, at the location of the decltype use. The expression itself is never evaluated, so none of the side effects it would normally cause if placed elsewhere actually happen.
For example:
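#include <iostream>
#include <typeinfo>

int main() {
    decltype(1 + 2) x = 0;
    //same as int x = 0;
    decltype(std::cout << "hi") os = std::cout;
    //same as std::ostream& os = std::cout; (a reference), "hi" is never printed
    os << typeid(os).name() << '\n';
    //prints an implementation-defined name for std::ostream to std::cout, e.g. MSVC
    //prints "class std::basic_ostream<char,struct std::char_traits<char> >"
}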
And so, as you can see, decltype provides a great variety of ways to connect the meanings of types in your code together explicitly, so that those connections remain logically correct and update automatically whenever changes are made elsewhere, without you having to tediously go through by hand and correct a bunch of redundant type names like you analogously would when fixing up magic numbers.
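Applied to the summing example from earlier, one possible sketch looks like this. It uses std::decay to strip the reference and const that decltype reports for an array element; `element_t` and `sum_num_array` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

const double num_array[] = {1.0, 2.0, 3.0};

// decltype(num_array[0]) is `const double&`, because indexing yields an lvalue.
// std::decay strips the reference and the const, leaving plain double.
using element_t = std::decay<decltype(num_array[0])>::type;

static_assert(std::is_same<element_t, double>::value,
              "element_t follows the array's element type");

// The accumulator and return type now follow the array automatically:
// change the array's element type and nothing else needs editing.
element_t sum_num_array() {
    element_t sum = 0;
    for (std::size_t i = 0; i < sizeof(num_array) / sizeof(num_array[0]); ++i) {
        sum += num_array[i];
    }
    return sum;
}
```

Compared with the typedef approach, the type name is never written a second time at all; the array declaration is the single source of truth.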
Regarding item (3):
All of the container data structures in the C++ standard library (the STL) are required to expose a member type that names the type of the elements they contain, so that you can easily reuse that type for your own declarations connected to your use of that container type.
The container classes accomplish this by placing a type alias for their element type template parameter into their public interface, like so:
public:
using value_type = T;
That type can then be accessed elsewhere, like so:
ContainerType::value_type
… which is handy for properly connecting your types together with less tedium. There’s a good chance that you should add value_type (and/or similar declarations) to many of your own template classes.
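Here is a rough sketch of both sides of that convention: a generic function that consumes value_type, and a user-defined template that provides it. `sum_elements` and `fixed_buffer` are hypothetical names invented for this example, not anything from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Generic sum: the accumulator and return types follow the container's
// element type, whatever it happens to be.
template <typename Container>
typename Container::value_type sum_elements(const Container& c) {
    typename Container::value_type total{};  // value-initialized: zero for arithmetic types
    for (const auto& element : c) {
        total += element;
    }
    return total;
}

// A hypothetical user-defined template that follows the same convention,
// so generic code like sum_elements works with it unchanged.
template <typename T, std::size_t N>
struct fixed_buffer {
    using value_type = T;

    T items[N];

    const T* begin() const { return items; }
    const T* end() const { return items + N; }
};
```

With this in place, `sum_elements(std::vector<int>{1, 2, 3})` accumulates in int, while `sum_elements(fixed_buffer<double, 3>{{1.0, 2.0, 3.0}})` accumulates in double, and neither type is spelled out inside the function.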
Finally, in conclusion:
Despite the fact that magic typing is a big problem for code quality and maintainability, and is very common, I often find myself frustrated by how often it is neglected in codebases and tutorials (etc) and also by how many programming languages and systems don’t even provide adequate means for properly expressing the type relationships and type connections between things. Many languages and libraries will force you into writing brittle code whose types aren’t/can’t be connected properly, which both degrades code quality and wastes time and productivity.
The dangers of magic numbers are almost universally known, but the dangers of magic typing still seem too widely underappreciated in the programming community at large. We need to work on that I think. Having an explicit term for the problem (or “anti-pattern” if you want), such as “magic typing” as I have suggested here, would help raise awareness of that.
Anyway though, I hope you enjoyed reading my article, found some useful ideas/thoughts in it, and that you will have a great day/night/week/etc!