Making C Code Uglier
By Artyom Bologov
C++ is practical, yet sometimes scary. C is outright frightening. If someone writes code in C++, they must be smart. If someone writes code in C, they must be crazy (well, at least I am.)
But still, C—with its guts full of eldritch horrors—is the lingua franca of programming and the most portable assembly language.
C is readable enough to most programmers, because most mainstream languages are C progenies. Pointers and macros are loathsome, but they are rare enough (are they?) to ignore.
So how scary can C code get? Not for production use, but rather as an exercise in aesthetics. This post goes through a set of things that can convolute/obfuscate C code, from the minute details to critical readability losses.
Note that some obvious things like
- Inconsistency.
- Typos.
- Bad naming.
- Pointer abuse.
are not mentioned to leave space for the scarier ones.
Test Program
I'm going to use a slightly modified C version of the Trabb Pardo–Knuth (TPK) algorithm from Wikipedia because it's small enough while still showcasing most C constructs/features.
Benign: Indentation and Bracket Placement Style
Two or four spaces? Eight? Or three, maybe? Or (God almighty) tabs? C code styles are numerous, and they have only one thing in common: they are all mutually incompatible and un-aesthetic. No matter which style one prefers, they're delusional and wrong, at least to those exhorting another style.
I use Linux kernel style, which might make you scream from the 8-space-wide tabs. But I'm not surrendering it.
As an example, I'll use the Pico indentation style (four/five spaces) and bracket placement (before the first expression and after the last one), plus added spaces mimicking the GLib style:
Ugh, block scope and control flow are illegible now.
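To give a feel for the combination, here is a small sketch in that spirit: braces hug the first and last statements, and a space precedes every parenthesis. The `count_too_large` helper is hypothetical and only there to have a loop and a nested `if` to mangle. The exact spacing is one interpretation of the Pico style, not a canonical one.

```c
#include <stddef.h>

/* Counts how many of the n results exceed the TPK threshold of 400. */
size_t count_too_large (const double *y, size_t n)
{    size_t count = 0;
     for (size_t i = 0; i < n; i++)
     {    if (y[i] > 400)
          {    count++; } }
     return count; }
```

The closing braces pile up at the end of the last statement, so the only way to see where a block ends is to count them.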
Confusing: Subscripts
A queer behavior of the standard array subscripts: the index and array parts can be swapped:
This reversal is modest, but nonetheless galling.
An exercise to the reader: can you find the exact spot where the subscript is reversed?
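The swap works because the standard defines `a[i]` as `*(a + i)`, and pointer addition is commutative, so `i[a]` means `*(i + a)`. A minimal sketch, with a hypothetical `sum` helper:

```c
#include <stddef.h>

/* Sums n doubles, indexing the array "backwards". */
double sum(const double *a, size_t n)
{
        double total = 0;

        for (size_t i = 0; i < n; i++)
                total += i[a]; /* identical to a[i], i.e. *(i + a) */
        return total;
}
```

The same trick applies to string literals: `1["abc"]` is the character `'b'`.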
Antiquated: K & R style
That's where the post gets shuddery. K & R style, or, as they call it, "I don't understand old C code".
This style
- duplicates parameter names,
- moves the type information further away from the parameter list,
- removes the typing information from the function.
Luckily, C23 finally removes it, after more than thirty years of yielding to the horror and maintaining it in deprecated status.
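As an illustration, here is a sketch of a K & R definition next to its prototype equivalent. The `scale` function is hypothetical, and the preprocessor guard is an assumption added so that the snippet still builds when the compiler targets C23 or later. Pre-C23 compilers may warn about the old-style branch.

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ > 201710L
/* C23 and later: old-style definitions are gone, so use a prototype. */
double scale(double t, int k)
{
        return t * k;
}
#else
/* K & R style: names in the parentheses, types declared afterwards. */
double scale(t, k)
        double t;
        int k;
{
        return t * k;
}
#endif
```

In the K & R branch the compiler does not check or convert arguments at the call site, so `scale(1.5, 2)` is fine while `scale(1, 2)` would pass an `int` where a `double` is expected and invoke undefined behavior.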
Smart: Recursion
Reordering and refactoring functions is always fun. So how about turning all the for-loops into recursion? Recursion is cool, I've heard. So here's a recursive rendering of the number printing loop:
Five more lines of code, lots of stack frames (unless you have tail call elimination), and overall less comprehensible control flow. Yay!
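One possible shape for such a recursive printing loop is sketched below. It assumes the results are already computed and stored in `y`. The `printed` accumulator is an addition that keeps the recursive call in tail position and gives the function something to return. Whether the compiler actually eliminates the tail call depends on the optimization settings.

```c
#include <stdio.h>

/*
 * Prints y[i], y[i - 1], ..., y[0], one result per line, and returns
 * `printed` plus the number of lines written by this call chain.
 */
int print_results(const double *y, int i, int printed)
{
        if (i < 0)
                return printed;
        if (y[i] > 400)
                printf("%d TOO LARGE\n", i);
        else
                printf("%d %.16g\n", i, y[i]);
        return print_results(y, i - 1, printed + 1); /* tail call */
}
```

Calling `print_results(y, 10, 0)` walks the eleven TPK results from the last one down to the first.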
Recursion is good, actually
Like some of the other un-aesthetic and alienating changes this page lists, recursion can be useful. It can make your algorithms simple and powerful when done right. I often use recursion when writing Lisp. But I can relate to people seeing it as vile and perplexing.
Terse: Ternaries
This is my favorite: switching from if-else to ternaries. It's shorter, expression-only, and it makes code look more daunting. And there's a rumor that compilers increase the optimization level when they see ternaries. Likely out of regard for the programmer's bravery.
If only the comma operator allowed for variable declarations (wink wink, C standard committee), this function might've had no double y in it either. But, for now, let this stateful statement stay there.
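A sketch of the if-else to ternary switch, applied to the reporting step of TPK. The `report` helper is hypothetical, and returning the `printf` result is just a way to make the whole body a single expression.

```c
#include <stdio.h>

/* Reports one TPK result; returns the number of characters printed. */
int report(int i, double y)
{
        return y > 400 ? printf("%d TOO LARGE\n", i)
                       : printf("%d %.16g\n", i, y);
}
```

Both branches must have compatible types for this to work, which is why the trick fits `printf` calls (both yield `int`) better than arbitrary statements.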
Ternaries are good, actually
I like the ternary-formatted code because it forces side-effect-free algorithms where I want them. It's even more useful in other C-like languages because they have less restrictive blocks and more abstractions compatible with functional style.
Unconventional: Delimiter-First Code
There are reasons one can use leading-delimiter style in SQL and Haskell. But in other languages...
I like how the ternaries become more pronounced and how it promotes a functional-ish style. But I bet your eyes are already hemorrhaging, so feel free to ignore my aesthetic preferences.
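For a taste of the layout, here is a sketch with every `?`, `:`, `,`, `)` and `;` moved to the start of its line. The `report` helper is hypothetical, and the alignment is one of several plausible conventions.

```c
#include <stdio.h>

/* Reports one TPK result; returns the number of characters printed. */
int report(int i, double y)
{
        return y > 400
             ? printf( "%d TOO LARGE\n"
                     , i
                     )
             : printf( "%d %.16g\n"
                     , i
                     , y
                     )
             ;
}
```

The upside, as in SQL and Haskell, is that adding or removing an argument touches exactly one line.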
Awful: Alternative representations
That's the most horrifying one: C has alternative spellings for some characters that weren't available on every keyboard and character set at the time of the first standard. There are two-character (digraph) and three-character (trigraph, removed in C23) encodings for [, ^, {, etc.
Here's a table of transformations:
C char | Digraph | Trigraph |
---|---|---|
{ | <% | ??< |
} | %> | ??> |
[ | <: | ??( |
] | :> | ??) |
# | %: | ??= |
\ | | ??/ |
^ | | ??' |
| | | ??! |
~ | | ??- |
And here's the code with encoded parts:
And that's just digraphs, trigraphs are even worse!
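A small digraph-only sketch, assuming a compiler that accepts digraphs by default (they have been part of the language since the 1995 amendment). The `largest` helper is hypothetical and expects at least one element.

```c
%:include <stddef.h>

/* Returns the largest of the n (n >= 1) values in a. */
double largest(const double a<::>, size_t n)
<%
        double max = a<:0:>;

        for (size_t i = 1; i < n; i++)
                if (a<:i:> > max)
                        max = a<:i:>;
        return max;
%>
```

Unlike trigraphs, digraphs are ordinary tokens, so they are not replaced inside string literals and need no special compiler flag.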
Alternative representations are good, actually
There is a more useful side to alternative encodings.
<iso646.h> provides spelled-out macros for the logical and bitwise operators, far more readable than the symbolic ones:
C operator | iso646.h spelled-out macro |
---|---|
&& | and |
&= | and_eq |
& | bitand |
| | bitor |
~ | compl |
! | not |
!= | not_eq |
|| | or |
|= | or_eq |
^ | xor |
^= | xor_eq |
Even though it's atypical, I'm tempted to use these in my projects.
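A short sketch of how code reads with these macros. The `in_range` and `low_nibble_flipped` helpers are hypothetical and exist only to exercise a few of the spellings from the table.

```c
#include <iso646.h>
#include <stdbool.h>

/* True when lo <= y <= hi. */
bool in_range(double y, double lo, double hi)
{
        return not (y < lo or y > hi);
}

/* Inverts x and keeps only the lowest four bits: (~x) & 0xF. */
unsigned low_nibble_flipped(unsigned x)
{
        return compl x bitand 0xFu;
}
```

The macros expand to the plain operators, so precedence is unchanged: `compl` still binds tighter than `bitand`, just like `~` and `&`.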
Wrapping Up
Here's the final code for the TPK algorithm. It compiles under Clang 13.0.1 on my x86_64-unknown-linux-gnu 😵 (the exact command is clang tpk.c -trigraphs -lm.)
If you want some job security as a C or C++ programmer, you might use some of the things discussed above. But in any other scenario: you don't want to write code this way! Be kind to each other, even when y'all write chthonic C code.
Update: some commenters on Reddit mentioned IOCCC as an additional inspiration and further research direction. This post is by no means exhaustive, and you will likely find much more gory details if you explore IOCCC.
Another update: u/insanelygreat shared an absolutely horrendous piece of code and set of macros that turn C into something BASIC. Here's a small piece of code from their comment you should read in full: