C++ struct carelessness
One in a possibly never-ending series of reflections on what makes C++
a terrible choice for programming.
At work I recently updated a bridge between our Rust project and a C++ API.
The code includes extensive automated tests on both the Rust and C++ sides,
as well as an example C++ program to illustrate feature usage.
Everything in Rust built and tested successfully,
and everything in C++ built and seemed to run successfully,
but for some mysterious reason one test didn’t pass.
Eventually I realized the problem came down to this:
// in api.h
struct Thing {
    Type1 first_field;
    Type2 second_field;
    Type3 third_field;
    Type4 fourth_field;
};

// in tests.cpp
Thing one {
    first_field_value,
    second_field_value,
    third_field_value
};
The precise definitions of Type1, Type2, etc. don’t matter;
what matters here is that fourth_field was never explicitly initialized,
because…
- the initializer of one had existed in tests.cpp since time immemorial;
- the latest API change added fourth_field to Thing;
- that initializer is part of the setup for a web server integration test;
- said test required every field, including fourth_field, to have a particular value;
- since the initializer never mentioned fourth_field,
the field held a value the server did not expect, so the test failed.
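Strictly speaking, C++ does not leave the omitted member uninitialized here: when an aggregate gets fewer initializers than it has members, the remaining members are initialized as if from an empty initializer list, which means zero for plain scalars. That is exactly why nothing complains. A minimal sketch, assuming int and std::string as stand-ins for Type1 through Type4 (the real types are not shown above, and all function names are made up for illustration):

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins: the article does not say what Type1..Type4 are.
struct Thing {
    int first_field;
    int second_field;
    std::string third_field;
    int fourth_field;  // the member added by the API change
};

// Builds a Thing the way the old test setup did: three initializers only.
Thing make_old_style_thing() {
    Thing one{1, 2, "three"};  // compiles even though fourth_field is omitted
    return one;
}

// The omitted trailing member is value-initialized, which for an int is zero.
void check_old_style_thing() {
    const Thing one = make_old_style_thing();
    assert(one.third_field == "three");
    assert(one.fourth_field == 0);
}
```

GCC and Clang document a -Wmissing-field-initializers warning (part of -Wextra) for this kind of initializer; whether it is switched on in a given build is another matter, so it is worth checking your own compiler flags.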
On the one hand, that’s a good thing: the test should fail
when you feed it garbage data.
Debugging this in a test, then finding the error,
reminds the programmer using this API to hunt down
every instance of Thing in the codebase
and properly initialize fourth_field when needed.
On the other hand,
- It wasn’t supposed to fail.
- It’s a rather subtle bug. Sure, it stands out here,
but that’s because here it isn’t surrounded
by hundreds or thousands of lines of code,
much of it “delicate” enough to have broken in the past.
In the real codebase I had to check lots of other wires before I finally came to this one.
Hunting this down wasted the better part of an hour.
- That’s one hour I could have spent doing something else.
- If we didn’t have such a strict testing policy,
the programmer using our API might well not think to go fix this,
leading to bugs on the client’s end that are hard to debug
and may well seem like bugs on the server’s end.
(After all, we just updated our code,
while their code hasn’t changed!)
The average C/C++ programmer may think that
this is how programming should work;
there could be cases where you don’t need to initialize
every field of a struct
(say, for whatever reason, you only need those fields in certain cases,
but not all),
and initializing things you don’t need wastes a few precious nanoseconds
of CPU time. Per the C++ motto,
You don’t pay for things you don’t use.
The trouble with this philosophy is that
most programs are not written once and then forgotten;
they are read and modified, over and over.
As a result, all too often in C++,
You pay — hard — for things you forget to use.
As Donald Knuth might say,
Premature optimization is the root of all evil (in programming).
Is there a safer way?
Of course there is, especially if you’re willing to trust the optimizer
to remove things you don’t need,
and focus on solving the problem reliably.
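Even in C++, the author of a struct can opt in to this kind of check, though nothing in the language nudges them to. A minimal sketch, assuming the same hypothetical int and std::string stand-ins for Type1 through Type4: giving Thing a constructor that takes every field means it is no longer an aggregate, so a three-value initializer stops compiling. The static_assert lines use std::is_constructible_v (C++17) to show this without actually breaking the build.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for the article's Type1..Type4.
struct Thing {
    int first_field;
    int second_field;
    std::string third_field;
    int fourth_field;

    // Declaring this constructor means Thing is no longer an aggregate,
    // so every construction has to supply all four values.
    Thing(int first, int second, std::string third, int fourth)
        : first_field(first),
          second_field(second),
          third_field(std::move(third)),
          fourth_field(fourth) {}
};

// Three arguments no longer match any constructor...
static_assert(!std::is_constructible_v<Thing, int, int, std::string>);
// ...while four arguments do.
static_assert(std::is_constructible_v<Thing, int, int, std::string, int>);

// The only form that still compiles: all four fields spelled out.
Thing make_thing() {
    return Thing{1, 2, "three", 4};
}

void check_make_thing() {
    const Thing one = make_thing();
    assert(one.third_field == "three");
    assert(one.fourth_field == 4);
}
```

The catch is that this is extra boilerplate the struct's author has to remember to write and keep in sync with the fields, which is rather the point of the complaint.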
Ada
Let’s look at what happens when I try this in Ada.
with Ada.Text_IO;

procedure Test_Ada is
   type Thing is record
      First, Second: Integer;
   end record;
   One: Thing := ( First => 4 );
begin
   Ada.Text_IO.Put_Line(One.First'Image & ", " & One.Second'Image);
end Test_Ada;
This fails at compile time:
$ gnatmake test_ada.adb
gcc -c test_ada.adb
test_ada.adb:7:18: error: no value supplied for component "Second"
This forces you to define the Second field.
If you don’t care what value it has, just initialize it to the default:
One: Thing := ( First => 4, Second => <> );
(<> is Ada shorthand for “default value”,
and is called “the box” on account of its looks.)
This now compiles and runs as expected.
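(Interestingly, the output on my machine depends on the optimization level:
Integer has no default value, so the box leaves Second uninitialized.
That does not happen with the C++ equivalent, where the omitted field is
value-initialized and so always comes out as 0.)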
Rust
The equivalent Rust code would be:
#[derive(Debug)]
struct Thing {
   first: isize,
   second: isize,
}

fn main() {
   let one: Thing = Thing { first: 4 };
   println!("{:?}", one);
}
Like Ada, Rust refuses to compile this.
Unlike Ada, Rust gives a characteristically verbose error message:
$ rustc test_rust.rs
error[E0063]: missing field `second` in initializer of `Thing`
 --> test_rust.rs:8:21
  |
8 |    let one: Thing = Thing { first: 4 };
  |                     ^^^^^ missing `second`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0063`.
Getting this to compile when you don’t care about second’s value
is a little harder in Rust, but not much.
You’ll need to derive the Default trait,
then call it explicitly. Modify the following lines:
#[derive(Debug, Default)]
let one: Thing = Thing { first: 4, ..Thing::default() };
Again, this compiles and runs as expected.
Summary
Requiring a struct’s user
to define all its fields, as Ada and Rust do,
prevents the bugs that creep in when a field is added during an API revision.
It may carry a very small run-time cost,
but if you really don’t need a field every time,
then perhaps you don’t so much need a struct as a union,
or, if you want to use a safer, more modern C++, a std::variant.
Unfortunately, the C++ language committee must not like that approach, as
it went out of its way to make std::variant incredibly painful to use,
a topic I’ll visit at some point in the future.
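In the meantime, here is a minimal sketch of the struct-versus-variant idea, assuming C++17. Everything in it (Anonymous, Authenticated, Session, user_name) is a made-up example rather than anything from the API described above: instead of one struct whose fields are only sometimes meaningful, each case gets its own type, and the variant records which case is live.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical example, not from the article: a session either has no
// credentials at all, or has both a user name and a token.
struct Anonymous {};
struct Authenticated {
    std::string user;
    std::string token;
};

using Session = std::variant<Anonymous, Authenticated>;

// Returns the user name, or "guest" when the session carries no credentials.
std::string user_name(const Session& session) {
    if (const auto* auth = std::get_if<Authenticated>(&session)) {
        return auth->user;
    }
    return "guest";
}

// Both alternatives in use; neither has a field that is only "sometimes needed".
void check_sessions() {
    assert(user_name(Session{Anonymous{}}) == "guest");
    assert(user_name(Session{Authenticated{"ada", "t0k3n"}}) == "ada");
}
```

Note that Authenticated is still an aggregate, so it can be under-initialized in exactly the way Thing was; the variant only removes the excuse for fields that are optional by design.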