Borrowing in Rust
I wrote previously that I was working a lot in Rust at work, and hinted
that while I like it, certain design decisions disappoint me.
One of these has to do with Rust’s most notorious difficulty:
the borrow checker.
It is not my intention to attempt a full explanation of the borrow checker.
Suffice it to say that data in Rust is either owned or borrowed:
a variable either owns its data or borrows it from another variable.
This concept turns out to be quite powerful and effective.
That being the case, let me pose a question:
is the following code correct?
let mut result = 0;
for row in matrix.iter() {
    for element in row.iter() {
        result += element;
    }
}
The reader is correct to suspect that this is a trick question.
Follow-up question: what about the following?
let mut result = i32::MIN;
for row in matrix.iter() {
    for element in row.iter() {
        result = result.max(element);
    }
}
Why is this a trick question?
The reader unfamiliar with Rust, but familiar with type-checking,
may wonder whether it has something to do with the types of result and element.
You would be correct! In fact, you would be more correct
than at least a few moderately experienced Rustaceans I’ve observed,
whose stated approach is to just type and let the IDE’s language plugin
correct them when they’re wrong.
I could be wrong, but relying on the IDE as a crutch strikes me as suggesting
the presence of unneeded conceptual complexity.
I’m not dealing with morons here; these are very, very smart people.
The experienced Rustacean reader hopefully knows which code is incorrect,
and how to correct it:
let mut result = i32::MIN;
for row in matrix.iter() {
    for element in row.iter() {
        result = result.max(*element);
    }
}
If you don’t see the difference between the second and third examples,
look carefully.
It’s there.
See it yet? It’s the absence (or presence) of an asterisk right before element in line 4.
Whether you put one there depends, somewhat perversely,
on whether the function you call borrows its argument or takes it by value
(moving it or, for a Copy type such as i32, copying it).
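To make the difference between the first two examples concrete, here is a minimal sketch, assuming matrix is a Vec<Vec<i32>> (the snippets above never state its type); the function names sum_all and max_all are hypothetical. The first loop compiles with a bare reference because the standard library implements AddAssign<&i32> for i32; the second needs the asterisk because Ord::max is declared as max(self, other: Self), so it wants an i32, not a &i32.

```rust
// Assumes `matrix` is a Vec<Vec<i32>>; the function names are hypothetical.

/// Sums every element. `element` is a `&i32`, yet `+=` accepts it as-is,
/// because the standard library implements `AddAssign<&i32>` for `i32`.
fn sum_all(matrix: &[Vec<i32>]) -> i32 {
    let mut result: i32 = 0;
    for row in matrix.iter() {
        for element in row.iter() {
            result += element;
        }
    }
    result
}

/// Finds the largest element. `Ord::max` takes `other: Self`, i.e. an `i32`
/// by value, so the `&i32` has to be dereferenced with `*`.
fn max_all(matrix: &[Vec<i32>]) -> i32 {
    let mut result = i32::MIN;
    for row in matrix.iter() {
        for element in row.iter() {
            result = result.max(*element);
        }
    }
    result
}

fn main() {
    let matrix = vec![vec![3, -7], vec![10, 5]];
    println!("sum = {}, max = {}", sum_all(&matrix), max_all(&matrix));
}
```

Nothing in the loops themselves signals which form is required; the answer lives in the signatures of AddAssign and Ord::max.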
To illustrate this problem further, the following code’s correctness
depends entirely on whether matrix itself borrows or owns its data:
let mut result = f32::MIN;
for row in matrix.into_iter() {
    for element in row.into_iter() {
        result = result.max(*element);
    }
}
For example, if matrix is of type &Vec<Vec<f32>>, you need an asterisk:
it’s borrowed, which means element is a reference,
and you have to dereference it to get the same type as result.
But if matrix is of type Vec<Vec<f32>>, no asterisk is needed:
it’s owned, so there’s no need to dereference.
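Here is the same loop written twice, once for each matrix type mentioned above, as a minimal sketch; the function names max_borrowed and max_owned are hypothetical, and result starts at f32::MIN so that its type matches the elements. The loop bodies are identical except for the asterisk: calling into_iter() through a &Vec yields references, while calling it on an owned Vec yields the values themselves.

```rust
// Hypothetical helpers: identical loops, differing only in the matrix type.

/// `matrix` is borrowed: `into_iter()` on `&Vec<Vec<f32>>` yields `&Vec<f32>`,
/// and on `&Vec<f32>` yields `&f32`, so `element` must be dereferenced.
fn max_borrowed(matrix: &Vec<Vec<f32>>) -> f32 {
    let mut result = f32::MIN;
    for row in matrix.into_iter() {
        for element in row.into_iter() {
            result = result.max(*element);
        }
    }
    result
}

/// `matrix` is owned: `into_iter()` consumes it, yielding `Vec<f32>` rows
/// and then plain `f32` elements, so no dereference is needed (or allowed).
fn max_owned(matrix: Vec<Vec<f32>>) -> f32 {
    let mut result = f32::MIN;
    for row in matrix.into_iter() {
        for element in row.into_iter() {
            result = result.max(element);
        }
    }
    result
}

fn main() {
    let matrix = vec![vec![1.5, -2.0], vec![3.25, 0.0]];
    let borrowed = max_borrowed(&matrix); // `matrix` is still usable afterward
    let owned = max_owned(matrix); // `matrix` is moved here
    println!("{borrowed} {owned}");
}
```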
Clear?
Apparently it isn’t, since, as I say, I routinely see moderately
experienced Rustaceans let the IDE or compiler correct them.
The effect is such that when I asked a very experienced senior developer
the difference between iter() and into_iter(), he got it wrong:
he simply tended to use .into_iter() unless the IDE or compiler told him it was incorrect.
The reality is exactly backwards (iter() borrows the elements, while into_iter()
on an owned collection moves them), though to be fair the Rust manual
goes out of its way to tell you in symbols, rather than words.
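A short sketch of the distinction, using a vector of String so that moves are visible; the function names lengths and shout are hypothetical. Per the standard library’s IntoIterator implementations, iter() yields &T; into_iter() on an owned Vec<T> yields T and consumes the vector; and into_iter() on a &Vec<T> yields &T again, which is why into_iter() in the matrix example can produce either references or values.

```rust
// Hypothetical helpers illustrating iter() versus into_iter().

/// Borrows: `iter()` yields `&String`, so the caller keeps its vector.
fn lengths(words: &Vec<String>) -> Vec<usize> {
    words.iter().map(|w| w.len()).collect()
}

/// Consumes: `into_iter()` on an owned `Vec<String>` yields each `String`
/// by value, and the vector itself is moved into this function.
fn shout(words: Vec<String>) -> Vec<String> {
    words.into_iter().map(|w| w.to_uppercase()).collect()
}

fn main() {
    let words = vec![String::from("borrow"), String::from("move")];

    let lens = lengths(&words);

    // On a reference, `into_iter()` yields references, just like `iter()`.
    let refs: Vec<&String> = (&words).into_iter().collect();
    println!("{lens:?} {refs:?}");

    let loud = shout(words); // `words` is moved; using it after this line won't compile
    println!("{loud:?}");
}
```

The only place the difference shows up is in the types, written as symbols: &Vec<String> versus Vec<String>.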
Preferring symbols to words is characteristic of Rust:
like C, it prefers that you memorize the meanings of & and *,
but unlike C, &* is necessary, because…
Well, that gets us to disfavored design decision #2, which is related
to the convoluted explanation I had to give above:
Rust notation conflates concepts with implementation
Rather than use the & to indicate the concept of borrowed data,
Rust’s designers used it to indicate the implementation:
you have to use it to borrow data, but it’s not a designation of the borrowing concept;
it’s a designation of the tool used to borrow: references.
Both the Rust manual and many Rustaceans will tell you that the two are one and the same,
but we saw above that this is not in fact the case.
In a similar way, * doesn’t mean “use borrowed data”;
it means “dereference referenced data.”
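A minimal sketch of that distinction: * follows a pointer whether or not anything is borrowed. It works on a reference, but equally on a Box<i32>, which owns its contents. Method receivers are also dereferenced automatically, which is why r.pow(2) needs no asterisk while the argument to max does. The function name deref_demo is hypothetical.

```rust
/// Hypothetical demo returning the values it computes.
fn deref_demo() -> (i32, i32, i32, i32) {
    let x: i32 = 5;

    // `&` creates a reference; `*` dereferences it back to an i32.
    let r: &i32 = &x;
    let from_ref: i32 = *r;

    // `*` also works on a Box, which owns its value: nothing is borrowed here.
    let boxed: Box<i32> = Box::new(7);
    let from_box: i32 = *boxed;

    // Method receivers are auto-dereferenced, so `r.pow(2)` needs no `*`...
    let squared: i32 = r.pow(2);

    // ...but ordinary arguments are not: `Ord::max` wants an i32, hence `*r`.
    let larger: i32 = from_box.max(*r);

    (from_ref, from_box, squared, larger)
}

fn main() {
    println!("{:?}", deref_demo()); // (5, 7, 25, 7)
}
```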
The necessity of a * when you want to use borrowed data depends entirely
on whether the client is willing to accept a reference to the data,
or insists on having the data moved to it.
And if the client insists on having the data moved to it, then God help you,
because the only way out of that, if you still need the data afterward,
is a .clone(), assuming you have one available.
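A minimal sketch of that escape hatch, with a hypothetical function total that insists on owning its argument: passing a .clone() hands over a copy and leaves the original usable, which only works because Vec<i32> implements the Clone trait.

```rust
/// Hypothetical function that takes ownership of its argument.
fn total(values: Vec<i32>) -> i32 {
    values.into_iter().sum()
}

fn main() {
    let values = vec![1, 2, 3];

    // Hand over a clone: the copy is moved in, the original stays put.
    let first = total(values.clone());
    println!("{first} {values:?}");

    // Hand over `values` itself: it is moved, and any later use
    // of `values` would be rejected by the compiler.
    let second = total(values);
    println!("{second}");
}
```

The cost is a full copy of the vector, paid only so that the caller can keep something the callee never needed to own.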
That brings us to disfavored design decision #3: the default action in Rust
is to move data, rather than to borrow it.
According to one of the Rust designers’ answers on Reddit,
this was a deliberate choice made to suit their preferred programming style.
Very well, but there are an awful lot of &’s littering most Rust code I’ve seen;
in most cases I work with, I use an & to pass the parameter, because borrowing is more or less painless:
I almost never need to “drop” data, to use Rust parlance,
and one of Rust’s strengths is precisely that the compiler can always figure out
when it can drop the data.
So there’s no practical reason for me to move data, and I suspect
that’s the case for many others as well.
To conclude, I’d like to compare this to one of Ada’s design decisions that I really appreciate:
procedures accept parameters in one of three modes:
in, which means the parameter is to be read but not modified;
out, which means the parameter is to be assigned but not read; and
in out, which means what you hopefully expect: the parameter can be both read and assigned.
A common question one sees in Ada forums is,
How do I tell the compiler I want to pass the parameter by reference?
The typical answer is:
You don’t. That’s not your job.
The programmer’s job is to specify the solution to a problem
at a conceptual level, and
not to micromanage the details.
There is some not entirely invalid criticism of this, in that
if you look at the following Ada code, it’s not clear
what mode Do_Something_With accepts for Thing:
declare
   Thing: Integer;
begin
   Do_Something_With(Thing);
end;
But that’s beside the point, I think.
Perhaps we can agree that we should praise Rust’s designers for deciding
that the corresponding Rust code would be
let mut thing = 0;
do_something_with(&mut thing);
…if we want to change thing, or
let thing = 0;
do_something_with(&thing);
…if we don’t; or even
let thing = 0;
do_something_with(thing);
…if we never intend to use thing again.
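One caveat: an i32 like thing is a Copy type, so passing it by value copies it rather than moving it, and it remains usable afterward. The sketch below, with hypothetical stand-ins bump, doubled, and consume for do_something_with, uses a String for the by-value case so that an actual move takes place; the Ada correspondences in the comments are loose analogies, not exact equivalents.

```rust
// Hypothetical stand-ins for the three flavors of `do_something_with`.

/// Roughly Ada's `in out`: may read and modify the caller's value.
fn bump(thing: &mut i32) {
    *thing += 1;
}

/// Roughly Ada's `in`: may read, but not modify.
fn doubled(thing: &i32) -> i32 {
    *thing * 2
}

/// Takes ownership: the caller gives its String away.
fn consume(thing: String) -> usize {
    thing.len()
}

fn main() {
    let mut thing = 0; // `mut` is required to lend it out as `&mut`
    bump(&mut thing);
    let twice = doubled(&thing);

    let name = String::from("thing");
    let length = consume(name); // `name` is moved; using it afterward won't compile

    let copy = thing; // i32 is Copy, so `thing` is still usable after this
    println!("{thing} {twice} {length} {copy}");
}
```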
I do appreciate that sort of explicit designation.
Though I do wish they’d used borrow instead of that ampersand, or perhaps move
with a different design… but you go to code with the language you have,
not the language you wish you had… and I suppose that if I work with Rust
long enough, I may come to appreciate the decision to make moving the default.