A first Rust macro

A while back, I left the university for a job in software development. I’m glad I made the change, and will write more about it later, though the basic reasoning can be found here.

In my new job I am nominally a Java programmer, but mainly I’ve been writing Rust. In general, I like it, but some of the design decisions irk me, and Ada remains my favorite.

That discussion can wait for later. For now, I want to show off my first Rust macro. I’ll explain what that is in a moment.

The problem

My motivation for writing the macro comes from my experience with languages like Python and Ada, where one can initialize a list or array in one line.

For instance, in Python you’d write:

l = [ i * i for i in range(n) ]

…while in Ada 2022 you’d declare:

L: array ( 0 .. N - 1 ) of Integer := ( for I in 0 .. N - 1 => I * I );

The current edition of Rust has no comprehension syntax like this. With a loop, you have to do it in two steps:

let mut l = vec![];
for i in 0..n { l.push(i*i); }
// or maybe this instead
let mut l = Vec::with_capacity(n);
for i in 0..n { l.push(i*i); }

There are a few ways to write the loop, but they all require at least two lines: one to create the vector (possibly filled with 0’s), and one to fill it with the desired values. (So saith StackOverflow, at any rate.)

(Some Rust-familiar readers may object that one can, in fact, initialize some vectors in just one line. For instance: let l = vec!( 0, 1, 4, 9, 16 ); This is correct, so long as you know exactly how many elements you need at compile time, and don’t mind typing out their values exhaustively. However, if you don’t know the exact size at compile time, or don’t want to list the elements exhaustively, the vec! literal won’t do.)
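
For comparison, the standard library's iterator adaptors can express the same computation without an explicit loop. It isn't comprehension syntax, but it does fit in one expression. This is only a sketch; the function name squares is hypothetical and exists just to give the expression a home.

```rust
// Build the vector of squares 0*0, 1*1, ..., (n-1)*(n-1) with iterators.
// `squares` is a hypothetical name used only for this illustration.
fn squares(n: usize) -> Vec<usize> {
    // map each index to its square, then collect the results into a Vec
    (0..n).map(|i| i * i).collect()
}
```

The macro in this post wraps the same idea in a shorter, closure-like notation.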

The tool: Macros

Unlike Ada or Python, Rust offers a facility to “extend” the language, so to speak, via macros. You can think of a macro as a script that the compiler executes to replace some source code with other source code. If you’re familiar with C, then yes, it’s a lot like the C preprocessor, only safer and more powerful, just as almost everything in Rust is safer than in C. In particular, Rust hands a procedural macro the relevant tokens of the source code, which you can parse into a syntax tree and mess with to your heart’s content.
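
To make “replace some source code with other source code” concrete, here is a tiny declarative macro. It is a sketch for illustration only; the names square and squares_demo are hypothetical and do not appear elsewhere in this post.

```rust
// A declarative macro: the compiler rewrites each `square!(e)` into the
// block on the right-hand side before the program is type-checked.
macro_rules! square {
    ($e:expr) => {{
        // evaluate the argument once, then multiply it by itself
        let v = $e;
        v * v
    }};
}

// The same macro works for any type that supports `*`.
fn squares_demo() -> (i32, f64) {
    (square!(3 + 1), square!(1.5))
}
```

Because $e is matched as a whole expression, square!(3 + 1) is 16. The analogous C preprocessor macro written without parentheses would expand to 3 + 1 * 3 + 1, which is 7.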

I needed to learn how to write macros, and the examples I found online were absolutely inadequate: either they were too trivial to be useful (looking at you, Rust By Example!) or they were too complicated and unhelpful for the problem I wanted to solve.

At first I tried attacking it with a declarative macro. I had trouble with it for a while, then concluded that I had to use a procedural macro. That conclusion was wrong, as it turns out; a colleague worked it out while I was struggling with procedural macros. It wasn’t quite what I wanted, so together we got it to where I wanted.

I won’t show that result here, in part because I’m more interested in procedural macros, which are more flexible than declarative macros. Instead, I’ll show the third version of the procedural macro solution.
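
For the curious, a declarative macro for this job can be fairly short. What follows is only a minimal sketch under my own assumptions, not the version described above: the name form_vec_decl is hypothetical, and it uses push rather than anything clever.

```rust
// Hypothetical declarative counterpart, for illustration only.
// Accepts the closure-like form `form_vec_decl!( size, | var | formula )`.
macro_rules! form_vec_decl {
    ( $size:expr, | $var:ident | $formula:expr ) => {{
        // evaluate the size expression exactly once
        let size: usize = $size;
        let mut tmp = Vec::with_capacity(size);
        // `$var` comes from the call site, so `$formula` can refer to it
        for $var in 0..size {
            tmp.push($formula);
        }
        tmp
    }};
}
```

Calling form_vec_decl!( 5, | x | x * x ) yields the squares 0, 1, 4, 9, 16. The procedural version is longer, and the rest of this post is about that one.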

This may look complicated, but it really isn’t. I’ve included some comments to explain what’s going on, and will add some commentary as well.

The solution

This comes from a crate I’m writing called macros. First, the all-important macros/Cargo.toml:

[package]
name = "macros"
version = "0.1.0"
edition = "2021"

[lib]
proc-macro = true

[dependencies]
syn = { version = "1.0.92", features = [ "full" ] }
quote = "1.0.18"

Next, in macros/src/lib.rs, the macro definition:

// formula-defined vectors

// necessary to define procedural macros
extern crate proc_macro;

// importing other crates' definitions
use proc_macro::TokenStream;
use quote::quote;
use syn::{
    Expr,
    ExprClosure,
    parse::{
        Parse,
        ParseStream,
    },
    Token,
};

// this struct helps keep the information we extract from the syntax tree
struct FormVec {
    size: Expr,           // the vector's desired size
    formula: ExprClosure, // the formula defining the vector's elements
}

// this implements the `Parse` trait by implementing its required
// `parse` function; as a result, we can parse a formula vector
// in a natural, Rust-like idiom:
//    form_vec!( <size>, | <var> | <formula> )
// This construct appears several other places in the code; for instance,
// the `fold` function expects arguments of the form
//    .fold( <init>, | <acc>, <item> | <expr> )
impl Parse for FormVec {

    fn parse(input: ParseStream) -> syn::Result<Self> {
        let size: Expr = input.parse()?;
        input.parse::<Token![,]>()?;
        let formula: ExprClosure = input.parse()?;
        Ok(FormVec { size, formula } )
    }

}

#[proc_macro]
/// Read the form
///    form_vec!( <size>, | <var> | <formula> )
/// and create a vector of given size according to the given formula.
/// # Examples
/// ```
/// let vec = form_vec!( 5, | x | x * x );
/// assert_eq!( vec, vec!( 0, 1, 4, 9, 16 ) );
/// ```
pub fn form_vec(input: TokenStream) -> TokenStream {

    // parse `input` according to the `parse` function defined above
    let fv: FormVec = syn::parse(input).unwrap();
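    // get the information
    let size = fv.size;
    let closure = fv.formula;
    let var = closure.inputs.first().unwrap();
    // `output` is a `ReturnType`, not a bare type: use the declared type
    // when the closure has `-> T`, and let the compiler infer it otherwise
    let var_type = match closure.output {
        syn::ReturnType::Type(_, ty) => quote!(#ty),
        syn::ReturnType::Default => quote!(_),
    };
    let formula = closure.body;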

    // replace the code
    let result = TokenStream::from( quote!(
        {
            // initialize a new vector
            let mut tmp = Vec::<#var_type>::with_capacity(#size);
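            // avoid initializing with useless data
            // caution: tmp is allocated w/capacity #size, but `set_len`
            // also requires the new elements to be initialized, and they
            // are not until the loop below runs; this breaks for element
            // types with destructors (e.g. String)
            unsafe {
                tmp.set_len(#size);
            }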
            // apply formula to the entries
            for #var in 0..#size {
                tmp[#var] = #formula
            }
            tmp
        }
    ) );

    result

}

The test code

This comes from a crate called macro_testing. Again we start with macro_testing/Cargo.toml.

[package]
name = "macro_testing"
version = "0.1.0"
edition = "2021"

[dependencies]
macros = { path = "../macros" }

And now the main program in macro_testing/src/main.rs:

use macros::form_vec;

fn main() {
    let n = 10;
    let result = form_vec!( n, | x | x * x );
    assert_eq!( result, vec!( 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 ) );
    println!("assertions pass!");
}

When we build and run with cargo run --release, we see this output:

$ cargo run --release
   Compiling proc-macro2 v1.0.39
   Compiling unicode-ident v1.0.0
   Compiling syn v1.0.95
   Compiling quote v1.0.18
   Compiling macros v0.1.0 (/home/cantanima/common/rust/macros)
   Compiling macro_testing v0.1.0 (/home/cantanima/common/rust/macro_testing)
    Finished release [optimized] target(s) in 9.06s
     Running `target/release/macro_testing`
assertions pass!

In addition, cargo clippy offers not one complaint. Time to celebrate!

Additional commentary

Some additional comments:

I really like the result. It looks like ordinary Rust code. In principle, a macro could make Rust look entirely un-Rust-like. Part of the “fun” of some C and C++ programming is trying to figure out what a programmer’s trying to say, and heavy use of the preprocessor does not help. So while I like this, macros can damage readability.
While I used a fairly simple closure (| x | x * x), I’ve tested this with more complicated closures (e.g., | x | { /* complicated stuff */ }).
If you’re new to Rust, it isn’t necessary to list each import from the same crate on a separate line, the way I did with the imports from syn. In fact, most everything here could be written on one line, but it would be a lot less readable.
Notice how the parse function proceeds methodically through the token stream:
```
let size: Expr = input.parse()?;
input.parse::<Token![,]>()?;
let formula: ExprClosure = input.parse()?;
Ok(FormVec { size, formula } )
```
First it tries to parse an expression (Expr). Next, a comma. Then, a closure. The syn crate is helping us out here.
For those unfamiliar with Rust, the .unwrap method is often used when a function might fail.
- Such functions return data of type Result, which has value either Ok or Err.
- Typically, one ought to consider these possible values, and act accordingly, but it’s often the case that if something returns Err then you might as well give up. In that case, you .unwrap, which gives you the underlying value when it’s Ok; if instead it’s Err, the program panics.
- The way to avoid an immediate panic is to handle the error yourself, or to append the question mark, as I did here. On an error, this quits the parser immediately and passes the error to the caller (here syn::parse, which hands it on to form_vec). It is then the caller’s responsibility to handle the error; form_vec simply calls .unwrap, so a malformed invocation makes the macro panic, which the compiler reports as a compile error.
- Notice that the parse function returns a Result type; if it didn’t, we couldn’t use the question mark here; we’d have to handle the potential error immediately. Likewise, we would not return an Ok at the end of the function.
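
Here is a small sketch of those options outside of macro-land, using only the standard library. The function names parse_pair and sum_or_zero are hypothetical.

```rust
// Returns a `Result`, so the question mark is allowed inside: on the
// first failed parse, the error is returned to the caller immediately.
fn parse_pair(a: &str, b: &str) -> Result<(i32, i32), std::num::ParseIntError> {
    let x: i32 = a.trim().parse()?;
    let y: i32 = b.trim().parse()?;
    Ok((x, y))
}

// A caller that handles the error instead of panicking.
fn sum_or_zero(a: &str, b: &str) -> i32 {
    match parse_pair(a, b) {
        Ok((x, y)) => x + y,
        Err(_) => 0, // fall back to a default rather than calling `.unwrap()`
    }
}
```

Calling parse_pair("2", "x").unwrap() would panic instead, because the second parse fails and there is nothing to unwrap.
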
Right after the call to syn::parse, form_vec extracts the relevant data that we parsed.
The quote! macro generates the new code. That’s essentially the Rust code that we want, except that we prefix substitutions with #: for instance, #var_type and #size in the line that calls Vec::with_capacity. This substitutes the values we parsed earlier.
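This particular implementation uses “unsafe” code to avoid initializing the vector’s elements with data (e.g., 0) that we will throw away the moment we apply our formula, or alternately to avoid using a .push method on the vector, which checks the remaining capacity on every call. (Indexing with tmp[#var] is bounds-checked too, so any saving is probably small.) The .set_len method is considered unsafe, and although we were careful to allocate the needed capacity right above it, its documentation also requires the new elements to be initialized before the call, which they are not here. That matters most for element types with destructors, such as String, where the assignment would try to drop garbage. If you don’t like to see unsafe in Rust code, delete the unsafe block and replace the assignment tmp[#var] = #formula with tmp.push(#formula). (That was how my original implementation worked.)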
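
To see what the push-based variant amounts to, here is a hand-written equivalent of the code it would generate for a closure that builds Strings, a type that owns heap memory. This is a sketch, not actual macro output: the function name expanded_labels is hypothetical, and the macro produces a block expression rather than a function.

```rust
// Hand-written stand-in for the expansion of a push-based
//     form_vec!( n, | x | format!("item {}", x) )
// `expanded_labels` is a hypothetical name for this illustration.
fn expanded_labels(n: usize) -> Vec<String> {
    // reserve room for all n elements up front, so `push` never reallocates
    let mut tmp = Vec::<String>::with_capacity(n);
    for x in 0..n {
        // `push` writes into spare capacity and bumps the length itself,
        // so no element is ever observed uninitialized
        tmp.push(format!("item {}", x));
    }
    tmp
}
```

No unsafe is needed here, and the element type is free to have a destructor.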