Struct show_notes::e029::Script

source ·

pub struct Script;

Expand description

e029: I’m Out To C

Hello, I’m Chris Krycho and this is New Rustacean: a show about the Rust programming language and the people who use it. This is e029: I’m Out To C.

Manning is back sponsoring the show this episode! Once again they have a discount on some Rust video content they’re producing with Carol Nichols || Goulding and Jake Goulding, Rust In Motion—the very same we talked about when I interviewed them back in 2018! I’ve been saying for years that we need more video content: it’s a huge way people learn. Manning’s Rust in Motion is one of the first major video courses for Rust I know of (though I’m happy to report other video content is starting to appear as well). You can get it for 40% off at deals.manning.com/new-rustacean – there’s a link in the show notes. That link actually gives you 40% off of anything at Manning, including their book Rust in Action, which is in early access preview. Thanks so much to Manning for sponsoring the show and building some great video content with Carol and Jake!

C FFI

Now, let’s talk about Rust’s foreign function interface, or FFI, and specifically its C foreign function interface. For further materials on all of this, you’ll want to take a look at The Rust Programming Language’s materials, the API and the Reference’s writeups on extern, and the nomicon section on FFI. And speaking of Jake Goulding, his Rust FFI Omnibus a nice complement to those! I’ve linked them all in the show notes!

One of the key selling points for Rust is that you don’t have to just replace your existing code in other languages, including in C or C++, but can interoperate freely with them via C APIs—since nearly all modern languages have a C foreign function interface, or FFI. I’m actually going to talk a fair bit about that in the next teaching episode, both for extending other languages with Rust and for Rust calling into other languages. For today, though, we’re looking specifically at C and C++, because the interop for those other languages builds on what we have here.

By definition, C has a C API. And C++ libraries often have C APIs or can be expressed as C APIs given a certain amount of finagling. But interacting with a C API means you have to write a bunch of Rust bindings for that API. A binding is a description to the Rust compiler of what the API looks like: a set of declarations of the types of both data structures and functions. It’s important that we get these right, because if we don’t, stuff will explode on us.

Throughout the rest of this episode, I’m going to teach you a tiny bit of C, because in order to work with Rust’s FFI tooling, you’re going to need to understand at least a little bit of C. To do it really well when starting from scratch for interop with other languages, you’ll need to understand more, but in many cases there are tools already that mean you don’t have to do all the work yourself, which helps a lot. I’ll say more on that in an upcoming Crates You Should Know!

`extern`

The first thing we need to talk about is Rust’s extern keyword, which lets you define an external item: blocks, functions, types, and crates. I’ve linked to [the Reference’s writeup on extern][extern-reference], and it’s worth reading carefully.

You may recall that extern in the Rust 2015 Edition was required to declare all crates outside the current crate, with the extern crate syntax. This told the Rust compiler “Hey, this name isn’t part of this crate; you’ll need to look it up somewhere else!” Rust no longer needs that for crates in the 2018 Edition, as I covered in the news episodes where I talked about “path clarity.” However, we do still need the extern keyword, because there are functions and types we need to tell Rust to look up “outside”: no surprise given today’s topic, those are for FFI!

There are two forms where we use FFI. The first, which we’ll focus on in this episode, is extern blocks, which you use when calling out into something with a C ABI from Rust. The second is the extern modifier on an item declaration; we’ll talk about that in the next teaching episode, when we cover how you can use Rust from other languages.

In either of those positions, the extern keyword may optionally be followed by an ABI definition. An ABI is an application binary interface. Just as an API—an application programming interface—tells you as a programmer how to connect your code with other code, an ABI defines for the compiler how to connect different binaries together. Put another way: as an API is to source code, an ABI is to machine code. Some examples of the kinds of things an ABI has to define:

how do data structures get laid out in memory?
how are functions named when compiled?
what is the processor instruction set to be used?
how do system calls work in this particular context?
how is the stack laid out?

There are more details than that; the point is that those machine-level details are super important and you have to define them somehow. Rust has its own ABI, of course, though it’s not something you ever have to worry about unless you’re working on the compiler itself. What’s more, Rust’s ABI can and does change. Other, “stable” ABIs exist, though, which don’t change over time, and that is useful when you need to interact with code that compiles down to that format.

You specify which ABI you want to compile against with a string literal after extern., e.g. to use the stdcall ABI on Windows, you’d write The most common one is the C ABI!

There are two special cases to know about with extern:

When you write fn foo() (or indeed any item declaration), that’s equivalent to writing extern "Rust" fn foo(), specifying that this function will have the Rust ABI calling convention.
Because C FFI is far and away the most common, extern is short for extern "C", both in blocks and in function declarations.

An extern block is exactly what it sounds like: a block which is marked with extern (and optionally an ABI string). The two big takeaways here are:

All functions defined in an extern block are inherently unsafe, so even though you don’t have to and in fact can’t write unsafe on those declarations.
For all FFI, you have to wrap the definitions in an extern block.

So, speaking of definitions… let’s dive in and talk about how you actually define extern functions and interact with them!

The Basics

Any time we’re interoperating with something via a C API, we’re dealing with functions: there’s no other reason or way to interact across this kind of boundary!

We’ll start with a basic, apparently trivial example: calling out to C to perform addition. The C function signature is:

int add(int a, int b);

The body does what you expect: returns a + b.

On the Rust side, we need to provide a declaration (but of course no definition) so we know how to call that function. We do that with an extern block and a function type with no body—somewhat like a trait method definition.

use std::os::raw::c_int;

extern {
    fn add(a: c_int, b: c_int) -> c_int;
}

Notice the types here: c_int, not i32. Now, c_int is nearly always an i32, but it’s not guaranteed to be anything of the sort, so we use these types instead. The types are defined in two places: std::os::raw and the libc crate. The definitions are the same; the only reason I know of to prefer getting them from libc is if you’re in an environment without std, e.g. on an embedded system. Using c_int will make it so that if you happen to be on a system where it’s not an i32 and you call it with an i32 without an explicit cast, you’ll get an error. Handy!

Building C libraries with `build.rs`

Now, to make this work, you have to build the C you actually want to call. To do this, you’ll need to either use a tool outside of Cargo like Make, or use Cargo’s “build file” tool. For today, we’re just going to use the latter. In many cases, you’ll be using projects which already have their own build system, and will need to do some more work to tell rustc how to link those items into your app or library. The various references I mentioned at the top of the show all have info on how to do that.

For our purposes, I’m going to use this as an opportunity to talk through Cargo build files!

If you define a build key pointing to a Rust file in your package config in the Cargo.toml file, Cargo will compile and execute that Rust program before building the rest of your program. Conventionally, you can simply supply a file named build.rs and Cargo will do the same thing automatically, without even needing to specify the build key. If your build process is sufficiently complicated, you can just set up a build directory, make the root build/main.rs, and have a full Rust program in that directory which your build key points to. It’s quite powerful!

You can also specify [build_dependencies], which will only be used by the build.rs file. In the show notes for this episode, you’ll find a sample build.rs file which is using the cc crate to build a small bit of C code and set it up in the right spot to be statically linked into our program. The cc crate does the hard work of abstracting over the differences between C compilation environments on different platforms, so you don’t have to worry about working out those different invocations for gcc and clang and msvc and so on.

Structs

So that’s the basics for our add function, though I’ll have more to say about it in a moment.

Let’s say we had a very traditional struct type in C, just a plain old record—we’ll go with a Point because that’s such a common example. Let’s also imagine we wanted the ability to translate the point, to change the coordinates from one location to another. In C, we’d write that like this:

typedef struct Point {
    float x;
    float y;
} Point;

void translate(Point * point, float byX, float byY) {
    point->x += byX;
    point->y += byY;
}

Writing Point* point means translate requires a pointer to a given Point piece of data (but it makes no guarantees that there’s actually a Point behind it, only that C will treat the data layout as consistent with a Point). The net of this declaration is that we are taking in a pointer to a Point and can freely do whatever we want to the memory behind that pointer—quite unlike in safe Rust. We could also here mutate the pointer itself, pointing it to some other point in memory. If we wanted to make it so you couldn’t change what the pointer pointed to, we’d write it Point *const point. If we wanted to make it so we couldn’t change the contents of the point but could change what it pointed to, we’d write const Point * point. If we wanted to make it so we couldn’t change either, we’d write const Point *const point. Unfortunately, C would also let us cast away all of that attempted safety and just do whatever we wanted. In this case, we do the right thing: we just update the values in the point.

Now we need to write those same definitions in Rust. First, the Point type. This looks pretty much like what you’d expect:

use std::os::raw::c_float;

#[repr(C)]
pub struct Point {
    pub x: c_float,
    pub y: c_float,
}

It’s almost just a normal struct type—pub struct Point with public properties x and y, both of type c_float. So far so normal. However, we also need to add an attribute, #[repr(C)], to the struct declaration. This makes it so that Rust lays it out using the C ABI’s definition of how structs are laid out, rather than how Rust chooses to lay them out. Once we’ve added that, though, this definition is exactly the same as the C one and we’re good to go with interop.

Next up, the function definition, but with raw pointers instead of references. I skipped over this in my discussion of unsafe code in Rust, because I knew we’d hit it in this episode: Rust has a similar notation, in the form of raw pointers, written *const and *mut. In Rust we’d write this same first argument point: *mut Point, since we’re passing a pointer pointing at a thing which will be mutated. Notice an important difference here: when we write *const in C, it means “you can’t change the pointer”; in Rust it means “you can’t change the thing the pointer points to.” The full function signature for this in Rust, then, is:

use std::os::raw::c_float;

#[repr(C)]
pub struct Point {
    pub x: c_float,
    pub y: c_float,
}

extern "C" {
    pub fn translate(point: *mut Point, by_x: c_float, by_y: c_float);
}

The other arguments here are straightforward: c_float for the same reason we used c_int in the addition example.

Safe wrappers

You can expose an unsafe interface that just provides direct interop with C functions in your code. However, this means that everything which uses that code has to uphold its invariants! As I discussed back in Episode 27, the Rustic way of doing things is to wrap our unsafe functions in safe Rust functions, which check that all the invariants are correctly upheld. That way the vast majority of your code—even your code that relies heavily on external C API code—can just be normal safe Rust; we isolate the un-safety and the complexity and the testing burden.

So, taking that approach with the examples I gave above, we might have the same extern definition for the C functions, but wrap them in a non-exposed module—maybe called ffi—and only expose our own safe wrappers. That would be something like this:

use std::os::raw::c_float;

#[repr(C)]
pub struct Point {
    pub x: c_float,
    pub y: c_float,
}

mod ffi {
    use super::Point;
    use std::os::raw::{c_float, c_int};

    extern "C" {
        pub fn add(a: c_int, b: c_int) -> c_int;

        pub fn translate(point: *mut Point, by_x: c_float, by_y: c_float);
    }
}

pub fn add(a: i32, b: i32) -> Option<i32> {
    if i32::max_value() - a >= b {
        unsafe { Some(ffi::add(a, b)) }
    } else {
        None
    }
}

pub fn translate(point: &mut Point, by_x: f32, by_y: f32) {
    unsafe {
        ffi::translate(point as *mut Point, by_x as c_float, by_y as c_float);
    }
}

The add definition here tries to do something somewhat reasonable when dealing with the possibility of overflow—not a great solution (and not even a complete solution, since you could underflow, too!), but we need something here. The reason is that as silly as this kind of check might seem—about as calling out to C for addition is!—even something like addition which might seem to be trivially safe… isn’t.

While it might seem that something like addition is trivially safe, it turns out to be mostly safe. The behavior of overflow for signed integers isnot defined for C. In Rust, it is defined, by [RFC #0560]: in modes where debug_assertions are enabled, an overflow will cause a panic; in modes where those assertions are not enabled (i.e. release mode), Rust wraps them by [two’s complement]. The net of that is that even something this simple can have unexpected results when calling across the FFI boundary.

This is a silly example, of course, but it shows how you can provide a safe wrapper for a case where C’s implementation differences might actually matter to you… and it also serves as a reminder that C’s implementation differences are likely to matter in more places than you might initially think.

In the case of the safe wrapper for translate, we don’t need to do anything except call unsafe { translate(...); }, because we don’t have any particular constraints we have to uphold other than that we legitimately have a non-null pointer to pass in—and pointers are always non-null when coming from safe Rust. But by wrapping it up this way, we’ve still made it so our consumers don’t have to deal with the un-safety. Here, I’ve chosen to add the explicit casts as a way of communicating to anyone reading the code what I’m doing, but none of the arguments actually requires it.

Finally, it’s worth clarifying that our careful use of safe Rust interfaces does not mean something can’t go wrong here through our use of unsafe code. The C code itself could still do something horrible—cast the pointer to void *, point it at other junk memory, or even call free on it. Lots of things can go wrong in C (or indeed in other languages with C APIs) that Rust by definition doesn’t have control over. When we provide a safe interface like this, we are not saying anything at all about the actual behavior of the C API we’re calling into. The only thing we’re doing is saying that we’re providing data that satisfies the invariants the C API requires.

Conclusion

Okay, so that takes us through the basics of C FFI in Rust. There’s a lot more to say, though! In the next teaching episode, we’ll cover dealing with enums, types like strings and vectors and arrays and slices, ownership issues, and union types. After that, we’ll dive into talking about how you can use Rust from other programming languages! Along the way, there will also be a Crates You Should Know about bindgen.

Thanks as always to everyone who sponsors the show. This month’s $10-or-more sponsors included:

Anthony Deschamps
Ryan Osial
Ola Fadeyi
Soren Bramer Schmidt
Rob Tsuk
John Rudnick
Chip
Brian McCallister
Jerome Froelich
Andrew Dirksen
Nick Stevens
Daniel Mason
Daniel Collin
Joseph Schrag
Jako Danar
Embark Studios
Peter Tillemans
Jonathan Knapp
Adam Green
Michael Mc Donnell
Matt Rudder
Graham Wihlidal
Benjamin Manns
Jeff May
Martin Heuschober
Oluseyi Sonaiya
Scott Moeller
Nathan Sculli
Behnam Esfahbod
Evan Stoll
Arun Kulshreshtha
Raph Levien
Ramon Buckland
Nick Gideo
Daniel Bornkessel
James Hagans II
Dan Abrams
Paul Naranja
David Carroll
Bryan Stitt
Nicolas Pochet
Johan Andersson
Jason Bowen
Alexander Payne
Zach Peters

You can sponsor the show at patreon.com/newrustacean or via other services listed on the show website, <newrustacean.com>. There, you’ll also find show notes, including links to things I talk about, scripts, code samples, and interview transcripts. The notes for this episode are at <newrustacean.com/show_notes/e029/>.

Please recommend the show to others if you like it, whether in person, via your podcast directory, or in various media online! You can contact me at @chriskrycho or @newrustacean on Twitter, or by sending men an email at hello@newrustacean.com.

Until next time, happy coding!