The article provides a very detailed exploration of all of the fun challenges you can face designing FFIs with Rust, but there's a good chance you can "get away" with simpler approaches if you think ahead a bit.
In our case, we call into Rust from Kotlin using JNI [0] and Swift using swift-bridge [1]. Thankfully our use case for the FFI [2] is for non-performance-critical calls and the data structures are fairly simple, so we just serialize objects with JSON.
No major issues so far.
One thing I am surprised hasn't been mentioned so far is Mozilla's UniFFI [3] which seems to solve some of the issues brought up in the article. We plan to switch to that once our FFI requirements become more complex.
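For anyone curious what the "just serialize with JSON" approach can look like on the Rust side, here is a minimal sketch. The names `handle_request` and `free_reply` are hypothetical, the JSON handling is a placeholder (real code would use a JSON library), and a real export would also need `#[no_mangle]` or `#[unsafe(no_mangle)]`, depending on the edition.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical entry point: takes a NUL-terminated JSON string from the
/// caller and returns a newly allocated JSON string owned by Rust.
pub extern "C" fn handle_request(json: *const c_char) -> *mut c_char {
    if json.is_null() {
        return std::ptr::null_mut();
    }
    // SAFETY: the caller promises `json` points to a valid NUL-terminated string.
    let input = unsafe { CStr::from_ptr(json) }
        .to_string_lossy()
        .into_owned();
    // Placeholder logic: a real implementation would parse `input` with a
    // JSON library. Here the request is only wrapped in a reply envelope.
    let reply = format!("{{\"ok\":true,\"request\":{}}}", input);
    match CString::new(reply) {
        Ok(s) => s.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Must be called exactly once for every non-null pointer returned by
/// `handle_request`, so that Rust frees the memory it allocated.
pub extern "C" fn free_reply(ptr: *mut c_char) {
    if !ptr.is_null() {
        // SAFETY: `ptr` was produced by `CString::into_raw` above.
        unsafe { drop(CString::from_raw(ptr)) };
    }
}
```

The only types that cross the boundary are C strings, so the Kotlin and Swift sides never need to know anything about Rust's data layout.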
If you want to interop well with Rust code, it feels to me like your language has to inherit so many Rust semantics that I have to ask myself why I would use it over Rust.
If you're making a new language, just have good interop with C. Most libraries worth using are written in C. Calling into C is trivial* and enforces almost no limitations on what you can do language-design wise.
* trivial, with the somewhat sizable asterisk that you have to rewrite the header files in your language.
I wish Rust would standardize their ABI already. I started a project to call Rust from Common Lisp, but haven't got very far. It's a lot of work, and they can break compatibility at any time.
If they really want to replace C and C++ then they really need to support being called from third party languages.
Rust already has a C ABI for those cases. Also, the C++ example is kinda bad because it doesn't have a standard ABI (only a bunch of implementation specific ones); they also mostly treat this ABI as stable, but this is also detrimental because it is making the performance of some features suboptimal (e.g. `unique_ptr`)
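A minimal sketch of what that C ABI surface looks like, with made-up names (`Point`, `point_add`, `point_norm_squared`). A real export would also carry `#[no_mangle]` (or `#[unsafe(no_mangle)]` in newer editions) so the symbol name is predictable.

```rust
/// `#[repr(C)]` asks rustc to lay the struct out the way a C compiler would,
/// so other languages can rely on the field order and padding.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// `extern "C"` selects the platform's C calling convention.
pub extern "C" fn point_add(a: Point, b: Point) -> Point {
    Point { x: a.x + b.x, y: a.y + b.y }
}

/// Pointer-taking variant, as a C caller would typically use it.
pub extern "C" fn point_norm_squared(p: *const Point) -> f64 {
    if p.is_null() {
        return 0.0;
    }
    // SAFETY: the caller promises `p` points to a valid `Point`.
    let p = unsafe { &*p };
    p.x * p.x + p.y * p.y
}
```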
I've been looking into this, and I suspect that one actually needs surprisingly little to interoperate safely with Rust.
TL;DR: The lowest common denominator between Rust and any other memory-safe language is a borrow-less affine type.
The key insight is that Rust is actually several different mechanisms stacked on top of each other.
To illustrate, imagine a program in a Rust-like language.
Now, refactor it so you don't have any & references, only &mut. It actually works, if you're willing to refactor a bit: you'll be storing a lot of things in collections and referring to them by index, and cloning even more, but nothing too bad.
Now, go even further and refactor the program to not have any &mut either. This requires some acrobatics: you'll be temporarily removing things from those collections and moving things into and out of functions like in [2], but it's still possible.
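To make the refactoring concrete, here is a rough sketch in Rust itself (the `Account` type and both functions are invented for illustration). No signature mentions `&` or `&mut`; values are referred to by index, taken out of the collection, moved through a function, and moved back. Rust's own `Vec` and `Option` methods still borrow internally, which a truly borrowless language would have to provide as primitives.

```rust
#[derive(Debug, PartialEq)]
pub struct Account {
    pub balance: i64,
}

/// Takes ownership of the account and hands it back: no borrow in the signature.
pub fn deposit(mut account: Account, amount: i64) -> Account {
    account.balance += amount;
    account
}

/// The collection is moved in and moved out; elements are addressed by index.
pub fn deposit_at(
    mut accounts: Vec<Option<Account>>,
    index: usize,
    amount: i64,
) -> Vec<Option<Account>> {
    // Temporarily remove the element from the collection...
    let taken = if index < accounts.len() {
        accounts[index].take()
    } else {
        None
    };
    if let Some(account) = taken {
        // ...move it through a function...
        let updated = deposit(account, amount);
        // ...and move it back into its slot.
        accounts[index] = Some(updated);
    }
    accounts
}
```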
You're left with something I refer to as "borrowless affine style" in [1] or "move-only programming" in [0].
I believe that's the bare minimum needed to interoperate with Rust in a memory safe way: unreference-able moveable types.
The big question then becomes: if our language has only these moveable types, and we want to call a Rust function that accepts a reference, what then?
I'd say: make the language move the type in as an argument, take a temporary reference just for Rust, and then move-return the type back to the caller. The rest of our language doesn't need to know about borrowing, it's just a private implementation detail of the FFI.
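A small sketch of that wrapper idea, written in Rust for concreteness. `count_words` and `append_signature` stand in for existing Rust APIs that take references; the `_moved` wrappers are what the FFI layer might generate. All names are hypothetical.

```rust
/// Stand-in for an existing Rust function that takes a shared reference.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Stand-in for an existing Rust function that takes a mutable reference.
fn append_signature(text: &mut String) {
    text.push_str(" -- sent via FFI");
}

/// FFI-side wrapper: the value is moved in, borrowed only for the duration
/// of the Rust call, then moved back out next to the result.
pub fn count_words_moved(text: String) -> (String, usize) {
    let n = count_words(&text);
    (text, n)
}

/// Same pattern for `&mut`: move in, borrow mutably inside, move out.
pub fn append_signature_moved(mut text: String) -> String {
    append_signature(&mut text);
    text
}
```

The caller only ever sees moves, so the borrow never escapes the wrapper.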
These weird moveable types are, of course, extremely unergonomic, but they serve as a foundation. A language could use these only for Rust interop, or it could go further: it could add other mechanisms on top such as & (hard), or &mut (easy), or both (like Rust), or a lot of cloning (like [3]), or generational references (like Vale), or some sort of RefCell/Rc blend, or linear types + garbage collection (like Haskell) and so on.
(This is actually the topic of the next post, you can tell I've been thinking about it a lot, lol)
Have you taken a look at the paper "Foreign Function Typing: Semantic Type Soundness for FFIs" [0]?
> We wish to establish type soundness in such a setting, where there are two languages making foreign calls to one another. In particular, we want a notion of convertibility, that a type τA from language A is convertible to a type τB from language B, which we will write τA ∼ τB , such that conversions between these types maintain type soundness (dynamically or statically) of the overall system
> ...the languages will be translated to a common target. We do this using a realizability model, that is, by setting up a logical relation indexed by source types but inhabited by target terms that behave as dictated by source types. The conversions τA ∼ τB that should be allowed, are the ones implemented by target-level translations that convert terms that semantically behave like τA to terms that semantically behave like τB (and vice versa)
I've toyed with this approach to formalize the FFI for TypeScript and Pyret and it seemed to work pretty well. It might get messier with Rust because you would probably need to integrate the Stacked/Tree Borrows model into the common target.
But if you can restrict the exposed FFI as a Rust-sublanguage without borrows, maybe you wouldn't need to.
Thanks for the write-up. My biggest fear is not references, overloads or memory management, but rather just the layout of their structures.
We have this:
sizeof(String) == 24
sizeof(Option<String>) == 24
Which is cool. But Option<T> is defined like this:
  enum Option<T> {
      Some(T),
      None,
  }
I didn't find any "template specialization" tricks of the kind you would see in C++; as far as I can see, the compiler figures out some trick to squeeze Option<String> into 24 bytes. Whatever those tricks are, unless rustc has an option to export the layout of a type, you will need to implement them yourself.
You don’t need to determine the internal representation as long as you’re dealing with opaque types and invoking rust functions on it.
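A minimal sketch of the opaque-type approach, with invented names (`Counter`, `counter_new`, `counter_add`, `counter_free`). The foreign side only ever holds a `void*`-style handle and never looks inside it, so the struct's layout can be whatever rustc chooses.

```rust
use std::ffi::c_void;

/// Private Rust type; its layout is never exposed to the caller.
struct Counter {
    history: Vec<u64>,
}

/// Allocates a `Counter` on the Rust heap and returns an opaque handle.
pub extern "C" fn counter_new() -> *mut c_void {
    Box::into_raw(Box::new(Counter { history: Vec::new() })) as *mut c_void
}

/// Records `n` and returns the running total. Returns 0 for a null handle.
pub extern "C" fn counter_add(handle: *mut c_void, n: u64) -> u64 {
    if handle.is_null() {
        return 0;
    }
    // SAFETY: the caller promises `handle` came from `counter_new`
    // and has not been freed yet.
    let counter = unsafe { &mut *(handle as *mut Counter) };
    counter.history.push(n);
    counter.history.iter().sum()
}

/// Releases the object; the handle must not be used afterwards.
pub extern "C" fn counter_free(handle: *mut c_void) {
    if !handle.is_null() {
        // SAFETY: `handle` was produced by `Box::into_raw` in `counter_new`.
        unsafe { drop(Box::from_raw(handle as *mut Counter)) };
    }
}
```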
As for the tricks used to make both 24 bytes: String contains a NonNull pointer, which Option detects and uses to represent None without a separate enum tag. For what it’s worth, you can do similar tricks in C++ using zero-sized types and tags to declare nullable state (in fact std::optional already knows to do this for pointer types, if I recall correctly).
Yeah currently "niche optimization" is performed when the compiler can infer that some values of the structure are illegal.
This can currently be done when a type declares that the range of an integer is not complete, using the rustc_layout_scalar_valid_range_start or rustc_layout_scalar_valid_range_end attribute (requires #![feature(rustc_attrs)]).
In your example it works for String because String contains a Vec<u8>, which in turn contains a capacity field of type struct Cap(usize), where the usize is effectively constrained to values in 0..=isize::MAX.
The only way for you to know that is to effectively be the rustc compiler or be able to consume its output.
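You can at least observe the effect from inside Rust with `std::mem::size_of`. A small sketch (the helper names are made up): the pointer-like and `NonZero` cases are documented guarantees, while the `String` case is only what current rustc happens to do, which is exactly the problem for an outside tool.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// True when wrapping `T` in `Option` costs no extra bytes, i.e. the
/// compiler found a niche (an invalid bit pattern) to encode `None`.
pub fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

/// (type name, size of T, size of Option<T>) for a few sample types.
pub fn report() -> Vec<(&'static str, usize, usize)> {
    vec![
        ("String", size_of::<String>(), size_of::<Option<String>>()),
        ("Box<u8>", size_of::<Box<u8>>(), size_of::<Option<Box<u8>>>()),
        ("NonZeroU32", size_of::<NonZeroU32>(), size_of::<Option<NonZeroU32>>()),
        ("u32", size_of::<u32>(), size_of::<Option<u32>>()),
    ]
}
```

`u32` has no spare bit pattern, so `Option<u32>` needs a separate tag and comes out larger.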
It seems like the struggle here is trying to use Rust transparently/automatically from another language instead of just making bindings easier. I have found that trying to auto-FFI existing Rust types is not the best for languages because there is often an impedance mismatch between how the language treats things and how Rust does. Therefore an always-works transparent binding may inevitably end up with people asking for more flexibility to fit the language better (e.g. controlling lifetime semantics, type mappings/copying, etc).
I think it's clearer to take an approach like Neon and PyO3 and other FFI-to-lang helpers do where you just make it easy/safe to write these Vale functions in Rust.
I agree with you, but it's always hard to ignore the allure of not needing to write all the bindings manually. If nobody is willing to write the bulk of the initial bindings then the chance of someone using it seems low, and in theory writing a transparent layer between the two takes less time/effort (in practice I agree that the incompatibilities will make it messy long term).
Rust has the same problem with C APIs: in the past I've gone to use something and found that the binding was not there. For a couple of functions it's no big deal, but if, say, half or more of the ones I needed weren't there already, then I wouldn't have bothered trying to use it at all.
> Anyone trying to make a new mainstream language is completely insane, unless they're backed by a huge corporation. There are only two exceptions in the last 25 years that come close: Scala and Kotlin.
Much less effort to build a language (without megacorp backing) if you're building off a battle tested runtime.
I'm absolutely not saying this to discredit the work that's gone into Clojure, Elixir et al, but it does lend credence to the idea of building for an existing ecosystem instead of bootstrapping your own (along with "seamless" interop as a first class concern)
If anyone can crack seamless interop between natively compiled languages to dodge ABI hell they'll earn a nice place in history
> Anyone trying to make a new mainstream language is completely insane, unless they're backed by a huge corporation. There are only two exceptions in the last 25 years that come close: Scala and Kotlin
Kotlin was designed and backed by JetBrains from the start. Maybe not a "huge" corporation but a pretty big company still (by revenue).
I don't know the story of how the Android team went Kotlin-first. If that wasn't a deliberate plan they got quite lucky. Could Kotlin arguably be backed by Google?
Android Studio is based on IntelliJ and there's a lot of collaboration between both teams. The adoption of Kotlin was a logical next step, considering a lot of IntelliJ is written in Kotlin.
I don't know when the first Kotlin Android app was published, but Kotlin 1.0 was released in 2016 and then announced as a first class language at Google I/O in 2017.
There's a huge amount of doom & gloom, prophecies of failure against wasm's component-model, a latent expectation that trying to solve FFI is impossible & destined to failure. But what if?
It'd be so neat for language creators to be able to use & leverage other works. Getting there wouldn't be easy, but there'd be a standard path to getting the hard-fought capability here.
C APIs are the best APIs. I do a lot of mixed language work and I would never attempt anything like this. Just write a C API and provide trivial FFI bindings for your favorite language.
That said, I thoroughly enjoyed the article and the author's admission of its insanity! Great read. But do the simple thing and call it a day.
I'm a novice on this topic, but I'm surprised that no one has mentioned Python.
Is that because it is a solved problem, thanks to
https://github.com/PyO3/pyo3
and is no longer a challenge?
I’ve not used Rust, and quite frankly I think a lot of the post is over my head, but I enjoyed the read nonetheless.
> I don't have any specific plans to turn this C proof-of-concept into a production-quality tool that would enable calling Rust from C, but if anyone wants to take it from here, I'd be happy to assist!
I laughed at this. I’d bet my bottom dollar it’s an attempted nerd snipe!
With all this effort required (as the author points out), I start to wonder if a better solution is to communicate via RPC over local sockets.
There will be some overhead, but it might be a wash considering that calling over an FFI often involves similar overhead to marshal / unmarshal objects. And the simplicity gains would be massive.
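A toy sketch of the idea using only the standard library. Assumptions: Unix only, a made-up line-based protocol (one request line in, one reply line out), and a thread standing in for the other process. In a real setup the two sides would be separate programs connecting through a socket path.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::thread;

/// Sends one request line over a local socket pair and returns the reply.
/// `request` must not contain a newline.
pub fn call_over_socket(request: &str) -> std::io::Result<String> {
    let (mut client, server) = UnixStream::pair()?;

    // "Service" side: reads one line and replies with it upper-cased.
    let worker = thread::spawn(move || -> std::io::Result<()> {
        let mut reader = BufReader::new(server.try_clone()?);
        let mut writer = server;
        let mut line = String::new();
        reader.read_line(&mut line)?;
        let reply = format!("{}\n", line.trim_end().to_uppercase());
        writer.write_all(reply.as_bytes())
    });

    // "Caller" side: the only thing crossing the boundary is bytes.
    client.write_all(request.as_bytes())?;
    client.write_all(b"\n")?;
    let mut reply = String::new();
    BufReader::new(&client).read_line(&mut reply)?;

    worker.join().expect("worker thread panicked")?;
    Ok(reply.trim_end().to_string())
}
```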
COM [1] was a solution to these problems thirty years ago.
In-process it's just function calls. Cross-process COM has automatic marshalling for standard types ("automation types") or you can define custom marshalling that does whatever you want.
WinRT [2] is a more modern version. It builds on COM and (among other things) provides the basis for the latest UI frameworks in Windows.
A long time ago I worked on a project where we needed to distribute an in-process COM object, so we moved it to DCOM, instantiated multiple instances, and that worked! All in all COM was a fairly pleasant technology. Not really that different from gRPC (e.g. IDL vs. proto).
Why over a socket? You could perform the same protocol more efficiently with normal functions in-process. Maybe we need a standard serializing LPC protocol just using the platform ABI. Or maybe this comes down to something like ZeroMQ in-process.
Mostly because sockets are supported by everything today, and they're easy to understand. What you're describing would certainly work but it looks similar to what the OP did in the blog post, with all the complexity it comes with.
The OP doesn’t serialize. My proposal would still serialize as with RPC, but instead of passing the data over a socket, just pass the data as a binary blob over a regular function call.
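Roughly what I have in mind, as a sketch. The `Greeting` type, the wire format (two little-endian `u32`s followed by UTF-8 bytes), and `greet_blob` are all invented for illustration; any serialization format would do.

```rust
#[derive(Debug, PartialEq)]
pub struct Greeting {
    pub id: u32,
    pub name: String,
}

/// Serializes as: id (u32 LE), name length (u32 LE), name bytes (UTF-8).
pub fn encode(g: &Greeting) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + g.name.len());
    out.extend_from_slice(&g.id.to_le_bytes());
    out.extend_from_slice(&(g.name.len() as u32).to_le_bytes());
    out.extend_from_slice(g.name.as_bytes());
    out
}

/// Inverse of `encode`; returns `None` for truncated or malformed input.
pub fn decode(bytes: &[u8]) -> Option<Greeting> {
    if bytes.len() < 8 {
        return None;
    }
    let id = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let len = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]) as usize;
    let body = bytes.get(8..)?;
    if body.len() != len {
        return None;
    }
    let name = String::from_utf8(body.to_vec()).ok()?;
    Some(Greeting { id, name })
}

/// The "regular function call": only a pointer and a length cross the
/// boundary. Returns id + name length, or -1 if the blob is invalid.
pub extern "C" fn greet_blob(ptr: *const u8, len: usize) -> i64 {
    if ptr.is_null() {
        return -1;
    }
    // SAFETY: the caller promises `ptr` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    match decode(bytes) {
        Some(g) => g.id as i64 + g.name.len() as i64,
        None => -1,
    }
}
```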
The main thing on my mind is that the build system would become more bespoke when doing it that way, compared to running a few processes that interact with each other.
The overhead of socket read+write is typically much less than the serialization overhead, although both can be optimized to the point of irrelevance for many applications.
It's also interesting because it ends up looking like a microservices architecture, except all on one machine (even all in one process tree).
I suspect confusion with the WebAssembly Component Model, whose development is somewhat intertwined with that of WASI.
It defines a function call ABI between sandboxes.
No object is in shared memory: parameters are passed by value or by handle.
It has its own IDL and ABI, which languages' ABIs need adaptors for if they don't conform.
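To illustrate the "by handle" part only: a generic handle table sketch, where the owner keeps the object and the other side just holds an integer. This is an illustration of the general pattern with invented names, not the Component Model's actual canonical ABI.

```rust
/// Owner-side table: objects stay here, callers only see `u32` handles.
pub struct HandleTable<T> {
    slots: Vec<Option<T>>,
}

impl<T> HandleTable<T> {
    pub fn new() -> Self {
        HandleTable { slots: Vec::new() }
    }

    /// Stores `value` and returns its handle, reusing a free slot if any.
    pub fn insert(&mut self, value: T) -> u32 {
        if let Some(i) = self.slots.iter().position(|s| s.is_none()) {
            self.slots[i] = Some(value);
            i as u32
        } else {
            self.slots.push(Some(value));
            (self.slots.len() - 1) as u32
        }
    }

    /// Looks up a handle; stale or unknown handles yield `None`.
    pub fn get(&self, handle: u32) -> Option<&T> {
        self.slots.get(handle as usize)?.as_ref()
    }

    /// Removes the object, invalidating the handle.
    pub fn remove(&mut self, handle: u32) -> Option<T> {
        self.slots.get_mut(handle as usize)?.take()
    }
}
```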
[0] https://docs.rs/jni/latest/jni/
[1] https://github.com/chinedufn/swift-bridge
[2] https://www.firezone.dev/kb/architecture/tech-stack#client-a...
[3] https://github.com/mozilla/uniffi-rs
[0] "Move-only programming" in https://verdagon.dev/grimoire/grimoire#the-list
[1] "Borrowless affine style" in https://verdagon.dev/blog/vale-memory-safe-cpp
[2] https://verdagon.dev/blog/linear-types-borrowing
[3] https://web.archive.org/web/20230617045201/https://degaz.io/...
[0] (PDF Warning): https://wgt20.irif.fr/wgt20-final23-acmpaginated.pdf
And Clojure! (also a JVM language)
[1]: https://en.wikipedia.org/wiki/Component_Object_Model
[2]: https://en.wikipedia.org/wiki/Windows_Runtime
https://en.wikipedia.org/wiki/Fatal_insomnia
[0] https://github.com/Srinivasa314/neptune-lang