Making C++ safe without borrow checking, reference counting, or tracing GC (verdagon.dev)
275 points by jandeboevrie on June 23, 2023 | 214 comments



> Borrow checking is incompatible with some useful patterns and optimizations (described later on), and its infectious constraints can have trouble coexisting with non-borrow-checked code.

Not that this isn't true, but the rest of the article introduces a system with a superset of those limitations, gradually decreasing over time but never becoming a subset. In fact the pattern described in the article is a common pattern in Rust and I make use of it all the time; the library for making use of it is `slotmap`.


Later on, it adds generational references and constraint references to relax the restrictions. These are both more flexible than SlotMap because they don't require a new parameter to be passed in from the callers (and callers' callers etc), which can cause problems when an indirect caller's signature can't change (trait method override, public API, drop, etc.)


> Borrow checking is incompatible with some useful patterns.

The main problem is back references, as in doubly linked lists. In Rust, you can do that sort of thing using Rc and the weak/strong reference mechanism. Forward references own, and are strong. Back references are weak.

I've been toying with the idea of some generic types which allow strong forward references which you can't copy or clone, and weak back references which you can't make strong outside a contained scope. This can be implemented with the existing Rc system, and potentially could be proven, with a static analyzer, to not require the reference counts. It's worth a try to see if one can effectively program under those restrictions. If it's not too much of a pain to use, this might be an effective way out of Rust's back-reference problem.


> In fact the pattern described in the article is a common pattern in Rust and I make use of it all the time; the library for making use of it is `slotmap`.

Slotmap uses unsafe everywhere, it's a memory usage pattern not supported by the borrow checker. It's basically hand-implementing use-after-free and double-free checks, which is what the borrow checker is supposed to do. Is that really a common pattern in Rust?


> Slotmap uses unsafe everywhere, it's a memory usage pattern not supported by the borrow checker. Is disabling the borrow checker really a common pattern in Rust?

Wrapping "unsafe" code in a safe interface is a common pattern in Rust, yes. There is absolutely nothing wrong with using "unsafe" so long as you are diligent about checking invariants, and keep it contained as much as possible. Obviously the standard library uses some "unsafe" as well, for instance.

"unsafe" just means "safe but the compiler cannot verify it".

Unsafe does not disable the borrow checker, though. All of the restrictions of safe Rust still apply. All "unsafe" does is unlock the ability to use raw pointers and a few other constructs.

https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#unsa...


> Obviously the standard library uses some "unsafe" as well, for instance.

Most beautifully, MaybeUninit<T>::assume_init() -> T

This unsafe Rust method says "I promise that I actually did initialize this MaybeUninit<T>, so give me the T".

In terms of the resulting program, the machine is not going to do any work whatsoever: a MaybeUninit<T> and a T are the same size, they're in the same place, and your CPU doesn't care that this is a T rather than a MaybeUninit<T> now.

But from a type safety point of view, there's all the difference in the world.

Even though it won't result in emitting any actual CPU instructions, MaybeUninit::assume_init has to be unsafe. Most of the rest of that API surface is not. Because that API call, the one which emitted no CPU instructions, is where you took responsibility for type correctness. If you were wrong, if you haven't initialized T properly, everything may be about to go spectacularly wrong and there's no-one else to blame but you.


Exactly. People miss this all the time when they write off Rust for "needing unsafe to do real programming" or whatever uninformed criticism they're parroting (they've clearly never actually done this "real programming" in Rust). The whole point is to reduce the opportunity for unforced errors by marginalizing the cognitive load required for the programmer to ensure the program is correct. And a program with a few unsafe blocks to `assume_init` some memory that e.g. a driver initialized for you is still infinitely better in that regard than a program that's littered with `void*` everywhere.


> than a program that's littered with `void*` everywhere

Strawman argument. A properly written C++ program isn't littered with `void*` everywhere in the same way that a properly written Rust program isn't littered with `unsafe` everywhere. You build safe abstractions around the ugly low-level pointer handling, you just don't have a keyword for a clear delineation.

> People miss this all the time when they write off Rust for "needing unsafe to do real programming" or whatever uninformed criticism they're parroting

Hard-core Rust proponents also seem to miss this all the time. "You basically write the same unsafe code that you would write in C++, but you now have a keyword to mark it" just doesn't imply the same urgency for adopting the language as "you only need unsafe to implement a few primitives in the standard library" does, which always seems to be tacitly implied until called out, and then the critics are "misinformed."


Firstly, the clarity of the delineation is much more valuable than you seem to appreciate. A day-one beginner in Rust can see that this stuff is roped off - so they know when they should call a grown-up - and everything which isn't roped off is safe for them. This also benefits an experienced developer when you're not at your best. Let's not write unsafe Rust today; we can do that when the air conditioning works, the coffee machine is fixed, and there aren't contractors using power tools in the office.

I also think you very seriously underestimate how much equivalently unsafe C++ you write, and overestimate how much actual unsafe Rust is needed. Philosophically WG21 (the C++ committee) didn't like safe abstractions, so it doesn't provide them. To the point where the C++ slice type std::span is exactly like the safety proposal where it was originally suggested, except with all the safety explicitly ripped out. "We like this safety feature, except for the safety, get rid of that". I am not even kidding.

Most Rust programmers don't need to write any unsafe Rust. They can rely on Rust's promises, about aliasing, races, memory safety, performance characteristics, and they have no responsibility for delivering those promises, it's all done for them so long as they write safe Rust.

The other crucial element is culture. Culturally, Rust wants safe abstractions. That applies to the standard library of course, but it also applies to third-party code: you can expect other Rust programmers to think your library is crap if it has a method which is actually not safe to call without certain pre-conditions but isn't labelled "unsafe" -- because that's exactly what "unsafe" is for, so you're not fulfilling your social contract.


> You build safe abstractions around the ugly low-level pointer handling, you just don't have a keyword for a clear delineation.

The main difference is they are not really safe. It is trivial to accidentally invoke UB with incorrect use of "safe" abstractions in C++ like built-in containers or smart pointers. Keep a reference to a vector element, add a new item to the vector and it will sometimes blow up ;)


I disagree that it is "trivial," at least in the example you stated. This take-reference-then-mutate is exactly the kind of usage that the borrow checker prevents. You have to avoid it systematically in both languages.

The built-in containers are also not the best examples of "safe" abstractions. You can build safer abstractions, and you can employ safer usage patterns of built-in vectors, at non-zero but marginal costs.

The honest view on C++ is that there is no such thing as "safe" in absolute terms, but you have a lot of tools to mitigate the unsafe nature of the core language.

The honest view on Rust is that the idea of categorically excluding memory safety errors didn't quite pan out, but we're nonetheless left with an improvement over C++.


It’s subtle, but you don't avoid “take reference then mutate” in Rust, you are told exactly how to do it without aliasing the memory.

I’m not going to say Rust is perfect, that’s obviously not the case. But I really think your argument, like others are saying, underplays the actual value of Rust.

I’ve written entire projects in both C++ and Rust. I’ve never wasted days debugging memory corruption in Rust. Just sayin’.


If unsafe means “safe but the compiler cannot verify” then I guess just consider .cpp to mean “safe but the compiler cannot verify” and we have suddenly made C++ memory safe


There's a related idea in Haskell, usually considered a memory safe language. You can write a program in Haskell that directly mutates memory, or does IO operations, freely, anywhere in the code. This violates functional purity and the compiler cannot offer its usual promises; your program may very well segfault from a bug in such code. But sometimes you just have to, perhaps to implement an algorithm efficiently.

Still, it is discouraged; both culturally in the language community, and discouraged through the subtle prodding of the language itself (such as everything being typed "IO", or the slightly ominous "unsafe" in the "unsafePerformIO".) Very often, the amount of code that must truly live in IO can be reduced to a few dozen lines, if that. That code is crucial to get right -- it's where the actual sequence of computation and external effects are handled. Such isolation allows the rest of the code to not have to worry about those matters.


Sure, and if a typical Rust program that I write has no unsafe in it directly, and 5% of its dependencies' code have unsafe in them, that's also the same as writing a program in the "not c++" language directly, and using "not c++" dependencies for all but 5% of the dependency code.

Seems like a silly analogy to me, though.


Right, but it's that 5% the original comment is talking about: the times when Rust has to use unsafe for that type of program.


Unsafe Rust is safer than C++, and even if it wasn't, 5% unsafe in Rust programs (in well-marked locations) is vastly superior to 100% unsafe in C++ programs.

Any analogy that equates the two is silly.


Unsafe Rust is less safe than C++ because of the provenance and aliasing semantics that unsafe Rust must adhere to to avoid UB, which are generally trickier than those of C++.


The provenance rules in the C++ standard are basically just a shrug emoji†, so it's unclear whether those are worse, I can see an argument for the idea that obeying Aria's strict provenance experiment rules in Rust is easier - not because it's easy (although for many common cases it is) but because at least these are coherent rules.

† U+1F937 person shrugging


The core value proposition of rust is that it’s memory safe by default, and it’s possible to limit the set of code that needs to be manually checked for UB. This isn’t the case for C++, as any code anywhere can invoke undefined behavior.


True, as long as static analysers aren't part of the build. Once they are, specific constructs can be made to break the CI/CD build, forcing everyone to play by the rules if they want the PR to go through.

It isn't perfect, but it does improve the security baseline a lot.


Sure but you're missing the

> so long as you are diligent about checking invariants

part. Could you go through and check all the parts of a huge C++ codebase to make sure invariants are held as opposed to a few hundred lines of unsafe Rust code?


Sure, but I think the point here is the degree.

Presumably if it takes a lot of unsafe rust lines to build something, it won’t matter if it’s 30% safe or whatever.

I just see the point of “unsafe is fine” a lot when the whole point of rust is that memory safety issues are never worth the cost.


Right, I guess the question is what will that proportion be when Rust is used for things like operating systems and web browsers. 30% would be untenable but a few hundred/thousand lines of unsafe code is fairly easy to put under a microscope.

For some current day research into this, there is the paper "How Do Programmers Use Unsafe Rust?"[1] which I'll drop a quote from here:

> The majority of crates (76.4%) contain no unsafe features at all. Even in most crates that do contain unsafe blocks or functions, only a small fraction of the code is unsafe: for 92.3% of all crates, the unsafe statement ratio is at most 10%, i.e., up to 10% of the codebase consists of unsafe blocks and unsafe functions

That paper is definitely worth reading and goes into why programmers use unsafe; e.g. 5% of the crates at that time were using it to perform FFI.

In writing "RUDRA: Finding Memory Safety Bugs in Rust at the Ecosystem Scale" [2], I recreated this data and year-by-year the % of crates using unsafe is going down. And for what it's worth, crates are probably a bad data-set for this. crates tend to be libraries which are exactly where we would expect to find unsafe code encapsulated to be used safely. There's also plenty of experimental and hobby crates. A large dataset of actual binaries would be way more interesting to look at.

[1] https://dl.acm.org/doi/10.1145/3428204

[2] https://taesoo.kim/pubs/2021/bae:rudra.pdf


Or Rust in Android: in this deep dive there are only two places of unsafe code, and the vetting triggered by them being the only two places found a bug in the existing implementation.

https://security.googleblog.com/2022/12/memory-safe-language...


Ahh that is quite interesting, I’ll check those links out


Looking at a couple of programs I work on:

9,500 lines of code, 8 are unsafe.

7,000 lines of code, 22 are unsafe.

14,000 lines of code, 140 are unsafe.

As we follow the standard rust rule that "safe code should not be able to use unsafe code to do unsafe things", those unsafe bits of code have been very carefully checked, to the best of our abilities, to ensure they don't create memory safety issues. It is a lot easier to triple-check 170 lines of code than 30,000 lines.


Sharpview, my metaverse viewer: 36,000 lines, 0 are unsafe.

I use some published crates that have unsafe code, but my own programs start with

    #![forbid(unsafe_code)]
This is 60FPS 3D high-detail graphics stuff, where performance matters.


Are you using wgpu for the rendering stuff? Heard that WebGPU had to sacrifice some performance in order to make the API safer for the web (like more bounds checking and sanity checks). These kinds of issues are actually plaguing projects like Tensorflow.js (for example see https://github.com/gpuweb/gpuweb/issues/1202).

Other libraries like Vulkan and DirectX 12 are fundamentally unsafe in the API level, so direct usage of it would lead to heaps of unsafe Rust code. Rust people have tried wrapping it in a safe way (like gfx-rs and vulkano) but nowadays most seem to have transitioned to wgpu (since WebGPU API is safe by design so it fits more for the Rust ecosystem).

Rust does sacrifice some performance in general in order to achieve its safety claims, but people are happy with it so far, since the majority of applications using Rust (like CLI apps and web servers) don't have to squeeze out performance that much (for webdev there are too many things that can cause performance issues other than not writing it in Rust). But for 3D graphics people can be more sensitive about these problems. Though maybe if you're not developing a triple-A game with the latest cutting-edge graphics (with new techniques like "hardware ray tracing" and "bindless descriptors", which are both impossible in wgpu), writing in Rust can be a good-enough tradeoff for your needs.


WGPU is just finishing up a major reorganization of locking and internal memory management, going from a global lock to fine-grained Arc reference counts.[1] Change log, just posted a few minutes ago: "Arcanization of wgpu core resources: Removed 'Token' and 'LifeTime' related management, removed 'RefCount' and 'MultiRefCount' in favour of using only 'Arc' internal reference count, removing mut from resources and added instead internal members locks on demand or atomics operations, resources now implement Drop and destroy stuff when last 'Arc' resources is released, resources hold an 'Arc' in order to be able to implement Drop, resources have an utility to retrieve the id of the resource itself, removed all guards and just retrive the 'Arc' needed on-demand to unlock registry of resources asap removing locking from hot paths."

From a performance standpoint, I'm much more concerned about being able to get all the CPUs working on the problem than slight improvements in per-CPU performance. My metaverse viewer has slow frames because loading content into the GPU from outside the rendering thread blocks the rendering thread. All that Arcanization should fix that.

(I'm a user of WGPU, not one of its developers.)

[1] https://github.com/gfx-rs/wgpu/pull/3626/files/5b34df5a2e6f9...


A counterpoint that makes this argument a bit weaker: Rust's "unsafe" marker doesn't pollute only its scope; it actually pollutes the whole module. You need to make sure that the invariants in unsafe code are met even in safe code. (An explanation of this is in the Rustonomicon: https://doc.rust-lang.org/nomicon/working-with-unsafe.html)

So there's quite a lot more code to actually check than what some Rust proponents are saying. One can say that C++ is still worse in this regard (theoretically you need to check 100% of your code to be safe in C++). But for the minority of developers who frequently need to delve into unsafe code, the advantages of Rust might seem a bit more disappointing ("the compiler doesn't really do that much for the more important stuff…")


> whole point of rust is that memory safety issues are never worth the cost

I don’t think that it would be the point of rust — otherwise why not write Java, or a litany of GCd languages instead?

Rust is a low-level/systems programming language where you have more control over the program’s execution (e.g. no fat runtime), which is a necessity in some rare, niche, but important use cases.


It almost never takes a lot of unsafe to build something. Even the redox OS kernel is only like 10% unsafe.


That's not what unsafe means. Unsafe means the code might cause UB for some invocations (accessing raw pointers, calling into another language, etc.). Safe means it will not cause UB for any invocation (though it may panic or abort).


It's essentially a "user-space" memory allocator with its own use-after-free and double-free checks, apparently because the language implementation isn't adequate. If anything it just reinforces the article's point that "borrow checking is incompatible with some useful patterns and optimizations."


Eh? This is a wild take. How do you draw the conclusion the default implementation is inadequate?


Because something like slotmap has to use `unsafe` to get around the inadequacies of the borrow checker...


Author of slotmap here.

There is absolutely no need for unsafe in slotmap. I chose to use unsafe (wrapped in a safe API) to reduce memory usage using intrusive linked freelists. If done using safe Rust this would involve `enum`s that would take up extra space.


Slotmap is one of my favorite crates, by the way. Thank you for putting it out there!


Thus not adequate for performance requirements.


.... that has nothing to do with the allocator though


A downside for sure, but one that, at least in this specific example, has limited downsides. If you can button it up into a safe abstraction that you can share with others, then I don't really see what the huge problem is. The fact that you might need to write `unsafe` inside of a well optimized data structure isn't a weakness of Rust, it's the entire point: you use it to encapsulate an unsafe core within a safe interface. The standard library is full of these things.

Now if you're trying to do something that you can't button up into a safe abstraction for others to use, then that's a different story.


> "unsafe" just means "safe but the compiler cannot verify it".

"unsafe" means "safe"?

I would say "unsafe" means "only safe if used in a manner that cannot be checked by the compiler".


There are two things here. The `unsafe` in an `unsafe { ... }` block is referring to the contents of the block. From the outside it is indeed safe to use as if it were safe code. No special requirements necessary. So, yes, from a certain point of view `safe` would have been a better name (albeit confusing in a different way).

An `unsafe fn` however does need to be used correctly (and should document those requirements). However, these can only be called within `unsafe` blocks, so see above.


Not entirely correct: Rust's "unsafe" marker doesn't pollute only its scope, it actually pollutes the whole module. You need to make sure that the invariants in unsafe code are met even in safe code. (An explanation of this is in the Rustonomicon: https://doc.rust-lang.org/nomicon/working-with-unsafe.html)


> Slotmap uses unsafe everywhere, it's a memory usage pattern not supported by the borrow checker.

Author of slotmap here. This is patently false.

Yes, the slotmap crate uses a lot of unsafe to squeeze out maximum performance. But it is not 'a memory usage pattern not supported by the borrow checker'. You can absolutely write a crate with an API identical to slotmap without using unsafe.


> But it is not 'a memory usage pattern not supported by the borrow checker'. You can absolutely write a crate with an API identical to slotmap without using unsafe.

I think that might actually be worse though, performance aside. You're performing memory / object lifetime management but the Rust borrow checker still would have no idea what's going on because now you've tricked it by using indices or an opaque handle instead of references. The program may compile just fine but could have use-after-free bugs.

At least with unsafe there's an explicit acknowledgement that the borrow checker is turned off.


Yes, using slotmap you can get "use after free"-style bugs that you would not encounter if you strictly stayed with the borrow checker. So if the borrow checker fits your purpose, by all means, go ahead.

But the borrow checker can't represent circular/self-referential structures you see very often in graphs. Nor is it convenient in some cases as it has a strict separation between references that can mutate, and those that can't, which doesn't fit all problems either because the mutable references are by necessity unique.

Note that a "use after free" in slotmap results in a None value, or a panic (exception for the C++ people), depending on which API you use. In other words, it is detected and you can handle it. It does not trigger undefined behavior, you don't get ABA-style spurious references, there are no security issues. It is not the same as the issues pointers have at all.


I have implemented my own slotmap crate for a lisp interpreter that uses no unsafe code and provides exactly the same features as the "standard" slotmap crate.

There is nothing inherent to the slotmap that requires unsafe code! It's only used for optimization purposes.

Mine works in a similar way to the "standard" slotmap. It's a vec of slots; a slot is an enum that can be occupied or vacant, the occupied variant is a two-tuple containing the value and generation, and the vacant variant holds just a generation. Inserting into the slotmap simply switches the variant of the slot from vacant to occupied, and popping does the reverse. If there are no currently vacant slots, we just use the underlying push method on the vec of slots, which will handle resizing for us! I also store a stack of indexes to vacant slots to make insertion fast.

When you insert into the slotmap, it provides an opaque key, but the data inside is an index and a generation. When you attempt to retrieve a value with a key, the slotmap checks whether the slot is occupied and the generation matches, and if so returns the value; otherwise it returns None.

There is also an indirect slotmap that adds an extra layer of indirection: rather than the key being an index directly into the underlying vec of slots, it's an index into a vec of indexes. This allows moving the slots around without invalidating currently living keys.

The indirect slotmap has the advantage of faster iteration, since it doesn't have to skip over empty "holes" of vacant slots in the vec of slots. The tradeoff is that insertion is slightly slower!

Anyways, no unsafe is required to implement a performant slotmap data structure! I have not uploaded my slotmap to crates.io because I didn't think anyone would find it useful, but maybe I should reconsider this!


I would really love a definitive answer on whether the borrow checker and rust’s rules do really limit optimizations and such.

It seems like I see this opinion often and every time there are tons of people on both sides who seem sure they are correct.

What are the limitations for optimization? Does unsafe rust really force those?


The question is far too broad, and contextual. You're never going to get an answer to that question.

Sometimes, the rules add more optimization potential. (like how restrict technically exists in C but is on every (okay almost every) reference in Rust) Sometimes, the rules let you be more confident that a trickier and faster design will be maintainable over time, so even if it is possible without these rules, you may not be able to do that in practice. (Stylo)

Sometimes, they may result in slower things. Maybe while you could use Rust's type system to help you with a design, it's too tough for you, or simply not worth the effort, so you make a copy instead of using a reference. Maybe the compiler isn't fantastic at compiling away an abstraction, and you end up with slower code than you otherwise would.

And that's before you get into complexities like "I see Rc<RefCell<T>> all the time in Rust code" "that doesn't make sense, I never see that pattern in code".


I'd say it mostly applies to manual optimization, when we're restructuring our program.

If the situation calls for a B-tree, the borrow checker loves that. If the situation calls for some sort of intrusive or self-referential data structure (like in https://lwn.net/Articles/907876/), then you might have to retreat to a different data structure which could incur more bounds checking, hasher costs, or expansion costs.

It's probably not worth worrying about most of the time, unless you're in a very performance-sensitive situation.


There can be no answer. Research is ongoing, and smart people are actively trying to make the optimizer better, so even if I gave a 100% correct answer now (which would be pages long), a new commit 1 minute later would change the rules. Sometimes someone discovers that what we thought was safe isn't safe in some obscure case, and so we are forced to no longer apply some optimization. Sometimes optimization is a compromise, and we decide that using a couple of extra CPU cycles is worth it because of some other gain (a CPU cycle is often impossible to measure in the real world, as things like caches tend to dominate benchmarks, so you can make this compromise many times before the total adds up to something you can measure).

The short answer for those who don't want details: it is unlikely you can measure a difference in real world code assuming good clean code with the right algorithm.


Without directly answering your question, it's worth noting that there are also additional optimizations made available by Rust that are not easily accessible in C/C++ (mostly around stronger guarantees the Rust compiler is able to make about aliasing).


Difficult to answer.

However, what you can say is that the borrow checker works like a straitjacket for the programmer, making them less able to focus on other things like performance issues, high-level data leaks (e.g. a map that is filled with values that are never eventually removed), or high-level safety issues.


You can also say that the borrow checker works like a helpful editor, double checking your work, so that you can focus on the important details of performance issues, safety issues, and such, without needing to waste brain power on the low-level details.


This would be true if code using the borrow checker was easier to read than to write.


The point is that the compiler helps you “read” it. This takes mental effort off of you.

I agree that not everyone thinks this is true, but this is my experience. I do not relate to the compiler as a straight jacket. I relate to it as a helpful assistant.


This is my experience as well. I find it much easier to work faster when the compiler is helping me, and I don't consider it a "straitjacket" at all.


I think it’s generally accepted that writing code is nearly universally easier than reading code, in any language. That aside, getting a mechanical check on memory safety for the price of some extra language verbosity is obviously worth it IMO.

By the same token, it is common to see criticisms of the complexity of templates in C++, but templates are the cornerstone of “Modern C++” and many libraries could not exist without them.


> getting a mechanical check on memory safety for the price of some extra language verbosity is obviously worth it IMO

But a GC'd language doesn't require the extra verbosity.


GC has little to do with it. The borrow checker as a developer tool has much more to do with preventing concurrency bugs and unexpected mutation than it does with memory management.

"As a developer tool" is doing some work in that sentence though. As a language implementation characteristic, the checker can help inform (or, more accurately, ensures that code is written in a way that informs) memory management decisions.


And you pay for that in performance


What performance? That’s not a single thing. Do you pay in throughput or latency?

It certainly has a price but it is waaay too overblown in many discussions. What it mostly does entail is a slightly larger p99 latency. Where it actually matters is entirely another question.


Methods such as these for C and C++ are interesting, and needed, but only solve a part of the problem.

As others have noted before, they do little good because they're opt-in. I think there's a bit of nuance to that which needs to be explored though, as I think it's less a problem that the extra checks are opt in, and more a problem of how we use and categorize libraries.

As long as we encourage dynamic and static library inclusion (and why wouldn't we, it's how we build upon the work of others), every language has a problem similar to how C and C++ are opt-in and you can't easily control the code you include or link. If you load openssl from Java or Rust or Go, you might have some benefit from a well defined API layer, but ultimately you are still beholden to the code openssl provides in their library.

Just as one of the real benefits of Rust or Java or Go is not necessarily that the code is completely safe, but that weird unsafe behavior usually requires special escape hatches which are easier to audit, what we need are ways to categorize the code we include, no matter the language it comes from, with appropriate labels that denote how strong the safeguard guarantees it was compiled with are and of which type, so we can make easier and better informed decisions on what to include and how to audit it easily when we do.

This applies to including something written in Rust as well. If someone is writing something in C++ and wants to include a library written in Rust, that it's written in Rust is only part of the picture. It's equally important how often (as a total and as a percentage of the code) the safety checks that language required (or that the developers opted into) were escaped in that library.

If the choice is a Rust library with 95% of the code in unsafe blocks or a C++ library that opted into multiple different safety checker systems and has almost no escapes from those requirements, Rust is not providing any real safety benefits in that situation, is it? What we need is better information exposed at a higher level to developers about what they're opting into when they use third party code, because we can all control what safety mechanisms we use ourselves, so that's mostly a solved problem.


Often missed here is that the Rust library author is strongly protected from faulty code written by Users. The C/C++ library author is not.

The most obvious examples of this are memory allocation. The C/C++ user claimed the buffer was large enough to contain the result. The Rust user received back an object that protected the memory and returned it at the right time.

But it could also be a file handle or mutex that used the Rust ownership patterns to protect the underlying data.

If I am using a library and the author can put in features that prevent misuse, I don't need to work so hard to use the library correctly. As soon as my safe code compiles I can be fairly sure with most rust libraries that I didn't break some unenforced rule.
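As a rough sketch of that contrast (hypothetical APIs, not from any real library): a C-style function asks the caller to promise the buffer is large enough, while a Rust-style function can simply return an owned value, leaving no size or lifetime contract for the user to violate:

```rust
// Hypothetical APIs, for illustration only.
//
// A C-style contract: the *caller* promises `out` is large enough.
//   void render(char *out, size_t out_len);
//
// The Rust-style equivalent returns an owned buffer instead, so there is
// no size contract to get wrong and no free() to forget: the String is
// dropped automatically when the caller is done with it.
fn render() -> String {
    let mut out = String::new();
    out.push_str("rendered output");
    out // ownership moves to the caller
}

fn main() {
    let s = render();
    assert_eq!(s, "rendered output");
} // `s` is dropped here; the memory is returned at the right time
```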


That may be true but it doesn't help downstream users know what they're opting into. Knowing that a Rust library can use no unsafe code is not the same as knowing that the library did use no unsafe code, in a similar way to how knowing that a C library can be coded with the help of various tools to provide additional safety (in some cases beyond what Rust provides natively) is not the same as knowing it was.

In other words, that something was written in a specific language is far too coarse and blunt an assessment to really know what you're getting into.

Rust libraries may be, in aggregate, far more likely to have additional safety than C libraries because of features rust provides that are generally used, but you can know nothing about the specific relative safety of a single rust library and C library without looking closer, at the code level. Having indicators of exactly that info surfaced in descriptions about libraries would be a major step forward IMO. I think it would also probably immediately benefit rust if it were to happen, as the info would cast many rust libraries in a beneficial light in comparison to others, but to me that's far less important than promoting better practices overall regardless of language by allowing users to better choose between them, regardless of language.


If the cybersecurity guidelines keep being improved upon, I expect security assements to be a requirement for 3rd party libraries, regardless of the language, just like we already have to do legal checks before being allowed to add them into the internal package server.


That’s a great point and I agree with it. Just to play a bit of a “devil’s advocate”: GCd languages' productivity boost comes from exactly this. Both in Rust (where it is explicit) and in other low-level languages, low-level design decisions leak into public API interfaces. It’s great that it is explicit in Rust, but depending on what you work on, it may be better to just have the runtime deal with it. (But clearly the C-FFI API surface is the worst of implicit requirements, which may not be upheld at all.)


I feel like a few languages are better than others in a related but not quite identical area:

Languages like Java and Go, while they CAN escape to native libraries, have cultures that tend to avoid that kind of thing. At least, in my projects, I have quite an easy time using zero native dependencies with those languages (except for the underlying kernel of course), and so I feel like there is a much lower chance of escape-hatch issues sneaking in.

They aren't built on a foundation of legacy C and C++ libraries - not even the crypto - and I find that to be an advantage.


This is at least partially because there is a performance overhead to calling out to C from Java or Go, so eventually the optimal implementation of something will be in the language itself. Rust's zero-cost calls to/from C are a positive in that they give you a large ecosystem of existing code to use, but a negative in that people are more likely to just build "safe" wrappers around that code instead of writing something safe in Rust. This is somewhat countered by the "rewrite it in Rust" folks, but you still get more C wrappers than Java or Go do.


This is a great point, and one that doesn't get enough attention. The article talks about using a static analysis tool, but usage of that tool is indeed opt-in, like you say.

I suspect a language could mitigate this with the ability to sandbox a library's code. That could be pretty slow, though we could compile the library to wasm and then use wasm2c to convert it back into native code. I wrote a bit about this idea in [0], but I'd love to see someone make this work for C++.

[0] https://verdagon.dev/blog/fearless-ffi


If you were starting a new project you could put lints in place to make these things enforced. But at some point you have all these lints and customizations in place, and you can't use old or 3rd party C++ code any more because of them, so you begin to ask: why not just use a new language where this stuff isn't pasted together with glue and baling wire?


My point is really not about the code you write yourself, but the code you need to include in your project. Rare is the professional programmer that always gets to finish their project using only code they wrote themselves, and for many projects that's highly inadvisable (don't roll your own crypto unless you have a very good reason).

So, given that at times we will have to use external libraries, and given that even very safe languages often have escape hatches meaning you can't be sure the code of one language has more constraints than another, it would be great to have other indicators than the language it was written in that indicated what safety checks it uses.

If next year you're writing a new program in a language that hasn't even been invented as of now, and is viewed as safer than every language out today, what does that actually get you if one of your constraints is that you need to include and use openssl or one of a few forks for compatibility reasons? Wouldn't you rather be able to look at the available options and see that some opt into specific safety constraints, and have been good about not circumventing them, and do so extremely easily? Network effects and existing known projects seem to have an inordinate amount of staying power, so we might as well deal with that as a fact.

The world is a messy place, but the more information we have the better our chances of making order out of it, even if temporarily.


> If the choice is a Rust library with 95% of the code in unsafe blocks, ... Rust is not providing any real safety benefits in that situation, is it?

If the choice is a C library with 95% of the code in inline assembly, ... C is not providing any benefits in that situation, is it?


People have this misconception that unsafe Rust is some other language. It's not. All it does is allow a handful of extra operations, chiefly dereferencing raw pointers. All other Rust features (i.e. the type system) still work as they always did, so you can really minimize the surface area of unsafe code. Not to mention that seeing unsafe just means something really low level is happening, and it signals caution to the reader (or anyone editing the code), as opposed to C++, where it's easy to forget you need to be alert, since you should really be alert all the time.


Exactly. Presumably there are some opt-in tools that are very strict about the inline assembly uses you have in C. I don't have any experience with them, but I'm sure they exist.

That's the whole point: it's less about the language, and more about the specifics of the code itself in a testable way. Saying something is written in C without accounting for a bunch of inline assembly is analogous to saying something is written in Rust without accounting for a bunch of unsafe blocks. Not in that they are equivalently safe or unsafe, but in that high-level assumptions based on the language fail because of the practices within.

If I had to choose between two libraries written in C and one was pure C and one was 50% inlined assembly, and I viewed security and safety as more important for my use case, I know which one I would choose if that information was surfaced to me easily.


There are no Rust crates with 95% unsafe code.


If you think I'm talking about rust vs C and not just using them as stand ins for any language pair, you're not understanding my point. Possibly because you're too focused on one language in particular, as I'm not focusing on any language in particular.


Crates that only make FFI calls exist. I have written some. :)


Well, what's new here?

Generational References, random or sequential

That's an old idea. Goes back to at least the 1980s. It's a useful way to detect use-after-free at run time, but doesn't prevent it. "To get access to an object, we first check ("generation check") that the current generation number matches the remembered generation number. If not, we safely signal a segmentation fault." Um.

Rule 5: We can only read a field by taking ownership of it, by either swapping something into its place or destroying the containing struct.

Hm. This is single ownership with move semantics on steroids. There was once some enthusiasm for languages where you could only read a variable once, but that didn't go very far. Single assignment, where you can only write once, on the other hand, is now widely accepted and useful.

I doubt that any collection of hacks on top of C++ will make it safe. Too many people, including me, have tried. To increase safety, you have to take things out. This is unpopular and breaks backwards compatibility.


One way to take things out is to have something like Sonar on the CI/CD pipeline, configured precisely to take specific patterns out, so that violations break the PR build and won't get greenlighted for merging.

Yeah, not everyone likes those of us that share development roles alongside security best practices enforcement.


There is also Type-After-Type:

https://dl.acm.org/doi/10.1145/3274694.3274705

(though maybe that's covered by what the author meant by "arenas").


This is one of the other three secret blends that I think could bring memory safety to C++!

I wrote a bit about using type-after-type as the basis for an entire language (Arrrlang, with a parrot mascot) in my last article [0] and a little bit in a post about memory safety for unsafe languages [1] which we eventually talked about at Handmade Seattle.

The downside is the extra memory usage, but I think we can combine it with temporary regions [2] to reduce it pretty drastically.

TIL the phrase type-after-type! I've also heard it referred to as type stability. [3] [4] If you squint, this is what we often do manually with Rust when the borrow checker influences us to use indices/IDs into central collections.

[0] https://verdagon.dev/blog/myth-zero-overhead-memory-safety

[1] https://verdagon.dev/blog/when-to-use-memory-safe-part-1#the...

[2] https://verdagon.dev/blog/zero-cost-borrowing-regions-overvi...

[3] https://www.usenix.org/legacy/publications/library/proceedin...

[4] https://engineering.backtrace.io/2021-08-04-slitter-a-slab-a...


> Tracing GC is the simplest model for the user, and helps with time management and development velocity, two very important aspects of software engineering.

> Borrow checking is very fast, and helps avoid data races.

One thing many people seem to assume is that not having to care about memory means you can program faster and get to your goal faster, as the author here seems to do. However, as it turns out, if your program is more complex than ~100-1000 lines of code, explaining in an explicit way who owns what and who gets to change state when is a very useful way to avoid bugs.

Saoirse Shipwreckt aka withoutboats mentioned this a while ago in https://without.boats/hire-me/

> Rust works because it enables users to write in an imperative programming style, which is the mainstream style of programming that most users are familiar with, while avoiding to an impressive degree the kinds of bugs that imperative programming is notorious for. As I said once, pure functional programming is an ingenious trick to show you can code without mutation, but Rust is an even cleverer trick to show you can just have mutation.

and later follows up on this in https://without.boats/blog/revisiting-a-smaller-rust/

> I still think this is Rust’s “secret sauce” and it does mean what I said: the language would have to have ownership and borrowing. But what I’ve realized since is that there’s a very important distinction between the cases in which users want these semantics and the cases where they largely get in the way. This distinction is between types which represent resources and types which represent data.


I don't write Rust.

But here's the thing: what you said and what the author said don't conflict with each other, and this has been on my mind for a while.

People who write similar code, or work on things for decades, usually don't really think through what "sketch out some code" looks like. They spend most of their time refactoring things that have clear use-cases, but not well-defined API boundaries within a component, or between components. So ownership, nullability checks, and data race checks all come very naturally from the start.

But there is another side of the world, where people are constantly sketching something out, for things like creative arts, high-level game logic, data analysis, machine learning, etc. Now put yourself in that position: the syntax noise is actively in the way of this type of programming. Ownership and even nullability checks are not helpful if you just want to have partial code running and check whether it draws part of the graph. This is a world where Python excels, and people constantly complain about why this piece of Python code doesn't have type annotations.

We may never be at peace between these two worlds, and this manifests itself somewhat in the "two-language problem". But that, to me, is what someone means by "development velocity is faster".


> Ownerships, even nullability checks are not helpful

Memory management does get in the way. But you are wrong about algebraic data types; they will help you sketch something.

Ideally, if you don't know what you want, you will want extendable¹ algebraic types, more like TypeScript than Rust, but what you call "nullability check" is a benefit from the beginning.

1 - Where you can say "here comes a record with those columns" instead of "here comes this record". You can write this in Rust, but it's easier to simply completely define everything.


In my experience even in those "sketching" areas static types and strict checking is the better trade-off.

I think the real criteria for "will static types and stricter checks help?" is "how long will this thing last for?".

E.g. for a shell REPL you definitely don't want to have to write out types, but for a shell script you definitely do.

Something like using MATLAB for exploratory research is probably another decent example. Or maybe hackathon games.

But for most games, data analysis, machine learning etc. then being stricter pays for itself almost immediately.


In your framing there's a sort of implicit downplaying of the frequency of exploratory work and an implicit promotion of stricter work.

> Something like using MATLAB for exploratory research is probably another decent example. Or maybe hackathon games. But for most games, data analysis, machine learning etc. then being stricter pays for itself almost immediately.

(Emphasis mine)

This is where the viewpoints differ. Some people spend a lot more time on the exploratory aspect of coding. Others prefer seeing a program or a system to completion. It largely depends on what you work on and where your preferences lie.

Years ago I wrote a script that grabs a bunch of stuff from the HN API, does some aggregation and processing, and makes a visualization out of them. I wrote it because the idea hit me on a whim while intoxicated, and I wrote the whole thing while intoxicated. The script works and I still use it frequently. I haven't made any changes to it because it just does what it needs to. It has no types. It's written decently because I've been coding for a long time but I was intoxicated when I wrote it. The important thing is it's still providing value.

There's a surprising amount of automation and glue code that doesn't need the correctness of a type system. I've written lots of stuff like this over the years that I use weekly, sometimes daily, that I've never had to revisit because they just work. I suspect it's a matter of personal preference how much time a person spends on that kind of work vs building out large, correct systems. I suspect there's a long tail of quality-of-life tooling that is simple and exploratory in nature much like large, strict systems are much bigger than most people expect at first blush because of how many cases they handle.

I think trying to say that one is more common than the other without anything approaching the rigor of at least a computing survey is really just to use your gut to make generalizations. Which is what the strict vs loose typing online debates really are. A popularity contest of what kind of software people like to write given the forum the question is being discussed on.


I really love parts of rust and kinda hate other parts.

but this is what really ruins it for me. I want to play. I want to knock something together and work with it and see what kind of shape it is.

rust demands that I cross every last t before I can run it at all. which is great if you already have a crystal notion of what you are building


> rust demands that I cross every last t before I can run it at all. which is great if you already have a crystal notion of what you are building

Maybe I'm a weirdo, but I don't find this to be the case for me.

When I'm knocking things together in Rust I use a ton of unwrap() and todo!() and panic!() so I can figure out what I'm really doing and what shape it needs to have.

And then when I have a design solidified, I can easily go in and finish the todo!() code, remove the panic!() and unwrap() and use proper error types, etc.
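For instance, a sketch like this (all names invented for illustration) compiles and runs with the error handling and unfinished paths explicitly stubbed out:

```rust
// A prototyping sketch: unwrap() crashes on errors you haven't decided how
// to handle yet, and todo!() stubs out unwritten paths. Both are easy to
// grep for and replace once the design settles.
fn parse_port(s: &str) -> u16 {
    s.parse().unwrap() // later: return a Result with a proper error type
}

fn start_server(port: u16) -> String {
    if port == 0 {
        todo!("pick an ephemeral port") // panics only if this path is hit
    }
    format!("listening on {port}")
}

fn main() {
    assert_eq!(start_server(parse_port("8080")), "listening on 8080");
}
```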


> rust demands that I cross every last t before I can run it at all.

It's worse than that IMO. Rust makes it very awkward/impractical to have cyclic data structures, which are necessary to write a lot of useful programs. The Rust fans will quickly jump in and tell you that if you need cycles, your program is wrong and you're just not a good enough programmer. But maybe it's just that the Rust borrow checker is too limited and primitive, and it really just gets in the way sometimes.

Some of the restrictions of the Rust borrow checker and type system are arbitrary. They're there because Rust currently can't do better. They're not gospel; they aren't necessarily inherent properties that must always be satisfied for a program to be bug free. The Rust notion of safety is not an absolute. It's a compromise, and a really annoying, tiresome drain on motivation and productivity sometimes.


The basic model of Rust is to move use-after-free from a dynamic, runtime check to a static, compile-time check. But to keep the static checks from being Turing-complete, you need to prohibit arbitrary cycles while something like a tree (or other boundable recursion) is doable. So Rust not being able to check cyclic data structures isn't a "Rust currently can't do better" situation, it's a "Rust just can't do better" situation.

What Rust's intended solution for that is that you add in data structures that do the dynamic checking for you in those cases. But the Rust library doesn't provide anything here that's useful (RefCell is the closest alternative, and that's pretty close to a this-is-never-what-you-want datatype), which means your options are either to use integers, roll your own with unsafe, or try hard to rewrite your code to not use cycles (which is usually a euphemism for use integers anyways). The problem here, I think, is that there is a missing data structure helper that can sit in between integers and references, namely something akin to handles (with a corresponding allocator that allows concurrent creation/deletion of elements).


> missing data structure helper

Didn't you already just name-check that, though, since that's basically RefCell... or, if you're willing to roll the dice, UnsafeCell (aka "trust me, I know what I'm doing")?


What you essentially want for the user to not write any unsafe code is this kind of interface:

   trait Allocator {
     fn allocate<'a, T>(&'a self, init: T) -> Handle<'a, T>;
     fn deallocate<'a, T>(&'a self, handle: Handle<'a, T>);
     fn read<T>(&self, handle: Handle<'_, T>) -> impl Deref<Target = T>;
     fn write<T>(&self, handle: Handle<'_, T>) -> impl DerefMut<Target = T>;
   }
&'a RefCell<T> is pretty close to a definition of Handle<'a, T>, except that Rust provides no implementations of allocate and deallocate that take a const instead of a mut reference for self. Trying to make an allocator that lets you safely deallocate something requires a completely different implementation of Handle<'a, T> than what RefCell can provide, and even if you're fine without deallocation, allocation with a const ref still requires unsafe to get the lifetime parameter right.


Did you get a look at https://github.com/rust-lang/rfcs/pull/3446 at all?


I'm not in the habit of regularly following new Rust RFCs, so I'd have no way of knowing about something made just last week. :-) But I'm taking a look now.


I don't tend to follow them either, but I've been frustrated by the lack of progress on allocator_api, and I came across this yesterday after looking into that. I only mention it because the Handle stuff in there looked tangentially related, though it's talking about something quite a bit different than you were.


Can you clone a Handle? If so, how do you handle using a clone after freeing it? If clones are refcounted, how do you handle cycles?


There are several different ways you can implement a Handle, depending on what features you want; the most important part of its implementation is that `fn is_valid(handle: Handle) -> bool` is possible. The simplest implementation is a (pointer, generation) pair, which can be packed into a u64 pretty easily even for 64-bit systems; every allocation and deallocation of a slot increments that slot's generation counter, and is_valid is thus implemented by checking whether the slot's current generation matches the generation recorded in the Handle. This kind of Handle is effectively a Copy implementation (not merely Clone!).

Effectively, handles are like weak pointers in that you can detect when the underlying object has been freed, but unlike weak pointers, there's no need for a reference counter to know when to deallocate the object--the object is freed when the allocator itself dies, or it can manually be freed earlier. It is possible to write code that will attempt to use the freed object, and the compiler will be happy, but the runtime will detect that it has been freed and panic instead. (RefCell does something similar, except it only detects violations of the multiple-readers-xor-one-writer requirement, not overall lifetime.) You can also add other wrappers around Handles to automatically free those Handles on scope exit, but the point is you can now have multiple references to an object that can be upgraded to a mutable reference if you desire.
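As a rough illustration (illustrative names, not a real crate), here's a minimal sketch of such a generational arena. It keeps a generation per slot, bumped on free, so a stale Handle fails the validity check instead of causing a use-after-free; for simplicity it takes `&mut self`, sidestepping the const-reference wrinkle discussed above:

```rust
// Each slot carries its own generation counter; freeing a slot bumps it,
// invalidating every outstanding copy of the Handle.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Handle {
    index: usize,
    generation: u64,
}

struct Arena<T> {
    slots: Vec<(u64, Option<T>)>, // (generation, value)
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new() }
    }

    fn allocate(&mut self, value: T) -> Handle {
        // A fuller version would reuse freed slots; this sketch only appends.
        self.slots.push((0, Some(value)));
        Handle { index: self.slots.len() - 1, generation: 0 }
    }

    fn deallocate(&mut self, h: Handle) {
        if let Some(slot) = self.slots.get_mut(h.index) {
            if slot.0 == h.generation {
                slot.0 += 1; // invalidate every outstanding copy of `h`
                slot.1 = None;
            }
        }
    }

    fn is_valid(&self, h: Handle) -> bool {
        self.slots
            .get(h.index)
            .map_or(false, |s| s.0 == h.generation && s.1.is_some())
    }

    fn get(&self, h: Handle) -> Option<&T> {
        let slot = self.slots.get(h.index)?;
        if slot.0 == h.generation { slot.1.as_ref() } else { None }
    }
}

fn main() {
    let mut arena = Arena::new();
    let h = arena.allocate("hello");
    let stale = h; // Handle is Copy, so copies can outlive the object
    assert!(arena.is_valid(h));
    arena.deallocate(h);
    assert!(!arena.is_valid(stale)); // use-after-free detected, not UB
    assert_eq!(arena.get(stale), None);
}
```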


> The Rust fans will quickly jump in and tell you that if you need cycles, your program is wrong and you're just not a good enough programmer

I have absolutely never heard a Rust fan say this. AFAICT the fact that cyclic data structures are hard to write is widely accepted within the community as one of the negative tradeoffs of the language.

If you’re talking to people who claim that any language is better than all others in every possible way, for every possible use case, then they are zealots whose opinion can be ignored.


I would never tell you that you are wrong to have cyclic data structures. But there are reasonable workarounds like using handles into an array to do it, which of course re-creates some of the same problems as pointers, but not the worst ones, and is often a positive for performance on modern hardware due to improved data locality.

Or you can use reference counted types and take a small performance hit.

Or use unsafe and git gud.
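A minimal sketch of the handles-into-an-array workaround (illustrative, not a real crate): a tiny doubly linked list whose prev/next links are indices into a Vec, so no node holds a reference to another node and the borrow checker has nothing to object to:

```rust
// Back references are just indices, no Rc/Weak needed.
struct Node {
    value: i32,
    prev: Option<usize>,
    next: Option<usize>,
}

fn main() {
    let mut nodes: Vec<Node> = Vec::new();
    nodes.push(Node { value: 1, prev: None, next: Some(1) }); // index 0
    nodes.push(Node { value: 2, prev: Some(0), next: None }); // index 1

    // Walk forward, then follow the back reference: a cycle, expressed safely.
    let b = nodes[0].next.unwrap();
    assert_eq!(nodes[b].value, 2);
    assert_eq!(nodes[b].prev, Some(0));
}
```

As noted, this re-creates some pointer problems: a stale index can silently point at the wrong element after removals, which is the gap generational handles aim to close.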


> currently can't do better

The limitations are an inherent consequence of basic tenets of Rust's design. Rust wouldn't be Rust anymore if you fixed them.

> Some of the restrictions of the Rust borrow checker and type system are arbitrary. They're there because Rust currently can't do better. They're not the gospel, they aren't necessarily inherent property that must always be satisfied for a program to be bug free. The Rust notion of safety is not an absolute. It's a compromise, and a really annoying, tiresome drain on motivation and productivity sometimes.

Yeah, but this actually seems consistent with the philosophy behind Rust: to take away the tools a programmer needs for creativity, so they can't make potentially costly mistakes, as applicable to big teams in huge corporations. Another commenter in this thread put it nicely: the borrow checker is a straitjacket for the programmer.

It's not meant to foster creativity, it's meant to be safe for big business and novice employees.


> It's not meant to foster creativity, it's meant to be safe for big business and novice employees.

Interestingly, my experience is the opposite.

I find that the "straightjacket" is extremely precious during refactorings – in particular, the type of refactorings that I perform constantly when I'm prototyping.

Compared to this, I'm currently writing Python code, and every time I attempt a refactoring, I waste considerable amounts of time before I can test the interesting new codepath, because I end up breaking hundreds of other codepaths that get in the way, and I need to go through the testsuite (and pray that it contains a sufficient number of tests) hundreds of times until the code is kinda stable.

Which is not to say that Rust matches every scenario. We agree that it doesn't, by design. But I don't think that the scenarios you sketch out are the best representation of what Rust can/should be used for and can't/shouldn't be used for.


Basically it should be left for scenarios where any kind of automatic memory management isn't allowed, either for technical reasons, or because it is a lost battle trying to change the mindset of the target group.

For everything else there are more productive options.


Cyclic data structures are implemented easily with unsafe, like non-cyclic ones (Vec, for example). The difficult part is making a safe API for them. These difficulties are not syntactic in nature but design difficulties: you need to think through your use cases for such a struct and devise an API that supports them.

This is more difficult than the C++ way of "just do it". With C++ you will solve the same problems, but on a case-by-case basis as they come into view. With Rust you need to solve these problems upfront or do a lot of refactoring later. There are upsides and downsides to both approaches, but it is clear that Rust is not good for sketching some code quickly to see how it will do.

It is still possible to do it quickly with Rust in a C++ way by leaking unsafety everywhere and passing raw pointers, but I think it is still easier to do it with C++, which was designed for this style of coding.


This is definitely true, but I also don't know what a reasonable alternative is at this point for systems dev (aka places where a GC is a Bad Idea). I wouldn't unleash C or C++ onto a new project like that? I'd just feel icky. And Zig's type system IMHO isn't good enough, I'd really miss pattern matching for one.

I do think many people are using Rust in the Wrong Places(tm). It seems like torture to me to be applying it for general application development (though because I basically now "think" in it, I can see I myself would be tempted to do so).

And for things with complicated ownership graphs or nested interrelated data? It's just... no. Dear god, Iterator in Rust is an ownership and type traits nightmare, let alone anything more complicated

So I think people should just use a hybrid approach and keep Rust where it belongs down in the guts and use something higher level and garbage collected higher up.

Here's another thing about Rust that's driving me batty: it is nominally positioned as a "systems" programming language, but key things that would make it more useful there are being neglected, while things that I would consider webdev/server programming aspects are being highly emphasized.

Examples I would give that have driven me nuts recently: allocator_api / pluggable per-object allocators ... stuck in nightly since 2016(!). Full set of SIMD intrinsics and broader SIMD support generally ... also stuck. generic_const_exprs ... still not there.

Meanwhile, async this and async that, and things more useful to the microservice crowd, proliferate and prosper.


Async is badly needed in systems programming, more so than at the application level: handling events in embedded/low level components is incredibly tedious without it.


I think I agree with most of what you write, but note that async has lots of applications beyond microservices. In particular, writing anything that uses the network (e.g. a web browser), which definitely feels system-y to me.


>rust demands that I cross every last t before I can run it at all. which is great if you already have a crystal notion of what you are building

As a Rust beginner, I find the principle that "if it compiles, it works" quite helpful for catching technical oversights.


This is the nice thing about TypeScript: you can type what you want. As you iterate you can ramp your type checking up or down. This is outside the realm of memory management, of course.

And new to JS/TS land is the separation of pure data structures from resources. Something a sibling commenter brought up.


Yea, different languages for different purposes. Rust is for finished products, not so much for experimentation. When you want to play or experiment you should use Lisp.


That makes it expensive to move from experimentation to "fairly usable", though.


Your Lisp program will be entirely usable once you have experimented and found the right way to do it. Lisp compilers are really good, and they support gradual typing: you can write your program with no explicit type information, and then speed it up by adding type information in the hot spots. You can deploy that to production and it will serve you well.

At some point your Lisp program will be mature, you will have implemented most of the features you know you will need, and you will know that any new features you add in the future will not alter the architecture. Once you understand the problem and have established the best architecture for the program, you can consider rewriting it in Rust. Lisp's GC does have a run-time cost, and you can measure it to figure out how much money you will save by eliminating it. If you will save more money than the cost of the rewrite, then go for it. Otherwise you can go on to work on something more cost-effective.

Note that you might not need to rewrite the whole program; it might be more effective to rewrite the most performance–critical portion in Rust, and then call it from your existing Lisp program. This can give you the best of both worlds.


That is hardly a reason, given that Common Lisp also supports value types and whole OSes were once upon a time written in Lisp variants, whose main features landed on Common Lisp.


I would love a language (or C++ subset!) where we could get the benefits of that secret sauce, while mitigating or avoiding some of its downsides.

Like Boats said, the borrow checker works really well with data, but not so well with resources. I'd also opine that it works well with data transformation but struggles with abstraction (both the good and bad kinds), works well with tree-shaped data but struggles with programs where the data has more intra-relationships (like GUIs and more complex games), and works well for imposing/upholding constraints but can struggle with prototyping and iterating.

These are a nice tradeoff already, but if we can design some paradigms that can harness the benefits without its particular struggles, that would be pretty stellar.

One promising meta-direction is to find ways to compose borrowing with mutable aliasing. Some promising approaches off the top of my head:

* Vale-style "region borrowing" [0] layered on top of a more flexible mutably-aliasing model, either involving single-threaded RC (like in Nim) or generational references (like in Vale).

* Forty2 [1] or Verona [2] isolation, which let us choose between arenas and GC for isolated subgraphs. Combining that with some annotations could be a real home run. I think Cone [3] was going in this direction for a while.

* Val's simplified borrowing (mutable value semantics [4]) combined with some form of mutable aliasing (like in the article!).

* Rust does this with its Rc/RefCell, though it doesn't compose with the borrow checker and RAII as well as it could, IMO.

[0] https://verdagon.dev/blog/zero-cost-borrowing-regions-part-1... (am author)

[1] http://forty2.is/

[2] https://github.com/microsoft/verona

[3] https://cone.jondgoodwin.com/

[4] https://www.jot.fm/issues/issue_2022_02/article2.pdf


Two things, full time Rust dev here:

a) Rust's borrow checker is good and its type system is good, but IMHO it's not really doing what you say it is as well as you're implying: "explaining in an explicit way who owns what". While ownership is explicit and static (apart from RefCell and friends), the description of that ownership is scattered all over, program state flows are not modelled in the type system at all, and on the whole Rust is far from being the kind of explicit "I can reason about the whole program" declarative system with the clarity you're implying. Or maybe I'm taking your claims too strongly.

b) Rust's borrow checker is good. But it's not perfect and fails to pass things that in fact should be legal borrows. In particular there's edge cases around where things are grabbed in if/let/else or matches, like this fail (from my own code):

        {
            let local_version = self.seek_local(tx);
            if local_version.is_some() {
                return match &local_version.unwrap().value {
                    Entry::Value(v) => Some(v),  // reference to value
                    Entry::Tombstone => None,
                };
            }         
        }
        // note that 'local' has gone out of scope here and so self should not be borrowed 
... code later in func complains 'self' is still borrowed,

but the same thing done this way (but less efficiently) passes:

        if self.seek_local(tx).is_some() {
            let local_version = self.seek_local(tx).unwrap();
            return match &local_version.value {
                Entry::Value(v) => Some(v),
                Entry::Tombstone => None,
            };
        }
... same other code that uses 'self' compiles fine

In neither case is 'local_version' used outside its lexical scope, and 'self' should not be borrowed after that scope in either case, but in version #1 the borrow checker is convinced that it is, and that code below that lexical scope cannot proceed because 'self' is borrowed. The two are logically equivalent in terms of program flow and state management, yet the second passes while the first fails. Rust 1.7.0 stable.

(Before you ask, I did have if/let to take apart local_version instead of using unwrap, and the compiler griped about that even more)

Having the burden of how to fix that fall on the programmer sucks. This is all a step in the right direction, but I run into this kind of thing here and there and I shouldn't have to.


The limitations of the borrow checker when it comes to borrowing self are annoying. I've had cases where I just said "screw it" and copied the body of a function inline in the 1 or 2 places it was being called just to make the borrow checker happy.


RIP all the modern languages that haven't made any improvements in memory management at all.

There is so much low hanging fruit in programming language design and nobody is picking it up and instead everyone produces marginal improvements over existing languages.


Because implementing a new language and getting it to wide adoption is an enormously challenging task, with a much lower success rate than e.g. SV startups.

Languages that try to implement one new bright idea don't go anywhere, because that's not enough to cause people to switch. At best they serve as examples for feature adoption in other languages.

Look at Rust for example: it seems to be succeeding and gaining adoption, but right now it's still relatively niche (check the number of Rust job postings), and it's taken 17 years to get to this point, with sponsorship from major organizations like Mozilla.

Given this, the idea that there's much low-hanging fruit that's being ignored, that could easily be exploited, seems dubious. What's an example of what you have in mind?


> it's taken 17 years to get to this point

Yes and no. Rust went through quite a bit of change early on, to the point that it's not really that similar a language, and 1.0 was released in May 2015.

That's still quite a while (8 years), but IMO doesn't quite mean the same thing as a language that's been around for 17 years with a similar level of adoption. My impression (from the outside) is that Rust usage is still increasing, at least in specific areas, and has not leveled off or tapered. It doesn't seem to be exploding into lots of teams and places, but it does seem to be getting footholds still, like at Azure.


While that is true about Rust, most new languages are gonna have the same thing. It'll be years before they get to 1.0. Look at Zig, or just about any new language. So I don't think it is valid to discount the pre-1.0 days, because all languages are gonna need a while to get to their 1.0 day. It still took 17 years of time investment to get Rust to where it is today.


I discount the early days because I don't think most professionals would rely on a language pre-1.0 that advertises it will stabilize at 1.0, so regardless of whether it spends 6 months or 10 years pre-1.0, with regard to wider adoption you'll only be able to make limited inferences about what that period means.

For example, you say 17 years, but it was a side project for the first four of those, and was only publicly announced as Rust from 2010 on from what I can find (given there's no way my memory is that good); the following two announcements back that up.[1][2] If it's not really public or being advertised, I'm not sure how that can count towards adoption over time. Additionally, if it's advertised but with the caveat that it's pre-release and just for playing with as a proof of concept, should that count towards the adoption timeline? Counting periods when people were specifically warned off in a project's lifetime also seems odd to me, but your assessment would use that as an indicator of what it's achieved over time.

I wouldn't say a novel languished in obscurity for a decade just because the author mentioned they were working on it at some point, I would assess it from the point it was released as a complete work and presented to people as a finalized product they could read expecting a full story.

  1: https://news.ycombinator.com/item?id=1498233

  2: https://news.ycombinator.com/item?id=1498232


> There is so much low hanging fruit in programming language design and nobody is picking it up

(waves) Author here! I wrote this article about some improvements to C++, but I also made a whole programming language [0] using a lot of these weird techniques. So not quite nobody!

Still, I can see why very few people do it. It's a massive undertaking. Even if one is fortunate enough to be able to spend the thousands of hours it takes to make a language, there's only a 0.0001% chance a particular language will even have a chance to make it into the mainstream. In other words, a glorious, glorious fool's errand.

One basically needs to be insane to embark on such an endeavor. But hey, turquoise bicycle shoe fins actualize radishes greenly!

[0] https://vale.dev/


I saw how design dances around structs / methods / interfaces. What is so bad to have however limited form of OOP instead? Would look much cleaner from my point of view.


This is because programming languages have network effects, and are costly to move to and test in real-world cases. You can use Pony, but good luck finding SDKs, database drivers, performant compilers, and maintained libraries. The community aspect of programming language ecosystems means that no matter how great a language is, if it isn't popular you will have a hard time being a developer in it. That's why most languages that succeed start in a niche: great for scripting, good for data analysis, great for concurrent programming (Scala). Some of them, like Python, then scale to general use, while others, like Scala or Julia, don't.


See also:

Thomas Neumann's current proposal for memory safe C++ using dependency tracking:

- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p27...

Google's proposal for memory safety using Rust-like lifetime analysis:

- https://discourse.llvm.org/t/rfc-lifetime-annotations-for-c/...

- https://github.com/google/crubit/tree/main/lifetime_analysis


And Microsoft's work on the Visual C++ lifetime checker and SAL, as well.

It will never be perfect, but every little improvement helps.


> It will never be perfect, but every little improvement helps.

Or it might convince people to stay longer on a plane with a provably [0] terrible safety record.

[0] https://alexgaynor.net/2020/may/27/science-on-memory-unsafet...


To put matters into perspective, Rust reference implementations depend on C++ toolchains.

Same applies to all major Ada, Java, .NET, Swift, Ocaml and Haskell implementations. And any GPGPU toolchain.

Which kind of shows it isn't going anywhere and those planes have to be improved no matter what.


As an addendum, the same goes for many C toolchains. Anything requiring GCC 4.8 or later is depending on a C++ compiler. And projects like LLVM’s libc, Fuchsia’s Zircon kernel, the bareflank hypervisor, etc, demonstrate that C++ really can be used anywhere C is used.

C++ is the new C in the sense that it’s the language everything else is built on and I expect it will be even more difficult to displace than C. For instance, the complexity of C++ makes it next to impossible to incrementally rewrite in another language, simply writing a production quality C++ implementation is a gargantuan investment so a superset language is questionable, and the C++ community is committed to evolving and improving their language whereas C has largely ossified. Perhaps C will outlive everyone reading this thread, but C++ will outlive C.


> C++ really can be used anywhere C is used.

Kind of what I have been preaching since 1993, as I adopted C++ into my toolbox, and had some fun on Usenet on C vs C++ flamewars.

It was with some vindication that I celebrated when all major C compilers eventually transitioned to C++.


I agree these planes are important and deserve care. At the same time, pretty much all suggestions on how to meaningfully improve the safety of those planes boil down to successor languages (Cpp2, Carbon, etc.) or require some other complex manual rewrite of components of said plane. There is an argument to be made for having good out-of-the-box interoperability; however, some of the most complex and important codebases in existence, namely the browsers Firefox and Chrome, have demonstrated that you can do that part replacement in Rust. I'm not saying there is no other way. But these suggested and as-yet unproven improvements to C++ will not automatically make those planes safer. They will require replacing parts with new code, and if we are writing new code there is a serious question we should ask ourselves: on what foundation do we want to build when improving those "planes"?


The second someone makes a successor language that seamlessly/directly interops with C++ _AND_ has the level of build/IDE tooling that C++/Rust have, I'm on board.

The closest thing right now is Sean Baxter's "Circle" compiler in "Carbon" mode IMO:

https://github.com/seanbaxter/circle/blob/master/new-circle/...

Unfortunately, Circle is closed-source and there's no LSP or other tooling to make the authoring experience nice.


I also see Circle as the most promising C++ wannabe of all the contenders. As for it being closed-source: once upon a time all major compilers were, so let's see.


Rust in Firefox is a very tiny portion of it, and now they are using some WASM sandbox tricks, because they aren't going to rewrite everything in Rust, given the effort.

Chrome has only now started to consider allowing Rust, and it is baby steps, not coming close to V8, the graphics engine and such.


"very tiny portion" that's a gross misrepresentation. Rust sits at ~10% and C++ at ~27% https://4e6.github.io/firefox-lang-stats/.


Since when is 10% big?


Rust has been bootstrapped for nearly a decade. The Rust reference toolchain is built in Rust.


if you pretend that LLVM and friends are not part of the toolchain


So no need for LLVM and GCC, Great news!

Where can we download it?


Assuming you are serious, there is https://github.com/bytecodealliance/wasmtime/tree/main/crane... which is written in Rust and is targeted to become the default debug backend in rustc. LLVM has accumulated a lot of optimizations contributed by various groups and people over more than a decade. It's hard to catch up to that by virtue of resource limits.


I was being sarcastic, when Cranelift becomes the official reference implementation then I shut up.


Is there a reason to replace LLVM? Are there still memory bugs that are popping up and causing issues?



The reason I use Rust is because I can bypass all this messy business altogether and have my sensible patterns wrapped in a usable syntax and enforced by the compiler out of the box.

Whenever people say "just follow these rules" I read "just add this extra mental burden and do not slip up". Computers were invented to automate things. Rust automates ownership and borrowing rules. Suggestions like "do not forget to initialize unique_ptr with something" are not intelligent solutions.


Keep in mind that a lot of C++ users really didn’t have a choice, it was either C or C++ for a lot of applications. In fact C++ is often still the only option, with some segments (like HPC) doubling down on their investments in the language.

So how do we serve those users? Rust doesn’t give them a path forward. Simply getting C++ compilers to agree is difficult enough, much less an entirely different language.


I'd definitely agree, and the article proposes a static analysis tool (similar to the borrow checker) for that purpose.


I want to note couple of things from POV of C++ gamedev to safety/compiler people. This is completely IMHO and opinion is not of my employer and yadda yadda disclaimer. Feel free to disagree. It is posted here to learn where I am wrong about things.

If you don't like our arrays, that means we don't like your type system. Arrays are the only efficient way to access existing RAM, and until hardware changes, do not start your work with 'do not use arrays'. And do not write into array elements without the dev's explicit consent just for safety.

If you want to lock access to a whole array behind a single reference, it does not scale with current hardware that has lots of CPUs and accelerators. Start thinking about array slices as things that can check safety. Start thinking about the temporal aspects of multiple code blocks accessing slices into the same array(s).

If your type system does not allow iterating while the project is in a 'broken' state, it does not scale to projects with multiple people working at the same time. Yes, one has to put in some mitigations to stop 'nasal demons' from running wild in practice to be able to work. Just drop the idea of blocking people from working until everything is fixed.


Absolutely love to see CHERI mentioned here <3


The reason why safety in C++ is difficult to achieve is due to the memory model used by C and C++. The memory model is a flat space provided by the OS that can be addressed by pointers. In this sense, C++ is similar to assembly code. A language like Java, on the other hand, assumes a different model where you can only access objects with well defined behavior. To change this, one needs to disallow the use of native pointers in C++ or make them less powerful, like Java did.


> The memory model is a flat space provided by the OS that can be addressed by pointers

From what I understand this is not true. Pointers cease to be valid the moment you try to leave a single allocation. You get to play around within a single continuous allocation and one past the end, everything further out is playing with fire.

Even comparing the "addresses" of two separate allocations is undefined if done with "<" . The comparison function std::less is basically magic to get well defined behavior out of a language that doesn't guarantee it.

> C++ is similar to assembly code

Only if you use a compiler that does not optimize anything.


> Pointers cease to be valid the moment you try to leave a single allocation.

For the other readers who might not know what this is referring to, it's pointer provenance. For an introduction to the topic, I always recommend Ralf Jung's blog series, "Pointers Are Complicated":

https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html


> everything further out is playing with fire.

That's the point. C and C++ don't prevent you from playing with that fire. Memory-safe languages do.


Rust only requires you to wrap it in unsafe. And I think C# allows you to do some pretty crazy stuff too.


The Kestrel web server[0] is an example of a codebase that makes extensive use of C#'s unsafe functionality. UnmanagedBufferAllocator[1], for example, looks pretty similar to how you'd write it in C++.

[0] https://github.com/dotnet/aspnetcore/tree/1a56bdb671700ae698...

[1] https://github.com/dotnet/aspnetcore/blob/1a56bdb671700ae698...


C# cleanly separates pointers into Java-style managed references that are opaque, and C-style unmanaged pointers that are transparent but can only be used inside "unsafe" blocks.


It is, for all practical purposes, a flat space in the sense that for bare pointers, operator++ is defined (increments to the next whatever, defined based on type of pointer).

There is no operator++ equivalent in Java to apply to object references (unless you go unsafe); you can't immediately shoot yourself in the foot without the compiler noticing by asking for "the next object after this one" when no such thing exists.

(handwave a bit: of course, you can ask for an object past the last object in any container. That's (a) not the same thing and (b) results in an immediate runtime error in Java, instead of undefined behavior)


It depends what you mean exactly. The C and C++ official memory model is very much not a flat space, but exactly what you describe for Java - you can only (validly) access objects. For example, the operation x < y is only defined if x and y are both pointers into the same object or array of objects (or one past the end of an array of objects). Otherwise, the operation is entirely undefined in both the C and the C++ memory models. The following program has no defined C or C++ semantics, and neither the C nor the C++ standards can tell you anything about what it could do:

  int x = 0;
  int y = 0;

  if(&x < &y) {
    printf("???");
  }
Now of course the implementation of C and C++ actually assumes without checking that you only access objects and not raw memory, and thus will happily read raw memory directly.


I really feel like it's a hell of a definitions dodge to say "This is what the model is" when no compiler implements constraints to require the user to treat the model like that (i.e. I can always just increment the pointer, or typecast it to numeric type, do math on it, and typecast back to a pointer, without having to pull any big red levers like using "unsafe" methods).

If it's undefined but it compiles to something, is it really undefined, or is the definition merely not standardized?


Yes, it's really undefined. There is a separate category, "implementation-defined behavior", which you seem to be confusing it with. You are practically wrong in your assumptions. Since undefined behavior is undefined, the compiler is free to do anything during compilation; it may compile to something, but you have no guarantee what that something is. And in real life this often actually bites you when the optimizer comes into play: modern optimizing compilers can and do optimize undefined behavior into no-ops or other weird stuff.

Read this and don’t come back on this topic until you clearly understand it: https://en.cppreference.com/w/cpp/language/ub


No; this is a common misconception I see from people who swallowed the "it's allowed to format your hard drive and blow up your monitor" dodge vs. the electrical engineers who know where terminology like 'undefined behavior' originated in engineering. In practice, it tends to do something subtle and usually right but probably wrong for the simple, practical reason that if it did anything as obviously wrong as "format your hard drive and blow up your monitor," someone would have tripped over it testing the compiler and changed the compiler.

This is why I actually hate using this programming language, because when you hit undefined behavior (which the language makes trivial to do; incrementing a pointer past the allocated memory is a one-line operation that throws no errors) the end-result is usually subtle, wrong, and hard to find later if it isn't actually "close enough to right" because the compiler desperately tries to make a useful program because that's what compilers are for. Hell, if it formatted my hard drive and blew up my monitor, it'd be much easier to figure out where the problem was! Hand-waving this flaw in the design of the programming tool with "oh, it's undefined behavior; you should never have relied on that in the first place" when so many valid statements in the language compile to undefined behavior, as if that is good enough, is building a house on sand.

... and quite frankly, our industry is full of sand houses and we could stand to respond to the amount of undefined behavior in C++ by ceasing to build on that shaky foundation.


> No; this is a common misconception I see from people who swallowed the "it's allowed to format your hard drive and blow up your monitor" dodge vs. the electrical engineers who know where terminology like 'undefined behavior' originated in engineering.

You are arguing against topics I never brought up. (I am an EE fwiw). I have no misconceptions where undefined behavior comes from and have demonstrated none thus far.

> This is why I actually hate using this programming language, because when you hit undefined behavior (which the language makes trivial to do)

Ok so you want to editorialize on something else entirely.

No idea how that dismisses anything I said or referred to.

Regardless of how trivial it is in practice to invoke undefined behavior it doesn’t change the real differences between undefined and implementation defined behavior.


Every undefined behavior is de facto implementation-defined because something happens resulting from the state of the machine and the code being executed. Change the implementation and the thing that happens changes.

(I know the c++ spec defines these terms differently; I'm not talking about the spec definitions and I never was. I'm saying the spec definitions are a dodge around what actually happens when code is compiled and executed).


I have no idea what your thesis is.

It seems to me that you’re claiming that the existence of undefined behavior is bad. Which isn’t actually that controversial outside of certain people weirdly infatuated with C/C++.

But it seems moreover you merely have a problem with it being called undefined behavior or something, as if the word itself isn’t harsh enough.

I don’t see it. I don’t see the problem with the definitions as stated. It doesn’t weaken any commentary about undefined behavior to me at least.

And again regardless of your hatred for C++ weenies it doesn’t change the fact that there are meaningful practical differences between undefined and implementation defined behavior, the distinction has to exist regardless of what you call them.

> Every undefined behavior is de facto implementation-defined because something happens resulting from the state of the machine and the code being executed.

Regardless of spec this makes no sense to me. Implementation defined implies something is still “defined”, like not in the spec but somewhere. Undefined means what it says - it’s undefined.

I don’t even disagree with your other points but I don’t get how complaining about the practical difficulties of avoiding undefined behavior have to do with a “definitions dodge”


> No; this is a common misconception I see from people who swallowed the "it's allowed to format your hard drive and blow up your monitor" dodge vs. the electrical engineers who know where terminology like 'undefined behavior' originated in engineering. In practice, it tends to do something subtle and usually right but probably wrong for the simple, practical reason that if it did anything as obviously wrong as "format your hard drive and blow up your monitor," someone would have tripped over it testing the compiler and changed the compiler.

This is incorrect. In one very popular web server, some behavior depends on the values set for some response headers, and the value is checked in part by calling a memmem-like function which takes two pointers and two lengths and searches one string for the other string. If you pass {null, 0} as the haystack and the implementation of that function starts by computing the upper bound of the haystack (null + 0), then the compiler can legally produce *any value* for that expression. Then your program will quite predictably segfault. When it does, your recourse is to fix your program, not to fix the compiler, because the compiler is working correctly.


Oh, no argument here. All I'm saying is that the behavior resulting is determined by the current state of the machine and the compiler used.

Why does the distinction matter? Because I don't believe any program beyond minimal complexity written in C++ is actually free of undefined behavior. Therefore, being able to ask questions like " What configuration of compiler built this?" is meaningful for debugging code.

Because undefined behavior is so easily reached in the language's specification, the abstraction is broken from the start and one must know implementation details of the compilers used to understand how the code behaves.


The result of the pointer comparison is unspecified, this is not undefined behavior in C++.

I don't know about C.


One issue is that the memory model isn't just a flat space that can be addressed by any pointer value. It may look similar to one if your compiler and OS let you, but doing things like accessing memory through a pointer of a different type than it was allocated as, or outside (an array of) objects, is invalid, and the compiler is perfectly allowed by the standard to assume that never happens and happily "optimize" away everything that may result from it.

A lot of bugs have been caused by programmers assuming any access to the 'linear address space' is fine, but that has never been reliable, as it's not allowed by the standard. The worse thing is when it looks like it works for a while, but because you're relying on stuff not allowed by the standard, it may change at any time (like a compiler version or option change, or even a change to a different part of the code that happens to tickle the compiler's analysis stages a slightly different way). See "time traveling NULL-check removal": since the compiler "knows" that no pointer can ever be NULL at the point of a dereference, any path that dereferences NULL can be completely removed. Even if there's something like a NULL check and a logging output before said dereference, if the compiler decides the dereference will eventually happen unconditionally on that path, then that path and the logging before the dereference Can Never Happen, so they can be removed.

Or type punning and pointer aliasing: objects are created with a type, and so the compiler Knows that if you convert a pointer of one type to another type that isn't compatible with the first, they somehow magically point to different memory, with all the assumptions that implies for the following code.

A lot of these restrictions are pretty similar to things like Java have - the difference is that the JVM checks and flags violations and/or straight up disallows them when compiling - not just allowing the compiler to (silently) optimize based on those assumptions, and throwing the result at hardware to see what happens.

There may be a few platform/compiler-specific behavior used to implement super low-level stuff like OSs, but that's platform-specific stuff outside the C++ (or C) spec itself.


That's pretty much what the article says though. "Don't use traditional pointers" is a fairly trivial rule to enforce via static analysis, and constructs like unique_ptr are syntactically identical anyway.

The bit that has me confused is that it's inventing a new term, "borrowless affine style", to describe a longstanding paradigm that has traditionally been called "RAII". Now, neither term is very clear, but surely it's better to use the existing confusing jargon instead of inventing new terms.


Borrowless affine style is more than RAII. Borrowless affine style means that there are no pointers, and always exactly one owner. In borrowless affine style your functions take a unique_ptr for everything; if the data needs to live beyond the function, then the function returns a unique_ptr of that data back.

    std::unique_ptr<foo> var;
    // init and use var
    var = SomeFunction(std::move(var));
    // use var again.
Note that while SomeFunction runs you lose access to var, but since SomeFunction returns it again you don't really lose anything. Of course SomeFunction can also return some other unique_ptr<foo> that isn't var, and you can't control that.

It is an interesting idea, though I'm not sure if I like it for real world code or not.


The significant difference is a static guarantee of no reuse after move, hence the 'affine' qualifier (which is not new).


Affine in this case is referring to not being able to use values after they have been moved. When I say not able to use, I mean that it's a compile time error to attempt to use them.

C++ does not restrict you from using things after they have been moved and therefore does not have affine typing!


I'm not sure this has much to do with RAII. RAII does indeed require affine types, but RAII doesn't provide memory safety. The article describes a method of memory safety.

(Also, I used the term "borrowless affine style" mostly because people might hear the term "affine style" and assume I'm talking about Rust, since that's what most people know.)


You just need an unsafe keyword.


it's not about making C++ memory safe, but about describing a safe subset of C++


Ideally we would have -fsafe and [[unsafe]], but it will take years for something like that.


Presuming syntax for “unsafe” that gracefully degrades in non-aware compilers, why couldn’t a particular compiler start doing it right now, starting with a very trivial safety checker than can be iteratively improved upon once the framework is in place?


I feel like D has gone this route of incrementally adding features (like borrow checking) to the language that, in principle, improve safety.

I wonder if anyone here has more experience to know how well it has worked?

One massive advantage of Rust is that they started with borrow checking from the beginning. I think one thing that often gets understated in these discussions is how much it matters to have your entire ecosystem using a set of safe abstractions. This is a major drag for C++, and I suspect that even if the language went a route like D they'd still have gaping safety holes in practical, everyday usage.


It still hasn't; that has unfortunately been a common theme in D's evolution: chasing the next big idea that will finally bring folks into D, while leaving the previous ones half implemented and buggy.

So now there is GC and @nogc, lifetimes but not quite, scoped pointers, scoped references,... while Phobos and ecosystem aren't in a state to fully work across all those variations.


You can have it today on Circle, but its relationship with some C++ folks is complicated.


It is easy to say add unsafe. However the details are very complex. I've read a few of the papers proposing something like this, and they spend a lot of time discussing some nasty details that are important to get right.


I'm not sure if there's really any alternative besides using a subset, for a low-level language. Even Rust is only memory-safe within a certain subset.

Edit: Actually nevermind! CHERI is a hardware technique that can make C++ memory safe without using only a subset of the language.


My idea for C++ memory safety would be compile time reference counting.

I.e., the compiler/tool would go through the code as if it were executing it and apply reference counting to objects, revealing whether each object will be destroyed normally, prematurely, or never.

This is what we actually do as humans by the way, when we are developing an algorithm in C++ with manual memory management. We do it implicitly though.


My idea is to use your solution to solve the halting problem.


I use tools like Valgrind/Helgrind etc. to ensure that million+ line C++ code used in production by large international corporations has zero memory leaks, no buffer overruns etc. It’s easy. It works. No need to rewrite in any other language.


> Stack objects are safe too

"I'd like to have the same drink this gentlemen has."

Of course stack objects are only fast (that's why Rust likes them) but entirely unsafe. You cannot return them, you cannot reference them from outside, and you cannot have too many of them or ones that are too big.


You could also just isoheap according to type, where the type is whatever you come up with to make C++ casts sound. It could literally be C++ types or something looser (like if you want to say that bitcasting an int ptr to a float ptr is ok).

Then you don’t need any language changes to make UAF type safe.


Deleting and re-adding each item from an array every time you use something seems like a massive pain


Interesting pigeon reference.


You win =D


You didn’t delete it!


I know! I just couldn't bring myself to remove it. It's such a cool note. I think I'm going to do these differently in the future lol


In Rule 3:

      struct Ship { int fuel; };
      void print(Ship* ship) {
        cout << ship.fuel << endl;
      }
Should that be "ship->fuel" instead?


Fixed, thank you!


Np. Thanks for the nice blog post!


“Rule 4: When you want a raw pointer as a field, use an index or an ID instead.”

literally just woke up but: wouldn’t it be simpler to use a pointer to a pointer, or am I missing something


You might like: "Handles are the better pointers (2018) (floooh.github.io)"

https://news.ycombinator.com/item?id=36419739


> "We'll instead take and return the vector directly"

Won't this clone it?


Not necessarily, although it's a bit complicated to understand in C++.

Starting with C++17, there is a feature called guaranteed copy elision that works for many/most scenarios that you would want. You need to read through the following resources to understand it fully:

https://en.cppreference.com/w/cpp/language/copy_elision

https://en.cppreference.com/w/cpp/language/value_category


> Not necessarily, although it's a bit complicated to understand in C++.

One could say this statement applies to most lines of C++ code. Lol


Indeed :)

Makes me appreciate the explicit copy() and ref semantics in Rust.

Although I bet in most cases such a method gets inlined so it doesn't matter.


Copy elision exists, the author might just assume (or know) it'll trigger. The rules are way too arcane for me so I could not tell.


very nice array of ideas to open the debate for us mere mortals.


I read the note 9 easter egg! Hope someone believes me.


I had it too today. I saved the page in my bookmarks a few days ago, looked at it this evening. I suspect it is visible to anyone, and the joke is to have you make a comment on HN ;) XD. I checked IWA: It is visible here: https://web.archive.org/web/20230629052606/https://verdagon.... but not on the first snapshot here: https://web.archive.org/web/20230622185954/https://verdagon....


Yep, not even a random display in JS. I looked at the source code of the webpage. There is nothing curious. Custom JS is here: https://verdagon.dev/components/annotated.js and does only trivial things. Not a real easter egg on the client side. Maybe the easter egg is in the server part, which sends the HTML with or without it depending on the hour or something. No need to waste more time on it; there is no funny code behind it :)


Use memory arenas and never think about any of this again.


Sadly, untrue. Source: I use memory arenas, and it's still pretty trivial to copy (instead of reference) an object onto a stack and then try to save a pointer to that object. All you need is to leave out one `&` and the compiler won't tell you anything went wrong: it'll cheerfully let you retain a pointer to a stack-based object that is going to die because explicit lifetime analysis isn't a part of the language spec.


How do arenas prevent out-of-bound access, double free or stale pointers?


Out-of-bounds access is avoided because you use handles that the arena has given you; creating an invalid handle is restricted. You avoid double free because of Rust's ownership semantics, which make the arena itself responsible for "deallocation" (which is just blanking the value and letting Drop do its thing). You avoid stale pointers because every access is checked at runtime if you're using a generational arena.


We are talking about C++ ;-)


You're right, I was reading a sister thread that was talking about Rust and lost the plot.


lost me at the unordered map



