A guide to better embedded C++

95014_refugee · on May 13, 2018

Almost nobody in their right mind would do this. (To be fair, many people working with hardware are not in their right mind anymore. Try it, you'll see.)

1: Don't use bitfields when interacting with registers. The language does not guarantee the behavior you want. Or even the behaviour you think you want.

2: 'volatile' is the contract that we (language users) have hammered out with the compiler industry over the last 20 years; it means "don't get smart with this memory ___location". Use it.

3: For any SoC of more than trivial complexity, the vendor will supply headers that describe the hardware (register names and addresses, field names, sizes, offsets, enumerations, etc.). You will hate their naming convention. You will hate the way they encode field sizes, offsets, masks, etc. Deal. Because the alternative is you, or your intern, making dozens of mistakes attempting to transcribe from the documentation. Here's a hint - the better vendors auto-generate these headers from the VHDL. They are often right even when the documentation isn't.

4: It's trivial to template most interactions with registers. See, for example: https://github.com/steffanw/laks

5: A little investment in your abstractions, even if the lower levels look a bit more complex or verbose, helps make your application logic easier to follow.

The discussion here about re-ordering is largely moot; if you're having issues with the core re-ordering your transactions, either your core is børked or you haven't mapped your peripheral space correctly. The former is pleasantly rare, the latter common enough that your compiler vendor is going to ask you about it before they give you a bug number.

stevenhuang · on May 13, 2018

Choosing between bitfields and masks for register twiddling was something I looked into deeply. Like the article, I wanted the readability of bitfields, but after reading about all the pitfalls and how loosely bitfields specs were defined in the C89 standard, I decided to go with masks instead.

Here was a great summary I found from a comment explaining the dangers of bitfields.

Seeing it spelled out like this made it clear which to choose, lest you're a fan of nasal demons :).

"The royal mistake is to use bit fields in the first place. The following is not specified by the standard:

– whether bit field int is treated as signed or unsigned int

– the bit order (lsb is where?)

– whether the bit field can reach past the memory alignment of the CPU or not

– the alignment of non-bit field members of the struct

– the memory alignment of bit fields (to the left or to the right?)

– the endianess of bit fields larger than one byte

– whether plain int values assigned to them are interpreted as signed or unsigned

– how bit fields are promoted implicitly by the integer promotions

– whether signed bit fields are one’s compliment or two’s compliment

– padding bytes

– padding bits

– values of padding bits.

– and so on."

Source: https://embeddedgurus.com/stack-overflow/2009/10/effective-c...

stinos · on May 13, 2018

So do I understand it correctly that you are saying the OP's code (or e.g. the one like in AceJohnny2) is not guaranteed to work correctly according to C89? Any idea about C99?

alxlaz · on May 13, 2018

Essentially yes. It's the same in C99, too.

There's a bit of nuance here though. The function that the author chose to reimplement provides the platform-specific implementation of a function that's actually specified as part of CMSIS (https://developer.arm.com/embedded/cmsis). Its portability across cores and MCUs is not much of an issue, and you're usually tied to the compilers in this sort of projects, too. So it's not necessarily a bad idea from a purely technical standpoint.

That being said:

- No one who writes embedded software for a living expects to see bitfields in their source tree. In fact, there are standards that this code would break (e.g. in CERT C: https://wiki.sei.cmu.edu/confluence/display/c/EXP11-C.+Do+no... ), so it wouldn't even get well through code review. It's a bad idea to surprise your coworkers :-).

- Implementation-defined behaviour can sometimes be surprising and non-uniform. More often than not, the distinction between "bug" and "implementation-specific" is very blurry, and the corner cases are very hard to catch. You may think you're OK because your code doesn't need to be portable and it works fine on your device, but then it turns out the memory alignment of bit fields varies with field size because of course it does.

- Precisely because of their infrequent use, the implementation of bit fields is one of the things that vendors often get wrong in early versions of a toolchain. At best, it's just a matter of inefficient access. At worst, it's bugs. And if you think that can't possibly go wrong, let me tell you about that time when an early version of msp430-gcc translated access to volatile variables into a NOP.

(Disclaimer: that was a long time ago, on an early port. The MSP430 port of GCC is actually very solid.)

So basically, just don't. Don't. There are well-known idioms for bit access. They are not readable as in "someone who knows English can read those programs just fine" but it takes like two minutes to understand those idioms. They are very readable, as in "anyone who's ever made a LED blink understands them". Yes, it's ___domain-specific knowledge, but you're already twiddling bits on tiny machines in a language that's not English. You're already full of ___domain-specific knowledge.

Edit: ah -- just to make sure this doesn't get interpreted the wrong way. IMHO, C++ in embedded systems is not a bad idea. I don't use it because I've never worked on a project where I needed it, and therefore I never got the chance to be proficient with it. But I've seen solid, reliable devices running (what, to my untrained eye, seemed like) very clean, debugable C++ code.

Also, "embedded" refers primarily to purpose, not platform capabilities. There are plenty of embedded platforms running on eight-core Xeons and state-of-the-art DSPs and whatnot. It's not all tiny MCUs with 4K of RAM and no MMU.

Gibbon1 · on May 13, 2018

My thoughts

Probably will work with a specific compiler. If it fails it'll likely fall on it's face fail. Which with embedded is the best type of failure.

Also you probably can force the behavior you need with pragma's (not portable)

Portability is usually completely unimportant in embedded.

C89 just don't.

C++ in embedded. Lord god no, if you want to do that use rust instead. Or embedded python or lua, anything but C++.

stinos · on May 13, 2018

C++ in embedded. Lord god no

Note so sure, maybe depends on what 'embedded' exactly means to you. I mean, years ago I was working on some TI C6x series DSPs and needed a circular buffer. Hadn't one in C, did have a templated one in C++. Didn't think twice: if the compiler would swallow it, and it did, why would I go through the hassle of doing the work again in C possibly beeding different implementations for different type or a bunch of rather ugly void* stuff? Everything worked out. So for that particual case, C++ in embedded, god yes, as it meant: less development time, reuse existing implementation both for float and int types, etc. And that's +10 years ago so I'd imagine there's now more compilers which would do it.

pjmlp · on May 13, 2018

> C++ in embedded. Lord god no, if you want to do that use rust instead. Or embedded python or lua, anything but C++.

First Rust needs to reach C++'s tooling maturity on embedded space.

When there is High Integrity Rust certification, compiler backends for all major CPUs, SDK support on OSes like INTEGRITY or Mbed, UI embedded tooling for IoT displays, maybe we can start seeing production code there.

Gibbon1 · on May 13, 2018

> First Rust needs to reach C++'s tooling maturity on embedded space.

Yeah and that's soon. Anyone starting an embedded project in C++ is making a grave mistake.

> Mbed

Those guys make a bad mistake and will pay for it.

pjmlp · on May 13, 2018

Touching C is an even bigger one.

It took C++ about 20 years to reach maturity in this space, with help of several OS and compiler vendors.

You might be willing to bet the house on Rust, most business aren't, not in the current state of its toolchain.

In 10 years from now, yeah I can clearly see it.

Gibbon1 · on May 14, 2018

C++ would have died except for three things, Java, the JRA and the web/Javascript.

Java is a partial replacement for C++ that isn't quite good enough for desk top applications. Hence C++ is still alive in that space.

The JRE was a anvil that language grabbed a hold of, which set them back about 10 years. because every language that depends on the JRE also suffers it's limitations.

And the web diverted resources away from native programming for 20 years.

However in the last 5-10 years we've seen a couple of well designed native compiled languages being developed. And those are going to replace C++.

And LLVM project and work done on KIT compilers provides a strong alternative to the JRE for language development.

Seriously, go is better than C++ now. And Rust is very rapidly getting there.

pjmlp · on May 14, 2018

When Java came into the scene, it was still a bit of a pain to write portable C++ code with the compilers catching up with the ongoing standard and indeed Java was a bit of fresh air in that regard.

However, C++ was going strong in all relevant desktop environments, OS/2, Windows and Mac OS. Even on commercial UNIX systems everyone was jumping into CORBA as the next big thing, and I surely wouldn't ever bothered to touch CORBA with plain C stubs.

Java had a lost opportunity with Sun being religiously against AOT compilation, an option only available via commercial JDKs, mostly for embedded deployment.

Graal and SubstrateVM are changing that, but it will still take a couple of years to actually change everything. Project Metropolis calls the Java on Java plan, the next 20 years.

Not only are those web browsers written in C++, throughout those 20 years C++ has been present on web pages via plugins and now WebAssembly.

We cannot have LLVM and kill C++ as well, unless you are planning to rewrite into something else.

Go is a better C, but definitely not something worthwhile against C++.

Using Git urls as package names and writing generic code as if using Borland C++ 2.0 for MS-DOS with BIDS pre-processor macros is definitely not something enjoyable.

Rust is improving at a good pace, but until it cuts its dependency on LLVM it will always depend on C++ tooling.

Love the language, but suggesting that it is at the same tooling level as C++ to business using it for the last 20 years is just irresponsible.

There are lots of domains, even on embedded where Rust eco-system just isn't there nor does it seamlessly integrate with existing workflows.

Using a programming language is much more than just installing a command line compiler and hacking away on a text editor.

It took C++ almost 20 year to replace C in many high performance domains, and for better or worse, it is mostly copy-paste compatible with C89.

Any language that wants its place needs to account for similar uphill battle in all domains.

v_lisivka · on May 13, 2018

I use Rust for embedded right now. Part of project is in C, part is in C++, part is in Rust. NRF52 and i.MX6. Can you point me to the problem with Rust or toolchain, which I missed, so I will be aware? Thank you in advance.

pjmlp · on May 13, 2018

Some ideas:

- LLVM backend is limited regarding the supported CPU architectures;

- Rust compiler is not certified for high integrity domains, where human lives might be endangered;

- Features like SIMD are still nightly

- Lack of integration with closed source toolchains from OEMs for mixed debugging and other development workflows, e.g. Keil, Microchip

These issues might be complete irrelevant, or showstoppers, depending on the company.

steveklabnik · on May 13, 2018

SIMD is stable in five weeks, incidentally.

pjmlp · on May 14, 2018

From what I understood it is just phase I from SIMD support, right?

steveklabnik · on May 14, 2018

Yes, but also the most foundational part. There’s already high level libraries written on top.

pjmlp · on May 14, 2018

Thanks, will have a look into it.

steveklabnik · on May 14, 2018

Specifically what I'm referring to is https://doc.rust-lang.org/beta/std/arch/ (last two modules are stable, note this is beta docs for the next release)

Next step is https://doc.rust-lang.org/beta/std/simd/index.html but you can use https://crates.io/crates/stdsimd in the meantime to try it out, as they're the same.

Stuff like https://crates.io/crates/faster is already built on top of this and is just waiting for the initial intrinsic stabilization, too.

InitialLastName · on May 13, 2018

Curiosity: What's the issue with Mbed? I've toyed with it; it seems nice enough, and is good for getting working code close to the metal without much development time. I'm very interested in what pitfalls come from using it.

minipci1321 · on May 13, 2018

> Here's a hint - the better vendors auto-generate these headers from the VHDL.

Yes! And the hateful naming might come straight from the RTL where digital designers have their own notions about what is a pretty name ;-)

> They are often right even when the documentation isn't.

More so -- if you decide for changing these names, one day debugging a complex issue will require talking to RTL people and everybody will be instantly lost -- your names won't be meaning much to them.

nitrogen · on May 13, 2018

1: Don't use bitfields when interacting with registers. The language does not guarantee the behavior you want. Or even the behaviour you think you want.

To back this up: it's important to remember that reads and writes to memory-mapped registers have meaning independent from simple memory access.

I once was writing an interrupt handler for an MCU. There was a bitfield register that indicated which interrupts had fired on read, and acknowledged/cleared an interrupt on write (IIRC). I was reading the register, clearing the interrupt I'd just handled, then writing it again. So I might read 0x84, mask out 0x80, then write 0x04.

Or so I thought. Turns out what the processor actually wanted was a direct write. You'd just write 0x80 to acknowledge that interrupt, not read/mask/write. And reading had a different effect on the MCU too, as I recall. So I was only getting interrupts sporadically and my code didn't work.

Now add compiler bitfields. You have to be 100% certain that the implementation behaves the way the CPU wants (no extra reads/writes, expected bits set/cleared), and that it won't change from version to version.

Instead it's common to use macros to deal with registers. Not quite as pretty as bitfields, but much more predictable.

speedplane · on May 13, 2018

out of curiosity, do people now use exceptions when programming on embedded? It's been 10 years years since I've really programmed embedded systems, but at the time exceptions were a big no-no. It was often unclear how an exception would mess with the stack, how far up things would go, and what information was (and its size) being passed up.

Instead, we just checked return values of everything.

proverbialbunny · on May 13, 2018

Exceptions are still a no-no for code that needs to be real time due to it's non-deterministic nature. The feature can be disabled with the -fno-exceptions flag.

However, this might change in C++23. Herb Sutter just proposed C++ switch to deterministic exceptions. The paper is quite good: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p070...

It looks popular, so there is a high chance it will become a thing allowing stl to work in the embedded world.

JoeAltmaier · on May 14, 2018

Remember that embedded programming is often deployed on a particular machine using particular tools. So even with all the uncertainty, a little testing (looking at generated code) and you can get it to work. And it should stay working, since the code isn't going anywhere (different machine/tools).

There's still the downside of course - looking at the code to 'prove' it is troublesome. And often the code generated for bitfields is ugly. You can likely generate better code by hand.

So the single remaining advantage is, the C looks pretty clear. There's the register; there are its bits by name; assuming that's all been checked to work right, the coding (and code reading) can proceed without any more concern over bits.

orbifold · on May 13, 2018

The example in point 4. does almost no bit level manipulation and when it does, it does so with locally hard coded constants and shifts, which is probably one of the worst ways of dealing with this.

AceJohnny2 · on May 13, 2018

> Embedded is a wonderful versatile world which allows developers to create various interesting everyday devices (in collaboration with the hardware team).

ROFL!

I've been working in embedded for over a decade, and it's always quagmire of buggy hardware patched over with C code, half of whose programmers have actually transitioned over from hardware design and don't know what they're doing.

With that in mind, I'm going to assume the author was sarcastic.

I love the field, for getting to work deep inside and very early on bleeding-edge hardware, but by god most days feel like being in the coal mine of computer systems.

keithnz · on May 13, 2018

I've done embedded stuff for 25+ years. I find that it's always far better to stick with simple C code. Because it's the native language of the embedded world. It has a lot of good advice on how to write robust code. If needed, layer something like lua on top, but for the most, stick with simple C and proven embedded techniques. The article mentions a HAL. That's definitely a must. But don't get too fancy with trying to do that. Do I keep wanting to use something else? yes. Am I fan of C? No. Is my embedded device really just a scaled down PC with an OS and a MMU and good amounts of memory? use what ever language you want.

Rust is promising, but will never be a good option on low level devices.

I have seen some nice stuff done with Forth

Narishma · on May 13, 2018

> Rust is promising, but will never be a good option on low level devices.

Why not?

kabdib · on May 13, 2018

The last low-level device I shipped code on, we had 2K of program space and about 1K of RAM. Started writing in vanilla C, crunching pieces into hand-written assembly and rewriting fat code when space got tight. 6 bytes free in the final code. I might be going out on a limb when I claim that Rust won't touch this kind of system . . . but probably not.

The system before that was largely just a USB driver, an event loop and some interrupt handlers, supporting a large amount of ported-over and mostly already existing stuff in C that did the interesting and product-defining algorithmic heavy lifting. I guess that we could have written the infrastructure in Rust, but it's hard to see the win in using two languages (the algorithmic stuff was not going to get a rewrite).

Rust isn't great for other reasons, too. Finding good embedded systems programmers is hard enough; require something bleeding edge like Rust and the population of candidates plummets to "well, we might be able to find and hire one person this year" territory.

steveklabnik · on May 13, 2018

Rust can meet those requirements, though there may be other ones that you didn’t mention it that would disqualify them. With similar techniques as the C, the smallest Rust executable ever produced was 151 bytes.

The CPU architecture is more likely a problem. And, as you say, the stuff that’s not purely a technical requirement. We’ll get there!

keithnz · on May 13, 2018

smallest device I program in C has 20 bytes of RAM

steveklabnik · on May 13, 2018

What device is that, and what do you do with it? Just curious.

keithnz · on May 14, 2018

sorry, the device itself has 25 bytes :) But I was left with 20 for program logic.

http://ww1.microchip.com/downloads/en/DeviceDoc/41236D.pdf

PIC12F508.

It did a number of things. It had logic for controlling power on a low power device, coordinating between a number of other micros. It also acted as a last ditch watchdog system that triggered a recovery of the device

steveklabnik · on May 14, 2018

Thanks, neat!

pjmlp · on May 15, 2018

Those PICs can also be targeted with Basic and Pascal, so that you get an idea of the competition. :)

https://www.mikroe.com/mikropascal-pic

https://www.mikroe.com/mikrobasic-pic

AceJohnny2 · on May 13, 2018

> Unfortunately, a lot of embedded low-level libraries deal with registers this way:

    *(volatile std::uint32_t*)reg_name = val;

Who still writes code like this!? (maybe the ST folks responsible for the CubeMX BSP for STM32. Ugh...) Most places have finally moved on to using structs and bitfields. GCC and Clang support them well and consistently on the various architectures I've worked on (MIPS, some weird DSP, and a slew of ARM variations).

Some proper C code will define the 32-bit register like this (usually in a header provided by the hardware vendor, generated from the HDL):

    typedef union some_reg {
        uint32_t raw;
        struct {
            uint32_t bits1:4;
            uint32_t bits2:3;
            uint32_t rsvd:5;
            uint32_t bits2:20;
        }
    } some_reg_t;

Then use it either like:

    some_reg_t reg;
    reg->raw = READ_REG(reg_addr);
    reg->bits1 = 4;
    WRITE_REG(reg_addr, reg->raw);

Or:

    volatile some_reg_t *reg = (some_reg_t*)reg_addr;
    reg->bits1 = 4;

The latter assumes the register address is mappable, and that programmers are aware that writing the field performs a Read-Modify-Write of the whole register. Programmers being fallible, it's a more risky approach that I don't see often, despite how concise it is.

kbumsik · on May 13, 2018

> Who still writes code like this!?

Believe or not, a majority number of ARM MCU vendors does that with CMSIS headers in my experience (ST, TI, NXP, Nordic, EFM32...). To be more precise, the registers are accessed using mask and shift macros. I've work on many ARMs too but only Atmel's header files use struct bitfield you mentioned.

But yeah, I've also seen that most of non-ARM vendors use struct bitfield.

AceJohnny2 · on May 13, 2018

Hm, is armcc still a thing? I don't remember what it guaranteed (or not) for bitfields. If it doesn't work consistently, that would explain that they're using masks and shifts.

That, and because the C standard leaves it implementation-defined.

monocasa · on May 13, 2018

The ARM ABI defines a lot of useful semantics for bitfields (particularly when combined with volatile), but GCC perennially regresses and breaks them so next to no one uses them.

Khoth · on May 13, 2018

Doesn't the latter also risk the compiler deciding to emit a store of less than a whole word and the hardware then ignoring it or throwing a hissy fit in some unexpected way?

jhack · on May 13, 2018

The author says setting a register like this is hard to read and maintain:

    *(volatile std::uint32_t*)reg_name = val;

And the author's solution is to replace that one line with this:

    struct DeviceSetup {
        enum class TableType : std::uint32_t {
            inphase = 0,
            quadrature,
            table
        };

        std::uint32_t input_source : 8;
        TableType table_type : 4;
        std::uint32_t reserved : 20;
    };

    volatile auto device_registers_ptr = reinterpret_cast<DeviceSetup*>(DeviceControlAddress);

I'm not really seeing the readability and maintainability advantages here.

viraptor · on May 13, 2018

For the first case, you need to know how the value is split into bits, what does each part of the value mean, and what do the values of reach part mean.

For the second case, most of it goes away. You still need to know exactly how the value you set is interpreted, but you get a readable description of each part and you don't have to think of bit shifting every time they're set.

The declaration is longer of course. But what would you prefer to see at usage time:

    MREG42 = MREG42 & table_type_mask | inphase_val

Or:

    device_setup_registers->table_type = TableType.inphase

liquidify · on May 13, 2018

Can you explain what he means with the reinterpret cast section? I don't understand what he is storing into the auto variables.

viraptor · on May 13, 2018

Pointers. I don't understand why either :-( I think it doesn't map well to the original example.

kbwt · on May 13, 2018

> And the author's solution is to replace that one line with this:

Which does not do what the author wants it to do.

    volatile auto device_registers_ptr = reinterpret_cast<DeviceSetup*>(DeviceControlAddress);

'decltype(device_registers_ptr)' yields 'DeviceSetup* volatile', when clearly 'DeviceSetup volatile*' was intended. Insert some witty rant about programmers being attracted to complexity here.

tropo · on May 13, 2018

Both methods are buggy, but the second is also needlessly complicated.

The first possible bug is aliasing. The type of reg_name and DeviceControlAddress are not given, so it is possible there isn't a bug.

The second bug relates to memory ordering. Adding "volatile" will tell the compiler to do things in order, but the compiler will not pass that requirement on to the CPU. The CPU itself may reorder things. I saw this affect an embedded system back in 1998, and the problem has only become more common in the 2 decades since. Generally you will need assembly code to avoid the bug.

magila · on May 13, 2018

The re-ordering you were seeing may have been due to a lack of sequence points between the ordered operations. For example: If you have two 32 bit registers which correspond to a single 64 bit value and which must be read in a specific order to ensure consistency, then you cannot write (code simplified for clarity):

  int64_t val = (*reg0 << 32) | *reg1;

The compiler is allowed to re-order the register reads even if they are volatile pointers. To guarantee a specific order you need something like:

  int64_t val = reg0 << 32;
  val |= reg1;

Sequence points are one of those dark corners of C/C++ which most people don't know about because they rarely matter unless you are dealing directly with hardware.

burfog · on May 13, 2018

No, a problem with sequence points is at the compiler level, just the same as "volatile". Assume the compiler doesn't reorder anything. Maybe you turn the optimization off. You even inspect the assembly code, and all the operations are there in the correct order.

You can still hit the problem. In my case, the code ran fine until we got an upgraded CPU. (from MPC6xx series to MPC74xx series)

Suppose you store to registers at 0xf0000ffc, then 0xf0000104, then 0xf0000ff8. You need the stores to happen in that order. The instructions execute in that order, creating 32-bit chunks of data headed out toward the memory bus. There are multiple write buffers however, so they can go in parallel, each getting a distinct write buffer. They then head out onto the memory bus in some randomish order determined by timing issues internal to the CPU.

In my case, I had to add "eieio" instructions between each pair of accesses for which ordering mattered. FYI, that is a real instruction, supposedly meaning "Enforce In-Order Execution of I/O".

comex · on May 13, 2018

I found this surprising so I looked it up. According to the MPC7410 user manual[1], assuming the register bank is mapped as caching-inhibited, a sequence of stores is required to take effect in program order without needing `eieio` between them. However, a store followed by a load to a different address can be subject to reordering and does need `eieio`.

The ARM architecture does this more sanely. Page table entries have a flag that lets you choose between regular "Device" memory or "Strongly-ordered" memory; the latter performs all memory accesses in order without needing any synchronization instructions, and is more convenient in simple situations.

[1] https://www.nxp.com/docs/en/reference-manual/MPC7410UM.pdf - Table 3-8

vvanders · on May 13, 2018

x86 + win32 also forces a bunch of strict ordering(esp w/ volatile) that doesn't hold true on lots of other platforms. If you're developing code for testing on desktop then deploying to a target platform it's very easy to get bit by nasty concurrency bugs in a variety of ways.

elcritch · on May 13, 2018

Wouldn't agree they're both buggy, assuming the aliasing isn't an issue given the upstream types. The author mentions the potential of hardware re-ordering in TFA. C++11 atomics are suggested though not sure that'd deal with the issue unless the compilers utilize a bulk memory copy operation.

The proposed alternative of using a struct appears more confusing at first, but it's really helpful when referring back to the register layouts. TI makes pretty good usage of the method for the co-processors in the BeagleBones [1]. Generally I think it's nicer to have a `settings->control_bit = 1` than `*(base_prt + control_bit_offset) = 1`. Though the whole `reinterpret_cast` and volatile stuff in C++ confuses me everytime. C is _much_ easier but less type safe when dealing with volatile.

1: http://processors.wiki.ti.com/index.php/PRU-ICSS_Header_File...

sigjuice · on May 13, 2018

Using bitfields to represent register layouts is completely broken. The layout of bitfields is implementation defined.

viraptor · on May 13, 2018

Since you're (almost always) using a single, well defined toolchain for embedded development, does "implementation defined" mean "completely broken"? Your register and ports are implementation defined in the first place.

pjmlp · on May 13, 2018

Means that even the compiler vendor is free to switch the order when you upgrade to a new toolchain version.

viraptor · on May 13, 2018

> The CPU itself may reorder things.

> Generally you will need assembly code to avoid the bug.

What do you mean by this? Compiler outputs binary code the same way assembler does. If the CPU reorders your instructions, it will happen in both cases.

If there are some specific instructions which create memory/reordering barriers, they can be expressed in higher level languages as well.

burfog · on May 13, 2018

You can not express the needed instructions in standard C or C++ code. Some compilers may have intrinsics. The inline assembly for gcc on PowerPC might be:

__asm__ __volatile__("eieio":::"memory");

Normally the CPU will freely reorder loads and stores that go to different addresses. It is not necessarily reordering instructions; the reordering happens on the way to the memory bus.

That special instruction helps. It will ensure that previous accesses to IO memory will be done before any that follow. It also has a similar effect in RAM, but sadly not between RAM and IO memory. For that, which might be needed for a DMA engine, you'll also need the "sync" instruction.

viraptor · on May 13, 2018

Sorry, autocorrect, can't fix it anymore. I wanted to write exposed, not expressed. And yes, either via inline ASM or intrinsics.

BenFrantzDale · on May 13, 2018

“You can not express the needed instructions in standard C or C++ code.“

As of 2011, you can. In C++ it’s with std::atomic. http://en.cppreference.com/w/cpp/atomic/atomic C11 offers comparable tools, I believe.

burfog · on May 14, 2018

No, that is only good enough for threads. The compiler is not required to insert special instructions like "eieio" and is even free to optimize things in ways that would cause misbehavior of hardware.

For example, consider two values written to the same ___location. The compiler is free to optimize out the first one because no valid threaded program can depend upon seeing that value briefly appear.

BenFrantzDale · on May 14, 2018

If I understand you and the standard correctly, no: the default behavior is `memory_order_seq_cst` "On weakly-ordered systems (ARM, Itanium, PowerPC), special CPU load or memory fence instructions have to be used." http://en.cppreference.com/w/cpp/atomic/memory_order

I'm not 100% sure what you mean by your second paragraph. I think you are saying that repeated writes to a `std::atomic` can be optimized out, which sounds like it is true since `std::atomic` is not `volatile`. https://stackoverflow.com/questions/45960387/why-dont-compil...

monocasa · on May 13, 2018

A lot of times they aren't barriers, but metadata on the address map. ARM sticks it in the page tables, and x86 sticks it in the MTRRs.

PowerPC does treat it like a memory barrier instruction (eieio - Enforce Inorder Execution of I/O), but that's a stronger guarantee than you'd want for all volatile accesses.

tcbawo · on May 13, 2018

I was surprised not to see anything about RAII techniques (http://en.cppreference.com/w/cpp/language/raii). Tightly-managing resources via object lifecycle is probably one of the killer features that C++ brings to embedded.

taneq · on May 13, 2018

Often in embedded systems, it's actually less of an issue, because you don't need to "acquire" resources - you have the run of the hardware and you're not sharing any devices with any other code.

Likewise, you don't need to keep memory free for other applications to use, so there's a strong tendency to just allocate everything statically. It's simpler and more reliable.

monocasa · on May 13, 2018

No, RAII is great on embedded systems with static resource allocation.

Grabbing realtime priority or disabling interrupts with a lock_guard so you can't forget to release is a god send.

taneq · on May 13, 2018

True, there are a few cases where it's useful, and it's great for those.

stevenhuang · on May 13, 2018

In a lot of embedded systems, the penalties associated with RAII and other runtime goodies like vtables are often prohibitive, so they're commonly disabled. It's a factor in why C continues to be the lingua franca instead of C++ in the embedded space--it's simpler and still "good enough" not to warrant a switch-over to C++ (in addition to many other reasons).

The compile-time features of C++ are a great reason to switch over though, and for devices with greater resources, C++ becomes an even better fit.

jcelerier · on May 13, 2018

> the penalties associated with RAII and

.... there are 0 penalties associated with RAII.

    struct my_interrupt {
      my_interrupt() { vendor_specific_horrible_stuff(0x237ABC); }
     ~my_interrupt() { if(whatever) vendor_specific_horrible_stuff(0x237BAC); }
    };

    void f() {
      my_interrupt is_enabled;
      // do stuff
    }

will translate to exactly the same assembly than

    void f() {
      vendor_specific_horrible_stuff(0x237ABC);
      // do stuff
      if(whatever) vendor_specific_horrible_stuff(0x237BAC);
    }

monocasa · on May 13, 2018

Nobody disables RAII or vtables ( and I don't know how you would 'disable' those).

We do disable RTTI and exceptions though.

stevenhuang · on May 13, 2018

You're right, I was thinking about RTTI there, and wrong phrasing for vtables (should have said to avoid the virtual keyword and dynamic binding).

monocasa · on May 14, 2018

I mean, even virtual dispatch is fine if you opt in to it. Pointer indirection costs way less relatively on embedded systems than big boy systems with DRAM and caches.

proverbialbunny · on May 13, 2018

Speed is not the issue for embedded development.

The problem with C++ is it has a few language features that are not deterministic making standard C++ not ideal for real time code. Most of the stl uses these features, so most vanilla C++ is disabled for many embedded environments.

Also, there is a push for C++23 to have deterministic exception handling. If this passes (and it likely will) most of the stl will become available for embedded, and when that time comes C++ is going to become far more of a viable option than it currently is.

If you're curious you can read about it here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p070...

aseipp · on May 13, 2018

Nobody who uses C++ for embedded development uses or cares about the STL though, and its absence isn't exactly surprising for a number of reasons...

The selling point is features like RAII and templates, which can greatly reduce duplication and improve readability. Sure, you technically can't use the STL for whatever reasons, but practically it doesn't actually matter at all, and its absence comes with the territory for everyone involved.

Being able to use more of the STL in contexts like this would be nice for future projects, perhaps.

BenFrantzDale · on May 15, 2018

Keep in mind, the STL isn't limited to the containers, there are tons of algorithms that are very useful for embedded programming that don't throw exceptions, are vetted, and are as efficient as hand-written algorithms.

monocasa · on May 13, 2018

Exceptions aren't what's keeping people from using the still in embedded context's, it's the dynamic memory management.

proverbialbunny · on May 14, 2018

Dynamic memory management? What part of C++ do you mean? I'm unfamiliar. Surely not unique_ptr, I would think. C allocates on the heap too, so I must be misunderstanding.

monocasa · on May 14, 2018

Pretty much all heap allocation outside of init time is verboten in embedded contexts, whether C or C++. So yes, the regular unique_ptr, vector, some lambdas that heap allocated their captured, etc. You're not allowed to use malloc in C either.

Check out MISRA C, and the JSF C++ Standard for examples of embedded code standards.

stevenhuang · on May 13, 2018

Agree with what you said, and that's a very interesting paper--thanks for the link!

Gibbon1 · on May 13, 2018

> and for devices with greater resources, C++ becomes an even better fit.

It doesn't take having much more resources before switching away from C++ is a vastly better option though.

tripletao · on May 13, 2018

So maybe everyone but me knows this, but what's the behavior of bitfields on the bus? Like, if r.x is the bottom 6 bits of a 32-bit register, then does the compiler generate an 8-bit access or a 32-bit access?

What if it's bits 13 through 18? An unaligned 16-bit access? Two 8-bit? A 32-bit? Where is this specified?

Depending how your peripheral is implemented, this absolutely can matter. Nothing stops you from building devices where the low byte of

   *((uint32_t *)x)

doesn't work like

   *((uint8_t *)x)

and people do. I always just code the 32-bit access and pull the bits out myself, because I know what will happen.

This is even before considering stuff like a GPIO output implemented with two write-only registers, where 0 leaves the bit unchanged and 1 sets/clears it. The result of layering a bitfield on that does not seem intuitive to me.

The article also says that you should build abstraction layers for your peripherals, instead of making all your code write to registers everywhere. That seems (a) pretty well-settled as true, and (b) entirely independent of whether you do the register accesses with bitfields or explicit shifts and masks.

JoachimSchipper · on May 13, 2018

Yes, what bitfields do on the bus is very undefined.

Admittedly, a compiler is free to implement a dereference of uint32_t by loading four separate bytes, too, hence the imprecise "very undefined" in the first line...

tripletao · on May 13, 2018

I just tried this with IAR for Cortex-M3. I didn't check thoroughly, but so far it looks like right-aligned bitfields of eight bits or less generate an LDRB (8-bit). All other bitfields generate an LDR (32-bit), even if they lie entirely within a byte.

I see the makings of a fun trick to play on a colleague. I don't see anything I'd be too inclined to use in production code.

orbifold · on May 13, 2018

This is a problem I encountered just recently, even if your bus might in principle support byte wise access to memory, it might be the case that one of the custom devices on that bus simply ignore that feature, leading to surprising and unpredictable results.

mschwaig · on May 13, 2018

I picked C over C++ in an quasi-embedded environment given we would have a team with highly variable C/C++ skills.

I was worried people would get too fancy with C++. Probably it would have been fine if we would have taken away most of the language.

It turns out the problem with C in that setting was that good C code seems to require a lot of ceremony and conventions for things like error handling and modularity.

I hope that Zig (https://ziglang.org) will become a viable alternative in that space. It's quite a small language, more comparable to C than to C++ or Rust. It's meant to result in similar LLVM bitcode as well-written C, while being highly compatible with C and and giving you some extra safety guarantess over C as well.

youdontknowtho · on May 13, 2018

Help me out... Why not use a C or Algol derivative syntax?

mschwaig · on May 13, 2018

I think the syntax is not too far off from C.

I am not affiliated with the project, only supporing it on Patreon, so I cannot speak about the designer's intentions with Zig specifically.

Generally in language design, if something works the same way as in other languages, that's great. You can re-use the keywords/syntax and build on peoples prior knowledge. However if you have some feature that actually works quite a bit differently from other languages the syntax should probably be different. Otherwise people might feel at home right away on a superficial level, but actually get bitten by the surprising ways in which the new thing works differently to the one they are used to later.

yason · on May 13, 2018

The article could do without any references to C++. Surely writing long, imperative lists of register writes makes the reader's mind boggle. C has more than enough tools to encode all that into something that is maintainable and readable. Because if you just take C++ to implement some handy, convenient constructs you also invite the closet full of demons that come with the language.

And quite frankly the abstractions given in the article are quite trivial. Any decent programmer would create similar abstractions as a part of normal course of work because decent programmers know they are forgetful and they want to spend their modest number of brain cycles on the higher levels of thinking instead.

billforsternz · on May 13, 2018

Superb comment. Good programming is about building effective layers of abstraction. Bit twiddling is fine, but encapsulate it. Mass setting of registers should be presented as a readable table (possibly an instantiated structure) with no special executable code in sight.

bigcheesegs · on May 13, 2018

> N.B. Accessing registers modified by the hardware may be treated as a multi-threaded application. Therefore, it is worth considering using std::atomic<T> instead of volatile T.

This is incorrect. Compilers are allowed to reorder and combine atomics as long as they follow the rules. See: https://github.com/jfbastien/no-sane-compiler

tripletao · on May 13, 2018

If I understand correctly, that makes the article grossly wrong. Like, if you have a serial port transmit register and execute

   SERIAL_TX_DATA = 0x01;
   SERIAL_TX_DATA = 0x02;

to enqueue two bytes to transmit, then the compiler is allowed to skip the first write entirely.

Gibbon1 · on May 13, 2018

A naive implementation of the SERIAL_TX_DATA macro will totally give you behavior like that.

tripletao · on May 13, 2018

   #define SERIAL_TX_DATA (*((volatile uint32_t *)<addr>))

is pretty naive, and correct. The article's advice seems not to be.

Gibbon1 · on May 13, 2018

Friend smarter than me ran into a bug where when DMA was active on a multi-core uP you needed to do a read op between two writes to a IO address or the first write wouldn't happen.

I think on an ARM processor if it's normal memory you can't guarantee ordering. But ARM supports a device memory type which disables caching and reordering memory accesses by the hardware.

tripletao · on May 13, 2018

I think ARM's "strongly-ordered memory" concept is supposed to solve this? Was your friend's problem intended behavior, or a silicon bug?

http://infocenter.arm.com/help/topic/com.arm.doc.dai0321a/DA...

Gibbon1 · on May 14, 2018

I think the DMA accesses were confusing the memory controller. He said that sort of thing was common on internally developed silicon. They won't do a spin if there is a work around. Also my experience as well. I've had to deal with peripherals with an async clock where access timing becomes important.

Observation: CS appears to teach students that code with side effects is evil. AKA hide you side effects behind os calls. With embedded code side effects are important.

berti · on May 13, 2018

Nothing in the presented article that can't be done just as (or more) easily in C. The article really just presents the absolute basics from an intro embedded systems class.

minipci1321 · on May 13, 2018

yes, I somehow expected to see something like Kvasir:

https://github.com/kvasir-io/Kvasir

Khoth · on May 13, 2018

I'm not a fan of anything that writes to a register just by assigning a value to some variable.

I much prefer having some kind of READ_REG/WRITE_REG inline function (or macro at a pinch). Sooner or later you're going to find yourself wanting to log all register accesses, or run the firmware against a model of the hardware, or something, and when you do it's a great help to have to handle it in only one place.

youdontknowtho · on May 13, 2018

It would be great if everyone who knows better than the author would write articles instead of just critiquing them. I'm not trying to be facetious. It would be awesome if more of the advanced developers here would write things like this and share their knowledge.