This takes me back to the late 90s when I was a teen and decided to try to implement preemptive multi-threading in Turbo Pascal. I'd gotten a book on assembly so was somewhat versed in that, and took on the challenge.
Having to save and restore the registers was obvious, but as I recall it the main challenge was figuring out the right sequence to do that while also preserving the CPU flags.
I only used the timer interrupt to switch between threads in a round-robin way, so interactivity wasn't the best, but it worked. Was quite pleased with myself for accomplishing that.
I think it was a pretty decent challenge, and one that was quite instructive. Not only did you need to know some details about the various CPU instructions (like which ones affected flags), but it was also a kind of puzzle to arrange it all the right way.
Ha, I did exactly the same in exactly the same language, but that was at the end of the 80s. Interactivity was just fine, as I had increased the frequency of the timer interrupt to 4096 times/s and would pass control to the standard handler every so often, so that it still ran at its standard rate of about 18 times/s. When not calling the old interrupt handler I would execute the thread switch and control logic. Turbo Pascal supported built-in assembly, so writing all the low-level stuff was a piece of cake. I also used built-in assembly to implement graphics.
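For anyone curious how the "fast timer, but still call the old handler at about 18 times/s" bookkeeping can work, here is a small simulation in plain C. It is only a sketch and not DOS code: `fast_tick`, `chain_calls` and `switches` are made-up names, and the divisor 291 is an assumption picked to land near 4096 ticks per second. On real hardware the same accumulator logic would sit inside the interrupt handler.

```c
#include <assert.h>
#include <stdint.h>

/* The PC timer chip counts at about 1,193,182 Hz; with its default
   divisor of 65536 the BIOS tick fires roughly 18.2 times a second. */
#define PIT_DEFAULT_DIVISOR 65536u

/* Hypothetical state for this sketch. */
uint32_t accumulator = 0; /* timer counts since the last chained call */
unsigned chain_calls = 0; /* how often the old handler would have run */
unsigned switches = 0;    /* how often we would have switched threads */

/* Called once per fast timer interrupt. `divisor` is the value that was
   programmed into the timer (assumed here, e.g. 291 for roughly 4100 Hz). */
void fast_tick(uint32_t divisor)
{
    assert(divisor > 0 && divisor <= PIT_DEFAULT_DIVISOR);
    accumulator += divisor;
    if (accumulator >= PIT_DEFAULT_DIVISOR) {
        /* One original tick's worth of time has passed. */
        accumulator -= PIT_DEFAULT_DIVISOR;
        chain_calls++;   /* real code would call the old handler here */
    } else {
        switches++;      /* real code would run the thread switch here */
    }
}
```

Feeding it about one second's worth of fast ticks (4100 calls with divisor 291) makes the chained handler run 18 times, which matches the standard rate.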
After learning about effect systems, and generalising effects, my view on `setjmp` has changed considerably -- it seems effect systems "effectively" offer a design pattern for their use in languages without them.
i.e., it feels that there's an async-await.h, try-catch.h, etc. to be written which would serve as C-ish design patterns. I'd be interested, then, in to what degree other langs can do the same.
As a side point, I've not yet read a good defence of why programming in C is so fun and satisfying in this way. People reduce the Rust/C issue to memory safety... but isn't there something inherently wonderful about `while(*this++ = *that++)` (etc. etc.) ?
(A sense of fun beaten out of you by too many annotations guaranteeing its safety?)
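To make the try-catch.h idea concrete, a very small sketch could look like the following. Everything here is hypothetical (`TRY`, `CATCH`, `THROW`, `try_env`, `checked_div`, `safe_div`); it uses a single global handler slot, so it does not nest and does not pass the error code to the handler.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical "try-catch.h": one global handler slot, no nesting. */
jmp_buf try_env;

#define TRY         if (setjmp(try_env) == 0)
#define CATCH       else
#define THROW(code) longjmp(try_env, (code))

/* Illustrative function that "throws" on a zero divisor. */
int checked_div(int a, int b)
{
    if (b == 0)
        THROW(1);
    return a / b;
}

/* Returns the quotient, or -1 if checked_div threw. */
int safe_div(int a, int b)
{
    TRY {
        return checked_div(a, b);
    } CATCH {
        return -1;
    }
}
```

With these definitions `safe_div(6, 3)` gives 2 and `safe_div(1, 0)` gives -1. A real header would need a stack of `jmp_buf`s to support nested handlers.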
> inherently wonderful about `while(*this++ = *that++)`
As someone who hasn't used C as a primary language in some project for over a decade, I read this and (1) realize that LHS and RHS both post-increment, (2) don't remember if there is some UB I might be overlooking, (3) realize that the operator is assignment "=", not comparison "==", and I can't even remember what the loop termination criterion here would be. Until "*that++" is equivalent to false, or something?
It may be beautiful to the experienced programmer, but I personally would consider this just "clever" (which is a criticism, not a compliment). It feels like someone needlessly tried to pack everything into a single line of code.
This is something I'd write out more verbosely, if only to make reading it simpler. The compiler will probably generate the same machine code either way.
I'm fully aware that to the seasoned C developer, my criticism might come across as naive. However, even the fully seasoned C developer can get careless, or become tired, and in C, every one of those little things can come back and bite you in those situations.
Edit: removed the double pointer dereferencing remark, must have been an artifact from HN's special treatment of the asterisk.
Edit-Edit: I was probably wrong. I don't think it's possible to create a more verbose version without it affecting performance.
It's entirely valid C and (assuming this and that are byte pointers) copies a range of bytes up to and including the first zero byte.
With a sufficient warning level (e.g. -Wall on gcc, which should always be enabled anyway, together with -Wextra), compilers will complain about the '=' and ask you to add a pair of parentheses to make clear that this is actually intended:
while( (*this++ = *that++) );
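For anyone who wants to try the idiom, here it is wrapped in a function. This is just a sketch: the name `copy_string` is made up, and the pointers are called `dst` and `src` instead of `this` and `that`.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the zero-terminated string at src into dst, terminator included.
   dst must have room for it; returns dst like strcpy does. */
char *copy_string(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *start = dst;
    /* Assign first, then test the byte that was assigned: the loop ends
       right after the zero byte has been copied. The extra parentheses
       tell the compiler that '=' is intended. */
    while ((*dst++ = *src++))
        ;
    return start;
}
```

Copying "abc" into a larger buffer writes exactly four bytes ('a', 'b', 'c' and the terminating zero) and leaves the rest of the buffer untouched.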
It's also one of those cases where the C code matches the output assembly pretty well:
As far as "obfuscated C" goes, this is a very tame example though. It's just a straightforward usage of language features, which might look strange only when coming from other languages that don't have pointers or a post-increment operator.
That extra pair of parentheses doesn't make the code 'ugly' ;)
And the code without the parentheses is still entirely valid standard C; the warning is essentially just a lint to protect against typos (similar to JS linters warning about '===' vs '==').
PS: let's see if the alternatives would be any more readable:
    char c;
    while (c = *that++) {
        *this++ = c;
    }
...this is already buggy because it doesn't copy the final zero byte, so the test must happen inside the loop body. Let's also try to get rid of the post-increment:
    while (true) {
        char c = *that;
        *this = c;
        this += 1;
        that += 1;
        if (c == 0) {
            break;
        }
    }
...hmm not really any more readable...
Let's try with an index...
    size_t i = 0;
    while (true) {
        char c = that[i];
        this[i] = c;
        i += 1;
        if (c == 0) {
            break;
        }
    }
...might be a bit easier to grasp when used to other languages, but readability hasn't improved all that much I'd say...
For reference, MUSL also just uses the original approach:
I don’t write much C, but to an outsider like me this is a pretty big improvement.
It is a shame post-test loops aren’t more popular, given the similarity to the assembly they output. Seems more mechanically sympathetic. Oh well, at least it is an excuse to whip out the goto.
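As a sketch of what the post-test version could look like with C's own `do`/`while` (no goto needed), assuming the hypothetical name `copy_string_posttest` and the same byte-pointer setup as above:

```c
#include <assert.h>
#include <stddef.h>

/* Same copy as the one-liner, written as a post-test loop. */
char *copy_string_posttest(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *start = dst;
    char c;
    do {
        c = *src++;       /* read one byte and advance the source */
        *dst++ = c;       /* write it and advance the destination */
    } while (c != '\0');  /* test after the copy, so the zero byte goes too */
    return start;
}
```

The body always runs at least once, which is exactly what is needed here because even an empty string has one byte (the terminator) to copy.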
I find it crazy that you improve the readability so much and say readability hasn't improved that much.
The order things happen in *this++ is not obvious unless you know a bunch of C-specific rules, while the ordering of multiple statements is obvious even to someone who doesn't know C. Perhaps C programmers should find this obvious, but it seems to me more like cognitive overhead which has a non zero chance of confusing someone at some point.
That's almost a philosophical question ;) Should code in a specific language be more readable to programmers familiar with that language or to programmers who are not?
E.g. I guess for a mathematician, all imperative languages are probably 'weird', while something like Haskell feels more familiar?
I was unclear, sorry: I didn't mean to say that the extra parentheses make it uglier, I meant to point out that something that was described as beautiful was actually flawed.
The flaw was minor in this case because the identifier names and lack of body make the intention clear, but my point is that there are a lot of minor things in C that can come and bite you at any time.
Edit: You are right, I don't see a way this could have been implemented more readably without sacrificing some performance.
I've added a couple of examples trying to find a more readable version, which actually isn't trivial. Sorry for the 'post-edit' :)
As for performance: I don't think such details matter much. First, compilers are pretty good at turning "readable but inefficient" code into the same optimal output (aka "zero cost abstraction").
And a really performance-oriented strcpy() wouldn't simply copy byte by byte anyway, but try to move data in bigger chunks (like 64-bit or even SIMD registers). Whether this is then actually faster also depends on the CPU though.
`this` and `that` are arrows which range over a stream of data; `=` is copy, and `++` moves the arrow along the stream.
This isn't a "clever one-liner" it is a clear and precise syntax for expressing the operation the machine actually performs.
while(copy(current(stream_a), current(stream_b)) and not end_of_stream(stream_a))
You might prefer the above, but then, that's every other major language. The beauty of C is that the above code has to compile to something like the C version. C just allows you to actually express it.
GCC warnings can be overly pedantic. It's set up to warn about common footguns but doesn't know what your intent is. In this case it's a common enough idiom to assign within a control statement that GCC has the extra parens escape hatch.
You shouldn't just blindly let your tooling dictate how you work. It's a tool that's supposed to work for you, not control you. -Wall and -Wextra are good baselines but I always disable some of their warnings because I don't need the hassle on known good code.
Could we try to keep the topic on the article itself instead of complaints about C? It sucks to come and read the comments about this very nice article and have to scroll and scroll until I finally get to comments written by people who actually have something to say. This is a great blog, and the author puts a ton of effort into their posts. It’s hard for me not to view comments like this as being a bit thoughtless and inconsiderate.
Since we're already way off-topic, allow me to share my idea for solving this perennial problem: When commenting, there is a little selector:
[ ] My comment is on-topic and positive to neutral
[ ] My comment is critical
[ ] My comment is off-topic
You've got to select one. When viewing, the comment thread defaults to just showing on-topic, non-negative comments, but you can see the other stuff, too, if you want.
This solves two seemingly contradictory desires: the ability to read comments on things that interest you without having to fend off waves of negativity and wade through pools of offtopic text and the ability to speak freely.
That's not really how comment threads work. If you reply to a specific comment, you're replying to the stuff they said in that comment. The guy's not even complaining about C either, he's commenting on the bit of C code that the parent commenter wrote.
These days the 'fashionable' way to implement async-await seems to be via compiler magic by transforming async functions into a 'switch-case state machine' and a hidden context pointer argument.
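To illustrate the "switch-case state machine plus context pointer" shape, here is a hand-written sketch of what such a transformation might produce for a function that yields 0, 1, 2, ... up to a limit. The names (`counter_ctx`, `counter_resume`) are invented for the example, and real compilers differ in the details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The function's "locals" move into a context struct so that they
   survive between calls; `state` records where to resume. */
struct counter_ctx {
    int state;  /* 0 = not started, 1 = inside the loop, 2 = finished */
    int n;      /* loop counter that used to be a local variable */
    int limit;  /* argument of the original function */
};

/* Resumes the "coroutine". Returns true and stores the next value in
   *out, or returns false once the sequence is finished. */
bool counter_resume(struct counter_ctx *ctx, int *out)
{
    assert(ctx != NULL && out != NULL);
    switch (ctx->state) {
    case 0:              /* first entry: initialise the "locals" */
        ctx->n = 0;
        ctx->state = 1;
        /* fall through */
    case 1:              /* resume point inside the loop */
        if (ctx->n < ctx->limit) {
            *out = ctx->n++;
            return true; /* suspend; the next call re-enters at case 1 */
        }
        ctx->state = 2;
        /* fall through */
    default:
        return false;    /* finished */
    }
}
```

A caller zero-initialises the struct, sets `limit`, and calls `counter_resume` until it returns false. No separate stack is involved, which is why this approach also works on targets without stack switching.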
In vanilla C (without compiler magic) I've mostly seen it implemented via 'green threads' aka 'fibers' aka 'stack-switching', but TBH I'm not sure if this can be implemented with the standard setjmp/longjmp, I've mostly seen it implemented without (and instead use two small assembly functions for the context switch).
One downside of stack-switching is that it doesn't work on WASM.
The big problem with setjmp/longjmp for fibers is that a call to longjmp is undefined behavior if the `jmp_buf` argument was created by a call to `setjmp` on a different thread (1). That means fibers cannot be easily relocated onto a different thread, making M:N threading tricky to implement and erasing a lot of the benefit of fibers.
And that said, implementing a super-fast setcontext/swapcontext is like twenty lines of assembly with not too many gotchas, if you don't care about saving a few things that require syscalls.
But all that said the real downside of stack switching is that it's overkill for coroutines that can be implemented as a finite state machine, unless the runtime supports growable stacks (otherwise you pay a big cost on fiber creation, and eat a lot of memory for many fibers). There are a few languages that do this and it's super cool, but C isn't one of them.
WASM will almost certainly support stack switching, iirc there have been proposals for wasmtime to support it already?
> And that said, implementing a super-fast setcontext/swapcontext is like twenty lines of assembly with not too many gotchas, if you don't care about saving a few things that require syscalls.
sigaltstack(2) wasn't all that prohibitive when I did that in Python 2.6 back then. Was it 2009?
I seriously can't understand this obsession with FSMs. A naive setcontext()-based implementation outperformed both greenlets (with its crazy legacy of memcpy-ing parts of stack from stackless) and Tornado/Twisted (with them being pure-python and therefore lacking any means to force some async on client libraries, which one does in C) while letting everyone write some nice clean synchronous-looking code.
10 years later we end up with half a language hacked up and still nowhere near the ease of use that was coded in a week or so.
> I seriously can't understand this obsession with FSM
It's the most optimal representation of the common-case (non-recursive asynchronous tasks), has the same overhead as a function call, plays very well with branch prediction, can be easily inlined by optimizers, and it's a lot easier to implement.
The goal is that a compiler should generate this for you. This code is equivalent to the following:
    task1:
        while True:
            handle1 = async task2();
            handle2 = async task3();
            print(await handle1)
            print(await handle2)
    task2:
        n = 0
        while True:
            yield n++
    task3:
        n = 0
        while True:
            yield n++
It doesn't actually run task2 and task3 both eagerly, I've not got around to scheduling the tasks on DIFFERENT threads. They currently queue onto a single thread, so task2 and task3 are parallel to task1 (but maybe not at the same time) but task2 and task3 are not parallel to each other. This is my goal.
POSIX pre-2008 had (and Linux/Glibc and {Free,Net,DragonFly}BSD still have) <ucontext.h> with proper stack switching functions, used as a fallback in a number of coroutine libraries. The fallback status is due to a self-inflicted inefficiency: they save and restore the signal mask, thus still need to go through the kernel (why e.g. Linux does not put signal mask manipulation in the vDSO, I don’t know). POSIX yanked them and now recommends rewriting to use POSIX threads instead, which is asinine.
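For reference, a minimal coroutine on top of <ucontext.h> might look like this. It is a sketch that assumes a platform that still ships these functions (e.g. Linux with glibc); the helper names (`co_start`, `co_next`, `co_body`) and the 64 KiB stack size are arbitrary choices for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;   /* caller and coroutine contexts */
static char co_stack[64 * 1024];      /* the coroutine's private stack */
static int co_value;                  /* value handed over on each yield */
static int co_done;                   /* set once the coroutine returns */

/* Coroutine body: yields 1, 4, 9 and then finishes. */
static void co_body(void)
{
    for (int i = 1; i <= 3; i++) {
        co_value = i * i;
        swapcontext(&co_ctx, &main_ctx);  /* yield back to the caller */
    }
    co_done = 1;
    /* Returning resumes uc_link, i.e. main_ctx. */
}

/* Prepares the coroutine so that the first co_next() starts co_body. */
void co_start(void)
{
    int rc = getcontext(&co_ctx);
    assert(rc == 0);
    (void)rc;
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;
    co_done = 0;
    makecontext(&co_ctx, co_body, 0);
}

/* Resumes the coroutine. Returns 1 and stores the yielded value in *out,
   or returns 0 once the coroutine has finished. */
int co_next(int *out)
{
    assert(out != NULL);
    if (co_done)
        return 0;
    swapcontext(&main_ctx, &co_ctx);
    if (co_done)
        return 0;
    *out = co_value;
    return 1;
}
```

After `co_start()`, three calls to `co_next` produce 1, 4 and 9, and later calls return 0. Each `swapcontext` here is where the signal-mask handling mentioned above comes into play.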
Putting each tasklet's stack in different places on the actual stack and jumping between them is inherently unsafe and not portable.
You must be sure that each tasklet does not consume too much stack so that it does not overwrite another.
On BSD Unices, you can only longjump back up the stack. Otherwise, longjmp() will call longjmperror() and terminate the program.
> Putting each tasklet's stack in different places on the actual stack and jumping between them is inherently unsafe and not portable. You must be sure that each tasklet does not consume too much stack so that it does not overwrite another.
It's absolutely unsafe and a ridiculous thing to do, but what's the reasoning for it being unportable? Wouldn't it be just as unsafe anywhere that C compiles?
> On BSD Unices, you can only longjump back up the stack. Otherwise, longjmp() will call longjmperror() and terminate the program.
The manpage claims that the semantics is not "only jump back up the stack", but rather that you can't longjmp to "[...] an environment that has already returned". Technically, the tasklet that we're longjmp'ing to never terminates, right?
In any case, you definitely can't do it on Windows and Emscripten (WebAssembly), where longjmp invokes the same stack unwinding behavior as C++ exception handling, rather than just setting some registers and jumping. Windows has its own APIs for tasklets (fibers); no such luck on Emscripten.
I once took an "Advanced C Programming" class (which would have been better named as "How to do OOP in a language never intended for it") where the instructor expressly prohibited the use of "*i++" and several other language elements because he thought they were confusing. I got into many arguments with the instructor throughout the course, and I figured I would get a poor grade, but he still gave me an A. My main disagreement with him was this: If the language elements are there and well defined, why prohibit their use? The course after all was "Advanced C", wasn't it?
That has been my argument for "preprocessor abuse" - there is no such thing as preprocessor "abuse"[0], it is part of the language and provides some form of extensibility in a language with an already limited set of features.
If anything the preprocessor needs more features (let me include files from macros and do loops dammit :-P).
[0] ok, I can think of some uses that might count, like "#define BEGIN {", etc. that serve no practical purpose, but I don't think anyone called these "preprocessor abuse".
m4 is a bit weird but if you can use that you can use any preprocessor - including a custom one.
The issue is that chances are said preprocessor won't work with editors and IDEs that can parse C to provide tools like syntax completion, jumping to definitions, etc - if anything you'd be lucky if you get working line numbers to use a debugger with.
"Programs must be written for people to read, and only incidentally for machines to execute"
-- Abelson & Sussman, Structure and Interpretation of Computer Programs, 1984.
It's possible to write English with bad structure, clumsy metaphors, obscure vocabulary, and non-non-non-usual idiosyncrasies. And other people might be technically capable of understanding it if they really want to, and try hard enough.
After all, the language elements are there and well-defined, so why would anyone ever complain about "bad" writing?
(Out of curiosity, do you object to people who say the use of "goto" should be seriously restricted, or even prohibited, in most programs? Do you specifically use gotos to make the point that they can still be useful and productive? The language element is there and well-defined.)
I've used a goto in production code twice in >30 years. In both cases it was the right thing to do, and it had more to do with hardware elements and mission assurance than with software engineering.
This is fun :-) I was pleased to learn about __builtin_longjmp. There's a small aside in this article about the signal mask, which skates past another horrible abyss - which might even make it sensible to DIY longjmp.
Some of the nastiness can be seen in the POSIX rationale for sigsetjmp (https://pubs.opengroup.org/onlinepubs/9699919799.2018edition...) which says that on BSD-like systems, setjmp and _setjmp correspond to sigsetjmp and setjmp on System V Unixes. The effect is that setjmp might or might not involve a system call to adjust the signal mask. The syscall overhead might be OK for exceptional error recovery, such as the arena out of memory example, but it's likely to be more troublesome if you are implementing coroutines.
But why would they need to mess with the signal mask? Well, if you are using BSD-style signals or you are using sigaction correctly, a signal handler will run with its signal masked. If you decide to longjmp out of the handler, you also need to take care to unmask the signal. On BSD-like systems, longjmp does that for you.
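The mask handling can be seen without any actual signal handler. The sketch below blocks SIGUSR1 after the save point (standing in for "we are inside the handler"), jumps back, and checks whether the signal is still blocked. It assumes a POSIX system and a single-threaded program; `blocked_after_jump` and `usr1_blocked` are names made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Returns 1 if SIGUSR1 is currently blocked, 0 otherwise. */
static int usr1_blocked(void)
{
    sigset_t cur;
    sigprocmask(SIG_BLOCK, NULL, &cur);  /* NULL set: only query the mask */
    return sigismember(&cur, SIGUSR1);
}

/* Blocks SIGUSR1 after the save point, jumps back, and reports whether
   the signal is still blocked afterwards. A nonzero savemask asks
   sigsetjmp to save the mask so that siglongjmp restores it. */
int blocked_after_jump(int savemask)
{
    sigjmp_buf env;
    sigset_t usr1, original;

    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    /* Start from a known state: SIGUSR1 unblocked. */
    sigprocmask(SIG_UNBLOCK, &usr1, &original);

    if (sigsetjmp(env, savemask) == 0) {
        sigprocmask(SIG_BLOCK, &usr1, NULL);  /* as if inside the handler */
        siglongjmp(env, 1);
    }

    int result = usr1_blocked();
    sigprocmask(SIG_SETMASK, &original, NULL);  /* put the caller's mask back */
    return result;
}
```

With `savemask` set to 1 the signal comes back unblocked after the jump; with 0 it stays blocked, and it would be the caller's job to unmask it.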
The problem is that longjmp out of a signal handler is basically impossible to do correctly. (There's a whole flamewar in the wg14 committee documents on this subject.) So this is another example of libc being optimized for the unusual, broken case at the cost of the typical case.
While some programs/libs (most notably libjpeg) use setjmp/longjmp, I tend to avoid them:
* Does setjmp/longjmp save/restore all registers (e.g. SIMD) of modern CPU variations? Can inconsistencies happen here?
* As a goto, it restores some context (registers) but not others (memory), which can be counterintuitive. Does it work with volatile variables?
* Does it play well with less standard C features like attribute(cleanup) or with C++ features like exception handling or class destructors? To avoid issues, it may be best to stick to very procedural and basic C when using setjmp/longjmp?
* Except registers and memory, there are also IO, FS, Net (at least) contexts, and these are not restored (cannot be really), so this notion of "restore certain vars to their original states" might not work well with certain types of code
Because some programming gods recommend using setjmp/longjmp, my hesitance is likely unfounded.
"setjmp/longjmp" must save/restore only those registers that are defined by the C ABI as being preserved across function calls.
The program point to where "longjmp" jumps is viewed by the compiler as a function return point, so the compiler assumes that all the other registers hold undefined values.
For most CPUs, the C ABI defines the SIMD registers as being not preserved across function calls, so "setjmp" does not need to save them. Had some of them been defined as preserved registers, "setjmp" would have also saved those.
"setjmp/longjmp" cannot be used in any C++ program, unless all the nested functions that occur between a "setjmp" and a "longjmp" are C functions. This may happen in C libraries which are linked into C++ programs and which may use "setjmp/longjmp" internally, without interfering with C++ objects.
As you say, "attribute(cleanup)" is not something specified in the C language standard, so you cannot expect that "longjmp" will invoke the cleanup function, unless you use a specific libc implementation, whose documentation explicitly says that its "longjmp" will invoke the cleanup functions, when the program is compiled with a certain C compiler. I am not aware of any such "longjmp" implementation.
If you want to use exceptions in a C program, you must use "setjmp/longjmp". If you do not want to use exceptions in a C program, you have no need for them.
The same happens in any other programming language. If you do not want to use exceptions in a C++ program, you have no need to use keywords like "try" and "throw".
> I am not aware of any such "longjmp" implementation.
Clang on Windows behaves this way! That is, when targeting the Visual C++ runtime, as opposed to MinGW.
With the Visual C++ toolchain and runtime, longjmp behaves like a C++ exception throw and goes through a whole stack unwinding routine, unlike on Unix (and MinGW) where it just sets a few registers and branches. This has the downside of being much slower, but the benefit that longjmp will run destructors of C++ local variables as it unwinds (as well as SEH `__finally` blocks), making it less of a footgun in C++ code.
Visual C++ itself doesn't support GCC extensions like `__attribute__((cleanup))`, but these days Clang is highly compatible with Visual C++ while also supporting GCC extensions.
Edit: Windows also has the distinction of, on x86-64, treating the SIMD registers xmm6-xmm15 as callee-saved (preserved across function calls). As you mention, this means that setjmp has to save them, which is fine except that it bloats the size of jmp_buf.
Specifically, The C++ Programming Language standard ISO/IEC 14882:2011 18.10/4 [support.runtime] says this.
> A setjmp/longjmp call pair has undefined behavior if replacing the setjmp and longjmp by catch and throw would invoke any non-trivial destructors for any automatic objects.
Of course, the C++ language runtime is usually written in pure C, and at least one naive implementation of the C++ try/catch mechanism uses setjmp/longjmp under the hood.
I wonder though if setjmp/longjmp was designed with exceptions in mind, or just as a general escape hatch for the 'tamed' structured goto in C (my guess would be the latter).
As you have phrased them, the standard guarantees seem weaker than they are actually specified in the C standard.
Where a "longjmp" returns after a "setjmp", everything is preserved exactly like after a normal function call (including the registers specified by the ABI),
"except that the values of objects of automatic storage duration that are local to the function containing the invocation of the corresponding setjmp macro that do not have volatile-qualified type and have been changed between the setjmp invocation and longjmp call are indeterminate."
For the most frequent use case, when "longjmp" is used to throw exceptions, the only local variables that can become indeterminate, according to the standard, are those that have been passed as arguments to functions invoked inside the C equivalent of a "try" block, or which have been explicitly modified in another way there. Any local variable that is not assigned to or passed as an argument there is preserved.
The standard behavior is normal, because where a "longjmp" returns it is not known where the execution of the nested functions has been aborted and whether any of their output parameters already store their expected final values or only some unpredictable temporary values.
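A small sketch of that rule, with invented names (`count_attempts`, `limit`): the counter is modified after the `setjmp` and read after the `longjmp`, so it is declared `volatile`, while `limit` is never modified after the `setjmp` and needs no qualifier.

```c
#include <assert.h>
#include <setjmp.h>

/* Retries until the given number of simulated failures has happened and
   returns the total number of attempts. */
int count_attempts(int failures_before_success)
{
    assert(failures_before_success >= 0);
    volatile int attempts = 0;            /* changed after setjmp: volatile */
    int limit = failures_before_success;  /* unchanged after setjmp: fine   */
    jmp_buf env;

    (void)setjmp(env);                    /* every retry resumes here */
    attempts = attempts + 1;
    if (attempts <= limit)
        longjmp(env, 1);                  /* simulated failure: try again */
    return attempts;
}
```

Without the `volatile`, the value of `attempts` after the jump would be indeterminate according to the quoted wording, even though it often appears to work in unoptimized builds.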
The ISO C longjmp has a silly feature. It takes an argument that is returned to the setjmp caller, but if it is specified as 0, it is rewritten to 1.
It means that a longjmp instruction sequence has to contain an extra conditional check which substitutes a nonzero value for a zero value. Instead of the completely trivial, branch-free instruction sequence it wants to be.
Of all the places where it would be good for C to compensate for a programmer mistake, why do it here? Why can't it just be that the value is the value. If you pass zero to longjmp, your logic breaks; the caller of setjmp gets a zero, and so it looks like a context has just been saved.
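The rewrite is easy to observe. In this sketch (the function name is made up) a flag records that the jump has already happened, so the code can tell a genuine first return of 0 from a `longjmp(env, 0)` that came back as 0:

```c
#include <assert.h>
#include <setjmp.h>

/* Calls longjmp(env, 0) and returns what setjmp reported afterwards:
   1 if the zero was rewritten to 1, 0 if it came through as 0,
   -1 for any other value. */
int setjmp_result_after_longjmp_zero(void)
{
    jmp_buf env;
    volatile int jumped = 0;  /* changed between setjmp and longjmp */

    switch (setjmp(env)) {
    case 0:
        if (jumped)
            return 0;         /* longjmp(env, 0) came back as 0 */
        jumped = 1;
        longjmp(env, 0);      /* ask for 0; does not return */
    case 1:
        return 1;             /* setjmp reports 1 instead */
    default:
        return -1;            /* some other nonzero value */
    }
}
```

On a conforming implementation the function returns 1.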
Meanwhile, if you perform an input operation on a read/write stream whose most recent operation was output, and you didn't fflush or use a positioning operation, you have undefined behavior.
MSVC doesn't support inline assembly for x86-64 (the declspecs are just hinting the compiler to not generate function entry/exit code and that it shouldn't be confused by the function not returning to the caller - not sure if declspec(naked) even works on x86-64 because I think it only makes sense with inline assembly).
TXR Lisp contains BSD licensed code for setjmp-like saving and restoring, for a modest number of platforms, in a single .S (assembly language with preprocessing) file.
The header file for that is unfortunately not cleanly separated; the top part of this file declares the context structures and the two functions jmp_save and jmp_restore.
I wonder if setjmp/longjmp can be implemented in hardware, i.e. by introducing instructions that save and switch the current set of registers, something like a register window that can be switched by saving a pointer.
This way both setjmp and longjmp are basically few cycles and exception handling would be hella fast.
The DEC VAX-11 instruction set had SVPCTX and LDPCTX for the kernel which made context switch simple. But they were reserved instructions; so they couldn't be used for setjmp() and longjmp(). VAX-11 also had queue instructions which made rescheduling simple. This was basically:
Which is kind of nice and already proven in 1961 as a better way to do low level coding in Burroughs B5000, nowadays still sold as Unisys ClearPath MCP (naturally with improvements).
Intrinsics can be better understood by the compiler type system, both for safety and optimization algorithms.
Do the intrinsics allow loading and storing registers from and to memory? That's kind of the whole point why assembly is required.
Most cross-platform co-routine libraries I've seen for C or C++ use a small separate assembly file for the stack switching magic on MSVC, so if it's possible to do the same with intrinsics, then it's definitely not a well known technique ;)
"setjmp" and "longjmp" are the mechanism for implementing exceptions in C, i.e. they are just another form of writing "catch" and "throw".
The implementation of exceptions, i.e. of jumps over multiple levels of nested functions and blocks, is much simpler in C than in C++, because there are no destructors that must be called when unwinding the stack, so it is enough to restore the CPU registers to the values correct for the program point where the exception must be caught.
This is needed because at the point where the exception is thrown it is not known whether any of the nested functions that must be skipped has modified any of the registers that a function is expected to preserve and where the original values of the registers have been saved.
Any kind of exceptions can be easily misused, which is why it is recommended to be careful with the use of "setjmp" and "longjmp", but they are not more harmful than the use of exceptions in any other language.
The only additional problem of C is that since there are no implicitly called destructors, like in C++ and similar languages, in C the programmer must do what the compiler would do in C++.
This means that if the nested functions that are skipped by a "longjmp" have allocated heap memory, opened files or sockets etc., such resources must be freed in the exception handler marked by a "setjmp".
Therefore the C programmer must keep track of the resources that might have been allocated in the nested functions, e.g. by recording the allocations in some global table.
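A sketch of that bookkeeping, with everything invented for the example (`tracked_alloc`, `release_tracked`, `run_guarded`, a fixed-size table, a single handler): allocations are recorded in a global table, and the handler marked by `setjmp` frees whatever the skipped functions left behind.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

#define MAX_TRACKED 16

static jmp_buf handler_env;           /* the single "catch" point */
static void *tracked[MAX_TRACKED];    /* global table of live allocations */
static int tracked_count = 0;

/* malloc wrapper that records the allocation in the global table. */
void *tracked_alloc(size_t size)
{
    assert(tracked_count < MAX_TRACKED);
    void *p = malloc(size);
    if (p == NULL)
        longjmp(handler_env, 1);      /* out of memory: "throw" */
    tracked[tracked_count++] = p;
    return p;
}

/* Frees everything recorded so far; returns how many blocks were freed. */
int release_tracked(void)
{
    int freed = tracked_count;
    while (tracked_count > 0)
        free(tracked[--tracked_count]);
    return freed;
}

/* Nested functions that allocate and then possibly fail. */
void inner(int fail)
{
    tracked_alloc(32);
    tracked_alloc(64);
    if (fail)
        longjmp(handler_env, 2);      /* "throw" past outer() */
}

void outer(int fail)
{
    inner(fail);
}

/* Returns 0 on success and -1 if an "exception" was caught; *freed gets
   the number of blocks that were released either way. */
int run_guarded(int fail, int *freed)
{
    assert(freed != NULL);
    if (setjmp(handler_env) != 0) {
        *freed = release_tracked();   /* the "catch" block cleans up */
        return -1;
    }
    outer(fail);
    *freed = release_tracked();
    return 0;
}
```

In both the failing and the succeeding run two blocks are released; the difference is only who triggers the cleanup. Files and sockets could be tracked the same way with their own tables.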
As well as being harder to work with correctly, setjmp and longjmp are often a lot slower than other exception handling systems, as you are paying the full cost of saving lots of register state even if an exception isn’t thrown. I’ve seen this cause serious performance issues on Windows on x86-64 in the past.
It also doesn’t really compose well, so if library A is using it for exceptions and library B is doing something else clever then it’s hard to coordinate those two uses.
Can you elaborate? Isn't the c++ exception system built on top of setjmp/longjmp?
For setjmp to have a measurable performance impact, you have to be calling it an awful lot. It is just a handful of mov instructions without dependencies.
> Isn't the c++ exception system built on top of setjmp/longjmp?
It is not. The dominant C++ exception system is based on "zero-cost exception handling", which refers to the fact that you do not call any extra code (like setjmp) until an exception is to be thrown. All of that logic is instead encapsulated in tables of exception-handling data and sophisticated unwind routines that look up those tables to figure out where to transfer control-flow to.
> For setjmp to have a measurable performance impact, you have to be calling it an awful lot.
If you built the C++ exception system on top of setjmp/longjmp, you would have to call setjmp on every instance that begins a try block. This includes implicit try blocks generated for every object that has a nontrivial destructor (which must be called if you unwind through the block). So yeah, you would be calling it an awful lot...
Depending on the ABI it’s saving a lot of registers, and it is very easy to end up calling it a lot if you cannot guarantee the code that may be called by functions you call and you need to do any cleanup.
C++ and other languages tend to make the fast path of no exception extremely fast and store additional data about functions that allow for stack unwinding when an exception is thrown.
None of the popular C++ vendors implement C++ exceptions using setjmp/longjmp on any of the more (and very few of the less) common targets. Maintaining C++ language runtimes is a part of my day job and I am thankful they don't.
The more general POV is that "setjmp" defines a continuation and "longjmp" invokes it.
While the most frequent use of explicit continuations is for implementing exceptions, you are right that there are many other programming techniques based on explicit continuations (for instance coroutines).
The underlying primitive offers even more flexibility, but exceptions (in the sense of programming languages) can also be used for many other things than exceptions (in the sense of natural language, to handle exceptional cases).
Mostly for the usual "don't reinvent a wheel the language standard library already provides" reasons, I think. For instance glibc's x86 setjmp/sigsetjmp have been updated to support shadow stacks, but if you'd rolled your own you'd have to do that yourself:
https://elixir.bootlin.com/glibc/glibc-2.35/source/sysdeps/i...
Personally I think that's throwing the baby out with the bathwater for most use cases. You could rephrase my comment as "if you're already committed to reinventing half of libc's wheels, this one is not really any harder or more awkward than most, but if you're not aiming for that overall goal then reinventing just this one wheel is a bad plan" if you like.
longjmp is like return, which can jump out of a function, to the immediate caller. However, longjmp can return directly to a grandparent, great-grandparent, ... this happens without any unwind support; you have to invent your own conventions for that and implement them. Only carefully-written code which knows about the conventions of your particular longjmp wrapper library will get the unwinding.