Systems Programming in C# by Joe Duffy (infoq.com)
92 points by pjmlp on June 18, 2016 | 57 comments



It always seems like a loss to me that Joe's blogs read as if he's still selling the Midori C# OS to some extent, and either sees no real downsides to it or doesn't want to talk about them. Like, on the page introducing the series, he says his biggest regret is that it wasn't out there so "the meritocracy of the Internet could judge its pieces appropriately," as if perhaps the big problem was just misjudgment in a non-meritocracy. (And, hey, the Internet's judgment sure has its warts too.)

It's fine to try to salvage good ideas from a project that failed at its initial goals (and MS seems to have!), or even to hold on to thinking the basic ideas were good even though the implementation failed. But if he could just be candid about what wasn't right, he could spread those expensive insights you get by actually building a thing, and show a real ability to learn from failures. Who wants a postmortem that doesn't discuss what went wrong? How useful is it to talk architecture if you aren't clear about downsides and tradeoffs?

Concretely: What did the decision-makers who finally canned Midori dislike? What were the performance bottlenecks (and numbers) after all their tuning? Had they given up too much safety by the time they had it performing well? Was it a compatibility thing, and if so what are his thoughts on an approach to that? Was it just going to take too much investment to finish? I think a little candor about the limitations, downsides, and hard-to-swallow parts of Midori could advance the thinking about its basic ideas much more than a lot of posts about implementation tricks.


Looking at the history of similar fates for other safe stacks, political reasons or the desire to cut research money rank higher on the list than technical issues.

Everyone who was able to use Oberon or the Xerox stacks knows that it is possible to have such systems and do productive work on them.


I don't think the project was canned on technical grounds, but more on project management grounds. The project ran for several years with no real deliverables or deadlines, and management inevitably had to question whether the project was bearing fruit.


Looking at the decision-making process at Microsoft (and any other company of comparable size), I seriously doubt the main reason for dropping a project is technical most of the time.

https://news.ycombinator.com/item?id=11761437


The first couple of comments are missing the point here. Joe actually built a real operating system with a variation of C#. Some of the lessons they learned will inform the next version of C#.

Mads has said that the changes that are coming are for apps like games that need more low level features.

No one is saying that a previous version, the current version, or even the next version of C# will be used to write an OS.

That being said, I really wish they had open sourced Midori so that work was available to build on. I know that a lot of you think that only C should be used for OS development. (With the exception of a few of you who think Rust is magic.) They actually built a real operating system with managed code. Its basis in Singularity included really novel ideas about where process boundaries should be drawn and how an OS should be composed. It's a shame that it isn't available. Not everyone wants to use or work on a UNIX clone.


Those ideas weren't novel in Singularity; C's revisionism just makes sure not everyone delves into the history of systems programming languages.

Check out the Burroughs architecture, developed in 1961:

http://www.smecc.org/The%20Architecture%20%20of%20the%20Burr...

Or read about the Mesa/Cedar Workstation at Xerox PARC:

https://archive.org/details/bitsavers_xerox

If you want to read about an OS written in a fully memory safe language, including the source code, check Niklaus Wirth's books:

http://www.ethoberon.ethz.ch/books.html

Especially Project Oberon. The 2013 re-edition has an updated hardware design for an FPGA.

http://people.inf.ethz.ch/wirth/ProjectOberon/index.html

This is how it looked in its latest incarnation, BlueBottle, written in Active Oberon:

http://www.progtools.org/article.php?name=oberon&section=com...


I work on a project that does a lot of systems programming in C#. It's awful. Several coworkers and I ask every now and then, "why wasn't this done in C or C++ in the first place?" So much jumping through hoops to avoid the GC kicking in, or the GC moving memory around, doing P/Invokes, translating things between managed and native, and so on... It's not fun at all.


It's definitely not great right now -- that's why we're trying to make it better. :)

I would definitely still advocate using the right tool for the job, though. If the vast majority of your application would be best written in C++ or Rust (something without managed memory), I would just go ahead and do that.

A lot of people, however, have cross-layer applications where a substantial amount of the code has strict performance requirements, but much or most of the rest of the code has looser requirements.


Yup, same experience for us. We went back and forth between workstation and server mode GC. If you have a process that maintains things like leases/heartbeats in a low-latency setting, C# doesn't seem like a good idea. Wonder how well Go works in this scenario, considering it was purpose-built for this.


Go does some of the same stuff Duffy recommends for C# in this post, like using an AOT compiler and stack-allocatable structs.

Some things about Go may make that style of programming feel more natural. For example, structs are the default and syscalls look like other function calls. The stdlib might be friendly to this style (for example, widespread io.Reader/Writer lets a lot of stuff stream data through a reusable buffer rather than allocate big blobs) but I don't know enough to usefully compare it with the .NET libs/BCL.

Or C# could be better for you. It has a lot of work behind it, including in the collector. Go's collector is now decent at keeping most work in the background but isn't generational or compacting as the CLR collector can be. And using a new language is always weird; you never start out as good with the new as you were with the old. The CLR's SustainedLowLatency collector mode, which tries to defer compaction as long as it can at the cost of RAM footprint, is the one that sounds most like Go's, FWIW.

It all depends so much on what kind of deadlines your app has, how much memory pressure, what else you're getting/paying for in C# land. It's always tricky to grok a different ecosystem. The best ideas I can think of are to look for something existing in Go that seems kind of like what you want to do (like if you're implementing some kind of queue, look at NATS or nsq or such), or just build the smallest project that seems like a reasonable test.
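
If you do end up staying on the C# side and want to play with that collector mode, it's just a settable property; a rough sketch (the class, method, and the "work" inside it are made up for illustration):

  using System;
  using System.Runtime;

  class LatencySensitivePhase
  {
      static void Run()
      {
          // Sketch: switch to SustainedLowLatency around the latency-sensitive part,
          // then restore whatever mode was in effect before.
          GCLatencyMode previous = GCSettings.LatencyMode;
          try
          {
              GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
              // ... latency-sensitive work (heartbeats, leases, etc.) goes here ...
          }
          finally
          {
              GCSettings.LatencyMode = previous;
          }
      }
  }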


Why Go? It has a GC, too. I'd look at Rust.



Oberon the OS was cool, but Oberon the language is sort of too 1990s. (But I'd definitely take Modula-2 for low-level stuff instead of C any day, as I did in the early 1990s.)


I agree, which is why, after my initial interest in Go, I eventually switched focus to other languages.

However, that doesn't change the fact that it allows for lots of low-level stuff in a similar vein to Oberon, which is why I happen to take Go's side, even if I'd rather use other, more expressive programming languages.

And to be fair, Niklaus Wirth's latest language changes (Oberon-07) are even more minimalist than Go's.


Go is also a GC language.


Its GC has pretty low pause times, though.


Yep, around 2 milliseconds for most programs

https://sli.mg/1RmNsB


Two milliseconds is an eternity in kernel time. That's the wired round-trip time between two GigE endpoints on Linux's mediocre TCP/IP stack.

Now imagine you stacked 2ms GC pauses into that level of the system. That would be a barely serviceable kernel. Forget any real-time facilities.


OSes written in GC-enabled systems programming languages have always allowed for controlling the GC's behaviour.

So you can have a GC-free TCP/IP stack, while enjoying the comfort of the GC in areas where 2ms pauses aren't an issue.
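
On the CLR that control is coarser, but it does exist. A rough sketch using the no-GC-region API (the class and method names, the 16 MB budget, and the work inside are made up):

  using System;

  static class PacketPump
  {
      static void ProcessBurst()
      {
          // Sketch: ask the runtime to suspend collections while a critical burst runs,
          // provided the allocations in that window fit the requested budget.
          if (GC.TryStartNoGCRegion(16 * 1024 * 1024))
          {
              try
              {
                  // ... handle the latency-critical packets here ...
              }
              finally
              {
                  GC.EndNoGCRegion();
              }
          }
      }
  }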


Or maybe not even completely GC-free [1]. What might be especially helpful is a good JIT that could reoptimize the code on the fly when data patterns change. Maybe the performance level of 'data-guided optimization' provided by a (controllable) GC and a state-of-the-art JIT could beat the traditional approach someday.

[1] http://lukego.github.io/blog/2013/01/03/snabb-switchs-luajit...


> Now imagine you stacked 2ms GC pauses into that level of the system. That would be a barely serviceable kernel. Forget any real-time facilities.

Real-time just means bounded latency. If 2ms were a hard upper bound, that would be hard realtime. If it's ~90% bounded by 2ms with a small variance, that's soft realtime.


Dlang is a much better fit if you want some high-level-ish conveniences (e.g. opt-out GC, lazy evaluation) in a systems programming language without too much trouble.


Same here. My former boss demanded C# for all projects (he's more of a web guy). I protested, but lost. Now we have a ton of hardware interfaces and image processing/analysis routines which are unnecessarily complicated, difficult to maintain, and often slower than they should be.


Slower I can understand, but I don't see how a C# interface could possibly be inherently more complicated than a C or C++ interface for something like an image library, at least to the extent you're implying.

Even hardware interfacing via memory-mapped addresses would just need a small shim and types that are byte-compatible with C structs, called via P/Invoke; that isn't particularly complicated.
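
A minimal sketch of the kind of shim I mean (the DLL name, the entry point, and the struct layout are all invented for illustration):

  using System;
  using System.Runtime.InteropServices;

  // Laid out to be byte-compatible with the C struct the native SDK expects.
  [StructLayout(LayoutKind.Sequential)]
  struct FrameHeader
  {
      public uint Width;
      public uint Height;
      public uint Stride;
  }

  static class NativeCamera
  {
      // One P/Invoke declaration is the whole shim; marshalling of the blittable
      // struct and the raw buffer pointer is handled by the runtime.
      [DllImport("camera_sdk.dll", EntryPoint = "read_frame")]
      public static extern int ReadFrame(ref FrameHeader header, IntPtr buffer, int bufferLength);
  }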

Can you give a specific example of what you're referring to?


Poor wording on my part. It's not that the interfaces are more complicated, it's the implementation of those interfaces. Some of the hardware pieces come along with native SDKs (those which don't support, e.g., a serial interface), so there's a lot of interop going on.


Right, so the complication is just duplicating the interface in C# for interop, which obviously isn't needed if you just use the SDK language. Still, this just hides the complexities of using that language, like memory safety and garbage collection, so it seems hard to definitively state that it's more complicated than it otherwise would be.

What sort of performance issues do you see? Do you mean the p/invoke/marshalling costs?


The complications are around all of the native interop. It's just a lot of PInvoke and type wrangling scattered about for no good reason.

The performance issues were in the image processing and analysis areas. Image analysis doesn't really lend itself to bounds checking, non-deterministic memory usage, having little control over heap-allocated memory, etc. Also, I lose access to some of the most powerful imaging libraries out there.

I can work around a lot of it, but why should I have to? Should have used the right tool from the start.


You can circumvent the bounds checking via unsafe code, and avoid heap allocation by using structs. Not sure what non-deterministic memory usage means.
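
For instance, a hot image loop can drop down to pointers; a rough sketch (the method name and the flat byte[] pixel format are just illustrative):

  // Sketch: pin the buffer and walk it with a raw pointer so the inner loop
  // doesn't depend on the JIT eliding bounds checks (requires /unsafe).
  static unsafe long SumPixels(byte[] pixels)
  {
      long sum = 0;
      fixed (byte* p = pixels)
      {
          for (int i = 0; i < pixels.Length; i++)
              sum += p[i];
      }
      return sum;
  }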

You haven't specified what the right tool is. I think classifying C/C++ as the right tool is contentious too for the reasons I outline. The "type wrangling" isn't there for no good reason, the reasons are quite clear: to maintain memory safety and benefit from automatic memory management. There's also the possibility that you're making it more complicated than it needs to be.


Very interesting! No matter what the original intent was with Midori, writing an OS in C# and gaining insights into what is wrong with C# performance-wise is great. All of these findings can have an impact on every C# program out there: a web server isn't considered "systems programming", but it's not going to say no to performance improvements.

In the slides posted by pjmlp[0], I found one slide particularly interesting: Slide 38 about Contracts:

   Contract.Requires(buffer != null);
   Contract.Requires( 
     Range.IsValid(index, count, buffer.Length)
   );
or the Debug variant of it:

  Contract.Debug.Requires/Assert/Fail
It reminds me of the Dafny programming language [1], but here this seems to be used for performance. The future C# AoT compiler could validate those Contracts and use them to enable more aggressive optimizations.

The slide about PackN (and the future "safe" stackalloc) is also great; it seems like the easiest optimization someone can apply to their current code:

  int[] array = new int[8] { 0, ..., 7 }; 
  // Heap allocation!  For short-lived arrays, this is bad!
Versus the proposed "Safe":

  Span<int> span = stackalloc int[8] { 0, ..., 7 };
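For contrast, a sketch of what that same short-lived buffer takes today, before the safe syntax exists; note the unsafe context and the raw pointer:

  unsafe
  {
      // Today's stackalloc only works in unsafe code and hands back a raw pointer.
      int* tmp = stackalloc int[8];
      for (int i = 0; i < 8; i++)
          tmp[i] = i;
  }
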
[0] https://qconnewyork.com/system/files/presentation-slides/csy...

[1] http://research.microsoft.com/en-us/projects/dafny/


> writing an OS in C# and gaining insights into what is wrong with C# performance-wise is great

In some ways, yes, but was C# ever intended to be a systems language? No, and that's obvious from its design. So, what is this really telling us? That the language has issues when used in a way it was never intended to be used?


I see your point. For me, it's more in the spirit of: "Hey, look, C# used as a systems language really pinpoints performance problems X, Y, Z."

Improving the compiler to better handle X, Y, Z will yield improvements across the spectrum, not only in systems programming. C# used as a systems language only helped us find those issues faster.

Short-lived stack-allocated arrays, zero copy, etc. aren't something only systems programmers need. If your average ASP.NET web server can benefit from them, it's a good thing.


In the '90s, C programmers used to state that C++ wasn't a systems programming language; now their main FOSS compilers are written in C++.


On top of that, even some L4 microkernels are written in C++. They're so efficient that they fit into an L1 cache and run ultra-fast. I keep trying to get people not to use C++ for these, but funnily enough it's almost a trend now.


> I keep trying to get people not to use C++ for these, but funnily enough it's almost a trend now.

I can understand why L4 chose C++ at the time, as it had stricter checking than C, but OS kernels have very little internal code reuse that necessitates inheritance or templates. This is doubly true of microkernels. There is literally no reason to use C++ in this ___domain. Ada or C, and soon Rust, should be the only considerations IMO.


Total agreement. :)


Arguably C++ in the 90s is a different beast than C++11 and later.


Those compilers and OSes written in C++ were started when C++98 was the latest version available across all major compilers.


> The future C# AoT compiler could validate those Contracts

If you install the CodeContracts verifier, it does.


Slides are available at https://qconnewyork.com/system/files/presentation-slides/csy....

Usually the videos take some time to appear at InfoQ.


The slides link to https://github.com/joeduffy/csysprog, but this has been deleted. Does anyone have a copy or fork? Thanks!


Just tried it now and the link is working.


Are you somehow logged in? I only see the github 404 for that link!


Sorry, I misunderstood you; I was thinking about the PDF link.

You are right about the github one.


It's frustrating to see people advocate using programming languages in areas where they don't belong just because those languages are convenient, have nice syntax, are popular, etc. Go, for example, started that way but ultimately settled into the niche of writing networking services, where it shines. It's weird that languages with unpredictable runtime characteristics such as Java and C# are being advertised as systems programming languages.


There is a real-time specification for Java called the RTSJ. It's also available from several vendors, including IBM and Aicas (JamaicaVM).

https://en.wikipedia.org/wiki/Real_time_Java

https://www.aicas.com/cms/sites/default/files/rtsj-next-gen-...

http://www.ibm.com/developerworks/java/library/j-devrtj1/ind...


And Aonix (or Atego). Theirs had quite a few innovations. Plus, they had a series of them ranging from full Java down to a DO-178B, hard-real-time VM.


Especially since in C#, as soon as you are doing this, it involves a lot of inconvenience and odd syntax, and it is not going to be popular internally.


In this JavaScript world nothing is too mad.


> It's weird that languages with unpredictable runtime characteristics such as Java and C# are being advertised as systems programming languages.

Languages don't have unpredictable runtime characteristics; only specific language runtimes have unpredictable runtime characteristics. One could replace the standard .NET runtime GC with a hard realtime GC, and .NET would then have more predictable runtime characteristics than C and C++.


You realize that Go is also GCed, right?


Yes, and I never said that Go is a systems language. I said Go has found a niche of networking services. This doesn't make it a systems language.


It does make it one, because it has all the same language features that Oberon has.

https://en.wikipedia.org/wiki/Oberon_(operating_system)

http://wiki.osdev.org/Go_Bare_Bones

If I am able to write an OS using just the language, with the help of some assembly, or to bootstrap the language and runtime, it is a systems language.

Many of the criteria people use to judge systems languages, like inline assembly, would disqualify C when applied to a pure ANSI C-compliant compiler without language extensions.


The primary criterion by which a systems language should be judged is control over the underlying machine code execution. With GC languages you don't have any - the GC will kick in unpredictably. It may not be an issue for UI (although everyone hates it when a UI stumbles), but for system code like an OS, a DBMS, etc., it's simply not acceptable. Languages like C and Rust offer this level of control - you simply know exactly what your code is doing at any given time. With the JVM, .NET, or Go, you don't.


And yet Xerox PARC, the UK Royal Navy, and DEC all managed to write operating systems in GC-enabled systems programming languages, some of them quite productive workstation OSes.

You don't want the GC to mess with your lovely data structure?

Allocate it on the stack or as a global static, or let the GC know not to touch it.

Check Algol-68RS, Mesa/Cedar, Modula-2+, Modula-3, Oberon, Oberon-2, Active Oberon, Oberon-07.
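
In C#, the closest analog to "let the GC know not to touch it" is pinning; a rough sketch (the class and method names and the device hand-off are invented):

  using System;
  using System.Runtime.InteropServices;

  static class DeviceBuffer
  {
      static void ShareWithDevice(byte[] buffer)
      {
          // Sketch: pin the buffer so the collector won't move it while native code
          // or a device holds a raw pointer to it, then unpin when done.
          GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
          try
          {
              IntPtr ptr = handle.AddrOfPinnedObject();
              // ... hand ptr to the native side and wait for it to finish ...
          }
          finally
          {
              handle.Free();
          }
      }
  }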


You can control the GC by not allocating, or by allocating off-heap, when you need to. It is totally possible to write kernels, DBMSes, network stacks, etc. in GC languages.

Whether a particular language makes that nice enough to be more worth using than a non-GC language is another question.
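
In C#, for example, allocating off-heap is a single Marshal call; a rough sketch (the class and method names, the size, and what goes in the block are made up):

  using System;
  using System.Runtime.InteropServices;

  static class OffHeap
  {
      static void UseNativeBlock()
      {
          // Sketch: allocate outside the GC heap entirely, so the collector never
          // scans, moves, or frees this memory; we manage its lifetime ourselves.
          IntPtr block = Marshal.AllocHGlobal(64 * 1024);
          try
          {
              // ... fill and use the native block here ...
          }
          finally
          {
              Marshal.FreeHGlobal(block);
          }
      }
  }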


> With GC languages you don't have any - the GC will kick in unpredictably.

This is only true of specific GC implementations. Incremental GCs keep pauses short by doing a little collection work at a time. On-the-fly and realtime GCs pause the program for only a few microseconds.


LuaJIT is GCed and is being used in many system software projects, from SnabbSwitch to the NetBSD kernel [1].

[1] https://www.netbsd.org/gallery/presentations/mbalmer/fosdem2...



