Systems Programming in C# by Joe Duffy (infoq.com)
92 points by pjmlp on June 18, 2016 | 57 comments



It always seems like a loss to me that Joe's blogs read as if he's still selling the Midori C# OS to some extent, and either sees no real downsides to it or doesn't want to talk about them. Like, on the page introducing the series, he says his biggest regret is that it wasn't out there so "the meritocracy of the Internet could judge its pieces appropriately," as if perhaps the big problem was just misjudgment in a non-meritocracy. (And, hey, the Internet's judgment sure has its warts too.)

It's fine to try to salvage good ideas from a project that failed at its initial goals (and MS seems to have!), or even to hold on to thinking the basic ideas were good even though the implementation failed. But if he could just be candid about what wasn't right, he could spread those expensive insights you get by actually building a thing, and show a real ability to learn from failures. Who wants a postmortem that doesn't discuss what went wrong? How useful is it to talk architecture if you aren't clear about downsides and tradeoffs?

Concretely: What did the decision-makers who finally canned Midori dislike? What were the performance bottlenecks (and numbers) after all their tuning? Had they given up too much safety by the time they had it performing well? Was it a compatibility thing, and if so what are his thoughts on an approach to that? Was it just going to take too much investment to finish? I think a little candor about the limitations, downsides, and hard-to-swallow parts of Midori could advance the thinking about its basic ideas much more than a lot of posts about implementation tricks.


Looking at the history of similar fates for other safe stacks, political reasons or the desire to cut research money rank higher on the list than technical issues.

Everyone who was able to use Oberon or the Xerox stacks knows that it is possible to have such systems and do productive work on them.


I don't think the project was canned on technical grounds, but more on project management grounds. The project ran for several years with no real deliverables or deadlines, and management inevitably had to question whether the project was bearing fruit.


Looking at the decision-making process at Microsoft (and any other company of comparable size), I seriously doubt the main reason for dropping a project is technical most of the time.

https://news.ycombinator.com/item?id=11761437


The first couple of comments are missing the point here. Joe actually built a real operating system with a variation of C#. Some of the lessons they learned will inform the next version of C#.

Mads has said that the changes that are coming are for apps like games that need more low level features.

No one is saying that a previous version, the current version, or even the next version of C# will be used to write an OS.

That being said, I really wish they had open sourced Midori so that work was available to build on. I know that a lot of you think that only C should be used for OS development. (With the exception of a few of you who think Rust is magic.) They actually built a real operating system with managed code. Its basis in Singularity included really novel ideas about where process boundaries should be drawn and how an OS should be composed. It's a shame that it isn't available. Not everyone wants to use or work on a UNIX clone.


Those ideas weren't novel in Singularity; C's revisionism just makes sure not everyone delves into the history of systems programming languages.

Check out the Burroughs architecture, developed in 1961:

http://www.smecc.org/The%20Architecture%20%20of%20the%20Burr...

Or read about the Mesa/Cedar Workstation at Xerox PARC:

https://archive.org/details/bitsavers_xerox

If you want to read about an OS written in a fully memory safe language, including the source code, check Niklaus Wirth's books:

http://www.ethoberon.ethz.ch/books.html

Especially Project Oberon. The 2013 re-edition has an updated hardware design for an FPGA.

http://people.inf.ethz.ch/wirth/ProjectOberon/index.html

This is how it looked in its latest incarnation, BlueBottle, written in Active Oberon:

http://www.progtools.org/article.php?name=oberon&section=com...


I work on a project that does a lot of systems programming in C#. It's awful. Several coworkers and I ask every now and then, "why wasn't this done in C or C++ in the first place?" So much jumping through hoops to avoid the GC kicking in, or the GC moving memory around, doing P/Invokes, translating things between managed and native, and so on... It's not fun at all.


It's definitely not great right now -- that's why we're trying to make it better. :)

I would definitely still advocate using the right tool for the job, though. If the vast majority of your application would be best written in C++ or Rust (something without managed memory), I would just go ahead and do that.

A lot of people, however, have cross-layer applications where a substantial amount of the code has strict performance requirements, but much or most of the rest of the code has looser requirements.


Yup, same experience for us. We went back and forth between workstation and server mode GC. If you have a process that maintains things like leases/heartbeats in a low-latency setting, C# doesn't seem like a good idea. Wonder how well Go works in this scenario, considering it was purpose-built for this.


Go does some of the same stuff Duffy recommends for C# in this post, like using an AOT compiler and stack-allocatable structs.

Some things about Go may make that style of programming feel more natural. For example, structs are the default and syscalls look like other function calls. The stdlib might be friendly to this style (for example, widespread io.Reader/Writer lets a lot of stuff stream data through a reusable buffer rather than allocate big blobs) but I don't know enough to usefully compare it with the .NET libs/BCL.

Or C# could be better for you. It has a lot of work behind it, including in the collector. Go's collector is now decent at keeping most work in the background but isn't generational or compacting as the CLR collector can be. And using a new language is always weird; you never start out as good with the new as you were with the old. The CLR's SustainedLowLatency collector mode, which tries to defer compaction as long as it can at the cost of RAM footprint, is the one that sounds most like Go's, FWIW.

It all depends so much on what kind of deadlines your app has, how much memory pressure, what else you're getting/paying for in C# land. It's always tricky to grok a different ecosystem. The best ideas I can think of are to look for something existing in Go that seems kind of like what you want to do (like if you're implementing some kind of queue, look at NATS or nsq or such), or just build the smallest project that seems like a reasonable test.
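
If you do end up staying on the C# side and want to play with that collector mode, it's just a settable property; a rough sketch (the class, method, and the "work" inside it are made up for illustration):

  using System;
  using System.Runtime;

  class LatencySensitivePhase
  {
      static void Run()
      {
          // Sketch: switch to SustainedLowLatency around the latency-sensitive part,
          // then restore whatever mode was in effect before.
          GCLatencyMode previous = GCSettings.LatencyMode;
          try
          {
              GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
              // ... latency-sensitive work (heartbeats, leases, etc.) goes here ...
          }
          finally
          {
              GCSettings.LatencyMode = previous;
          }
      }
  }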


Why Go? It has a GC, too. I'd look at Rust.



Oberon the OS was cool, but Oberon the language is sort of too 1990s. (But I'd definitely take Modula-2 for low-level stuff instead of C any day, as I did in the early 1990s.)


I agree, which is why, after my initial interest in Go, I eventually switched focus to other languages.

However, that doesn't change the fact that it allows for lots of low-level stuff in a similar vein to Oberon, which is why I happen to take Go's side, even if I'd rather use other, more expressive programming languages.

And to be fair, Niklaus Wirth's latest language changes (Oberon-07) are even more minimalist than Go's.


Go is also a GC language.


Its GC has pretty low pause times, though.


Yep, around 2 milliseconds for most programs

https://sli.mg/1RmNsB


Two milliseconds is an eternity in kernel time. That's the wired round-trip time between two GigE endpoints on Linux's mediocre TCP/IP stack.

Now imagine you stacked 2ms GC pauses into that level of the system. That would be a barely serviceable kernel. Forget any real-time facilities.


OSes written in GC-enabled systems programming languages have always allowed for controlling the GC's behaviour.

So you can have a GC-free TCP/IP stack, while enjoying the comfort of the GC in areas where 2ms pauses aren't an issue.
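
On the CLR that control is coarser, but it does exist. A rough sketch using the no-GC-region API (the class and method names, the 16 MB budget, and the work inside are made up):

  using System;

  static class PacketPump
  {
      static void ProcessBurst()
      {
          // Sketch: ask the runtime to suspend collections while a critical burst runs,
          // provided the allocations in that window fit the requested budget.
          if (GC.TryStartNoGCRegion(16 * 1024 * 1024))
          {
              try
              {
                  // ... handle the latency-critical packets here ...
              }
              finally
              {
                  GC.EndNoGCRegion();
              }
          }
      }
  }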


Or maybe not even completely GC-free [1]. What might be especially helpful is a good JIT that could reoptimize the code on the fly when data patterns change. Maybe the performance level of 'data-guided optimization' provided by a (controllable) GC and a state-of-the-art JIT could beat the traditional approach someday.

[1] http://lukego.github.io/blog/2013/01/03/snabb-switchs-luajit...


> Now imagine you stacked 2ms GC pauses into that level of the system. That would be a barely serviceable kernel. Forget any real-time facilities.

Real-time just means bounded latency. If 2ms were a hard upper bound, that would be hard realtime. If it's ~90% bounded by 2ms with a small variance, that's soft realtime.


Dlang is a much better fit if you want some high-level-ish conveniences (e.g. opt-out GC, lazy evaluation) in a systems programming language without too much trouble.


Same here. My former boss demanded C# for all projects (he's more of a web guy). I protested, but lost. Now we have a ton of hardware interfaces and image processing/analysis routines which are unnecessarily complicated, difficult to maintain, and often slower than they should be.


Slower I can understand, but I don't see how a C# interface could possibly be inherently more complicated than a C or C++ interface for something like an image library, at least to the extent you're implying.

Even hardware interfacing via memory-mapped addresses would just need a small shim and types that are byte-compatible with C structs, called via P/Invoke; that isn't particularly complicated.
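
A minimal sketch of the kind of shim I mean (the DLL name, the entry point, and the struct layout are all invented for illustration):

  using System;
  using System.Runtime.InteropServices;

  // Laid out to be byte-compatible with the C struct the native SDK expects.
  [StructLayout(LayoutKind.Sequential)]
  struct FrameHeader
  {
      public uint Width;
      public uint Height;
      public uint Stride;
  }

  static class NativeCamera
  {
      // One P/Invoke declaration is the whole shim; marshalling of the blittable
      // struct and the raw buffer pointer is handled by the runtime.
      [DllImport("camera_sdk.dll", EntryPoint = "read_frame")]
      public static extern int ReadFrame(ref FrameHeader header, IntPtr buffer, int bufferLength);
  }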

Can you give a specific example of what you're referring to?


Poor wording on my part. It's not that the interfaces are more complicated, it's the implementation of those interfaces. Some of the hardware pieces come along with native SDKs (those which don't support, e.g., a serial interface), so there's a lot of interop going on.


Right, so the complication is just duplicating the interface in C# for interop, which obviously isn't needed if you just use the SDK language. Still, this just hides the complexities of using that language, like memory safety and garbage collection, so it seems hard to definitively state that it's more complicated than it otherwise would be.

What sort of performance issues do you see? Do you mean the p/invoke/marshalling costs?


The complications are around all of the native interop. It's just a lot of PInvoke and type wrangling scattered about for no good reason.

The performance issues were in the image processing and analysis areas. Image analysis doesn't really lend itself to bounds checking, non-deterministic memory usage, having little control over heap-allocated memory, etc. Also, I lose access to some of the most powerful imaging libraries out there.

I can work around a lot of it, but why should I have to? Should have used the right tool from the start.


You can circumvent the bounds checking via unsafe code, and avoid heap allocation by using structs. Not sure what non-deterministic memory usage means.
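
For instance, a hot image loop can drop down to pointers; a rough sketch (the method name and the flat byte[] pixel format are just illustrative):

  // Sketch: pin the buffer and walk it with a raw pointer so the inner loop
  // doesn't depend on the JIT eliding bounds checks (requires /unsafe).
  static unsafe long SumPixels(byte[] pixels)
  {
      long sum = 0;
      fixed (byte* p = pixels)
      {
          for (int i = 0; i < pixels.Length; i++)
              sum += p[i];
      }
      return sum;
  }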

You haven't specified what the right tool is. I think classifying C/C++ as the right tool is contentious too for the reasons I outline. The "type wrangling" isn't there for no good reason, the reasons are quite clear: to maintain memory safety and benefit from automatic memory management. There's also the possibility that you're making it more complicated than it needs to be.


Very interesting! No matter what the original intent was with Midori, writing an OS in C# and gaining insights into what is wrong with C# performance-wise is great. All of these findings can have an impact on every C# program out there: a web server isn't considered "systems programming", but it's not going to say no to performance improvements.

In the slides posted by pjmlp[0], I found one slide particularly interesting: Slide 38 about Contracts:

   Contract.Requires(buffer != null);
   Contract.Requires( 
     Range.IsValid(index, count, buffer.Length)
   );
or the Debug variant of it:

  Contract.Debug.Requires/Assert/Fail
It reminds me of the Dafny programming language [1], but here this seems to be used for performance. The future C# AoT compiler could validate those Contracts and use them to enable more aggressive optimizations.

The slide about PackN (and the future "safe" stackalloc) is also great; it seems like the easiest optimization someone can apply to their current code:

  int[] array = new int[8] { 0, ..., 7 }; 
  // Heap allocation!  For short-lived arrays, this is bad!
Versus the proposed "Safe":

  Span<int> span = stackalloc int[8] { 0, ..., 7 };
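For contrast, a sketch of what that same short-lived buffer takes today, before the safe syntax exists; note the unsafe context and the raw pointer:

  unsafe
  {
      // Today's stackalloc only works in unsafe code and hands back a raw pointer.
      int* tmp = stackalloc int[8];
      for (int i = 0; i < 8; i++)
          tmp[i] = i;
  }
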
[0] https://qconnewyork.com/system/files/presentation-slides/csy...

[1] http://research.microsoft.com/en-us/projects/dafny/


> writing an OS in C# and gaining insights into what is wrong with C# performance-wise is great

In some ways, yes, but was C# ever intended to be a systems language? No, and that's obvious from its design. So, what is this really telling us? That the language has issues when used in a way it was never intended to be used?


I see your point. For me, it's more in the spirit of: "Hey, look, C# used as a systems language really pinpoints performance problems X, Y, Z."

Improving the compiler to better handle X, Y, Z will yield improvements across the spectrum, not only in systems programming. C# used as a systems language only helped us find those issues faster.

Short-lived stack-allocated arrays, zero copy, etc. aren't something only systems programmers need. If your average ASP.NET web server can benefit from them, it's a good thing.


In the '90s, C programmers used to state that C++ wasn't a systems programming language; now their main FOSS compilers are written in C++.


On top of that, even some L4 microkernels are written in C++. They're so efficient that they fit into an L1 cache and run ultra-fast. I keep trying to get people not to use C++ for these, but funnily enough it's almost a trend now.


> I keep trying to get people not to use C++ for these, but funnily enough it's almost a trend now.

I can understand why L4 chose C++ at the time, as it had stricter checking than C, but OS kernels have very little internal code reuse that necessitates inheritance or templates. This is doubly true of microkernels. There is literally no reason to use C++ in this ___domain. Ada or C, and soon Rust, should be the only considerations IMO.


Total agreement. :)


Arguably C++ in the 90s is a different beast than C++11 and later.


Those compilers and OSes written in C++ were started when C++98 was the latest version available across all major compilers.


> The future C# AoT compiler could validate those Contracts

If you install the CodeContracts verifier, it does.


Slides are available at https://qconnewyork.com/system/files/presentation-slides/csy....

Usually the videos take some time to appear at InfoQ.


The slides link to https://github.com/joeduffy/csysprog, but this has been deleted. Does anyone have a copy or fork? Thanks!


Just tried it now and the link is working.


Are you somehow logged in? I only see the github 404 for that link!


Sorry, I misunderstood you; I was thinking about the PDF link.

You are right about the github one.


It's frustrating to see people advocate using programming languages in areas where they don't belong just because those languages are convenient, have nice syntax, are popular, etc. Go, for example, started that way but ultimately settled into the niche of writing networking services, where it shines. It's weird that languages with unpredictable runtime characteristics such as Java and C# are being advertised as systems programming languages.


There is a real-time specification for Java called the RTSJ. It's also available from several vendors, including IBM and Aicas (JamaicaVM).

https://en.wikipedia.org/wiki/Real_time_Java

https://www.aicas.com/cms/sites/default/files/rtsj-next-gen-...

http://www.ibm.com/developerworks/java/library/j-devrtj1/ind...


And Aonix (or Atego). Theirs had quite a few innovations. Plus, they had a series of them ranging from full Java down to a DO-178B, hard-real-time VM.


Especially since in C#, as soon as you are doing this, it involves a lot of inconvenience and odd syntax, and it is not going to be popular internally.


In this JavaScript world nothing is too mad.


> It's weird that languages with unpredictable runtime characteristics such as Java and C# are being advertised as systems programming languages.

Languages don't have unpredictable runtime characteristics; only specific language runtimes have unpredictable runtime characteristics. One could replace the standard .NET runtime GC with a hard realtime GC, and .NET would then have more predictable runtime characteristics than C and C++.


You realize that Go is also GCed, right?


Yes, and I never said that Go is a systems language. I said Go has found a niche of networking services. This doesn't make it a systems language.


It does make it one, because it has all the same language features that Oberon has.

https://en.wikipedia.org/wiki/Oberon_(operating_system)

http://wiki.osdev.org/Go_Bare_Bones

If I am able to write an OS using just the language, with the help of some assembly, or to bootstrap the language and runtime, it is a systems language.

Many of the criteria people use to judge systems languages, like inline assembly, would disqualify C when applied to a pure ANSI C-compliant compiler without language extensions.


The primary criterion by which a systems language should be judged is control over the underlying machine code execution. With GC languages you don't have any - the GC will kick in unpredictably. It may not be an issue for UI (although everyone hates it when a UI stumbles), but for system code like an OS, a DBMS, etc., it's simply not acceptable. Languages like C and Rust offer this level of control - you simply know exactly what your code is doing at any given time. With the JVM, .NET, or Go, you don't.


And yet Xerox PARC, the UK Royal Navy, and DEC all managed to write operating systems in GC-enabled systems programming languages, some of them quite productive workstation OSes.

You don't want the GC to mess with your lovely data structure?

Allocate it on the stack or as a global static, or let the GC know not to touch it.

Check Algol-68RS, Mesa/Cedar, Modula-2+, Modula-3, Oberon, Oberon-2, Active Oberon, Oberon-07.
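
In C#, the closest analog to "let the GC know not to touch it" is pinning; a rough sketch (the class and method names and the device hand-off are invented):

  using System;
  using System.Runtime.InteropServices;

  static class DeviceBuffer
  {
      static void ShareWithDevice(byte[] buffer)
      {
          // Sketch: pin the buffer so the collector won't move it while native code
          // or a device holds a raw pointer to it, then unpin when done.
          GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
          try
          {
              IntPtr ptr = handle.AddrOfPinnedObject();
              // ... hand ptr to the native side and wait for it to finish ...
          }
          finally
          {
              handle.Free();
          }
      }
  }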


You can control the GC by not allocating, or by allocating off-heap, when you need to. It is totally possible to write kernels, DBMSes, network stacks, etc. in GC languages.

Whether a particular language makes that nice enough to be more worth using than a non-GC language is another question.
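
In C#, for example, allocating off-heap is a single Marshal call; a rough sketch (the class and method names, the size, and what goes in the block are made up):

  using System;
  using System.Runtime.InteropServices;

  static class OffHeap
  {
      static void UseNativeBlock()
      {
          // Sketch: allocate outside the GC heap entirely, so the collector never
          // scans, moves, or frees this memory; we manage its lifetime ourselves.
          IntPtr block = Marshal.AllocHGlobal(64 * 1024);
          try
          {
              // ... fill and use the native block here ...
          }
          finally
          {
              Marshal.FreeHGlobal(block);
          }
      }
  }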


> With GC languages you don't have any - the GC will kick in unpredictably.

This is only true of specific GC implementations. Incremental GCs keep pauses short by doing a little collection work at a time. On-the-fly and realtime GCs pause the program for only a few microseconds.


LuaJIT is GCed and is being used in many system software projects, from SnabbSwitch to the NetBSD kernel [1].

[1] https://www.netbsd.org/gallery/presentations/mbalmer/fosdem2...



