The syscall overhead isn't the problem; that part is dirt cheap, as you say. The problem is the context-switch overhead: calling into the OS evicts a lot of your data and instructions from the caches, and the performance lost after returning can easily add up to around 30µs.[1]
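To put a number on the direct part in isolation, a rough Linux-only sketch like the one below times just the trap/return path for a cheap syscall; the indirect cost (which is what the ~30µs figure is about) shows up afterwards as cache and TLB misses in your own code, so it won't appear in this loop. Numbers will vary with the CPU and mitigations:

    /* Rough measurement of the direct syscall round-trip cost only.
     * Build: cc -O2 syscall_cost.c */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void) {
        const long iters = 1000000;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i++)
            syscall(SYS_getpid);        /* cheap syscall; bypasses libc's pid cache */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("~%.0f ns per getpid() syscall\n", ns / iters);
        return 0;
    }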
>> No code in the OS needs access to the user code or data cache
This is not true for the data. How else would you pass any data structure that doesn't fit in CPU registers, say the path of a file to open? Normally it's a char*[0] (which is indeed passed in a register), but the OS then actually reads the string out of the process's memory (usually sitting in the L1 data cache).
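To make that concrete, here's the trivial case (nothing exotic, just open(2)): only the pointer travels in a register, and the kernel still has to read the path bytes out of your process's memory (on Linux via strncpy_from_user), touching whichever cache lines they occupy:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void) {
        /* The string lives in this process's memory (likely hot in L1d). */
        const char *path = "/etc/hostname";

        /* Only the pointer is passed in a register (rdi on x86-64 Linux);
         * the kernel then copies the actual bytes out of user memory
         * before it can do the path lookup. */
        int fd = open(path, O_RDONLY);
        if (fd >= 0)
            close(fd);
        return 0;
    }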
I'd say a cache isn't the right structure for passing around data (or references to data) that you know will be accessed very soon by completely different code.
As userland code, you'd like to grant the OS access to a particular subset of lines in your D$ while keeping it out of your I$ (code cache) altogether. Traditional implementations fail in both respects, and at the same time the OS probably can't take advantage of its own historical locality, because userland has evicted it since the last call.
From what other people are saying it sounds like these problems are being worked on, though.
I don't think CAT can be used to partition the cache between kernel and userspace -- I'm not even sure how you'd go about doing that, given that you can (and do) have pages shared between them.
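For what it's worth, the Linux-side interface to CAT is resctrl, and it partitions by task or CPU group rather than by privilege level, which is part of why a clean kernel-vs-user split is awkward. A sketch of what the per-task version looks like (assumes a CAT-capable CPU, resctrl already mounted at /sys/fs/resctrl, and an illustrative L3 mask):

    /* Illustrative only: put the current process in a resctrl group limited
     * to a slice of L3. The kernel itself is not confined by this; the mask
     * applies to whatever task happens to be running. */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static void write_str(const char *path, const char *s) {
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); return; }
        fputs(s, f);
        fclose(f);
    }

    int main(void) {
        /* New allocation group. */
        mkdir("/sys/fs/resctrl/lowlat", 0755);

        /* Restrict the group to (say) the low 4 ways of L3 on cache id 0.
         * Valid mask values depend on the CPU; "f" is just an example. */
        write_str("/sys/fs/resctrl/lowlat/schemata", "L3:0=f\n");

        /* Move this process into the group. */
        char pid[32];
        snprintf(pid, sizeof pid, "%d\n", getpid());
        write_str("/sys/fs/resctrl/lowlat/tasks", pid);
        return 0;
    }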
That said, in our experiments, if you're using userspace network and NVMe drivers the context switches and the associated cache pollution aren't a problem, since they happen pretty infrequently (primarily just timer interrupts, and even those can be turned off, though we haven't needed to).
[1] http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-ma...