Operating Systems: CPU Scheduling

anonymousiam · on April 16, 2022

Nice overview. Windows 1-3 used "non-premptive" or "cooperative" scheduling, which is why any process could hang the system. That method does have its uses though. I once wrote a micro kernel for MSP430 that did it that way too. As long as you can guarantee that a task will complete within a specified time, it's a good lightweight solution.

addaon · on April 16, 2022

There's an intermediate point between pre-emptive and cooperative that I've found useful. For the lack of a better name let's call it "you-better-cooperate".

Suppose a real-time system using cooperative scheduling where well-behaved tasks yield within a guaranteed time window. Suppose also a system that has the ability to launch and restart processes for example in the case of error. In such a system, a poorly-behaved process can hang the system because it doesn't yield.

Introducing pre-emption to such a system avoids the potential hangs, but (a) adds the complexity of pre-emption; (b) only gets exercised in the case that you're already in a failure state (process failed to yield); and (c) allows processes in a known failure state to continue.

Instead, when a process is scheduled, set a timer interrupt for a time period after its guaranteed yield. When the process yields, cancel that timer (or re-schedule it for the next process). If the timer fires, just kill and restart the non-yielding process.

In a limited set of cases, this is a simpler, more robust, and equally powerful system compared to both full pre-emption and cooperation.

_e4mv · on April 16, 2022

If i'm understanding this correctly, then the difference between pre-emptive scheduling and the intermediate you described would be that in the pre-emptive scheduling case, a poorly-behaved process would be forced to give up the CPU and then be scheduled to run again at some point in the future (in a failure state) whereas, in the intermediate solution, a poorly-behaved process would be killed and restarted.

Is that correct? If so, wouldn't this make matters worse if the poorly-behaved process is guaranteed to hang? Is killing a process and restarting it worse or better than context-switching repeatedly?

addaon · on April 16, 2022

Yes, correct.

Killing a process and restarting it is /often/ better than context switching repeatedly, but not always. Pro: It puts the process into a known state. Con: It removes the opportunity for slow forward progress. In the case where the process has been designed to have fixed latency, then slow forward progress is roughly as scary as memory corruption -- something is horribly wrong and you don't know if you're observing a minor symptom of a major problem.

Balancing the pro/con there can be interesting, but the system level pro puts a pretty heavy thumb on the scales. In the intermediate approach, because there's no real pre-emption a whole class of race conditions can't exist. This can be pretty big for ease of system analysis.

ir193 · on April 17, 2022

how about a bookmark and yield mechanism(in PL level maybe), theb process can continue slow forward process when switched back

PaulDavisThe1st · on April 17, 2022

Linux real time scheduling (SCHED_FIFO and SCHED_RR) use a variation on this that approaches it from the other side.

Tasks in those two scheduling tasks will basically never be preempted by a lower priority task nor a task in any other scheduling class. However the system has an overall percentage limit on how much of the cpu tasks in these classes may consume. If they consume more than this limit, they will be preempted and non-RT tasks will run "for a while" before heading back to RT tasks.

anonymousiam · on April 17, 2022

A good example of a "complex" scheduling algorithm. Many others exist. Unfortunately there's no such thing as a "perfect" scheduler for all use cases.

MrYellowP · on April 16, 2022

A real time operating system can, by definition, not be cooperative. Definitely not a hard one and a soft one most likely not either, simply because it can not guarantee that it will process the interrupt quickly enough to be qualified as RTOS.

https://en.wikipedia.org/wiki/Real-time_operating_system

When you set a timer, which stops the running task to switch "to somewhere else", then it's not cooperative.

Can you elaborate?

addaon · on April 16, 2022

A real time system can be cooperative. A real time operating system can use pre-emption to achieve bounded latency. A real time system that does not use pre-emption can use other mechanisms -- including a combination of analytical bounding of task run-times and static cooperative scheduling -- to achieve bounded latency. The latter system can be enhanced by enforcing the analytical bounds, making it somewhat robust to errors in analysis. One can argue that this enforcement makes the system non-cooperative, it's certainly not full pre-emption with all the costs and benefits that go with.

As an example of such a system, consider a bare metal BLDC motor driver. You may statically schedule a sequence of tasks -- read current sensors, read commands, adjust PWM hardware registers, read temperature sensors, change state on temperature error, loop. Suppose that the 'read temperature sensors' task can fail to meet its analytic time budget because the I2C hardware can get into a weird state and just not return. (Suppose further that this isn't hypothetical...) Then having a kill-and-reset-on-timeout feature for the temperature sensor task is an obvious and reasonable workaround to give an improved system. That feature can be added to the temperature sensor task; or, it can be added as a general feature to the round robin scheduler in the way I described.

Hope that's a helpful description of what I was trying to explain! I'm not in any way saying this is a general solution or a universal replacement for a real RTOS; rather that it's a pattern I've ended up re-deriving a time or two that I find interesting.

anonymousiam · on April 16, 2022

Heh. I2C was part of what I was doing on the aforementioned MSP430 OS. I ended up breaking up all the I2C stuff into parts of a state machine (I think I ended up with six states), and ran it in the background as the lowest priority main loop.

anonymousiam · on April 16, 2022

The MSP430 RTOS I wrote was real-time and cooperative, with the caveat that it was application specific, and no task would ever take more than one millisecond to complete. I was careful to design all the tasks so they would never exceed the maximum run-time. For the tasks that needed more time, I broke them into several sub-tasks.

You said; "When you set a timer, which stops the running task to switch "to somewhere else", then it's not cooperative." You are correct. The OS does not require the cooperation of the task in order to suspend it and start/resume a different task. A non-cooperative OS does not require the task to either make a system call (such as waiting on I/O) or to finish. It will preempt the running task according to the scheduler rules. Typically a scheduler will receive periodic interrupts so that it can assess which task should be made active. On real-time systems without much processing headroom, the context switching between tasks can take up a significant percentage of CPU time, which is why I went with cooperative multitasking on the (25MHz) MSP430 micro-controller.

slaymaker1907 · on April 16, 2022

It's very useful as a tool, though I think the ideal is a mix of both. You can do cooperative for the happy path and have an interrupt if something goes wrong with a thread not yielding. The main disadvantage is that single threaded plus fully cooperative makes avoiding races a lot easier.

0xbadcafebee · on April 16, 2022

I remember when the Linux kernel's preemptible patches were introduced. It already had preemptible userspace, but the kernel wasn't, leading to lots of latency issues. Once they hacked up SMP support to add preempting, things were so much smoother

dev_tty01 · on April 17, 2022

Classic MacOS did the same thing. Basically, all GUI processes call GetNextEvent() at the head of the event loop. In 1985 Andy Hertzfeld wrote Switcher which used the GetNextEvent() entry point to pause an app and allow another app to run until the next call. Hertzfeld got the idea from the DOS utility Memory Shift.

https://www.folklore.org/StoryView.py?project=Macintosh&stor...

With well behaved apps it worked remarkably well. Apple brought it into the OS as MultiFinder.

phendrenad2 · on April 17, 2022

This is how RTOSs work on microcontrollers that don't have a supervisor mode. A hardware timer interrupts each process and the interrupt handler switches which process is active. However, there's nothing stopping one process from disabling interrupts for some critical activity (like bit-banging some serial output) and then forgetting to re-enable them.

boondaburrah · on April 16, 2022

I had a scanner that became much more stable running on a Mac OS 9 machine than a Windows machine simply because the scanning software could "hang" the mac and monopolize the SCSI port with exact timing.

traverseda · on April 16, 2022

Reminds me of await/async. With micropython you can get a nice little cooperative multiprocessing operating system going on a micro-controller!

inamberclad · on April 16, 2022

Yep, this is what the Apollo AGC did. Each process had to yield every 20 ms or so.

ailtonbsj_ · on April 16, 2022

I built two simulators when I studying this.

https://ailtonbsj.github.io/cpu-scheduling-simulator/

https://ailtonbsj.github.io/cpu-scheduling-simulator/old.htm...

Demo: https://youtu.be/vulTnsfu6GY

paulmooreparks · on April 17, 2022

> CPU utilization - Ideally the CPU would be busy 100% of the time, so as to waste 0 CPU cycles. On a real system CPU usage should range from 40% ( lightly loaded ) to 90% ( heavily loaded. )

There was an article posted here a while back that challenged the notion that 100% utilization is a desirable goal:

https://blog.danslimmon.com/2015/07/09/when-efficiency-hurts...

peter_d_sherman · on April 16, 2022

https://en.wikipedia.org/wiki/Idle_(CPU)

https://en.wikipedia.org/wiki/System_Idle_Process

https://en.wikipedia.org/wiki/HLT_(x86_instruction)

https://stackoverflow.com/questions/5112097/what-is-the-code...

https://github.com/torvalds/linux/tree/master/kernel/sched

aabbqq · on April 16, 2022

I like writing as list pattern/style. Very easy to be engaged and learn. I found a article about his pattern earlier in hackernews https://dynomight.net/lists/

Rechtsstaat · on April 16, 2022

I also followed a course that was based on the Silberschatz/Gagne/Galvin book, and when searching for background I found many courses based on the same - they were structured extremely similar.

Honestly, I don't see the point of all these instructors making their own summaries of the same book. I think that all these summaries condense the material to the point they're barely more than PowerPoint lists.

Can someone that learns better using this format explain why?

bulibuta · on April 16, 2022

I am a university professor teaching Operating Systems based on the Dinosaur book. I always use the authos' slides to which I add my own notes and extra/missing concepts.

My colleagues from other universities here do exactly what you say, they do their own summary which is an effort I also don't understand.

The only moment when I felt I needed to do my own slides was with the pandemic online courses that refrained me from using the whiteboards in the amphitheater. But in the end the graphical tablet saved me from that.

pantulis · on April 16, 2022

I guess these materials are intended as support for classes, so it is ok for them to resemble Powerpoint decks as there would be a lot of voice over and explanations on top of them by the teacher.

windows_kernel · on April 17, 2022

Vain attempt to justify tuition fees with "original" content.

7373737373 · on April 17, 2022

The KeyKOS mainframe operating system, to which users connected with terminals, had an interesting scheduling and resource metering system: http://www.cap-lore.com/CapTheory/KK/Arch/Three.Special.html...

Users could buy CPU time and grant portions of it to other users in the system (say, their employees) via keys, who could in turn grant resources to other users/ subprocesses recursively, resulting in a metering tree.

The programming language Joule, which was inspired by it, used this system too: http://www.erights.org/history/joule/MANUAL.BK8.pdf

The Sel4 and Genode operating systems as well

backes · on April 17, 2022

Such a system is based on "capabilities" and many research OSes such as Barrelfish<https://barrelfish.org/>, Hydra, Mach, and EROS explored these.

The overall question is to decide who (user, process, ...) has which permissions (read, write, execute, share, revoke,...) on which resources (processes, memory, files,...). Capability based systems provide a finer grained solution to this problem than the classical UNIX access control list with interesting trade-offs.

As an example, privilege escalation exploits can be avoided with a capability based OS. This is also known as the "confused deputy problem". Imagine you're on a server and get billed for each ms of runtime that a compiler uses. You provide an input (e.g. main.c) to the compiler, and an output ___location (e.g. out.a). The compiler runs, writes out.a, and appends a line to a bookkeeping file. The problem is that the compiler has privileged access to the bookkeeping file and every write it does is in this privileged mode. Therefore, you can provide as output file the ___location of the bookkeeping file and overwrite it with the compiled binary.

On a capability based system, the compiler has one (privileged) capability to write to the bookkeeping file, and the user not only provides an output ___location, but also a capability to write to the requested ___location. The compiler then uses the corresponding capability for each write and this guarantees that it'll never escalate privileges.

It is not trivial to design a sound (and efficient) capability based system, nor to implement it. I wonder when they appear in modern OSes as they can solve many privacy-related problems.

ciarcode · on April 16, 2022

I always think to priority-based scheduling in this way. Maybe it can help someone else.

The scheduler will always schedule the highest priority task among ones in the ready queue, but at different times.

- If preemptive, the scheduler will schedule the higher priority task (higher than the currently running one) as soon as it enters the ready queue.

- If non preemptive, the scheduler will schedule the higher priority task only when the running one terminated or explicitly call a yield() (call to yield --> cooperative)

In principle, you can mix both scheduling types, making some tasks "non-preemptable" and other tasks "preemptable".

nopurpose · on April 16, 2022

I've been working with Linux systems for almost 20 years and became familiar with many files in /proc kernel provides to inspect its various subsystems.

/proc/sched_debug is one I still don't understand.

Terry_Roll · on April 17, 2022

The early RaspberryPi's are/were excellent for experimenting with when trying to squeeze performance out of it, I managed to get one to clock up to close to 2Ghz which was that mythical beast to get past whilst remaining stable (no cpu heatsinks or fans) and the cpu scheduling certainly affects performance on them, but it affects all the main cpu's and OS near enough. So what works on Windows can be applied to Linux and vice versa.

This gives a nice overview, and adjusting the Quantum can certainly help any CPU but like the same on the Rpi, its possible to get to a point where the quantum is degrading performance noticeably and I think this has helped pc sales enormously.

https://www.microsoftpressstore.com/articles/article.aspx?p=...

Quantum Settings Registry Value

HKLM\SYSTEM\CurrentControlSet\Control\PriorityControl\Win32PrioritySeparation

2A Hex = Short, Fixed , High foreground boost.

29 Hex = Short, Fixed , Medium foreground boost.

28 Hex = Short, Fixed , No foreground boost.

26 Hex = Short, Variable , High foreground boost.

25 Hex = Short, Variable , Medium foreground boost.

24 Hex = Short, Variable , No foreground boost.

1A Hex = Long, Fixed, High foreground boost.

19 Hex = Long, Fixed, Medium foreground boost.

18 Hex = Long, Fixed, No foreground boost.

16 Hex = Long, Variable, High foreground boost.

15 Hex = Long, Variable, Medium foreground boost.

14 Hex = Long, Variable, No foreground boost.

https://docs.microsoft.com/en-us/windows-hardware/test/wpt/c...

https://live.sysinternals.com/tools/Clockres.exe

willis936 · on April 16, 2022

How do you have a hard realtime system when the amount of work needed to be done in a period of time is not deterministic? How can you even prove the bounds of the amount of work?

ip26 · on April 16, 2022

The way I understand it is a hard realtime system does not solve that problem but does ensure the process with too much work does not prevent any other process from meeting its obligations. In other words, the failure to meet demand is isolated & prevented from cascading into other tasks.

Of course, if the process that is overloaded is the process responsible for keeping the system from crashing into the Sun, this doesn't quite fix the problem.

ciarcode · on April 16, 2022

Then it would be not an hard real time system.

sedatk · on April 16, 2022

With non-maskable hardware interrupts?

sydthrowaway · on April 16, 2022

Now we need to know the state of the art.

_e4mv · on April 16, 2022

I agree, there are many people working on their own open-source operating systems. I would put in the effort to learn and contribute if only I knew what the state of the art was. Maybe linux kernel == S.O.T.A ?

sydthrowaway · on April 16, 2022

How about seL4?

_e4mv · on April 16, 2022

Interesting suggestion, thanks.

snvzz · on April 17, 2022

seL4's mixed criticality (MCS) kernel and the time capabilities absolutely are the state of the art.

asdadsdad · on April 16, 2022

love scheduling theory