I did a bunch of hacking on Bochs for a little virtualization toy project last year and found the source pretty easy to follow, at least much more so than QEMU. Wish I'd known about this overview doc before then...
I would love to have a 3DFX->GL plugin. Also, some optional JIT, but that breaks Bochs' philosophy. If it were x86-focused, it would work like DOSEmu: fast, but without falling into the virtualisation trap.
As a curiosity, I tried NT4 under Bochs and Unreal Engine based games worked well... albeit at 3-4 FPS. That was emulated on an AMD Turion 64 from ~2005-6.
Bochs emulates a Voodoo card, and Voodoo 2/3 support is WIP, so in the near future it may be an easier-to-set-up alternative to PCE/Box86.
A Ryzen will probably be able to push Deus Ex under Bochs with no issues at all.
>"In order to reduce the complexity a bit, all the decoding of the operand fields is done first and then the instruction is executed by one of many hundred small methods that don't have to care(much) about their operands."
I'm confused by this: the operands are decoded first, and then the opcode is decoded? Or am I reading this incorrectly?
I don't know about 2012, but in general, if used correctly, C++ can often be faster than C due to widespread inlining and the use of templates. (On the other hand, if one writes C++ as if it were Java, the code will be faster than Java's but can often be optimized further.)
> due to the widespread inlining and the use of templates
On the other hand, this often causes a huge increase in code size, which means less code can fit in the instruction cache, and - even worse - when the system is on the verge of swapping, can make performance drop even faster. It's unfortunate that microbenchmarks often tend to favour such globally-inefficient code that's only slightly faster in isolation but bogs down the system as a whole.
If you write C++ like it's Java, it will be slower, mostly for two reasons:
- A C++ compiler cannot do speculative devirtualization, which hurts if you split everything into small virtual functions.
- If you are churning through lots of small allocations and aren't maxing all CPU cores, a garbage collector running in parallel is nearly free - unlike the bookkeeping your C++ deallocations have to go through.
An AOT compiler (almost all C++ compilers are AOT) can't do speculative devirtualization based on dynamic runtime feedback (PGO is one-shot). That's what the OP meant.
If it's about performance, that is usually not the case. In a few cases one can get a performance improvement from move semantics; most other new features are about safety and convenience.