I did a bunch of hacking on Bochs for a little virtualization toy project last year and found the source pretty easy to follow, at least much more so than QEMU. Wish I'd known about this overview doc before then...
I would love to have a 3DFX->GL plugin. Also, some optional JIT, but that breaks Bochs' philosophy. If it were x86-focused, it would work like DOSEmu: fast, but without falling into the virtualisation trap.
As a curiosity, I tried NT4 under Bochs and Unreal Engine based games worked well... albeit at 3-4 FPS. That was emulated on an AMD Turion 64 from ~2005-6.
Bochs emulates a Voodoo card, and Voodoo 2/3 support is WIP, so in the near future it may be an easier-to-set-up alternative to PCE/Box86.
A Ryzen will probably be able to push Deus Ex under Bochs with no issues at all.
>"In order to reduce the complexity a bit, all the decoding of the operand fields is done first and then the instruction is executed by one of many hundred small methods that don't have to care(much) about their operands."
I'm confused by this: the operands are decoded first, and then the opcode is decoded? Or am I reading this incorrectly?
I don't know about 2012, but in general, if used correctly, C++ can often be faster than C due to widespread inlining and the use of templates. (On the other hand, if one writes C++ as if it were Java, the code will be faster than Java's but can often be optimized further.)
> due to the widespread inlining and the use of templates
On the other hand, this often causes a huge increase in code size, which means less code can fit in the instruction cache, and - even worse - when the system is on the verge of swapping, can make performance drop even faster. It's unfortunate that microbenchmarks often tend to favour such globally-inefficient code that's only slightly faster in isolation but bogs down the system as a whole.
If you write C++ like it's Java, it will be slower, mostly for two reasons:
- A C++ compiler cannot do speculative devirtualization, which hurts if you split everything into small virtual functions.
- If you are churning through lots of small allocations and aren't maxing all CPU cores, a garbage collector running in parallel is nearly free - unlike the bookkeeping your C++ deallocations have to go through.
An AOT compiler (almost all C++ compilers are AOT) can't do speculative devirtualization based on dynamic runtime feedback (PGO is one-shot). That's what the OP meant.
If it's about performance, that is usually not the case. In a few cases one can get a performance improvement from move semantics; most other new features are about safety and convenience.