Remus: possible high availability of apps through replicating virtual machines (usenix.org)
17 points by ktom on Feb 2, 2009 | 4 comments



Brendan Cully's master's thesis has more detail: http://www.cs.ubc.ca/grads/resources/thesis/Nov07/Cully_Bren...

This was also interesting: "we believe that the high-frequency checkpointing mechanism we have engineered in support of Remus will have many other interesting applications, ranging from forensics and error recovery tools based on replayable history to software engineering applications such as concurrency-aware time-travelling debuggers."


The problem with these systems is that they don't know which state is significant, so they have to copy everything to the slave.

Remus gets round this by bulk copying (up to 40 times a second) rather than copying on every change, so the master runs slightly ahead of the slave.
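To make the batching idea concrete, here is a toy sketch in Python. It is not Remus code: the `Primary`, `Backup`, and `run` names are made up for illustration, "memory" is just a dict of pages, and each inner list of writes stands in for one checkpoint interval (about 25 ms at 40 checkpoints a second). The point it shows is that repeated writes to the same page within an interval cost one copy, not one copy per write.

```python
class Primary:
    """Toy 'VM' whose memory is a dict of page number -> bytes (hypothetical)."""

    def __init__(self):
        self.pages = {}
        self.dirty = set()

    def write(self, page, data):
        self.pages[page] = data
        self.dirty.add(page)  # remember the page, but do not send it yet

    def checkpoint(self):
        """Return the pages changed since the last checkpoint and reset tracking."""
        delta = {page: self.pages[page] for page in self.dirty}
        self.dirty.clear()
        return delta


class Backup:
    """Toy replica that only ever sees whole checkpoints."""

    def __init__(self):
        self.pages = {}
        self.epochs_applied = 0

    def apply(self, delta):
        self.pages.update(delta)
        self.epochs_applied += 1


def run(writes_per_epoch, primary, backup):
    """Replay the writes one interval at a time; return how many pages were sent."""
    pages_sent = 0
    for epoch in writes_per_epoch:
        for page, data in epoch:
            primary.write(page, data)
        delta = primary.checkpoint()  # bulk copy once per interval
        pages_sent += len(delta)
        backup.apply(delta)
    return pages_sent


primary, backup = Primary(), Backup()
epochs = [
    [(0, b"a"), (0, b"b"), (1, b"x")],  # interval 1: pages 0 and 1 dirtied
    [(0, b"c"), (0, b"d"), (0, b"e")],  # interval 2: only page 0 dirtied
]
pages_sent = run(epochs, primary, backup)
total_writes = sum(len(epoch) for epoch in epochs)
print(f"{total_writes} writes, {pages_sent} pages copied")
```

Any write made after the last checkpoint exists only on the primary until the next one, which is the "runs slightly ahead" part.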

Terracotta is something similar for the JVM. I think they get round it by exploiting the fact that the JVM knows what's going on, so for example you could say you want only this field on a class to be replicated. (But I've never used Terracotta, so someone might have to correct me on that.)
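As a rough illustration of field-level replication in general (this is not Terracotta's API, and it is Python rather than Java), the sketch below forwards only writes to attributes that a class explicitly lists. `Replicated`, `replicated_fields`, `Session`, and the list used as a `sink` are all hypothetical names.

```python
class Replicated:
    """Base class: only attributes named in `replicated_fields` are forwarded."""

    replicated_fields = ()

    def __init__(self, sink):
        # Bypass our own __setattr__ so the sink itself is never replicated.
        object.__setattr__(self, "_sink", sink)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name in self.replicated_fields:
            # Stand-in for sending the change to a replica.
            self._sink.append((name, value))


class Session(Replicated):
    replicated_fields = ("user_id",)  # significant state

    def __init__(self, sink):
        super().__init__(sink)
        self.user_id = None     # replicated
        self.render_cache = {}  # local only, never forwarded


sink = []
session = Session(sink)
session.user_id = 42
session.render_cache = {"home": "<html>"}
print(sink)
```

Only the two `user_id` assignments reach the sink; the cache stays local, which is the kind of saving a whole-VM copy can't make because it can't tell the two apart.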


I imagine the performance hit is pretty substantial, but for things like VoIP or messaging servers this will make real HA possible on commodity hardware. Pretty cool.


Someone should combine this with openMosix!



