Currently, we have no elegant way to achieve what you want.

When a failure occurs, repair is carried out by workers that log when they launch, when they repair chunks, and when they exit. We also have `garage status` and `garage stats`. The first command lists healthy and unhealthy nodes; the second displays the queue lengths of our tables and chunks. If any of these values is greater than zero, the cluster is being repaired. We are documenting failure recovery here: https://garagehq.deuxfleurs.fr/documentation/cookbook/recove...
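As a rough illustration of that queue-length rule (this is not something Garage ships): the sketch below assumes you have already extracted the queue lengths from the `garage stats` output by some means of your own. The function name and the dictionary keys are hypothetical and do not reflect Garage's actual output format.

```python
def pending_queues(queue_lengths):
    """Return the names of queues that still hold items, sorted by name.

    `queue_lengths` maps a queue name to its current length. The names are
    made up for this example; how you obtain the numbers is up to you.
    """
    return sorted(name for name, length in queue_lengths.items() if length > 0)


def is_repairing(queue_lengths):
    """Apply the rule above: any queue length greater than zero means repair is ongoing."""
    return len(pending_queues(queue_lengths)) > 0


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    sample = {"table_queue": 0, "chunk_queue": 12}
    print(is_repairing(sample), pending_queues(sample))
```

Calling this periodically and waiting until `is_repairing` returns false is one crude way to tell when a repair has finished, until proper metrics are available.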

In the near future, we plan to integrate OpenTelemetry, but we are still discussing the design and which information we want to track and report. These discussions are taking place in our issue tracker: https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/111 https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/207

If you have some knowledge/experience on this subject, feel free to share it in these issues.



