Fault Tolerant Linux

FT-Linux pioneers full-software-stack fault tolerance on a single SMP machine, based on the Linux software ecosystem. As a proof of concept, it demonstrates that monolithic operating systems such as Linux can be made fault resilient.

FT-Linux enables Linux to tolerate transient and permanent hardware failures. It is built on top of Popcorn Linux, the replicated-kernel version of Linux, and aims to provide fault tolerance along the whole software stack, from the OS kernel to applications. FT-Linux targets SMP machines that have multiple core/processor domains and multiple memory controllers (and eventually I/O controllers), on the assumption that each of them may fail independently. Two main categories of faults are considered: a) permanent failures, such as fail-stop errors, and b) transient errors, such as bit flips in main memory.


You can find more information about FT-Linux in the following papers and Virginia Tech thesis:

Source code and documentation are available online on GitHub.

Contact: Binoy Ravindran, Virginia Tech: binoy@vt.edu


This is an open-source project of the Systems Software Research Group at Virginia Tech.

This work is supported in part by AFOSR (grants FA9550-14-1-0163 and FA9550-16-1-0371) and ONR (grants N00014-13-1-0317 and N00014-16-1-2711). Any opinions, findings, and conclusions or recommendations expressed on this site are those of the author(s) and do not necessarily reflect the views of AFOSR and ONR.