Resilient Execution with Bounded-Time Recovery (REBOUND)


Overview

Recently there has been an increasing number of attacks on critical infrastructure systems, such as factory control networks, medical devices, and even nuclear power plants. Since these systems directly interact with the physical world, a successful attack can have serious consequences, including physical damage and even loss of life. Thus, it is important that we find a way to provide better security.

However, these systems have two key characteristics that make them difficult to secure. First, they tend to include many embedded devices, which have limited resources and cannot easily support defenses that require substantial amounts of redundancy. And second, in contrast to "classical" distributed systems, timing is critical: if a correct action (say, stopping the injection pump in a medical device) is taken too late, or at the wrong time, the results can be just as damaging as those of an incorrect action, or no action at all. In combination, these two characteristics rule out most existing defenses: many are too heavyweight or can handle only benign faults, and – to our knowledge – there is no general defense at all that can provide hard timing guarantees when nodes are compromised by an attacker.

The goal of the proposed project is to develop a completely new way to build systems that are resilient to attacks. We do not attempt to mask all symptoms of an attack, which many existing defenses do at great cost. Instead, we exploit the fact that many systems cannot change their state arbitrarily quickly – due to properties such as inertia or thermal capacity – and can thus already tolerate brief disruptions, as long as the system quickly returns to a correct state. Thus we aim to guarantee that 1) the system will meet its timing requirements in the absence of an attack, and that 2) when under attack, the system will return to a correct state within a bounded amount of time.

We call this approach bounded-time recovery (BTR). Compared to classical fault tolerance, BTR has a number of potential advantages, including a substantially lower cost, the ability to handle more severe attacks, and a way to provide graceful degradation under attack. We aim for a solution that can deliver provable guarantees in the "Byzantine" threat model, without a-priori knowledge of what the attacks will look like, or which nodes will be attacked.

Publications

Fault Tolerance and the Five-Second Rule
Ang Chen, Hanjun Xiao, Andreas Haeberlen, and Linh Thi Xuan Phan
15th Workshop on Hot Topics in Operating Systems (HotOS XV), Kartause Ittingen, Switzerland, May 2015.

Contributors

Faculty: Linh Thi Xuan Phan, Andreas Haeberlen
Students: Brian Sandler, Neeraj Gandhi

Funding

This work is funded by the National Science Foundation under the Secure and Trustworthy Cyberspace program (grant number CNS-1750158).


Web site contact: Andreas Haeberlen