
Systems Design of Cybersecurity in Embedded Systems

M. Vai, D. Whelihan
MIT Lincoln Laboratory
244 Wood Street
Lexington, MA 02420
POC: mvai@ll.mit.edu

N. Evancich, K.J. Kwak, J. Li
Intelligent Automation, Inc.
15400 Calhoun Drive
Rockville, MD 20855

M. Britton, J. Foley, M. Lynch
Alion Science and Technology
220 Salina Meadows Parkway
Syracuse, NY 13212

D. Schafer, J. DeMatteis
AFRL/RIGA
525 Brooks Road
Rome, NY 13441

Abstract: Mission critical embedded systems should be capable of performing intended functions with resiliency against cyberattacks. The methodology of design-for-cybersecurity is now widely recognized, in which the effects of cybersecurity, or lack thereof, on system objectives must be determined. However, developers are often challenged by the difficulty of analyzing a system-under-design without complete specifics. In this paper, we describe a systems design approach, which incrementally models the cybersecurity architecture, components, and interfaces of an embedded system for analysis and demonstration. We have applied this approach to analyze the mission resiliency of an avionic computer being developed and demonstrate its operations in a scenario when the system is under attack.

Keywords: security; resiliency; metrics; embedded system; systems design; embedded processor; secure processor; separation kernel; key management; cryptography; modeling and simulation; rapid prototyping.

This work was sponsored by the Department of the Air Force under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Department of the Air Force. Approved for Public Release; Distribution Unlimited: 88ABW-2016-2047 20160421

978-1-5090-3525-0/16/$31.00 2016 IEEE

I. INTRODUCTION

The defense against cyberattacks is inherently asymmetric. Despite the use of best practice to protect a system, an attacker can potentially defeat the entire security scheme by exploiting a single system vulnerability. A mission critical embedded system must thus be resilient against successful attacks. In this paper we describe the use of a cybersecurity oriented systems design approach in the development phase of mission critical embedded systems. We have been applying this approach in the early development phase of a cyber resilient avionic mission computer. The overall project goal is to develop, prototype, and demonstrate a reference cyber resilient architecture, which is capable of detecting and isolating intrusions, restoring operations, and evolving to defuse future attacks.

We have adopted an incremental development process in this project, in which the system is designed, implemented, and tested by adding a few more features each time until the system is finished. Such a process is good for managing the risks in ambitious R&D (research and development) projects, but presents a challenge for the analysis and evaluation of cybersecurity. Cybersecurity depends not only on the choice of security primitives, but also on how they are assembled and coordinated. Therefore, performing cybersecurity analyses on a system-under-design with incomplete information, if not done properly, could be error-prone, misleading, and even counterproductive. We thus need to, in addition to subjecting the finished system to red-teaming assessment, incrementally analyze, articulate, and demonstrate the effectiveness of the cybersecurity architecture while the system is still being developed. The systems design approach to be described in this paper has been developed to fulfill this need.

This paper is organized as follows. Section II provides the definitions of security and resilience in our project. Section III presents a baseline resilient architecture for the mission computer, currently in its early development phase, and overviews its design principles. The rest of the paper is dedicated to explaining a systems design approach created to analyze and evaluate, at mission level, the cybersecurity of this architecture. The use of modeling and simulation to articulate and demonstrate its cybersecurity operations in a realistic scenario is also described.

II. CYBERSECURITY: SECURITY AND RESILIENCY

Numerous publications have discussed topics relevant to cyber resiliency, for example, cyber resiliency engineering framework [1] and cyber resiliency metrics [2]. These two documents have cited a substantial collection of references and provided a wealth of foundational concepts and information for our work. Perhaps the research lineage of resiliency could even be traced back to fault tolerance, as attacks could be considered as intentionally induced faults, which the system must be resilient against along with other faults (e.g., bugs, defects, etc.). However, there is an important difference between fault tolerance and cyber resiliency. Fault tolerance technologies generally assume faults are independent events. Cyberattacks, on the other hand, are likely to be coordinated. Cyber resiliency analysis thus requires the cautious use of probability and statistics, which are popular in fault tolerance research. For example, the use of conditional probability, such as Bayesian networks, has been proposed to mitigate this problem [3].

For the purpose of our research project, we use the term cybersecurity to cover both security and resiliency. Various definitions of security and resiliency exist. We define security as the capability of a system being protected or safe from attacks. Security design thus aims at preventing attacks so that a system can be entrusted to perform in support of successful missions. We envision, in a highly simplified way,
an approach to secure a system as follows. The security requirements of a system are determined by expected behaviors according to the CONOPS (concept of operations). The best practice, determined by acceptable cost and performance overheads associated with defensive features, is then applied to secure the system.

This approach is of course correct, but we need to understand a major caveat. As mentioned, the current cybersecurity landscape is actually in favor of attackers. An attacker can render any system defense useless by discovering a single, unexpected vulnerability. An example of this challenge in the form of security software is illustrated in Fig. 1 [4]. While the average number of lines of code in malware stays virtually unchanged, the size of security software has been growing exponentially.

Fig. 1. Asymmetric nature of security defense vs. threats [4].

As it is impossible to correctly predict every future attack, securing a system to prevent it from being attacked cannot guarantee mission assurance. Just being secure is no longer adequate; systems must also be resilient. Our project is vigorously pursuing answers to the following essential mission assurance questions: What happens if the attack is successful? Can, and how does, the system withstand attacks and complete mission goals, recover from a degraded state back to a nominal state, and evolve against further threats? We adopt the definition of resiliency as the ability for a secure system to withstand unpredicted and successful attacks, continue to provide necessary functions for mission success despite adversarial actions, and recover within an acceptable time, cost, and risks [5]. Our ultimate goal is to equalize the contest by designing our system so that the discovery and exploitation of a vulnerability is at least as difficult as our ability to detect or recover from the attack.

Three essential cybersecurity operations support mission assurance in the event of successful attacks. First, the system should maintain critical functions required for mission completion. This could be achieved by using, for example, redundancy and separation to minimize the impacts of a successful attack. Second, the system should dynamically restore resources and services to recover subsequent to a successful attack. Third, the system should dynamically evolve to alternative configurations to avoid repeated attacks. An example of using moving target techniques for evolving has been described in [6].

III. CYBER RESILIENT MISSION COMPUTER

The mission computer controls the operations of an unmanned aerial vehicle (UAV) and is a natural target for cyberattacks. Current protection schemes are insufficient for mission assurance given the dynamic nature of vulnerabilities and cyberattacks. As the objective of this paper is to explain the use of a systems design approach for incremental resiliency analysis, the pros and cons of various cybersecurity technologies are beyond the scope. Instead, we summarize the design principles of our mission computer architecture as follows. The design will minimize trusted units as they are also attack surfaces. Cryptography (crypto) and key management will be used to protect data (at-rest, in-transit, and in-use), and authenticate system configuration and parameters. Critical functions will be randomized and diversified to avoid break-one-break-all situations. In the event of a successful attack, the system will detect anomalies by monitoring system behavior, isolate and limit cascading failures, and restore rapidly by reconfiguring, replacing, and restarting. Fig. 2 shows a baseline of our mission computer architecture created according to these principles. Its goal is to withstand attacks, both expected and unpredicted, and achieve the mission objectives.

Fig. 2. Baseline resilient mission computer architecture.

The key cybersecurity features in this architecture are the use of a separation kernel and the incorporation of crypto and key management. The separation kernel creates a virtual distributed environment in which each process is executed in its own isolated partition [7] [8]. Information can only flow from one partition to another using channels established and controlled by the separation kernel. In the event of a successful attack, this facilitates the quarantine of infected processes and minimizes the damage. Another benefit is that it encourages a modular design, an essential system property that allows individual processes to be reloaded and restarted. Cryptography provides the foundation to ensure confidentiality and integrity for data-at-rest, data-in-transit, and data-in-use [9][10].

We will first describe the operational concept of this architecture, and then explain the need for a systems design approach for cybersecurity analysis and evaluation, which we will describe in the next section. Fig. 2 partitions the architecture into software and hardware layers. The hardware
layer includes processor cores, FPGA (field programmable gate array) fabrics, and associated memories, which provide processing power for application and system functions. In addition to a regular network interface, the architecture also incorporates an interface extension to a standard avionic data bus system, such as the 1553 bus commonly used in military aircraft [11]. A secure processor, such as the key-centric secure thread processor, in which each piece of data or code can be protected by encryption while at rest, in transit, and in use, establishes the trusted computing base for the entire system [12]. It handles secure boot and configuration authentication at startup time and provides a secure and isolated environment for the resiliency features. A crypto and key management coprocessor, such as the SCOP (security coprocessor), provides crypto services to the software layer [13]. The hardware layer is interfaced with the software layers by architecture and board support packages (e.g., protocol conversion drivers).

On top of the separation kernel, applications, such as auto pilot modules (APM) and intelligence/surveillance/reconnaissance (ISR) payloads, operate in their own partitions. The crypto service provides an application program interface (API) to the crypto and key management coprocessor in the hardware layer. The monitoring and recovering services, supported by a policy engine, are included in the architecture to provide essential resiliency functions for system availability.

The development plan for this architecture is ambitious and not without risk. No COTS (commercial-off-the-shelf) processor board can provide all the specified hardware properties. We thus need to develop a custom processor board as the mission computer hardware and provide support packages for the implementation of separation kernel, applications, and cybersecurity functions. We use an incremental modular development approach to manage risks in this multi-year project. The plan is to develop the mission computer in spirals and demonstrate its cybersecurity in phases. The dilemma is that a lot of the cybersecurity details in the monitoring and recovering approaches cannot be identified during early design phases when neither hardware nor software can be fully specified. In the next section, we will explain the use of a cybersecurity oriented systems design process to incrementally co-design the cybersecurity features along with the functionality development.

IV. SYSTEMS DESIGN FOR CYBERSECURITY

The concept of our systems design approach for cybersecurity is to focus on system behaviors when mission critical functions are lost, regardless of their causes (defects or attacks). This approach allows us to analyze and measure how well the system fares under successful attacks. Besides, this approach enables a systematic mitigation of vulnerabilities, for example, by providing redundancy.

Fig. 3 illustrates the systems design work flow for defining and evaluating mission level resiliency metrics. First we identify the mission objectives (MO1 to MOn). The potentials of achieving these mission objectives will be used as mission level resiliency metrics. It is impossible to predict all potential attacks, so we will use high level attack categories (e.g., loss of communications) instead of specific attacks (e.g., jamming) in the systems design process. Fig. 3 shows the identification step of attack categories (AC1 to ACk) that the mission will potentially encounter during its execution.

The relevancy of these attack categories to our mission objectives will be analyzed to develop a Risk Analysis Graph (RAG) with respect to the system architecture. As will be explained later in this section, the RAG hierarchically captures the dependency of mission objectives on system functions (SF1 to SFm) and subsystem functions (Sub1 to Subp). The RAG also connects tangible system level metrics (SM1 to SMm) to mission level metrics. Examples of system level metrics are system reboot time, data access time, etc.

The calculus of mission level resiliency metrics, their relationship with physically measurable system level metrics, and developing tools to support analyses and operations are not without challenges. Currently we rely on Subject Matter Expert (SME) assessments and system level metrics to estimate the likelihood for a mission objective to fail. The RAG is also a tool for strengthening the system architecture to reduce the likelihood of failing certain mission objectives. Our intention is to produce quasi-quantitative mission objective metrics for decision makers to perform cost-benefit analysis and compare multiple resiliency technologies.

Fig. 3. Systems design work flow for defining and evaluating mission level resiliency metrics.

The rest of this section explains the overall concept of mission level resiliency design and evaluation using a simple example of our mission computer design.

A. Mission Objective

We will use a simple mission objective of reaching and reporting from multiple waypoints to illustrate our systems design process. Fig. 4 illustrates this simplified mission environment. Mission success depends on the coordination of a drone (as a UAV surrogate), a pilot, and a ground control station (GCS). Fig. 4 also depicts an adversarial attacker, whose goal is to steer the drone away from its waypoints.

The pilot wirelessly provides flight command and control (C2) to the drone. The GCS receives and displays telemetry data sent by the drone. The mission computer aboard the drone is responsible for carrying out the mission objective of flying to
the waypoints. To avoid the difficulty of proving a negative, our quasi-quantitative mission level metric is defined as the likelihood of the drone missing waypoints (i.e., failing the mission objective).

Fig. 4. Mission objective: reaching multiple waypoints.

B. Attacks

Security designs count on accurate threat models as it is impractical to target all possible attacks. In our systems design methodology, we create and apply threat models with respect to attack categories. Raising threat models to a level higher is necessary as we need to consider resiliency to failures caused by attack categories, rather than individual attacks. For example, we will consider resiliency in a scenario when communication is lost, which could have been the result of many causes, including unpredictable ones.

Threat models are determined by mission objectives and CONOPS. Attacks could be directed at either the drone or the GCS and may come from many known and unknown channels, such as data links, insider threats, etc. In this simple example, we assume that attacks will be directed at the drone from a malicious control, which injects malicious commands into the drone Auto Pilot Module (APM) and steers it away from its intended waypoints.

The systems design approach uses the information assurance (IA) CIA triad (confidentiality, integrity, and availability) to guide the thought process of resiliency requirements. In our example, the confidentiality requirement is violated if waypoint information is accessed by the attacker. The integrity requirement is violated if waypoint information is modified in any way. The availability requirement is violated if positive flight control cannot be maintained to reach waypoints.

Table 1 shows a few example attack categories derived by considering the CIA triad and its effects. Each of the attack categories should be interpreted broadly and liberally. For example, in addition to literally modifying flight control by issuing fake commands (e.g., the land command), similar effects could also be achieved by GPS spoofing. In the case of mission plan exfiltration, even though a simple model may consider it a mere confidentiality violation, we treat it as a potential violation of availability and integrity as well. Targets with this intelligence could prepare to hide, or take actions that would confuse or cause the UAV to deliver incorrect information. The attack categories derived from this exercise, combined with an assessment of adversary capability, will facilitate an analysis to estimate the likelihood of functional failures (i.e., caused by attack categories).

Table 1. Example attack categories and CIA violations.

Attack Categories             Effects                                  CIA Violations
----------------------------  ---------------------------------------  ------------------------------------
Modify Mission Control        Fly drone to a different site;           Availability and integrity
                              Miss waypoints;
                              Steal or destroy UAV
Disrupt Command and Control   Make drone unavailable;                  Availability
                              Delay reaching waypoints
Exfiltrate Mission Plans      Warn targets to move or take actions;    Confidentiality; potentially also
                              Reduce waypoint significance             availability and integrity

C. System Operations

In this example, we consider a highly simplified resiliency scheme in which the system is equipped with both a main APM and a resilient APM as a backup. The system operates as follows: The radio receiver receives C2 from the pilot and/or the GCS. During normal operations, the main APM interprets flight instructions to control the drone motors. In the event that the main APM has been compromised by a successful attack, it begins to issue malicious flight instructions. The monitoring service detects the attack and directs the resilient APM to take over the propeller control. The recovering service then directs the main APM to reload (its code) and restart. When the main APM acknowledges that it has successfully rebooted, the recovering service directs the main APM to retake the control of the motors.

The details of monitoring (e.g., how attacks are detected), and recovering (e.g., how continuity is maintained when control is being switched) will be included for analysis incrementally as they are being developed. The resiliency modeling and simulation in the next section considers two parameters. The first one is the time latency of attack detection and the second one is the time required for reload and restart. The importance of these high level parameters does not change with the specifics of monitoring and recovering.

D. Modeling and Simulation

We use modeling and simulation as a tool for analysis. Fig. 5 shows a Simulink [14] model of the resilient architecture, which includes an attack module. In simulation, at random times determined by a user selectable attack probability model, the attack module generates and sends malicious flight instructions to the drone mission computer.
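The failover scheme of Section IV.C and the attack module described above can be approximated outside Simulink by a small discrete-time simulation. The following Python sketch is illustrative only and is not the Simulink model of Fig. 5. All names (`ResiliencyParams`, `simulate_mission`) are hypothetical, time is counted in abstract simulation steps, the attack model is assumed to be a fixed per-step probability, and `crash_deadline` (how many consecutive steps of malicious control the drone can survive) is an assumption introduced for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class ResiliencyParams:
    """Hypothetical parameters; time is counted in abstract simulation steps."""
    attack_prob: float        # assumed per-step probability of a successful attack
    detect_latency: int       # steps from compromise until the monitor reacts
    reboot_latency: int       # steps for the main APM to reload and restart
    crash_deadline: int       # assumed steps of malicious control the drone survives
    resiliency_on: bool = True


def simulate_mission(p: ResiliencyParams, steps: int, seed: int = 0) -> dict:
    """Run the main/resilient APM failover scheme for a fixed number of steps."""
    rng = random.Random(seed)
    main_state = "ok"         # "ok", "compromised", or "rebooting"
    since_attack = 0          # steps of malicious control in the current attack
    reboot_timer = 0
    attacks = 0
    malicious_steps = 0
    crashed = False

    for _ in range(steps):
        # Attack module: may compromise the main APM while it is in control.
        if main_state == "ok" and rng.random() < p.attack_prob:
            main_state = "compromised"
            attacks += 1
            since_attack = 0

        if main_state == "compromised":
            # The compromised main APM issues a malicious flight instruction.
            malicious_steps += 1
            since_attack += 1
            if since_attack >= p.crash_deadline:
                # The attack was not defeated before the drone crashes.
                crashed = True
                break
            if p.resiliency_on and since_attack >= p.detect_latency:
                # Monitor detects the attack: the resilient APM takes over and
                # the recovering service reloads and restarts the main APM.
                main_state = "rebooting"
                reboot_timer = 0
        elif main_state == "rebooting":
            # The resilient APM flies the drone while the main APM restarts.
            reboot_timer += 1
            if reboot_timer >= p.reboot_latency:
                main_state = "ok"   # main APM retakes control of the motors

    return {"crashed": crashed, "attacks": attacks,
            "malicious_steps": malicious_steps}


if __name__ == "__main__":
    base = dict(attack_prob=0.05, detect_latency=2,
                reboot_latency=3, crash_deadline=5)
    print(simulate_mission(ResiliencyParams(**base, resiliency_on=True), 1000))
    print(simulate_mission(ResiliencyParams(**base, resiliency_on=False), 1000))
```

In this sketch the drone survives an attack only when `detect_latency` is smaller than `crash_deadline`, which mirrors the requirement that an attack be detected and defeated before the drone crashes. Switching `resiliency_on` off corresponds to the model switch used to compare the architecture with and without resiliency.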


The resilient architecture in Fig. 5 consists only of the modules involved with the resiliency scheme. The parameters mentioned above, attack detection latency and reboot latency, are both user selectable. For performance comparison of the architecture with and without resiliency, the model provides a switch to turn on and off the resiliency feature.

The model is simulated and its output used to drive a drone flight simulator. A screen capture of the in-flight drone video feed in the flight simulator is shown in Fig. 6. The designer can visualize the effects on the drone with and without resiliency, as well as the performance of resiliency under various settings (latencies and attack probability). Besides providing a tool for security and resiliency analysis, this capability is useful for demonstration and articulation of the resilient architecture being designed. Modeling sophistication (e.g., hardware-in-the-loop) can be added incrementally as architecture specifics become available.

Fig. 5. Systems design model (screen capture).

Fig. 6. Simulated video feed of drone flying under the control of the simulator in Fig. 5.

E. System Analysis

In order to estimate mission assurance metrics, we analyze in Fig. 7 the dependency between the mission objective of reaching all waypoints and system functions. In Fig. 7, the mission goal of reaching all waypoints depends on positive flight control. The dependency of this system level function can be hierarchically derived to lower level sub-system functions. For example, positive flight control depends on the correct functions of APMs, communications, monitoring, etc. Currently, such essential function dependency diagrams are manually developed by analyzing the system being evaluated. Eventually we envision the development of an automatic tool to extract information from a system model to create these diagrams.

Fig. 7. Essential functions (partial list only) supporting mission goal.

Next we use the function dependency graph and relevant attack categories to develop a risk analysis graph (RAG), analyze a secure architecture, and estimate mission failure likelihood. This approach has been adopted from the NASA fault tree analysis approach [15]. RAG is a systematic deductive process to decompose mission failing events into component failures. We have a few objectives. The first is to calculate a quasi-quantitative score for the mission level resiliency metric. In the ongoing example, we have created a representative RAG for the failure of the waypoint reaching mission and calculated a score for the corresponding metric. Fig. 8 shows a portion of this RAG developed for the mission metric.

Fig. 8. Risk analysis graph (partial) for mission effective assessment. (* An attack must be detected and defeated before the drone crashes.)
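As a rough illustration of the deductive decomposition described above, a RAG can be sketched as a fault tree whose leaf probabilities are propagated upward through AND and OR gates. The Python sketch below is not the RAG of Fig. 8: the node names, the gate structure, and the leaf probabilities are hypothetical placeholders, and the gate formulas assume independent leaf events, an assumption that must be used with caution for coordinated cyberattacks as discussed in Section II.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Node:
    """One RAG node: a leaf with an assigned probability, or an AND/OR gate."""
    name: str
    gate: str = "LEAF"                 # "LEAF", "AND", or "OR"
    prob: float = 0.0                  # failure probability, used by leaves only
    children: list = field(default_factory=list)


def failure_probability(node: Node) -> float:
    """Propagate leaf probabilities to this node, assuming independent events."""
    if node.gate == "LEAF":
        return node.prob
    child_probs = [failure_probability(c) for c in node.children]
    if node.gate == "AND":
        # The event occurs only if every child event occurs.
        return prod(child_probs)
    if node.gate == "OR":
        # The event occurs if at least one child event occurs.
        return 1.0 - prod(1.0 - q for q in child_probs)
    raise ValueError(f"unknown gate type: {node.gate}")


# Hypothetical example tree; all probabilities are illustrative placeholders.
apm_attacked = Node("main APM successfully attacked", prob=1.0)
monitor_miss = Node("monitor fails to detect attack in time", prob=0.1)
flight_control_lost = Node("positive flight control lost", gate="AND",
                           children=[apm_attacked, monitor_miss])
comms_lost = Node("communications lost", prob=0.0)
miss_waypoint = Node("waypoint missed", gate="OR",
                     children=[flight_control_lost, comms_lost])

if __name__ == "__main__":
    print(f"{miss_waypoint.name}: {failure_probability(miss_waypoint):.3f}")
```

Because a leaf is simply a node with an assigned probability, a designer could stop the decomposition at whatever level an SME is able to score, and refine a leaf into a gate later as architecture specifics become available.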


In order to develop a full RAG, we need to analyze the drone embedded system architecture with the assistance of system engineers (i.e., SMEs) who understand mission operations. The concept, shown in Fig. 8, is that for failing to reach a waypoint (top level), one or more of the events depicted at one level below must have occurred. We can then hierarchically develop the entire RAG until we reach the leaf nodes. The designer can make a node a leaf node if she considers it appropriate for the developer or a SME to assign a probability of its occurrence. Note that in the RAG, each leaf node should represent a measure at lower software and/or hardware levels. In order for the drone to survive, an attack must be detected and the switch over to the resilient APM completed in time (i.e., before the drone crashes). Coming down from the top in Fig. 8, the waypoint reaching mission fails when one of the following functions fails: flight control, propeller, bus, or radio. The failure of flight control indicates that the main APM must have been successfully attacked and the resilient APM has failed to take over. Going down the graph, this is the consequence of the monitor failing to detect the attack in time.

We now explain how we estimate, based on the RAG in Fig. 8, the scoring of a mission level metric. As the RAG is incomplete, we will focus on the likelihood estimation for missing waypoints caused by losing flight control. We will assume a worst case scenario in which the probability of a successful main APM attack is 1. Without resiliency, it is apparent that the likelihood of mission failure is 1. In this case, simulation shows that the drone either crashes or becomes uncontrollable. With resiliency switched on, we can analyze whether the system can recover from successful attacks. Assuming that a SME has estimated that the monitoring service has a 0.1 probability of failing to detect the attack in time, we can propagate this to the top of the RAG and conclude that the likelihood of missing waypoints is 0.1.

With a complete RAG, the designer can continue to estimate probabilities of other system essential function failures and eventually, the probability of failing a mission objective.

V. SUMMARY AND ONGOING WORK

In this paper we have described a systems design approach that we have been working on for the cybersecurity analysis of embedded systems. Our goal is to provide development tools for analyzing and demonstrating the cybersecurity of a system-under-design without complete specifics.

While we have demonstrated that the systems design approach can produce scores to determine the contribution of resiliency, we need to be mindful of the quasi-quantitative nature of such scores. We recommend using these scores for relative comparisons only. We will continue to develop and improve the current scoring system.

As mentioned, we acknowledge the restriction of using probability and statistics in cybersecurity analysis and are looking into the use of Bayesian network functions for mitigations [3]. Further research, experiments, and validation will be performed to refine this systems design methodology, determine appropriate system level metrics, estimate probabilities, provide the information to system control, and support decision making.

REFERENCES

[1] D. Bodeau and R. Graubart, "Cyber Resiliency Engineering Framework," MITRE, September 2011. (https://www.mitre.org/sites/default/files/pdf/11_4436.pdf, accessed March 7, 2016.)
[2] D. Bodeau, R. Graubart, L. LaPadula, P. Kertzner, A. Rosenthal, and J. Brennan, "Cyber Resiliency Metrics, Version 1.0, Rev. 1," MITRE, April 2012. (https://register.mitre.org/sr/12_2226.pdf, accessed March 7, 2016.)
[3] P. Xie, J. Li, X. Ou, P. Liu, and R. Levy, "Using Bayesian Networks for Cyber Security Analysis," IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2010, pp. 211-220.
[4] P. Zatko, "If you don't like the game, hack the playbook," DARPA Cyber Colloquium, November 2011. (http://www.politico.com/static/PPM223_111206_darpapresentation.html, accessed March 7, 2016.)
[5] Y. Y. Haimes, Risk Modeling, Assessment, and Management, Wiley, 2009.
[6] H. Okhravi, T. Hobson, D. Bigelow, and W. Streilein, "Finding Focus in the Blur of Moving Target Techniques," IEEE Security & Privacy, March/April Issue, 2014.
[7] Secure Microkernel Project (seL4), http://ssrg.nicta.com.au/projects/seL4/, accessed March 7, 2016.
[8] A Time & Space Partitioned DO-178 Level A Certifiable RTOS, http://www.ddci.com/products_deos.php, accessed March 7, 2016.
[9] R. Khazan, R. Figueiredo, C. McLain, and R. Cunningham, "Securing Communication of Dynamic Groups in Dynamic Network-Centric Environments," MILCOM 2006, Washington, DC, 23 October 2006.
[10] D. Utin, R. Khazan, J. Kramer, M. Vai, and D. Whelihan, "SHAMROCK: Self Contained Cryptography and Key Management Processor," ACM Conference on Computer and Communications Security (CCS), 2013.
[11] What is MIL-STD-1553? http://www.milstd1553.com/, accessed March 7, 2016.
[12] D. Whelihan, K. Thurmer, and M. Vai, "A Key-centric Processor Architecture for Secure Computing," IEEE International Symposium on Hardware Oriented Security and Trust (HOST), 2016.
[13] M. Vai, B. Nahill, J. Kramer, M. Geis, D. Utin, D. Whelihan, and R. Khazan, "Secure Architecture for Embedded Systems," IEEE High Performance Extreme Computing Conference (HPEC), 2015.
[14] Simulink - Simulation and Model-Based Design, http://www.mathworks.com/products/simulink/index.html?s_tid=gn_loc_drop, accessed March 27, 2016.
[15] B. Vesely, "Fault Tree Analysis (FTA): Concepts and Applications," NASA HQ, 2007. (www.hq.nasa.gov/office/codeq/risk/docs/ftacourse.pdf, accessed March 31, 2016.)
