ANKITA PATIDAR
HARI KRISHNA VETSA
JASMEET KAUR
JIACHANG GE
KAMALPREET KAUR
KUSHAL REDDY
MADHURI N MURTHY
ABSTRACT
Malicious hackers launch thousands of distributed denial of service (DDoS) and web application
attacks each day. Application attacks can steal valuable data, and the damage can be
irreparable.
Among the network security threats that the internet faces today, denial of service (DoS) attacks
are among the most severe. It is essential to have a secure channel for communicating data between
nodes. Since there is no clear solution, this problem has troubled security architects for
over a decade. We propose a solution that mitigates DoS by bombarding the bad node
itself with large packets (ping of death).
We have implemented a dual-core, quad-threaded processor with DDoS detection and
prevention. Our processor is based on a RISC architecture with a 5-stage pipeline. Nodes are
classified as authorized or unauthorized. During the initialization phase, the IP and
MAC table is populated with the new IP addresses and the corresponding MAC addresses. When
packets arrive, the hardware accelerator (the IP and MAC table) is checked to
see whether they come from an authorized client or an IP-spoofed client. Packets from
authorized nodes are passed on to the output queues; otherwise the packet is dropped or
re-routed to a dump node. If a packet comes from an unauthorized node, i.e. its IP is
not part of the IP-MAC table (hardware accelerator), it is dropped. In addition, even if
the IP address is in the table, the MAC address is also checked; if the IP and MAC addresses
do not match, the packet is dropped, since it is considered to be IP spoofed.
Figure 1: The various DDoS attacks across the globe on March 26th, 2016. (Source: Digital-attack-map)
Given the rising number of volumetric DDoS attacks, we concentrate on creating a
defense mechanism that identifies these attacks and protects the server from being overloaded
and eventually crashing.
1.1.2 IP Spoofing
Another problem that our hardware addresses is IP spoofing. This happens when
a bad client spoofs the IP address of another client (possibly an authorized client's IP
address) and sends bad packets to congest the processor. To identify this, we implement
a defense mechanism. For any packet received from either an authorized or an unauthorized
node, the MAC address and IP address are extracted and compared with the predefined data in
the LUT (lookup table) in the processor. In case of any mismatch, IP
spoofing is detected and the node is denied service.
As shown above, the incoming packets are stored in a convertible FIFO; the IP and MAC
addresses are extracted from each packet and compared with the MAC and IP LUT. If the
source IP address is not found in the IP-MAC table, the packet is dropped. If the IP and
MAC addresses mismatch, the source IP is spoofed and the packet is a potential attack;
therefore, we re-route such a packet to a dump node.
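The lookup decision described above can be sketched in software as follows. This is only an illustrative model, not the hardware implementation: the table entries, the function name `classify`, and the three result strings are assumptions made for the example.

```python
# Hypothetical table contents; in the report the table is filled during initialization.
IP_MAC_TABLE = {
    "10.0.0.1": "00:11:22:33:44:55",
    "10.0.0.2": "66:77:88:99:aa:bb",
}


def classify(src_ip, src_mac, table=IP_MAC_TABLE):
    """Return 'forward', 'drop' or 'dump' for a packet's source IP and MAC."""
    expected_mac = table.get(src_ip)
    if expected_mac is None:
        # Source IP is not in the IP-MAC table: unauthorized node.
        return "drop"
    if expected_mac.lower() != src_mac.lower():
        # IP is known but the MAC differs: treated as IP spoofing.
        return "dump"
    # IP and MAC both match the stored pair: authorized node.
    return "forward"
```

For example, `classify("10.0.0.1", "de:ad:be:ef:00:01")` returns `"dump"`, because the IP is in the table but the MAC does not match the stored one.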
The whole implementation of the system is described in the subsequent sections.
3.
In the following section, we describe the hardware system top-down: we first describe
what happens at the top level and then explain each module in detail.
a. NetFPGA
NetFPGA (Network Field Programmable Gate Array) is the major component of our hardware
system. It is a line-rate, flexible, open networking platform to develop open source hardware and
software for rapid prototyping of computer network devices. This allows users to develop designs
that are able to process packets at line-rate, a capability generally not afforded by software based
approaches.
The NetFPGA 1G specifications have been tabulated below:
In normal communication, the server is able to serve the packets from a client and the CPU
utilization of the server is low. However, when an attacker, along with an army of bots, sends
attack messages to the server, the server is unable to service the genuine messages from the
client and crashes under the load of the attack messages. The CPU utilization is evidence of
this, as it grows very high when many SYN packets are sent. Unlike a normal three-way
handshake TCP communication, which uses SYN, SYN-ACK and ACK, the attacker and the bots
do not wait for the SYN acknowledgement; they simply keep sending SYN packets.
Eventually, the server is overloaded with a large number of unserved packets, the input
buffer fills up, and genuine packets are dropped.
4. IMPLEMENTATION DETAILS
We program our NetFPGA as a dual-core, quad-threaded processor. Each core is a 5-stage
pipeline consisting of the IF-ID-EX-MEM-WB stages. The five-stage pipeline is explained in the
next section. The hardware router identifies malicious packets by checking whether the
source IP of each packet matches one of the IPs in the IP lookup table. A malicious packet is
dropped at the router itself, so it never reaches our server. This
keeps the server's CPU utilization low, so that it is able to serve the genuine packets.
In the absence of the hardware solution, the server is bombarded with malicious packets, which
reduces the server throughput.
4.1.2 IP Spoofing
When other nodes spoof the source IP so that a packet appears to arrive from one of the trusted IPs,
we can still detect that it is a malicious packet based on its MAC address, since
attackers generally spoof only the IP of the packet and the MAC address remains that of the
attacking client. Such a packet is also dropped. Because our system is placed before the
destination node, this reduces the load on the server.
4.2.1 DropFIFO
DropFIFO is a convertible dual-port data memory/FIFO. It has 512 locations, each 64 bits wide.
Initially, as a packet arrives, it is stored in the DropFIFO. When the FIFO is full with the packet
contents, the FIFO sends a done signal to the processor, after which the processor starts
modifying the packet.
After the packet is decoded by the pipeline and verified by the hardware accelerator, a match
signal is sent to the processor, signaling to the DropFIFO whether to drop the packet or send it to
the output queue. The way the match signal is generated is described in the hardware
accelerator section.
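The buffering behaviour described for the DropFIFO can be modelled in a few lines of software. The sketch below is a behavioural toy, not the hardware design: the class name, the method names and the way `done` is cleared are assumptions; only the 512 x 64-bit geometry and the done/match signals come from the text.

```python
from collections import deque


class DropFifo:
    """Toy model of a packet buffer with 512 locations of 64 bits each."""

    DEPTH = 512
    WORD_MASK = (1 << 64) - 1

    def __init__(self):
        self.words = deque()
        self.done = False  # raised once a whole packet has been buffered

    def write_packet(self, words):
        """Store one packet (a list of integer words) and raise done."""
        if len(words) > self.DEPTH:
            raise ValueError("packet does not fit in the FIFO")
        self.words = deque(w & self.WORD_MASK for w in words)
        self.done = True

    def release(self, match):
        """Forward the buffered words if match is True, otherwise drop them."""
        out = list(self.words) if match else []
        self.words.clear()
        self.done = False
        return out
```

Calling `release(True)` stands in for sending the packet to the output queue, and `release(False)` stands in for dropping it.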
4.2.2 Processor
A dual-core, quad-threaded processor is implemented for the hardware system. The following figure
shows the block diagram of the processor. Each core serves one packet at a time, and packets are
sent to the cores through the De-Mux when it receives a signal from the cores.
While sending packets to the output queue, the Mux selects the packets from one of the cores, based
on the signals the cores raise when processing is completed.
The Opcode is taken as the input into the control unit and based on the instruction to be executed
a set of control signals is generated by the Control Unit. These control signals are consumed by
other modules as and when necessary for the successful execution of the instruction.
4.3 Decryption
The packet received at the NetFPGA is decrypted using the symmetric key for the
corresponding nodes. The symmetric key is obtained beforehand through an RSA key exchange, which
is explained in a following section.
The packet is first decrypted using the XOR logic employed in the hardware (symmetric key
decryption). Once the packet is received at the destination node, it is decrypted
using the exchanged symmetric key.
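As a rough illustration of XOR-based symmetric decryption, the sketch below applies a key word by word. The 64-bit word and key width are assumptions (chosen to match the 64-bit DropFIFO locations), and the function name `xor_words` is hypothetical. Because XOR is its own inverse, the same function both encrypts and decrypts.

```python
WORD_MASK = (1 << 64) - 1  # assumed 64-bit data path


def xor_words(words, key):
    """XOR every 64-bit word of a packet with the symmetric key."""
    return [(w ^ key) & WORD_MASK for w in words]
```

Applying `xor_words` twice with the same key returns the original words, which is why a single XOR stage suffices on the receiving side.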
4.4 Encryption
We load the first part of the instruction memory with the instructions needed to
encrypt the packet that is going to be sent to the server node. Once the checksum is
calculated and the packet is encrypted by XNORing it with the symmetric key, we modify the encrypted
packet header with the new checksum.
Bloom Filter
A Bloom filter is a space-efficient probabilistic data structure that is used to test whether
an element is a member of a set. False positive matches are possible, but false negatives are not,
thus a Bloom filter has a 100% recall rate. In other words, a query returns either "possibly in set"
or "definitely not in set". Elements can be added to the set, but not removed. The more elements
that are added to the set, the larger the probability of false positives.
An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash
functions defined, each of which maps or hashes some set element to one of the m array
positions with a uniform random distribution. Typically, k is a constant, much smaller than m,
which is proportional to the number of elements to be added; the precise choice of k and the
constant of proportionality of m are determined by the intended false positive rate of the filter.
Here we chose our k as 3 and m as 128.
To add an element, feed it to each of the 3 hash functions to get 3 array positions. Set the bits at
all these positions to 1, as shown below.
To query for an element (test whether it is in the set), feed it to each of the 3 hash functions to
get 3 array positions. If any of the bits at these positions is 0, the element is definitely not in the
set; if it were, all the bits would have been set to 1 when it was inserted. If all are 1, then
either the element is in the set, or the bits have by chance been set to 1 during the insertion of
other elements, resulting in a false positive.
Here we have a false positive rate of 3 % which is fairly acceptable.
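The quoted rate can be sanity-checked with the standard approximation for a Bloom filter's false positive probability, (1 - e^(-kn/m))^k. The report does not say how many IP/MAC pairs are stored, so the value n = 16 used in the note below is purely an assumption, picked because it gives roughly 3% with m = 128 and k = 3.

```python
import math


def false_positive_rate(m, k, n):
    """Approximate false positive probability for m bits, k hashes, n stored elements."""
    return (1.0 - math.exp(-k * n / m)) ** k
```

Under this approximation, `false_positive_rate(128, 3, 16)` is about 0.03, and the rate grows as more elements are stored.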
There are two stages in the Bloom filter:
1. Initialization
2. Query
4.6.1 Initialization
In the initialization phase we ping all the nodes in the network and store the IP and
MAC addresses of the corresponding nodes in the Bloom filter.
We concatenate the IP and MAC addresses of each node and apply all 3 hash
functions to the concatenated value. This gives 3 array positions, and the corresponding array
elements are set to 1.
4.6.2 Query
When a new packet comes in, we extract the IP and MAC addresses from the received packet
and send them to our Bloom filter.
The Bloom filter concatenates the received IP and MAC addresses and applies all three hash
functions to the concatenated value. This again gives 3 positions, which we use to access
the array elements and check whether they are set to 1.
If all the accessed array elements are 1, we treat the packet as coming from a genuine
node and send it to the encryption module, after which it is sent on to the destination node.
If any of the accessed array elements is not 1, the received packet is from an attacker, so we
drop the packet.
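The initialization and query stages can be sketched as a small software Bloom filter with m = 128 bits and k = 3 hash functions. The report does not specify the hash functions, so the salted SHA-256 construction here is an assumption made only to obtain three independent-looking positions; the class and method names are likewise hypothetical.

```python
import hashlib

M_BITS = 128   # size of the bit array (m)
K_HASHES = 3   # number of hash functions (k)


def positions(ip, mac):
    """Map the concatenated IP and MAC to K_HASHES positions in [0, M_BITS)."""
    key = (ip + mac.lower()).encode()
    result = []
    for salt in range(K_HASHES):
        digest = hashlib.sha256(bytes([salt]) + key).digest()
        result.append(int.from_bytes(digest[:4], "big") % M_BITS)
    return result


class BloomFilter:
    def __init__(self):
        self.bits = 0  # integer used as a 128-bit array, all zeros

    def add(self, ip, mac):
        """Initialization stage: set the bits for one IP/MAC pair."""
        for pos in positions(ip, mac):
            self.bits |= 1 << pos

    def query(self, ip, mac):
        """Query stage: True means 'possibly in set', False means 'definitely not'."""
        return all((self.bits >> pos) & 1 for pos in positions(ip, mac))
```

A pair that was added always queries as `True`; a pair that was never added usually queries as `False`, but may occasionally be a false positive.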
4.7.1 C compiler
The C compiler is required to convert the C code we write into machine
code. The compiler is implemented in two stages:
C to assembly conversion: We use the GNU C cross compiler to convert C code to MIPS assembly
code. This outputs a .s file, which can be converted to our instruction set's assembly code (which
is slightly different from the MIPS code). We have written a Perl script to convert the cross
compiler's assembly output to our ISA's assembly code. This creates another .s assembly file.
Assembly to instruction code conversion: After the assembly code is created, we convert it
to binary format (using a Perl script) so that the binary code can be fed into the
instruction memory. The conversion is based on the opcode and the registers to
be written and read.
Opcode     Rs         Rt         Rd         Imm field
[31:28]    [27:23]    [22:18]    [17:13]    [12:0]
Since 15 instructions are supported by the ISA, a 4-bit opcode field is used to specify the
instruction. There are 32 registers, so the Rs, Rt and Rd fields are 5 bits wide each.
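Packing these fields into a 32-bit instruction word can be sketched as below. The layout (opcode in [31:28], Rs in [27:23], Rt in [22:18], Rd in [17:13], immediate in [12:0]) follows the bit ranges given in the report; the function names are hypothetical and the numeric opcode used in the note is made up, since the actual opcode assignments are not listed.

```python
def encode(opcode, rs, rt, rd, imm):
    """Pack the five instruction fields into one 32-bit word."""
    if not 0 <= opcode < 16:
        raise ValueError("opcode must fit in 4 bits")
    for reg in (rs, rt, rd):
        if not 0 <= reg < 32:
            raise ValueError("register index must fit in 5 bits")
    if not 0 <= imm < (1 << 13):
        raise ValueError("immediate must fit in 13 bits")
    return (opcode << 28) | (rs << 23) | (rt << 18) | (rd << 13) | imm


def decode(word):
    """Unpack a 32-bit word into (opcode, rs, rt, rd, imm)."""
    return (
        (word >> 28) & 0xF,
        (word >> 23) & 0x1F,
        (word >> 18) & 0x1F,
        (word >> 13) & 0x1F,
        word & 0x1FFF,
    )
```

With a made-up opcode of 1, `encode(1, 1, 2, 3, 5)` yields `0x10886005`, and `decode` recovers the original fields.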
The following are the 15 instructions supported by our ISA. All the instructions are described
below:
1. Load Word
LW R1,0001 //Load contents of location 0001 into register R1
ADD
Description
Adds two source registers and stores the result in the destination register
Operation
Rd <- Rs+Rt
Syntax
ADD Rs,Rt,Rd
6. LEFT_SHIFT R1 R2 R1 //R1=R1<<R2
The MAKE_UP instructions above are used in the checksum calculations. They were created to
extract the 16-bit substrings out of a 64-bit data word. MAKE_UP instruction 1 extracts the first
16 bits, MAKE_UP instruction 2 extracts the next 16 bits [31:16], and so on. The other
instructions behave as in a normal MIPS architecture: load, store, add, subtract,
xnor, xor, etc.
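The field extraction performed by the MAKE_UP instructions can be mimicked in software as shown below. Two things here are assumptions: that "the first 16 bits" means bits [15:0] (inferred from the next field being [31:16]), and that the checksum is a one's-complement sum of 16-bit fields in the style of the Internet checksum; the report does not state the checksum algorithm. The function names are hypothetical.

```python
def make_up(word, n):
    """Return the n-th 16-bit field of a 64-bit word; n = 1 is bits [15:0]."""
    if n not in (1, 2, 3, 4):
        raise ValueError("n must be 1, 2, 3 or 4")
    return (word >> (16 * (n - 1))) & 0xFFFF


def checksum16(words):
    """One's-complement sum over all 16-bit fields of the given 64-bit words."""
    total = 0
    for word in words:
        for n in range(1, 5):
            total += make_up(word, n)
    # Fold any carries above bit 15 back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For the word `0x0001000200030004`, the four fields are 4, 3, 2 and 1, and `checksum16` returns `0xFFF5`.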
4.8 RSA
RSA is used to exchange the symmetric key, which is used to encrypt all packets for secure
communication between any two nodes. The RSA key setup is explained next.
Step 1: Generating the public and private keys at each node
Each user generates a public/private key pair by selecting two large primes $p$ and $q$
and computing the system modulus $N = p \cdot q$. Next we compute $\phi(N) = (p-1)(q-1)$.
Now we select a random encryption key $e$, where $1 < e < \phi(N)$ and $\gcd(e, \phi(N)) = 1$.
Solve the following equation to find the decryption key $d$:
$e \cdot d \equiv 1 \pmod{\phi(N)}$ and $0 \le d \le N$
Publish the public encryption key: $KU = \{e, N\}$
Keep secret the private decryption key: $KR = \{d, p, q\}$
In this way, the symmetric key is exchanged between any two nodes such that no
attacker can obtain the key.
RSA example: At the node n0, the RSA key is generated and message (symmetric key) is
encrypted using the generated key.
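1. Select primes: $p = 17$ and $q = 11$
2. Compute $n = pq = 17 \times 11 = 187$
3. Compute $\phi(n) = (p-1)(q-1) = 16 \times 10 = 160$
4. Select $e$ such that $\gcd(e, 160) = 1$; choose $e = 7$
5. Determine $d$: $de \equiv 1 \pmod{160}$ and $d < 160$. The value is $d = 23$, since $23 \times 7 = 161 = 1 \times 160 + 1$
6. Publish the public key: $KU = \{7, 187\}$
7. Keep secret the private key: $KR = \{23, 17, 11\}$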
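The arithmetic of the RSA example can be reproduced with a few lines of code. The values p = 17, q = 11 and e = 7 are read off the products in the example (17 x 11 = 187, 16 x 10 = 160, 23 x 7 = 161); the message value 88 mentioned in the note is an arbitrary stand-in for a symmetric key and is not taken from the report. These numbers are far too small to be secure and serve only to show the steps.

```python
from math import gcd

p, q = 17, 11
n = p * q                   # system modulus: 187
phi = (p - 1) * (q - 1)     # 160
e = 7                       # public exponent, must be coprime with phi
if gcd(e, phi) != 1:
    raise ValueError("e must be coprime with phi(n)")
d = pow(e, -1, phi)         # modular inverse of e (Python 3.8+): 23


def rsa_encrypt(message):
    """Encrypt an integer message < n with the public key {e, n}."""
    return pow(message, e, n)


def rsa_decrypt(cipher):
    """Decrypt with the private exponent d."""
    return pow(cipher, d, n)
```

For instance, `rsa_encrypt(88)` gives 11 and `rsa_decrypt(11)` gives back 88.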
5. Benchmarking
To compare the implemented hardware against the available software solutions, we use three
parameters: CPU utilization, latency, and data handling (throughput).
We used an open source DDoS attack generator and detection tool developed by UCLA students
and compared it against our hardware. The source code link can be found here:
https://github.com/kenzshi/DDoSProject
In the source code mentioned above, the topology consists of four main nodes: server (receiving
traffic), client (sending good traffic), master (sending malicious packets) and bots (slaves that
work for the master and help in sending malicious packets). When the attack is generated, the server
is flooded with malicious packets such that it is unable to serve the good packets. The DDoS attack
is generated in the DeterLab topology as described in section 3.2, and also shown below. The
attack is generated from the master node, and the bot node helps it generate DDoS attack traffic.
After the implementation of the software mitigation, the tcpdump output is monitored at the server
node, and the following parameters are extracted and compared with the parameters obtained with
the hardware. The software also analyzes the tcpdump output to check whether any packet is a
possible DDoS attack.
Advantages of the hardware solution over the software
solution:
[Figure: CPU Utilization (%) for DDoS Attack, S/w mitigation and H/w mitigation; data labels: 22%, 17.14%, 3.3%]
5.3 Latency
Latency is measured as RTT (round-trip time). It is the elapsed time between the end
of an inquiry or demand on a computer system and the beginning of a response. This is calculated
using the ping RTT of the server.
The following snapshot shows the ping time of the software mitigation on the left and the hardware
mitigation on the right.
[Figure: Latency (RTT) for S/W mitigation and H/w mitigation]
The parameters above indicate that the hardware solution performs better than the software
solution.
5.4 Throughput
Throughput is the rate at which a network sends or receives data. It is a good measure of the
channel capacity of a communication link and of connections to the internet. When an attack is
launched, the number of clients that complete the three-way handshake drops and throughput falls
below its maximum, because both legitimate and attack traffic are received at the server. We
observed the throughput at different attack rates. The throughput was measured at
window size (256Kbps).
Attacking traffic (packets per second)    Hardware throughput (Mbps)    Software throughput (Mbps)
2000                                      867.6                         805
6000                                      433.12                        367.3
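To put the throughput figures in perspective, the relative gain of the hardware solution over the software solution can be computed directly from the two measured rows. The helper name `relative_gain` and the `ROWS` layout are hypothetical; the numbers are the ones reported above.

```python
# (attack rate in packets per second, hardware Mbps, software Mbps)
ROWS = [
    (2000, 867.6, 805.0),
    (6000, 433.12, 367.3),
]


def relative_gain(hardware, software):
    """Percentage by which the hardware throughput exceeds the software throughput."""
    return (hardware - software) / software * 100.0


GAINS = {rate: relative_gain(hw, sw) for rate, hw, sw in ROWS}
```

With these inputs the gain is roughly 7.8% at 2000 packets per second and roughly 17.9% at 6000 packets per second, so the advantage of the hardware path widens as the attack rate increases.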
Team Members:
Ankita Patidar: apatidar@usc.edu
Hari Krishna Vetsa: vetsa@usc.edu
Jasmeet Kaur: jasmeetk@usc.edu
Jiachang Ge: jiachang@usc.edu
Kamalpreet Kaur: kamalprk@usc.edu
Kushal Reddy: chennare@usc.edu
Madhuri Murthy: mnmurthy@usc.edu