P2P sharing

Detecting P2P data leakage
Attilla de Groot attilla.degroot@os3.nl

Jarno van de Moosdijk jarno.vandemoosdijk@os3.nl
Willem Toorop willem.toorop@os3.nl
Stefan Roelofs stefan.roelofs@os3.nl
December 23, 2008
Abstract
This report gives an overview of current P2P (peer-to-peer) networks
and clients, and presents an analysis of where the threats of data
leakage on these networks lie. It also outlines a proof of concept for
listing locally shared files on the Gnutella and eDonkey networks. This
is done through passive deep packet inspection and by actively querying
clients.
Contents
1 Introduction
2 Peer to Peer Networks
  2.1 Bittorrent
    2.1.1 Centralized Bittorrent
    2.1.2 Decentralized Bittorrent
    2.1.3 Bittorrent and data leakage
  2.2 Edonkey
  2.3 Gnutella
3 Peer to Peer Clients
  3.1 Bearshare and Imesh
  3.2 LimeWire and FrostWire
  3.3 Shareaza
  3.4 eMule, aMule, xMule
4 Finding sensitive data
5 Bugs, viruses & rootkits
  5.1 Bugs
  5.2 Viruses & rootkits
  5.3 Future
6 Monitoring
  6.1 Existing Software
  6.2 Active and Passive monitoring
    6.2.1 eDonkey
      6.2.1.1 A hybrid network
      6.2.1.2 Offering files
      6.2.1.3 eDonkey in action
      6.2.1.4 What do you share?
      6.2.1.5 The gory details
      6.2.1.6 Actually sniffing
    6.2.2 Gnutella
      6.2.2.1 Pure P2P opens up a protocol
      6.2.2.2 Detection of Gnutella clients
      6.2.2.3 The Browse Host Extension
7 Conclusion
  7.1 Further research
1 Introduction
This report is written for the course Security of Systems & Networks (SSN)
for the System & Network Engineering study at the University of Ams-
terdam. After Ian Cook’s presentation (P2P Risks) at the SURFcert IBO in
October 2008 we investigated possible ways to detect sensitive information
sharing on peer-to-peer (P2P) networks. During four weeks of research, we
tried to supply an answer to the following research question:
"What is the best solution to detect (malware influenced) P2P users on the local
network that unknowingly share sensitive data?"
In this report, you will find information on the general working of P2P
networks and their clients. We also have a quick look at some viruses,
bugs and rootkits that may influence the standard behaviour of clients.
After this, the monitoring of shared files on a local network is outlined.
This is investigated through the use of existing software, passive deep
packet inspection and the active querying of clients.
During this research we experienced that the security community is
not very helpful in sharing information about this subject. Questions to Ian
Cook and Arjen Landgraaf, speakers at the SURFcert IBO, were dismissed
and requests to try software for academic purposes were ignored. Also,
so-called "P2P Forensic" discussion groups on the Internet were not willing to
let us contribute to their discussion. Nevertheless, we would like to thank
Jaap van Ginkel for his ideas and general opinion about the subject.
2 Peer to Peer Networks

To detect peer to peer data leakage we have to know how the protocols
work according to their specifications. In this section we outline the
general behaviour of P2P (peer to peer) networks. Since this P2P data
leakage project had to be carried out within a timeframe of four weeks,
we decided to focus on the most popular protocols.
At this moment dozens of P2P protocols are available and in daily use.
Most protocols use random TCP or UDP ports for communication. Without
deep packet analysis at an internet exchange or a large ISP it is not
possible to get reliable statistics on protocol usage. However, if we
look at client usage we can assume that the networks these clients
support are also the most popular ones. This is why we have chosen to
analyse the Bittorrent, eDonkey and Gnutella protocols.
2.1 Bittorrent
The bittorrent network has grown very popular over the last years. This
is largely due to the fact that every client concurrently uploads chunks of
data while downloading data from other peer to peer clients. This provides
a way to distribute files without huge bandwidth demands. The protocol
is also used for distributing software and even updates for games such as
World of Warcraft.
The bittorrent protocol can be used in two different ways to distribute data.
• Centralized with tracker system as described in the original protocol
specification[1]
• Decentralized by using a distributed hash table[2].
2.1.1 Centralized Bittorrent

To share a file through a certain tracker a user has to create a torrent file
with metadata about the files he wants to share. The generated file contains
a hash of the file(s) to be shared and the URL of the centralized tracker that
will be used. The tracker keeps track of the clients that are sharing the
data blocks that are specified in the torrent file, as shown in figure 1.
The torrent file is normally distributed through a website. When this
torrent file is downloaded, the client can do a lookup for other nodes at the
tracker. With this information clients can exchange blocks of data until
they have received the complete file.
Figure 1: Centralized bittorrent structure
2.1.2 Decentralized Bittorrent

With a decentralized structure every node in the network effectively
becomes a tracker. Every node in the network has a unique id chosen from a
160-bit space. File ids and node ids are generated from the same keyspace.
A node is responsible for the file ids close to its node id (basic chord
system as in figure 2). The ids are compared with a distance metric
to determine which nodes are close in the network. From the overlay
network perspective, a node has more information about nodes that are
close to it than about nodes that are far away.
When a node wants to download a file, it contacts a node that it knows is
close to the file id. It does this by an iterative search over the nodes. It
will get a reply from the node responsible for the file, containing a list of
peers that are sharing the requested file.
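The idea of "closeness" between ids can be illustrated with a small sketch. The text above does not name the distance metric; the example below assumes the XOR metric used by Kademlia-style hash tables, and the function names are hypothetical.

```python
# Minimal sketch of id "closeness" in a distributed hash table.
# Assumption: distance is the XOR of two ids (Kademlia-style); the
# report itself only speaks of "a distance metric".

def xor_distance(id_a: int, id_b: int) -> int:
    """Distance between two ids taken from the same keyspace."""
    return id_a ^ id_b


def closest_nodes(target_id: int, node_ids: list, count: int = 3) -> list:
    """Return the `count` known node ids closest to `target_id`."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target_id))[:count]
```

An iterative lookup would repeatedly ask the closest known nodes for nodes that are closer still, until the node responsible for the file id is reached.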
Figure 2: Chord system
2.1.3 Bittorrent and data leakage

When using the bittorrent network to share a file, a user has to manually
generate a file containing the needed metadata. Because of this, it is highly
unlikely that users unknowingly share data. Since we are focusing our
research on unknowingly sharing sensitive data, we have decided not to
include bittorrent in our research.
2.2 Edonkey
The eDonkey system uses centralized servers to distribute data. A client
shares a number of directories and files. These files are hashed with an
MD4 hash function so that they are uniquely identifiable in the network.
A list of these files is transferred to the server (see figure 3). The server
keeps track of all files that the nodes share. This might be an option to
detect data leakage and is discussed further in section 6.
Figure 3: Edonkey filelist
To download a file from the edonkey network a url[4] containing the file-
name and the hash is used. The client queries the server for a list of nodes
that share a specific file (see figure 4). The client is then able to contact other
nodes to download the blocks.
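As an illustration of such a URL, the sketch below parses a link in the widely used `ed2k://|file|name|size|hash|/` layout. That layout (including the size field) is an assumption here, since the text only states that the URL carries the filename and the hash; the function name and sample link are hypothetical.

```python
from urllib.parse import unquote


def parse_ed2k_link(link: str) -> dict:
    """Split an ed2k file link into name, size and MD4 hash.

    Assumed layout: ed2k://|file|<name>|<size>|<md4 hex>|/
    """
    prefix = "ed2k://|file|"
    if not link.startswith(prefix):
        raise ValueError("not an ed2k file link")
    fields = link[len(prefix):].split("|")
    if len(fields) < 3:
        raise ValueError("incomplete ed2k file link")
    name, size, md4_hex = fields[0], fields[1], fields[2]
    if len(md4_hex) != 32:
        raise ValueError("MD4 hash must be 32 hex digits")
    bytes.fromhex(md4_hex)  # raises ValueError on non-hex characters
    return {"name": unquote(name), "size": int(size), "md4": md4_hex.lower()}
```

The hash, not the filename, is what identifies the content, so two links with different names but the same hash refer to the same file.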
From a network perspective there are no measures that prevent data
leakage. Sharing of files is left entirely to the clients. A user is not
required to manually create a metadata file, as is the case with bittorrent.
Figure 4: Edonkey search and download scheme
2.3 Gnutella
There are no central servers within the Gnutella network, because Gnutella
is a fully decentralised peer to peer network. However, some nodes on
high-bandwidth connections are designated as ultra-peers (see figure 5).
As with eDonkey, a client shares files. Since there are no centralized servers,
no filelist is transferred over the network. Instead, when a client searches
for a file it sends a search query to multiple other nodes. These nodes will
in turn contact other nodes to relay the search query. When a node receives
a query it will directly contact the sender with a response to the query. If a
client responds to a search query, the searching node can download the file
with a direct HTTP connection. A file is never transferred over the Gnutella
network itself.
Search queries and HTTP sessions contain filenames. With deep packet
inspection it is possible to capture these filenames, but this means that
every Gnutella packet has to be inspected, which would be very inefficient.
However, the queries give the searching client the information needed to
contact a file providing client.
Pong (0x01)[5]
Pong messages contain information about a Gnutella host. The
message has the following fields:

Bytes   Description
0-1     Port number.
2-5     IP Address.
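A minimal sketch of reading these two fields from a Pong payload is given below. The byte order is an assumption not stated in the excerpt above: the port is read as little-endian and the IP address as four octets in the usual dotted order. The function name is hypothetical.

```python
import socket
import struct


def parse_pong_prefix(payload: bytes) -> tuple:
    """Return (ip, port) from the first six bytes of a Pong payload."""
    if len(payload) < 6:
        raise ValueError("Pong payload too short")
    # Bytes 0-1: port number (assumed little-endian).
    (port,) = struct.unpack_from("<H", payload, 0)
    # Bytes 2-5: IPv4 address (assumed to be in dotted-quad octet order).
    ip = socket.inet_ntoa(payload[2:6])
    return ip, port
```

The resulting address and port are what a monitoring tool would use to contact the client directly.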
Figure 5: Ultrapeer and leafnodes
When looking at the clients, it appears that they have a function that can
request file listings of other clients in the network. This function sends an
HTTP GET request to the other client. It uses the IP address and port number
supplied by the Pong packet.
3 Peer to Peer Clients

Data is shared through client applications. Most current applications
contain features that should keep you from sharing sensitive data.
We tested six popular P2P applications that support one or more of the
networks mentioned before. We selected the clients based on several articles
on the popularity of P2P clients [14], [17], [15], and the download statistics
of week 47 on download.com.
3.1 Bearshare and Imesh

The clients Bearshare and Imesh look very similar. They only run on Win-
dows systems. Both applications support the Gnutella network. You will
not be able to share any sensitive company data with them. It is only pos-
sible to share audio and video files. The clients don’t contain any options
that make it possible to share documents or other file types.
3.2 LimeWire and FrostWire

LimeWire and FrostWire are both Java clients. FrostWire is a fork of LimeWire.
Both applications support the BitTorrent and the Gnutella network. You are
able to share audio, video, images, archives and file types such as doc, pdf,
html, dvi and txt with them. Since version 4.0, both applications have a
new option called "Do not share sensitive file types". This option
is enabled by default. It grays out the option of sharing .doc, .pdf and .rtf
files, as can be seen in figure 6. You are able to edit which extensions are
shared.
Figure 6: Do not share sensitive file types
3.3 Shareaza
With the Windows-only application Shareaza, you are able to share on and
download from the eDonkey, Gnutella and BitTorrent networks. Audio,
video, images, archives, pdf and txt files are shared by default. Shareaza
has an extension block list (see figure 7) of files that will never be shared.
This list contains file types like pst. Pst stands for Personal Storage Table
and is used by Microsoft Outlook to store all your e-mails, agenda items and
contacts. You are able to edit this extension list yourself.
Figure 7: Shareaza shared extensions
3.4 eMule, aMule, xMule

eMule is a very popular client that supports the eDonkey network. There
are many eMule clones on the Internet, like aMule and xMule. This is because
eMule only runs on Windows systems. The interface and default settings
of the mentioned clients are all the same. This is a very dangerous family
of clients because there are no restrictions on which file types are
shared. None of the clients contain settings that give you the ability to block
the sharing of a specific file extension. Because of this, there is a high
potential for sharing sensitive data by accident.
The folder where completed downloads are stored is shared automatically.
If people select their "My Documents" folder to store completed downloads,
they automatically share all their documents. People often don't
realize this.
4 Finding sensitive data

During our tests we tried to find sensitive data on the Gnutella and the
eDonkey network. We didn't succeed in finding any sensitive data on the
Gnutella network; the eDonkey network, however, was full of it. We were
able to download recent copies of full mailboxes (see figure 8) from this
network. They contained all of a person's e-mail and agenda items up to a few
days back. It was fairly easy to retrieve passwords for various services from
these mailboxes. This is because most websites send a confirmation e-mail
containing the username and password of the individual registering on them.
Some websites send e-mails containing deep links which give you the
ability to automatically log in to the website. This can be abused easily.
Figure 8: eMule search results
Sharing your mailbox backup by accident isn't the only way of losing
sensitive data. A lot of people share their whole "My Documents" folder by
accident, as mentioned before. Because of this we were able to download full
address books, various contracts and password administrations of several
people (see figure 9).
Figure 9: Shareaza search results
5 Bugs, viruses & rootkits

In this section we will discuss bugs and viruses related to P2P clients be-
cause these may influence the standard behaviour of clients.
5.1 Bugs
In 2005 two bugs were found in LimeWire [6] which made it possible to
execute malicious requests on the affected clients. The first bug, in LimeWire
releases 4.1.2 through 4.5.6, made it possible to send an HTTP "GET" request
for any file in any location accessible by the remote user. The client then
sent the requested file over the network to the requesting party.
The second bug, in LimeWire releases 3.9.6 through 4.6.0, made the same
thing possible through MAGNET requests. Both bugs affected
Microsoft Windows and Linux operating systems and are fixed in newer
releases of LimeWire.
Since LimeWire lacks an automatic update mechanism, users of these old
versions are still susceptible to these bugs and, for the purposes of our data
leakage research, must be protected. As described in the monitoring section
(section 6) of this report, it is possible to do deep packet inspection on
LimeWire packets. In these packets, LimeWire advertises its version number.
Based on the affected version numbers it is then possible to alert system
administrators or users to upgrade their client(s). Of course this is also
possible by other means, such as software policies when a client computer is
under one's own administrative control.
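The version check described above could look roughly like the following sketch. The affected version ranges are taken from this section; the assumption that the version is advertised as a string of the form `LimeWire/x.y.z` (for example in a User-Agent header) and the function names are illustrative only.

```python
import re

# Affected release ranges as listed in this section (inclusive).
VULNERABLE_RANGES = [
    ((4, 1, 2), (4, 5, 6)),  # HTTP GET bug
    ((3, 9, 6), (4, 6, 0)),  # MAGNET request bug
]


def limewire_version(advertised: str):
    """Extract (major, minor, patch) from an assumed 'LimeWire/x.y.z' string."""
    match = re.search(r"LimeWire/(\d+)\.(\d+)\.(\d+)", advertised)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_vulnerable(advertised: str) -> bool:
    """True if the advertised LimeWire version falls in an affected range."""
    version = limewire_version(advertised)
    if version is None:
        return False
    return any(low <= version <= high for low, high in VULNERABLE_RANGES)
```

A monitor that sees such a string in sniffed traffic could log the client address together with the result of `is_vulnerable` and notify an administrator.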
5.2 Viruses & rootkits

The following are a couple of examples of viruses and rootkits concerning P2P
software; a complete listing of all historical P2P rootkits and viruses
goes beyond the scope of this report.
In 2006, alternate copies of the BitTorrent client [7] were downloaded from
IRC servers [8] under control of the lockx.exe rootkit. These clients used
BitTorrent to push Disney movies onto the computers of "victims".
We believe that this could be a warning that hackers may use BitTorrent in
the near future to distribute malicious code to victims' computers. In this
specific case it may also bring copyright infringement claims from
copyright protection groups.
In 2005, W32/Tibick-E was released [9]. This Microsoft Windows
worm spreads itself through P2P systems by editing several registry
entries to share its main executable. It also changes the name of its main
executable to that of several well-known programs.
5.3 Future
Looking at these rootkits and worms, we can predict a future case where a
worm comes in by mimicking popular files on P2P clients. If a user
unknowingly downloads and executes an infected file, it could be possible
for the worm to edit registry entries or configuration files of P2P clients.
During this research we took a closer look at the registry entries and
configuration files of P2P clients; by editing these it is possible to share
more folders and extensions than by default. Since most hacker groups gain
nothing from sharing somebody's files on public networks, it is likely that
they will use a private BitTorrent tracker to share interesting files. This
supports the need for a mechanism to watch which files are
shared by clients on the local network, since the default behaviour of the
clients can be easily influenced.
6 Monitoring
In this section we discuss different monitoring strategies to list the
files shared on a local network. Our approaches to monitoring are the
use of existing software, passive monitoring with deep packet inspection
(eDonkey) and active monitoring through the querying of clients (Gnutella).
In this section, a basic network layout is assumed as a starting point. In
figure 10 you see local peer to peer clients, an internet gateway with a
listening server which sees all internet traffic, and several nodes on the
internet. The function of these internet nodes will be explained during this
section.
Figure 10: basic network layout.
6.1 Existing Software

Existing software on the market for monitoring P2P traffic includes
"Forensic P2P" by Spear Forensics [10] and an in-house developed "P2PMon"
service [12] from e-secure-it [11], an IT security firm of Arjen Landgraaf.
Unfortunately we could not test either piece of software ourselves, because
a commercial license for Forensic P2P costs $350 and Spear Forensics was not
willing to share it for academic purposes. Arjen Landgraaf was also not
willing to help us test his software.
However, based on a review published by 8-bits [13] and the Spear
Forensics website [10] we conclude that Forensic P2P is nothing more
than a customised P2P client for the Gnutella network. The application has
no special features which can be used to list or scan (a range of) nodes.
The e-secure-it solution is a "Peer2Peer Monitoring and Surveillance
Service" [12] solely based on the Gnutella network. As can be seen in
figure 11, all files shared on the Gnutella network are harvested with the
open-source Gnutella client Phex and saved in a large database. Through
a web based system it is possible to search for, or automatically generate
reports on, certain keywords. This service is meant as an early warning
system for companies to detect whether internal or customer information is
publicly available on the Gnutella network. According to Arjen Landgraaf the
service is very labour intensive because the system gives many false
positives on keywords. When we asked why e-secure-it only harvests the
Gnutella network, he would not give us an answer. This question especially
bothered us because we find the eDonkey clients more susceptible to data
leakage. The e-secure-it system has no built-in capabilities to scan (a range
of) nodes for their active shared file list and therefore could not be of any
help to us.
Figure 11: e-secure-it P2PMon.
6.2 Active and Passive monitoring

To deal with data leakage from P2P clients in a corporate network, one
might try to block P2P protocols completely. However, this is not possible
with reasonable measures. The network ports used on P2P networks are
not fixed numbers, and are often chosen randomly at client installation. Even
with a firewall that blocks everything except for some well known network
services, it is still possible for a P2P client to use, for example, TCP port
443 (https) or another port using an encrypted channel. On such channels
the firewall has no means to detect whether the communication is in fact
https or something else.
Only by restricting everything to extremes (e.g. only allowing traffic to
certain sites and only through content validating proxies) could one possibly
rule out P2P clients altogether.
Restricting so rigorously would also put a very heavy load on
the restricting equipment. For large organizations this might simply be
impossible or not worth the price. Rumour has it that the Hogeschool van
Amsterdam tried to block known default P2P ports, but immediately
allowed them again because their routers went haywire.
It might just be better to allow P2P clients, while trying to detect what files
are going out. Or even better, to try to detect what files are offered by the
clients.
In this section we look at the possibilities for deep packet inspection and
for actively querying clients that offer P2P files on a corporate
network. To gain this insight we have to look at certain details of the
specific P2P networks. Two P2P networks will be looked at: the eDonkey
network and the Gnutella network. We will discover that these need two
different approaches to inventory the shared files.
1. Shared files on the eDonkey network can be discovered by sniffing
   the outgoing connections.
2. Just sniffing on the Gnutella network is not enough. We will have to
   discover the clients by sniffing and then query them for their shared
   files.
We will call those two methods passive and active detection respectively.
Throughout this document the peers that are on the corporate network will
be referred to as clients and the peers on the internet as peers.
6.2.1 eDonkey
The eDonkey network was originally created by the MetaMachine Corporation.
On September 28 MetaMachine went out of business, after they lost
a case against the RIAA (Recording Industry Association of America).
However, independently from MetaMachine a closed-source but free eDonkey
server, eserver, and a GPL-licensed client, eMule, were still actively
developed. The eDonkey network survived and is still very lively today.
As a consequence, there is no official eDonkey maintainer or specification.
It is as the Wikipedia entry on the eDonkey network states:
    The eD2k protocol is not formally documented (especially in
    its current extended state), and it can be said that in practice the
    eD2k protocol is what eMule and eserver do together when running,
    and also how eMule clients communicate among themselves.
    As eMule is open source, its code is freely available
    for peer-review of the workings of the protocol (at the program
    source code level).
We have not investigated the source code of eMule, but we did find some
excellent references [19][18] on how the protocol operates. However, the
documents are somewhat outdated, and we did have to reverse engineer
certain aspects of the protocol.
Also, the following paragraphs are not a complete description of the eDonkey
protocol. We only describe the parts that are relevant to our research.
Figure 12: Initial connection (the client says "Hi!", the server asks "Are
you reachable?", the client answers "Sure!", and the server replies "Here is
your High ID.")
6.2.1.1 A hybrid network The eDonkey network is a so-called hybrid
network. This means that hosts participating in the network have one of two
distinct roles: client or server. The client role is typically the role of the
P2P client software which people use to share and search for files. The
servers are the ones that inventory the files offered by the clients and refer
clients searching for a specific file to the clients offering this file. The
server software is typically run by people who have the eDonkey network at
heart and have good internet connectivity at hand.
In figure 12 we can see a client connecting to a server. The server then tries
to connect back to the client. If this succeeds, this means that the client
is accessible from the internet and is not behind a connection-tracking or
masquerading packet filter (firewall) and can in fact accept incoming con-
nections. This is good for any P2P network, because all other peers (filtered
or not) can connect to this peer directly to fetch a file. We will elaborate on
why this is good a bit more later. For now it will suffice to state that the
unfiltered client is rewarded by the server with a High ID. A High ID gives
the client unrestricted access to the eDonkey network.
A High ID is the client's IP address in big-endian order (IP address A.B.C.D
becomes the bytes D C B A). This means that a connectable client
always gets the same High ID, as long as its IP address does not change, no
matter what server it is connected to. This makes High ID clients a stable
factor in the eDonkey network. It heightens the chance that a client that
knows a file is on a certain peer with a High ID can download or continue
to download it at a later moment. A Low ID is always a number lower than
16777216 (0x1000000) and varies per server.
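The relation between an IP address and a client ID can be sketched as follows. The sketch assumes the commonly documented formula ID = A + 2^8 B + 2^16 C + 2^24 D for the address A.B.C.D, which matches the byte order described above; the function names are hypothetical.

```python
import ipaddress

LOW_ID_LIMIT = 16777216  # 2**24; IDs below this value are Low IDs


def is_low_id(client_id: int) -> bool:
    """True if the client ID marks a client that cannot accept connections."""
    return client_id < LOW_ID_LIMIT


def high_id_from_ip(ip: str) -> int:
    """Compute the High ID for A.B.C.D (assumed: A + B*2^8 + C*2^16 + D*2^24)."""
    a, b, c, d = ipaddress.IPv4Address(ip).packed
    return a | (b << 8) | (c << 16) | (d << 24)


def ip_from_high_id(client_id: int) -> str:
    """Recover the dotted IP address from a High ID."""
    return str(ipaddress.IPv4Address(client_id.to_bytes(4, "little")))
```

For 1.2.3.4 this gives 0x04030201: written most significant byte first, the ID reads D C B A.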
This initial connection is a TCP connection and stays open throughout the time
the client is on the eDonkey network. This allows the server to contact a client
even if it does not allow incoming connections (and thus has a Low ID)
and through the server this initiative is available to all other peers. In the
figures this lifeline is drawn by a thick red line.
Figure 13: Startup (the client tells its server "I have these files", the
server answers "I know about these other servers", and the client sends
status requests to the other servers)
6.2.1.2 Offering files Immediately after the initial connection, the client
sends the server a list of the files it has to offer. This is a great
opportunity to inventory the files offered by an eDonkey P2P client on
a corporate network, and we will investigate exactly how this can be done
later. For now it suffices to say that this list consists of the filenames and
the file hashes (MD4), which uniquely identify the content of the files. The
file hashes enable peers to find more peers that offer a certain file, even if
it has a different filename.
The server replies with a list of other servers it knows about. The client
completes (if necessary) its own list of servers. The client does not connect
to these other servers with a TCP session (as with the initial connection),
but it does check whether they are alive by regularly sending status requests
over UDP (see figure 13). If a server fails to answer more than a certain
number of those status requests, the server is considered dead and removed
from the client's server list.
6.2.1.3 eDonkey in action The client is now ready to seek and fetch files.
It queries its server, and the server replies with the available sources that
match the search criteria. The user selects the sources to download, and
the client then asks the server which peers can offer these sources.
Figure 14: Search for files (the client tells the server "I'm looking for
these files" and the server answers "You can find them here"; peers A, B
and C and two servers are also shown)
This transaction is shortened to just one question and one reply in figure 14.
In this figure the server tells the client that the sources are available at
peers A, B and C. The diode symbols are used to indicate that peers A and B
are unable to receive incoming connections. We will see the consequences
of that in the next figure.
These transactions are not suitable for detecting outgoing files. The
requests for files on the corporate clients are sent to a server which is not
on the corporate network. Even when running an eDonkey server inside the
corporate network, there is no way to force the clients to connect to this
server.
It also seems that an eDonkey server has no insight into files shared by
clients on other servers. Although we have not looked into this, the greatly
varying number of files per server is an indication of this[20]. If a server
did know about files on other servers, running your own server would be a way
to determine what files are shared from your corporate network, because all
files shared on the eDonkey network would be known to it.
In figure 15 we see the client actually fetching the file from the peers.
Apparently it is a big file, because the client requests different parts of
the file from different peers. Big files in eDonkey are split up into parts of
9.28MB. Hashes are calculated over those parts by the peer offering the file,
and provided to the server during its startup (see figure 13). Clients can then
retrieve different parts of the file from different peers at the same time, to
improve download speed.
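How a file is divided into parts can be sketched as below. The exact part size of 9,728,000 bytes is an assumption (it rounds to the 9.28MB mentioned above), and the function name is hypothetical. Hashing each part with MD4 is left out, because MD4 is not guaranteed to be available in every Python `hashlib` build.

```python
PART_SIZE = 9_728_000  # assumed exact size in bytes of one eDonkey part


def part_ranges(file_size: int) -> list:
    """Return (offset, length) pairs covering a file of `file_size` bytes."""
    if file_size < 0:
        raise ValueError("file size cannot be negative")
    return [
        (offset, min(PART_SIZE, file_size - offset))
        for offset in range(0, file_size, PART_SIZE)
    ]
```

Each range can be requested from a different peer, and the hash of a received part can be checked independently of the other parts.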
Peer C is connectable from the client, therefore the client fetches the part
from that peer immediately. Because peer A and peer B are not directly
connectable (they have a Low ID), the client has to ask the servers to which
they are connected to ask the peers to call the client back. The servers can do
this through the initial TCP connection from the peers (the lifeline), which
stays open throughout the time the peers are on the eDonkey network. Our
client is directly connectable from the internet (the client has a High ID), so
the peers can connect to the client directly. Peer A and peer B initiate a TCP
connection to the client, and the client sends the "fetch part" requests to
those peers over those connections.
Figure 15: Fetch the file-parts from the sources (the client sends "Ask A to
call me" and "Ask B to call me" to the servers, the servers pass "call client"
on to peers A and B, and the client sends "fetch part" to peers A, B and C)
It is clear that with this mechanism it is not possible for a peer with a Low
ID to fetch files (or parts) from other peers with a Low ID. Two peers with a
Low ID on the same server used to be able to fetch files from each other via
the server, but most servers no longer support this because of the overhead
it inflicts on the server. Exchanging files between two peers with a Low ID
on different servers was never possible. This is one of the major
restrictions a peer with a Low ID has.
The added value of this to our detection process is that if we know what
files are shared by peers on our corporate network, and we know what
hashes they are identified by, we can determine when a certain offered file
is actually downloaded. It might be good to know that when a certain file
containing sensitive information is offered, it has not actually been leaked
by the time this fact is observed. And when it is actually leaked, measures
can be taken to minimize the damage.
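The bookkeeping this implies is small. The sketch below keeps a table of offered files by hash and looks up hashes seen in later download requests; the class and method names are hypothetical, and how the hashes are extracted from sniffed traffic is outside the scope of the example.

```python
from typing import Dict, Optional


class OfferedFiles:
    """Remember which files local clients offer, keyed by MD4 hash (hex)."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, str] = {}

    def record_offer(self, md4_hex: str, filename: str) -> None:
        """Store a (hash, filename) pair seen in an offered-files list."""
        self._by_hash[md4_hex.lower()] = filename

    def match_request(self, md4_hex: str) -> Optional[str]:
        """Return the filename if a requested hash belongs to an offered file."""
        return self._by_hash.get(md4_hex.lower())
```

A non-empty result from `match_request` would mean that a file known to be offered is now actually being fetched.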
6.2.1.4 What do you share? There is one other interesting feature in eDonkey
that deserves our attention: one can ask a peer what files it shares
(figure 16). Intercepting and interpreting the offered files during startup
(see figure 13) is not that easy, as we will see later. It might be easier to
just detect which eDonkey clients are on the corporate network and actively
ask them what files they share. However, in the investigation of the different
eDonkey client software (see section 3) it was not possible to list the shared
files on a peer without having authorization of that peer. The authorization
process also required a Captcha test, which makes it impossible to automate.
For this reason we did not investigate this scheme.
Figure 16: Asking what files a client shares (the client asks peer C "What
files do you share?" and the peer answers "those")
Another difficulty that might arise is that when the corporate eDonkey
clients have a Low ID, they might not be open for connections. They would
provide files to other peers through TCP sessions which they initiate
themselves on request. An actively scanning detection program might have to
ask the server to which the clients are connected to have them call back.
Further research should be done to be able to rule this scheme out with
complete certainty.
Byte(s)   Content
0         Protocol: 0xE3 or 0xD4
1-4       Message size (-5)
5         Message type: 0x15
6...      Data
            bytes 6-9: Number of files
            file1: hash, filename, ...
            ...
Figure 17: I have these files
6.2.1.5 The gory details What does the “I have these files” message in
figure 13 look like exactly? Figure 17 is a representation of the structure
of this message. The numbers are the byte indexes within the packet. The
first byte defines the protocol: 0xE3 means eDonkey protocol, 0xD4 means
eMule compressed data. When 0xD4 is the first byte, the Data block has to
be decompressed (using the deflate algorithm, see RFC 1951).
Bytes 1 to 4 contain the message size, excluding the first five bytes.
Because the Data block starts at position 6, the size of the Data block is
the message size minus one. The message size is in little-endian format,
which is not the conventional network byte order.
The sixth byte (at position 5) is the message type. 0x15 means this message
is an “Offer files” message.
The Data block (decompressed if necessary) contains the summary of the
offered files. The first four bytes contain the “Number of files” in the
list (also in little-endian format). They are followed by “Number of files”
File entries.
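The header layout described above translates fairly directly into code. The following is a minimal sketch in Python 3, not the script used in this research; the function name parse_offer_files_header and the constant names are hypothetical. It assumes that the whole message is available in one buffer and that the compressed Data block is a zlib stream, which is also what the appendix script assumes by calling zlib's decompress.

```python
import struct
import zlib

PROTO_EDONKEY = 0xE3       # plain eDonkey protocol
PROTO_EMULE_PACKED = 0xD4  # eMule compressed data
OP_OFFER_FILES = 0x15      # "Offer files" message type


def parse_offer_files_header(packet: bytes):
    """Return (number_of_files, file_entries_bytes), or None.

    None means the buffer is not a complete "Offer files" message.
    """
    if len(packet) < 6:
        return None
    # '<' selects little-endian without padding:
    # B = protocol byte, I = 4-byte message size, B = message type
    protocol, size, msg_type = struct.unpack_from("<BIB", packet, 0)
    if protocol not in (PROTO_EDONKEY, PROTO_EMULE_PACKED):
        return None
    if msg_type != OP_OFFER_FILES:
        return None
    # The size excludes the first five bytes but includes the type
    # byte, so the Data block is size - 1 bytes long, from position 6.
    data = packet[6:6 + size - 1]
    if len(data) != size - 1:
        return None  # the message continues in a later TCP segment
    if protocol == PROTO_EMULE_PACKED:
        data = zlib.decompress(data)  # assumption: zlib-wrapped deflate
    if len(data) < 4:
        return None
    (nfiles,) = struct.unpack_from("<I", data, 0)
    return nfiles, data[4:]
```

Returning None for an incomplete buffer keeps the sketch usable on single sniffed packets: a caller can simply skip such packets or collect more data first.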
Byte(s)   Content
0-15      MD4 Hash of the file’s content
16-19     Client ID
20-21     TCP Port
22...     Tags
            4 bytes: Tag count
            File Name Tag
            File Size Tag
            Tag
            Tag
            ...
Figure 18: A file entry
A file entry, as shown in figure 18, consists of a 16-byte MD4 hash
‘uniquely’ identifying the content of the file, followed by the Client ID
and TCP Port. When the client has a Low ID, the TCP Port is 0; otherwise it
is the port on which the peer can receive connections. The IP address of
a High ID client can be deduced from the Client ID, as we saw in paragraph
6.2.1.1 A hybrid network. Much like the Data block in figure 17, the Tag
block consists of 4 bytes specifying the number of tags, followed by those
tags.
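As an illustration of the file entry layout, the fixed 26-byte start of an entry could be split as sketched below in Python 3. The function name parse_file_entry_head is hypothetical. Two points are assumptions that go beyond the description above and would need checking against real traffic: that a Client ID below 2**24 denotes a Low ID, and that the four Client ID bytes of a High ID client are the IPv4 octets in the order they appear on the wire.

```python
import socket
import struct

LOW_ID_LIMIT = 0x1000000  # assumption: IDs below 2**24 are Low IDs


def parse_file_entry_head(entry: bytes):
    """Split the fixed 26-byte start of a file entry.

    Returns (md4_hex, client_id, ip_or_None, port, tag_count, rest).
    Raises struct.error when fewer than 26 bytes are supplied.
    """
    # 16s = MD4 hash, I = Client ID, H = TCP port, I = tag count,
    # all little-endian and unpadded (16 + 4 + 2 + 4 = 26 bytes)
    md4, client_id, port, tag_count = struct.unpack_from("<16sIHI", entry, 0)
    ip = None
    if client_id >= LOW_ID_LIMIT:
        # assumption: the raw Client ID bytes are the IPv4 octets in order
        ip = socket.inet_ntoa(entry[16:20])
    return md4.hex(), client_id, ip, port, tag_count, entry[26:]
```

The remaining bytes returned as the last element are the tags of this entry, followed by any further file entries.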
A tag has the format Type (byte 0), Name (byte 1), Value. The second byte
is the name of the tag, represented by a number. A value of 0x01 indicates
that the Value is a filename; 0x02 that the Value is the file size; 0x03
that it is the file type (Audio, Video, Image, Doc, etc.). Many more of
those names are predefined.
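Value of Type
nibble 1   nibble 2   Format of Value
0x8        0x9        Integer (1 byte: position 0)
0x8        0x8        Integer (2 bytes: positions 0-1)
0x8        0x3        Integer (4 bytes: positions 0-3)
0x9        Size       String (Size bytes)
0x8        0x2        Size (2 bytes: positions 0-1), followed by String
Figure 19: Tag types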
The first byte indicates the type of the value. This would be <String> in
the case of a filename or file type, and <Integer> in the case of a file
size. A few more types are defined. Interestingly enough, we have seen two
different kinds of <String> and three different kinds of <Integer>. Those
types were not in the documents describing the protocol, and we had to
figure them out by reverse engineering.
Figure 19 is a table of the different tag types we have been able to
identify. The Type byte is split into two nibbles. We have done this because
of the especially quirky string type where nibble 1 has value 0x9: there,
the Size of the string is specified by the second nibble.
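The tag types of figure 19 can be decoded with a small dispatch on the Type byte. The sketch below is written in Python 3; the function name read_tag is hypothetical, and it assumes that integer values and the two-byte string size are little-endian, like the other integers in the message. Types that we have not identified are rejected, because the length of their Value is unknown.

```python
import struct

# Type byte -> struct format of the integer Value (little-endian)
INTEGER_FORMATS = {0x89: "<B", 0x88: "<H", 0x83: "<I"}


def read_tag(data: bytes):
    """Decode one tag; return (name_id, value, remaining_bytes)."""
    tag_type, name_id = data[0], data[1]
    body = data[2:]
    if tag_type >> 4 == 0x9:
        # short string: the second nibble holds the string size
        size = tag_type & 0x0F
        return name_id, body[:size], body[size:]
    if tag_type == 0x82:
        # long string: a 2-byte size precedes the string
        (size,) = struct.unpack_from("<H", body, 0)
        return name_id, body[2:2 + size], body[2 + size:]
    if tag_type in INTEGER_FORMATS:
        fmt = INTEGER_FORMATS[tag_type]
        (value,) = struct.unpack_from(fmt, body, 0)
        return name_id, value, body[struct.calcsize(fmt):]
    raise ValueError("unknown tag type 0x%02X" % tag_type)
```

Calling read_tag repeatedly on the returned remainder, once per tag in the Tag count, walks through all tags of a file entry.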
6.2.1.6 Actually sniffing There are a few things that have to be
considered when actually sniffing for the “Offer files” message.
The message is in the middle of a TCP session. When sniffing a ‘running’
TCP session, one first has to make sure the packets are defragmented, so
that we can interpret whole packets. Only then will we be able to ‘see’ the
packet that starts with 0xE3 or 0xD4 and has 0x15 as its sixth byte.
The message size (bytes 1 to 4) might be greater than the maximum size of
the data in a single TCP packet. In that case we have to collect more TCP
packets and append their data to the message until it has the indicated
size. To do this correctly we have to take the TCP sequence numbers into
account: the TCP packets must be interpreted in the right order.
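The ordering problem described above can be sketched as follows in Python 3. This is an illustration only, not part of our script; the function name reassemble and its parameters are hypothetical. It handles out-of-order arrival and exact retransmissions, but not segments that partially overlap, which a real TCP reassembler has to deal with as well.

```python
def reassemble(segments, first_seq, wanted):
    """Join TCP payloads in sequence-number order.

    segments  -- iterable of (sequence_number, payload), arrival order
    first_seq -- sequence number of the first byte of the message
    wanted    -- number of bytes to collect (message size + 5)
    Returns the message bytes, or None while a gap remains.
    """
    pending = {}
    for seq, payload in segments:
        # Offset of this payload relative to the start of the message;
        # the modulo copes with 32-bit sequence number wrap-around.
        # setdefault keeps the first copy of a retransmitted segment.
        pending.setdefault((seq - first_seq) % 2**32, payload)
    message = bytearray()
    while len(message) < wanted:
        chunk = pending.get(len(message))
        if not chunk:
            return None  # a segment is missing, cannot continue yet
        message += chunk
    return bytes(message[:wanted])
```

The value for wanted follows from the header: the message size field plus the five bytes it does not count.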
Snort is the de facto standard program for intrusion detection, which it
performs using deep packet inspection. Could we use Snort to capture the
“Offer files” message?
Snort can do packet defragmentation and TCP reassembly. It realises this
via so-called “preprocessors”. The preprocessor for defragmentation is
called Frag3. There are two TCP-reassembling preprocessors: Stream4 and
Stream5. The difference is that Stream5 takes into account the platform
the TCP session is targeted at. This solves issues with TCP packets that
are intentionally malformed to confuse certain operating systems, and is of
no importance to our research.
Because TCP reassembly is a memory- and CPU-intensive operation, Snort
by default only reassembles TCP packets to certain TCP ports. P2P protocols
do not use statically defined port numbers; the TCP sessions can be to any
TCP destination port. Therefore we have to alter the default configuration
of the preprocessor and let it reassemble TCP sessions to all TCP ports.
Unfortunately, Snort can only apply the preprocessor to all packets it
sees; it is not possible to enable TCP reassembly based on the evaluation
of the deep packet inspection rules. For that reason it would be advisable
to have a machine dedicated to sniffing P2P traffic, and not to use a Snort
instance that is already used for intrusion detection.
Listing 1: Snort configuration
 1
 2 preprocessor frag3_global: max_frags 65536
 3 preprocessor frag3_engine: policy first detect_anomalies
 4
 5 preprocessor stream5_global: track_tcp yes
 6 preprocessor stream5_tcp: ports server all
 7
 8 alert tcp $HOME_NET any -> $EXTERNAL_NET any (
 9     msg: "P2P eDonkey Offer Files";
10
11     content: "|E3|"; depth: 1;
12     content: "|15|"; depth: 1; offset: 5;
13
14     session: all;
15     logto: "/var/log/ed2k.sessions.log";
16
17     sid: <a unique rule identifier>;
18 )
Listing 1 shows how Snort could be configured to detect “Offer Files”
messages. On lines 2 to 6 we tell Snort to use the Frag3 and Stream5
preprocessors. On line 6 we say we want to reassemble TCP sessions to any
TCP destination port. Line 8 states that it is an outgoing session. With
lines 14 and 15 we indicate that we wish to log the complete session to
/var/log/ed2k.sessions.log. Lines 11 and 12 detect the byte values that
identify the “Offer Files” message. Note that this would probably trigger
a lot of false positives, and for all of those the complete TCP session
will be logged to /var/log/ed2k.sessions.log.
There is no way to configure Snort so that it logs only the complete
“Offer files” message. We can either log the packet or log the complete
TCP session. Snort cannot interpret the “Message size” bytes as a
little-endian integer and read that many more bytes of the packet.
The /var/log/ed2k.sessions.log file would later have to be interpreted by
a program to extract the actual offered files and identify which clients
offered them.
Snort is obviously not very suitable for this type of information
extraction. Snort does offer an API to create your own modules. The best
way to passively detect eDonkey “Offer Files” messages would probably be
a Snort module written specifically for this purpose.
We did not do this in our research. Instead, we wrote a Python script which
does the interpretation of the “Offer Files” message. By making use of
pcapy (footnote 3), the Python bindings to the libpcap library (footnote 2),
we enabled the script to capture the message as well. This is easier and
more convenient than interpreting Snort log files, but we have not taken
packet defragmentation or TCP sequence numbers into account. We judged that
unnecessary for a proof-of-concept script. The script is in the Appendix.
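A program that post-processes logged sessions could reduce the false positives mentioned above by checking the message size field, which Snort itself cannot interpret. The following Python 3 sketch illustrates the idea. The function name find_offer_files is hypothetical, and the sketch assumes that the logged session has already been converted to the raw bytes of the TCP payload; we have not examined the format of Snort's session logs for this purpose.

```python
import struct


def find_offer_files(stream: bytes):
    """Yield (offset, message) for each complete candidate message."""
    pos = 0
    while pos + 6 <= len(stream):
        if stream[pos] in (0xE3, 0xD4) and stream[pos + 5] == 0x15:
            (size,) = struct.unpack_from("<I", stream, pos + 1)
            end = pos + 5 + size
            # A plausible message holds at least the type byte and a
            # 4-byte file count, and must fit in the captured data.
            if size >= 5 and end <= len(stream):
                yield pos, stream[pos:end]
                pos = end
                continue
        pos += 1
```

Candidates whose size field points beyond the captured data are skipped, so a chance occurrence of the byte values 0xE3 and 0x15 is less likely to be reported.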
Footnote 2: http://www.tcpdump.org/
Footnote 3: http://oss.coresecurity.com/projects/pcapy.html

6.2.2 Gnutella
6.2.2.1 Pure P2P opens up a protocol In contrast to eDonkey, Gnutella
does not use central servers. It is a so-called Pure P2P protocol: the
network consists of peers only. The eserver (eDonkey server software) was
not open source because, according to its author, it would “be used to
build fake servers[26].” The author is referring here to the attempts by
the recording industry to pollute the P2P networks with fake files, to
discourage the sharing of copyrighted material.
The fully distributed nature of Gnutella makes it less vulnerable to legal
threats. This makes Gnutella very popular, and many open source clients
using the protocol have arisen. It is to the advantage of those clients
that the protocol is well defined. Unlike the eDonkey protocol, there is
excellent up-to-date documentation on the workings of Gnutella[5]. The RFC
is dated 2002 but, as described in the draft, the protocol can be extended
through a documented extension mechanism. The recent development of new
extensions and other bells and whistles is discussed in the “Gnutella
Development Forum”[23]. An excellent overview and description of all the
new developments can be found on the Wiki of the developers of the
LimeWire Gnutella client[24].
In Gnutella protocol descriptions it is customary to use the term Servent
for a Gnutella client, and we will use this term to indicate the client
on the corporate network. The servents the client connects to will be
referred to (as before) as peers.
6.2.2.2 Detection of Gnutella clients At startup a servent connects to an
initial peer. The discovery of the host address of the initial peer is
called bootstrapping. Bootstrapping is not part of the RFC draft. There are
many different ways to realise the discovery of the initial peer[21]. None
of those methods is relevant to our research, and therefore we will not
discuss them here.
Servent -> Peer:
    GNUTELLA CONNECT/0.6
    User-Agent: LimeWire
    Listen-IP: 145.100.104.196:9253
Peer -> Servent:
    GNUTELLA/0.6 200 OK
    User-Agent: Bearshare
Figure 20: The initial handshake
What the initial connection looks like is interesting. Once the TCP
connection is established, a handshake message is exchanged with the peer.
The handshake looks a lot like an HTTP request. The initial message
consists of lines of text. The first line is GNUTELLA CONNECT/0.6. After
that follows a set of headers which describe the servent's capabilities.
The headers follow the standards described in RFC 822 and RFC 2616.
The servent reveals the IP address and port on which it listens for
incoming connections in a Listen-IP header. This header was first
introduced by the BearShare client on March 18, 2002[22]. Although this
header is not in the RFC, and thus not mandatory, all clients we have
investigated so far provide it during the handshake.
The Listen-IP header provides an easy way to make an inventory of all the
servents on the corporate network. When sniffing, we only have to look
for GNUTELLA CONNECT as the first 16 characters, and then look further
in the packet for the Listen-IP header. With that we have accomplished the
first step in our quest (finding the P2P client); the only thing remaining
is detecting what files this client is sharing.
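The detection step described above amounts to a prefix check followed by a header lookup. A minimal sketch in Python 3 is given below; the function name listen_address is hypothetical. It assumes the complete handshake is contained in one TCP payload and that header lines end with CRLF, as in HTTP.

```python
def listen_address(payload: bytes):
    """Return (host, port) from a Gnutella handshake, else None."""
    if not payload.startswith(b"GNUTELLA CONNECT"):
        return None
    # Decode leniently: the headers we care about are plain ASCII.
    for line in payload.decode("latin-1").split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "listen-ip":
            # The value has the form <ip>:<port>
            host, sep, port = value.strip().rpartition(":")
            if sep and port.isdigit():
                return host, int(port)
    return None
```

Header names are compared case-insensitively, since different clients may not capitalise the header in the same way.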
Active Scanner -> Servent:
    GET / HTTP/1.1
    Host: 145.100.104.196
    User-Agent: Active Scanner
    Accept: text/html, application/x-gnutella-packets
    Connection: close
Servent -> Active Scanner:
    HTTP/1.1 200 OK
    Server: LimeWire
    Content-Type: application/x-gnutella-packets
    Connection: close

    data
Figure 21: The Browse Host Extension
6.2.2.3 The Browse Host Extension In paragraph 6.2.1.4 (What do you
share?) we investigated the “What files do you share?” message of the
eDonkey protocol. This feature was not usable for automated detection of
shared files, because of the inter-client relationship that was necessary
to be authorized to make such a request. Establishing this relationship
required answering a CAPTCHA challenge, which ruled out automated use of
this message completely.
Gnutella has a similar message, called the Browse Host Extension. It does
not have the restrictions of the eDonkey message, and we have not found any
indication of plans to implement such restrictions.
A Browse Host Extension request looks as shown in figure 21. It is a simple
HTTP request asking for the root index. The Accept header line indicates
that two different formats are accepted. It is important to accept both,
because old BearShare clients only support text/html, and LimeWire (and
derivatives) only support application/x-gnutella-packets.
At the moment, an implementation of the Browse Host Extension must support
at least the application/x-gnutella-packets media type. Its format is
defined as “a stream of binary Query Replies.”[25] It is quite complex,
and we have not investigated it thoroughly. Instead, when looking at the
data we saw that a filename was always preceded and followed by a 0x00
byte, which was then followed by the characters “urn:”. We judged that
sufficient for our proof-of-concept script. The script is in the Appendix.
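The request of figure 21 and the filename heuristic described above can be sketched in Python 3 as follows. The function names browse_host_request and filenames are hypothetical. The extraction is only the heuristic we observed (a name enclosed in 0x00 bytes and followed by “urn:”), not a parser for the Query Reply format, and it will miss entries that do not carry a URN.

```python
def browse_host_request(host: str) -> bytes:
    """Build the Browse Host request shown in figure 21."""
    lines = [
        "GET / HTTP/1.1",
        "Host: " + host,
        "User-Agent: Active Scanner",
        "Accept: text/html, application/x-gnutella-packets",
        "Connection: close",
        "",  # the empty line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def filenames(reply: bytes):
    """Extract filenames from a reply using the 0x00 ... 0x00 urn: pattern."""
    names = []
    # Every piece except the last one ends right before "\x00urn:"
    for chunk in reply.split(b"\x00urn:")[:-1]:
        # The name starts after the last 0x00 byte that precedes it
        names.append(chunk[chunk.rfind(b"\x00") + 1:])
    return names
```

The request bytes can be written to a TCP connection to the address found in the Listen-IP header, and the complete reply can then be passed to filenames.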
7 Conclusion
We review our research question from section 1:
What is the best solution to detect (malware influenced) P2P users on
the local network that unknowingly share sensitive data?
We conclude that there is no general way to detect sharing of sensitive
data, and thus it is not possible to compare detection methods. To detect
sharing of sensitive data, every protocol, and every client for that
protocol, has to be analyzed separately. As a proof of concept we have
analyzed the eDonkey and Gnutella protocols and built a practical
implementation to detect the files that users are sharing with these
protocols.
For the eDonkey network this is done by deep packet inspection of the
connection setup. During this setup the complete file list is sent over the
network; by analysing the protocol we were able to unpack this message
and list the files that are shared.
The Gnutella protocol, however, is a fully distributed protocol, and no
file list is sent over the network. To detect what files users are sharing
we first had to locate the client on the local network. In the protocol
messages the clients send the IP address and TCP port on which they can be
reached. Since the Gnutella network uses HTTP to transfer files, a file
list could be downloaded with an HTTP GET request. This allowed us to list
the files that a Gnutella user is sharing.
While we have shown that it is possible to detect what files clients are
sharing on a peer-to-peer network, one might wonder whether it is worth the
trouble to solve this problem with a technical solution. Maybe it is a
better solution to inform the users about the dangers of unknowingly
sharing sensitive information.
7.1 Further research
If the technical solution is accepted as the best solution, more research
has to be done on the subject.
• Large scale practical implementation
Our proof of concept was tested on single clients that were all managed
by us. Will the detection methods and scripts also work in an environment
with hundreds of clients on a gigabit internet connection?
• User warning system
Detecting shared files on peer-to-peer networks is the first step. This
method should invoke some kind of warning system that notifies a user
that sensitive information is being shared.
• Protocol obfuscation
The eDonkey clients have an option to obfuscate the protocol. This breaks
our inspection of the TCP packets: we were not able to detect and unpack
file lists. Is it possible to still detect file lists with protocol
obfuscation?
• Protocol encryption
Over the last few years protocols have evolved towards (semi-)decentralized
networks. Although peer-to-peer systems can be used to share legal files,
they are still mostly used to transfer copyrighted material. We expect
that in the near future clients will also encrypt their connections. Would
it still be viable to detect sharing of sensitive information through some
form of packet inspection?
References
[1] Bram Cohen
The BitTorrent Protocol Specification
http://bittorrent.org/beps/bep_0003.html
[2] Andrew Loewenstern
DHT Protocol
http://bittorrent.org/beps/bep_0005.html
[3] TheoryOrg
BitTorrent Specification
http://wiki.theory.org/BitTorrentSpecification
[4] Wikipedia
Edonkey URI scheme
http://en.wikipedia.org/wiki/Ed2k_link
[5] T. Klingberg, R. Manfredi
Gnutella RFC
http://rfc-gnutella.sourceforge.net/src/rfc-0_6-draft.html
[6] Kevin Walsh
LimeWire Bug Report concerning HTTP and MAGNET requests.
http://seclists.org/bugtraq/2005/Mar/0239.html
[7] www.bittorrent.com
Official website of the BitTorrent client.
http://www.bittorrent.com/
[8] Paul F. Roberts
eWeek Article on botnet that uses BitTorrent to push movies.
http://tinyurl.com/9wrsgg
[9] Sophos
Sophos Analyses on W32/Tibick-E.
http://www.sophos.com/security/analyses/viruses-and-spyware/w32tibicke.html
[10] Spear Forensics
Official website Forensic P2P.
http://www.spearforensics.com/products/forensicp2p/index.aspx
[11] e-secure-it Arjen Landgraaf
Official website of e-secure-it IT security firm.
https://www.e-secure-it.com/
[12] e-secure-it Arjen Landgraaf
Official article of e-secure-it on P2PMon.
https://www.e-secure-it.com/Services.pdf
[13] 8-bits (Author Unknown)
8-bits Review on Forensic P2P.
http://stam.blogs.com/8bits/2008/05/forensic-p2p.html
[14] Sandip
Top 20 Best P2P File Sharing Programs
http://tinyurl.com/7fgr2t
[15] p2pon.com
Popular File-Sharing Programs
http://www.p2pon.com/file-sharing-programs
[16] Bradley Mitchell
Top 10 Free P2P File Sharing Programs
http://compnetworking.about.com/od/p2ppeertopeer/tp/p2pfilesharing.htm
[17] Bradley Mitchell
Top 10 Free P2P File Sharing Programs
http://compnetworking.about.com/od/p2ppeertopeer/tp/p2pfilesharing.htm
[18] Yoram Kulbak, Danny Bickson
The eMule Protocol Specification
http://forum.emule-project.net/index.php?showtopic=100724&st=375&p=810507&#entry810507
[19] Alexey Klimkin
*Unofficial* eDonkey Protocol Specification v0.6.2
http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/p/pd/pdonkey/eDonkey-protocol-0.6.2.html
[20] Edonkey Network servers
http://edk.peerates.net/servers.php?lang=0
[21] Gnutella Bootstrapping
http://wiki.limewire.org/index.php?title=Bootstrapping
[22] Gnutella Headers (GDF membership is required)
http://groups.yahoo.com/group/the_gdf/database?method=reportRows&tbl=9
[23] Gnutella Development forum
http://groups.yahoo.com/group/the_gdf/
[24] Gnutella client development
http://wiki.limewire.org/index.php?title=GDF
[25] Query Format
http://wiki.limewire.org/index.php?title=Standard_Message_Architecture#Query_Hit_.280x81.29
[26] Fake servers
http://forum.emule-project.net/index.php?showtopic=100724&st=375&p=810507&#entry810507
Appendix - The P2P detection script
#!/usr/bin/env python3
#
# This script listens on a network interface
# for eDonkey2000 "Offer Files" messages and
# prints the offered files to screen in a
# human readable form.
#
# It also detects a Gnutella initial handshake.
# If it sees one, it actively queries the servent
# for the files it shares.
#
# Requires Python 3 with the pcapy and impacket modules.
#

import socket
import zlib

import pcapy
from impacket.ImpactDecoder import EthDecoder


def int2(data):
    """
    int2 converts the little-endian formatted
    2 byte integer to a python integer.
    """
    return data[0] | data[1] << 8


def int4(data):
    """
    int4 converts the little-endian formatted
    4 byte integer to a python integer.
    """
    return data[0] | data[1] << 8 | \
           data[2] << 16 | data[3] << 24


# TAGNAMES is a tuple of all the 256 possible tag names;
# only the first 8 are actually named.
TAGNAMES = ('',
            'name', 'size', 'type', 'format',
            'collection', 'part path', 'part hash', 'copied') + \
           ('???',) * 247


def read_tags(ntags, data):
    """
    read_tags reads <ntags> tags from <data>. It returns the
    remaining data and a dict containing the tag values keyed
    by their tag name.
    """
    # The dict to return
    tags = dict()
    while ntags:
        if len(data) < 2:
            break
        tag_type = data[0]
        name = TAGNAMES[data[1]]
        data = data[2:]

        if tag_type & 0xF0 == 0x90:      # Small string type:
            size = tag_type & 0x0F       # the second nibble
            tags[name] = data[:size]     # contains the size
            data = data[size:]           # of the string.

        elif tag_type == 0x82:           # Big string type:
            size = int2(data[:2])        # the first two bytes
            data = data[2:]              # of the value contain
            tags[name] = data[:size]     # the size of the
            data = data[size:]           # string.

        elif tag_type == 0x89:           # 1 byte integer type
            tags[name] = data[0]
            data = data[1:]

        elif tag_type == 0x88:           # 2 byte integer type
            tags[name] = int2(data[:2])
            data = data[2:]

        elif tag_type == 0x83:           # 4 byte integer type
            tags[name] = int4(data[:4])
            data = data[4:]

        else:                            # Unknown type: the size of
            break                        # the value is unknown, stop.

        ntags -= 1
    return data, tags


def print_offer_files(offer_files_packet_data):
    """
    print_offer_files takes the sniffed packet data which is
    assumed to be an "Offer Files" message. It walks through
    the offered files, retrieves the tags associated with each
    of them and prints that to screen.
    """
    # The Data block is the message size minus the type byte long
    size = int4(offer_files_packet_data[1:5]) - 1
    message = offer_files_packet_data[6:6 + size]

    if offer_files_packet_data[0] == 0xD4:   # Is the Data block
        message = zlib.decompress(message)   # compressed? Then
                                             # decompress it.

    nfiles = int4(message[:4])           # First four bytes are the
    message = message[4:]                # number of files.

    # Walk through the files
    #
    while nfiles > 0 and len(message) >= 26:
        file_hash = message[:16]         # A file entry starts with
        message = message[16:]           # a 16 byte MD4 hash,

        client_id = int4(message[:4])    # followed by a 4 byte
        message = message[4:]            # Client ID (High ID clients
                                         # have their IP address
                                         # stored here in reversed
                                         # order),

        port = int2(message[:2])         # followed by a 2 byte TCP
        message = message[2:]            # port on which the client
                                         # listens,

        tagcount = int4(message[:4])     # followed by an integer
        message = message[4:]            # indicating the number of
                                         # tags that will follow.

        # The tags are read from the message
        #
        message, tags = read_tags(tagcount, message)

        print(client_id, port, tags)

        nfiles -= 1


interface = pcapy.open_live('eth0', 65536, 1, 100)
interface.setfilter("tcp")               # Only look at TCP traffic

decoder = EthDecoder()

while True:
    # Get the data part of the TCP packet
    #
    hdr, data = interface.next()
    if hdr is None:                      # Timeout, nothing captured
        continue
    eth = decoder.decode(data)
    ip = eth.child()
    tcp = ip.child()
    data = tcp.get_data_as_string()

    # Detect the "Offer Files" message and print
    # it to screen.
    #
    if len(data) > 5 and data[5] == 0x15 and data[0] in (0xE3, 0xD4):
        try:
            print_offer_files(data)
        except (IndexError, zlib.error):
            # The message is truncated (it continues in a later
            # TCP packet) or it was a false positive.
            pass

    # Detect a Gnutella handshake
    #
    elif data.startswith(b"GNUTELLA CONNECT"):
        # It could be interesting to see the Handshake
        # initial connection message.
        #
        # print(data)

        # Retrieve the IP address and the port it listens
        # on for incoming connections.
        # It is the value of the Listen-IP header.
        #
        listen = [
            x.split(b':', 1)[1].strip().decode('ascii', 'replace')
            for x in data.split(b'\n')
            if x.lower().startswith(b'listen-ip:')
        ]
        if not listen:
            continue
        HOST, _, PORT = listen[0].rpartition(':')
        try:
            PORT = int(PORT)
        except ValueError:
            continue

        # Send the "Browse Host Extension" message to the
        # servent.
        #
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.connect((HOST, PORT))
        except OSError:
            print('Could not connect to', HOST, PORT)
            continue
        to_send = ("GET / HTTP/1.1\r\n"
                   "Host: %s\r\n"
                   "User-Agent: Limewire x.y.z Pro\r\n"
                   "Accept: text/html, application/x-gnutella-packets\r\n"
                   "Connection: close\r\n"
                   "\r\n" % HOST).encode('ascii')

        # It could be interesting to see what is sent
        #
        # print('Sending', to_send)

        s.sendall(to_send)

        # Get the full reply
        #
        data = b''
        while True:
            read = s.recv(65536)
            if not read:
                break
            data += read
        s.close()

        # Print the filenames.
        # They are right before the occurrences of urn:
        # in between 0x00 bytes.
        #
        for name in data.split(b'urn:')[:-1]:
            name = name[:-1]                       # Drop the trailing 0x00
            name = name[name.rfind(b'\x00') + 1:]  # Cut after the leading 0x00
            print(name.decode('utf-8', 'replace'))
