Abstract
This report gives an overview of current P2P (peer-to-peer) networks and clients, and analyses where the threats of data leakage on these networks exist. It also outlines a proof of concept for listing locally shared files on the Gnutella and eDonkey networks. This is done through passive deep packet inspection and by actively querying clients.
Detecting P2P dataleakage
Contents
1 Introduction
6 Monitoring
6.1 Existing Software
6.2 Active and Passive monitoring
6.2.1 eDonkey
6.2.1.1 A hybrid network
6.2.1.2 Offering files
6.2.1.3 eDonkey in action
6.2.1.4 What do you share?
6.2.1.5 The gory details
6.2.1.6 Actually sniffing
6.2.2 Gnutella
6.2.2.1 Pure P2P opens up a protocol
6.2.2.2 Detection of Gnutella clients
6.2.2.3 The Browse Host Extension
7 Conclusion
7.1 Further research
1 Introduction
This report is written for the course Security of Systems & Networks (SSN) of the System & Network Engineering programme at the University of Amsterdam. After Ian Cook's presentation (P2P Risks) at the SURFcert IBO in October 2008, we investigated possible ways to detect the sharing of sensitive information on peer-to-peer (P2P) networks. During four weeks of research, we tried to answer the following research question:
"What is the best solution to detect (malware-influenced) P2P users on the local network that unknowingly share sensitive data?"
In this report you will find information on the general workings of P2P networks and their clients. We also take a quick look at some viruses, bugs and rootkits that may influence the standard behaviour of clients. After this, the monitoring of shared files on a local network is outlined. This is investigated through the use of existing software, passive deep packet inspection and the active querying of clients.
2.1 BitTorrent
The BitTorrent network has grown very popular over the last years. This is largely due to the fact that every client concurrently uploads chunks of data while downloading data from other peers. This provides a way to distribute files without huge bandwidth demands. The protocol is also used for distributing software and even updates for games such as World of Warcraft.
The BitTorrent protocol can be used in two different ways to distribute data:
• Centralized, with a tracker system as described in the original protocol specification[1]
• Decentralized, by using a distributed hash table[2].
160-bit space. File IDs and node IDs are generated from the same keyspace. A node is responsible for the file IDs close to its node ID (a basic Chord system as in figure 2). The network IDs are compared with a distance metric to determine which nodes are close in the network. From the overlay network perspective, a node has more information about nodes that are close to it than about nodes that are far away.
When a node wants to download a file, it contacts a node that it knows is close to the file ID. It does this by an iterative search over the nodes. It will get a reply from the node responsible for the file, containing a list of peers that are sharing the requested file.
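The iterative search described above can be sketched in a few lines of Python 3. This is only an illustration under stated assumptions: the text does not name the distance metric, so bitwise XOR is assumed here, and `make_id`, `distance`, `iterative_lookup` and the `neighbours` routing table are hypothetical names that do not come from any BitTorrent client.

```python
import hashlib


def make_id(label: bytes) -> int:
    """Derive a 160-bit ID from arbitrary bytes (SHA-1 yields 160 bits)."""
    return int.from_bytes(hashlib.sha1(label).digest(), "big")


def distance(a: int, b: int) -> int:
    """ASSUMPTION: the distance metric is the bitwise XOR of two IDs."""
    return a ^ b


def iterative_lookup(start, target, neighbours):
    """Walk from `start` towards the node closest to `target`.

    `neighbours` is a hypothetical routing table that maps a node ID to
    the node IDs it knows about. Each step moves to the known node that
    is closest to the target, and the walk stops as soon as no known
    node is closer than the current one.
    """
    current = start
    path = [current]
    while True:
        candidates = neighbours.get(current, [])
        best = min(candidates, key=lambda n: distance(n, target),
                   default=current)
        if distance(best, target) >= distance(current, target):
            return path
        current = best
        path.append(current)
```

With a toy table such as `{1: [4, 8], 4: [6], 6: [7]}`, a lookup for ID 7 that starts at node 1 visits the nodes 1, 4, 6 and 7 in that order.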
2.2 eDonkey
The eDonkey system uses centralized servers to distribute data. A client shares a number of directories and files. These files are hashed with the MD4 hash function so that they are uniquely identifiable in the network. A list of these files is transferred to the server (see figure 3). The server keeps track of all files that the nodes share. This might be an option for detecting data leakage, which is further discussed in section 6.
To download a file from the eDonkey network, a URL[4] containing the filename and the hash is used. The client queries the server for a list of nodes that share a specific file (see figure 4). The client is then able to contact other nodes to download the blocks.
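As an illustration of what such a URL carries, the following Python 3 sketch splits a link into its fields. It assumes the commonly used form ed2k://|file|NAME|SIZE|HASH|/, which carries the file size in addition to the filename and hash mentioned above. The function name `parse_ed2k_link` and the example link are made up for this sketch.

```python
from urllib.parse import unquote


def parse_ed2k_link(link: str) -> dict:
    """Split an ed2k file link into name, size and MD4 hash.

    ASSUMPTION: the link has the form ed2k://|file|NAME|SIZE|HASH|/ .
    """
    prefix = "ed2k://|file|"
    if not link.startswith(prefix):
        raise ValueError("not an ed2k file link")
    fields = link[len(prefix):].split("|")
    if len(fields) < 3:
        raise ValueError("ed2k link is missing fields")
    name, size, md4_hex = fields[0], fields[1], fields[2]
    # An MD4 digest is 16 bytes, written as 32 hexadecimal characters.
    if len(md4_hex) != 32 or len(bytes.fromhex(md4_hex)) != 16:
        raise ValueError("hash is not a 128-bit hexadecimal value")
    return {"name": unquote(name), "size": int(size), "hash": md4_hex.lower()}


if __name__ == "__main__":
    # Hypothetical link, used only to show the field layout.
    example = "ed2k://|file|example.txt|1024|0123456789ABCDEF0123456789ABCDEF|/"
    print(parse_ed2k_link(example))
```

Because the hash identifies the content, two links with different filenames but the same hash point to the same file.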
From a network perspective there aren't any measures that prevent data leakage. Sharing of files is left entirely to the clients. A user is not required to manually create a metadata file, as is the case with BitTorrent.
2.3 Gnutella
There are no central servers within the Gnutella network, because Gnutella
is a fully decentralised peer to peer network. However some nodes on
Pong (0x01)[5]
When looking at the clients, it appears that they have a function that can request file listings from other clients in the network. This function sends an HTTP GET request to the other client. It uses the IP address and port number supplied by the pong packet.
3.3 Shareaza
With the Windows-only application Shareaza you are able to share on and download from the eDonkey, Gnutella and BitTorrent networks. Audio, video, images, archives, pdf and txt files are shared by default. Shareaza has an extension block list (see 7) of files that will never be shared. This list contains file types like pst. Pst stands for Personal Storage Table and is used by Microsoft Outlook to store all your e-mails, agenda items and contacts. You are able to edit this extension list yourself.
Sharing your mailbox backup by accident isn’t the only way of losing sen-
sitive data. A lot of people share their whole “My Documents” folder by
accident, as mentioned before. Due to this we were able to download full
5.1 Bugs
In 2005 two bugs were found in LimeWire [6] which made it possible to execute malicious requests on the affected clients. The first bug, in LimeWire releases 4.1.2 through 4.5.6, made it possible to send an HTTP GET request for any file in any location accessible by the remote user. The client answered such a request by sending the file over the network to the requesting party.
The second bug, in LimeWire releases 3.9.6 through 4.6.0, made the same thing possible through MAGNET requests. Both bugs were effective on Microsoft Windows and Linux operating systems but are fixed in newer releases of LimeWire.
Since LimeWire lacks an automatic update mechanism, users of these old versions are still susceptible to these bugs and, for the purpose of our data leakage research, must be protected. As described in the monitoring section (section 6) of this report, it is possible to do deep packet inspection on LimeWire packets. In the packets, LimeWire advertises its version number. Based on the version numbers affected by the bugs, it is then possible to alert system administrators or users to upgrade their client(s). Of course this is also possible by other means, such as software policies, when a client computer is under (own) administrative jurisdiction.
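A check of this kind could look roughly like the Python 3 sketch below. The affected version ranges are the ones given above. The assumption that the version appears in a header of the form `User-Agent: LimeWire/x.y.z`, and the names `parse_limewire_version` and `is_affected`, are ours and would need to be verified against real traffic.

```python
import re

# Inclusive version ranges taken from the two bug descriptions above.
AFFECTED_RANGES = [
    ((4, 1, 2), (4, 5, 6)),  # arbitrary file retrieval through HTTP GET
    ((3, 9, 6), (4, 6, 0)),  # the same through MAGNET requests
]

# ASSUMPTION: the client advertises itself as "LimeWire/<major>.<minor>.<patch>".
_VERSION_RE = re.compile(r"LimeWire/(\d+)\.(\d+)\.(\d+)")


def parse_limewire_version(header_line: str):
    """Return the version as a tuple of ints, or None if none is found."""
    match = _VERSION_RE.search(header_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_affected(version) -> bool:
    """True if the version falls inside one of the affected ranges."""
    return any(low <= version <= high for low, high in AFFECTED_RANGES)


if __name__ == "__main__":
    version = parse_limewire_version("User-Agent: LimeWire/4.2.0")
    if version is not None and is_affected(version):
        print("Affected LimeWire version:", ".".join(map(str, version)))
```

Comparing versions as integer tuples avoids the pitfalls of string comparison, where "4.10.0" would sort before "4.2.0".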
5.3 Future
By looking at the rootkits and worms we can predict a future case where a worm comes in by mimicking popular files on P2P clients. If a user unknowingly downloads and executes an infected file, the worm could edit registry entries or configuration files of P2P clients. During this research we took a closer look at the registry entries and configuration files of P2P clients; by editing these it is possible to share more folders and extensions than are shared by default. Since most hacker groups gain nothing from sharing somebody's files on public networks, it is likely that they will use a private BitTorrent tracker to share interesting files. This supports the conclusion that we need a mechanism to watch which files are shared by clients on the local network, since the default behaviour of the clients can easily be influenced.
6 Monitoring
In this section we discuss different monitoring strategies to list the files shared on a local network. Our approaches to monitoring are the use of existing software, passive monitoring with deep packet inspection (eDonkey) and active monitoring through the querying of clients (Gnutella). Throughout this section a basic network layout is assumed as a starting point. Figure 10 shows local peer-to-peer clients, an internet gateway with a listening server which sees all internet traffic, and several nodes on the internet. The function of these internet nodes will be explained in the course of this section.
no special features which can be used to list or scan (a range of) nodes.
The e-secure-it solution is a "Peer2Peer Monitoring and Surveillance Service" [12] based solely on the Gnutella network. As can be seen in figure 11, the open-source Gnutella client Phex is used to harvest all files shared on the Gnutella network and save them in a large database. Through a web-based system it is possible to search, or to automatically generate reports, on certain keywords. This service is meant as an early warning system for companies to detect whether internal or customer information is publicly available on the Gnutella network. According to Arjen Landgraaf the service is very labour intensive, because the system gives many false positives on keywords. When we asked why e-secure-it only harvests the Gnutella network, he would not give us an answer. This question particularly bothered us because we find the eDonkey clients more susceptible to data leakage.
The e-secure-it system has no built-in capability to scan (a range of) nodes for their active shared file list and therefore could not be of any help to us.
We will call those two methods passive and active detection respectively. Throughout this document the peers on the corporate network are referred to as clients, and the peers on the internet as peers.
6.2.1 eDonkey
The eDonkey network was originally created by the MetaMachine Corporation. On September 28 MetaMachine went out of business, after it lost a case against the RIAA (footnote 1). However, independently of MetaMachine, a closed-source but free eDonkey server, eserver, and a GPL-licenced client, eMule, were still actively developed. The eDonkey network survived and is still very lively today.
As a consequence, there is no official eDonkey maintainer or specification. It is as the Wikipedia entry on the eDonkey network states:
matter what server it is connected to. This makes High ID clients a stable factor in the eDonkey network. It increases the chance that a client which knows that a file is on a certain peer with a High ID can download it, or continue to download it, at a later moment. A Low ID is always a number lower than 16777216 (0x1000000) and varies per server.
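The distinction between the two kinds of ID can be expressed in a short Python 3 sketch. The limit of 16777216 comes from the text. The conversion between a High ID and an IP address assumes the usual eDonkey convention that a High ID is the IPv4 address read as a little-endian 32-bit integer. The function names are hypothetical.

```python
import socket
import struct

# Boundary between Low IDs and High IDs as given in the text (2**24).
LOW_ID_LIMIT = 16777216


def is_low_id(client_id: int) -> bool:
    """A client ID below the limit is a Low ID handed out by the server."""
    return client_id < LOW_ID_LIMIT


def ip_to_high_id(ip: str) -> int:
    """ASSUMPTION: a High ID is the IPv4 address read as little-endian."""
    return struct.unpack("<I", socket.inet_aton(ip))[0]


def high_id_to_ip(client_id: int) -> str:
    """Recover the dotted IPv4 address from a High ID."""
    if is_low_id(client_id):
        raise ValueError("a Low ID does not encode an IP address")
    return socket.inet_ntoa(struct.pack("<I", client_id))


if __name__ == "__main__":
    client_id = ip_to_high_id("145.100.104.196")
    print(client_id, is_low_id(client_id), high_id_to_ip(client_id))
```

Under this convention the last octet of the address ends up in the most significant byte of the ID, which is why any address with a non-zero last octet yields an ID of at least 16777216.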
This initial connection is a TCP connection and stays open throughout the time
the client is on the eDonkey network. This allows the server to contact a client
even if it does not allow incoming connections (and thus has a Low ID)
and through the server this initiative is available to all other peers. In the
figures this lifeline is drawn by a thick red line.
[Figure 13: The client offers its files to the server ("these files"), the server replies with a list of other servers ("these other servers"), and the client sends status requests to those servers.]
6.2.1.2 Offering files
Immediately after the initial connection, the client sends the server a list of the files it has to offer. This is a great opportunity to inventory the files offered by an eDonkey client on a corporate network, and we will investigate later how exactly this can be done. For now it suffices to say that this list consists of the filenames and the file hashes (MD4), which uniquely identify the content of the files. The file hashes enable peers to find more peers that offer a certain file, even if it has a different filename.
The server replies with a list of other servers it knows about. The client completes (if necessary) its own list of servers. The client does not connect to these other servers with a TCP session (as with the initial connection), but it does check whether they are alive by regularly sending status requests over UDP. See figure 13. If a server fails to answer those status requests a certain number of times, the server is considered dead and is removed from the client's server list.
6.2.1.3 eDonkey in action
The client is now ready to seek and fetch files. It queries its server, and the server replies with the available sources that match the search criteria. The user selects the sources they wish to download, and the client then asks the server which peers can offer these sources.
[Figure 14: The client asks its server "I'm looking for these files" and the server answers "You can find them here".]
This transaction is shortened to just one question and one reply in figure 14. In this figure the server tells the client that the sources are available at peers A, B and C. The diode symbols are used to indicate that peers A and B are unable to receive incoming connections. We will see the consequences of that in the next figure.
These transactions are not suitable for detecting outgoing files. The requests for files on the corporate clients are sent to a server which is not on the corporate network. Even when running an eDonkey server inside the corporate network, there is no way to force the clients to connect to this server.
It also seems that an eDonkey server has no insight into files shared by clients on other servers. Although we have not looked into this, the greatly varying number of files per server is an indication of this[20]. If a server did know about files on other servers, running your own server would be a way to determine what files are shared from your corporate network, because all files shared on the eDonkey network would be known to it.
In figure 15 we see the client actually fetching the file from the peers. Apparently it is a big file, because the client requests different parts of the file from different peers. Big files in eDonkey are split up into parts of 9.28 MB. Hashes are calculated over those parts by the peer offering the file, and provided to the server during its startup (see figure 13). Clients can then retrieve different parts of the file from different peers at the same time, to improve download speed.
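The splitting into parts can be illustrated with a small Python 3 sketch. It assumes that the 9.28 MB mentioned above corresponds to 9,728,000 bytes, the part size commonly documented for eDonkey. The helper names and the round-robin assignment of parts to peers are simplifications made up for this example.

```python
# ASSUMPTION: "9.28 MB" corresponds to 9,728,000 bytes per part.
PART_SIZE = 9728000


def part_ranges(file_size: int):
    """Return the (start, end) byte range of every part; end is exclusive."""
    return [(start, min(start + PART_SIZE, file_size))
            for start in range(0, file_size, PART_SIZE)]


def assign_parts(file_size: int, peers):
    """Spread the parts over the peers round-robin (hypothetical policy)."""
    return {index: peers[index % len(peers)]
            for index, _ in enumerate(part_ranges(file_size))}


if __name__ == "__main__":
    for index, (start, end) in enumerate(part_ranges(20000000)):
        print("part", index, "bytes", start, "to", end - 1)
    print(assign_parts(20000000, ["A", "B", "C"]))
```

A file of 20,000,000 bytes consists of three parts under this assumption, so it could be fetched from three peers at the same time.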
Peer C is connectable from the client, therefore the client fetches the part from that peer immediately. Because peer A and peer B are not directly connectable (they have a Low ID), the client has to ask the servers to which they are connected to ask the peers to call the client back. The servers can do this through the initial TCP connection from the peers (the lifeline), which stays open throughout the time the peers are on the eDonkey network. Our client is directly connectable from the internet (the client has a High ID), so the peers can connect to the client directly. Peer A and peer B initiate a TCP connection to the client, and the client sends the "fetch part" requests to those peers over those connections.
[Figure 15: The client fetches a part from peer C directly and asks the servers to have peers A and B call it back ("Ask A to call me", "Ask B to call me") before fetching their parts.]
It is clear that with this mechanism it is not possible for a peer with a Low ID to fetch files (or parts) from other peers with a Low ID. It used to be the case that two peers with a Low ID on the same server could still fetch files from each other via the server, but most servers no longer support this because of the overhead it imposes on the server. Exchanging files between two peers with a Low ID on different servers was never possible. This is one of the major restrictions a peer with a Low ID has.
The added value of this to our detection process is the following: if we know what files are shared by peers on our corporate network, and we know what hashes they are identified by, we can determine when a certain offered file is actually downloaded. It might be good to know that when a certain file containing sensitive information is offered, it has not actually been leaked by the time this fact is observed. And when it is actually leaked, measures can be taken to minimize the damage.
6.2.1.4 What do you share?
There is one other interesting feature in eDonkey that deserves our attention: one can ask a peer what files it shares (figure 16). Intercepting and interpreting the offered files during startup (see figure 13) is not that easy, as we will see later. It might be easier to just detect what eDonkey clients are on the corporate network and actively ask them what files they share. However, in the investigation of the different eDonkey client software (see section 3) it was not possible to list the shared files on
[Figure 16: The client asks peer C "What files do you share?" and the peer answers with its list of files.]
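[Figure 17: Structure of the "Offer files" message]
Byte 0: protocol, 0xE3 or 0xD4
Bytes 1-4: Message size (-5)
Byte 5: message type, 0x15
Bytes 6 and up: Data; its first four bytes (6-9) hold the Number of files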
6.2.1.5 The gory details
What does the "I have these files" message in figure 13 look like exactly? Figure 17 is a representation of the structure of this message. The squares with numbers above them are the indexed bytes of the packet. The first byte defines the protocol. 0xE3 means eDonkey protocol. 0xD4 means eMule compressed data. When this is the
first byte, the Data block should be decompressed (using the deflate algorithm, see RFC1951).
Bytes 1 to 4 contain the message size, excluding the first five bytes. Because the Data block starts at byte 6, the size of the Data block is the message size minus one. The message size is in little-endian format, which is not the conventional network byte order.
The sixth byte (at position 5) is the message type. 0x15 means this message is an "Offer files" message.
The (if necessary decompressed) Data block contains the summary of the offered files. The first four bytes contain the "Number of files" in the list (also in little-endian format). Following that are "Number of files" File entries.
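The layout described above translates into a short Python 3 parser. This is a sketch rather than a complete implementation. It follows the appendix script in using zlib's `decompress` for the 0xD4 case, and the constant and function names are our own.

```python
import struct
import zlib

PROTO_EDONKEY = 0xE3           # plain eDonkey protocol
PROTO_EMULE_COMPRESSED = 0xD4  # eMule compressed data
OP_OFFER_FILES = 0x15          # message type of "Offer files"


def parse_offer_files(message: bytes):
    """Return (number_of_files, file_entries) for one complete message."""
    if len(message) < 6:
        raise ValueError("message too short")
    # Byte 0: protocol, bytes 1-4: little-endian size, byte 5: type.
    protocol, size, msg_type = struct.unpack_from("<BIB", message, 0)
    if protocol not in (PROTO_EDONKEY, PROTO_EMULE_COMPRESSED):
        raise ValueError("not an eDonkey message")
    if msg_type != OP_OFFER_FILES:
        raise ValueError("not an Offer files message")
    # The size excludes the first five bytes, so it counts the type byte
    # plus the Data block: the Data block is size - 1 bytes long.
    data = message[6:5 + size]
    if len(data) != size - 1:
        raise ValueError("message is truncated")
    if protocol == PROTO_EMULE_COMPRESSED:
        data = zlib.decompress(data)
    (file_count,) = struct.unpack_from("<I", data, 0)
    return file_count, data[4:]
```

The function expects one complete message. Collecting a message that is spread over several TCP packets is a separate step.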
[Figure 18: Structure of a file entry: MD4 hash, Client ID, TCP Port, Tag block.]
A file entry, as shown in figure 18, consists of a 16 bytes long MD4 hash 'uniquely' identifying the content of the file, followed by the Client ID and TCP Port. When the client has a Low ID, the TCP Port is 0, otherwise it is the port on which the peer can receive connections. The IP address of a High ID client is deducible from the client ID, as we saw in paragraph 6.2.1.1 A hybrid network. Much like the Data block described above, the Tag block consists of 4 bytes specifying the number of tags, followed by those tags.
A tag has the format <Type><Name><Value>, with the Type in byte 0 and the Name in byte 1. The second byte is the name of the tag, represented by a number. A value of 0x01 would indicate that the Value is a filename; 0x02 that the Value is the filesize; 0x03 the file type (Audio, Video, Image, Doc, etc.). Many more of those names are predefined.
Figure 19: Tag types
Type nibble 1   Type nibble 2   Format of Value
0x8             0x9             Integer (1 byte)
0x8             0x8             Integer (2 bytes)
0x8             0x3             Integer (4 bytes)
0x8             0x2             Size (2 bytes), followed by String
0x9             Size            String
The first byte indicates the type of the value. This would be <String> in case of a filename and file type, and <Integer> in case of a filesize. A few more types are defined. Interestingly enough, we have seen two different kinds of <String> and three different kinds of <Integer>. Those types were not in the documents describing the protocol and we had to figure them out by reverse engineering.
Figure 19 is a table of the different tag types we have been able to identify. The Type byte is split into two nibbles. We have done this because of the especially quirky string type where nibble 1 has value 0x9. In that case the Size of the string is specified by the second nibble.
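Putting the file entry and the tag types together, a parser for one file entry could be sketched in Python 3 as follows. The field widths of the Client ID (4 bytes) and the TCP Port (2 bytes) are assumptions, since the text above only gives the width of the hash. All names in the code are hypothetical.

```python
import struct

# Tag names mentioned in the text; other names are shown as hex numbers.
TAG_NAMES = {0x01: "name", 0x02: "size", 0x03: "type"}


def parse_tag(data: bytes, offset: int):
    """Decode one tag at `offset`; return (name, value, next offset)."""
    tag_type, name_id = data[offset], data[offset + 1]
    offset += 2
    if tag_type & 0xF0 == 0x90:      # small string, size in second nibble
        size = tag_type & 0x0F
        value = data[offset:offset + size]
        offset += size
    elif tag_type == 0x82:           # string preceded by a 2 byte size
        (size,) = struct.unpack_from("<H", data, offset)
        value = data[offset + 2:offset + 2 + size]
        offset += 2 + size
    elif tag_type == 0x89:           # 1 byte integer
        value = data[offset]
        offset += 1
    elif tag_type == 0x88:           # 2 byte integer
        (value,) = struct.unpack_from("<H", data, offset)
        offset += 2
    elif tag_type == 0x83:           # 4 byte integer
        (value,) = struct.unpack_from("<I", data, offset)
        offset += 4
    else:
        raise ValueError("unknown tag type 0x%02X" % tag_type)
    return TAG_NAMES.get(name_id, "0x%02X" % name_id), value, offset


def parse_file_entry(data: bytes, offset: int = 0):
    """Decode one file entry; return (entry dict, next offset).

    ASSUMPTION: Client ID is 4 bytes and TCP Port is 2 bytes wide.
    """
    layout = "<16sIHI"  # hash, client ID, port, number of tags
    md4, client_id, port, tag_count = struct.unpack_from(layout, data, offset)
    offset += struct.calcsize(layout)
    tags = {}
    for _ in range(tag_count):
        name, value, offset = parse_tag(data, offset)
        tags[name] = value
    entry = {"hash": md4.hex(), "client_id": client_id,
             "port": port, "tags": tags}
    return entry, offset
```

Raising an error on an unknown tag type is a deliberate choice here: because the length of an unknown value is not known, everything after it in the message would be misread.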
6.2.1.6 Actually sniffing
There are a few things that have to be considered when actually sniffing for the "Offer files" message.
The message is in the middle of a TCP session. When sniffing a 'running' TCP session, one first has to make sure the packets are defragmented, so that we can interpret whole packets. Only then will we be able to 'see' the packet that starts with 0xE3 or 0xD4 and has 0x15 as its sixth byte.
The message size (bytes 1 to 4) indicates the size of the message. This might be greater than the maximum size for the data in a TCP packet. When the size is greater than the TCP packet data size, we have to collect more TCP packets and append their data to the message until it has the indicated size. To do this correctly we have to take the TCP sequence numbers into account. The TCP packets should be interpreted in the right order!
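The collecting of TCP packets in sequence order can be sketched as follows in Python 3. The class `StreamReassembler` is a hypothetical helper written for this illustration. It handles segments that arrive out of order, but not overlapping or retransmitted segments, which a real implementation would have to deal with as well.

```python
import struct


class StreamReassembler:
    """Rebuild one direction of a TCP stream from (seq, payload) pairs."""

    def __init__(self, initial_seq: int):
        self.next_seq = initial_seq  # sequence number we expect next
        self.pending = {}            # out-of-order segments keyed by seq
        self.stream = bytearray()    # contiguous bytes received so far

    def add_segment(self, seq: int, payload: bytes):
        """Store a segment and append everything that is now in order."""
        self.pending[seq] = payload
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.stream += chunk
            # TCP sequence numbers wrap around at 2**32.
            self.next_seq = (self.next_seq + len(chunk)) % 2 ** 32

    def pop_messages(self):
        """Cut complete eDonkey messages off the front of the stream."""
        messages = []
        while len(self.stream) >= 5:
            # Bytes 1-4 hold the size, excluding the first five bytes.
            (size,) = struct.unpack_from("<I", self.stream, 1)
            total = 5 + size
            if len(self.stream) < total:
                break  # wait for more segments
            messages.append(bytes(self.stream[:total]))
            del self.stream[:total]
        return messages
```

Each message returned by `pop_messages` has the indicated size and can then be checked for the protocol byte and the 0x15 message type.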
Snort is the de facto program for intrusion detection. It does this using
Lines 11 and 12 detect the byte values that identify the "Offer Files" message. Note that this would probably trigger a lot of false positives, and for all of those the complete TCP session will be logged to /var/log/ed2k.sessions.log. There is no way to configure Snort so that it logs only the complete "Offer files" message. We can either log the packet or log the complete TCP session. Snort cannot interpret the "Message size" bytes as a little-endian integer and read that much more of the packet.
The /var/log/ed2k.sessions.log file should later be interpreted by a program to extract the actual offered files and identify which clients offered them.
Snort is obviously not very suitable for this type of information extraction. Snort does offer an API to create your own modules. The best way to passively detect eDonkey "Offer Files" messages would probably be a Snort module written specifically for this purpose.
We did not do this in our research. Instead, we wrote a Python script which does the interpretation of the "Offer Files" message. By making use of pcapy (footnote 3), the Python bindings to the libpcap (footnote 2) library, we enabled the script to capture the message as well. This is easier and more convenient than interpreting Snort log files, but we have not taken packet defragmentation or TCP sequence numbers into account. We judged that unnecessary for a "Proof-Of-Concept" script. The script is in Appendix 7.1.
6.2.2 Gnutella
6.2.2.1 Pure P2P opens up a protocol
In contrast to eDonkey, Gnutella does not use central servers. It is a so-called Pure P2P protocol: the network consists of peers only. The eserver (eDonkey server software) was not open source because it would, according to the author, "be used to build fake servers"[26]. The author is referring here to the attempts by the recording industry to pollute the P2P networks with false files, to discourage the sharing of copyrighted material.
The fully distributed nature of Gnutella makes it less vulnerable to legal threats. This makes Gnutella very popular, and many open source clients have arisen using the protocol. It is to the advantage of those clients that the protocol is well defined. Unlike the eDonkey protocol, there is excellent up-to-date documentation on the workings of Gnutella[5]. The RFC is dated 2002 but, as described in the draft, the protocol can be extended through a documented extension mechanism. The recent development of new extensions and other bells and whistles is discussed in the "Gnutella Development Forum"[23]. An excellent overview and description of all the new
Footnote 2: http://www.tcpdump.org/
Footnote 3: http://oss.coresecurity.com/projects/pcapy.html
[Figure: Gnutella handshake between a servent and a peer. Texts in the figure: "GNUTELLA CONNECT/0.6", "User-Agent: LimeWire", "Listen-IP: 145.100.104.196:9253", "GNUTELLA/0.6 200 OK", "User-Agent: Bearshare".]
It is interesting to see what the initial connection looks like. Once the TCP connection is established, a handshake message is exchanged with the peer. The handshake looks a lot like an HTTP request. The initial message consists of lines of text. The first line is GNUTELLA CONNECT/0.6. After that follows a set of headers which describe the servent's capabilities. The headers follow the standards described in RFC822 and RFC2616.
The servent reveals on which IP address and port it listens for incoming connections in a Listen-IP header. This header was first introduced by the BearShare client on March 18 2002[22]. Although this header is not in the RFC, and thus not mandatory, all clients we have investigated so far provide this header during the handshake.
The Listen-IP header provides an easy way to inventory all the servents on the corporate network. When sniffing, we only have to look for GNUTELLA CONNECT as the first 16 characters, and then look further in the packet for the Listen-IP header. With that we have accomplished the first step in our quest (finding the P2P client); the only thing remaining is detecting what files this client is sharing.
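The detection step described above can be sketched in Python 3 as follows. The function name `parse_listen_ip` is hypothetical, and the sketch assumes the whole handshake fits in the payload of a single packet.

```python
def parse_listen_ip(payload: bytes):
    """Return (host, port) from a Gnutella handshake, or None.

    Returns None when the payload is not a handshake or when it has no
    usable Listen-IP header.
    """
    # "GNUTELLA CONNECT" is exactly 16 characters long.
    if not payload.startswith(b"GNUTELLA CONNECT"):
        return None
    for line in payload.split(b"\r\n")[1:]:
        name, separator, value = line.partition(b":")
        if separator and name.strip().lower() == b"listen-ip":
            try:
                text = value.strip().decode("ascii")
                host, _, port = text.rpartition(":")
                return host, int(port)
            except ValueError:
                return None  # malformed header value
    return None


if __name__ == "__main__":
    handshake = (b"GNUTELLA CONNECT/0.6\r\n"
                 b"User-Agent: LimeWire\r\n"
                 b"Listen-IP: 145.100.104.196:9253\r\n"
                 b"\r\n")
    print(parse_listen_ip(handshake))
```

Header names are compared case-insensitively, as in the appendix script, because clients may differ in capitalisation.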
[Figure: Browse Host request from the Active Scanner to the servent, and the servent's reply]
Request:
GET / HTTP/1.1
Host: 145.100.104.196
User-Agent: Active Scanner
Accept: text/html, application/x-gnutella-packets
Connection: close
Reply:
HTTP/1.1 200 OK
Server: LimeWire
Content-Type: application/x-gnutella-packets
Connection: close

data
7 Conclusion
Reviewing our research question from section 1.
• Protocol obfuscation
The eDonkey clients can optionally obfuscate the protocol. With obfuscation enabled, we were not able to detect and unpack file lists in the TCP packets. Is it still possible to detect file lists when protocol obfuscation is used?
• Protocol encryption
Over the last few years protocols have grown towards (semi-)decentralized networks. Although peer-to-peer systems can be used to share legal files, they are still mostly used to transfer copyrighted material. We expect that in the near future clients will also encrypt their connections. Would some form of packet inspection still be a viable way to detect the sharing of sensitive information?
References
[1] Bram Cohen
The BitTorrent Protocol Specification
http://bittorrent.org/beps/bep_0003.html
[3] TheoryOrg
BitTorrent Specification
http://wiki.theory.org/BitTorrentSpecification
[4] Wikipedia
Edonkey URI scheme
http://en.wikipedia.org/wiki/Ed2k_link
[7] www.bittorrent.com
Official website of the BitTorrent client.
http://www.bittorrent.com/
[9] Sophos
Sophos Analyses on W32/Tibick-E.
http://www.sophos.com/security/analyses/viruses-and-spyware/w32tibicke.html
#!/usr/bin/env python
#
# This script listens on a network interface
# for eDonkey2000 "Offer Files" messages and
# prints the offered files to screen in a
# human readable form.
#
# It also detects a Gnutella initial handshake.
# If it sees one, it actively queries the servent
# for the files it shares.
#
# Written for Python 2 (packet data is handled as str).

import socket

import pcapy
import impacket
from impacket.ImpactDecoder import EthDecoder
from zlib import decompress


def int2(data):
    """
    int2 converts a little-endian formatted
    2 byte integer to a Python integer.
    """
    return ord(data[0]) | (ord(data[1]) << 8)


def int4(data):
    """
    int4 converts a little-endian formatted
    4 byte integer to a Python integer.
    """
    return ord(data[0]) | (ord(data[1]) << 8) | \
           (ord(data[2]) << 16) | (ord(data[3]) << 24)


# TAGNAMES is a tuple of all the 256 possible tag names;
# only the first 8 are actually named.
TAGNAMES = ('',
            'name', 'size', 'type', 'format',
            'collection', 'part path', 'part hash', 'copied') + \
           ('???',) * 247


def tags(ntags, data):
    """
    tags reads <ntags> tags from <data> and returns the
    remaining data, followed by a dict containing the tag
    values keyed by their tag name.
    """
    # The dict to return
    result = dict()
    while ntags:
        if not data:
            break
        tag_type = ord(data[0])
        name = TAGNAMES[ord(data[1])]
        data = data[2:]

        if (tag_type & 0xF0) == 0x90:     # Small string type:
            size = tag_type & 0x0F        # the second nibble
            result[name] = data[:size]    # contains the string's
            data = data[size:]            # size.

        elif tag_type == 0x82:            # Big string type:
            size = int2(data[:2])         # the first two
            data = data[2:]               # bytes of value
            result[name] = data[:size]    # contain the string's
            data = data[size:]            # size.

        elif tag_type == 0x89:            # 1 byte integer type
            result[name] = ord(data[0])
            data = data[1:]

        elif tag_type == 0x88:            # 2 byte integer type
            result[name] = int2(data[:2])
            data = data[2:]

        elif tag_type == 0x83:            # 4 byte integer type
            result[name] = int4(data[:4])
            data = data[4:]

        ntags -= 1
    return data, result


def print_offer_files(offer_files_packet_data):
    """
    print_offer_files takes the sniffed packet data which is
    assumed to be an "Offer Files" message. It walks through
    the offered files, retrieves the tags associated with them
    and prints them to screen.
    """
    size = int4(offer_files_packet_data[1:5]) - 1
    message = offer_files_packet_data[6:]

    # The packet data is a str, so compare against a one-character
    # string rather than an integer.
    if offer_files_packet_data[0] == '\xD4':  # Is the file
# [...]
    ip = eth.child()
    tcp = ip.child()
    data = tcp.get_data_as_string()

    # Detect the "Offer Files" message and print
    # it to screen.
    #
    if len(data) > 5 and data[5] == '\x15':
        if data[0] in ('\xE3', '\xD4'):
            print_offer_files(data)

    # Detect a Gnutella handshake
    #
    elif data.startswith("GNUTELLA CONNECT"):
        # It could be interesting to see the Handshake
        # initial connection message.
        #
        # print data

        # Retrieve the IP address and the port it listens
        # on for incoming connections.
        # It is the value of the Listen-IP header.
        #
        HOST, PORT = [
            ' '.join(x.split(':')[1:]).strip().split()
            for x in data.split('\n')
            if x.lower().startswith('listen-ip:')
        ][0]
        PORT = int(PORT)

        # Send the "Browse Host Extension" message to the
        # servent.
        #
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.connect((HOST, PORT))
        except socket.error:
            print 'Could not connect to', HOST, PORT
            continue
        to_send = """GET / HTTP/1.1\r
Host: %s\r
User-Agent: Limewire x.y.z Pro\r
Accept: text/html, application/x-gnutella-packets\r
Connection: close\r
\r
""" % HOST
