
Software Engineering Institute
Carnegie Mellon University
Pittsburgh, PA 15213-3890

Using SiLK for Network Traffic Analysis
ANALYSTS' HANDBOOK


for SiLK versions 2.1.0 and later

Timothy Shimeall
Sidney Faber
Markus DeShon
Andrew Kompanek

September 2010

CERT® Network Situational Awareness Group

This work is sponsored by the U.S. Department of Defense. The Software Engineering Institute is a federally funded research and development center sponsored by the U.S. Department of Defense.

Copyright 2005-2010 Carnegie Mellon University.

NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN AS-IS BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

Use of any trademarks in this report is not intended in any way to infringe on the rights of the trademark holder.

The authors wish to acknowledge the valuable contributions of all members of the CERT Network Situational Awareness Team, past and present, to the concept and execution of the SiLK Tool Suite and to this handbook. Many individuals contributed as reviewers and evaluators of the material in this handbook. Of especial mention are Michael Collins, Ph.D., who was responsible for the initial draft of this handbook and for the development of the earliest versions of the SiLK tool suite, and Mark Thomas, Ph.D., who transitioned the handbook from Microsoft Word to LaTeX, patiently and tirelessly answered many technical questions from the authors, and shepherded the maturing of the SiLK tool suite. The many users of the SiLK tool suite have also contributed immensely to the evolution of the suite and its tools, and are gratefully acknowledged. Lastly, the authors wish to acknowledge their ongoing debt to the memory of Suresh L. Konda, Ph.D., who led the initial concept and development of the SiLK tool suite as a means of gaining network situational awareness.


Contents

Handbook Goals

1 Networking Primer and Review of UNIX Skills
   1.1 TCP/IP Networking Primer
       1.1.1 IP Protocol Layers
       1.1.2 Structure of the IP Header
       1.1.3 IP Addressing and Routing
       1.1.4 Major Protocols
   1.2 Review of UNIX Skills
       1.2.1 Using the UNIX Command Line
       1.2.2 Using Pipes

2 The SiLK Flow Repository
   2.1 What Is Network Flow Data?
       2.1.1 Structure of a Flow Record
   2.2 Flow Generation and Collection
   2.3 Introduction to Flow Collection
       2.3.1 Where Network Flow Data Is Collected
       2.3.2 Types of Enterprise Network Traffic
       2.3.3 The Collection System and Data Management
       2.3.4 How Network-Flow Data Is Organized
   2.4 SiLK Support

3 Essential SiLK Tools
   3.1 Suite Introduction
   3.2 Selecting Records with rwfilter
       3.2.1 rwfilter Parameters
       3.2.2 Finding Low-Packet Flows with rwfilter
       3.2.3 Using IPv6 with rwfilter
       3.2.4 Using Pipes with rwfilter
       3.2.5 Translating Signatures Into rwfilter Calls
       3.2.6 rwfilter and Tuple Files
   3.3 Describing Flows with rwstats
   3.4 Creating Time Series with rwcount
       3.4.1 Examining Traffic Over a Month
       3.4.2 Counting by Bytes, Packets, and Flows
       3.4.3 Changing the Format of Data
       3.4.4 Using the --load-scheme Parameter for Different Approximations
   3.5 Displaying Flow Records Using rwcut
       3.5.1 Pagination
       3.5.2 Selecting Fields to Display
       3.5.3 Selecting Fields for Performance
       3.5.4 Rearranging Fields for Clarity
       3.5.5 Field Formatting
       3.5.6 Selecting Records to Display
   3.6 Sorting Flow Records With rwsort
       3.6.1 Behavioral Analysis with rwsort, rwcut, and rwfilter
   3.7 Counting Flows With rwuniq
       3.7.1 Using Thresholds with rwuniq
       3.7.2 Counting IPv6 Flows
       3.7.3 Counting on Compound Keys
       3.7.4 Using rwuniq to Isolate Behavior

4 Using the Larger SiLK Tool Suite
   4.1 Common Tool Behavior
       4.1.1 Structure of a Typical Command-Line Invocation
       4.1.2 Getting Tool Help
   4.2 Manipulating Flow-Record Files
       4.2.1 Combining Flow Record Files with rwcat and rwappend
       4.2.2 Merging While Removing Duplicate Flow Records with rwdedupe
       4.2.3 Dividing Flow Record Files with rwsplit
       4.2.4 Keeping Track of File Characteristics with rwfileinfo
       4.2.5 Creating Flow-Record Files from Text with rwtuc
   4.3 Analyzing Packet Data with rwptoflow and rwpmatch
       4.3.1 Creating Flows from Packets Using rwptoflow
       4.3.2 Matching Flow Records With Packet Data Using rwpmatch
   4.4 IP Masking with rwnetmask
   4.5 Summarizing Traffic with IP Sets
       4.5.1 What are IP Sets?
       4.5.2 Creating IP Sets with rwset
       4.5.3 Reading Sets with rwsetcat
       4.5.4 Manipulating Sets with rwsettool
       4.5.5 Using rwsettool --intersect to Fine-Tune IP Sets
       4.5.6 Using rwsettool --union to Examine IP Set Structure
       4.5.7 Backdoor Analysis with IP Sets
   4.6 Summarizing Traffic with Bags
       4.6.1 What Are Bags?
       4.6.2 Using rwbag to Generate Bags from Data
       4.6.3 Reading Bags Using rwbagcat
       4.6.4 Using Bags: A Scanning Example
       4.6.5 Manipulating Bags Using rwbagtool
   4.7 Labeling Related Flows with rwgroup and rwmatch
       4.7.1 Labeling Based on Common Attributes with rwgroup
       4.7.2 Labeling Matched Groups with rwmatch
   4.8 Adding IP Attributes with Prefix Maps
       4.8.1 What are Prefix Maps?
       4.8.2 Creating a Prefix Map
       4.8.3 Selecting Flow Records with rwfilter and Prefix Maps
       4.8.4 Working with Prefix Values Using rwcut and rwuniq
       4.8.5 Using a Country-Code Mapping via rwip2cc
       4.8.6 Where to Go for More Information on Prefix Maps
   4.9 Gaining More Features with Plug-Ins

5 Using PySiLK For Advanced Analysis
   5.1 rwfilter and PySiLK
   5.2 rwcut, rwsort, and PySiLK

6 Closing

List of Figures

1.1 IP Protocol Layers
1.2 Structure of the IP Header
1.3 TCP Header
1.4 TCP State Machine
1.5 UDP and ICMP Headers
2.1 From Packets to Flows
2.2 Default Traffic Type for Sensors
3.1 rwfilter Parameter Relationships
3.2 rwfilter Partitioning Parameters
3.3 Summary of rwstats
3.4 Summary of rwcount
3.5 Displaying rwcount Output Using gnuplot
3.6 Focusing gnuplot Output on a Single Hour
3.7 Improved gnuplot Output Based on a Larger Bin Size
3.8 Comparison of Byte and Record Counts over Time
3.9 Differences Between Load Schemes
3.10 Summary of rwcut
3.11 Summary of rwsort
3.12 Summary of rwuniq
4.1 Summary of rwcat
4.2 Summary of rwappend
4.3 One Display of Large Volume Flows
4.4 Another Display of Large Volume Flows
4.5 Summary of rwdedupe
4.6 Summary of rwsplit
4.7 Summary of rwptoflow
4.8 Summary of rwpmatch
4.9 Summary of rwset
4.10 Summary of rwsetcat
4.11 Summary of rwsettool
4.12 Graph of Hourly Source IP Address Set Growth
4.13 Summary of rwbag
4.14 Summary of rwbagcat
4.15 Summary of rwbagtool
4.16 Summary of rwgroup
4.17 Summary of rwmatch
4.18 Summary of rwpmapbuild
4.19 Summary of rwip2cc

List of Tables

1.1 IPv4 Reserved Addresses
1.2 IPv6 Reserved Addresses
1.3 Some Common UNIX Commands
3.1 rwfilter Input Parameters
3.2 rwfilter Selection Parameters
3.3 Commonly-Used rwfilter Partitioning Parameters
3.4 rwfilter Output Parameters
3.5 Other Parameters
3.6 Arguments for the --fields Parameter
4.1 Current SiLK Plug-ins

List of Examples

1-1 A UNIX Command Prompt
1-2 Example Using Common UNIX Commands
1-3 A Simple Command Line
1-4 A Simple Piped Command
1-5 Using a Named Pipe
2-1 Using mapsid to Obtain a List of Sensors
3-1 Using rwfilter to Count Traffic to an External Network
3-2 Using rwfilter to Extract Low-Packet Flow Records
3-3 Using rwfilter to Process IPv6 Flows
3-4 Using rwfilter to Detect IPv6 Neighbor Discovery Flows
3-5 rwfilter --pass and --fail to Partition Fast and Slow High-Volume Flows
3-6 rwfilter With a Tuple File
3-7 Using rwstats To Count Protocols and Ports
3-8 rwstats --sport --percentage to Profile Source Ports
3-9 rwstats --dport --top --count to Examine Destination Ports
3-10 rwstats --copy-input and --output-path to Chain Calls
3-11 rwcount for Counting with Respect to Time Bins
3-12 rwcount Sending Results to Disk
3-13 rwcount --bin-size to Better Scope Data for Graphing
3-14 rwcount Alternate Date Formats
3-15 rwcount --start-epoch to Constrain Minimum Date
3-16 rwcount Alternative Load Schemes
3-17 rwcut for Displaying the Contents of a File
3-18 rwcut Used With rwfilter
3-19 SILK_PAGER With the Empty String to Disable rwcut Paging
3-20 rwcut --pager to Disable Paging
3-21 rwcut Performance With Default --fields
3-22 rwcut --fields to Improve Efficiency
3-23 rwcut --fields to Rearrange Output
3-24 rwcut ICMP Type and Code as dport
3-25 rwcut --icmp Parameter and Fields to Display ICMP Type and Code
3-26 rwcut --delim to Change the Delimiter
3-27 rwcut --no-title to Suppress Field Headers in Output
3-28 rwcut --num-recs to Constrain Output
3-29 rwcut --num-recs and Title Line
3-30 rwcut --start-rec to Select Records to Display
3-31 rwcut --start-rec, --end-rec, and --num-recs Combined
3-32 rwuniq for Counting in Terms of a Single Field
3-33 rwuniq --flows for Constraining Counts to a Threshold
3-34 rwuniq --bytes and --packets with Minimum Flow Threshold
3-35 rwuniq --flows and --packets to Constrain Flow and Packet Counts
3-36 Using rwuniq to Detect IPv6 PMTU Throttling
3-37 rwuniq --field to Count with Respect to Combinations of Fields
3-38 Using rwuniq to Isolate Email and Non-Email Behavior
4-1 A Typical Sequence of Commands
4-2 Using --help and --version
4-3 rwcat for Combining Flow-Record Files
4-4 rwdedupe for Removing Duplicate Records
4-5 Using rwsplit for Coarsely Parallel Analysis
4-6 Using rwsplit to Generate Statistics on Flow-Record Files
4-7 rwfileinfo for Display of Data File Characteristics
4-8 rwfileinfo for Showing Command History
4-9 rwtuc for Simple File Cleansing
4-10 rwptoflow for Simple Packet Conversion
4-11 rwptoflow and rwpmatch for Filtering Packets Using an IP Set
4-12 rwnetmask for Abstracting Source IPs
4-13 rwset for Generating a Set File
4-14 rwsetcat to Display IP Sets
4-15 rwsetcat --count-ip, --print-stat, and --network-description for Showing Structure
4-16 rwsetbuild for Generating IP Sets
4-17 rwsettool --intersect and --difference
4-18 rwsettool --union
4-19 rwsetmember to Test for an Address
4-20 Using rwset to Filter for a Set of Scanners
4-21 A Script for Generating Hourly Sets
4-22 Counting Hourly Set Records
4-23 rwsetbuild for Building an Address Space IP Set
4-24 Backdoor Filtering Based on Address Space
4-25 rwbag for Generating Bags
4-26 rwbagcat for Displaying Bags
4-27 rwbagcat --mincount, --maxcount, --minkey and --maxkey to Filter Results
4-28 rwbagcat --bin-ips to Display Unique IPs Per Value
4-29 rwbagcat --integer-keys
4-30 Using rwbag to Filter Out a Set of Scanners
4-31 rwbagtool --add
4-32 rwbagtool --intersect
4-33 rwbagtool Combining Threshold with Set Intersection
4-34 rwbagtool --coverset
4-35 rwgroup to Group Flows of a Long Session
4-36 rwgroup --rec-threshold to Drop Trivial Groups
4-37 rwgroup --summarize
4-38 Using rwgroup to Identify Specific Sessions
4-39 rwmatch With Incomplete ID Values
4-40 rwmatch With Full TCP Fields
4-41 rwmatch for Mating TCP Sessions
4-42 rwmatch for Mating Traceroutes
4-43 rwpmapbuild to Create a Spyware Pmap File
4-44 rwfilter --pmap-saddress
4-45 rwcut --pmap-file and sval Field
4-46 Using rwsort to Sort Flow Records Associated with Types of Spyware
4-47 Using rwuniq to Count the Number of Flows Associated With Specific Types of Spyware
4-48 rwip2cc for Looking Up Country Codes
4-49 rwcut --plugin=cutmatch.so to Use a Plug-in
5-1 ThreeOrMore.py: Using PySiLK for Memory in rwfilter Partitioning
5-2 Calling ThreeOrMore.py
5-3 vpn.py: Using PySiLK with rwfilter for Partitioning Alternatives
5-4 matchblock.py: Using PySiLK with rwfilter for Structured Conditions
5-5 Calling matchblock.py
5-6 delta.py: Using PySiLK with rwcut to Display Combined Fields
5-7 Calling delta.py
5-8 payload.py: Using PySiLK for Conditional Fields With rwsort and rwcut
5-9 Calling payload.py

Handbook Goals
This analysts' handbook is intended to provide a tutorial introduction to network traffic analysis using the System for Internet-Level Knowledge (or SiLK) tool suite (http://tools.netsa.cert.org/silk/) for acquisition and analysis of network flow data. The SiLK tool suite is a highly-scalable flow-data capture and analysis system developed by the Network Situational Awareness group (NetSA) at Carnegie Mellon University's Software Engineering Institute (SEI). SiLK tools provide network security analysts with the means to understand, query, and summarize both recent and historical traffic data represented as network flow records. The SiLK tools provide network security analysts with a relatively complete high-level view of traffic across an enterprise network, subject to placement of sensors.

Analysis using the SiLK tools has lent insight into various aspects of network behavior. Some example applications of this tool suite include (but are not limited to):

- Support for network forensics, identifying artifacts of intrusions, vulnerability exploits, worm behavior, etc.
- Providing service inventories for large and dynamic networks (on the order of a CIDR/8 block).
- Generating profiles of network usage (bandwidth consumption) based on protocols and common communication patterns.
- Enabling non-signature-based scan detection and worm detection, for detection of limited-release malicious software and for identification of precursors.

By providing a common basis for these various analyses, the tools provide a framework on which network situational awareness may be developed. Common questions addressed via flow analyses include (but aren't limited to):

- What's on my network?
- What happened before the event?
- Where are policy violations occurring?
- What are the most popular web sites?
- How much volume would be reduced by applying a blacklist?
- Do my users browse to known infected web servers?
- Do I have a spammer on my network?
- When did my web server stop responding to queries?
- Am I routing undesired traffic?
- Who uses my public DNS server?

This handbook contains five chapters:

1. The Networking Primer and Review of UNIX Skills provides a very brief overview of some of the background necessary to begin using the SiLK tools for analysis. It includes a brief introduction to Transmission Control Protocol/Internet Protocol (TCP/IP) networking and covers some of the UNIX command-line skills required to use the SiLK analysis tools.

2. The SiLK Network Flow Repository describes the structure of netflow data, how netflow traffic data is collected from the enterprise network, and how it is organized.

3. Essential SiLK Tools describes how to use the SiLK tools for common tasks including data access, display, simple counting, and statistical description.

4. Traffic Analysis Using the SiLK Tool Suite builds on the previous chapter and covers use of other SiLK tools for data analysis, including manipulating flow record files, packet-level analysis, and working with aggregates of flows and of IP addresses.

5. Using PySiLK For Advanced Analysis discusses how analysts can use the PySiLK scripting capabilities to facilitate more complex analyses efficiently.

This analysts' handbook is intended to be tutorial in nature, but it is not an exhaustive description of all options (or even all tools) in the SiLK tool suite. A more complete description (but less tutorial material) can be found in The SiLK Reference Guide (http://tools.netsa.cert.org/silk/reference-guide.html) or in the output resulting from using the --help or --man parameters with the various tools. The handbook deals solely with the analysis of network flow record data using an existing installation of the SiLK tool suite. For information on installing and configuring a new SiLK tool suite and on the collection of network flow record data for use in these analyses, the reader should consult the SiLK Installation Handbook (http://tools.netsa.cert.org/silk/install-handbook.pdf).

Chapter 1

Networking Primer and Review of UNIX Skills


This chapter of the handbook provides a review of basic topics in Transmission Control Protocol/Internet Protocol (TCP/IP) and UNIX operation. It is not intended as a comprehensive summary of these topics, but it will help to refresh your knowledge and prepare you for using the SiLK tools for analysis. Upon completion of this chapter you will be able to:

- describe the structure of IP packets, and the relationship between the protocols that comprise the IP suite
- explain the mechanics of TCP, such as the TCP state machine and TCP flags
- use basic UNIX tools

1.1

TCP/IP Networking Primer

This section provides an overview of the IP networking suite. IP, sometimes called TCP/IP, is the foundation of Internetworking. All packets analyzed by the SiLK system use protocols supported by the IP suite. These protocols behave in a well-defined manner, and one of the primary signs of a security breach can be a deviation from accepted behavior. In this section, you will learn about what is specified as accepted behavior. While there are common deviations from the specified behavior, knowing what is specified forms a base for further knowledge. This section is a refresher; the IP suite is a complex collection of more than 50 protocols, and it comprises far more information than can be covered in this section. There are a number of on-line documents and printed books that provide other resources on TCP/IP to further your understanding of the IP suite.

1.1.1

IP Protocol Layers

Figure 1.1 shows a basic breakdown of the protocol layers in IP. If you're familiar with the Open Systems Interconnection (OSI) seven-layer model, you will notice that this diagram is slightly different. IP predates the OSI model, and the correspondence between them is not exact.

Figure 1.1: IP Protocol Layers

As Figure 1.1 shows, IP is broken into five layers. The lowest layer, Hardware, covers the physical connections between machines: plugs, electronic pulses, and so on. The next layer is the Link layer, and it refers to the network transport protocol, such as Synchronous Optical Networks (SONET), Ethernet, Asynchronous Transfer Mode (ATM), or Fiber Distributed Data Interface (FDDI). The third layer is the Internet layer, which is the first layer at which IP affects the passing of data. This layered representation leads to terminology such as IP over ATM or IP over SONET.

The Link layer imposes several constraints on the Internet layer. The most relevant from an analysis perspective is the maximum transmission unit (MTU). The MTU imposes an absolute limit on the number of bytes that can be transferred in a single frame and, therefore, a limit on datagram and packet size. The vast majority of enterprise network data is transferred over Ethernet at some point, leading to an effective MTU of 1500 bytes.

The layer above Internet, Transport, refers to the transport protocol, such as TCP, Internet Control Message Protocol (ICMP), or User Datagram Protocol (UDP). These three transport protocols comprise the bulk of traffic crossing most enterprise networks. The final layer, Application, refers to the service supported by the protocol. For example, Web traffic is an HTTP application running on a TCP transport over IP over an Ethernet network.
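The MTU shows up in flow analysis because the ratio of bytes to packets in a flow approximates the average packet size, and flows whose packets are close to the 1500-byte Ethernet MTU usually represent bulk data transfers. The sketch below is illustrative rather than definitive: it assumes rwfilter's --bytes-per-packet partitioning parameter (rwfilter is covered in Chapter 3), and the dates and file names are placeholders.

# Sketch: pull inbound TCP flows whose average packet size is near the
# 1500-byte Ethernet MTU (typical of bulk transfers that fill each frame).
<1>$ rwfilter --start-date=2010/08/09:00 --end-date=2010/08/09:23 \
        --type=in --proto=6 --bytes-per-packet=1400-1500 \
        --pass=bulk-transfers.rw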

1.1.2

Structure of the IP Header

IP passes collections of data as datagrams. Figure 1.2 shows the breakdown of IP datagrams. Fields that are not recorded by the SiLK data collection tools are grayed out.

1.1.3

IP Addressing and Routing

IP can be thought of as a very-high-speed postal service. If someone in Pittsburgh sends a letter to someone in New York, the letter passes through a sequence of postal workers. The postal worker who touches the mail may be different every time a letter is sent, and the only important address is the destination. Also, there is no reason that New York has to respond to Pittsburgh, and if it does, the sequence of postal workers could be completely different. IP operates in the same fashion: there is a set of routers between various sites, and packets are sent to the routers the same way that the postal system passes letters back and forth. There is no requirement that the set of routers used to pass data in must be the same as the set used to pass data out, and the routers can change at any time. Most importantly, the only IP address that must be valid in an IP connection is the destination address. IP itself does not require a valid source address, but other protocols (e.g., TCP) cannot complete without a valid source and destination address because the source needs to receive the acknowledgment packets to complete a connection. (However, there are numerous examples of intruders using incomplete connections for malicious purposes.)

Figure 1.2: Structure of the IP Header

Structure of an IP Address

The Internet has space for approximately 4 billion unique IP version 4 (IPv4) addresses. While these IP addresses can be represented as 32-bit integers, they are generally represented as sets of four decimal integers, for example, 128.2.118.3, where each integer is a number between 0 and 255.

IPv4 addresses and ranges of addresses can also be referred to using CIDR blocks. CIDR, short for Classless Inter-Domain Routing, is a standard for grouping together addresses for routing purposes. When an entity purchases Internet Protocol address space from the relevant authorities, that entity buys a routing block, which is used to direct packets to their network. CIDR blocks are usually written in a dot/mask notation, where the dot value is the type of dotted set described above, and the mask is the number of fixed bits in the address. For example, 128.2.0.0/16 would refer to all IP addresses from 128.2.0.0 to 128.2.255.255. CIDR sizes range from 0 (the whole address is a network)¹ to 32 (the whole address is a host).

With the introduction of IP version 6 (IPv6), all of this is changing. IPv6 addresses are 128 bits in length, for a staggering 4 × 10^38 (400 undecillion) possible addresses. IPv6 addresses are represented as sets of eight hexadecimal (base 16) integers, for example:

FEDC:BA98:7654:3210:FEDC:BA98:7654:3210

Each integer is a number between 0 and FFFF (the hexadecimal equivalent of decimal 65535). The address space for IPv6 is so large that the designers anticipated addresses containing strings of 0 values, so they defined a shorthand of :: that can be used once in each address to represent a string of zeros. The address FEDC::3210 is therefore equivalent to:

FEDC:0:0:0:0:0:0:3210

The routing methods for IPv6 addresses are beyond the scope of this handbook; see RFC 4291 (http://www.ietf.org/rfc/rfc4291.txt) for a description. CIDR blocks are still used with IPv6 addresses, as these addresses have no predefined classes in the protocol. CIDR sizes can range between 0 and 128 in IPv6 addresses.

In SiLK, the support for IPv6 is controlled by configuration. If you need to use IPv6 addresses, check with the person responsible for maintaining your data repository as to the support available.

Reserved IP Addresses

While IPv4 has approximately 4 billion addresses available, large segments of IP space are reserved for the maintenance and upkeep of the Internet. Various authoritative sources provide lists of the segments of IP space that are reserved. One notable reservation list is maintained by the Internet Assigned Numbers Authority (IANA) at http://www.iana.org/assignments/ipv4-address-space. IANA also keeps a list of IPv6 reservations at http://www.iana.org/assignments/ipv6-address-space. In addition to this list, the Internet Engineering Task Force (IETF) maintains several request for comments (RFC) documents that specify other reserved spaces. The majority of these spaces are listed in RFC 3330, Special Use IPv4 Addresses, at http://www.ietf.org/rfc/rfc3330.txt. Table 1.1 summarizes major IPv4 reserved spaces. IPv6 reserved spaces are shown in Table 1.2.

In general, private space (in IPv4, 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16), auto-configuration (169.254.0.0/16), and loopback (127.0.0.0/8) destination IP addresses should not be routed across network borders. Consequently, the appearance of these address spaces at routers indicates a failure of routing policy. Similarly, traffic should not come into the enterprise network from these address spaces; the Internet as a whole should not route that traffic to the enterprise network.
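A hedged sketch of how an analyst might check for such routing-policy failures follows. It relies on rwsetbuild and rwfilter's --sipset parameter, tools and options that are discussed in detail later in this handbook (Sections 4.5 and 3.2); the dates and file names are illustrative assumptions, not a prescribed procedure.

# Build an IP set from the IPv4 private blocks (one address block per line) ...
<1>$ cat > private.txt <<END
10.0.0.0/8
172.16.0.0/12
192.168.0.0/16
END
<2>$ rwsetbuild private.txt private.set
# ... then look for inbound traffic claiming a private source address,
# which should not be arriving from outside the border at all.
<3>$ rwfilter --start-date=2010/08/09:00 --end-date=2010/08/09:23 \
        --type=in,inweb --sipset=private.set --pass=leaked.rw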

1.1.4

Major Protocols

Transmission Control Protocol (TCP)

TCP is the most commonly encountered protocol on the Internet. TCP is a stream-based protocol that reliably transmits data from the source to the destination. To maintain this reliability, TCP is very complex: the protocol is very slow and requires a large commitment of resources.
¹CIDR/0 addresses are used almost exclusively for empty routing tables, and are not accepted by the SiLK tools. This effectively means the range for CIDR blocks is 1-32 for IPv4 data.

Table 1.1: IPv4 Reserved Addresses

Space              Reason
0.0.0.0/8          Current Network (self-reference) addresses
10.0.0.0/8         Reserved for private networks
127.0.0.0/8        Loopback (self-address) addresses
169.254.0.0/16     Autoconfiguration (address unavailable) addresses
172.16.0.0/12      Reserved for private networks
192.0.2.0/24       Reserved for Documentation (example.com or example.net)
192.88.99.0/24     6to4 Relay Anycast Prefix (border between IPv6 and IPv4)
192.168.0.0/16     Reserved for private networks
198.18.0.0/18      Reserved for Router Input Ports
198.19.0.0/18      Reserved for Router Output Ports
224.0.0.0/4        Multicast Addresses
240.0.0.0/4        Future Use address
255.255.255.255    Limited Broadcast Address

Table 1.2: IPv6 Reserved Addresses

Space                Reason
0::0                 Unspecified Address
0::1                 Loopback Address
FC01::0/16           Reserved for Local Addresses
FC00::0/16           Reserved for Future Local Addresses
FE80::0/64           Reserved for Link-Local Addresses
FF01-FF0F::0/16      Reserved Multicast Addresses

Figure 1.3 shows a breakdown of the TCP header. A TCP header adds 20 additional bytes to the IP header. Consequently, TCP packets will always be at least 40 bytes long. As the shaded portions of Figure 1.3 show, most of the TCP header information is not retained in SiLK flow records.

Figure 1.3: TCP Header

TCP is built on top of an unreliable infrastructure. IP assumes that packets can be lost without a problem, and that responsibility for managing packet loss is incumbent on services at higher layers. TCP, which provides ordered and reliable streams on top of this unreliable packet-passing model, implements this feature through a complex state machine as shown in Figure 1.4. The transitions in this state machine are described by stimulus / action format labels, where the top value is the stimulating event and the bottom values are actions taken prior to entry into the destination state. Where no action takes place, an x is used to indicate explicit inaction.

We will not thoroughly describe the state machine in this handbook, but we do want to emphasize that because of TCP's requirements, flows representing well-behaved TCP sessions will behave in certain ways. For example, a flow for a complete TCP session must have at least four packets: one packet that sets up the connection, one packet that contains the data, one packet that terminates the session, and one packet acknowledging the other side's termination of the session². TCP behavior that deviates from this provides indicators that can be used by an analyst. An intruder may send packets with odd TCP flag combinations as part of a scan (e.g., with all flags set on). Different operating systems handle protocol violations differently, so odd packets can be used to elicit information that identifies the operating system in use.

TCP Flags

TCP uses flags to transmit state information among participants. There are six commonly used flags:

- SYN: Short for synchronize, the SYN flag is sent at the beginning of a session to establish initial sequence numbers. Each side sends one SYN packet at the beginning of a session.
²It is technically possible for there to be a valid 3-packet complete TCP flow: one SYN packet, one SYN-ACK packet containing the data, and one RST packet terminating the flow. This is a very rare circumstance; most complete TCP flows have more than four packets.

Figure 1.4: TCP State Machine

- ACK: Short for acknowledge, ACK flags are sent in almost all TCP connections and are used to indicate that a previously sent packet has been received.

- FIN: Short for finalize, the FIN flag is used to terminate a session. When a packet with the FIN flag is sent, the target of the FIN flag cleanly terminates the TCP session.

- RST: Short for reset, the RST flag is sent to indicate that a session is incorrect and should be terminated. When a target receives a RST flag, it terminates immediately. Some stacks terminate sessions using RST instead of the more proper FIN sequence.

- PSH: Short for push, the PSH flag was formerly used to inform a receiver that the data sent in the packet should immediately be sent to the target application (i.e., the sender has completed this particular send). The PSH flag is largely obsolete, but it still commonly appears in TCP traffic.

- URG: Short for urgent data, the URG flag is used to indicate that urgent data (such as a signal from the sending application) is in the buffer and should be used first. Tricks with URG flags can be used to fool IDS systems.

Reviewing the state machine will show that most state transitions are handled through the use of SYN, ACK, FIN, and RST. The PSH and URG flags are less directly relevant. There are two other rarely used flags: ECE (Explicit Congestion Notification Echo) and CWR (Congestion Window Reduced). Neither are relevant to security analysis at this time, although they can be used with the SiLK tool suite if required.
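Because SiLK flow records keep the cumulative TCP flags seen in a flow, deviations such as the illegal SYN+FIN combination can be searched for directly. The sketch below is an assumption-laden illustration: it uses rwfilter's --flags-all parameter (a HIGH/MASK expression, covered with the other partitioning parameters in Chapter 3) and placeholder dates.

# Sketch: find TCP flows in which both SYN and FIN are set, a combination
# that does not occur in well-behaved sessions and is a classic scan artifact.
# In the HIGH/MASK pair SF/SF, both S and F must be set, considering only S and F.
<1>$ rwfilter --start-date=2010/08/09:00 --end-date=2010/08/09:23 \
        --type=in --proto=6 --flags-all=SF/SF --pass=stdout \
    | rwcut --fields=sip,dip,sport,dport,flags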

Major TCP Services

Traditional TCP services have well-known ports: for example, 80 is Web, 25 is SMTP, and 53 is DNS. IANA maintains a list of these port numbers at http://www.iana.org/assignments/port-numbers. This list is useful for legitimate services, but it does not necessarily contain new services or accurate port assignments for rapidly-changing services such as those implemented via peer-to-peer networks. Furthermore, there is no guarantee that traffic seen, for example, on port 80 is actually web traffic, or that web traffic cannot be sent on other ports.
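Since the port number alone does not guarantee the service, it is often worth summarizing what is actually seen on each port and comparing it with expectations. The following sketch is illustrative (placeholder dates; rwuniq is introduced in Chapter 3): it counts inbound TCP flows and bytes by destination port and lists the busiest ports first.

# Sketch: profile inbound TCP traffic by destination port, busiest first.
<1>$ rwfilter --start-date=2010/08/09:00 --end-date=2010/08/09:23 \
        --type=in,inweb --proto=6 --pass=stdout \
    | rwuniq --fields=dport --flows --bytes \
    | sort -t '|' -k 2 -rn | head -20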

UDP and ICMP

After TCP, the most common protocols on the Internet are UDP and ICMP. UDP is a fast but unreliable message-passing mechanism used for services where throughput is more critical than accuracy. Examples include audio/video streaming, as well as heavy-use services such as the Domain Name Service (DNS). ICMP is a reporting protocol: ICMP sends error messages and status updates.

Figure 1.5: UDP and ICMP Headers


UDP and ICMP Packet Structure

Figure 1.5 shows a breakdown of UDP and ICMP packets, as well as the fields collected by SiLK. UDP can be thought of as TCP without the additional state mechanisms; a UDP packet has both a source and destination port, assigned in the same way TCP assigns them, as well as a payload.

ICMP is a straight message-passing protocol and includes a large amount of information in its first two fields: the type and code. The type field is a single byte indicating a general class of message, such as host unreachable. The code field contains a byte indicating what the message is within the type, such as route to host not found. ICMP messages generally have a limited payload; most messages have a fixed size based on type, with the notable exceptions being echo reply (type 0, code 0) and echo request (type 8, code 0).

Officially, ICMP is at the same protocol layer as IP, because its primary purpose is to issue IP error messages. However, it shares many similarities with transport layer protocols, such as having its own header embedded within the IP packet, and therefore is treated as a transport layer protocol in this handbook.

Major UDP Services and ICMP Messages

UDP services are covered in the IANA URL listed above. As with TCP, the values given by IANA are slightly behind those currently observed on the Internet. IANA also excludes port utilization (even if common) by malicious software such as worms. Although not official, there are numerous port databases on the Web that can provide insight into the current port utilization by services.

ICMP types and codes are well defined, and the most recent list is at http://www.iana.org/assignments/icmp-parameters. This list is the definitive list, and includes references to RFCs explaining the types and codes.
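To work with ICMP messages by type and code in SiLK, rwfilter provides --icmp-type and --icmp-code partitioning parameters. The sketch below is illustrative only (placeholder dates): it selects inbound echo requests and then counts them per source address.

# Sketch: who is pinging the network?  Select inbound echo requests
# (ICMP type 8, code 0) and count flows per source address.
<1>$ rwfilter --start-date=2010/08/09:00 --end-date=2010/08/09:23 \
        --type=in --proto=1 --icmp-type=8 --icmp-code=0 --pass=stdout \
    | rwuniq --fields=sip --flows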

1.2

Review of UNIX Skills

In this section, we provide a review of basic UNIX operations. SiLK is implemented on Linux and Solaris, and consequently you will need to be able to work with UNIX to use the SiLK tools.

1.2.1

Using the UNIX Command Line

When working on the command line, you should see a prompt like the following:

<1>$

Example 1-1: A UNIX Command Prompt

This example shows the standard command prompt for this document. The integer between angle brackets will be used to refer to specific commands in examples. Commands can be invoked by typing them directly at the command line. UNIX commands are typically abbreviated English words, and accept space-separated parameters; some parameters are prefixed by one or two dashes. Table 1.3 lists some of the more common UNIX commands. More information on these commands can be found by typing man followed by the command name. Example 1-2 (and the rest of the examples in this handbook) shows the use of some of these commands.

Table 1.3: Some Common UNIX Commands

Command   Description
cat       copy a stream or file onto standard output (show file content)
cp        copy a file from one name or directory to another
cut       isolate one or more columns from a file
date      show current day and time
echo      put arguments onto standard output
exit      terminate current command interpreter (log out)
file      identify type of content in the file
head      show first few lines of a file's content
join      bring together columns in two files
kill      terminate a job or process
ls        list files in current (or specified) directory; the -l (for long) parameter shows all directory information
man       show the on-line documentation on a command or file
mv        rename a file or transfer it from one directory to another
ps        list processes on the host
rm        remove a file
sed       edit the lines on standard input and put on standard output
sort      sort content of file into lexicographic order
tail      show last few lines of a file's content
time      show execution time of a command
top       show running processes with highest CPU utilization
wait      wait for all background commands to finish
wc        count words (or, with -l parameter, lines) in a file
which     locate a command's executable file


<1>$ echo "Hello" > myfile
<2>$ cat myfile
Hello
<3>$ ls -l myfile
-rw-r--r-- 1 tshimeal none 6 Oct  6 11:59 myfile
<4>$ cat <<END_NEW_LINES >>myfile
a
b
c
END_NEW_LINES
<5>$ wc -l myfile
4 myfile
<6>$ rm myfile

Example 1-2: Example Using Common UNIX Commands

Some advanced examples in this handbook will use control structures available from the Bash shell (one of the UNIX command interpreters). The syntax for name in expression; do ... done indicates a loop where each of the values returned by expression is given in turn to the variable indicated by name (and referenced as $name), and the commands in between do and done are executed with that value. The syntax while expression; do ... done indicates a loop where the commands between do and done are executed as long as expression evaluates true. A backslash at the end of a line indicates that the command is continued on the following line. (A short loop sketch appears after Example 1-3.)

Example 1-3 shows how almost all SiLK applications are invoked: the user calls rwfilter (command 1), specifying some data of interest, and then the results are passed to another application (command 2).

<1>$ rwfilter --start-date=2010/08/09:00 --end-date=2010/08/09:01 \
        --type=in --proto=6 --pass=aug9.raw
<2>$ rwtotal --proto --skip-zero aug9.raw
protocol|    Records|         Bytes|    Packets|
       6|   34428003|  114824656571|  387766604|

Example 1-3: A Simple Command Line
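The Bash loop syntax described before Example 1-3 is convenient for repeating a query over successive hours. The following sketch is illustrative only; it reuses the same placeholder date and writes one file per hour.

# Sketch: pull inbound TCP traffic one hour at a time, one output file per hour.
<1>$ for h in 00 01 02 03; do
         rwfilter --start-date=2010/08/09:$h --end-date=2010/08/09:$h \
             --type=in --proto=6 --pass=aug9-$h.rw
     done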

1.2.2

Using Pipes

The SiLK tools are designed to intercommunicate via pipes, in particular the stdout (standard output) and stderr (standard error) pipes. Communication by pipes is done by redirection, where the data sent via one pipe is sent to a program, another pipe, or a file. Many of the examples in the following chapters use pipes. Example 1-4 shows the use of pipes to do the same thing as Example 1-3.

$ rwfilter --type=all --proto=6 --pass=stdout \
      --start-date=2010/08/09:00 --end-date=2010/08/09:01 | \
  rwtotal --proto --skip-zero
protocol|    Records|          Bytes|     Packets|
       6|   98454957|  1675742086673|  2444828416|

Example 1-4: A Simple Piped Command

SiLK applications can also communicate via named pipes, which allow multiple channels of communication to be opened simultaneously. A named pipe is a special file that behaves like stdout or stderr, and is created using the UNIX mkfifo command (for MaKe First-In-First-Out). In Example 1-5, we create a named pipe (in Command 1) that one call to rwfilter (in Command 2) uses to filter data concurrently with another call to rwfilter (in Command 3). Results of these calls are shown in Commands 4 and 5. Using named pipes, sophisticated SiLK operations can be built in parallel. However, the user needs to ensure that any command that will read from the named pipe is started after any command that writes to the named pipe.

<1>$ mkfifo /tmp/test-output
<2>$ rwfilter --type=all --start-date=2010/08/09:00 --end-date=2010/08/09:01 \
        --sensor=29 --proto=6 --pass=stdout --fail=/tmp/test-output \
     | rwuniq --fields=5 > tcp.out &
[1] 23695 23696
<3>$ rwfilter --input-pipe=/tmp/test-output --proto=17 --pass=stdout \
     | rwuniq --fields=5 > udp.out &
[2] 23697 23698
<4>$ wait
[2]   Done    rwfilter --input-pipe=/tmp/test-output --proto=17 ...
[1] + Done    rwfilter --type=all --start-date=2010/08/09:00 ...
<5>$ cat tcp.out
pro|   Records|
  6|   1409344|
<6>$ cat udp.out
pro|   Records|
 17|    491309|

Example 1-5: Using a Named Pipe


Chapter 2

The SiLK Flow Repository


This chapter introduces the tools and techniques used to store information about sequences of packets as they are collected on an enterprise network for SiLK (referred to as network flow or network flow data, and occasionally just flow). This chapter will help an analyst become familiar with the structure of network flow data, how the collection system gathers network flow data from sensors, and how to access that data.

2.1

What Is Network Flow Data?

Netflow is a traffic-summarizing format that was first implemented by Cisco Systems and other router manufacturing companies, primarily for billing purposes. Network flow data (or network flow) is a generalization of netflow. Network flow data is collected to support several different types of analyses of network traffic (some of which are described later in this handbook).

Network flow collection differs from direct packet capture, such as tcpdump, in that it builds a summary of communications between sources and destinations on a network. This summary covers all traffic matching seven particular keys that are relevant for addressing: the source and destination IP addresses, the source and destination ports, the protocol type, the type of service, and the interface. We use five of these attributes to constitute the flow label in SiLK: the source and destination addresses, the source and destination ports, and the protocol. These attributes, together with the start time of each network flow, distinguish network flows from each other.

A network flow often covers multiple packets, which are grouped together under common labels. A flow record thus provides the label and statistics on the packets that the network flow covers, including the number of packets covered by the flow, the total number of bytes, and the duration and timing of those packets. Because network flow is a summary of traffic, it does not contain packet payload data. Payload data is expensive to retain on a large, busy network. Each network flow we record is very small (it can be as low as 22 bytes, but is determined by several configuration parameters), and even at that size one may collect many gigabytes of traffic daily on a busy network.
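To make the flow-label idea concrete, the sketch below uses rwcut (introduced in Chapter 3) to print the label fields plus the volume and timing summaries for a few records; the field names follow Table 3.6, and the file name sample.rw is a placeholder for any flow-record file.

# Sketch: show the flow label (addresses, ports, protocol) and the
# per-flow summaries for the first five records of a file.
<1>$ rwcut --fields=sip,dip,sport,dport,protocol,packets,bytes,stime,dur \
        --num-recs=5 sample.rw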

2.1.1

Structure of a Flow Record

A flow file is a series of flow records. A flow record holds all the data SiLK retains from the collection process: the flow label fields, start time, number of packets, duration of flow, and so on.

2.2

Flow Generation and Collection

Every day, SiLK may collect many gigabytes (GB) of network flow data from across the enterprise network. Given both the volume and complexity of this data, it is critical to understand how this data is recorded. In this section, we will review the collection process and show how data is stored as network flow records.

A network flow record is generated by sensors throughout the enterprise network. The majority of these may be routers, although specialized sensors, such as yaf (http://tools.netsa.cert.org/yaf/), can also be used when it is desirable to avoid artifacts in a router's implementation of network flow or to use non-device-specific network flow data formats, such as IPFIX (http://www.ietf.org/html.charters/ipfix-charter.html), or for more control over network flow record generation.¹ A sensor generates network flow records by grouping together packets that are closely related in time and have a common flow label. Closely related is defined by the router, and is typically set to around 15 seconds.

Figure 2.1 shows the generation of flows from packets. Case 1 in that figure diagrams flow record generation when all the packets for a flow are contiguous and uninterrupted. Case 2 diagrams flow record generation when there are several flows collected in parallel. Case 3 diagrams flow record generation when timeout occurs, as discussed below.

Network flow is an approximation of traffic, not a natural law. Routers and other sensors make a guess when they generate flow records, but these guesses are not perfect; there are several well-known phenomena in which a long-lived session will be split into multiple flow records:

1. Active timeout is the most common cause of a split network flow. Network flow records are purged and restarted after a configurable time of activity. As a result, all network flows have an upper limit on their duration that depends on the local configuration. A typical value would be around 30 minutes.

2. Cache flush is a common cause of split network flows for router-collected network flow records. Network flows take up memory resources in the router, and the router regularly purges this cache of network flows for housekeeping purposes. The cache flush takes place approximately every 30 minutes as well. A plot of network flows over a long period of time shows that many network flows terminate at regular 30-minute intervals, which is a result of the cache flush.

3. Router exhaustion also causes split network flows for router-collected flows. The router has limited processing and memory resources devoted to network flow. During periods of stress, the flow cache will fill and empty more often due to the number of network flows collected by the router. Use of specialized flow sensors can avoid or minimize cache-flush and router-exhaustion issues.

All of these cases involve network flows that are long enough to be split. As we will show later, the majority of network flows collected at the enterprise network border are small and short-lived.
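One way to observe the active-timeout artifact in a repository is to look for flows whose duration sits just under the configured limit. The sketch below assumes a 30-minute (1800-second) timeout and rwfilter's --duration parameter; both the threshold and the dates are illustrative assumptions, and the actual timeout is set by the local configuration.

# Sketch: count flows lasting nearly the assumed 1800-second active timeout,
# grouped by the hosts involved, a hint that long sessions are being split.
<1>$ rwfilter --start-date=2010/08/09:00 --end-date=2010/08/09:23 \
        --type=in --duration=1700-1800 --pass=stdout \
    | rwuniq --fields=sip,dip --flows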

2.3 Introduction to Flow Collection

An enterprise network comprises a variety of organizations and systems. The flow data to be handled by SiLK is first processed by the collection system, which receives flow records from the sensors and organizes them for later analysis. The collection system may collect data through a set of sensors that includes both routers and specialized sensors and is positioned throughout the enterprise network. Analysis is performed using a custom set of software called the SiLK analysis tool suite. The majority of this document provides training in the use of the SiLK tool suite. The SiLK project is active, meaning that the system is continuously improved as time passes. These improvements include new tools and revisions to existing analysis software, as well as changes in the data-collection systems.
1 yaf may also be used to convert packet data to network flow records, via a script that automates this process. See Section 4.3.


Figure 2.1: From Packets to Flows


2.3.1 Where Network Flow Data Is Collected

While complex networks may segregate flow records based on where the records were collected (e.g., at the network border, at major points within the border, at other points), the generic implementation of the SiLK collection system defaults to collection only at the network border, as is diagrammed in Figure 2.2. The default implementation has only one class of sensors: all. Further segregation of the data is done by type of traffic.

Figure 2.2: Default Traffic Type for Sensors

The SiLK tool mapsid produces a list of sensors in use for a specific installation, reflecting its configuration. Example 2-1 shows calls to mapsid. When mapsid is called without parameters, it produces a list of all sensors (see command 1 in Example 2-1). When called with a space-delimited list of integers, it produces a map from those values to the corresponding sensor names (see command 3 in Example 2-1). When called with a list of sensor names (see command 4 in Example 2-1), it produces a map from those names to sensor numbers. For an explanation of the exact physical location of each sensor, contact the person responsible for maintaining the data repository. If the installation supports differing classes of sensors, using the --print-class parameter can also give information as to what classes of data are produced by each sensor (see commands 2 and 5 in Example 2-1).

<1>$ mapsid
0 -> SEN-CENT
1 -> SEN-NORTH
2 -> SEN-SOUTH
3 -> SEN-EAST
4 -> SEN-WEST
<2>$ mapsid --print-class | head -3
0 -> SEN-CENT [c1,c2]
1 -> SEN-NORTH [c1,c2,c3]
2 -> SEN-SOUTH [c1,c2]
<3>$ mapsid 0 2 4
0 -> SEN-CENT
2 -> SEN-SOUTH
4 -> SEN-WEST
<4>$ mapsid SEN-NORTH SEN-EAST
SEN-NORTH -> 1
SEN-EAST -> 3
<5>$ mapsid --print-class SEN-NORTH SEN-EAST
SEN-NORTH -> 1 [c1,c2,c3]
SEN-EAST -> 3 [c1,c2,c3]

Example 2-1: Using mapsid to Obtain a List of Sensors

2.3.2 Types of Enterprise Network Traffic

In SiLK, the term type refers to the direction of traffic, rather than a content-based characteristic. In the generic implementation (as shown in Figure 2.2), there are six basic types: in and inweb, which are traffic coming from the ISP to the enterprise network through the border router (Web traffic is separated out, due to its volume); innull, which is traffic from the upstream ISP that is not passed across the border router (either sent to the router's IP address, or dropped due to a router access control list); out and outweb, which are traffic coming from the enterprise network to the ISP through the border router; and outnull, which is traffic from the enterprise network that is not passed across the border router. These types are configurable, and configurations vary as to which types are in actual use; see the discussion below on sensor class and type. There is also a constructed type all that selects all types of flows associated with a class of sensors.

2.3.3 The Collection System and Data Management

To understand how to use SiLK for analysis, it is useful to have some understanding of how data is collected, stored, and managed. Understanding how the data is partitioned can produce faster queries by reducing the amount of data searched. In addition, by understanding how the sensors complement each other, it is possible to gather traffic data even when a specific sensor has failed.

Data collection starts when a flow is generated by one of the sensors, either a router or a dedicated sensor. Flows are generated when a packet relevant to the flow is seen, but a flow is not reported until it is complete or is flushed from the cache. Consequently, a flow can be seen some time (depending on timeout configuration, and on sensor caching, among other factors) after the start time of the first packet in the flow. Data generated through dedicated sensors, as well as data from routers, is sent to the central SiLK repository using transfer facilities called FloCap (flow capacitor). FloCap technology improves the reliability of flow transfer and prioritizes the flows that are sent to the repository in the case of an emergency. The primary focus of FloCap is to ensure that routed data arrives in as complete a form as possible. Once data is received by the repository, it is packed into the reduced format by the packing software.2 Packed flows are stored into files indicated by class, type, sensor, and the hour in which the flow started. So a sample path to a file could be /data/all/in/2005/11/01/allin-SEN1_20051101.15 for traffic coming from the ISP through the border router on November 1, 2005 for flows starting between 3:00 and 3:59 p.m. Greenwich Mean Time (GMT).
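The mapping from selection parameters to packed files can be observed directly by asking rwfilter (described in Section 3.2) to list the files it reads. A sketch, with hypothetical dates; the file names printed will depend on the local installation:

$ rwfilter --start-date=2005/11/01:15 --end-date=2005/11/01:15 \
      --type=in --proto=0-255 --print-filenames --print-stat

Each file name printed corresponds to one class/type/sensor/hour combination in the repository.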

Important Considerations When Accessing Flow Data

While SiLK allows rapid access and analysis of network traffic data, the amount of data crossing the enterprise network could be extremely large. There are a variety of techniques intended to optimize queries, and this section goes over some general guidelines for more rapid data analysis. Usually, the amount of data associated with any particular event is relatively small. All the traffic from a particular workstation or server may be recorded in a few thousand records at most for a given day. Most of the time in an initial query involves simply pulling and analyzing the relevant records. As a result, query time can be reduced by simply manipulating the selection parameters, in particular --type, --start-date, --end-date, and --sensor. If it is known when a particular event occurred, then reducing the search time by using the hour facilities of --start-date and --end-date will increase efficiency (i.e., --start-date=2005/11/01:12 --end-date=2005/11/01:14 is more efficient than --start-date=2005/11/01:00 --end-date=2005/11/01:23).

Another useful, but less certain, technique is to limit queries by sensor. Since routing is relatively static, the same IP address will generally enter or leave through the same sensor, which can be derived by using rwuniq --fields=sensor (see Section 3.7) and a short (one-hour) probe on the data to identify which sensors are associated with a particular IP address. This technique is especially applicable for long (such as multi-month) queries, and usually requires some interaction, since rerouting does occur during normal operation. To use this technique for long queries, start by identifying the sensors using rwuniq, query for some extensive period of time using those sensors, and then plot the results using rwcount. If an analyst sees a sudden drop in traffic from those sensors, the analyst should check the data around the time of this drop to see if traffic was routed through a different sensor.
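For example, a short one-hour probe (using tools introduced in Chapter 3; the address and date here are hypothetical) can show which sensors carry traffic for a given address before a longer query is launched:

$ rwfilter --start-date=2010/08/02:12 --end-date=2010/08/02:12 \
      --type=in,inweb --any-address=10.3.1.2 --pass=stdout | \
    rwuniq --fields=sensor

The sensors reported by rwuniq can then be supplied to --sensor in the longer follow-up query.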

2.3.4 How Network-Flow Data Is Organized

The data repository is accessed through the use of SiLK tools, particularly the rwfilter command-line application. An analyst using rwfilter should specify the type of data to view by using a set of five selection parameters. This handbook will discuss selection parameters in more depth in Section 3.2; this section will briefly outline how data is stored in the repository.
2 The traffic between FloCap and the repository is not excluded from collection by flow sensors, but unless multiple levels of sensors are being used within the enterprise architecture, it occurs in a way that will not pass a sensor.


Dates

Repository data is stored in hourly divisions, which are referred to in the form YYYY/MM/DD:HH in Greenwich Mean Time. Thus, 11 a.m. on May 23, 2005, in Pittsburgh would be referred to as 2005/05/23:15 when compensating for the difference between Greenwich Mean Time and Eastern Daylight Time. In general, a particular hour starts being recorded at that hour and will be written to until some time after the end of the hour. Under ideal conditions, the last long-lived flows will be written to the file soon after they time out (e.g., if the active timeout is 30 minutes, the flows will be written out 30 minutes plus propagation time after the end of the hour). Under adverse network conditions, however, flows could accumulate on the sensor under FloCap until they can be delivered. So, we would expect that under normal conditions the file for 2005/03/22 20:00 GMT would have data starting at 3 p.m. in Pittsburgh and would stop being updated after 4:30 p.m. in Pittsburgh.

Sensors: Class and Type

Data is divided by time, and by sensor. The classes of sensors that are available are determined by the installation. By default, there is only one class, all, but based on analytical interest, other classes may be configured as needed. As shown in Figure 2.2, each class of sensor has several types of traffic associated with it: typically in, inweb, out, and outweb. To find out what classes and types are supported by the installation, look at the output of rwfilter --help that describes --class and --type. Data types are used for two reasons: (1) they group data together into common directions, and (2) they split off major query classes. As shown in Figure 2.2, most data types have a companion web type (i.e., in, inweb, out, outweb). Web traffic generally constitutes about 50% of the flows in any direction; by splitting the web traffic into a separate type, we reduce query time. Most queries to repository data access one class of data at a time, but multiple types.
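Because repository hours are labeled in GMT, local times must be converted before they are used as --start-date or --end-date arguments. One way to do the conversion from the shell, a sketch that assumes GNU date is available and understands the local timezone abbreviation, is:

$ date -u -d '2005-05-23 11:00 EDT' '+%Y/%m/%d:%H'
2005/05/23:15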

2.4 SiLK support

The SiLK tool suite is available in open-source form from http://tools.netsa.cert.org/silk/. The CERT Network Situational Awareness group also supports FloCon, a workshop devoted to flow analysis. More information on FloCon can be found at http://www.cert.org/flocon. The primary SiLK mailing lists are described below:

silk-help@cert.org: silk-help is for bug reports and general inquiries related to SiLK. It provides relatively quick response from users and maintainers of the SiLK tool suite. While a specific response time cannot be guaranteed, silk-help has proved to be a valuable asset for bugs and usage issues.

flocommunity@cert.org: FloCommunity is a community of analysts built on the core of the FloCon conference (http://www.cert.org/flocon). The initial focus is on flow-based network analysis, but the scope will likely naturally expand to cover other areas of network security analysis. The list is not focused exclusively on FloCon itself, though it will include announcements of FloCon events. The general philosophy of this email list and site is inclusive: we intend to include international participants from both research and operational environments. Participants may come from universities, corporations, government entities, and contractors. Additional information is accessible via the FloCommunity Web page (http://www.cert.org/flocommunity/).


Chapter 3

Essential SiLK Tools


This chapter describes analyses with the six fundamental SiLK tools: rwfilter, rwstats, rwcount, rwcut, rwsort, and rwuniq. These tools are introduced through example analyses, with their more general usage briefly described. At the end of this chapter, the analyst will be able to

- use rwfilter to select records
- understand the basic partitioning parameters, including how to express IP addresses, times, and ports
- perform and display basic analyses using the SiLK tools and a shell scripting language

3.1 Suite Introduction

The SiLK analysis suite consists of more than 30 command-line UNIX tools that rapidly process flow records. The tools can intercommunicate with each other and with scripting tools via pipes; redirection is supported using both stdin/stdout and named pipes. Flow analysis is generally input/output bound: the amount of time required to perform an analysis is proportional to the amount of data read off disk. The primary goal of the SiLK tool suite is to reduce that access time to a minimum. The SiLK tools replicate many standard functions from command-line tools that are common to the UNIX operating system, and from higher-level scripting languages such as Perl. However, the SiLK tools process this data in binary form and use data structures optimized specifically for analysis. Consequently, most SiLK analysis consists of a sequence of operations using the SiLK tools. These operations typically start with an initial rwfilter call to retrieve data of interest, and culminate in a final call to a text output tool like rwcut or rwuniq to summarize the data for presentation. Once text is generated, the analyst can create and run scripts on that text output at a much higher speed than would be possible if the text were generated at an earlier stage of the analysis. In some ways, it is appropriate to think of SiLK as an awareness toolkit. The repository provides large volumes of data and the tool suite provides the capabilities needed to process this data, but the actual insights are derived from analysts.

3.2 Selecting Records with rwfilter

rwfilter is the most used command in the SiLK analysis tool suite. It serves as the starting point for most analyses (as will be seen in the examples that follow). It both retrieves data and partitions data to isolate flow records of interest. It also has the most parameters (by far) of any command in the SiLK tool suite. These parameters have grown as the tool has matured, driven by users' needs for more expressiveness in record selection. Most of the time, rwfilter is used in conjunction with other analysis tools. However, it is also a very useful analytical tool on its own. As a simple example, consider Example 3-1, which uses rwfilter to print volume information on traffic from the enterprise network to an external network of interest over an eight-hour period.1 The results show that the enterprise network sent 3,288 flows to the external network, covering an aggregate of 16,316 packets containing a total of 968,011 bytes. Over time, an analyst can use calls like this to track traffic to the external network.

<1>$ rwfilter --type=out --start-date=2010/08/02:00 \
      --end-date=2010/08/02:07 --daddress=10.5.0.0/16 --print-volume-stat
     |     Recs|  Packets|      Bytes| Files|
Total|   515359|  2722887| 1343819719|   180|
Pass |     3288|    16316|     968011|      |
Fail |   512071|  2706571| 1342851708|      |

Example 3-1: Using rwfilter to Count Traffic to an External Network

Although parameters may occur in any order, a high-level view of the rwfilter command is

rwfilter [input] [selection] [partition] [output] [other]

Figure 3.1 shows a high-level abstraction of the control flows in rwfilter, as affected by its different parameters. Input parameters specify whether to pull flow records from a pipe, from record files, or (the default) from the repository. When pulling from the repository, selection parameters specify what parts of the repository to pull records from. Each source accessed to pull records can be listed to standard error using --print-filenames. When pulling from a pipe or file, a restricted set of selection parameters can be used as partitioning parameters. The main effort in composing calls to rwfilter lies in the specification of records via partitioning parameters, and rwfilter supports a very rich library of these parameters. Once records are partitioned, those meeting or failing to meet the specified criteria can be sent either to a pipe or a file via the output parameters. Lastly, there are other parameters (such as --help) that can give useful information but do not access flow records.

1 The command and its results have been anonymized to protect the privacy of the enterprise network.



Figure 3.1: rwfilter Parameter Relationships

A simple example is the call to rwfilter in the initial example presented (Example 3-1). That call uses selection parameters to access all outgoing records in the default class that describe flows that started between 00:00:00 and 07:59:59 GMT on August 2, 2010. The --daddress parameter is the partitioning parameter, and the --print-volume-stat parameter is the output parameter.

3.2.1 rwfilter Parameters

Input parameters (described in Table 3.1) specify from where rwfilter obtains flow records: from the repository, from a pipe, or from flow record files. This example implicitly uses the default parameter, --data-rootdir, with its default argument (set by configuration) to pull from the repository. (Later examples will show other input parameters.) rwfilter can take input from zero or more previously generated flow record files. If a common set of input files is used several times, use the --xargs parameter, putting the list of input file names into a text file with one name per line. Calling rwfilter with zero flow record files requires that one of the other input options be specified.

Table 3.1: rwfilter Input Parameters

  Parameter        Example      Description
  --input-pipe     stdin        Read SiLK flow records from a pipe
  --data-rootdir   /data        Root of data repository (default)
  --xargs          mylist.txt   File holding list of filenames to pull records from
                   infile.raw   Name of file containing previously extracted data

Selection parameters (described in Table 3.2) are used when rwfilter pulls data from the repository, to specify the part of the repository from which to pull the data. In Example 3-1, the call to rwfilter uses three selection parameters: --start-date, --end-date, and --type (--class is left to its default value, which in many implementations is all; --sensor is also left to its default value, which is all sensors of the class). The --start-date and --end-date parameters specify that this pull applies to eight hours' worth of traffic: 00:00:00 GMT to 07:59:59 GMT on August 2, 2010 (the parameters to --start-date and --end-date are inclusive and may be arbitrarily far apart, depending on what dates are present in the repository, although neither may be set beyond the current date and time). The --type parameter specifies that outgoing general flow records are to be pulled within the specified time range. Each unique combination of selection parameters (root directory, class, type, sensor, and time) maps to one or more flow record files in the repository (depending on the number of hours included in the time). In this example, 180 files are accessed. Specifying more selection parameters results in less data being examined and thus faster queries. Be sure to understand what traffic is included in each available class and type, and to include all relevant types in any query, but to exclude as many irrelevant types as possible for improved performance. --flowtypes is used to specify queries across multiple classes, while restricting the types of interest on each class. Use this parameter carefully, as it is easy to specify LOTS of records to filter, which reduces performance.

Table 3.2: rwfilter Selection Parameters

  Parameter     Example              Description
  --start-date  2005/03/01:00        First hour of data to examine
  --end-date    2005/03/20:23        Final hour of data to examine
  --class       all                  Sensor class to select data within times
  --type        inweb,in,outweb,out  Type of data within class and times
  --flowtypes   c1/in,c2/all         Process data of specified classes and types
  --sensor      1-5                  Sensor used to collect data

Partitioning parameters are used to divide the input records into two groups: (1) pass records, which meet all the tests specified by the partitioning parameters, and (2) fail records, which do not meet at least one of the tests specified by the partitioning parameters. Each call to rwfilter must have at least one partitioning parameter. In Example 3-1, flow records to a specific network are desired, so the call uses a --daddress parameter with an argument of CIDR block 10.5.0.0/16 (the specific network). Occasionally, all records for a given set of selection parameters are desired, so (by convention) an analyst uses --proto with an argument of 0-255, which is a test that can be met by all IP traffic, since this is the range allocated for IP protocols by IANA.2

2 See http://www.iana.org/assignments/protocol-numbers; in IPv4 this is the protocol field in the header, but in IPv6 this is the next-header field; both have the range 0-255.

Partitioning parameters are the most numerous, to provide a large amount of flexibility in describing what flow records are desired. Later examples will show several partitioning parameters. (See rwfilter --help for a full listing; a few of the more commonly used parameters are listed in Table 3.3.) As shown in Figure 3.2, there are several groups of partitioning parameters. This section focuses on the parameters that partition based on fields of flow records. Section 4.5 discusses IP Sets and how to filter with those sets. Section 4.8 describes pmaps and country codes. Section 3.2.6 discusses tuple files and the parameters that use them. The use of dynamic libraries is dealt with in Section 4.9. Lastly, Section 5.1 describes the use of PySiLK plug-ins.

Table 3.3: Commonly-Used rwfilter Partitioning Parameters

  Parameter      Example       Description
  --protocol     6             Which protocol number (6=TCP, 17=UDP, 1=ICMP) to filter
  --packets      1-3           Filter flow records that are in the specified range of packet counts
  --flags-all    R/SRF         Filter flow records that have the specified flags set and not set (TCP only)
  --saddress     10.2.1.3,237  Filter flow records for source address
  --daddress     10.2.1.3-5    Like --saddress, but for destination
  --any-address  10.2.1.x      Like --saddress, but for either source or destination
  --sport        0-1023        Filter flow records for source port
  --dport        25            Like --sport, but for destination port
  --aport        80,8080       Like --sport, but for either source or destination

Partitioning parameters specify a collection of acceptable options, such as the protocols 6 and 17 or the specific IP address 10.1.23.14. As a result, almost all partitioning parameters describe some group of values. These ranges are generally expressed in the following ways:

Value range: Value ranges are used when all values in a closed interval are desired. A value range is two numbers separated by a dash, such as --proto=3-65, which indicates that flow records with protocol numbers from 3 through 65 (inclusive) are desired. Some partitioning parameters (such as --packets) demand a value range; if only a single value is desired, use the value on both sides of the dash (--packets=5-5). A missing value on the end of the range (e.g., --bytes=2048-) specifies that any value greater than or equal to the other value is desired. Missing values at the start of a range are not permitted.

Value alternatives: Fields that have a finite set of values (such as ports or protocol) can be expressed using a comma-separated list. In this format a field is expressed as a set of numbers separated by commas. When only one value is acceptable, that value is presented without a comma. Examples include --proto=3 and --proto=3,9,12. Value ranges can be used as elements of value-alternative lists. For example, --proto=0,2-5,7-16,18-255 says that all flow records that are not for ICMP, TCP, or UDP traffic are desired.

Time ranges: Time ranges are two times, potentially down to the millisecond, separated by a dash; in SiLK, these times can be expressed in their full YYYY/MM/DD:HH:MM:SS.mmm form (e.g., 2005/02/11:03:18:00.005-2005/02/11:05:00:00.243). Times may be abbreviated with their natural interpretation: 2005/02/11 is equivalent to 2005/02/11:00:00:00.000.

IP addresses: IP addresses are expressed in two ways. The most common expression is a list of value alternatives, separated by appropriate punctuation as described in Section 1.1.3. For example, 1-13.1.1.1 would select the addresses 1.1.1.1, 2.1.1.1, and so on up to 13.1.1.1. For convenience, the letter x can be used to indicate all values in a section (equivalent to 0-255 in IPv4 addresses, 0-FFFF in IPv6 addresses). CIDR notation may also be used, so 1.1.0.0/16 is equivalent to 1.1.x.x and 1.1.0-255.0-255. As explained in Section 1.1.3, IPv6 addresses use a double-colon syntax as a shorthand for any sequence of zero values in the address, as well as CIDR notation.

TCP flags: The --flags-all, --flags-session, and --flags-initial parameters to rwfilter use a compact, yet powerful, way of specifying filter predicates based on the state of the TCP flags. The argument to these parameters has two sets of TCP flags separated by a forward slash (/). The flag-set to the right of the slash contains the mask; this set lists the flags whose status is of interest, and the set must be non-empty. To the left of the slash is the high flag-set; it lists the flags that must be set for the flow record to pass the filter. Flags listed in the mask-set but not in the high-set must be off. The flags listed in the high-set must be present in the mask-set. (For example, --flags-initial=S/SA specifies a filter for flow records that initiate a TCP session.) See Example 3-2 for another sample use of this parameter.

Country codes: The --scc and --dcc parameters take a comma-separated list of two-letter country codes, as specified by the Internet Assigned Names Authority.3 There are also four special codes: -- for unknown, a1 for anonymous proxy, a2 for satellite provider, and o1 for other.
3 http://www.iana.org/domains/root/db/
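For instance, the value-alternative form above can be used directly to isolate traffic other than ICMP, TCP, and UDP; a sketch, with hypothetical dates:

$ rwfilter --start-date=2010/08/02:00 --end-date=2010/08/02:07 \
      --type=in --proto=0,2-5,7-16,18-255 --print-stat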


Figure 3.2: rwfilter Partitioning Parameters

Attributes: The --attributes parameter takes any combination of the letters F, T, and C, expressed in high/mask notation just as for TCP flags. F indicates the collector saw additional packets (other than those with FIN-ACK) after a packet with a FIN flag. T indicates the collector terminated the flow collection due to time out. C indicates the collector produced the flow record to continue flow collection that was terminated due to time out.

Output parameters to rwfilter specify what data should be returned from the call. There are five output parameters, as described in Table 3.4. Each call to rwfilter must have at least one of these parameters, and may have more than one. In Example 3-1, the --print-volume-stat parameter is used to count the flow records and their associated byte and packet volumes.

Table 3.4: rwfilter Output Parameters

  Parameter     Example          Description
  --pass        stdout           Send SiLK flow records matching partitioning parameters to pipe or file
  --fail        faildata.raw     Like --pass, but for records failing to match
  --all-dest    infile.raw       Like --pass, but all records
  --print-stat                   Print count (default, to stderr) of records passing and failing
  --print-vol   outflow-vol.txt  Print counts of flows/bytes/packets read, passing and failing to named file
  --max-pass    20               Indicate maximum number of records to return as matching partitioning parameters

One of the most useful tools available for in-depth analysis is the drilling-down capability provided by using the rwfilter parameters --pass and --fail. Most analysis will involve identifying an area of interest (all the IPs that communicate with address X, for example) and then combing through that data. Rather than pulling down the same base query repeatedly, store the data to a separate data file using the --pass switch. Occasionally, it is more convenient to describe the data not wanted than the desired data. The --fail switch allows saving data that doesn't match the specified conditions. Section 3.2 provides more information about these switches and explains how to select records.

To help improve query efficiency when only a few records are needed, the --max-pass parameter allows the analyst to specify the maximum number of records to return via the path specified by the --pass parameter. In multiprocessor installations, this is interpreted as the number per processor. In single-processor installations, even if multiple threads are used, this is interpreted as the maximum number overall.

Other parameters are miscellaneous parameters to rwfilter that have been found to be useful in analysis or in maintaining the repository. These are somewhat dependent on the implementation, and they include those described in Table 3.5. None of these parameters are used in the example, but at times, these are quite useful.

Table 3.5: Other Parameters

  Parameter           Description
  --dry-run           Check parameters for legality without actually processing data
  --help              Print description of rwfilter and its parameters
  --print-filenames   Print name of each input file as it is processed
  --print-missing     Print names of missing input files to stderr
  --version           Print version of rwfilter being used
  --threads           Specify number of threads to be used in filtering
  --ip-version        Specify whether IPv6 or IPv4 (the default) will be used

The --threads parameter takes an integer scalar N to specify using N threads to read input files for filtering. The default value is 1, or the value of the SILK_RWFILTER_THREADS environment variable if that is set. Using multiple threads is preferable for queries that look at many files but return few records. Current experience is that performance peaks at about four threads per CPU on the host running the filter, but this result is variable with the type of query and the number of records returned from each file.

The --ip-version parameter is useful if your collection structure includes both IPv6 and IPv4 data. If only one version is present, the SiLK configuration will set the appropriate default. If both are present, then this parameter allows the tools to process either IPv4 or IPv6 data. The argument is a single integer (either 4 or 6). See Example 3-3 for a sample call to rwfilter using this parameter.
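A sketch of the kind of query that benefits from multiple threads, since it reads many hourly files but returns few records (the dates are hypothetical, and the best thread count depends on the repository's storage and host):

$ rwfilter --threads=4 --start-date=2010/08/01:00 --end-date=2010/08/07:23 \
      --type=in,inweb --proto=6 --dport=22 --print-stat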

3.2.2 Finding Low-Packet Flows with rwfilter

The TCP state machine is complex (see Figure 1.4), and legitimate service requests require a minimum of four packets. There are several types of illegitimate traffic (such as port scans and responses to spoofed-address packets) that involve TCP flow records with low numbers of packets. Occasionally, there are legitimate TCP flow records with low numbers of packets (such as continuations of previously timed-out flow records, contact attempts on hosts that don't exist, and services that are not configured), but this legitimate behavior is relatively rare. As such, it may be useful to understand where low-packet TCP network traffic is coming from and when such flow records are collected most frequently.

Example 3-2 shows more complex calls to rwfilter. The call to rwfilter in this example's Command 1 selects all incoming flow records in the repository that started between 00:00:00 GMT and 05:59:59 GMT, that describe TCP traffic, and that had three packets or less in the flow record. The call in Command 2 partitions these flow records into those that had the SYN flag set, but the ACK, RST, or FIN flags not set, and those that did not show this flag combination. The third call extracts the flow records that have the RST flag set, but the SYN or FIN flags not set.

<1>$ rwfilter --start-date=2010/08/06:00 --end-date=2010/08/06:05 \
      --type=in,inweb --proto=6 --packets=1-3 --pass=lowpacket.raw
<2>$ rwfilter lowpacket.raw --flags-all=S/SARF \
      --pass=synonly.raw --fail=temp.raw
<3>$ rwfilter temp.raw --flags-all=R/SRF --pass=reset.raw
<4>$ rm -f temp.raw

Example 3-2: Using rwfilter to Extract Low-Packet Flow Records

The calls in Commands 2 and 3 use a file as an input parameter; in each case, a file produced by a preceding call to rwfilter is used. These commands show how rwfilter can be used to refine selections to isolate flow records of interest. The call in Command 1 is the only one that pulls from the repository; as such, it is the only one that uses selection parameters. This call also uses a combination of partitioning parameters (--proto and --packets) to isolate low-packet TCP flow records from the selected time range. The calls in Commands 2 and 3 use --flags-all as a partitioning parameter to pull out flow records of interest. All three calls use --pass output parameters, and the call in Command 2 also uses a --fail output parameter, to generate a temporary file that serves as input to Command 3 and is deleted in Command 4.

3.2.3 Using IPv6 with rwfilter

To use rwfilter with IPv6 data, the --ip-version=6 parameter is used, as shown in Example 3-3. Using that parameter, rwfilter handles IPv6 address forms and IPv6-specific protocols.

<1>$ rwfilter --ip-version=6 --saddr=fe80::/16 --pass=stdout | \
      rwcut --fields=1-5
                     sIP|                               dIP|sPort|dPort|pro|
fe80::217:f2ff:fed4:308c|                          ff02::fb| 5353| 5353| 17|
fe80::213:72ff:fe95:31d3|                           ff02::1|    0|34304| 58|
fe80::213:72ff:fe95:31d3|                 ff02::1:ffce:93a5|    0|34560| 58|
fe80::213:72ff:fe95:31d3|2001:5c0:9fbf:0:21a:a0ff:fece:93a5|    0|34560| 58|
fe80::213:72ff:fe95:31d3|                           ff02::1|    0|34304| 58|

Example 3-3: Using rwfilter to Process IPv6 Flows

One specific change that the --ip-version=6 parameter causes is that ICMP options imply ICMPv6 (protocol 58) rather than ICMPv4 (protocol 1). This is shown in Example 3-4, which shows how to detect neighbor discovery solicitations and advertisements in IPv6 data. Neighbor discovery solicitations (type 135) request (among other things) the network interface address for the host with the given IPv6 address (serving the function of IPv4's Address Resolution Protocol). Neighbor discovery advertisements (type 136) are the response with this information. See the last two lines of output in Example 3-4 for an example solicitation with its responding advertisement.


<1>$ rwfilter --ip-version=6 --icmp-type=135,136 --pass=stdout | \
      rwcut --fields=1-3,5 --icmp
                               sIP|                               dIP|sPort|pro|
          fe80::213:72ff:fe95:31d3|                 ff02::1:ffd4:308c|  135| 58|
          fe80::213:72ff:fe95:31d3|2001:5c0:9fbf:0:21a:a0ff:fece:93a5|  135| 58|
          fe80::213:72ff:fe95:31d3|                 ff02::1:ffce:93a5|  135| 58|
2001:5c0:9fbf:0:21a:a0ff:fece:93a5|          fe80::213:72ff:fe95:31d3|  136| 58|

Example 3-4: Using rwfilter to Detect IPv6 Neighbor Discovery Flows

3.2.4 Using Pipes with rwfilter

One problem with generating temporary files is that it is slow. All the data must be written to disk before being used by a subsequent call, and then read back from disk. A faster method is using UNIX pipes to pass records from one call to another, which allows tools to operate concurrently, using memory (if possible) to pass data between tools. Example 3-5 shows a call to rwfilter that uses an output parameter to write records to standard output, which is piped (using the UNIX pipe character |) to a second call to rwfilter that reads these records via standard input. The first call pulls from the repository records describing incoming traffic that transferred 2048 bytes or more. The second call (after the pipe) partitions these records into traffic that takes 30 minutes (1800 seconds) or more and traffic that takes less than 30 minutes. (Recall that 30 minutes is close to the maximum duration of flows in many configurations; traffic much longer than 30 minutes will be split by the collection system.)

<1>$ rwfilter --start-date=2010/08/06:00 --end-date=2010/08/06:05 \
      --type=in,inweb --bytes=2048- --pass=stdout | \
    rwfilter --input-pipe=stdin --duration=1200- \
      --pass=slowfile.raw --fail=fastfile.raw
<2>$ ls -l slowfile.raw fastfile.raw
-rw------- 1 tshimeal echo 271009 Sep  2 19:34 fastfile.raw
-rw------- 1 tshimeal echo   7218 Sep  2 19:34 slowfile.raw

Example 3-5: rwfilter --pass and --fail to Partition Fast and Slow High-Volume Flows

3.2.5 Translating Signatures Into rwfilter Calls

Traditional intrusion detection depends heavily on the presence of payloads and signatures: distinctive packet data that can be used to identify a particular tool. In general, the SiLK tool suite is intended for examining trends, but it is possible to identify specific intrusion tools using the suite. Intruders generally use automated tools or worms to infiltrate networks. While directed intrusions are still a threat, tool-based broad-scale intrusions are more common. It will sometimes be necessary to translate a signature into filtering rules, and this section describes some standard guidelines. To convert signatures, consider the intrusion tool behavior as captured in the signature:

- What service is it hitting? This can be converted to a port number.
- What protocol does it use? This can be converted into a protocol number.
- Does it involve several protocols? Some tools, malicious and benign, will use multiple protocols, such as TCP and ICMP.
- What about packets? Buffer overflows are a (depressingly) common form of attack, and are a function of the packet's size, rather than its contents. If a specific size can be identified, that can be used to identify tools. When working with packet sizes, remember that the SiLK suite includes the packet headers, so a 376-byte UDP packet, for example, will be 404 bytes long.
- How long are sessions? An attack tool may use a distinctive session each time (for example, 14 packets with a total size of 2080 bytes).

There is also a tool rwidsquery, which takes as input either a Snort alert log or rule file, analyzes the contents, and invokes rwfilter with the appropriate arguments to retrieve flow records that match attributes of the input file.
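As an illustration, a hypothetical signature for a backdoor that listens on TCP port 31337 and uses short sessions might translate into a filter such as the following. The port, packet and byte ranges, and dates are examples only, not taken from any real signature:

$ rwfilter --start-date=2010/08/06:00 --end-date=2010/08/06:23 --type=in \
      --proto=6 --dport=31337 --packets=1-5 --bytes=40-400 \
      --pass=suspect.raw --print-stat

The saved suspect.raw file can then be examined with rwcut or summarized with rwstats and rwuniq.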

3.2.6 rwfilter and Tuple Files

For a variety of analyses, the partitioning criteria are specific combinations of field values, any one of which should be considered as passing. While the analyst can do this via separate rwfilter calls and merge them later, this can be inefficient as it may involve pulling the same records from the repository several times. A more efficient solution is to store the partitioning criteria as a tuple file, and then use the tuple file with rwfilter to pull all of the records in a single operation. A tuple file, as shown by command 1 in Example 3-6, is a text file consisting of flow fields delimited by a vertical bar. The first line is a header line indicating which field is in each column. This file can then be used with rwfilter, as shown in command 2.

<1>$ cat <<END_FILE >>tuple-file.txt
dIP|dPort
10.0.0.1| 25
10.0.0.2| 25
10.0.0.3| 22
10.0.0.4| 25
10.0.0.5| 25
10.0.0.6| 25
10.0.0.7| 22
10.0.0.8| 22
10.0.0.9| 25
END_FILE
<2>$ rwfilter --start-date=2010/08/01:00 --end-date=2010/08/01:03 \
      --type=in --proto=6 --tuple-file=tuple-file.txt --print-stat
Files      4.  Read    1037068.  Pass       3006.  Fail    1034062.

Example 3-6: rwfilter With a Tuple File

3.3 Describing Flows with rwstats

rwstats provides a collection of statistical summary and counting facilities that enables organizing and ranking traffic by different attributes. The primary benefits provided by rwstats are its ability to generate top-N lists and to provide statistical information on the distribution of traffic. These statistics can be collected for a single flow field, or for any combination of flow fields.

Example 3-7 illustrates three calls to rwstats. Command 1 generates a count of flow records for the top five protocols. In this case, there are only four protocols used by flow records in the file, so there are only four counts displayed. Since UDP (protocol 17) flows are the most common in this data, Command 2 uses rwfilter to extract all UDP flow records from the file and pass them along to a second call to rwstats, which displays the top five destination ports. Command 3 uses a combination of protocol and dport with rwstats to generate the five most common port-protocol pairs, which includes ESP (with port 0 reflecting that ESP does not use port numbers).

<1>$ rwstats --fields=protocol --count=5 --flows slowfile.raw
INPUT: 414 Records for 4 Bins and 414 Total Records
OUTPUT: Top 5 Bins by Records
pro|   Records|  %Records|   cumul_%|
 17|       285| 68.840580| 68.840580|
  1|        58| 14.009662| 82.850242|
 50|        37|  8.937198| 91.787440|
  6|        34|  8.212560|100.000000|
<2>$ rwfilter --proto=17 --pass=stdout slowfile.raw | \
      rwstats --fields=dport --count=5 --flows
INPUT: 285 Records for 5 Bins and 285 Total Records
OUTPUT: Top 5 Bins by Records
dPort|   Records|  %Records|   cumul_%|
  123|       109| 38.245614| 38.245614|
 4500|        77| 27.017544| 65.263158|
   53|        48| 16.842105| 82.105263|
  500|        45| 15.789474| 97.894737|
 4672|         6|  2.105263|100.000000|
<3>$ rwstats --fields=protocol,dport --count=5 --flows slowfile.raw
INPUT: 414 Records for 12 Bins and 414 Total Records
OUTPUT: Top 5 Bins by Records
pro|dPort|   Records|  %Records|   cumul_%|
 17|  123|       109| 26.328502| 26.328502|
 17| 4500|        77| 18.599034| 44.927536|
 17|   53|        48| 11.594203| 56.521739|
 17|  500|        45| 10.869565| 67.391304|
 50|    0|        37|  8.937198| 76.328502|

Example 3-7: Using rwstats To Count Protocols and Ports

rwstats provides columnar output. The first field is the key, followed by a count of records and the percentage contribution of this key to the total set of records. The final column is a cumulative percentage: the percentage of all values of the total set up to this key. As with other suite applications, rwstats can read its input either from a file or a pipe, as shown in Example 3-7. Each call to rwstats must include one of the following:

- use the summary parameters (--overall-stats or --detail-proto-stats)
- specify a key containing one or more fields via the --fields parameter and specify how to determine the number of values to show (via --count or --percentage)

The call may also specify whether a summary of flow records, bytes, or packets is desired, and whether the top values or the bottom values should be shown. The defaults are for flow records and to show the top N. Figure 3.3 provides a brief summary of rwstats and its more common parameters.
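The summary-parameter form of the call does not name a key at all. A minimal sketch using the file created in Example 3-5 (the exact output layout depends on the SiLK version installed):

$ rwstats --overall-stats slowfile.raw
$ rwstats --detail-proto-stats=6,17 slowfile.raw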

rwstats
Description   Summarize SiLK Flow records by one of a limited number of key/value pairs and display the results as a top-N or bottom-N list.
Call          rwstats --fields=protocol --count=20 --top --flows filterfile.rwf
Parameters
  --overall-stats       Print minima, maxima, quartiles, and interval-count statistics for bytes, pkts, bytes/pkt across all flows
  --detail-proto-stats  Print overall statistics for each of the specified protocols. List protocols or ranges separated by commas
  --fields               Use the indicated fields as the key (see Table 3.6)
  --sip                  Use the source address as the key. An optional argument is the prefix length, the number of bits to consider
  --dip                  Use the destination address as the key. An optional argument is the prefix length, the number of bits to consider
  --flows                Use the flow record count as the value
  --packets              Use the packet count as the value
  --bytes                Use the byte count as the value
  --count                Print the specified number of key/value pairs
  --percentage           Print key/value pairs where the value is greater/less-than this percentage of the total value
  --top                  Print the top N keys and their values
  --bottom               Print the bottom N keys and their values
  --no-titles, --no-columns, --column-separator, --delimited, --integer-ips, --pager   See Section 3.5 for explanation
  --output-path          Specify path to send output
  --copy-input           Specify stream to which to send a copy of the input

Figure 3.3: Summary of rwstats

Example 3-8 illustrates how to show all values that exceed 1 percent of all records using rwstats. In this particular case, there were only three keys (source port values) used to send bulk data quickly: HTTP, HTTPS, and SMTP.


$ rwfilter --proto=6 --pass=stdout fastfile.raw \
    | rwstats --fields=sport --top --flow --percentage=1
INPUT: 17461 Records for 146 Bins and 17461 Total Records
OUTPUT: Top 3 bins by Records (1% == 174)
sPort|   Records|  %Records|   cumul_%|
   80|     15131| 86.655976| 86.655976|
  443|      2004| 11.477006| 98.132982|
   25|       180|  1.030869| 99.163851|

Example 3-8: rwstats --sport --percentage to Profile Source Ports

As Example 3-8 indicates, distributions can be very heavily skewed. Counting the top source-port percentages in outgoing traffic will skew the result towards servers, since servers will be responding to traffic on a limited number of ports. This is also shown in the destination port count, as shown in Example 3-9, where it is dominated by email and web, with the high-numbered ports likely dynamic ports for TCP connections. This behavior will vary across networks, since networks with a lot of workstation traffic can be expected to have more diverse (and balanced) destination port usage. The network shown in Example 3-9 acts like a border network with mainly email and web servers being accessed.

$ rwfilter --proto=6 --pass=stdout fastfile.raw \
    | rwstats --fields=dport --top --flow --count=5
INPUT: 17461 Records for 14775 Bins and 17461 Total Records
OUTPUT: Top 5 Bins by Records
dPort|   Records|  %Records|   cumul_%|
   25|       130|  0.744516|  0.744516|
   80|         9|  0.051543|  0.796060|
41538|         6|  0.034362|  0.830422|
41577|         6|  0.034362|  0.864784|
55005|         5|  0.028635|  0.893420|

Example 3-9: rwstats --dport --top --count to Examine Destination Ports

As Example 3-9 shows, the most active destination port (mail) comprises less than 1 percent of records. Even then, this value is very large: the line above the titles provides a summary of the number of unique keys observed, and as this line indicates, nearly all possible destination ports are seen in this file. For efficiency and flexibility, it can be desirable to chain together rwstats calls, or to chain rwstats with other suite tools. Two parameters are used to support this in Example 3-10. --copy-input specifies a pipe or file to receive a copy of the flow records supplied as input to rwstats. --output-path specifies a file name to receive the output from the current call to rwstats. These parameters are also available on other tools in the suite, and will be referenced in their syntax.


<1>$ rwfilter --proto=6 --pass=stdout fastfile.raw | \
      rwstats --fields=dport --top --flow --count=5 \
        --copy-input=stdout --output-path=top.txt | \
      rwstats --fields=sip --top --flow --count=5
INPUT: 17461 Records for 478 Bins and 17461 Total Records
OUTPUT: Top 5 Bins by Records
     sIP|   Records|  %Records|   cumul_%|
10.0.0.5|      5933| 33.978581| 33.978581|
10.0.0.2|      1318|  7.548250| 41.526831|
10.0.0.1|      1080|  6.185213| 47.712044|
10.0.0.4|       799|  4.575912| 52.287956|
10.0.0.3|       694|  3.974572| 56.262528|
<2>$ cat top.txt
INPUT: 17461 Records for 14775 Bins and 17461 Total Records
OUTPUT: Top 5 Bins by Records
dPort|   Records|  %Records|   cumul_%|
   25|       130|  0.744516|  0.744516|
   80|         9|  0.051543|  0.796060|
41538|         6|  0.034362|  0.830422|
41577|         6|  0.034362|  0.864784|
55005|         5|  0.028635|  0.893420|

Example 3-10: rwstats --copy-input and --output-path to Chain Calls

3.4 Creating Time Series with rwcount

rwcount provides a time-binned count of the number of bytes, packets, and flow records. The rwcount call in Example 3-11 counts into 10-minute bins all flow volume information that appears in the slowfile.raw file.

$ rwcount --bin-size=600 --load-scheme=1 slowfile.raw
               Date|   Records|        Bytes|   Packets|
2010/08/06T00:00:00|     14.00|  34046290.00| 247099.00|
2010/08/06T00:10:00|     13.00|  19604356.00|  51283.00|
2010/08/06T00:20:00|     13.00|    859970.00|   4612.00|
2010/08/06T00:30:00|     15.00|  39432533.00| 282075.00|
(many more lines)

Example 3-11: rwcount for Counting with Respect to Time Bins

rwcount by default produces the table format shown in Example 3-11: the first column is the date, followed by the number of records, bytes, and packets. The --bin-size parameter specifies the size of the counting bins in seconds; rwcount uses 30-second bins by default. The --load-scheme=1 parameter specifies to consider all of the flow record's volume in the first second of its duration, rather than averaging the volume across all bins in the flow's duration, which is the default. Figure 3.4 provides a summary of this command and its options.

rwcount
Description   Calculates volumes over time samples
Call          rwcount --bin-size=3600 filterfile.rwf
Parameters
  --bin-size      Number of seconds per bin
  --load-scheme   How data fills bins
  --skip-zeroes   Do not print empty bins
  --epoch-slots   Print slots using epoch time
  --start-epoch   Start printing from this time period
  --output-path, --copy-input   See Section 3.3

Figure 3.4: Summary of rwcount

3.4.1 Examining Traffic Over a Month

rwcount is frequently used to provide graphs showing activity over long periods of time. An example considers TCP traffic reaching a targeted server. The file mon_7.raw contains all incoming traffic reaching the address 10.3.1.2 from the period between July 1 and August 1, 2010 (a total of 206,067 flow records). The command in Example 3-12 counts all records in the file, splitting the count by the time in each flow record.

<1>$ rwcount mon_7.raw > mon_7.count

Example 3-12: rwcount Sending Results to Disk

Example 3-12 redirects output directly to disk. Count data can be read by most plotting applications; for this example, graphs are generated using the gnuplot utility. The resulting plot is shown in Figure 3.5. As this example shows, the data is noisy. Even when focusing on a representative hour, the result continues to be noisy, as shown in Figure 3.6. To make the result more readable, we change the bin size to a more manageable value using --bin-size. In Example 3-13, we change the size of the bins to an hour.

<1>$ rwcount --bin-size=3600 mon_7.raw > mon_7.count

Example 3-13: rwcount --bin-size to Better Scope Data for Graphing

With volumes totaled by the hour (and the vertical axis shifted to a logarithmic scale), regular traffic patterns are more visible. In Figure 3.7 these appear as a more solid wavering line, with daily peaks corresponding to working hours (because traffic on domestic networks is primarily from the United States, each day has a 12-hour peak corresponding to 8 a.m. to 8 p.m. Eastern Standard Time).

3.4.2 Counting by Bytes, Packets, and Flows

Figure 3.5: Displaying rwcount Output Using gnuplot

Counting by bytes, packets, and network flows can reveal different traffic characteristics. As noted at the beginning of this manual, the majority of traffic crossing wide area networks has very low packet counts. However, this traffic, by virtue of being so small, does not make up a large volume of bytes crossing the enterprise network. Certain activities, such as scanning and worm propagation, are more visible when considering packets, flows, and various filtering criteria for flow records. We consider the mon_7.raw file again, this time using daily counts. In Figure 3.8, a logarithmic scale has been used to show both graphs on the same page. Under normal circumstances, the byte count will be several thousand times larger than the record count.
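A daily series like the one plotted in Figure 3.8 can be produced by enlarging the bin to a full day (86,400 seconds); a minimal sketch (the output file name is arbitrary):

$ rwcount --bin-size=86400 mon_7.raw > mon_7.daily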

3.4.3 Changing the Format of Data

rwcount can alter its output format to accommodate different representations of time. The most important of these features are the --epoch-slots and --start-epoch parameters. --epoch-slots alters output to print the results as epoch time (seconds since midnight January 1, 1970). Epoch time is easier to parse than a conventional Year/Month/Day format, making it useful when working with scripts. Example 3-14 shows epoch time and its relationship to normal time.



Figure 3.6: Focusing gnuplot Output on a Single Hour



Figure 3.7: Improved gnuplot Output Based on a Larger Bin Size



Figure 3.8: Comparison of Byte and Record Counts over Time


<1>$ rwcount --bin-size=3600 mon_7.raw | head -4
               Date|   Records|     Bytes|  Packets|
2010/07/01T00:00:00|      1.00|     40.00|     1.00|
2010/07/01T01:00:00|      2.00|     80.00|     2.00|
2010/07/01T02:00:00|      1.00|    404.00|     1.00|
<2>$ rwcount --bin-size=3600 --epoch-slots mon_7.raw | head -4
      Date|   Records|     Bytes|  Packets|
1277942400|      1.00|     40.00|     1.00|
1277946000|      2.00|     80.00|     2.00|
1277949600|      1.00|    404.00|     1.00|
<3>$ rwcount --bin-size=3600 --legacy-timestamps mon_7.raw | head -4
               Date|   Records|     Bytes|  Packets|
07/01/2010 00:00:00|      1.00|     40.00|     1.00|
07/01/2010 01:00:00|      2.00|     80.00|     2.00|
07/01/2010 02:00:00|      1.00|    404.00|     1.00|
<4>$ rwcount --bin-size=3600 --bin-slots mon_7.raw | head -4
Date|   Records|     Bytes|  Packets|
  48|      1.00|     40.00|     1.00|
  49|      2.00|     80.00|     2.00|
  50|      1.00|    404.00|     1.00|

Example 3-14: rwcount Alternate Date Formats

As Example 3-14 shows, the epoch values are actually the same times as the normal results, but they are given as a single time value. Also note that the epoch slots are exactly 3600 seconds apart in each case. This spacing is normally expected for the conventional representation given above, but it is easier to see in this example. rwcount normally starts printing at the first nonzero slot; however, when dealing with multiple data files that start at different times, this default behavior can result in count files that start at different times. To force rwcount to start each result at the same time, use the --start-epoch parameter as shown in Example 3-15. This parameter will force rwcount to start reporting at the same time period, regardless of whether the data starts at or before that time period. The parameter to --start-epoch can either be an integer epoch value or a date in year/month/day format.

<1>$ rwcount --start-epoch=1277938800 --bin-size=3600 --epoch-slots mon_7.raw | head -4
      Date|   Records|     Bytes|  Packets|
1277938800|      0.00|      0.00|     0.00|
1277942400|      1.00|     40.00|     1.00|
1277946000|      2.00|     80.00|     2.00|
<2>$ rwcount --start-epoch=1277946000 --bin-size=3600 --epoch-slots mon_7.raw | head -4
      Date|   Records|     Bytes|  Packets|
1277946000|      2.00|     80.00|     2.00|
1277949600|      1.00|    404.00|     1.00|
1277953200|      0.00|      0.00|     0.00|

Example 3-15: rwcount --start-epoch to Constrain Minimum Date


Example 3-15 shows how --start-epoch affects output. In the first case, we start the epoch before we have data; therefore, rwcount prints out a blank slot for the first hour. In the second case, we start the epoch after the data and the count command consequently ignores the first few slots.

3.4.4 Using the --load-scheme Parameter for Different Approximations

Grouping packets as flow records results in a loss of timing information; specifically, it is not possible to tell how the packets covered by a flow record arrived at their destination. As a result, rwcount makes a guess as to how to record flow record volumes. This guess is controlled by the --load-scheme parameter in rwcount. --load-scheme bins records in one of four ways. The default approach is to split the bytes, packets, and record count across all bins covered by the flow record. It can also store records in the last appropriate bin (the bin covering --end-time), in a bin in the middle of that range, or it can store the flow record's volume in the bin corresponding to the flow record's start time. The differences among load schemes are generally slight, but they can sometimes be significant. To show this difference, we first select data as shown in Command 1 of Example 3-16, isolating a few hours of traffic. Then commands 2-4 of the example count the data into bins using three different load schemes:

- 1min.0.txt contains data split evenly across 1-minute bins,
- 1min.1.txt contains data loaded into the first bin,
- 1min.def.txt contains data split according to the time the flow record spent in each 1-minute bin (the default).

Note that we have picked large records and small bins; the smaller the binning, the more pronounced the differences will be.

<1>$ rwfilter mon_7.raw --pass=hour_10_7_3.raw \
      --stime=2010/07/03:02:00:00-2010/07/03:05:00:00
<2>$ rwcount --delimited=, --bin-size=60 --load-scheme=0 \
      hour_10_7_3.raw > 1min.0.txt
<3>$ rwcount --delimited=, --bin-size=60 --load-scheme=1 \
      hour_10_7_3.raw > 1min.1.txt
<4>$ rwcount --delimited=, --bin-size=60 \
      hour_10_7_3.raw > 1min.def.txt

Example 3-16: rwcount Alternative Load Schemes

The resulting graph is shown in Figure 3.9. While the differences appear slight, some are notable. The traffic shown in the even binning approach (--load-scheme=0) is slightly more smooth; with the front-bin approach (--load-scheme=1), the peaks and valleys are elongated.

3.5 Displaying Flow Records Using rwcut

SiLK uses binary data to implement fast access and file manipulation; however, this data cannot be read using cat or any of the standard text-processing UNIX tools. As shown in Figure 3.10, the rwcut tool reads filter files and produces user-readable output in a pipe-delimited tabular format. rwcut both reads and formats files. Using the --fields parameter, SiLK data can be reformatted in different orders and in different structures.



Figure 3.9: Dierences Between Load Schemes

rwcut
Description   Reads SiLK flow data and prints it to the screen
Call          rwcut --fields=1-9 filterfile.rwf
Parameters    --fields                             Choose which fields to print
              --integer-ips                        Print IP addresses as integer values
              --num-recs, --start-rec, --end-rec   Record selection
              --icmp-type                          Print ICMP type and code
              --delimited                          Choose delimiter
              --output-path, --copy-input          See Section 3.3

Figure 3.10: Summary of rwcut


rwcut is invoked in two ways: either by reading a file directly or by connecting it to rwfilter with a pipe. When reading a file, just specify the filename on the command line, as shown in Example 3-17.

<1>$ rwcut --fields=1-6 fastfile.raw
sIP| dIP|sPort|dPort|pro| packets|
10.0.0.1| 10.0.0.2| 25|35959| 6| 137|
10.0.0.3| 10.0.0.2| 25|50886| 6| 49|
10.0.0.4| 10.0.0.2| 25|50896| 6| 2202|
10.0.0.3| 10.0.0.2| 25|46471| 6| 2531|
10.0.0.4| 10.0.0.2| 25|50904| 6| 412|
10.0.0.3| 10.0.0.2| 25|50917| 6| 2056|
10.0.0.5| 10.0.0.2| 25|36035| 6| 154|
10.0.0.6| 10.0.0.2| 25|46753| 6| 162|
10.0.0.7| 10.0.0.2| 25|46756| 6| 171|
10.0.0.8| 10.0.0.2| 25|46787| 6| 50|
Example 3-17: rwcut for Displaying the Contents of a File

To use rwcut with rwfilter, connect them together with pipes, as illustrated in Example 3-18.

<1>$ rwfilter --pass=stdout --saddress=x.x.x.32 slowfile.raw | rwcut --fields=1-6 | head -5
sIP| dIP|sPort|dPort|pro| packets|
10.0.1.32| 10.0.0.2| 0| 768| 1| 1017|
10.0.3.32| 10.0.0.2| 0| 0| 50| 122|
10.0.4.32| 10.0.0.5|64651| 4500| 17| 2766|
10.0.6.32| 10.0.0.7| 123| 123| 17| 29|
Example 3-18: rwcut Used With rwfilter

3.5.1 Pagination

When output is sent to a terminal, rwcut will automatically invoke the command listed in the user's PAGER environment variable to paginate the output. The command given in the SILK_PAGER environment variable will override PAGER. If SILK_PAGER is the empty string, as shown in Example 3-19 for the Bash shell, no paging will be performed. The paging program can also be specified for a single invocation of a tool by using its --pager parameter, as shown in Example 3-20.

<1>$ export SILK_PAGER=
Example 3-19: SILK_PAGER With the Empty String to Disable rwcut Paging

<1>$ rwfilter ... | rwcut --field=5 --pager=
Example 3-20: rwcut --pager to Disable Paging
4 The addresses shown in this example and those following have been anonymized.


3.5.2 Selecting Fields to Display

The --fields parameter provides a means to both select and rearrange fields; when fields are specified using the --fields parameter, rwcut orders them in the sequence specified in the parameter. Thus, --fields=1,2,3 will result in a display that is different from --fields=3,2,1. Several SiLK tools use a --fields parameter (including rwcut, rwsort, and rwuniq). The argument to this parameter is a list of field numbers, field names, or a mix of the two (a brief sketch follows Table 3.6). Table 3.6 shows these field numbers and names. In this table, the field name column may hold several names separated by commas; these names are equivalent. Where a plus character appears, this character is part of the name. In some cases, a field name may not have a corresponding field number, indicating that the name must be used if this field is desired. If a pmap is required for a given field, the --pmap-file parameter must precede the --fields parameter. (See Section 4.8 for more information on Prefix Maps.)

Table 3.6: Arguments for the --fields Parameter

Number  Field Name                Description
1       sIP, sip                  Source IP address for flow record
2       dIP, dip                  Destination IP address for flow record
3       sPort, sport              Source port (or ICMP type) for flow record (or 0)
4       dPort, dport              Destination port (or ICMP code) for flow record (or 0)
5       protocol                  Protocol number for flow record
6       packets, pkts             Number of packets in flow
7       bytes                     Number of bytes in flow
8       flags                     Logical OR of TCP flag fields of flow (or blank)
9       sTime, stime              Start date and time of flow (in seconds)
10      dur                       Duration of flow (in seconds)
11      eTime, etime              End date and time of flow (in seconds)
12      sensor                    Sensor that collected the flow
13      in                        Input interface on sensor (currently unused)
14      out                       Output interface on sensor (currently unused)
15      nhIP                      Next hop IP address (currently used only for annotations)
16      stype                     Source group of IP addresses (pmap required)
17      dtype                     Destination group of IP addresses (pmap required)
18      scc                       Source country code (pmap required)
19      dcc                       Destination country code (pmap required)
20      class                     Class of sensor that collected the flow
21      type                      Type of flow for this sensor class
22      sTime+msec, stime+msec    Start date and time of flow (in milliseconds)
23      eTime+msec, etime+msec    End date and time of flow (in milliseconds)
24      dur+msec                  Duration of flow (in milliseconds)
25      icmpTypeCode              ICMP type and code values
26      InitialFlags              TCP flags in initial packet
27      SessionFlags              TCP flags in remaining packets
28      attributes                Constants for termination conditions
29      application               Standard port for application that produced the traffic
        sval                      (see Section 4.8)
        dval                      (see Section 4.8)
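As a brief sketch (the file name flows.rwf is hypothetical), names and numbers may be mixed freely in a single --fields argument; the following prints the source address, the protocol by number, the byte count, and the start time:

$ rwcut --fields=sIP,5,bytes,stime flows.rwf | head -4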


3.5.3 Selecting Fields for Performance

In general, the SiLK tools provide sufficient filtering and summarizing facilities to make performance scripting rare. However, given the volume of data that can be processed, it is worth considering performance constraints between rwcut and scripts. Left to its default parameters, rwcut prints a lot of characters per record; with enough records, script execution can be quite slow. In Example 3-21, command 1 pulls a fairly long file. Command 2 uses rwcut and the UNIX line-count command wc -l to count the number of records in the file. The UNIX time command is used to determine how long the rwcut and wc run takes. In the output, the first line is the record count, and the next line is the result of the time command. The most meaningful figure in the second output line is the third part, which indicates the command took 0.88 seconds to execute.

<1>$ rwfilter --start-date=2010/08/01:00 --end-date=2010/08/01:00 \
--proto=6 --pass=tmp.raw
<2>$ time $SHELL -c "rwcut tmp.raw | wc -l"
159531
0.815u 0.123s 0:00.88 105.6% 0+0k 0+0io 0pf+0w
Example 3-21: rwcut Performance With Default --fields

Compare this with the results shown in Example 3-22, where all fields except protocol have been cut out. The result is approximately 13 times faster.

<1>$ time $SHELL -c "rwcut --field=5 tmp.raw | wc -l"
159531
0.047u 0.017s 0:00.06 83.3% 0+0k 0+0io 0pf+0w
Example 3-22: rwcut --fields to Improve Efficiency

3.5.4 Rearranging Fields for Clarity

The --fields parameter can also be used to rearrange fields. In Example 3-23, we reorder the output fields to the form <source IP>, <source port>, <start time>, <destination IP>.

<1>$ rwfilter --proto=6 --pass=stdout fastfile.raw \
| rwcut --fields=1,3,9,2 | head -5
sIP|sPort| sTime| dIP|
10.0.0.1| 25|2010/08/06T00:00:10.063| 10.0.0.2|
10.0.0.3| 25|2010/08/06T00:00:43.133| 10.0.0.2|
10.0.0.4| 25|2010/08/06T00:01:35.393| 10.0.0.2|
10.0.0.3| 25|2010/08/06T00:01:43.274| 10.0.0.2|
Example 3-23: rwcut --fields to Rearrange Output
5 The $SHELL -c "commands" syntax is used to encapsulate the piping of the two commands so that time can record the cumulative execution time.


3.5.5 Field Formatting

Since rwcut is the primary report and display tool for SiLK, it includes several features for reformatting and modifying output. In general, rwcut's features are minimal and are focused on the most relevant high-volume tasks. The focus of rwcut is on generating data that can then be read easily by other scripting tools.

Changing Field Format for ICMP

ICMP types and codes are stored in the destination port field for most sensors. Normally, this storage results in a value equivalent to (type * 256) + code being stored in the dport field, as shown in Example 3-24.

<1>$ rwfilter mon_7.raw --proto=1 --pass=stdout \
| rwcut --field=1-7 | head -6
sIP| dIP|sPort|dPort|pro| packets| bytes|
10.0.0.1| 10.0.0.2| 0| 771| 1| 2| 312|
10.0.0.3| 10.0.0.2| 0| 2048| 1| 2| 120|
10.0.0.4| 10.0.0.2| 0| 2048| 1| 1| 60|
10.0.0.5| 10.0.0.2| 0| 2048| 1| 2| 120|
10.0.0.6| 10.0.0.2| 0| 2048| 1| 2| 120|
Example 3-24: rwcut ICMP Type and Code as dport

rwcut includes a parameter, --icmp, that reformats type and code data into a more readable form. Example 3-25 shows the same data reformatted using --icmp:

<1>$ rwfilter mon_7.raw --proto=1 --pass=stdout \
| rwcut --field=1-7 --icmp | head -6
sIP| dIP|sPort|dPort|pro| packets| bytes|
10.0.0.1| 10.0.0.2| 3| 3| 1| 2| 312|
10.0.0.3| 10.0.0.2| 8| 0| 1| 2| 120|
10.0.0.4| 10.0.0.2| 8| 0| 1| 1| 60|
10.0.0.5| 10.0.0.2| 8| 0| 1| 2| 120|
10.0.0.6| 10.0.0.2| 8| 0| 1| 2| 120|
<2>$ rwfilter mon_7.raw --proto=1 --pass=stdout \
| rwcut --field=1-2,25,5-7 | head -6
sIP| dIP|iTy|iCo|pro| packets| bytes|
10.0.0.1| 10.0.0.2| 3| 3| 1| 2| 312|
10.0.0.3| 10.0.0.2| 8| 0| 1| 2| 120|
10.0.0.4| 10.0.0.2| 8| 0| 1| 1| 60|
10.0.0.5| 10.0.0.2| 8| 0| 1| 2| 120|
10.0.0.6| 10.0.0.2| 8| 0| 1| 2| 120|
Example 3-25: rwcut --icmp Parameter and Fields to Display ICMP Type and Code

In this case, the dport value has been converted into a type (in the sport slot) and a code (in the dport slot). In command 2, rwcut was invoked with a --fields parameter that includes icmpTypeCode (field 25). This produced columns that contain the ICMP type and code values.

Changing Character Separators

The --delim parameter allows changing the separator from a pipe (|) to any other character; Example 3-26 shows this replacement.

<1>$ rwcut --field=1,2,3 --integer-ip --delim=x fastfile.raw | head -6
sIPxdIPxsPortx
167772161x 167772162x25x
167772163x 167772162x25x
167772164x 167772162x25x
167772163x 167772162x25x
167772164x 167772162x25x
Example 3-26: rwcut --delim to Change the Delimiter

When using the --delim parameter, spacing is removed as well. (This is particularly useful for --delim=,, which produces comma-separated-value output for easy import into Excel and other tools.) The --delim parameter doesn't allow spaces in its argument, as it uses only the first character of the argument. To get around this problem, use the --column-separator parameter, which changes the separator without affecting the spacing. The --no-columns parameter suppresses spacing between columns without affecting the separator. The --integer-ip parameter specifies that IP addresses are to be displayed as integer values, rather than the normal dotted-decimal notation. The --no-title parameter (see Example 3-27) suppresses the column headers, which can be useful when doing further processing with text-based tools. A combined sketch follows Example 3-27.

<1>$ rwcut --no-title slowfile.raw --num-recs=3 --fields=1-5
10.0.0.1| 10.0.0.2| 0| 768| 1|
10.0.0.3| 10.0.0.2| 0| 0| 50|
10.0.0.4| 10.0.0.5|64651| 4500| 17|
Example 3-27: rwcut --no-title to Suppress Field Headers in Output
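Combining the parameters above, a minimal sketch (output file name is hypothetical) of producing comma-separated output without a title line, suitable for import into a spreadsheet, would be:

$ rwcut --fields=1-5 --delimited=, --no-title slowfile.raw > recs.csv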

6 Or pipe through the UNIX command sed -e 's/,/, /g' to add a space after each comma.


3.5.6 Selecting Records to Display

rwcut has three output-control parameters: --num-recs, --start-rec, and --end-rec. --num-recs limits the number of records output, as shown in Example 3-28.

<1>$ rwfilter --proto=6 slowfile.raw --pass=stdout | \
rwcut --fields=1-5 --num-recs=3
sIP| dIP|sPort|dPort|pro|
10.0.0.1| 10.0.0.2|19965| 179| 6|
10.0.0.3| 10.0.0.4| 5223| 3588| 6|
10.0.0.5| 10.0.0.6|61080| 179| 6|
Example 3-28: rwcut --num-recs to Constrain Output

--num-recs forces a maximum number of lines of output. For example, using --num-recs=100, 100 lines of records will be shown regardless of piping or redirection. As shown in Example 3-29, the line count of 101 includes the title line. To eliminate that line, use the --no-titles option (see Example 3-27).

<1>$ rwfilter --proto=6 fastfile.raw --pass=stdout |\
rwcut --fields=1-5 --num-recs=100 | wc -l
101
Example 3-29: rwcut --num-recs and Title Line

--num-recs will print records until it reaches the specified number of records or there are no more records to print. If files are specified on the rwcut command line, --num-recs will print the specified number of records per input file. The --start-rec and --end-rec parameters are used to specify the record number at which to start and stop printing. In Example 3-30, we print the records in the fastfile.raw file starting at record 2.

<1>$ rwfilter --proto=6 fastfile.raw --pass=stdout |\
rwcut --fields=1-5 --start-rec=2 --num-recs=10
sIP| dIP|sPort|dPort|pro|
10.0.0.1| 10.0.0.2| 25|50886| 6|
10.0.0.3| 10.0.0.2| 25|50896| 6|
10.0.0.1| 10.0.0.2| 25|46471| 6|
10.0.0.3| 10.0.0.2| 25|50904| 6|
10.0.0.1| 10.0.0.2| 25|50917| 6|
10.0.0.4| 10.0.0.2| 25|36035| 6|
10.0.0.5| 10.0.0.2| 25|46753| 6|
10.0.0.6| 10.0.0.2| 25|46756| 6|
10.0.0.7| 10.0.0.2| 25|46787| 6|
10.0.0.8| 10.0.0.2| 25|50969| 6|
Example 3-30: rwcut --start-rec to Select Records to Display

--start-rec and --end-rec will supersede --num-recs if all three are present in a call, as shown in Example 3-31.

<1>$ rwfilter --proto=6 fastfile.raw --pass=stdout |\
rwcut --fields=1-5 --start-rec=2 --end-rec=8 --num-recs=99
sIP| dIP|sPort|dPort|pro|
10.0.0.1| 10.0.0.2| 25|50886| 6|
10.0.0.3| 10.0.0.2| 25|50896| 6|
10.0.0.1| 10.0.0.2| 25|46471| 6|
10.0.0.3| 10.0.0.2| 25|50904| 6|
10.0.0.1| 10.0.0.2| 25|50917| 6|
10.0.0.4| 10.0.0.2| 25|36035| 6|
10.0.0.5| 10.0.0.2| 25|46753| 6|
Example 3-31: rwcut --start-rec, --end-rec, and --num-recs Combined

3.6 Sorting Flow Records With rwsort

rwsort is a high-speed sorting tool for SiLK flow records. It is faster than the standard UNIX sort command, handles flow record fields directly with an understanding of each field's type, and is capable of handling very large numbers of SiLK flow records provided that sufficient memory is available. Figure 3.11 provides a brief summary of this tool.

The order of values in the argument to the --fields parameter to rwsort indicates the sort's precedence for fields. For example, --fields=1,3 results in flow records being sorted by source IP address (1) and by source port (3) for each source IP address. --fields=3,1 results in flow records being sorted by source port, and by source IP address for each source port. Since flow records are not entered into the repository in the order they were opened, analyses often involve sorting by start time (field 9) at some point.

rwsort can also be used to merge-sort SiLK record files. By default, rwsort does this by sorting each input file and then merging the results. If an appropriate order is already present in the input files, using the --presorted-input parameter can improve efficiency significantly. In cases where rwsort is processing large input files, disk space in the default temporary system space may be insufficient. To use an alternate space, use the --temp-directory parameter, with an argument specifying the alternate location. This may also improve data privacy, if that is an issue in the installation.
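As a minimal sketch (the input files sorted-a.rwf and sorted-b.rwf are hypothetical and assumed to be already ordered by start time), a merge-sort that reuses the existing order and an alternate temporary directory might look like this:

$ rwsort --fields=9 --presorted-input --temp-directory=/var/tmp \
    --output-path=merged.rwf sorted-a.rwf sorted-b.rwf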

3.6.1 Behavioral Analysis with rwsort, rwcut and rwfilter

A behavioral analysis of protocol activity relies heavily on basic rwcut and rwfilter parameters. The analysis requires the analyst to have a thorough understanding of how protocols are meant to function, and some concept of baseline activity for a protocol on the network is needed for comparison. To monitor the behavior of protocols, take a sample of a particular protocol, use rwsort --fields=9, and then convert to ASCII with rwcut. To produce byte and packet fields only, try rwcut with --fields=6 and --fields=7, then apply the UNIX commands sort and uniq -c (a sketch of such a pipeline appears after Figure 3.11). Cutting in this manner (sorting by field or displaying select fields) can answer a number of questions:

1. Is there a standard byte-per-packet ratio? Are there byte-per-packet ratios that fall outside the baseline?

2. Are there sessions with byte counts, packet counts, or other fields that fall outside the norm?

rwsort
Description   Sorts SiLK flow records using key field(s)
Call          rwsort --fields=1,3 --output=sorted.rwf unsorted1.rwf unsorted2.rwf
Parameters    --fields             Key fields for sorting (required)
              --output-path        Output location, defaults to stdout
              --input-pipe         Input location, defaults to stdin
              --presorted-input    Assume input has already been sorted with the same fields
              --temp-directory     Store temporary files here while sorting

Figure 3.11: Summary of rwsort

There are many such questions to ask, but keep the focus of exploration on the behavior being examined. Chasing down weird cases is tempting, but can add little to the understanding of network behavior.
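As a minimal sketch of the pipeline described above (the file name sample.rwf is hypothetical; UDP is chosen arbitrarily as the protocol of interest), the following sorts the sample by start time and then tallies the distinct byte/packet combinations:

$ rwfilter sample.rwf --proto=17 --pass=stdout \
    | rwsort --fields=9 \
    | rwcut --fields=6,7 --no-title \
    | sort | uniq -c | sort -nr | head -10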

3.7 Counting Flows With rwuniq

The SiLK analysis suite includes a variety of tools for counting data, the most powerful of which is the rwuniq application, which is summarized in Figure 3.12. rwuniq is a general-purpose counting tool: it provides counts of the records, bytes, and packets for any combination of fields, including binning by time intervals. Flow records need not be sorted before being passed to rwuniq. If the records are sorted in the same order as indicated by the --fields parameter to rwuniq, then using the --presorted-input parameter will reduce memory requirements for rwuniq.
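For instance (a sketch with hypothetical file names), records that have already been sorted on the key can be counted with lower memory use:

$ rwsort --fields=1 --output-path=sorted-by-sip.rwf big.rwf
$ rwuniq --fields=1 --flows --presorted-input sorted-by-sip.rwf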

rwuniq
Description   Counts records per combination of multiple-field keys
Call          rwuniq --fields=1-9 filterfile.rwf
Parameters    --fields             Fields to use as key
              --flows              Count flows per key
              --bytes              Count bytes per key
              --packets            Count packets per key
              --sip-distinct       Count number of distinct source addresses per key
              --dip-distinct       Count number of distinct destination addresses per key
              --presorted-input    Reduce memory requirements for presorted flow records
              --sort-output        Produce results in sorted order, using the --fields parameter as the sort key
              --output-path, --copy-input   See Section 3.3

Figure 3.12: Summary of rwuniq


Example 3-32 shows a count on source IP addresses (field 1).

<1>$ rwuniq --field=1 fastfile.raw | head -10
sIP| Records|
10.0.0.1| 26|
10.0.0.2| 9|
10.0.0.3| 1|
10.0.0.4| 1|
10.0.0.5| 1|
10.0.0.6| 1|
10.0.0.7| 1|
10.0.0.8| 2|
10.0.0.9| 42|
Example 3-32: rwuniq for Counting in Terms of a Single Field

The count shown in Example 3-32 is a count of individual flow records.

3.7.1 Using Thresholds with rwuniq

rwuniq provides a capability to set thresholds on counts. For example, to show only those source IP addresses with 200 or more flow records, use the --flows parameter as shown in Example 3-33. In addition to providing counts of flow records, rwuniq can count bytes and packets through the --bytes and --packets parameters, as shown in Example 3-34.

<1>$ rwuniq --field=1 mon_7.raw --flows=200
sIP| Records|
10.0.0.11| 1554|
10.0.0.12| 12682|
10.0.0.13| 509|
10.0.0.14| 1529|
10.0.0.15| 565|
10.0.0.16| 2413|
10.0.0.17| 424|
10.0.0.18| 10559|
10.0.0.19| 3370|
Example 3-33: rwuniq --flows for Constraining Counts to a Threshold


<1>$ rwuniq --field=1 mon_7.raw --bytes --packets --flows=200
sIP| Bytes| Packets| Records|
10.0.0.11| 619288| 1556| 1554|
10.0.0.12| 6619934| 16633| 12682|
10.0.0.13| 203378| 511| 509|
10.0.0.14| 611328| 1536| 1529|
10.0.0.15| 230044| 578| 565|
10.0.0.16| 962364| 2418| 2413|
10.0.0.17| 169548| 426| 424|
10.0.0.18| 4332628| 10886| 10559|
10.0.0.19| 1360364| 3418| 3370|
Example 3-34: rwuniq --bytes and --packets with Minimum Flow Threshold

The --bytes, --packets, and --flows parameters are all threshold operators. Without an additional argument (such as --flows=200), they will print all records. With a numeric argument, rwuniq will print all records with a count greater than or equal to that value. When multiple threshold parameters are specified, rwuniq will print all records that meet all the threshold criteria, as shown in Example 3-35.

<1>$ rwuniq --field=1 mon_7.raw --bytes --packets=1000 --flows=200
sIP| Bytes| Packets| Records|
10.0.0.11| 619288| 1556| 1554|
10.0.0.12| 6619934| 16633| 12682|
10.0.0.13| 611328| 1536| 1529|
10.0.0.14| 962364| 2418| 2413|
10.0.0.15| 4332628| 10886| 10559|
10.0.0.16| 1360364| 3418| 3370|
10.0.0.17| 8730528| 21936| 19606|
10.0.0.18| 1482550| 3725| 3725|
10.0.0.19| 4972612| 12494| 11948|
Example 3-35: rwuniq --flows and --packets to Constrain Flow and Packet Counts

3.7.2 Counting IPv6 Flows

rwuniq automatically adjusts to process IPv6 flow records if they are supplied as input. No specific parameter is needed to identify these flow records, as shown in Example 3-36. This example uses rwfilter to isolate IPv6 Packet Too Big flow records (ICMPv6 Type 2), and then uses rwuniq to profile how often each host sends these, and to how many destinations. These flow records are used for Path Maximum Transmission Unit (PMTU) negotiation, an optimization of packet sizing within IPv6 to prevent the need for frequent packet fragmentation. A low number of such flow records is considered acceptable. If a source IP address has a high count, then that host is throttling back network connections for communicating hosts.


<1>$ rwfilter --ip-version=6 --icmp-type=2 --pass=stdout | \
rwuniq --fields=sip --flows=2 --dip-distinct
sIP| Records|Unique_DIP|
2001:6100:0:320a:9ce3:a2ff:ae0:e169| 5| 2|
2001:6100:0:3e00::2e28:0| 2| 2|
2001:6140:a401:fe00::51f6| 8| 1|
2001:655a:0:64b2::a5| 2| 1|
Example 3-36: Using rwuniq to Detect IPv6 PMTU Throttling

3.7.3 Counting on Compound Keys

In addition to the simple counting shown above, rwuniq can count on combinations of fields. To use a compound key, specify it using comma/dash notation in rwuniq's --field parameter. Keys can be manipulated as in rwcut, so --field=3,1 is a different key from --field=1,3. In Example 3-37, --field is used to identify major communications between clients and specific services.

<1>$ rwfilter --proto=6 --pass=stdout mon_7.raw \
| rwuniq --field=1,3 --flows=20
sIP|sPort| Records|
10.0.0.21| 80| 46|
10.0.0.22|12200| 155|
10.0.0.23|12200| 23|
10.0.0.24|14602| 66|
10.0.0.25| 80| 21|
10.0.0.26|12200| 142|
Example 3-37: rwuniq --field to Count with Respect to Combinations of Fields

In Example 3-37, outgoing traffic is used to identify those source IPs with the highest number of flow records connecting to specific TCP ports.

3.7.4 Using rwuniq to Isolate Behavior

rwuniq can profile flow records for a variety of behaviors, by first filtering for the behavior of interest and then using rwuniq to count the records showing that behavior. This can be useful in understanding hosts that use or provide a mix of services. Example 3-38 shows how to generate data that compares hosts showing email and non-email behavior among a group of flow records. Command 1 first isolates the set of hosts of interest, then divides their records into mail and non-mail behaviors (by protocol and port), and finally counts the mail behavior into a file, which is sorted by source address. Command 2 counts the non-mail flow records and sorts them by source address. Command 3 merges the two count files by source address, then sorts them by number of mail flows, with the results shown. Hosts with high counts in both columns appear to be either workstations or gateways. Hosts with high counts in email and low counts in non-email appear to be email servers. For more complex summaries of behavior, use the bag utilities as described in Section 4.6.
7 The full analysis to identify email servers is more complex, and will not be dealt with in this handbook.


<1>$ rwfilter mon_7.raw --sipset=interest.set --pass=stdout | \
rwfilter --input-pipe=stdin --proto=6 --aport=25 \
--pass=stdout --fail=more-nomail.raw | rwuniq --field=1 --no-title | \
sort -nr >more-mail-saddr.txt
<2>$ rwuniq --field=1 more-nomail.raw --no-title | \
sort -nr >more-nomail-saddr.txt
<3>$ join more-mail-saddr.txt more-nomail-saddr.txt | sort -nr "-t|" -k2
10.0.0.9| 97| 22|
10.0.0.12| 30| 1|
10.0.0.14| 15| 1|
10.0.0.16| 6| 1|
10.0.0.17| 3| 1|
Example 3-38: Using rwuniq to Isolate Email and Non-Email Behavior


Chapter 4

Using the Larger SiLK Tool Suite


The previous chapter described the basic SiLK tools and how to use them; with the knowledge from that chapter and a scripting language, an analyst is capable of doing many forms of traffic analysis using flow records. However, to both speed up and simplify analyses, the SiLK suite includes a variety of additional analytical tools. This chapter describes the other tools in the analysis suite and explains how to use them. As in the previous chapter, we'll introduce these tools, present a series of example analyses, and briefly summarize the function of the common parameters for each tool.

4.1 Common Tool Behavior

4.1.1 Structure of a Typical Command-Line Invocation

The SiLK suite's UNIX tools are traditionally called as piped commands invoked at a standard command-line prompt. As an example, consider the sequence of commands in Example 4-1.

<1>$ date; rwfilter --start-date=2010/08/06:00 \
--end-date=2010/08/06:02 --proto=0-255 --pass=stdout | \
rwstats --protocol --top --count=5 --flows ; date
Sat Aug 28 18:38:49 UTC 2010
INPUT SIZE: 353964202 records for 161 unique keys
PROTOCOL Key: Top 5 flow counts
protocol| Records|%_of_total| cumul_%|
6| 204829220| 57.867213| 57.867213|
17| 132632868| 37.470701| 95.337914|
1| 16374780| 4.626112| 99.964026|
50| 124719| 0.035235| 99.999261|
47| 2081| 0.000588| 99.999849|
Sat Aug 28 18:48:45 UTC 2010
Example 4-1: A Typical Sequence of Commands

This example command includes timing information for reference (the calls to date). As it shows, it takes approximately ten minutes to process 354 million records of traffic data (this is from a RAID array, but to a single processor).

4.1.2 Getting Tool Help

All SiLK tools include a help screen that provides a summary of command information. The help screen can be invoked by using the --help argument with the command.

<1>$ rwset --help
rwset {--sip-file=FILE | --dip-file=FILE | --nhip-file=FILE} [FILES]
Read SiLK Flow records and generate binary IPset file(s).
When no files are given on command line, flows are read from STDIN.
SWITCHES:
--help No Arg. Print this usage output and exit. Def. No
--version No Arg. Print this program's version and exit. Def. No
--sip-file Req Arg. Create an IP set from source addresses and write it to the named file (file must not exist)
--dip-file Req Arg. Create an IP set from destination addresses and write it to the named file (file must not exist)
--nhip-file Req Arg. Create an IP set from next-hop addresses and write it to the named file (file must not exist)
--print-filenames No Arg. Print names of input files as they are opened. Def. No
--copy-input Req Arg. Copy all input SiLK Flows to given pipe or file. Def. No
--note-add Req Arg. Store the textual argument in the output SiLK file's header as an annotation. Switch may be repeated to add multiple annotations
--note-file-add Req Arg. Store the content of the named text file in the output SiLK file's header as an annotation. Switch may be repeated.
--compression-method Req Arg. Set compression for binary output file(s). Def. lzo1x. Choices: best [=lzo1x], none, zlib, lzo1x
--site-config-file Req Arg. Location of the site configuration file. Def. $SILK_CONFIG_FILE or $SILK_DATA_ROOTDIR/silk.conf
<2>$ rwset --version
rwset: part of SiLK 2.1.0; configuration settings:
* Root of packed data tree: /data
* Packing logic: packlogic-gen.c
* Timezone support: UTC
* Available compression methods: lzo1x [default], none, zlib
* IPv6 support: no
* IPFIX collection support: yes
* AMP support: yes
* Transport encryption: GnuTLS
* PySiLK support: /usr/lib64/python2.4/site-packages
* Enable assert(): no
Copyright (C) 2001-2009 by Carnegie Mellon University
GNU General Public License (GPL) Rights pursuant to Version 2, June 1991.
Some included library code covered by LGPL 2.1; see source for details.
Government Purpose License Rights (GPLR) pursuant to DFARS 252.227-7013.
Send bug reports, feature requests, and comments to netsa-help@cert.org.
Example 4-2: Using --help and --version

SiLK is distributed with conventional UNIX manual pages and with The SiLK Reference Guide, both of which explain all of the parameters and the functionality of each tool in the suite.

All SiLK tools also have a --version parameter (as shown in command 2 of Example 4-2), which produces output that identifies the version that is installed. Since the suite is still being extended and evolved, this version information may be quite important.

4.2 Manipulating Flow-Record Files

Once data is pulled from the repository, an analysis may require that this data be extracted, rearranged, and combined with other flow data. This section describes the group of SiLK tools that manipulate flow record files: rwcat, rwappend, rwsplit, rwdedupe, rwfileinfo, and rwtuc.

4.2.1 Combining Flow Record Files with rwcat and rwappend

Example 4-3 profiles flow records for traffic with large aggregate volumes by the duration of transfer and by protocol. Even though subdividing files by repeated rwfilter calls allows the analyst to drill down to specific behavior, combining flow record files aids in providing context. The SiLK tool suite provides two tools for combining flow record files: rwcat, which concatenates flow record files in the order in which they are provided (see Figure 4.1), and rwappend, which places the contents of the flow record files on the end of the first flow record file specified (see Figure 4.2).

rwcat
Description   Concatenates SiLK flow record files to standard output
Call          rwcat someflows.raw moreflows.raw > allflows.raw
Parameters    --output-path        Full path name of the output file
              --print-filenames    Print input filenames while processing
              --xargs              Treat stdin as a list of files to read, one name per line

Figure 4.1: Summary of rwcat

rwappend
Description   Appends the flow records from the successive files to the first file
Call          rwappend allflows.raw laterflows.raw
Parameters    --create   Create the TARGET-FILE if it does not exist. Uses the optional SiLK file argument to determine the format of TARGET-FILE.

Figure 4.2: Summary of rwappend

In Example 4-3, rwcat is used to combine previously filtered flow record files to permit the counting of overall values. In this example, the calls to rwfilter in command 1 pull out records describing high-volume traffic (at least 2048 bytes transferred in packets with an average size of 70 bytes or more; this last restriction is to avoid flow records that are just an accumulation of small packets). These records are then split into three files, depending on the duration of the flow record: slow (at least 30 minutes), medium (10-30 minutes), and fast (less than 10 minutes). The calls to rwfilter in commands 2 through 4 split each of the initial divisions based on protocol: UDP (17), TCP (6), ICMP (1), and all others. The call to rwcat in command 5 combines the three UDP splits into one overall UDP file. This filtering and combining allows generation of plots such as Figure 4.3 and Figure 4.4.

<1>$ rwfilter --start-date=2010/08/06:00 --end-date=2010/08/06:05 \
--type=in,inweb --bytes=2048- --bytes-per=70- --pass=stdout | \
rwfilter --input-pipe=stdin --duration=1800- \
--pass=slowfile.raw --fail=stdout |\
rwfilter --input-pipe=stdin --duration=600-1799 \
--pass=medfile.raw --fail=fastfile.raw
<2>$ rwfilter slowfile.raw --proto=17 --pass=slow17.raw --fail=stdout |\
rwfilter --input-pipe=stdin --proto=6 --pass=slow6.raw --fail=stdout |\
rwfilter --input-pipe=stdin --proto=1 --pass=slow1.raw --fail=slowother.raw
<3>$ rwfilter medfile.raw --proto=17 --pass=med17.raw --fail=stdout |\
rwfilter --input-pipe=stdin --proto=6 --pass=med6.raw --fail=stdout |\
rwfilter --input-pipe=stdin --proto=1 --pass=med1.raw --fail=medother.raw
<4>$ rwfilter fastfile.raw --proto=17 --pass=fast17.raw --fail=stdout |\
rwfilter --input-pipe=stdin --proto=6 --pass=fast6.raw --fail=stdout |\
rwfilter --input-pipe=stdin --proto=1 --pass=fast1.raw --fail=fastother.raw
<5>$ rwcat slow17.raw med17.raw fast17.raw >all17.raw
Example 4-3: rwcat for Combining Flow-Record Files
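rwappend, which is not used in Example 4-3, modifies its first argument in place rather than writing to standard output. As a minimal sketch (new17.raw is a hypothetical file of additional UDP records), appending those records onto the combined UDP file, creating it if necessary, might look like this:

$ rwappend --create all17.raw new17.raw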

Figure 4.3: One Display of Large Volume Flows (large-byte UDP flows by duration: all, 0-10 minutes, 10-30 minutes, and 30 or more minutes; plotted over 00:00 to 06:00)


Figure 4.4: Another Display of Large Volume Flows (large-byte 10-30 minute flows by protocol: TCP, UDP, ICMP, and other; plotted over 00:00 to 06:00)


4.2.2 Merging While Removing Duplicate Flow Records with rwdedupe

When merging files that come from different sensors, occasionally one needs to deal with the same flow record being collected by separate sensors. While this multiple recording is sometimes useful for traceability, more often it will distort the results of analysis. rwdedupe is designed to allow analysts to remove duplicate flow records efficiently (those records having identical address, port, and protocol information, with close timing and size information), with the syntax and common parameters shown in Figure 4.5.

rwdedupe
Description   Remove duplicate flow records
Call          rwdedupe --stime-delta=100 --ignore=sensor S1.raw S2.raw > S1+2.raw
Parameters    --ignore-fields     Ignore these field(s), treating them as identical when comparing records
              --packets-delta     Treat the packets field as identical if the values differ by this number of packets or less
              --bytes-delta       Treat the bytes field as identical if the values differ by this number of bytes or less
              --stime-delta       Treat the stime field as identical if the values differ by this number of milliseconds or less
              --duration-delta    Treat the duration field as identical if the values differ by this number of milliseconds or less
              --output-path       Destination for output (stdout, file, or pipe)

Figure 4.5: Summary of rwdedupe

Example 4-4 shows an example of using rwdedupe. In this example, command 1 creates named pipes for efficient passing of records. Commands 2 and 3 retrieve records from two sensors, SITE1 and SITE2, passing them via the named pipes. Command 4 merges the two groups of records and does protocol counts before and after applying rwdedupe. Command 5 pauses while the filtering and counting completes. Commands 6 and 7 show the results of the protocol counts, with the small difference between results due to excluding duplicate records.


<1>$ mkfifo ./dedupe1.fifo ./dedupe2.fifo
<2>$ rwfilter --sensor=SITE1 --start-date=2010/08/30:13 \
--end-date=2010/08/30:14 --type=in,inweb --proto=0-255 \
--pass=./dedupe1.fifo &
[1] 24895
<3>$ rwfilter --sensor=SITE2 --start-date=2010/08/30:13 \
--end-date=2010/08/30:14 --type=in,inweb --proto=0-255 \
--pass=./dedupe2.fifo &
[2] 24896
<4>$ rwcat ./dedupe1.fifo ./dedupe2.fifo \
| rwuniq --fields=proto --flows --output=dupe-1+2.txt --copy-input=stdout \
| rwdedupe --stime-delta=500 --ignore=sensor \
| rwuniq --fields=proto --flows --output=nodupe-1+2.txt &
[3] 24897 24898 24899 24900
<5>$ wait
[3] Done rwcat ./dedupe1.fifo ./dedupe2.fifo ...
[2] - Done rwfilter --sensor=SITE2 ...
[1] + Done rwfilter --sensor=SITE1 ...
<6>$ cat dupe-1+2.txt
pro| Records|
17| 3221643|
50| 123690|
6| 3180567|
<7>$ cat nodupe-1+2.txt
pro| Records|
17| 3221601|
50| 123689|
6| 3180564|
Example 4-4: rwdedupe for Removing Duplicate Records

4.2.3 Dividing Flow Record Files with rwsplit

In addition to being able to join flow record files, some analyses are facilitated by dividing or sampling flow record files. To facilitate coarse parallelism, one approach is to divide a large flow record file into pieces and concurrently analyze each piece separately. For extremely high-volume problems, analyses on a series of robustly taken samples can produce a reasonable estimate using substantially fewer resources. rwsplit is a tool that facilitates both of these approaches to analysis. Figure 4.6 provides an overview of the syntax of rwsplit and a summary of its most common parameters. On each call, --basename is required, and one of the --ip-limit, --flow-limit, --packet-limit, or --byte-limit parameters must be present.

As an example of a coarsely parallelized process, consider Example 4-5. Command 1 pulls a large number of flow records and then divides them into a series of 100,000-record files. In command 2, each of these files is then fed in parallel to an email server inventory script, which does a series of flow-based tests to identify hosts acting as email servers. Applying these in parallel decreases the execution time of the analysis. Each execution of the script yields an IP-set file with a name derived from the flow record file. Command 3 waits for the parallel executions to complete. Command 4 unions these IP-set files to produce a composite set.


rwsplit
Description   Divide the flow records into successive files
Call          rwsplit allflows.raw --basename=sample --flow-limit=1000
Parameters    --basename        Specify base name for output sample files
              --ip-limit        Specify IP address count at which to begin a new sample file
              --flow-limit      Specify flow count at which to begin a new sample file
              --packet-limit    Specify packet count at which to begin a new sample file
              --byte-limit      Specify byte count at which to begin a new sample file
              --sample-ratio    Specify denominator for ratio of records read to number written in sample file (e.g., 100 means to write 1 out of 100 records)
              --file-ratio      Specify denominator for ratio of sample file names generated to total number written (e.g., 10 means 1 of every 10 files will be saved)
              --max-outputs     Specify maximum number of files to write to disk

Figure 4.6: Summary of rwsplit

<1>$ rwfilter --type=in,inweb --start-date=2010/08/27:13 \
--end-date=2010/08/27:22 --proto=6,17 --bytes-per=65- --pass=stdout | \
rwsplit --basename=part --flow-limit=100000
<2>$ for f in part*rwf ; do gen-email-inventory $f & done
<3>$ wait
<4>$ rwsettool --union part*email.set --output=email.set
Example 4-5: Using rwsplit for Coarsely Parallel Analysis

As an example of a sampled-flow process, consider Example 4-6. These commands estimate the percentage of UDP traffic moving across a large infrastructure over a work-day. Command 1 does the initial data pull, retrieving a very large number of flow records, and then pulls 100 samples of 1,000 flow records each, with a 1% rate of sample generation (that is, of 100 samples of 1,000 records, only one sample is retained). Command 3 then summarizes each sample to isolate the percentage of UDP traffic in the sample, and the resulting percentages are profiled in commands 5 through 7 to report the minimum, maximum, and median percentages.


<1>$ rwfilter --type=in,inweb --start-date=2010/08/27:13 \
--end-date=2010/08/27:22 --proto=0-255 --pass=stdout |\
rwsplit --sample-ratio=100 --flow-limit=1000 \
--basename=sample --max-output=100
<2>$ echo -n >udpsample.txt
<3>$ for f in sample*; do
rwstats $f --protocol --flows --count=30 --top | \
grep "17|" | cut -f3 "-d|" >>udpsample.txt
done
<4>$ sort -nr udpsample.txt >tmp.txt
<5>$ echo -n "Max UDP%: "; head -1 tmp.txt
Max UDP%: 58.723
<6>$ echo -n "Min UDP%: " ; tail -1 tmp.txt
Min UDP%: 5.439
<7>$ echo -n "Median UDP%: "; head -50 tmp.txt | tail -1
Median UDP%: 39.422
Example 4-6: Using rwsplit to Generate Statistics on Flow-Record Files

4.2.4 Keeping Track of File Characteristics with rwfileinfo

Analyses using the SiLK tool suite can become quite complex, with several intermediate products created while isolating behavior of interest. One tool that can aid in managing these products is rwfileinfo, which displays a variety of characteristics for each file format produced by the SiLK tool suite. Some of these characteristics are shown in Example 4-7. rwfileinfo has a --fields parameter to allow analysts to specify the characteristics they are interested in seeing, as shown in command 2 of Example 4-7. For most analysts, the most important characteristics are the last three shown: the record count, file size, and command-line information. Record count is the number of flow records in the file, and the file size is the resulting size of the file. The command-lines field shows the commands used to generate the file.


<1>$ rwfileinfo medfile.raw
medfile.raw:
format(id) FT_RWGENERIC(0x16)
version 16
byte-order littleEndian
compression(id) lzo1x(2)
header-length 416
record-length 52
record-version 5
silk-version 1.1.1
count-records 282560
file-size 7575022
command-lines
1 rwfilter --start-date=2010/08/06:00 --end-date=2010/08/06:05 --type=in,inweb --bytes=2048- --bytes-per=70- --pass=/tmp/rwfilter-tmpfifo.XXmNiFYZ
2 rwfilter --input-pipe=stdin --duration=1800- --pass=slowfile.raw --fail=stdout
3 rwfilter --input-pipe=stdin --duration=600-1799 --pass=medfile.raw --fail=fastfile.raw
<2>$ rwfileinfo --fields=count-records medfile.raw
medfile.raw:
count-records 282560
Example 4-7: rwfileinfo for Display of Data File Characteristics

Flow-record files produced by rwfilter maintain a historical record that can be used to trace how a file was created and where it was generated. This information can be extracted using the rwfileinfo command. Example 4-7 shows an example of the results from an rwfileinfo command. The command-lines field consists of a list of commands in historical order. In the current implementation, these commands are preserved only by the rwfilter command, so the command-lines are useful to record a series of rwfilter calls since either the data pull from the repository or the most recent call to a SiLK tool other than rwfilter. A future release will preserve the command history through more SiLK tools. Example 4-8 shows how the command-lines field expands with progressive filtering, and how rwsort does not preserve this history information. There is an annotations characteristic that is supported by several tools, as shown in command 6 of Example 4-8. Annotations can be displayed using rwfileinfo. Eventually, all file manipulation tools will be able to add and preserve annotations, but for the current release, only those tools that add annotations appear to preserve them while manipulating files.

1 Currently, rwfilter, rwcat, rwset, rwsetbuild, rwsettool, rwpmapbuild, rwbag, rwbagbuild, and rwbagtool.


<1>$ rwfileinfo --field=command-lines slowfile.raw
slowfile.raw:
command-lines
1 rwfilter --start-date=2010/08/06:00 --end-date=2010/08/06:05 --type=in,inweb --bytes=2048- --bytes-per=70- --pass=/tmp/rwfilter-tmpfifo.XXmNiFYZ
2 rwfilter --input-pipe=stdin --duration=1800- --pass=slowfile.raw --fail=stdout
<2>$ rwfilter slowfile.raw --dport=22 --proto=6 --pass=newfile.raw
<3>$ rwfileinfo --field=command-lines newfile.raw
newfile.raw:
command-lines
1 rwfilter --start-date=2010/08/06:00 --end-date=2010/08/06:05 --type=in,inweb --bytes=2048- --bytes-per=70- --pass=/tmp/rwfilter-tmpfifo.XXmNiFYZ
2 rwfilter --input-pipe=stdin --duration=1800- --pass=slowfile.raw --fail=stdout
3 rwfilter --dport=22 --proto=6 --pass=newfile.raw slowfile.raw
<4>$ rwsort --fields=9 newfile.raw >sorted.raw
rwsort: Warning: Using default temporary directory /tmp
<5>$ rwfileinfo --field=command-lines sorted.raw
sorted.raw:
command-lines
1 rwsort --fields=9 newfile.raw
<6>$ rwfilter sorted.raw --sport=1024-99999 --pass=new2.raw \
--note-add="originally from slowfile.raw, filtered for dport 22/TCP"
<7>$ rwfileinfo --field=command-lines,annotations new2.raw
new2.raw:
command-lines
1 rwsort --fields=9 newfile.raw
2 rwfilter --sport=1024-9999 --pass=new2.raw --note-add=originally from slowfile.raw, filtered for dport 22/TCP sorted.raw
annotations
1 originally from slowfile.raw, filtered for dport 22/TCP
Example 4-8: rwfileinfo for Showing Command History

4.2.5 Creating Flow-Record Files from Text with rwtuc

The rwtuc (Text Utility Converter) tool allows creating SiLK flow record files from columnar text information. rwtuc, effectively, is the inverse of rwcut, and its parameters are similar, although it has additional parameters to supply values not given by the columnar input.

rwtuc is useful in several scenarios. Some scripting languages (Perl in particular) have string-processing functions that may be used for analysis, but for compactness and speed of later processing, a binary result may be needed. Therefore, rwcut would be used to convert the binary flow record files to text, the scripting language would process the text, and rwtuc would convert the text output back to the binary flow record format. (However, if the scripting can be done in the Python programming language, the pysilk module contains a programming interface that allows direct manipulation of the binary structures without the preceding conversion to text or the following conversion to binary. This binary manipulation is more efficient than a text-based form.) Alternatively, if a file needs to be cleansed for data exchange, it is desirable to have complete control of the content of the binary representation. By converting to text, performing any required edits on the text, and then generating a binary representation from the edited text, an analyst can ensure that no unreleasable content is present in the binary form. Example 4-9 shows a sample use of rwtuc. After the rwtuc call, both the header information and non-preserved fields have generic or null values.

<1>$ rwfilter --start-date=2010/08/06:00 --end-date=2010/08/06:05 \
--type=in --proto=0,2-5,7-16,18-255 --packets=10- \
--bytes-per=100- --pass=bigflows.raw
<2>$ rwcut --fields=1-9 --num-recs=20 bigflows.raw |\
sed -e "s/[0-9]*\.[0-9]*\.[0-9]*\.\([0-9]*\)|/10.3.2.\1|/g" > bigflw.txt
<3>$ rwtuc --fields=1-9 bigflw.txt >cleansed.raw
<4>$ rwfileinfo cleansed.raw
cleansed.raw:
format(id) FT_RWGENERIC(0x16)
version 16
byte-order littleEndian
compression(id) lzo1x(2)
header-length 104
record-length 52
record-version 5
silk-version 1.1.1
count-records 20
file-size 548
command-lines
1 rwtuc --fields=1-9 bigflw.txt
<5>$ rwcut --fields=sip,dip,stime,sensor,nhIP --num-recs=4 cleansed.raw
sIP| dIP| sTime| sensor| nhIP|
10.3.2.107| 10.3.2.18|2010/08/06T03:00:31.913| S0| 0.0.0.0|
10.3.2.107| 10.3.2.18|2010/08/06T03:02:37.274| S0| 0.0.0.0|
10.3.2.77| 10.3.2.5|2010/08/06T03:00:41.556| S0| 0.0.0.0|
10.3.2.6| 10.3.2.110|2010/08/06T03:06:25.117| S0| 0.0.0.0|

Example 4-9: rwtuc for Simple File Cleansing

rwtuc expects input in the default format for rwcut output. It has a --column-separator parameter, with an argument that specifies the character that separates columns in the input. For debugging purposes, an analyst can specify --bad-input-lines with an argument that gives a file or pipe to which rwtuc will write input lines that it cannot parse. For values not specified in the input, an analyst can either let them default to zero (as shown in Example 4-9) or use parameters of the form --fieldname=fixedvalue to set a single fixed value for each field, instead of using zero. rwtuc supports the field names and numbers for fields 1 through 25 in Table 3.6. A brief sketch follows.
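As a minimal sketch (the file names are hypothetical), converting comma-separated text for the first five fields while capturing any unparseable lines might look like this:

$ rwtuc --fields=1-5 --column-separator=, \
    --bad-input-lines=bad-lines.txt edited.txt > cleansed2.raw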

4.3 Analyzing Packet Data with rwptoflow and rwpmatch

The rwptoflow and rwpmatch tools allow an analyst to apply the SiLK analysis tools to packet data by allowing the user to generate single-packet flow records from packet-content (i.e., pcap) data, analyze and filter those flow records using the SiLK tools, and subsequently filter the packet data based on that analysis. Third-party tools, such as ngrep (http://ngrep.sourceforge.net/), may also filter packet content data based on regular expressions. Another option for processing packets is to aggregate the packets into true flow records. There is a tool, rwp2yaf2silk, that will do this, using the features of rwtuc and the yaf and yafascii tools (the latter are available from http://tools.netsa.cert.org/yaf). Once converted to flow records, all the SiLK tools can process them as if they were from the repository, but it is currently difficult to re-identify packets with processed flow records. For analyses that involve both packet and flow analysis, rwptoflow and rwpmatch are currently preferred.

4.3.1 Creating Flows from Packets Using rwptoflow

The rwptoflow tool generates a single-packet flow record for every IP packet in a tcpdump file. The tcpdump packet formats do not contain routing information, which is available in some flow record formats. The values for routing-information flow record fields may be set for the generated flows using the parameters --set-sensorid, --set-inputindex, --set-outputindex, and --set-nexthopip. For example, it is possible to set the sensor-id manually for a packet content source, so that network flow data that is combined from several sensors can be filtered or sorted by the sensor value later. rwptoflow is summarized in Figure 4.7. rwptoflow with --active-time can be used to specify generation of flows only for a specific time interval of interest. During this time interval, --packet-pass-out and --packet-reject-out can be used to produce packet files containing the packets that either were or were not converted to flows. Finally, the --plugin parameter can be used to incorporate plug-ins for additional functionality in packet conversion, analogous to rwfilter plug-ins.

rwptoflow
Description   Read a tcpdump file and generate a SiLK flow record for every packet
Call          rwptoflow packets.dmp >flows.raw
Parameters    --set-sensorid         Set the sensor id for all flows (0-65534)
              --active-time          Set the time interval of interest
              --packet-pass-out      Specify a path for valid packets in the time interval of interest
              --packet-reject-out    Like --packet-pass-out, but for invalid packets
              --plugin               Specify a plugin to be used in the conversion

Figure 4.7: Summary of rwptoflow

There are several reasons why a packet might not be converted to a flow record:

- The packet is not for an IP-based protocol. LAN-based protocols (such as the Address Resolution Protocol (ARP)) are not implemented on top of IP. As such, there isn't enough information in the packet to build a flow record for it. Other tools, such as tcpdump or wireshark, can be used to examine and analyze these packets.

- The packet is erroneous, and the information used to build a flow record is inconsistent in a way that prevents record generation. This may happen because of transmission problems with the packet or because the capture file may have been corrupted.

- The packet capture snaplength isn't large enough to capture all of the needed fields. If a very short snaplength is used, not all of the header may be captured and, therefore, the captured packet may not contain enough information to build a flow record for it.

Any of these will cause the packet to be rejected. Example 4-10 shows a simple conversion of a capture file packets.dmp into a flow record file mypkts.raw, restricting the conversion to a specific time period and producing dumps of packets converted (mypkts.dmp) and rejected (mypkts-bad.dmp).

<1>$ rwptoflow --active=2010/08/25:05:27:15-2010/08/25:05:45:22 \
--packet-pass=mypkts.dmp --packet-reject=mypkts-bad.dmp \
packets.dmp >mypkts.raw
Example 4-10: rwptoflow for Simple Packet Conversion
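If packet captures from several locations will later be combined, the sensor id can be stamped onto the generated flow records at conversion time. A hypothetical variant of the command above (the sensor id value is arbitrary):

$ rwptoflow --set-sensorid=7 packets.dmp >mypkts.raw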

4.3.2 Matching Flow Records With Packet Data Using rwpmatch

rwpmatch takes a tcpdump input file and filters it based on flow records from a SiLK flow record file. It is designed to allow flow records produced by rwptoflow (and then filtered or processed) to be matched with the packet content data that produced them. The resulting tcpdump file is output on standard output. The flow record file input to rwpmatch should contain single-packet flow records (e.g., those originally derived from a tcpdump file using rwptoflow). If a flow record is found that does not represent a corresponding packet record, rwpmatch will return an error. Both the tcpdump and the flow record file inputs must be time-ordered. The syntax of rwpmatch is summarized in Figure 4.8. By default, rwpmatch will consider only the source address, destination address, and the time to the second. By using the --ports-compare parameter, the source and destination port can also be considered in the match. By using the --msec-compare parameter, time will be compared to the millisecond.
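As a minimal sketch (the file names are hypothetical), a stricter match that also compares ports and millisecond timestamps might look like this:

$ rwpmatch --flow-file=filtered-flows.raw --ports-compare --msec-compare \
    packets.dmp > matched.dmp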

rwpmatch
Description   Match a tcpdump file against a SiLK flow record file that has a flow for every packet, producing a new tcpdump file on standard output
Call          rwpmatch --flow-file=flows.raw packets.dmp >flows.dmp
Parameters    --flow-file        Specify the flow record file to be used in the match
              --ports-compare    Use port information in the match
              --msec-compare     Use milliseconds in the match

Figure 4.8: Summary of rwpmatch

It is important to recognize that rwpmatch is input/output intensive. The tool works by reading an entire tcpdump capture file and the entire flow record file. It may be worthwhile to optimize an analysis process to avoid using rwpmatch until payload filtering is necessary. Saving the output from rwpmatch as a partial-results file, and comparing that file to files generated by later steps in the analysis (rather than comparing the later results against the original tcpdump file), can also provide significant performance gains.

The packet-analysis tools are typically used in combination with payload-filtering tools, like ngrep, which allow an analyst to partition traffic based on payload signatures prior to using the SiLK tools for analysis, or, conversely, to identify a traffic phenomenon (e.g., worm propagation) through flow analysis and then filter the packets that correspond to the flow records that make up that traffic. In Example 4-11, a tcpdump file data.pcap is filtered by the IP-set file sip.set by converting it to a SiLK flow record file, filtering the flows by the source IPs found in the set, and then matching the original tcpdump file against the filtered SiLK file.

<1>$ rwptoflow data.pcap > data.rwf
<2>$ rwfilter --sipset=sip.set --pass=filtered.rwf data.rwf
<3>$ rwpmatch --flow-file=filtered.rwf data.pcap > filtered.pcap
Example 4-11: rwptoflow and rwpmatch for Filtering Packets Using an IP Set

4.4 IP Masking with rwnetmask

When working with IP addresses and utilities such as rwuniq and rwstats, an analyst will often want to analyze activity across networks rather than individual IP addresses (for example, all the activity originating from the /24s comprising the enterprise network, rather than generating an individual entry for each address). To do so, SiLK provides a tool called rwnetmask, which can reduce IP addresses to their prefix values. The query in Example 4-12 uses rwnetmask to mask out the last 16 bits of the source IP address.

<1>$ rwfilter --start-date=2010/08/01:00 --end-date=2010/08/01:01 \
--type=in --proto=6 --dport=25 --max-pass=3 --pass=stdout \
| rwnetmask --source=16 \
| rwcut --num-recs=3 --field=1-5
sIP| dIP|sPort|dPort|pro|
10.1.0.0| 10.0.0.2|56485| 25| 6|
10.3.0.0| 10.0.0.2|40865| 25| 6|
10.4.0.0| 10.0.0.5|58299| 25| 6|
Example 4-12: rwnetmask for Abstracting Source IPs

As this example shows, rwnetmask replaces the last 16 bits of the source IP address with zero, so all IP addresses in the 10.3/16 network (for example) will have the same IP address. Using rwnetmask, an analyst can use any of the standard SiLK utilities across networks in the same way the analyst would use the utilities on individual IP addresses.

4.5 Summarizing Traffic with IP Sets

Up to this point, this handbook has focused exclusively on raw SiLK records: traffic that can be accessed and manipulated using the SiLK tools. This section focuses on initial summary structures: IP sets. The set tools provide facilities for manipulating summaries of data. The IP-set tools describe arbitrary collections of IP addresses. These sets can be generated from network flow data or via user-created text files.

4.5.1 What are IP Sets?

An IP set is a data structure that represents an arbitrary collection of individual IP addresses. For example, an IP set could consist of the addresses {1.1.1.3, 92.18.128.22, 125.66.11.44}, or all the addresses in a single /24. IP sets are binary representations of data. Using binary representations, sets can be manipulated efficiently and reliably. Because IP sets are binary objects, they are created and modified using special set tools: rwset, rwsetbuild, rwsettool, rwsetmember, and rwsetcat. These tools allow an analyst to read and modify IP-set files.

4.5.2 Creating IP Sets with rwset

IP sets are created from flow records via rwset, from text via rwsetbuild, or from bags via rwbagtool (more information on rwbagtool is found in Section 4.6.5). rwset is summarized in Figure 4.9.

rwset
Description   Generates IP-set files from flows
Call          rwset --sip-file=flow.sip.set flows.raw
Parameters    --sip-file    Specify an IP-set file to generate with source IP addresses from the flow records
              --dip-file    Like --sip-file, but for destination IP addresses
              --saddress    (deprecated) Create set from source addresses
              --daddress    (deprecated) Create set from destination addresses
              --set-file    (deprecated) File to write set to if --saddress or --daddress is used

Figure 4.9: Summary of rwset

rwset generates sets from filter records. To invoke it, pipe output from rwfilter into rwset, as shown in Example 4-13.

<1>$ rwfilter medfile.raw --proto=6 --pass=stdout | \
rwset --dip-file=medtcp-dest.set
<2>$ file medtcp-dest.set
medtcp-dest.set: data
Example 4-13: rwset for Generating a Set File

The call to rwset shown in Example 4-13 creates an IP-set file, named medtcp-dest.set, that consists of all the destination IP addresses for TCP records in medfile.raw. The file command shows that the result is a binary data file. An alternative method for generating sets is the rwsetbuild tool. rwsetbuild reads a text file containing IP addresses and generates an IP-set file with those addresses (see Example 4-16 for sample calls).

2 IP sets can also be created from flow records using a deprecated tool called rwaddrcount.
3 This tool was previously called buildset.


4.5.3 Reading Sets with rwsetcat

The primary tool for reading sets is rwsetcat, which can read a set file, display the IP addresses in that file, and print out statistics about the file. The basic invocation of rwsetcat is shown in Example 4-14. A summary of some common parameters is shown in Figure 4.10.

<1>$ rwsetcat medtcp-dest.set | head -5
10.0.3.225
10.0.4.93
10.0.28.74
10.0.28.214
10.0.37.43

Example 4-14: rwsetcat to Display IP Sets

rwsetcat
Description   Lists IP-Set Files as text on standard output
Call          rwsetcat flow.sip.set
Parameters
  --count-ips          Print the number of IPs; disables default printing of IPs
  --print-ips          Also print IPs when a count or statistics switch is given
  --network-structure  Print the network structure of the set. The optional argument specifies counts by a combination of T for Total address space, A for /8, B for /16, C for /24, X for /27, and H for /32; with S for roll-up summaries.
  --print-statistics   Print set statistics (min-/max-IP, etc.)

Figure 4.10: Summary of rwsetcat

As Example 4-14 shows, the call to rwsetcat prints all the addresses in the set; IP addresses are listed in ascending order. In addition to printing IPs, rwsetcat can also perform counting and statistical reporting, as shown in Example 4-15. These features are useful for describing the set without dumping all the IP addresses in the set. Since sets can have up to four billion addresses, counting with rwsetcat tends to be much faster than counting via text tools such as wc.

⁴ This tool was previously called readset.


<1>$ rwsetcat --count-ip medtcp-dest.set
3865
<2>$ rwsetcat --print-stat medtcp-dest.set
Network Summary
        minimumIP = 10.0.3.225
        maximumIP = 10.255.253.217
        3865 hosts (/32s), 0.000090% of 2^32
        1 occupied /8, 0.390625% of 2^8
        256 occupied /16s, 0.390625% of 2^16
        3756 occupied /24s, 0.022388% of 2^24
        3850 occupied /27s, 0.002868% of 2^27
<3>$ rwsetcat --network-structure medtcp-dest.set
TOTAL| 3865 hosts in 1 /8, 256 /16s, 3756 /24s, and 3850 /27s

Example 4-15: rwsetcat --count-ip, --print-stat, and --network-structure for Showing Structure
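The optional argument to --network-structure selects which block sizes are reported. The following sketch shows two plausible invocations using the letters described in Figure 4.10; the output is omitted here because its exact layout depends on the SiLK release, and the set file name is simply the one from Example 4-15.

<1>$ rwsetcat --network-structure=BS medtcp-dest.set    # list each occupied /16 plus a roll-up summary
<2>$ rwsetcat --network-structure=ACS medtcp-dest.set   # /8 and /24 breakdown plus a roll-up summary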

4.5.4

Manipulating Sets with rwsettool

rwsettool is the primary tool used to manipulate sets once they are constructed.⁵ It provides the most common set operations, working on arbitrary numbers of IP-set files. (See Figure 4.11 for a summary of its syntax and most common switches.)

rwsettool
Description   Manipulates IP-Set Files to Produce New IP-Set Files
Call          rwsettool --mask=16 my.set >only-16.set
Parameters
  --union        Create set containing IPs in any parameter file
  --intersect    Create set containing IPs in all parameter files
  --difference   Create set containing IPs from the first file not in any of the remaining files
  --mask         Create set containing one IP from each block of the specified bitmask length when any of the input IP sets have an IP in that block
  --sample       Create an IP set containing a random sample of IPs from all input IP sets; requires --size or --ratio
  --size         Specify the sample size (number of IPs sampled from each input IP set)
  --ratio        Specify the probability, as a floating-point value between 0.0 and 1.0, that an IP will be sampled
  --seed         Specify the random number seed for the sample
  --output-path  Write the resulting IP set to this location

Figure 4.11: Summary of rwsettool

rwsettool --intersect is used to intersect sets. For an example of how this works, the analysis first creates two sets using rwsetbuild (as shown in Example 4-16): one consisting of the IP addresses 1.1.1.1-5 and the other consisting of the IP addresses 1.1.1.3, 1.1.1.5, and 2.2.2.2.
⁵ There are deprecated tools rwsetintersect and rwsetunion, but the functions of these tools are subsumed into rwsettool.


<1>$ echo "1.1.1.1-5" > set_a.txt
<2>$ cat <<END_FILE >>set_b.txt
1.1.1.3
1.1.1.5
2.2.2.2
END_FILE
<3>$ rwsetbuild set_a.txt a.set
<4>$ rwsetbuild set_b.txt b.set

Example 4-16: rwsetbuild for Generating IP Sets

The example now intersects the two sets. Each set is specified by file name as a parameter. The resulting set is written to the file inter.result.set as shown in Command 1 of Example 4-17, with the results shown after Command 3. As the example shows, the resulting set consists of the IP addresses 1.1.1.3 and 1.1.1.5; the intersection of any two sets is the set of IP addresses present in each individual set. In addition to straight intersection, rwsettool can also be used to subtract the contents of one set from another, using the --difference parameter as shown in Command 2 of Example 4-17. The resulting set sub.result.set is shown after Command 4.

<1>$ rwsettool --intersect a.set b.set --output=inter.result.set
<2>$ rwsettool --difference a.set b.set --output=sub.result.set
<3>$ rwsetcat inter.result.set
1.1.1.3
1.1.1.5
<4>$ rwsetcat sub.result.set
1.1.1.1
1.1.1.2
1.1.1.4

Example 4-17: rwsettool --intersect and --difference

The set sub.result.set consists of all elements that were in a.set and were not in b.set. rwsettool will accept any number of set files as parameters, as long as there is at least one. The IP-set union command is rwsettool --union. This takes a list of set files and returns a set that consists of all IP addresses that appear in any of the files. Example 4-18, using a.set and b.set, demonstrates this capability.


<1>$ rwsettool --union a.set b.set --output=union.result.set
<2>$ rwsetcat union.result.set
1.1.1.1
1.1.1.2
1.1.1.3
1.1.1.4
1.1.1.5
2.2.2.2

Example 4-18: rwsettool --union

rwsetmember allows easy testing for the presence of an address in one or more IP-set files. Example 4-19 shows some examples of its use.

<1>$ rwsetmember 2.2.2.2 b.set
b.set
<2>$ rwsetmember 2.2.2.2 a.set
<3>$ rwsetmember 1.1.1.3 a.set b.set
a.set
b.set
<4>$ rwsetmember 1.1.1.3 a.set b.set --count
a.set:1
b.set:1

Example 4-19: rwsetmember to Test for an Address

4.5.5

Using rwsettool --intersect to Fine-Tune IP Sets

IP sets can be used to focus on alternative representations of traffic and to identify different classes of activity. Example 4-20 drills down on IP sets themselves and provides a different view of this traffic.

<1>$ rwfilter --proto=6 --packets=1-3 --pass=stdout fastfile.raw \
     | rwset --sip-file=fast-low.set
<2>$ rwfilter --proto=6 --packets=4- --pass=stdout fastfile.raw \
     | rwset --sip-file=fast-high.set
<3>$ rwsettool --difference fast-low.set fast-high.set \
     --output=fast-only-low.set
<4>$ rwsetcat --count-ips fast-low.set
34830
<5>$ rwsetcat --count-ips fast-only-low.set
1697

Example 4-20: Using rwset to Filter for a Set of Scanners

In this example, we isolate the set of hosts that exclusively scan from a group of flow records by using rwfilter to separate the set of IP addresses that complete legitimate TCP sessions from the set of IP addresses that never complete sessions. As this example shows, the set file fast-only-low.set consists of

1,697 IP addresses, in contrast to the set of 34,830 that produced low-packet flow records; these addresses are consequently suspicious.⁶

4.5.6

Using rwsettool --union to Examine IP Set Structure

One way to use rwsettool --union is to track common customers of a single site. Consider the following sequence of operations in Example 4-21. The example begins by generating hourly IP sets for traffic reaching one server, using a BASH shell script.

for i in 1 2 3 ; do
    j=0
    while [ $j -le 23 ] ; do
        rwfilter --start-date=2010/08/${i}:${j} \
            --end-date=2010/08/${i}:${j} --type=in \
            --daddress=10.114.200.6 --proto=17 --dport=53 \
            --pass=stdout \
        | rwset --sip-file=day-${i}-hour-${j}.set
        echo "Finished $i/$j"
        j=$[ ${j} + 1 ]
    done
done

Example 4-21: A Script for Generating Hourly Sets

Running this script results in 72 files, one for each hour. After this, the script shown in Example 4-22 uses rwsettool to build the cumulative set of addresses across hours.

cp day-1-hour-0.set buffer
n=0
for i in day*.set ; do
    rwsettool --union buffer $i --output=newbuffer
    mv newbuffer buffer
    d=`rwsetcat --count-ips <buffer`
    echo "$n $d" >> total_ips
    n=$[ ${n} + 1 ]
done

Example 4-22: Counting Hourly Set Records

This example starts by copying the first hour into a temporary file (buffer). It then iterates through every set file in the directory, creating a union of the result with the buffer file and printing the total number of IP addresses from the union. The resulting file can then be plotted with gnuplot. The graph in Figure 4.12 shows the resulting image: the cumulative number of source IP addresses seen in each hour.
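Since each line of total_ips holds an hour index and the cumulative address count, any plotting tool can graph it. A minimal gnuplot sketch follows; the output file name and terminal type are illustrative assumptions, not part of the original script.

<1>$ gnuplot <<'END_PLOT'
set terminal png
set output "hourly_growth.png"
set xlabel "hour"
set ylabel "cumulative source addresses"
plot "total_ips" using 1:2 with linespoints notitle
END_PLOT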
⁶ While this might be indicative of scanning activity, the task of scan detection is more complex than shown in Example 4-20. Scanners sometimes complete connections to hosts that respond (to exploit vulnerable machines); non-scanning hosts sometimes consistently fail to complete connections to a given host (for example, when contacting a host that no longer offers a service).


[Figure: line graph of cumulative source addresses (0 to roughly 400,000) against hour index (0 to 72).]

Figure 4.12: Graph of Hourly Source IP Address Set Growth


4.5.7

Backdoor Analysis with IP Sets

A backdoor, in this context, is a network route that bypasses security controls. Border routers of a monitored network should pass only those incoming packets whose source addresses are outside of the IP space of the monitored network, and they should pass only those outgoing packets whose source addresses are inside of the monitored network's IP space. However, a variety of routing anomalies and backdoors spoil this ideal. The first step in performing backdoor analysis is actually defining the IP space for the network being monitored. The easiest way to do this is to create a text file that describes the net blocks that make up the network and compile it with the rwsetbuild tool. For example, when monitoring two networks, 192.168.1.0/24 and 192.168.2.0/24, the analyst would create a text file called mynetwork.txt with those two CIDR blocks on separate lines, and then run rwsetbuild to create a binary set file called mynetwork.set, as shown in Example 4-23.

<1>$ cat mynetwork.txt
192.168.1.0/24
192.168.2.0/24
<2>$ rwsetbuild mynetwork.txt mynetwork.set

Example 4-23: rwsetbuild for Building an Address Space IP Set

Once the set exists, the analyst can use it as the basis for filtering. For example, filtering on source address identifies incoming traffic from internal IP addresses, as shown in Command 1 of Example 4-24. An analyst might also want to identify outgoing traffic originating from external IP addresses, as shown in Command 2 of Example 4-24. Similar filtering can be done using the destination IP address with the --dipset and --not-dipset parameters to rwfilter, or a combination of source IP set and destination IP set parameters.

<1>$ rwfilter --start-date=2010/08/01:00 --end-date=2010/08/01:01 \
     --type=in,inweb --sipset=mynetwork.set --pass=strange_in.raw
<2>$ rwfilter --start-date=2010/08/01:00 --end-date=2010/08/01:01 \
     --type=out,outweb --not-sipset=mynetwork.set --pass=strange_out.raw

Example 4-24: Backdoor Filtering Based on Address Space

4.6

Summarizing Traffic with Bags

4.6.1

What Are Bags?

Bags are sets augmented with a volume measure for each value. Where IP sets record the presence or absence of particular key values, bags add the ability to count the number of instances of a particular key value: the number of bytes, the number of packets, or the number of flow records associated with that key. Bags also add the capability to summarize traffic on characteristics other than IP addresses, specifically on protocols and on ports. Bags are effectively enhanced sets: like sets, they are binary structures that can be manipulated using a collection of tools. As a result, operations that are performed on sets (such as unions and intersections) have analogous bag operations, such as addition. Analysts can also extract a covering set (the set of all IP addresses in the bag) from an IP-address bag for use with rwfilter and the set tools.

4.6.2

Using rwbag to Generate Bags from Data

As shown in Figure 4.13, rwbag generates files as specified by a group of parameters with a specific naming scheme. In each parameter, the s and d in the prefix refer to source and destination IPs, while the other letter refers to bytes (b), packets (p), or flow records (f). If the parameter has no prefix, the bag is keyed by IP address. The prefix port- keys the bag by either source or destination port. The prefix proto- keys the bag by protocol (with no source or destination letter). Consequently, to build a bag file that counts flow records keyed on the source IP address, use --sf-file; a bag of packets keyed on the destination port would be generated using --port-dp-file; a bag of bytes keyed by protocol would be generated using --proto-b-file. As shown in Example 4-25, more than one bag can be generated in one call to rwbag, as specified by its parameters.

rwbag
Description   Generate bags from a flow record file
Call          rwbag --sp-file=x.bag --df-file=y.bag flow.raw
Parameters
  --db-file       Generate bag of destination IP addresses, counting bytes
  --df-file       Like --db-file, but counting flow records
  --dp-file       Like --db-file, but counting packets
  --sb-file       Like --db-file, but for source IP addresses
  --sf-file       Like --df-file, but for source IP addresses
  --sp-file       Like --dp-file, but for source IP addresses
  --port-sb-file  Generate bag of source ports, counting bytes
  --proto-b-file  Generate bag of protocols, counting bytes

Figure 4.13: Summary of rwbag

<1>$ rwfilter --start-date=2010/08/01:00 --end-date=2010/08/01:01 \
     --proto=6 --pass=stdout \
     | rwbag --sp-file=x.bag --df-file=y.bag
<2>$ file x.bag y.bag
x.bag: data
y.bag: data

Example 4-25: rwbag for Generating Bags
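Example 4-25 is keyed by IP address; the port- and proto- forms described above work the same way. The following sketch (file names are illustrative) builds the three bags named in the preceding discussion in a single call:

<1>$ rwfilter flows.raw --proto=0-255 --pass=stdout \
     | rwbag --sf-file=src-flows.bag \
             --port-dp-file=dport-packets.bag \
             --proto-b-file=proto-bytes.bag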

4.6.3

Reading Bags Using rwbagcat

rwbagcat is a bag reading-and-display tool. This tool and its common parameters are summarized in Figure 4.14. The default call to rwbagcat displays the contents of a bag in sorted order, as shown in Example 4-26.


rwbagcat
Description   Reads and displays or summarizes bag contents
Call          rwbagcat x.bag
Parameters
  --mincount      Display only entries with counts of at least the argument
  --maxcount      Display only entries with counts no larger than the argument
  --minkey        Display only entries with keys of at least the argument
  --maxkey        Display only entries with keys no larger than the argument
  --bin-ips       Summarize entries at each value of count
  --integer-keys  Display keys as integers, rather than dotted quads

Figure 4.14: Summary of rwbagcat

<1>$ rwbagcat x.bag | head -5
  10.85.214.205|        1|
 10.155.196.61|        1|
   10.192.0.80|        1|
10.238.153.113|        1|
 10.247.226.88|        1|

Example 4-26: rwbagcat for Displaying Bags

In Example 4-26, the counts (the number of elements that match a particular IP) are printed per key. rwbagcat provides additional display capabilities. For example, rwbagcat can print values within ranges of both counts and keys, as shown in Example 4-27.

<1>$ rwbagcat --mincount=500 --maxcount=505 y.bag | head -5
  10.193.217.52|      503|
    10.202.13.6|      501|
   10.223.86.80|      505|
  10.229.221.32|      504|
 10.245.159.167|      502|
<2>$ rwbagcat --minkey=10.50.0.0 --maxkey=10.180.0.0 y.bag | head -5
  10.50.121.150|        3|
   10.77.217.94|        2|
   10.104.85.66|        1|
 10.171.216.161|        1|
   10.179.175.9|        1|

Example 4-27: rwbagcat --mincount, --maxcount, --minkey, and --maxkey to Filter Results

These filtering values can be used in any combination. In addition to filtering, rwbagcat can also reverse the index; that is, instead of printing the number of counted elements per key, it can produce a count of the number of keys matching each count, using the --bin-ips switch as shown in Example 4-28.


<1>$ rwbagcat --bin-ips x.bag | head -5
              1|   112516|
              2|    24753|
              3|    74373|
              4|    31638|
              5|    24203|

Example 4-28: rwbagcat --bin-ips to Display Unique IPs Per Value

The --bin-ips switch can be particularly useful for distinguishing between sites that are hit by scans (where only one or two packets may appear) and sites that are engaged in serious activity. If the bag is not keyed by IP addresses, the --integer-keys switch makes the output of rwbagcat much easier to read. Example 4-29 shows the difference in output for a port-keyed bag counting bytes, where the larger port value is 65000.

<1>$ rwbagcat in.bag
        0.0.0.3|       56|
    0.0.253.232|      280|
<2>$ rwbagcat --integer-keys in.bag
              3|       56|
          65000|      280|

Example 4-29: rwbagcat --integer-keys

4.6.4

Using Bags: A Scanning Example

To see how bags differ from sets in a useful way, let's revisit the scanning filter presented in Example 4-20. The difficulty with that code is that if a scanner completed any handshake, it would be excluded from the fast-only-low set. Many automated scanners would fall under this exclusion if any of their potential victims responded to the scan. It would be more robust to include as scanners hosts that complete only a small number of their connections (10 or fewer) and have a reasonable number of flow records covering incomplete connections (10 or more). By using bags, Example 4-30 is able to incorporate counts, resulting in the detection of more potential scanners. The calls to rwfilter in Commands 1 through 3 are piped to rwbag to build the initial bags (of incomplete, FIN-terminated, and RST-terminated traffic, respectively). The latter two bags are merged in Command 4 to form a bag of completed connections. Commands 5 and 6 trim the complete- and incomplete-connection bags to the portions described above. Commands 7 and 8 generate the cover sets for these bags, and those cover sets are subtracted in Command 9, resulting in a scanning candidate set. The three sets are counted in Commands 10 through 12.


<1>$ rwfilter --proto=6 --flags-all=S/SRF --packets=1-3 \
     --pass=stdout fastfile.raw | rwbag --sf-file=fast-low.bag
<2>$ rwfilter --proto=6 --flags-all=SAF/SARF --pass=stdout fastfile.raw \
     | rwbag --sf-file=fast-fin.bag
<3>$ rwfilter --proto=6 --flags-all=SR/SRF --pass=stdout fastfile.raw \
     | rwbag --sf-file=fast-rst.bag
<4>$ rwbagtool --add fast-fin.bag fast-rst.bag --output=fast-high.bag
<5>$ rwbagtool --add --maxcount=10 fast-high.bag --output=fast-hightrim.bag
<6>$ rwbagtool --add --mincount=10 fast-low.bag --output=fast-lowtrim.bag
<7>$ rwbagtool --coverset fast-hightrim.bag --output=fast-high.set
<8>$ rwbagtool --coverset fast-lowtrim.bag --output=fast-low.set
<9>$ rwsettool --difference fast-low.set fast-high.set --output=scan.set
<10>$ rwsetcat --count-ips fast-low.set
932
<11>$ rwsetcat --count-ips fast-high.set
392998
<12>$ rwsetcat --count-ips scan.set
874

Example 4-30: Using rwbag to Filter Out a Set of Scanners

4.6.5

Manipulating Bags Using rwbagtool

rwbagtool provides bag manipulation capabilities (shown previously in Example 4-30), including adding and subtracting bags (analogous to the set operations), thresholding (filtering bags on volume), intersecting a bag and a set, and extracting a cover set from a bag.

rwbagtool
Description   Manipulates bags and generates cover sets
Call          rwbagtool --add x.bag y.bag --output=z.bag
Parameters
  --add        Add two bags together (union)
  --subtract   Subtract two bags (difference)
  --output     Specify where the resulting bag or set should be stored
  --intersect  Intersect a set and a bag
  --mincount   Cut bag to entries with a count of at least the argument
  --maxcount   Cut bag to entries with a count of at most the argument
  --minkey     Cut bag to entries with a key of at least the argument
  --maxkey     Cut bag to entries with a key of at most the argument
  --coverset   Generate an IP set from the bag keys

Figure 4.15: Summary of rwbagtool

Adding and Subtracting Bags

Because bags associate a size with each value they contain, it is possible to add and subtract bags, as well as to perform threshold selection on the contents of a bag. The result is a bag with new volumes. To add bags together, use the --add parameter. Example 4-31 shows how bag addition works.

<1>$ rwbagcat x.bag
    10.0.81.167|        1|
    10.0.122.21|        2|
   10.0.177.183|        1|
<2>$ rwbagcat y.bag
   10.0.177.183|        2|
   10.1.204.229|        1|
    10.1.224.89|        3|
<3>$ rwbagtool --add x.bag y.bag --output=z.bag
<4>$ rwbagcat z.bag
    10.0.81.167|        1|
    10.0.122.21|        2|
   10.0.177.183|        3|
   10.1.204.229|        1|
    10.1.224.89|        3|

Example 4-31: rwbagtool --add

The --output parameter specifies where to deposit the results. Most of the results from rwbagtool are bags themselves. The subtraction command operates in the same fashion as the addition command, except that all bags are subtracted from the first bag specified in the command. Bags cannot contain negative values, and rwbagtool will not produce a bag if one of the resulting count values would be negative. Be careful when using the bag operations: bags contain no information on what type of data they contain. Consequently, rwbagtool will add byte bags and packet bags together without warning, producing meaningless results.

Intersecting Bags and Sets

The --intersect and --complement-intersect commands are used to intersect an IP set with a bag. Example 4-32 shows how to use these commands to extract a specific subnet.

<1>$ echo 10.0-1.0-255.0-255 > f.txt
<2>$ rwsetbuild f.txt f.set
<3>$ rwbagtool --intersect=f.set x.bag --output=xf.bag
<4>$ rwbagcat xf.bag | head -5
   10.0.225.158|       12|
     10.1.46.49|        1|
    10.1.67.101|        1|
     10.1.81.86|        1|
   10.1.150.243|        1|

Example 4-32: rwbagtool --intersect

As this example shows, xf.bag consists only of those IPs within the 10.0-1.x.x IP address range.

Thresholding with Count and Key Functions

The same --minkey, --maxkey, --mincount, and --maxcount parameters supported by rwbagcat are also supported by rwbagtool. In this case, they specify the minimum and maximum count and key values for output, and they

must be combined with one of the other manipulation functions (such as intersect, add, or subtract). As shown in Example 4-33, an analyst can combine thresholding with set intersection to get a bag holding only elements with keys in the set and values over the threshold value (5, in this example).

<1>$ rwbagtool --intersect=f.set x.bag --mincount=5 --output=xf2.bag
<2>$ rwbagcat xf2.bag | head -5
   10.0.225.158|       12|
    10.2.177.55|       51|
   10.2.188.134|      645|
   10.2.192.164|      740|
   10.2.224.164|       48|

Example 4-33: rwbagtool Combining Threshold with Set Intersection

Using --coverset to Extract Sets

Although bags cannot be used directly with rwfilter, the --coverset parameter can be used to obtain the set of IP addresses in a bag, and this set can be used with rwfilter and manipulated with any of the set commands. The --coverset parameter is used with the --output parameter, but in this case the result will be an IP set rather than a bag, as shown in Example 4-34.

<1>$ rwbagtool --coverset x.bag --output=x.set
<2>$ rwsetcat x.set | head -3
10.0.81.167
10.0.122.21
10.0.177.183
<3>$ rwbagcat x.bag | head -3
    10.0.81.167|        1|
    10.0.122.21|        1|
   10.0.177.183|        1|

Example 4-34: rwbagtool --coverset

An analyst needs to be careful of bag content when using --coverset. Since bags contain no information about the type of data they contain, the --coverset parameter will interpret the keys as IP addresses even if they are actually protocol or port values. This will likely lead to analysis errors.
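As a concrete illustration of that caution, the sketch below (file names are illustrative) builds a protocol-keyed bag and then extracts a cover set from it; the resulting keys print as IP addresses even though they are really protocol numbers:

<1>$ rwfilter flows.raw --proto=0-255 --pass=stdout \
     | rwbag --proto-b-file=proto-bytes.bag
<2>$ rwbagtool --coverset proto-bytes.bag --output=proto.set
<3>$ rwsetcat proto.set
# protocol 6 appears as the "address" 0.0.0.6, protocol 17 as 0.0.0.17, and so on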

4.7

Labeling Related Flows with rwgroup and rwmatch

rwgroup and rwmatch are grouping tools that allow an analyst to label a set of flow records that share common attributes with an identifier. This identifier, the group ID, is stored in the next-hop-IP field,⁷ and it can be manipulated as an IP address (that is, either by directly specifying a group ID or by using IP sets). The two tools generate group IDs in different ways. The rwgroup tool walks through a file of flow records and groups records that have common attributes, such as source/destination IP pairs. The rwmatch tool
⁷ Using the annotation field supported by the SiLK tools may reduce reliance on the next-hop-IP field to preserve relationships.


groups records of different types (typically, incoming and outgoing types), creating a file containing groups that represent TCP sessions or groups that represent other behavior. For scalability purposes, the grouping tools require that the data they process be sorted using rwsort. The data must be sorted on the criteria fields: in the case of rwgroup, the ID fields and delta fields; in the case of rwmatch, start time and the fields specified in the --relate parameter.
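As a sketch of what that preparation looks like in practice (the field numbers follow the rwgroup and rwmatch examples later in this section, and the file names are illustrative):

# rwgroup: sort on the ID fields, then on the delta field
<1>$ rwsort --fields=1,2,3,4,9 unsorted.raw > sorted.raw
# rwmatch: sort each side on its related fields, ending with start time
<2>$ rwsort --fields=1,2,9 query-side.raw > query.rwf
<3>$ rwsort --fields=2,1,9 response-side.raw > response.rwf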

4.7.1

Labeling Based on Common Attributes with rwgroup

The rwgroup tool provides a way to group flow records that have common field values. (See Figure 4.16 for a summary of this tool and its common parameters.) Once grouped, records in a group can be output separately (with each record in the group having a common ID) or summarized by a single record. Example applications of rwgroup include the following.

Grouping together all the flow records for a long-lived session: by specifying that records are grouped together via their port numbers and IP addresses, an analyst can assign a common ID to all the flow records making up a long-lived session.

Reconstructing web sessions: due to diversified hosting and caching services such as Akamai, a single web page on a commercial website is usually hosted on multiple servers. For example, the images may be on one server, the HTML text on a second server, advertising images on a third server, and multimedia on a fourth server. An analyst can use rwgroup to tag web traffic flow records from a single user that are closely related in time and then use that information to identify individual web page fetches.

Counting conversations: an analyst can group all the communications between two IP addresses together and see how much data was transferred between both sites regardless of port numbers. This is particularly useful when one site is using a large number of ephemeral ports.

rwgroup
Description   Flag flow records that have common attributes
Call          rwgroup --id-field=1 --delta-field=2
Parameters
  --id-field=FIELD            Specify fields that need to be identical
  --delta-field=FIELD         Specify fields that need to be close
  --delta-value=DELTA         Specify closeness
  --objective                 Specify that all delta values are relative to the first record, rather than the most recent
  --rec-threshold=THRESHOLD   Specify the minimum number of records in a group
  --summarize                 Produce a single record as output for each group, rather than all flow records

Figure 4.16: Summary of rwgroup

The criteria for a group are specified by using the --id-field, --delta-field, and --delta-value parameters; records are grouped when the fields specified by --id-field are identical and the fields specified by --delta-field match within a value less than or equal to the value specified by --delta-value. Records in the same group will be assigned a common group ID. The output of rwgroup is a stream of flow records, where each record's next-hop-IP field is set to the value of the group ID. It is important to note that rwgroup requires input records to be sorted by the fields used for grouping.

The most basic use of rwgroup is to group together flow records that comprise parts of a single longer session, such as the components of a single FTP session (or, in the case of Example 4-35, an IMAP session over TLS). To do so, the example sorts data on IPs and ports, and then groups together flow records that have closely related times. Note that the example uses rwsort to sort on all of the fields that are specified to rwgroup.

<1>$ rwfilter --type=in,out --start-date=2010/08/30:13 --packets=4- \
     --end-date=2010/08/30:16 --proto=6 --bytes-per=60- --pass=stdout |\
     rwsort --fields=1,2,3,4,9 >sorted.raw
rwsort: Warning: Using default temporary directory /tmp
<2>$ rwgroup --id-field=1,2,3,4 --delta-field=9 --delta-value=3600 \
     <sorted.raw >grouped.raw
<3>$ rwfilter grouped.raw --next-hop-id=0.213.254.180 --pass=stdout |\
     rwcut --fields=1-4,8,9
      sIP|      dIP|sPort|dPort|  flags|                  sTime|
 10.0.0.1| 10.0.0.2|  993|35483| S PA  |2010/08/30T13:46:36.252|
 10.0.0.1| 10.0.0.2|  993|35483|   PA  |2010/08/30T13:50:30.068|
 10.0.0.1| 10.0.0.2|  993|35483|   PA  |2010/08/30T13:55:30.963|
 10.0.0.1| 10.0.0.2|  993|35483|   PA  |2010/08/30T14:00:30.030|
 10.0.0.1| 10.0.0.2|  993|35483|   PA  |2010/08/30T14:10:30.020|
 10.0.0.1| 10.0.0.2|  993|35483|   PA  |2010/08/30T14:20:29.997|
 10.0.0.1| 10.0.0.2|  993|35483|   PA  |2010/08/30T14:30:30.021|
 10.0.0.1| 10.0.0.2|  993|35483|   PA  |2010/08/30T14:35:30.024|
 10.0.0.1| 10.0.0.2|  993|35483|   PA  |2010/08/30T14:40:30.037|
 10.0.0.1| 10.0.0.2|  993|35483|   PA  |2010/08/30T14:45:30.036|

Example 4-35: rwgroup to Group Flows of a Long Session

rwgroup, by default, produces one flow record for every flow record it receives. Selective record production can be specified for rwgroup by using the --rec-threshold and --summarize switches, as shown in Example 4-36. The --rec-threshold switch specifies that rwgroup will only pass records belonging to a group with at least as many records as given in --rec-threshold.


<1>$ rwsort --fields=2,9 sorted.raw | \
     rwgroup --id-field=2 --delta-field=9 --delta-value=3600 | \
     rwcut --num-recs=5 --field=1-5,15
rwsort: Warning: Using default temporary directory /tmp
      sIP|      dIP|sPort|dPort|pro|    nhIP|
 10.0.0.1| 10.0.0.2|42172| 4500|  6| 0.0.0.1|
 10.0.0.4| 10.0.0.2|39992| 4500|  6| 0.0.0.1|
 10.0.0.5| 10.0.0.3|46987| 4500|  6| 0.0.0.2|
 10.0.0.5| 10.0.0.3|52514| 4500|  6| 0.0.0.2|
 10.0.0.5| 10.0.0.3|51153| 4500|  6| 0.0.0.2|
<2>$ rwsort --fields=2,9 sorted.raw | \
     rwgroup --id-field=2 --delta-field=9 --delta-value=3600 --rec-threshold=30 | \
     rwcut --num-recs=5 --field=1-5,15
rwsort: Warning: Using default temporary directory /tmp
      sIP|      dIP|sPort|dPort|pro|    nhIP|
 10.0.0.1| 10.0.0.6|  993|50804|  6| 0.0.1.3|
 10.0.0.1| 10.0.0.6|  993|50805|  6| 0.0.1.3|
 10.0.0.1| 10.0.0.6|  993|50809|  6| 0.0.1.3|
 10.0.0.1| 10.0.0.6|  993|50810|  6| 0.0.1.3|
 10.0.0.1| 10.0.0.6|  993|50814|  6| 0.0.1.3|

Example 4-36: rwgroup --rec-threshold to Drop Trivial Groups

Example 4-36 shows how thresholding works. In the first case, the first five records fall into two groups: 0.0.0.1 and 0.0.0.2. When rwgroup is invoked again with --rec-threshold=30, both of those small groups are discarded, and the first group with 30 or more flow records is output instead. rwgroup can also generate summary records using the --summarize switch. When this switch is used, rwgroup will only produce a single record for each group; this record will use the first record in the group's addressing information (IP addresses, ports, and protocol). The total number of bytes and packets for the group will be recorded in the summary record's corresponding fields, and the start and end time for the record will be the extrema for that group. Example 4-37 shows how summarizing works. As this example shows, the five original records are reduced to two group summaries, and the byte totals for those records are equal to the sum of the byte values of all the records in the group.


<1>$ rwgroup --id-field=1,2,3,4 --delta-field=9 --delta-value=3600 \
     --rec-threshold=3 <sorted.raw | rwcut --fields=1-7,nhIP --num-recs=5
      sIP|       dIP|sPort|dPort|pro| packets|   bytes|    nhIP|
 10.0.0.1|  10.0.0.2| 2733|   22|  6|   15080| 1523787| 0.0.0.1|
 10.0.0.1|  10.0.0.2| 2733|   22|  6|   11412| 1156000| 0.0.0.1|
 10.0.0.1|  10.0.0.2| 2733|   22|  6|   12310| 1252735| 0.0.0.1|
 10.0.0.1|  10.0.0.2| 2733|   22|  6|    8968|  913755| 0.0.0.1|
 10.0.0.4|  10.0.0.2| 1766|   22|  6|    5006|  522032| 0.0.0.2|
<2>$ rwgroup --id-field=1,2,3,4 --delta-field=9 --delta-value=3600 \
     --rec-thres=3 --summarize <sorted.raw | rwcut --fields=1-7,nhIP --num-recs=5
      sIP|       dIP|sPort|dPort|pro| packets|   bytes|    nhIP|
 10.0.0.1|  10.0.0.2| 2733|   22|  6|   47770| 4846277| 0.0.0.1|
 10.0.0.4|  10.0.0.2| 1766|   22|  6|   19529| 2050730| 0.0.0.2|
 10.0.0.6|  10.0.0.7|  901|15029|  6|    1925| 1050022| 0.0.0.3|
 10.0.0.9| 10.0.0.10|18410|   25|  6|     254|  368818| 0.0.0.4|
 10.0.0.9| 10.0.0.10|37167|   25|  6|      65|   87453| 0.0.0.5|

Example 4-37: rwgroup --summarize

For any data file, calling rwgroup with the same --id-field and --delta-field values will result in the same group IDs being assigned to the same records. As a result, an analyst can use rwgroup to manipulate groups of flow records where the group has some specific attribute. This can be done by using rwgroup and IP sets. First, as shown in Command 1 of Example 4-38, the analysis sorts the data and uses rwgroup to convert the results into a file, out.rwf, grouped as FTP communications between two sites. All TCP port 20 and 21 communications between two sites are part of the same group. Then (in Command 2) the analysis filters through the collection of groups for those group IDs (as next-hop IPs stored in control.set) that use FTP control. Finally (in Command 3), the analysis uses that next-hop-IP set to pull out all of the groups that had FTP control.


<1>$ rwfilter sorted.raw --dport=20,21 --pass=stdout \
     | rwsort --field=1,2,3,4,9 \
     | rwgroup --id-field=1,2 > out.rwf
rwsort: Warning: Using default temporary directory /tmp
<2>$ rwfilter out.rwf --dport=21 --pass=stdout \
     | rwset --nhip-file=control.set
<3>$ rwfilter out.rwf --nhipset=control.set --pass=stdout \
     | rwcut --fields=1-5,9 --num-recs=5
      sIP|      dIP|sPort|dPort|pro|                  sTime|
 10.0.0.1| 10.0.0.2|59841|   21|  6|2010/08/30T14:49:07.047|
 10.0.0.1| 10.0.0.2|60031|   21|  6|2010/08/30T14:53:39.366|
 10.0.0.3| 10.0.0.4|19041|   21|  6|2010/08/30T14:35:40.885|
 10.0.0.5| 10.0.0.6| 1392|   21|  6|2010/08/30T13:56:03.271|
 10.0.0.5| 10.0.0.6| 1394|   21|  6|2010/08/30T13:56:04.657|

Example 4-38: Using rwgroup to Identify Specific Sessions

4.7.2

Labeling Matched Groups with rwmatch

rwmatch creates matched groups, where a matched group consists of an initial record (a query) followed by one or more responses. (The calling syntax and some common options to rwmatch are shown in Figure 4.17.) A response is a record that is related to the query (as specified in the rwmatch invocation) but is collected in a different direction or from a different router. As a result, the fields relating the two records may be different: for example, the source IP address in one record may match the destination IP address in another record.

rwmatch
Description   Match flow records that have stimulus-response relationships
Call          rwmatch --relate=1,2 --relate=2,1 query response output
Parameters
  --relate=RELATE-FIELD   Specify fields that identify stimulus and response
  --time-delta=DELTA      Specify how much time may separate stimulus and response

Figure 4.17: Summary of rwmatch

The most basic use of rwmatch is to group records into both sides of a bidirectional session, such as a Hypertext Transfer Protocol (HTTP) request. However, rwmatch is capable of more flexible matching, such as matching across protocols to identify traceroute messages. A relationship in rwmatch is established using the --relate switch, which takes two field IDs separated by a comma (e.g., --relate=2,4 or --relate=6,9); the first value corresponds to the field ID in the query file and the second value corresponds to the field ID in the response file. For example, --relate=1,2 states that the source IP for the query file matches the destination IP for the response file. The rwmatch tool will process multiple relationships, but each field in the query file can be related to at most one field in the response file. --relate always specifies a relationship from the query to the responses, so specifying --relate=1,2 means that the records match if the source IP in the query record matches the destination IP in the response. Consequently, when working with a protocol where there are implicit relationships between the queries and

responses, especially TCP, these relationships must be fully specified. Example 4-39 shows the impact that not specifying all the fields has on TCP data. Note that the match relationship specified (query's source IP matches response's destination IP) results in all the records in the response file matching the initial query record, even though the source IP addresses in the response file may differ from the query's destination IP address.

<1>$ rwfilter sorted.raw --saddress=10.0.0.1 --proto=6 --dport=25 \
     --pass=query.raw
<2>$ rwfilter sorted.raw --daddress=10.0.0.1 --proto=6 --sport=25 \
     --pass=response.raw
<3>$ rwcut --fields=1-4,9 --num-recs=4 query.raw
      sIP|      dIP|sPort|dPort|                  sTime|
 10.0.0.1| 10.0.0.2|19226|   25|2010/08/30T15:45:30.389|
 10.0.0.1| 10.0.0.3|10213|   25|2010/08/30T14:05:21.421|
 10.0.0.1| 10.0.0.3|11328|   25|2010/08/30T14:07:18.207|
 10.0.0.1| 10.0.0.3|13645|   25|2010/08/30T14:11:36.493|
<4>$ rwcut --fields=1-4,9 --num-recs=4 response.raw
      sIP|      dIP|sPort|dPort|                  sTime|
 10.0.0.2| 10.0.0.1|   25|19226|2010/08/30T15:45:30.262|
 10.0.0.3| 10.0.0.1|   25|10213|2010/08/30T14:05:21.297|
 10.0.0.3| 10.0.0.1|   25|11328|2010/08/30T14:07:18.079|
 10.0.0.3| 10.0.0.1|   25|13645|2010/08/30T14:11:36.301|
<5>$ rwmatch --relate=1,2 query.raw response.raw stdout | \
     rwcut --fields=1-4,9,nhIP --num-recs=5
      sIP|      dIP|sPort|dPort|                  sTime|      nhIP|
 10.0.0.1| 10.0.0.2|10142|   25|2010/08/30T16:57:59.265|   0.0.0.1|
 10.0.0.1| 10.0.0.2|10701|   25|2010/08/30T15:30:07.405|   0.0.0.1|
 10.0.0.2| 10.0.0.1|   25|10188|2010/08/30T16:58:03.534| 255.0.0.1|
 10.0.0.1| 10.0.0.2|10801|   25|2010/08/30T16:59:09.856|   0.0.0.2|
 10.0.0.1| 10.0.0.2|11315|   25|2010/08/30T14:07:16.885|   0.0.0.2|

Example 4-39: rwmatch With Incomplete ID Values

Example 4-40 shows the relationships that should be specified when working with TCP. This example specifies a relationship between the query's source IP and the response's destination IP, the query's source port and the response's destination port, and then the reflexive relationships between query and response. Note that these relationships are explicitly specified in TCP; port assignment in UDP is specified by the service and will consequently vary within UDP services. rwmatch is designed to handle all of these cases.

<1>$ rwmatch --relate=1,2 --relate=2,1 --relate=3,4 --relate=4,3 \
     query.raw response.raw stdout | rwcut --fields=1-4,9,nhIP --num-recs=5
      sIP|      dIP|sPort|dPort|                  sTime|      nhIP|
 10.0.0.1| 10.0.0.2|10701|   25|2010/08/30T15:30:07.405|   0.0.0.1|
 10.0.0.2| 10.0.0.1|   25|10701|2010/08/30T15:30:07.488| 255.0.0.1|
 10.0.0.1| 10.0.0.2|10801|   25|2010/08/30T16:59:09.856|   0.0.0.2|
 10.0.0.2| 10.0.0.1|   25|10801|2010/08/30T16:59:09.955| 255.0.0.2|
 10.0.0.1| 10.0.0.2|11315|   25|2010/08/30T14:07:16.885|   0.0.0.3|

Example 4-40: rwmatch With Full TCP Fields


Two records are considered related if all of their related fields are equal and their start times match within a value specified by --time-delta. If no time delta is specified, rwmatch will default to a 30-second time delta. As with rwgroup, rwmatch annotates the next-hop-IP field with an identifier common to all flow records in the same match group. However, rwmatch groups records from two distinct files into single groups. To indicate the origin of a record, rwmatch uses different values in the next-hop-IP field: query records will have an IP address where the first octet is set to 0, and response records will have their first octet set to 255. rwmatch only outputs queries that have a response, and all the responses to that query. Queries that do not have a response, and responses that do not have a query, will be discarded. As a result, rwmatch's output usually contains fewer records than the total of the two source files. rwmatch is also greedy; the first query record that matches a group of responses is considered the only query record for those responses. This effect is seen in Example 4-39, where the other three flow records in the query file are discarded. rwgroup can be used to compensate for this by merging all the records for a single session into one record. A simple use of rwmatch is to link together both sides of a TCP session. To do so, first generate two files containing the data to be matched; in this case, note that the source and destination ports are opposed. The data is then sorted by time and the corresponding fields and stored in two files: initiator.rwf and responder.rwf. Note that Example 4-41 matches all addresses and ports in both directions and sorts on time; as with rwgroup, rwmatch requires sorted data, and in the case of rwmatch there is always an implicit time-based relationship controlled using the --time-delta switch. As a consequence, always sort rwmatch data on the start time. (Example 4-39 generated the query and response files from an already-sorted file.)

<1>$ rwfilter --proto=6 --dport=25 --pass=stdout --type=out \
     | rwsort --field=1,2,9 > initiator.rwf
<2>$ rwfilter --proto=6 --sport=25 --pass=stdout --type=in \
     | rwsort --field=2,1,9 > responder.rwf
<3>$ rwmatch --relate=1,2 --relate=2,1 --relate=3,4 --relate=4,3 \
     initiator.rwf responder.rwf result.rwf

Example 4-41: rwmatch for Mating TCP Sessions
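Because rwmatch marks queries with a first octet of 0 and responses with a first octet of 255 in the next-hop IP, the matched file can later be split back into its two directions. A sketch, assuming result.rwf from Example 4-41 and an rwfilter that accepts IP wildcards in --next-hop-id:

<1>$ rwfilter result.rwf --next-hop-id=0.x.x.x --pass=queries-only.rwf
<2>$ rwfilter result.rwf --next-hop-id=255.x.x.x --pass=responses-only.rwf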

rwmatch can also be used to match relationships across protocols. For example, traceroutes from UNIX hosts are generally initiated by a UDP call to port 33434 and followed by an ICMP TTL-expired response message (type 11, code 0). A file of traces can then be composed by matching the ICMP responses to the UDP source as shown in Example 4-42.

<1>$ rwfilter --proto=17 --dport=33434 --pass=stdout \
     | rwsort --field=1,9 > queries.rwf
<2>$ rwfilter --proto=1 --icmp-type=11 --icmp-code=0 --pass=stdout \
     | rwsort --field=2,9 > responses.rwf
<3>$ rwmatch --relate=1,2 queries.rwf responses.rwf traces.rwf

Example 4-42: rwmatch for Mating Traceroutes

4.8

Adding IP Attributes with Prefix Maps

Sometimes it becomes necessary to associate a specific value with a range of IP addresses, and to filter or sort on the value rather than the address. One popular example is country codes: a common requirement would be to examine flow records associated with specific countries. An arbitrary association of addresses to labels is known as a prefix map.

rwpmapbuild
Description   Creates a prefix map from a text file
Call          rwpmapbuild --input-file=sample.pmap.txt --output-file=sample.pmap
Parameters
  --input-file    Specify the text file that contains the mapping between addresses and prefixes
  --output-file   File to create as the binary prefix map file

Figure 4.18: Summary of rwpmapbuild

4.8.1

What are Prefix Maps?

Prefix maps, or pmaps, define an association between IP address ranges and text labels. Where IP sets perform a binary association between an address and a value (an address is either in the set or not in the set), the prefix map more generally assigns different values to many different address ranges. These arbitrary attributes can then be used in sorting, printing, and filtering flow records. In order to use prefix maps, the map file itself must first be created. This is done by compiling a text-based mapping file containing the mapping of addresses and their labels. The pmap file can then be used by rwfilter, rwcut, rwsort, and rwuniq. This example of the use of prefix maps shows how to build and use a map of suspected spyware distribution hosts.

4.8.2

Creating a Prefix Map

Binary prefix maps are created from text files using the rwpmapbuild utility. Each line of the text file has an address value, space or tab characters as a separator, and the label to be assigned to the address value. An address value can be a single address, a range specified by a pair of addresses with a space between them, or a CIDR block. The text file also supports a default value, specified by a line consisting of the word default followed by a label, to be applied when no address matches any listed range. The common options are summarized in Figure 4.18. Example 4-43 shows how to create the spyware prefix map.


<1>$ cat <<END_FILE >spyware.pmap.txt
default None
# Spyware related address ranges
64.94.137.0/24              180solutions
205.205.86.0/24             180solutions
209.247.255.0/24            Alexa
209.237.237.0/24            Alexa
209.237.238.0/24            Alexa
216.52.17.12/32             BargainBuddy
198.65.220.221/32           Comet Cursor
64.94.162.0 64.94.162.100   Comet Cursor
64.94.89.0/27               Gator
64.162.206.0/25             Gator
82.137.0.0/16               Searchbar
END_FILE
<2>$ rwpmapbuild --input-file=spyware.pmap.txt \
     --output-file=spyware.pmap

Example 4-43: rwpmapbuild to Create a Spyware Pmap File
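If your SiLK installation includes the rwpmapcat tool (present in newer SiLK releases; check with man rwpmapcat), the compiled map can be inspected before it is used. This is a hedged sketch, since the output layout varies by version:

<1>$ rwpmapcat spyware.pmap | head -4
# prints each address range in the map together with its label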

4.8.3

Selecting Flow Records with rwfilter and Prefix Maps

There are three pmap parameters to rwfilter. --pmap-file specifies the compiled prefix map file to use. --pmap-saddress and --pmap-daddress specify the set of labels used for filtering records. Suppose the analyst wants to restrict a sample collection of flow records to just those that come from spyware hosts. This can be done using rwfilter with the options shown in Example 4-44.

<1>$ rwfilter --type=in,inweb --start-date=2010/08/30:13 \
     --end-date=2010/08/30:22 --proto=6 --pass=stdout |\
     rwfilter --input-pipe=stdin --pmap-file=spyware.pmap \
     --pmap-saddress=None --fail=spyware.raw
<2>$ rwfileinfo --fields=count-records spyware.raw
spyware.raw:
  count-records 4907

Example 4-44: rwfilter --pmap-saddress

For the common separation of addresses into specific types, normally internal vs. external, a special pmap file may be built in the share directory underneath the SiLK install directory. This file, address_types.pmap, is created from a list of CIDR blocks, each labeled internal, external, or non-routable. This pmap can then be used in an rwfilter query via the --stype or --dtype parameters, and used for record display via rwcut with a --fields parameter that includes stype, dtype, 17, or 18. A value of 0 indicates non-routable, 1 is internal, and 2 is external. The default value is external.
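As a sketch of how the address-type map is used once share/address_types.pmap exists (the date range here is illustrative and not taken from the examples above):

# keep only records whose source address is labeled internal (value 1)
<1>$ rwfilter --type=out --start-date=2010/08/30:13 --end-date=2010/08/30:13 \
     --stype=1 --pass=stdout \
     | rwcut --fields=sip,dip,stype,dtype --num-recs=3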

4.8.4

Working with Prefix Values Using rwcut and rwuniq

In order to display the actual value of a prefix, rwcut can be used with the --pmap-file parameter; this adds sval and dval as arguments to --fields. These fields report the prefix label associated with the source and destination IP addresses, respectively. Example 4-45 shows how to print out the type of spyware associated with an outbound flow record.

<1>$ rwcut --pmap-file=spyware.pmap --fields=sval,sport,dip,dport,stime \
     --num-recs=5 spyware.raw
         sval|sPort|      dIP|dPort|                  sTime|
 180solutions|   80| 10.0.0.1| 1132|2010/08/30T14:00:44.091|
 180solutions|   80| 10.0.0.1| 1137|2010/08/30T14:00:19.457|
 180solutions|   80| 10.0.0.2| 2746|2010/08/30T13:02:51.932|
    Searchbar| 3406| 10.0.0.3|   25|2010/08/30T16:02:12.258|
 180solutions|   80| 10.0.0.2| 2746|2010/08/30T13:02:54.901|

Example 4-45: rwcut --pmap-file and sval Field

The tools rwsort and rwuniq also work with prefix maps. The options are the same as for rwcut, and sorting and counting operations perform as expected. Examples 4-46 and 4-47 demonstrate using these two tools with prefix maps.

<1>$ rwsort spyware.raw --pmap-file=spyware.pmap --fields=sval,bytes |\
     rwcut --pmap-file=spyware.pmap --fields=sval,sport,dcc --num-recs=5
rwsort: Warning: Using default temporary directory /tmp
         sval|sPort|dcc|
 180solutions|   80| us|
 180solutions|   80| us|
 180solutions|   80| us|
 180solutions|   80| us|
 180solutions|   80| us|

Example 4-46: Using rwsort to Sort Flow Records Associated with Types of Spyware

<1>$ rwuniq spyware.raw --pmap-file=spyware.pmap --fields=sval,dport \
     --flows --dip-distinct | head -5
rwuniq: Warning: Using default temporary directory /tmp
         sval|dPort| Records|Unique_DIP|
 180solutions| 1792|       4|         1|
    Searchbar| 3072|       6|         6|
    Searchbar|32512|       4|         1|
 180solutions|64000|       2|         1|

Example 4-47: Using rwuniq to Count the Number of Flows Associated with Specific Types of Spyware


rwip2cc
Description   Shows the country code associated with an IP address
Call          rwip2cc --address=10.1.2.3
Parameters
  --map-file     Specify the pmap that contains the mapping between addresses and country codes
  --address      Address for which the country code is desired
  --input-file   File holding a list of addresses for which country codes are desired
  --print-ips    Control whether output will contain the IP address as well as the country code (--print-ips=1, the default) or only the country code (--print-ips=0)

Figure 4.19: Summary of rwip2cc

4.8.5

Using a Country-Code Mapping via rwip2cc

rwip2cc uses the prefix map library to associate countries with IP addresses. The pmap file to be used with this tool must be in a specific format; a general pmap file will not work. More information on how to get this pmap file is found in Section 5 of the SiLK Installation Handbook. Figure 4.19 shows a summary of this command. Example 4-48 shows an example of its use.

<1>$ rwip2cc --address=10.1.2.3
10.1.2.3|--|
<2>$ cat <<END_FILE >ips_to_find
192.88.209.244
128.2.10.163
127.0.0.1
END_FILE
<3>$ rwip2cc --input-file=ips_to_find
192.88.209.244|us|
  128.2.10.163|us|
     127.0.0.1|--|
<4>$ rwip2cc --input-file=ips_to_find --print-ips=0
us
us
--

Example 4-48: rwip2cc for Looking Up Country Codes

4.8.6

Where to Go for More Information on Prefix Maps

Prefix maps are an evolving part of the SiLK tool suite. The on-line documentation will have the latest information for the current version. Documentation is available through man pages (man rwpmapbuild and man libpmapfilter). If you build useful maps in the course of your work, or find useful references for pmap information, please feel free to share them with us via email to netsa-contact@cert.org.

4.9

Gaining More Features with Plug-Ins

The SiLK tool suite is constantly expanding, with new tools and new features being added frequently. One of the ways that new features are added is via dynamic library plug-ins for various tools. Table 4.1 provides a list of the current plug-ins distributed with SiLK. Example 4-49 shows the use of a plug-in, in this case cutmatch.so. Once the plug-in is invoked using the --plugin parameter, it defines a match field, which formats the rwmatch results as shown.

<1>$ rwcut matched.raw --plugin=cutmatch.so --fields=1,3,match,2,4,5
            sIP|sPort| <->Match#|            dIP|dPort|pro|
 192.168.251.79|49636|->       1|    10.10.10.65|   80|  6|
    10.10.10.65|   80|<-       1| 192.168.251.79|49636|  6|
 192.168.251.79|49637|->       2|    10.10.10.65|   80|  6|
    10.10.10.65|   80|<-       2| 192.168.251.79|49637|  6|

Example 4-49: rwcut --plugin=cutmatch.so to Use a Plug-in

Further documentation on these plug-ins is found in Section 3 of The SiLK Reference Guide (http://tools.netsa.cert.org/silk/reference-guide.html).

Table 4.1: Current SiLK Plug-ins

Name                 Description
cutmatch.so          Lets rwcut present the rwmatch results in an easier-to-follow manner, as a Match field
flowrate.so          Provides bytes/packet, bytes/second, and packets/second fields to rwcut, rwsort, and rwuniq; adds --bytes-per-second and --packets-per-second parameters to rwfilter
rwp2f_minbytes.so    Allows minimum bytes/packet filtering with rwptoflow
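The flowrate plug-in is loaded the same way as cutmatch.so. A sketch of its use follows; the field names bytes/sec and pkts/sec are assumptions here (the authoritative names are in the flowrate.so documentation in The SiLK Reference Guide), and the file names are illustrative:

<1>$ rwcut flows.raw --plugin=flowrate.so --fields=sip,dip,bytes/sec,pkts/sec \
     --num-recs=5
<2>$ rwfilter flows.raw --plugin=flowrate.so --bytes-per-second=1000- \
     --pass=fast.raw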



Chapter 5

Using PySiLK For Advanced Analysis


PySiLK is an extension to the SiLK tool suite that provides additional functionality via scripts written in Python. This chapter presents how to use PySiLK scripts to gain this additional functionality, but does not discuss the issues involved in composing new PySiLK scripts or how to code in Python. Several example scripts are shown, but the detailed design of each script will not be presented here. A brief guide to coding PySiLK plug-ins is found in the silkpython manual page (found on-line as man silkpython and in Section 3 of The SiLK Reference Guide, http://tools.netsa.cert.org/silk/reference-guide.pdf). A detailed description of the PySiLK structures is found in PySiLK: SiLK in Python (http://tools.netsa.cert.org/silk/pysilk.pdf). Generic programming in Python is described in many locations on the World Wide Web, with numerous resources available on the Python official web site (www.python.org). For some larger PySiLK examples, see the PySiLK tooltips page (https://tools.netsa.cert.org/wiki/display/tt/Writing+PySiLK+scripts). Generally, to access PySiLK, both the appropriate version of Python and the PySiLK library must be loaded on your system. Contact your system administrator to verify this. In general, the PYTHONPATH environment variable must be set to the directory containing the PySiLK library.
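A sketch of that environment setup, assuming a bash-like shell and a PySiLK library installed under /usr/local (the actual directory varies by installation; ask your administrator):

<1>$ export PYTHONPATH=/usr/local/lib/python2.6/site-packages
<2>$ python -c "import silk"    # no error means the PySiLK library was found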

5.1

rwfilter and PySiLK

For a single execution, PySiLK is much slower than using a series of rwfilter parameters, and somewhat slower than using a plug-in. However, there are several ways in which using PySiLK can replace a series of several rwfilter executions with a single execution, which will speed up the overall process; for analyses that will not be repeated often, or that are expected to evolve over time, PySiLK is an efficient alternative. The specific cases where PySiLK is useful for rwfilter include the following: some information from prior records may help in partitioning future records for pass or fail; a series of alternatives forms the partitioning condition; or the partitioning condition employs a control or data structure. For an example of where some information (or state) from prior records may help in partitioning future records, consider Example 5-1. This script (ThreeOrMore.py) passes all records that have a source IP address that has occurred on two or more prior records. This can be useful if you want to eliminate casual or inconsistent sources of particular behavior. The StateBuffer variable is the record of how many times

each source IP has been seen in prior records. The rwfilter function holds the Python code to partition the records. If it determines the record should be passed, it returns True, and otherwise returns False.

import silk

StateBuffer = {}

def rwfilter(rec):
    global StateBuffer
    val = rec.sip   # change this to count on a different field
    bound = 3       # change this to set the threshold higher or lower
    if val in StateBuffer:
        StateBuffer[val] = StateBuffer[val] + 1
        if StateBuffer[val] >= bound:
            return True
    else:
        StateBuffer[val] = 1
    return False

register_filter(rwfilter)

Example 5-1: ThreeOrMore.py: Using PySiLK for Memory in rwfilter Partitioning

One could use a PySiLK script with rwfilter by first having a call to rwfilter that retrieves the records that satisfy a given set of conditions, and then piping those records to a second execution of rwfilter that uses the --python-file parameter to invoke the script. This is shown in Example 5-2. This syntax is preferred to simply including the --python-file parameter on the first call, since its behavior is more consistent across execution environments. If rwfilter is running on a multiprocessor configuration, running the script on the first rwfilter call cannot be guaranteed to behave consistently, for a variety of reasons. So running PySiLK scripts via a piped rwfilter call is more consistent.

<1>$ rwfilter --type=in --start-date=2010/08/27:13 --end-date=2010/08/27:22 \
     --proto=6 --dport=25 --bytes-per=65- --packets=4- --flags-all=SAF/SAF,SAR/SAR \
     --pass=stdout | \
     rwfilter --input-pipe=stdin --python-file=ThreeOrMore.py --pass=email.raw

Example 5-2: Calling ThreeOrMore.py

Example 5-3 shows an example of using PySiLK to filter for a condition with several alternatives. This code is designed to identify VPN traffic in the data, whether using IPSEC, OpenVPN, or VPNz. This involves having several alternatives, each matching traffic for various protocols and ports. This could be done using a pair of rwfilter calls, one for UDP and one for AH or ESP, and then using rwcat to put them together, but this is less efficient than using PySiLK.


import silk

def rwfilter(rec):
    if rec.protocol == 17:
        if (rec.dport == 500) or (rec.sport == 500) or (rec.dport == 1194) or \
           (rec.sport == 1194) or (rec.sport == 1224) or (rec.dport == 1224):
            return True
    if (rec.protocol == 50) or (rec.protocol == 51):
        return True
    return False

register_filter(rwfilter)

Example 5-3: vpn.py: Using PySiLK with rwfilter for Partitioning Alternatives

Example 5-4 shows the use of a data structure in an rwfilter condition. In this particular case, internal IP addresses are being contacted by IP addresses in external blocks, and we wish to identify any responses to these contacts. The difficulty is that the response is unlikely to go back to the contacting address, and is likely instead to go to another address on the same network. Matching this with conventional rwfilter parameters is very slow and repetitive. But if we can build up a list of internal IPs and the networks they've been contacted by, we can then filter based on this list using the PySiLK script in Example 5-4, which we will refer to as matchblock.py. The first block of code (before the rwfilter function definition) loads the list from a file. The rwfilter function then progressively matches against this list.


import sys
import os
import silk

blockname='blocks.csv'

def do_blockname(block_str):
    global blockfile, blockname
    try:
        blockname=block_str
        blockfile=file(blockname,'r')
    except:
        print 'cannot open '+blockname
        sys.exit(1)
    load_blockdict()

def load_blockdict():
    global blockdict, blockfile, blockname
    blockdict=dict()
    for line in blockfile:
        fields=line[:-1].strip().split(',')
        if len(fields)<2:
            continue
        try:
            idx = IPAddr(fields[0].strip())
            if idx in blockdict:
                blockdict[idx].append(IPWildcard(fields[1].strip()))
            else:
                blockdict[idx]=list([IPWildcard(fields[1].strip())])
        except:
            continue
    blockfile.close()

def rwfilter(rec):
    global blockdict
    if (rec.dip in blockdict):
        for pattern in blockdict[rec.dip]:
            if rec.sip in pattern:
                return True
    return False

try:
    blockfile=file(blockname,'r')
    load_blockdict()
except:
    pass

register_filter(rwfilter)
register_switch("blockfile", handler=do_blockname,
    help="Name of file that holds CSV block map. Def blocks.csv")

Example 5-4: matchblock.py: Using PySiLK with rwfilter for Structured Conditions


The example here uses command-line parameters to pass information to the script (specifically, the name of the file holding the block map). Example 5-5 creates (Command 1) a file in the form that the script expects. Command 2 then invokes the script using the syntax introduced previously, augmented by the new parameter, and Command 3 displays the results.

<1>$ cat <<END_FILE >blockfile.csv
10.0.0.1,10.1.0.0/16
10.0.0.2,10.2.1.0/24
END_FILE
<2>$ rwfilter --type=out --start-date=2010/08/30:12 \
     --end-date=2010/08/30:14 --proto=6 --dport=25 --pass=stdout | \
     rwfilter --input-pipe=stdin --python-file=matchblock.py \
     --blockfile=blockfile.csv --pass=out.raw
<3>$ rwcut --num-recs=1 --fields=1-6 out.raw
      sIP|      dIP|sPort|dPort|pro| packets|
 10.0.0.1| 10.1.0.4|41935|   25|  6|       8|

Example 5-5: Calling matchblock.py

5.2 rwcut, rwsort, and PySiLK

Two specific cases where PySiLK is useful with rwcut and rwsort are:

1. Where you want to use a value based on a combination of fields, possibly from a number of records.

2. Where what you want to use is a function of one or more fields, possibly conditioned by the value of one or more fields.

Example 5-6 shows the use of PySiLK to calculate a value from the same field of two different records, providing a new column for rwcut to display. This particular plug-in, which will be referred to as delta.py, introduces a delta column holding the difference between the start times of two successive records. There are a number of interesting uses for this, including ready identification of flows that occur at very stable intervals, such as keep-alive traffic or beaconing. The plug-in uses a global variable to save the start time between records, then returns to rwcut the number of seconds (to the millisecond) between start times. The register_field call allows the use of delta as a new field name and gives rwcut the information that it needs to process the new field.


import silk

last_time = None

def output_pps(rec):
    # return the difference (in seconds) between this record's start time
    # and the previous record's start time, formatted for display
    global last_time
    if last_time == None:
        last_time = rec.stime_epoch_secs
    rslt = "%15.3f" % (rec.stime_epoch_secs - last_time)
    last_time = rec.stime_epoch_secs
    return rslt

register_field("delta", column_width=15, rec_to_text=output_pps)

Example 5-6: delta.py: Using PySiLK with rwcut to Display Combined Fields

To use delta.py, it is necessary to sort the flow records after pulling them from the repository. After sorting, the example passes them to rwcut with the --python-file=delta.py parameter placed before the --fields parameter, so that the delta field name is defined. The results are shown in Example 5-7, with the negative value marking the first record for a different source IP address.

<1>$ rwfilter --type=out --start-date=2010/08/30:00 \
      --end-date=2010/08/30:00 --proto=17 --packets=1 --pass=stdout |\
   rwsort --fields=sip,dip,stime | \
   rwcut --python-file=delta.py --fields=sip,dip,stime,delta --num-recs=5
rwsort: Warning: Using default temporary directory /tmp
            sIP|            dIP|                  sTime|          delta|
       10.0.0.1|       10.0.0.2|2010/08/30T00:07:22.585|          0.000|
       10.0.0.1|       10.0.0.3|2010/08/30T00:08:21.563|         58.978|
       10.0.0.1|       10.0.0.4|2010/08/30T00:36:39.778|       1698.215|
       10.0.0.8|       10.0.0.5|2010/08/30T00:17:44.752|      -1135.026|
       10.0.0.8|       10.0.0.6|2010/08/30T00:25:31.038|        466.286|

Example 5-7: Calling delta.py

Example 5-8 shows the use of a PySiLK plug-in for both rwsort and rwcut; it supplies a value that combines several fields of a single record. In this example, the new value is the number of bytes of payload conveyed by the flow. The number of header bytes depends on the protocol in use (IP has a 20-byte header, and TCP adds 20 further bytes, while UDP adds only 8 and ICMP only 4, etc.). The header_len variable holds a map between protocol number and header length. The header length is multiplied by the number of packets and subtracted from the overall byte count. (This code assumes no packet fragmentation is occurring.) The same function is used both to produce a value for rwsort to compare and to produce a value for rwcut to display, as indicated by the register_int_field call.
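To make the arithmetic concrete (with hypothetical numbers, not drawn from the repository): a TCP flow of 6 packets and 440 bytes would be reported as 440 - (40 x 6) = 200 bytes of payload, while a UDP flow of 3 packets and 150 bytes would yield 150 - (28 x 3) = 66 bytes.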


import silk

# per-protocol header length in bytes (IP header plus transport/encapsulation header)
header_len={1:24, 2:28, 6:40, 17:28, 41:40, 46:28, 47:19, 50:28, 51:32, \
            28:24, 132:32}

def bin_payload(rec):
    # estimate payload as total bytes minus per-packet header overhead;
    # unknown protocols are assumed to carry only the 20-byte IP header
    global header_len
    if rec.protocol in header_len:
        return (rec.bytes-(header_len[rec.protocol]*rec.packets))
    else:
        return (rec.bytes-(20*rec.packets))

register_int_field("payload", bin_payload, 0, (1<<32)-1, 12)

Example 5-8: payload.py: Using PySiLK for Conditional Fields With rwsort and rwcut

Example 5-9 shows how to use Example 5-8 with both rwsort and rwcut. The records are sorted into payload-size order, then output showing both the bytes and payload values.

<1>$ rwfilter --type=in --start-date=2010/08/30:03 \
      --end-date=2010/08/30:03 --proto=0-255 --pass=tmp.raw
<2>$ rwsort tmp.raw --python-file=payload.py --fields=payload,1,2 \
   | rwcut --python-file=payload.py --fields=1,2,5,bytes,payload --num-recs=10
rwsort: Warning: Using default temporary directory /tmp
            sIP|            dIP|pro|     bytes|     payload|
       10.0.0.1|       10.0.0.2|  6|        40|           0|
       10.0.0.3|       10.0.0.4|  6|       168|           8|
       10.0.0.5|       10.0.0.6|  1|        56|          32|
       10.0.0.7|       10.0.0.8| 50|        68|          40|
       10.0.0.7|       10.0.0.8| 50|        68|          40|
       10.0.0.9|       10.0.0.4| 17|        75|          47|
      10.0.0.10|       10.0.0.4| 17|        78|          50|
      10.0.0.11|      10.0.0.12|  1|       112|          64|
      10.0.0.13|      10.0.0.14| 47|    327649|      277470|
      10.0.0.15|      10.0.0.16| 47|   1176474|     1055463|

Example 5-9: Calling payload.py

As has been shown in this chapter, PySiLK simplifies several previously difficult analyses without requiring large scripts to be coded. While the programming involved in creating these scripts has not been described here, it is hoped that the scripts shown (or simple modifications of them) will prove useful to you as an analyst.



Chapter 6

Closing
This handbook has been designed to provide an overview of data analysis with SiLK on an enterprise network. This overview has included the definition of network flow data, the collection of that data on the enterprise network, and the analysis of that data using the SiLK tool suite. We concluded with a discussion of how to extend the SiLK tool suite to support additional analyses.

At this point, you are qualified to conduct analyses using the SiLK tools in whatever fashion you see fit. This handbook provides a large group of analyses in its examples, but these examples are only a small part of the set of analyses that SiLK can support. We hope that you will contribute to the SiLK community by developing new analytical approaches and providing new insights into how analysis should be done. Good luck!

