concocted clever code for scripts. But while most logs are encoded in ASCII (or at least half-ASCII), there are still devices and applications out there which produce binary-encoded logfiles.
What if you have to Splunk such a beast? Getting this data into Splunk requires a little extra
work, but is a straightforward process. It will require some scripting skills (in your favorite
language, such as Perl, Python or Java), access to vendor reference manuals and hexadecimal
conversions, perseverance, and a ready supply of your favorite code-slinging beverage.
Binary Coded Decimal: Each decimal digit occupies four bits, and only hex
values 0-9 are used. Thus, a two-byte value holds four decimal digits, one in each half-byte:
[Table: 1st byte and 2nd byte, each broken out into Bit 7 through Bit 0, with the resulting Value]
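To make the nibble arithmetic concrete, here is a minimal Python sketch of a packed-BCD decoder. The function name `bcd_to_digits` is made up for illustration, and the sketch assumes the high nibble of each byte holds the first digit; check the vendor manual, since some formats swap the nibbles or pad with filler values.

```python
def bcd_to_digits(data: bytes) -> str:
    """Decode packed BCD: each byte carries two decimal digits, high nibble first."""
    digits = []
    for byte in data:
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            # Only 0-9 are legal BCD digits; anything else means bad data
            # or a field that is not really BCD.
            raise ValueError(f"byte 0x{byte:02X} is not valid BCD")
        digits.append(f"{high}{low}")
    return "".join(digits)


print(bcd_to_digits(bytes([0x12, 0x34])))  # prints 1234
```

Rejecting nibbles above 9 doubles as a cheap check that the script is still reading the field it thinks it is reading.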
Little Endian: This means that the least-significant byte is first, backwards
from the conventional view of how things should be laid out. A four-byte
little-endian value looks like this:

1st byte   2nd byte   3rd byte   4th byte
04         00         00         00
Big Endian: No, not Tonto. This means that the most-significant byte is
first. A four-byte big-endian value looks like this:

1st byte   2nd byte   3rd byte   4th byte
04         00         00         00
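Byte order matters because the same four bytes decode to very different numbers depending on which convention is assumed. The short Python sketch below reads the bytes 04 00 00 00 both ways, using only the standard library (`int.from_bytes` and the `struct` module).

```python
import struct

raw = bytes([0x04, 0x00, 0x00, 0x00])

little = int.from_bytes(raw, byteorder="little")
big = int.from_bytes(raw, byteorder="big")

# struct gives the same answers: "<I" is a little-endian unsigned 32-bit
# integer, ">I" is the big-endian equivalent.
assert struct.unpack("<I", raw)[0] == little
assert struct.unpack(">I", raw)[0] == big

print(little)  # prints 4
print(big)     # prints 67108864
```

If a decoded counter or length comes out absurdly large, a wrong byte-order guess is one of the first things to rule out.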
ASCII: Occasionally the vendor will slip up and encode fields in plain-ole ASCII. But not often.
Take input data as STDIN, and output converted data directly to STDOUT.
This will allow Splunk to stream the data inputs, in much the same way that it
handles compressed (i.e., gzip'd) files.
Output the data in key-value pairs, so that Splunk will auto-magically extract
the field names and corresponding values. Use a format like this:
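As one possible rendering of key-value output, the Python sketch below turns a dictionary of decoded fields into a single line of `key="value"` pairs. The helper name `format_event` and the field names in the demo are hypothetical, and the quoting rule is just one cautious choice, not a Splunk requirement.

```python
def format_event(fields: dict) -> str:
    """Render one record as space-separated key="value" pairs on a single line."""
    parts = []
    for key, value in fields.items():
        # Swap embedded double quotes so they cannot end the value early.
        text = str(value).replace('"', "'")
        parts.append(f'{key}="{text}"')
    return " ".join(parts)


# Hypothetical field names, for illustration only.
print(format_event({"record_type": "ROAM", "calling_number": "3032923776"}))
# prints record_type="ROAM" calling_number="3032923776"
```

Writing one record per line keeps event boundaries easy to see once the data is indexed.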
Gotchas
Here are some potential problem areas and recommendations:
Make sure to build in a way to sanity-check whether your utility has gotten off on record
boundaries. Since we are processing a sea of bits, it is not intuitively
obvious when the script has derailed and started processing the wrong data from
the wrong places in a record. Choose a field which has a predictable value,
such as sequential record numbers or the year portion of a date stamp.
Add a valid_record field or some other means of
detecting that the conversion has derailed and that the field values may be
bogus for this record.
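A sanity check of this kind can be very small. The Python sketch below flags a record as valid only when its decoded year falls inside a plausible window. The helper name `add_validity`, the `year` key and the 1990-2100 window are all assumptions chosen for illustration; pick bounds that suit your own data.

```python
def add_validity(fields: dict, low_year: int = 1990, high_year: int = 2100) -> dict:
    """Return a copy of fields with valid_record=1 if the year looks plausible, else 0."""
    checked = dict(fields)
    year = fields.get("year")
    plausible = isinstance(year, int) and low_year <= year <= high_year
    checked["valid_record"] = 1 if plausible else 0
    return checked


print(add_validity({"year": 1998}))   # prints {'year': 1998, 'valid_record': 1}
print(add_validity({"year": 52743}))  # prints {'year': 52743, 'valid_record': 0}
```

Emitting the flag as a field, rather than silently dropping the record, means the suspect records can still be searched for later.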
Example
Let's look at an example, based on binary-encoded Call Data Records from a Well Known
Switch Vendor, and a Perl script to convert it. For the sake of brevity, we will assume a record
which contains only four fields, rather than the actual record which contains 100 fields (you're
welcome). I have written the conversion script in Perl, my hack-ware of choice. I have done just
enough software development to appreciate Real Software Developers, in the same way that I
have sweated just enough copper pipes to appreciate competent plumbers. Thus, I make no
claims that my code is the most elegant or the most efficient.
Let us assume that each binary record is 17 bytes long, and contains the following fields:
1st byte   2nd byte   3rd byte   4th byte
01         02         03         04
[Table: Decimal Value lookup; surviving entries are 0, IWFQNC, PDSN_BILL and ROAM, with anything else mapped to Unknown]
1st byte   2nd byte   3rd byte   4th byte   5th byte   6th byte   7th byte
CE         07         0A         0B         08         16         1A
1st byte   2nd byte   3rd byte   4th byte   5th byte
30         32         92         37         76
The following script makes extensive use of the pack and unpack functions; we leave it as an
Exercise For The Student to learn the nuances of these Perl functions.
Here is a sample Perl script to process the binary-coded records described above.
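For readers who prefer Python, a rough equivalent can be sketched with the standard `struct` module. This is not the Perl script referred to above, and several things in it are assumptions made for illustration:

- The field order is a 4-byte integer, a 1-byte type code, a 7-byte timestamp and a 5-byte BCD number.
- The field names `record_number`, `record_type`, `timestamp` and `number` are invented here.
- Multi-byte integers are read as little-endian, which makes the sample bytes CE 07 read as the year 1998.
- The 1990-2100 validity window is arbitrary.
- The code-to-name lookup is left empty, to be filled in from the vendor manual.

```python
import io
import struct

RECORD_SIZE = 17
# Assumed layout: "<" = little-endian, no padding; I = 4-byte unsigned integer,
# B = 1-byte type code, H = 2-byte year, 5B = month/day/hour/minute/second,
# 5s = five raw BCD bytes.  4 + 1 + 2 + 5 + 5 = 17 bytes.
LAYOUT = struct.Struct("<IBH5B5s")
TYPE_NAMES = {}  # decimal code -> name; fill in from the vendor manual


def decode_record(record: bytes) -> str:
    """Turn one 17-byte record into a line of key="value" pairs."""
    seq, code, year, month, day, hour, minute, second, bcd = LAYOUT.unpack(record)
    fields = {
        "record_number": seq,
        "record_type": TYPE_NAMES.get(code, "Unknown"),
        "timestamp": f"{year:04d}-{month:02d}-{day:02d} {hour:02d}:{minute:02d}:{second:02d}",
        "number": bcd.hex(),  # packed BCD reads directly as hex digits
        "valid_record": 1 if 1990 <= year <= 2100 else 0,  # sanity check on the year
    }
    return " ".join(f'{key}="{value}"' for key, value in fields.items())


def convert(infile, outfile) -> None:
    """Read fixed-size records from infile until EOF, writing one line per record."""
    while True:
        record = infile.read(RECORD_SIZE)
        if len(record) < RECORD_SIZE:
            break  # end of input, or a truncated trailing record
        outfile.write(decode_record(record) + "\n")


# Demo on the sample bytes from the tables above (the type byte 00 is made up).
sample = bytes.fromhex("01020304" "00" "ce070a0b08161a" "3032923776")
out = io.StringIO()
convert(io.BytesIO(sample), out)
print(out.getvalue(), end="")
```

To use this as a filter, replace the demo lines with `import sys` and `convert(sys.stdin.buffer, sys.stdout)`, so that raw bytes arrive on STDIN and text leaves on STDOUT.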
Test It
From the command line, test the script in this way:
# cat binary_logfile | cdr2text.pl
Looking good!
Splunk It
Add a stanza to inputs.conf, similar to this:
[monitor:///var/log/cdr_data/*]
disabled = 0
followTail = 0
host = voice_switch
index = cdr
sourcetype = cdr_binary
Add a stanza to props.conf to pre-process the binary logs. Make sure to use the same sourcetype
as the inputs.conf entry:
[cdr_binary]
invalid_cause = archive
unarchive_cmd = /usr/local/bin/cdr2text.pl
Conclusion
Although the majority of logged event data is text/ASCII, there are still systems which generate
binary-encoded log data, including a number of voice switch products. Pulling this data into
Splunk can yield extremely valuable insights, such as call volumes, per-user trends and
fraud/abuse analysis. With a little scripting, such data can be readily converted and streamed into
Splunk. Amaze your coworkers, and possibly even your boss!