Symantec Brightmail syslog message parser

OK, this won’t be interesting to most of you folks who are subscribed (all three of you [hi Mom!]), but I’m hoping Google will index it so that if anyone needs this script, it’ll be there to help them.

This is just a simple log parser for the really, really annoying multi-line/multi-message format that Symantec Brightmail insists on using when it sends syslog information.
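
To make the format concrete, here is a minimal sketch of the regrouping idea. It assumes the layout the script below expects: a syslog prefix ending in an integer timestamp, then a UID, a message type, and the rest of the message, separated by pipes. The sample lines, host name, UID, and addresses are made up for illustration; they are not real Brightmail output.

```perl
use strict;
use warnings;

# Hypothetical sample lines (invented for illustration, not real Brightmail output).
my @lines = (
    'Jan  1 00:00:01 mailhost bmserver: 1262304001|0a0b0c0d-1|SENDER|alice@example.com',
    'Jan  1 00:00:01 mailhost bmserver: 1262304001|0a0b0c0d-1|IRCPTACTION|bob@example.com',
    'Jan  1 00:00:02 mailhost bmserver: 1262304002|0a0b0c0d-1|IRCPTACTION|carol@example.com',
);

my %by_uid;
for my $line (@lines) {
    # A limit of 4 means any further pipes stay inside the last field.
    my ($prefix, $uid, $type, $rest) = split /\|/, $line, 4;

    # Repeated message types for the same UID are joined with commas.
    if (exists $by_uid{$uid}{$type}) {
        $by_uid{$uid}{$type} .= ",$rest";
    } else {
        $by_uid{$uid}{$type} = $rest;
    }
}

# Print the merged fields for the one (hypothetical) message.
my $msg = $by_uid{'0a0b0c0d-1'};
print "$_ => $msg->{$_}\n" for sort keys %$msg;
```

Three syslog lines collapse into one record per UID, and the two IRCPTACTION lines end up as a single comma-separated field.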

The key points: set your $delimiter and $nullvalue appropriately. Also notice that on fields where Brightmail may log several messages of the same type for one email (like the IRCPTACTION field, which basically says whether the message was delivered to each individual recipient), the script joins the values with commas. This is OK: I verified over a large sample that those fields never normally contain a comma, so you should be able to split on it just fine if you want to script against the results.
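
If you do want to script against the results, something along these lines should work. This is only a sketch: the record, the column order, and the variable names ($subject, $rcpt_actions) are hypothetical, since the real columns depend on which message types you output. The delimiter and null value are assumed to match what you set in the parser.

```perl
use strict;
use warnings;

my $delimiter = "~!^!~";   # must match the value used by the parser
my $nullvalue = "NULL";

# Hypothetical output record: timestamp, UID, then two fields
# (column order and values are invented for illustration).
my $record = join $delimiter,
    '1262304001', '0a0b0c0d-1', 'NULL', 'bob@example.com,carol@example.com';

# \Q...\E quotes the delimiter so characters like ^ are matched literally.
# A limit of -1 keeps trailing empty fields.
my ($time, $uid, $subject, $rcpt_actions) = split /\Q$delimiter\E/, $record, -1;

# Treat the null marker as "no value".
$subject = undef if $subject eq $nullvalue;

# Fields that can repeat are comma-separated.
my @actions = split /,/, $rcpt_actions;

printf "%s %s: %d recipient entries\n", $time, $uid, scalar @actions;
```

Note the \Q...\E around the delimiter: without it, the ^ in the default delimiter would be treated as a regex anchor rather than a literal character.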

Questions? Comment away. I check ’em.

use strict;
use Carp;

my ($in, $out) = @ARGV;
my $DEBUG=0;
my $line;

croak "\nPlease specify input & output files.  Usage\n\n\t$0 infile outfile\n\n" if (!$in or !$out);
croak "\nABORTED: Input and output files are the same: $in\n\n" if ($in eq $out);

open(INFILE, '<', $in) or die "Cannot open $in: $!";
open(OUTFILE, '>', $out) or die "Cannot open $out: $!";

my %result_hash = ();
my $delimiter = "~!^!~"; # I use something weird because the subject line could have anything
my $nullvalue = "NULL";

while (defined($line = <INFILE>)) {
  chomp $line;  # drop the newline so it doesn't end up inside the joined fields
  print "\$line = $line\n" if $DEBUG;

  # Discard lines that are not from bmserver or ecelerity (the two Brightmail components)
  unless ($line =~ /bmserver:/ || $line =~ /ecelerity:/) { next; }

  # split on pipes "|" to process further
  my ($timestuff, $UID, $msgtype, $therest) = split(/\|/, $line, 4);

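  # do some basic validation of UID and msgtype fields, throwaway outliers
  # (also skip lines that don't have all four pipe-separated parts)
  unless (defined $UID && defined $msgtype && defined $therest) { next; }
  if ($UID =~ /[^0-9a-z\-]/ || $msgtype =~ /[^A-Z]/) { next; }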

  # now we parse all of this crap into a big hash
  if (exists($result_hash{$UID})) {
     if (exists($result_hash{$UID}{$msgtype})) {
        # same message type seen again for this UID: append with a comma
        $result_hash{$UID}{$msgtype} = $result_hash{$UID}{$msgtype}.",".$therest;
     } else {
        $result_hash{$UID}{$msgtype} = $therest;
     }
  } else {
     # first line for this UID: the last space-separated field of the prefix is the timestamp
     my @timefields = split(/ +/, $timestuff);
     $result_hash{$UID}{"TIMESTAMPINT"} = $timefields[-1];
     $result_hash{$UID}{$msgtype} = $therest;
  }
}

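# Columns to output after the timestamp and UID, in order.
# ASSUMPTION: @hash_elements is not defined anywhere in this listing, so as a
# stand-in this uses every message type seen in the input, sorted by name.
# Swap in a fixed list of message types if you need a stable column order.
my %seen_types;
for my $uid (keys %result_hash) {
  $seen_types{$_} = 1 for grep { $_ ne "TIMESTAMPINT" } keys %{ $result_hash{$uid} };
}
my @hash_elements = sort keys %seen_types;

my @recs_to_sort = ();
for my $key (keys %result_hash) {
  my @tmp_line = ();
  push(@tmp_line, $result_hash{$key}{"TIMESTAMPINT"});
  push(@tmp_line, $key);
  foreach my $element (@hash_elements) {
     if (exists($result_hash{$key}{$element})) {
        push(@tmp_line, $result_hash{$key}{$element});
     } else {
        push(@tmp_line, $nullvalue);
     }
  }
  push(@recs_to_sort, join($delimiter,@tmp_line));
}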

# sort by time for our database inserts
my @sorted_recs = sort @recs_to_sort;

foreach (@sorted_recs) {
  print OUTFILE "$_\n";
}

close(INFILE);
close(OUTFILE) or die "Cannot close $out: $!";

3 responses

  1. What is &quot? Is it ” ?
    The script has errors with this one..
    What is the proper way to run this one?

    perl inputfile.txt outputfile.txt?

    1. Hiya! It looks like the script got messed up somehow when I posted it as code. When I get to a computer tomorrow I’ll fix the post and comment here. Thanks for the note! And yes, you would run it as you said once fixed.

    2. Hello, the code should be fixed now. Go ahead and try again. Remember to set your field delimiter and null value on lines 15-16 to whatever you’d like to use in the output. Cheers!
