Symantec Brightmail syslog message parser

OK, this won't be interesting to most of you subscribers (all three of you [hi Mom!]), but I'm hoping Google will index it so that anyone who needs this script can find it.

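This is just a simple log parser for the really, really annoying multi-line/multi-message format that Symantec Brightmail insists on using when it sends syslog information. It gathers every line that shares a message UID and writes them out as one delimited record per message.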

The key points: set your $delimiter and $nullvalue appropriately. Also notice that where Brightmail may log several messages of the same type for one email (like the IRCPTACTION field, which basically says whether the message was delivered to an individual recipient), the parser joins them into a single field separated by commas. This is OK: I verified over a large sample that those fields never contain a comma on their own, so you should be able to split on commas just fine if you want to script against the results.
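If you do want to script against the results, here's a rough sketch of reading a record back in. It assumes you kept the $delimiter and $nullvalue shown in the script and that the column order is the one the script writes (timestamp, UID, then each message type). The parse_record helper and the sample values (UID, addresses, action text) are made up for illustration, not taken from a real Brightmail log.

```perl
use strict;
use warnings;

# Assumed to match $delimiter and $nullvalue in the parser script.
my $delimiter = '~!^!~';
my $nullvalue = 'NULL';

# Column order the parser writes: timestamp, UID, then each message type.
my @columns = qw(
  TIMESTAMPINT UID
  ACCEPT ATTACH ATTACHFILTER DELIVER DELIVERY_FAILURE IRCPTACTION MSGID ORCPTS
  SENDER SOURCE SUBJECT TRACKERID UNSCANNABLE UNTESTED VERDICT VIRUS
);

# Hypothetical helper: turn one output line into a hash ref,
# mapping the NULL placeholder to undef.
sub parse_record {
    my ($record) = @_;
    # \Q...\E makes split treat the delimiter as literal text;
    # the -1 limit keeps trailing empty fields.
    my @values = split /\Q$delimiter\E/, $record, -1;
    my %rec;
    @rec{@columns} = map { ($_ eq $nullvalue) ? undef : $_ } @values;
    return \%rec;
}

# Build a made-up output line the same way the parser would.
my %fields = (
    TIMESTAMPINT => '1297339201',
    UID          => '0a0a0a0a-0001',
    IRCPTACTION  => 'alice@example.com|deliver,bob@example.com|deliver',
);
my $line = join $delimiter,
    map { exists $fields{$_} ? $fields{$_} : $nullvalue } @columns;

my $rec = parse_record($line);

# Multi-message fields are comma-joined, so split them back apart.
my @actions = defined $rec->{IRCPTACTION}
    ? split(/,/, $rec->{IRCPTACTION})
    : ();
printf "%s: %d recipient action(s)\n", $rec->{UID}, scalar @actions;
```

Each element of @actions still holds the pipe-separated remainder of the original Brightmail line, so you can split it on "|" again if you need the individual pieces.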

Questions? Comment away. I check ’em.

use strict;
use warnings;
use Carp;

my ($in, $out) = @ARGV;
my $DEBUG=0;
my $line;

croak "\nPlease specify input & output files.  Usage\n\n\t$0 infile outfile\n\n" if (!$in or !$out);
croak "\nABORTED: Input and output files are the same: $in\n\n" if ($in eq $out);

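# three-argument open, so odd characters in a filename can't change the open mode
open(INFILE, '<', $in) or die "Cannot read $in: $!\n";
open(OUTFILE, '>', $out) or die "Cannot write $out: $!\n";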

my %result_hash = ();
my $delimiter = "~!^!~"; # I use something weird because the subject line could have anything
my $nullvalue = "NULL";

# read one line at a time instead of loading the whole file into memory
while (defined($line = <INFILE>)) {
  chomp($line);
  # print "\$line = $line\n";

  # Discard lines that are not from bmserver or ecelerity (the two Brightmail components)
  unless ($line =~ /bmserver:/ || $line =~ /ecelerity:/) { next; }

  # split on pipes "|" to process further
  my ($timestuff, $UID, $msgtype, $therest) = split(/\|/, $line, 4);

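  # do some basic validation of the UID and msgtype fields; throw away outliers,
  # including lines with too few pipe-separated fields ("_" is allowed so DELIVERY_FAILURE passes)
  next unless defined $UID && defined $msgtype;
  if ($UID =~ /[^0-9a-z\-]/ || $msgtype =~ /[^A-Z_]/) { next; }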

  # now we parse all of this crap into a big hash
  if (exists($result_hash{$UID})) {
     if (exists($result_hash{$UID}{$msgtype})) {
        $result_hash{$UID}{$msgtype} = $result_hash{$UID}{$msgtype}.",".$therest;
     } else {
        $result_hash{$UID}{$msgtype} = $therest;
     }
  } else {
     my @timefields = split(/ +/, $timestuff);
     $result_hash{$UID}{"TIMESTAMPINT"} = $timefields[-1];
     $result_hash{$UID}{$msgtype} = $therest;
  }
}

my @recs_to_sort = ();
my @hash_elements = qw(ACCEPT ATTACH ATTACHFILTER DELIVER DELIVERY_FAILURE IRCPTACTION MSGID ORCPTS SENDER SOURCE SUBJECT TRACKERID UNSCANNABLE UNTESTED VERDICT VIRUS);
for my $key (keys %result_hash) {
  my @tmp_line = ();
  push(@tmp_line, $result_hash{$key}{"TIMESTAMPINT"});
  push(@tmp_line, $key);
  foreach my $element (@hash_elements) {
     if (exists($result_hash{$key}{$element})) {
        push(@tmp_line, $result_hash{$key}{$element});
     } else {
        push(@tmp_line, $nullvalue);
     }
  }
  push(@recs_to_sort, join($delimiter,@tmp_line));
}

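# sort by time for our database inserts; a plain string sort works because each record
# starts with the integer timestamp (assuming they all have the same number of digits)
my @sorted_recs = sort @recs_to_sort;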

foreach (@sorted_recs) {
  print OUTFILE "$_\n";
}

close INFILE;
close OUTFILE or die "Error closing $out: $!\n";