Passive DNS mining from PCAP with dpkt & Python

Update 04/14: A friend pointed me to dnssnarf, a project that appears to have been written at a DojoSec meeting by Christopher McBee and updated a bit later by Grant Stavely. It uses Scapy (which I hear is really neat, if you haven’t played with it). Check out Grant’s blog post about dnssnarf.

So, here is another quickie in case anyone out there in the Intertubes needs it. Say you have a .pcap file, or many .pcap files, and you want to mine the DNS responses out of them so you can build up a passive DNS database and track malicious resolutions to build a list of ban-able IP addresses. This script parses a given .pcap file (tcpdump/Wireshark libpcap format) and prints the answers for the record types you are interested in.

This script is built around dpkt, a tool by Dug Song, and the contents are heavily inspired by the tutorials on Jon Oberheide’s site (he is also a developer of dpkt). Honestly, most of the time writing this was spent understanding how dpkt handles its internal data structures and how to get to the data. The documentation on dpkt is not the most mature, but the source is pretty readable if you keep the references I mention in the comments at hand. Also, this script was only tested with Python 2.6 and dpkt 1.7 on Linux. It was confirmed not to work on Windows, as dpkt appears to have some serious problems with Windows at the moment.

#!/usr/bin/env python

import dpkt, socket, sys

if len(sys.argv) != 2:
 print "Usage:\n", sys.argv[0], "filename.pcap"
 sys.exit()

# pcap is a binary format, so open the file in binary mode
f = open(sys.argv[1], 'rb')
pcap = dpkt.pcap.Reader(f)

for ts, buf in pcap:
 # make sure we are dealing with IP traffic
 # ref: http://www.iana.org/assignments/ethernet-numbers
 try: eth = dpkt.ethernet.Ethernet(buf)
 except: continue
 if eth.type != 2048: continue
 # make sure we are dealing with UDP
 # ref: http://www.iana.org/assignments/protocol-numbers/
 try: ip = eth.data
 except: continue
 if ip.p != 17: continue
 # filter on UDP assigned ports for DNS
 # ref: http://www.iana.org/assignments/port-numbers
 try: udp = ip.data
 except: continue
 if udp.sport != 53 and udp.dport != 53: continue
 # make the dns object out of the udp data and check that it is a response
 # (QR bit set) with opcode QUERY; a response carries the opcode of the
 # query it answers, hence the counter-intuitive check
 try: dns = dpkt.dns.DNS(udp.data)
 except: continue
 if dns.qr != dpkt.dns.DNS_R: continue
 if dns.opcode != dpkt.dns.DNS_QUERY: continue
 if dns.rcode != dpkt.dns.DNS_RCODE_NOERR: continue
 if len(dns.an) < 1: continue
 # now we're going to process and spit out responses based on record type
 # ref: http://en.wikipedia.org/wiki/List_of_DNS_record_types
 for answer in dns.an:
   if answer.type == 5:
     print "CNAME request", answer.name, "\tresponse", answer.cname
   elif answer.type == 1:
     print "A request", answer.name, "\tresponse", socket.inet_ntoa(answer.rdata)
   elif answer.type == 12:
     print "PTR request", answer.name, "\tresponse", answer.ptrname
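
The script only prints each resolution. To build up the passive DNS database mentioned earlier, those lines need somewhere to live. Here is a minimal sketch, assuming Python 3 and the standard-library sqlite3 module. The table name pdns, its columns, and the helpers open_store and record are hypothetical names for illustration; they are not part of dpkt or of the script.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small passive DNS table. Schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pdns (
               rrtype     TEXT    NOT NULL,
               query      TEXT    NOT NULL,
               answer     TEXT    NOT NULL,
               first_seen REAL    NOT NULL,
               last_seen  REAL    NOT NULL,
               hits       INTEGER NOT NULL,
               PRIMARY KEY (rrtype, query, answer))"""
    )
    return conn


def record(conn, rrtype, query, answer, ts):
    """Insert a resolution, or widen its time window if it was seen before."""
    cur = conn.execute(
        """UPDATE pdns
              SET first_seen = MIN(first_seen, ?),
                  last_seen  = MAX(last_seen, ?),
                  hits       = hits + 1
            WHERE rrtype = ? AND query = ? AND answer = ?""",
        (ts, ts, rrtype, query, answer),
    )
    if cur.rowcount == 0:
        # First sighting of this (type, query, answer) triple.
        conn.execute(
            "INSERT INTO pdns VALUES (?, ?, ?, ?, ?, 1)",
            (rrtype, query, answer, ts, ts),
        )
```

In the loop, each print could become something like record(conn, "A", answer.name, address, ts), using the ts value the pcap reader already yields for every packet. Tracking first and last sighting is a design choice that makes it easy to ask later when a name started pointing at a given address.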

12 thoughts on “Passive DNS mining from PCAP with dpkt & Python”

  1. Without doing too much investigating, it seems that not all A responses come back with 4 bytes, making socket.inet_ntoa throw an error on some pcaps.

    Adding:

    if len(answer.rdata) == 4:

    before attempting to print the output appears to fix it.

    1. Lou,

      First, sorry for the delay in response.

      Thanks for the tip. I’m trying to figure out why an A record wouldn’t be 4 bytes. Do you have a sample you can show me? I haven’t seen this on our network at work, which is pretty large, but that’s not to say it couldn’t happen.

      The Wikipedia page at http://en.wikipedia.org/wiki/List_of_DNS_record_types mentions some RFCs that might explain the issue. Notably, it describes the A record as: “Returns a 32-bit IPv4 address, most commonly used to map hostnames to an IP address of the host, but also used for DNSBLs, storing subnet masks in RFC 1101, etc.” A subnet mask should still be 4 bytes. But in reading RFC 1035, I came across this in section 3.4.1:

      Hosts that have multiple Internet addresses will have multiple A
      records.

      Maybe you’re seeing packets with multiple A record addresses in one response? I haven’t tested that. I might have to iterate over answer.rdata; maybe it comes back as an array if there are multiple addresses. I will test it and report back here.

      An example of this:

      8 0.658817 192.168.1.1 192.168.1.5 DNS Standard query response CNAME www.l.google.com A 72.14.204.147 A 72.14.204.104 A 72.14.204.99 A 72.14.204.103

      And:

      31 242.110444 192.168.1.1 192.168.1.5 DNS Standard query response A 74.200.243.253 A 76.74.255.123 A 72.233.2.58 A 72.233.2.59 A 76.74.254.123 A 74.200.243.251

      Whereas:

      21 106.408197 192.168.1.1 192.168.1.5 DNS Standard query response A 64.85.164.40

      Thanks again!
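
On the multiple-address question: in the DNS wire format (RFC 1035), each address is carried in its own resource record with its own RDLENGTH and RDATA, so a response with four A addresses holds four separate answer records of 4 bytes each rather than one record with a longer rdata. The sketch below illustrates that with a hand-built response, assuming Python 3 and the standard library only. The helper names and the example.com query are made up for the example, and the length guard is the one Lou suggested.

```python
import socket
import struct


def skip_name(msg, offset):
    """Return the offset just past the (possibly compressed) name at offset."""
    while True:
        length = msg[offset]
        if length == 0:               # root label ends the name
            return offset + 1
        if length & 0xC0 == 0xC0:     # 2-byte compression pointer ends the name
            return offset + 2
        offset += 1 + length


def a_records(msg):
    """List the IPv4 addresses in the answer section of a raw DNS message."""
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack("!6H", msg[:12])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(msg, offset) + 4          # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", msg[offset:offset + 10])
        offset += 10
        rdata = msg[offset:offset + rdlength]
        offset += rdlength
        if rtype == 1 and len(rdata) == 4:           # guard before inet_ntoa
            addresses.append(socket.inet_ntoa(rdata))
    return addresses


def a_rr(ip):
    """Build one A record whose name points back at the question (offset 12)."""
    return b"\xc0\x0c" + struct.pack("!HHIH", 1, 1, 300, 4) + socket.inet_aton(ip)


header = struct.pack("!6H", 0x1234, 0x8180, 1, 2, 0, 0)   # 1 question, 2 answers
question = b"\x07example\x03com\x00" + struct.pack("!HH", 1, 1)
sample = header + question + a_rr("72.14.204.147") + a_rr("72.14.204.104")
print(a_records(sample))
```

If that holds for the traffic in question, looping over dns.an as the script already does should cover the multi-address case, and a non-4-byte rdata would have to come from somewhere else, such as a malformed or truncated packet.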

  2. Traceback (most recent call last):
      File "./parserdns.py", line 12, in <module>
        for ts, buf in pcap:
      File "build/bdist.linux-i686/egg/dpkt/pcap.py", line 141, in __iter__
      File "build/bdist.linux-i686/egg/dpkt/dpkt.py", line 75, in __init__
    dpkt.dpkt.NeedData

  3. Thanks! you really helped me in a project I’m working on!

    Here is a question for you – I need to generate a list of all domain-names and their corresponding IP-addresses from a pcap file, so from your example I used this code:
    if answer.type == 1:
        print "A request", answer.name, "\tresponse", socket.inet_ntoa(answer.rdata)

    The problem is that answer.name is not good enough for me, because I need the original domain name requested, and not its CNAME representation. For example, one of the original DNS requests was for www.paypal.com, but its CNAME representation is paypal.112.2o7.net.

    I looked closely at your code and at dpkt’s dns.py, and realized we are extracting the information from the DNS response (and not the query).
    I looked at the response packet in Wireshark and saw that the original domain is there, under ‘queries’ and under ‘answers’, so my question is: how can I extract it?

    Thanks!
    Matan

    1. Matan,

      I’m glad you found it helpful. If you are still working with my code directly, you could acquire the name from the Questions section of the DNS response via the dns.qd object provided by dpkt.dns. So if you wanted to get the question name(s) you would simply do this:

      for qname in dns.qd:
          print qname.name

      Or whatever else you’d like to do with it. Thanks for reading and commenting.

      –Madds
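
To get the list Matan describes (original name to addresses), one option is to key every A answer in a response on the name from the question section instead of on answer.name. Here is a small sketch, assuming Python 3. The Question and Answer namedtuples are stand-ins that mimic only the attributes used in the script (name, type, rdata), so the example runs without dpkt. The function name is hypothetical, and 192.0.2.10 is a documentation placeholder address, not real data for these names.

```python
import socket
from collections import namedtuple

# Stand-ins for the objects found in dns.qd and dns.an.
Question = namedtuple("Question", "name")
Answer = namedtuple("Answer", "type name rdata")

DNS_A = 1
DNS_CNAME = 5


def addresses_by_question(questions, answers):
    """Map each queried name to every IPv4 address in the same response."""
    ips = [
        socket.inet_ntoa(a.rdata)
        for a in answers
        if a.type == DNS_A and len(a.rdata) == 4
    ]
    return {q.name: ips for q in questions}


questions = [Question("www.paypal.com")]
answers = [
    Answer(DNS_CNAME, "www.paypal.com", b""),
    Answer(DNS_A, "paypal.112.2o7.net", socket.inet_aton("192.0.2.10")),
]
print(addresses_by_question(questions, answers))
```

With the real dpkt objects, the call would be addresses_by_question(dns.qd, dns.an) inside the packet loop. This works because the answers in a response belong to the question it echoes, so the CNAME chain does not need to be walked just to recover the original name.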
