Reading DNS traffic captures: what your queries look like on the wire
April 30, 2026 · lab notes · dns · tcpdump · wireshark · observability
Most DNS observability tooling shows you a summary — query volume, response codes, top queried names. That’s useful, but the truth is in the packets. When DNS misbehaves, you eventually have to look at the wire.
These are the tools and filters that handle most cases.
tcpdump for DNS
The simplest capture filter for DNS is port 53:
sudo tcpdump -i any -n port 53
You see queries and responses, with tcpdump’s built-in DNS decoder rendering the question and answer sections inline. Useful for quick checks; verbose for serious analysis.
For more detail, use -vv and write to a pcap:
sudo tcpdump -i any -n -vv -w /tmp/dns.pcap port 53
Read it back with tcpdump -r /tmp/dns.pcap or open in Wireshark.
Filters that come up regularly:
# Only queries (the QR bit is 0)
sudo tcpdump -i any -n 'udp port 53 and udp[10] & 0x80 = 0'
# Only responses
sudo tcpdump -i any -n 'udp port 53 and udp[10] & 0x80 = 0x80'
# Only NXDOMAIN responses (RCODE 3)
sudo tcpdump -i any -n 'udp port 53 and udp[10] & 0x80 = 0x80 and udp[11] & 0x0f = 3'
# Only TCP DNS (truncated-UDP fallback, zone transfers; note DoT uses port 853, not 53)
sudo tcpdump -i any -n 'tcp port 53'
The bit math is annoying but stable. Memorize the QR-bit-and-RCODE filters; they handle most debugging.
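The byte offsets behind those filters can be sanity-checked in a few lines of Python. A minimal sketch (the `parse_dns_flags` helper is made up for this post, not from any capture library):

```python
import struct

def parse_dns_flags(dns_message):
    """Extract QR, AA, and RCODE from a raw DNS message.

    Byte 2 of the DNS header carries QR in its high bit (0x80) and AA
    at 0x04; the low 4 bits of byte 3 carry the RCODE. These are the
    same bytes the tcpdump filters reach: udp[10] is the 8-byte UDP
    header plus offset 2 into the DNS header, udp[11] is offset 3.
    """
    flags_hi, flags_lo = dns_message[2], dns_message[3]
    qr = (flags_hi & 0x80) >> 7   # 0 = query, 1 = response
    aa = (flags_hi & 0x04) >> 2   # 1 = authoritative answer
    rcode = flags_lo & 0x0F       # 3 = NXDOMAIN
    return qr, aa, rcode

# Hand-built header of a non-authoritative NXDOMAIN response:
# ID 0x1234, flags 0x8183 (QR=1, RD=1, RA=1, RCODE=3), 1 question.
header = struct.pack("!HHHHHH", 0x1234, 0x8183, 1, 0, 0, 0)
print(parse_dns_flags(header))  # (1, 0, 3)
```

The AA bit at 0x04 is the same one that matters in the stale-cache story later: `udp[10] & 0x04` tells you whether a response claims to be authoritative.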
dnstap for the recursor’s view
tcpdump shows you DNS on the wire. dnstap shows you DNS inside the recursor — including cache hits, validation steps, and outbound recursive queries. It’s a structured logging format produced by BIND, Unbound, Knot Resolver, and others.
Setup looks like this (Unbound example):
dnstap:
dnstap-enable: yes
dnstap-socket-path: "/var/run/unbound/dnstap.sock"
dnstap-log-resolver-query-messages: yes
dnstap-log-resolver-response-messages: yes
dnstap-log-client-query-messages: yes
dnstap-log-client-response-messages: yes
Then read the socket with dnstap-ldns -u /var/run/unbound/dnstap.sock.
Output is one event per query/response, with full DNS message bytes plus the recursor’s metadata: cache hit/miss, upstream queried, response time. For debugging recursor behavior, dnstap is irreplaceable.
Wireshark filters for DNS
Once you’re in Wireshark, the filter language is more usable than tcpdump’s bit math. Common ones:
# All DNS
dns
# Only queries
dns.flags.response == 0
# Only responses
dns.flags.response == 1
# Specific query name
dns.qry.name == "example.com"
# Specific query name, contains
dns.qry.name contains "google"
# NXDOMAIN responses
dns.flags.rcode == 3
# Truncated responses (UDP too small, fallback to TCP)
dns.flags.truncated == 1
# DNSSEC-related queries (DNSKEY, RRSIG, DS, NSEC, NSEC3)
dns.qry.type == 48 or dns.qry.type == 46 or dns.qry.type == 43 or dns.qry.type == 47 or dns.qry.type == 50
# Slow responses (>500ms)
dns.time > 0.5
Wireshark also has a “DNS statistics” view (Statistics → DNS) that summarizes a capture: query types, response codes, response times. On a 5-10 minute capture sample, this shows you what’s normal and what’s an outlier.
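The dns.time filter works by pairing each response with its query. The same pairing is easy to script once you’ve extracted events from a capture. A sketch, assuming hypothetical (timestamp, txid, is_response) tuples as input (a real version should also key on the 5-tuple, since transaction IDs collide across clients):

```python
def slow_responses(events, threshold=0.5):
    """Pair DNS queries with responses by transaction ID and return
    the (txid, rtt) pairs slower than the threshold, mirroring
    Wireshark's `dns.time > 0.5` filter."""
    pending = {}  # txid -> query timestamp
    slow = []
    for ts, txid, is_response in sorted(events):
        if not is_response:
            pending[txid] = ts
        elif txid in pending:
            rtt = ts - pending.pop(txid)
            if rtt > threshold:
                slow.append((txid, rtt))
    return slow

events = [
    (0.00, 0x1111, False), (0.02, 0x1111, True),  # fast: 20ms
    (1.00, 0x2222, False), (1.75, 0x2222, True),  # slow: 750ms
]
print(slow_responses(events))  # [(8738, 0.75)]
```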
What problems show up where
Some examples from captures we’ve actually had to debug:
TLD lookups failing intermittently. tcpdump on the recursor showed DNS queries to root servers that timed out. Root cause: an upstream provider was rate-limiting outbound port 53 to “anti-DNS-amplification” levels, breaking our recursor’s recursive lookups under load.
DNSSEC validation failures spiking after a zone update. dnstap on the recursor showed RRSIG validation failures. The zone owner had rolled their ZSK without a proper pre-publish rollover, leaving RRSIGs that no longer matched the published DNSKEY.
Slow first queries. Wireshark showed cold-cache misses taking 800ms+ because the recursor was choosing a geographically distant authoritative server. Adjusting the recursor’s RTT-based server selection to prefer the fastest server improved cold-cache queries.
Mysterious NXDOMAINs. tcpdump showed responses with the AA flag not set, meaning they came from cache rather than an authoritative server. The local recursor had cached a stale negative response. Flush, retry, fixed.
The common thread: the summary tooling tells you something is wrong. The packet capture tells you what.
Storage and retention
DNS traffic is high-volume. A busy recursor can do millions of queries per day. Capturing everything is expensive in disk and analysis time.
Reasonable approach:
- Run continuous dnstap to disk with a 3-7 day rotation. Compresses well.
- Run targeted tcpdump only when investigating a specific problem.
- Keep query/response statistics indefinitely (low cardinality).
Continuous full-packet capture is an investigation tool, not routine operations. Save the capacity for the times you need it.
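The retention numbers above are easy to sanity-check with back-of-envelope arithmetic. A sketch; the per-event size and compression ratio here are assumptions, so measure your own recursor before provisioning:

```python
def dnstap_disk_gib(queries_per_day, days=7,
                    bytes_per_event=300, compression=0.2):
    """Rough compressed on-disk size (GiB) of a dnstap retention window.

    Assumes two logged events per client query (query + response),
    ~300 bytes per raw event, and 5:1 compression -- all assumed
    values, not measurements.
    """
    raw_bytes = queries_per_day * 2 * bytes_per_event * days
    return raw_bytes * compression / 2**30

# e.g. a recursor doing 5 million queries/day, 7-day rotation:
print(round(dnstap_disk_gib(5_000_000), 2))  # ~3.91 GiB
```

Single-digit GiB for a week of full query logs is why continuous dnstap is cheap where continuous pcap is not: a pcap of the same traffic also carries Ethernet/IP/UDP framing and compresses far worse.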