Tshark to the rescue

2009-06-29 00:30

wiresharkIn my line of work as a sysadmin at one of the largest sites in Norway, it happens once in a while that I have to inspect HTTP traffic for some more or less urgent reason.

One of the tools I really love working with is tshark.
Tshark is the console version of wireshark and enables you to sniff and dissect just about any protocol in realtime.

One of the problems I had recently was to identify webtraffic originating from our webserver.
Over the years code has accumulated server initiated fetches. Stuff like file_get_content("http://somesite/someurl) in the presentation code. This is bad since it creates external dependencies to deliver a page and keeps apache/nginx/lighttd threads/processes busy

This roughly says:

  • Listen on the eth0 interface for 60 seconds.
  • Write out two different sets of statistics about the traffic.
  • Write out the "Host:" header, the URL and the request method. (GET/POST).
  • Try to interpret the traffic as a HTTP.
  • write timestamps in a readable format (not used)
  • Only look at traffic from my IP to port 80 (HTTP) and port 443 (HTTPS)

This little trick helped me identify loads of external dependencies and pinpointed some ugly code that needed some care.

And while I was at it. I figured out I could do something similar with mysql queries. Instead of turning on full Query-logging in mysql (which probably means a restart of a running production mysql) I could just sniff it


tshark -i eth0 -aduration:60 -d tcp.port==3306,mysql -T fields -e mysql.query 'port 3306'

Which roughly says:

  • Listen on eth0 for 60 seconds
  • Interpret port 3306 as mysql
  • write out queries
  • Only look at traffic on port 3306

Have fun.

Other usefull options to -T fields -e
http.response.code
http.server
http.content_type
ip.src
ip.dst
tcp.port
http.user_agent

Pipestat

2009-06-28 17:51

Some days it feels like reading and analysing logs is the only thing I do at work.
And when I analyse logs it's usually something like


zcat -f *log* | grep -v something | awk '/something else/ { print $3 }' | sed 's/something/else/g' | sort | uniq -c | sort -gr > output

And when this does not fully comply to what i need to do i fall back to perl and write a ugly oneliner of the type


tail -f foo | perl -lane 's/foo(.*)/bar$1bar/g'

So when my wife and daughter got a visit from Mr Noro I had some spare time and wrote tailstat. It's a small perlprogram that does what I manualy do every day

It has a few different modes

Plain

[root@wildbill logs]# for i in foo foo bar bar foo foobar;do echo $i;done | pipestat
foo : (50.0%) (107546.3 hits/s) (3 hits / 6)
bar : (33.3%) (71697.5 hits/s) (2 hits / 6)
foobar : (16.7%) (35848.8 hits/s) (1 hits / 6)

Parsed 6 lines in 0.00 secs (46863.7 lines/s)

a single regexp


[root@wildbill logs]# for i in foo foo bar bar foo foobar;do echo $i;done | pipestat foo
foo : (66.7%) (100462.4 hits/s) (4 hits / 6)
: (33.3%) (50231.2 hits/s) (2 hits / 6)

Parsed 6 lines in 0.00 secs (25317.7 lines/s)

More real examples,


[root@wildbill logs]# cat www-access_log | pipestat 'css' 'jpe?g' 'gif' 'png'
jpe?g : (30.3%) (24590.5 hits/s) (122851 hits / 405830)
css : (11.7%) (9494.6 hits/s) (47434 hits / 405830)
gif : (6.6%) (5355.4 hits/s) (26755 hits / 405830)
png : (0.4%) (341.3 hits/s) (1705 hits / 405830)
: (51.0%) (41451.2 hits/s) (207085 hits / 405830)

Parsed 405830 lines in 5.00 secs (81230.8 lines/s)

Or if you just want field number 5…


[root@wildbill logs]# cat www-access_log | pipestat --field=9 --limit=5
200 : (88.2%) (77506.3 hits/s) (350721 hits / 397549)
HTTP/1.1" : (3.3%) (2858.5 hits/s) (12935 hits / 397549)
404 : (2.9%) (2530.8 hits/s) (11452 hits / 397549)
301 : (2.4%) (2092.8 hits/s) (9470 hits / 397549)
403 : (1.5%) (1321.3 hits/s) (5979 hits / 397549)

: (1.8%) (1545.2 hits/s) (6992(1311) hits(uniq) / 397549)

Parsed 397549 lines in 4.53 secs (87795.5 lines/s)

This takes a apache accesslogg and print out the 9th whitespace seperated field and writes out the 5 most common hits, the rest of the hits are sumed up in

But as you can see there is something in the logs that confuses the fieldseperator

So then we go into matchmodus.


[root@wildbill logs]# cat www-access_log | pipestat -m 'HTTP/1.1" (\d+)'
200 : (92.3%) (218082.1 hits/s) (375934 hits / 407426)
404 : (3.1%) (7229.9 hits/s) (12463 hits / 407426)
301 : (2.5%) (5910.1 hits/s) (10188 hits / 407426)
403 : (1.4%) (3339.1 hits/s) (5756 hits / 407426)
302 : (0.4%) (873.1 hits/s) (1505 hits / 407426)
405 : (0.0%) (37.1 hits/s) (64 hits / 407426)
206 : (0.0%) (19.1 hits/s) (33 hits / 407426)
304 : (0.0%) (4.6 hits/s) (8 hits / 407426)
: (0.4%) (855.7 hits/s) (1475 hits / 407426)

Parsed 407426 lines in 1.72 secs (236328.4 lines/s)

Or we could run it on the live log for say 60 seconds and what article is the most popular

[ay@joanie ~]$ varnishlog -i RxURL -I artid | pipestat --runtime=60 --limit 5 -m 'artid=(d+)'
542689 : (7.9%) (1.5 hits/s) (91 hits / 1154)
542739 : (7.5%) (1.4 hits/s) (86 hits / 1154)
542568 : (4.4%) (0.8 hits/s) (51 hits / 1154)
542723 : (4.1%) (0.8 hits/s) (47 hits / 1154)
542718 : (3.7%) (0.7 hits/s) (43 hits / 1154)

: (72.4%) (13.5 hits/s) (835(244) hits(uniq) / 1154)
: (0.1%) (0.0 hits/s) (1 hits / 1154)

Parsed 1154 lines in 61.80 secs (18.7 lines/s)

Have fun. The code is here

10 Years of Blogging

2008-06-18 21:41

I've been blogging since 27 august 1998. Time to celebrate with.. yet another blogpost. During these 10 years on the internet I've written 47 blogposts. That averages to 4.7 posts a year.

I'm pretty good.

Since my last post, I've changed jobs (now working at VG Multimedia http://www.vg.no as head of IT Operations) , got married (with Elisabeth) and got a daughter (Aurora, now 10 months).

The joy of the information society

I'll keep you posted :-)

Ok.We're back

2006-05-02 20:19

We are back home. And have been for a while. This is posted wirh my beloved E70

ITN 681 – Exam

2005-11-17 08:30

ITN681 TRUSTED SYSTEMS AND NETWORKS 17 Nov 2005 08:30 am 03:10

ITN673 – Exam

2005-11-14 13:30

ITN673 COMPUTER FORENSICS 14 Nov 2005 01:30 pm 03:10

ITB644 – Exam

2005-11-12 13:30

ITB644 WINDOWS ADMINISTRATION 12 Nov 2005 01:30 pm 02:30

Relocated to Australia

2005-11-03 14:17

Elisabeth and Me has relocated to South Bank, Brisbane, Queensland, Australia to Study for Masters in IT for a year. We’ve been here for quite a few months already and live in a nice apartment at the 7th floor of River Plaza together with Desmond (Singapore), Alana (Tasmania, Australia) and Kelvin (Zimbabwe) and Usually Desmonds girlfriend Monica (Hong-Kong).
We both study for our Masters Degrees at Queensland University of Technology (QUT) and will hopefully be Masters of IT june 2006

ITN245 – Exam

2005-11-02 13:30

ITN245 R/3 SYSTEM ADMINISTRATION 02 Nov 2005 01:30 pm 02:10

ITN245 – SAP Handbook

2005-09-23 21:00

This is utterly boring.