Find all unique URLs in Apache log files

Posted on Tue 05 February 2013 in misc

I needed to build a list of all the unique URLs that had been requested on a website served by Apache.

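Here's what I came up with using awk and sed. It keeps only requests that returned an HTTP 2xx or 3xx status (field 9 in Apache's common and combined log formats), prints the requested path (field 7), and strips off any GET request parameters before sorting and de-duplicating.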

awk '$9 ~ /^(2|3)/ {print $7}' somelogs* | sed 's/?.*$//' | sort | uniq
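To show which fields the one-liner relies on, here is a minimal sketch that wraps the same pipeline in a hypothetical `unique_urls` function and feeds it a few made-up log lines. It assumes Apache's default combined log format, where splitting on whitespace puts the request path in `$7` and the status code in `$9`; a custom `LogFormat` would shift those positions.

```shell
#!/bin/sh
# unique_urls (hypothetical helper): read Apache log lines on stdin and print
# each distinct request path that got a 2xx or 3xx status, query string removed.
unique_urls() {
  awk '$9 ~ /^(2|3)/ {print $7}' | sed 's/?.*$//' | sort | uniq
}

# Made-up sample lines in combined log format, for illustration only.
sample_log() {
  cat <<'EOF'
203.0.113.5 - - [05/Feb/2013:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.29"
203.0.113.5 - - [05/Feb/2013:10:00:01 +0000] "GET /search?q=apache HTTP/1.1" 200 1024 "-" "curl/7.29"
203.0.113.6 - - [05/Feb/2013:10:00:02 +0000] "GET /search?q=awk HTTP/1.1" 200 980 "-" "curl/7.29"
203.0.113.7 - - [05/Feb/2013:10:00:03 +0000] "GET /old-page HTTP/1.1" 301 230 "-" "curl/7.29"
203.0.113.8 - - [05/Feb/2013:10:00:04 +0000] "GET /missing HTTP/1.1" 404 196 "-" "curl/7.29"
EOF
}

# Prints /index.html, /old-page and /search (the 404 is filtered out and
# both /search?q=... hits collapse into one /search entry).
sample_log | unique_urls
```

In a basic regular expression `?` is an ordinary character, so `s/?.*$//` deletes everything from the first question mark to the end of the line.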