Analyzing a Trac SPAM Attempt
One of the best web interfaces for visualizing Subversion repositories (as well as providing
integrated project management and ticketing functionality) is the
Trac Project, and all Cipherdyne software projects
use it. With Trac's success, it also becomes
a target for those that would try to subvert it for purposes such as creating spam. Because
I deploy Trac with the Roadmap, Tickets, and Wiki features
disabled,
my deployment essentially does not accept user-generated content (like comments in tickets
for example) for display. This minimizes my exposure to Trac spam, which has become a
problem significant enough for various spam-fighting strategies to be developed such as
configuring mod-security and
plugin from the Trac project itself.
Even though my Trac deployment does not display user-generated content, there are still places where Trac accepts query strings from users, and spammers seem to try and use these fields for their own ends. Let's see if we can find a few examples. Trac web logs are informative, but sifting through huge logfiles can be tedious. Fortunately, simply sorting the logfiles by line length (and therefore by web request length) allows many suspicious web requests to bubble up to the top. Below is a simple perl script line_len.pl that sorts any ascii text file by line length, and precedes each printed line with the line length followed by the number of lines equal to that length. That is, not all lines are printed since the script is designed to handle large files - we want the unusually long lines to be printed but the many shorter lines (which represent the vast majority of legitimate web requests) to be summarized. This is an important feature considering that at this point there is over 2.5GB of log data specifically from my Trac server.
$ cat line_len.pl
#!/usr/bin/perl -w
#
# prints out a file sorted by longest lines
#
# $Id: index.html 1741 2008-07-05 15:30:00Z mbr $
#
use strict;
my %url = ();
my %len_stats = ();
my $mlen = 0;
my $mnum = 0;
open F, "< $ARGV[0]" or die $!;
while (<F>) {
my $len = length $_;
$url{$len} = $_;
$len_stats{$len}++;
$mlen = $len if $mlen < $len;
$mnum = $len_stats{$len}
if $mnum < $len_stats{$len};
}
close F;
$mlen = length $mlen;
$mnum = length $mnum;
for my $len (sort {$b <=> $a} keys %url) {
printf "[len: %${mlen}d, tot: %${mnum}d] %s",
$len, $len_stats{$len}, $url{$len};
}
exit 0;
To illustrate how it works, below is the output of the line_len.pl script used against
itself. Note that at the top of the output the more interesting code appears whereas
the most uninteresting code (such as blank lines and lines that contain closing "}"
characters) are summarized away at the bottom:
$ ./line_len.pl line_len.pl
[len: 51, tot: 1] # $Id: index.html 1741 2008-07-05 15:30:00Z mbr $
[len: 50, tot: 1] printf "[len: %${mlen}d, tot: %${mnum}d] %s",
[len: 48, tot: 1] $len, $len_stats{$len}, $url{$len};
[len: 44, tot: 1] # prints out a file sorted by longest lines
[len: 43, tot: 1] for my $len (sort {$b <=> $a} keys %url) {
[len: 37, tot: 1] if $mnum < $len_stats{$len};
[len: 34, tot: 1] $mlen = $len if $mlen < $len;
[len: 32, tot: 1] open F, "< $ARGV[0]" or die $!;
[len: 29, tot: 1] $mnum = $len_stats{$len}
[len: 25, tot: 1] my $len = length $_;
[len: 24, tot: 1] $len_stats{$len}++;
[len: 22, tot: 2] $mnum = length $mnum;
[len: 21, tot: 1] $url{$len} = $_;
[len: 20, tot: 2] my %len_stats = ();
[len: 19, tot: 1] #!/usr/bin/perl -w
[len: 14, tot: 3] while (<F>) {
[len: 12, tot: 1] use strict;
[len: 9, tot: 1] close F;
[len: 8, tot: 1] exit 0;
[len: 2, tot: 5] }
[len: 1, tot: 6]
Now, let's execute the line_len.pl script against the trac_access_log file and look
at one of the longest web requests. (The line_len.pl script was able to reduce
the 12,000,000 web requests in my Trac logs to a total of 610 interesting lines.) This
particular request is 888 characters long, but there were some other similar suspicious
requests that had over 4,000 characters that are not displayed for brevity:
[len: 888, tot: 1] 195.250.160.37 - - [02/Mar/2008:00:30:17 \ -0500] "GET /trac/fwsnort/anydiff?new_path=%2Ffwsnort%2Ftags%2Ffwsnort\ -1.0.3%2Fsnort_rules%2Fweb-cgi.rules&old_path=%2Ffwsnort%2Ftags%2F\ fwsnort-1.0.3%2Fsnort_rules%2Fweb-cgi.rules&new_rev=http%3A%2F%2Ff\ 1234.info%2Fnew5%2Findex.html%0Ahttp%3A%2F%2Fa1234.info%2Fnew4%2F\ map.html%0Ahttp%3A%2F%2Ff1234.info%2Fnew2%2Findex.html%0Ahttp%3A%2F\ %2Fs1234.info%2Fnew9%2Findex.html%0Ahttp%3A%2F%2Ff1234.info%2Fnew6%2F\ map.html%0A&old_rev=http%3A%2F%2Ff1234.info%2Fnew5%2Findex.html%0Ahttp\ %3A%2F%2Fa1234.info%2Fnew4%2Fmap.html%0Ahttp%3A%2F%2Ff1234.info%2Fnew2\ %2Findex.html%0Ahttp%3A%2F%2Fs1234.info%2Fnew9%2Findex.html%0Ahttp%3A\ %2F%2Ff1234.info%2Fnew6%2Fmap.html%0A HTTP/1.1" 200 3683 "-" \ "User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; \ Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR \ 1.1.4322; .NET CLR 2.0.50727; InfoPath.2)"My guess is that the above request is a bot that is trying to do one two things: 1) force Trac to accept the content in the request (which contains a bunch of links to pages like "http://f1234.info/new/index.html" - note that I altered the domain so as to not legitimize the original content) and display it for other Trac users or to search engines, or 2) force Trac itself to generate web requests to the provided links (perhaps as a way to increase hit or referrer counts from domains - like mine - that are not affiliated with the spammer). Either way, the strategy is flawed because the request is against the Trac "anydiff" interface which doesn't accept user content other than svn revision numbers, and (at least in Trac-0.10.4) such requests do not cause Trac to issue any external DNS or web requests - I verified this with tcpdump on my Trac server after generating similar requests against it.
Still, in all of my Trac web logs, the most suspicious web requests are against the "anydiff" interface, and specifically against the "web-cgi.rules" file bundled within the fwsnort project. But, the requests never come from the same IP address, the "anydiff" spam attempts never hit any other link besides the web-cgi.rules page, and they started with regularity in March, 2008. This makes a stronger case for the activity coming from bot that is unable to infer that its activities are not actually working (no surprise there). Finally, I left the original IP address 195.250.160.37 of the web request above intact so that you can look for it in your own web logs. Although 195.250.160.37 is not listed in the Spamhaus DNSBL service, a rudimentary Google search indicates that 195.250.160.37 has been noticed before by other sites as a comment spammer.

Port Knocking and
its big brother, Single Packet Authorization (SPA), can provide a robust additional layer
of protection for services such as SSH, but there are many competing Port Knocking and SPA
implementations. This talk will present practical usages of fwknop in Port Knocking and SPA
modes, and discuss what works and what doesn't from a protocol perspective. Integration
points for both iptables and ipfw firewalls on Linux and FreeBSD systems will be highlighted,
and client-side support on Windows will be demonstrated. Finally, advanced functionality
such as inbound NAT support for authenticated connections, sending SPA packets over the
Tor anonymity network, and covert channel usages will be discussed. With SPA deployed,
anyone scanning for a service with Nmap cannot even tell that it is listening; let alone
target it with an exploit (zero-day or not).


As you can see, at the beginning of the graph around July 1st Google steadily shows
about 28,000 results for "cipherdyne", but towards the end of July this dips to well below 20,000 only to
rebound in August to about 30,000. Then, beginning around March 1st, 2008 the results shoot
up to over 100,000 briefly and then back to around 70,000 in May. How does one interpret this
data? It seems unlikely that these fluctuations can be entirely explained by "actual"
day-by-day changes in how external sites reference the term "cipherdyne" - there must be some
index updating component that is internal to Google at work here, and we'll see a better example
of this below.
The most obvious feature of the gpgdir graph above is the large spike to around 60,000 results
around May 1st. It turns out that an
Again, we see a dramatic spike in search results - from around 5,000 to well over 50,000 and
settling down to about 10,000 around the beginning of June. Although there has been some
activity related to SPA in the




