cipherdyne.org

Michael Rash, Security Researcher



Bing Indexing of gitweb.cgi Links

Bing indexing of gitweb.cgi links In June, 2011, all of the cipherdyne.org software projects were switched over to git from svn, and at the same time the web interface was switched to gitweb (along with hosting at github) from trac. Given the switch, I knew there would be a change to how search engines indexed the code/data, and one question would be whether any particular search engine would take a specific interest in the code provided via git and/or gitweb. Note that each of the fwknop, psad, fwsnort, and gpgdir projects have raw git repositories that can be cloned directly over HTTP from cipherdyne.org (a nice feature of git), or viewed with any browser through gitweb. (Personally, I like the "links2" text-based browser rendering of gitweb pages - nice and clean.)

First, here are some stats for indexing bots from major search engines across all cipherdyne.org Apache log data for hits against gitweb.cgi from June, 2011 to today:

HitsPercentageUser-Agent
50505581.01%Mozilla/5.0 (compatible; bingbot/2.0;)
502428.06%msnbot/2.0b (+http://search.msn.com/msnbot.htm)._
257074.12%Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)
65831.06%Feedfetcher-Google; (+http://www.google.com/feedfetcher.html;)
43100.69%Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
19560.31%Mozilla/5.0 (compatible; SISTRIX Crawler; http://crawler.sistrix.net/)
19050.31%Mozilla/5.0 (compatible; Purebot/1.1; +http://www.puritysearch.net/)
17510.28%Mozilla/5.0 (compatible; Yahoo! Slurp;)
16250.26%Mozilla/5.0 (compatible; MJ12bot/v1.4.0;)
14510.23%TwengaBot-Discover (http://www.twenga.fr/bot-discover.html)


Wow! So bots associated with Microsoft's Bing search engine take the top two spots for a combined hit total of well over 500,000 since June, 2011. If spread out over the entire time period (which it's not as we'll see) that would be an average of about 2,600 hits per day, and this figure is more than 20 times the third place bot. Google is in a distant forth place, even though Google used to heavily index Trac repositories.

So, let's see how the search engine hits are distributed since June, 2011. First, here is a graph of just gitweb hits by the top five crawlers: top 5 gitweb indexers Clearly, that is not a very uniform distribution from day to day. It looks like Bing has been hitting the gitweb interface at a rate of over 17,000 hits per day for a significant portion of late December and early January. The other search engines hardly even show up in the graph - you know there are big spikes when everything looks better on a logarithmic scale: top 5 gitweb indexers logarithmic With some additional work, it looks like the gitweb.cgi links that Bing is indexing are not all unique. That is, one might expect that Bing would hit a link, grab the content, and then not return to the same link for a while. Some gitweb.cgi links were hit more than 10 times and more than 100,000 links were hit more than once during this time period.

How does this compare with hits across other portions of cipherdyne.org? Bing indexing is still far and away the largest outlier: top 5 indexers of cipherdyne.org Given that 1) all of the information gitweb displays is derived from the underlying git repositories, and 2) the git repositories are directly accessible via HTTP anyway, it would seem that a better way for search engines to behave would be to just ignore gitweb altogether and pull directly from git. That would certainly cut down on the server-side resources necessary to service search engine requests. Perhaps though the general strategy of search engines is not to be too smart about such things - they probably just want access to data, and when they see a link they go after it. Either way, the kind of dedicated and repetitive indexing the Bing is doing against gitweb is a bit much, and it certainly seems as though they could implement a less intensive crawler. I'm curious if other server admins are seeing similar behavior.

Update 01/23: There are tons of web analysis tools out there, but I wrote a couple of quick scripts to generate the data in this blog post. The first "user_agent_stats.pl" parses Apache logs and produces user-agent graphs with Gnuplot as shown in this post. The second "uniq_hits.pl" is extremely simple and just counts the number of hits against the same links within the Apache log data. Both scripts accept log data via TDIN - here is an example where user agents who hit any "index.html" link are plotted (graph is not shown): $ zcat ../logs/cipherdyne.org*.gz |grep "index.html" | ./user_agent_stats.pl -p index_hits
[+] Parsing Apache log data...
[+] Total agents: 1769 (abbreviated to: 174 agents)
[+] Executing gnuplot...
Plot file: index_hits.gif
Agent stats: index_hits.agents

Software Release - fwknop-2.0

fwknop-2.0 released After a long development cycle, fwknop-2.0 has been released. This is the first production release of the fully re-written C version of fwknop, and is the culmination of an effort to provide Single Packet Authorization to multiple open source firewalls, embedded systems, mobile devices, and more. On the "server" side, supported firewalls now include iptables on Linux, ipfw on FreeBSD and Mac OS X, and pf on OpenBSD. The fwknop client is known to run on all of these platforms, and also functions on Windows systems running under Cygwin. There is also an Android client, and a good start on a iPhone client as well. On a personal note, I wish to thank Damien Stuart for a heroic effort to port most of the original perl code over to C. Also, several other people have made significant contributions including Jonathan Bennet, Max Kastanas, Sebastien Jeanquier, Ozmart, and others. If there are any issues, please get in touch with me directly or send an email to the fwknop mailing list.

Update 01/03: Both libfko library that powers much of fwknop operations and the fwknop client can be compiled as native Windows executables. In addition, there are perl and python bindings to libfko as well.

Update 01/07: Damien Stuart has built RPM files for fwknop on RHEL5, RHEL6, Fedora 15, 16, and 17 and for other architectures the Fedora koji build system can produce.

Software Release - fwknop-2.0rc5

fwknop-2.0rc5 released The 2.0rc5 candidate release of fwknop is available for download. There may be a few tweaks to the code before the official 2.0 release is made, but this is pretty close as-is. Significant development work has gone into fwknop since the 2.0rc4 release, and adds some major new functionality as well as fixing a few bugs. Here is a summary of the changes:

iPhone fwknop client: Max Kastanas has contributed an iPhone port of the fwknop client. He had already contributed on Android client, so the iPhone was the next natural step! We're looking for a maintainer of the iPhone code so that eventually it can be made available through the App Store. If you have iPhone development experience and are interested in taking this on, please contact me.

PF firewall support on OpenBSD: For quite a while now fwknop has brought Single Packet Authorization support to iptables firewalls on Linux and ipfw firewalls on FreeBSD and Mac OS X systems. The 2.0rc5 release now introduces support for the PF firewall on OpenBSD systems. By interfacing with the pfctl command, fwknop creates new rules to access a protected service (such as SSHD) to a PF anchor. Rules are deleted out of a running anchor with pfctl -a <anchor> -f - with the expired rule(s) removed. There is support in the fwknop test suite (see the test/ directory in the fwknop-2.0rc5 sources) to validate fwknop operations on OpenBSD systems, and if there are any issues please let me know.

Expiring SPA keys: With large SPA deployments where many different encryption keys - either Rijndael or GPG keys - are used to service lots of external users, key management can become somewhat of a burden. This feature allows an expiration date to be set in the access.conf file on a per-key basis. Any SPA packet received for an expired key is ignored by fwknopd. This feature was suggested by ozmart from the fwknop mailing list.

FORCE_NAT mode: For iptables firewalls, a new FORCE_NAT mode has been implemented that works as follows: for any valid SPA packet, force the requested connection to be NAT'd through to the specified (usually internal) IP and port value. This is useful if there are multiple internal systems running a service such as SSHD, and you want to give transparent access to only one internal system for each stanza in the access.conf file. This way, multiple external users can each directly access only one internal system per SPA key.

lsof launcher: The fwknop lsof launcher (extras/fwknop-launcher/fwknop-launcher-lsof.pl) is a lightweight daemon that allows the user to not have to manually run the fwknop client when attempting to gain access to a service that is protected by via fwknopd. This is accomplished by checking the output of lsof to look for pending connections in the SYN_SENT state, which (usually) indicate that a remote firewall is blocking the attempted connection. At this point, the launcher executes the fwknop client with the --get-key arg (so the user must place the key in the local filesystem) to generate an SPA packet for the attempted connection. The remote fwknopd daemon will reconfigure the firewall to allow temporary access, and this usually happens fast enough that the original connection attempt will then succeed as TCP retries to establish the connection. The idea for this was originally for a pcap-based connection watcher by Sebastien Jeanquier.

Several other changes and small fixes have been made as well. The fwknop test suite supports running all tests through the excellent valgrind project, and this enabled several memory handling issues to be found and corrected.

fwknop is released under the GPL version 2, and the complete fwknop-2.0rc5 ChangeLog can be found here via the fwknop gitweb interface.

WebKnock.org Single Packet Authorization Proxy

Vasilis Mavroudis has developed a web proxy called WebKnock.org that allows anyone to generate an fwknop SPA packet on their behalf with just a web browser. Although fwknop client portability has improved quite a bit in anticipation of the upcoming fwknop-2.0 release, it is a nice addition to the SPA world to not need the fwknop client installed at all. There are probably several platforms where the native client might not function but can run a web browser.

Using the webknock.org proxy requires that the user provide the SPA key over SSL to webknock.org, but this is a necessary step in exchange for not having to install the fwknop client. As of this writing, SPA via gpg keys is not yet supported, but there are plans to support this in the future. All requests to generate an SPA packet are protected by a captcha.

Behind the scenes, webknock.org executes the fwknop client on behalf of users, and Vasilis informed me that he's using the latest client code (written in C) instead of the older perl client. This is good since all recent development is done on the C version of fwknop in order to make it as small and lightweight as possible.

The service is free, and will hopefully be open-sourced at some point as well. If there are any issues, please either email me or open a ticket on the fwknop github interface. Here is a screenshot of the current webknock.org site: webknock.org SPA proxy

fwknop in the OpenWrt and Pentoo Linux Distributions

fwknop in OpenWrt The C version fwknop has now made it into the OpenWrt Linux distribution for embedded devices. Jonathan Bennett made this possible by contributing a Makefile for OpenWrt, and it was picked up the OpenWrt maintainers. It is good to see progress made towards the integration of Single Packet Authorization into operating systems that are designed to function as secure gateway devices between multiple networks.

So far, fwknop is available in the OpenWrt trunk packages feed, but will eventually become available via the opkg package manager too. Fortunately, OpenWrt makes everything available via git: $ git clone git://nbd.name/packages.git openwrt_packages.git
Initialized empty Git repository in /home/mbr/src/openwrt_packages.git/.git/
remote: Counting objects: 56118, done.
remote: Compressing objects: 100% (21342/21342), done.
remote: Total 56118 (delta 29694), reused 54875 (delta 29054)
Receiving objects: 100% (56118/56118), 11.85 MiB | 2.57 MiB/s, done.
Resolving deltas: 100% (29694/29694), done.
$ cd openwrt_packages.git
$ git ls-files |grep fwknop
net/fwknop/Makefile
$ git log net/fwknop/Makefile
commit 89475e5d6136833fa3b59c3d47c4f2be02718c7a
Author: florian <florian@3c298f89-4303-0410-b956-a3cf2f4a3e73>
Date: Wed Aug 17 10:13:20 2011 +0000

[package] add fwknopd

Signed-off-by; Jonathan Bennett <[email removed]>

git-svn-id: svn://svn.openwrt.org/openwrt/packages@28030 3c298f89-4303-0410-b956-a3cf2f4a3e73

In other news, both the perl and C versions of fwknop are also available in the Pentoo Linux distribution thanks to ozmart and the Pentoo maintainers. Pentoo is a live-cd distribution that is focused on security and derived from Gentoo. ozmart wrote a description of the use case for fwknop on Pentoo from a pentration testing perspective:

"...This is a useful script when combined with iptables and sshd. Configuration can accommodate pgp and replay attack checks. It allows the box to appear silent when running daemons if your box is deployed in say, a hostile environment.

It can also allow commands to be run without actually having to log into the box, say if you wanted to trigger something interesting from a remote location..."

Software Release - fwsnort-1.6

fwsnort-1.6 released The 1.6 release of fwsnort is available for download. This is a fairly significant release that adds support for the Snort fast_pattern keyword, makes enhancements to the --QUEUE and --NFQUEUE modes, supports the conntrack module for connection tracking, adds support for case-insensitive pattern matches using the --icase argument to the iptables string match extension, and several other things. The Snort fast_pattern keyword allows the rule author to influence the order in which Snort tries to match a pattern against network traffic. When multiple patterns are included in a rule, Snort usually tries to match the longest pattern first reasoning that the longest pattern is probably the least likely to trigger a match and therefore the remaining pattern searches would not have to be performed. But, there are times when the rule author would like to explicitly ask Snort to match on a particular pattern first, and the fast_pattern keyword is the mechanism that makes this possible. Because iptables matches are evaluated in order and a failing match short circuits a rule, fast_pattern support with the string match extension is possible through proper ordering in the iptables rule. Here is an example Snort rule from Emerging Threats with the fast_pattern keyword applied to the forth pattern: alert tcp $EXTERNAL_NET $HTTP_PORTS -> $HOME_NET any (msg:"ET WEB_CLIENT Possible Internet Explorer srcElement Memory Corruption Attempt"; flow:established,to_client; file_data; content:"document.createEventObject"; distance:0; nocase; content:".innerHTML"; within:100; nocase; content:"window.setInterval"; distance:0; nocase; content:"srcElement"; fast_pattern; nocase; distance:0; classtype:attempted-user; reference:url,www.microsoft.com/technet/security/bulletin/ms10-002.mspx; reference:url,tools.cisco.com/security/center/viewAlert.x?alertId=19726; reference:url,www.kb.cert.org/vuls/id/492515; reference:cve,2010-0249; reference:url,doc.emergingthreats.net/2010799; reference:url,www.emergingthreats.net/cgi-bin/cvsweb.cgi/sigs/CURRENT_EVENTS/CURRENT_MSIE; sid:2010799; rev:5;) fwsnort translates this rule as follows in iptables-save format from the /etc/fwsnort/fwsnort.save file - the original iptables commands in non-save format are also available in the /etc/fwsnort/fwsnort_iptcmd.sh script: -A FWSNORT_FORWARD_ESTAB -p tcp -m tcp --sport 80 -m string --string "srcElement" --algo bm --from 82 --icase -m string --string "document.createEventObject" --algo bm --from 64 --icase -m string --string ".innerHTML" --algo bm --to 190 --icase -m string --string "window.setInterval" --algo bm --from 74 --icase -m comment --comment "sid:2010799; msg:ET WEB_CLIENT Possible Internet Explorer srcElement Memory Corruption Attempt; classtype:attempted-user; reference:url,www.microsoft.com/technet/security/bulletin/ms10-002.mspx; rev:5; FWS:1.6;" -j LOG --log-ip-options --log-tcp-options --log-prefix "SID2010799 ESTAB " Note that the srcElement string is matched first in the iptables rule even though it is the last string in the original Snort rule. With fast_pattern support, fwsnort policies should be a bit faster at run time in the in the Linux kernel. On a final note, the iptables multiport match is also supported with the fwsnort-1.6 release, so Snort rules with lists of source or destination ports (like this: "alert tcp $HOME_NET [0:20,22:24,26:138,140:444,446:464,466:586,588:901,903:1432,1434:65535] -> any any") can be translated.

The complete fwsnort-1.6 ChangeLog can be found here via the fwsnort gitweb interface.

Switched from subversion to git

switched to git After using subversion for several years, I've switched to git for all cipherdyne.org projects. Subversion has certainly served its purpose, but it is hard to look at git and not feel a compelling draw. Further, with easy to set up web interfaces to git repositories such as gitweb and free hosting services such as github, providing a public git repository is trivial. Git itself can allow repositories to be cloned directly over HTTP without needing infrastructure like WebDAV, and here are links for the cipherdyne.org projects (github and gitweb links too):

The trac interface will remain active for a little while to see the legacy svn repositories, but the git repositories were all converted from these in order to preserve the history so trac is no longer important. If you are interested in the latest code changes in, say, fwsnort then just clone the repository and then you can make your own changes: $ git clone http://www.cipherdyne.org/git/fwsnort.git
Initialized empty Git repository in /home/mbr/tmp/git/fwsnort/.git/
$ cd fwsnort
$ git status
# On branch master
nothing to commit (working directory clean)
$ git show --summary
commit 00c4379a69975097948ed9e5ba356eeba69c0c93
Author: Michael Rash <mbr@cipherdyne.org>
Date: Mon Jun 20 21:00:57 2011 -0400

Added the --Conntrack-state argument

Added the --Conntrack-state argument to specify a conntrack state in place of
the "established" state that commonly accompanies the Snort "flow" keyword.
By default, fwsnort uses the conntrack state of "ESTABLISHED" for this. In
certain corner cases, it might be useful to use "ESTABLISHED,RELATED" instead
to apply application layer inspection to things like ICMP port unreachable
messages that are responses to real attempted communications. (Need to add
UDP tracking for the _ESTAB chains for this too - coming soon.)