cipherdyne.org

Michael Rash, Security Researcher



fwknop    [Summary View]

Next »

Android Fwknop2 Client and OpenWRT

Jonathan Bennett's new Android 'Fwknop2' client is ready for prime time. It is available now in the Google Play store as well as on F-Droid, and Jonathan put together a nice video demonstration of Fwknop2 being used to access SSHD on a router running OpenWRT. Also demonstrated is the usage of NAT to transparently access SSHD running on a system behind the router. This illustrates the ability fwknopd offers for creating inbound NAT rules for external SPA clients - something that is, to my knowledge, unique to the fwknop in the world of SPA software. Finally, in an innovative twist, the Fwknop2 client can read encryption and authentication keys from a QR code via the phone's camera. This simplifies the task of getting longer keys - particularly those that are base64-encoded - to the phone for SPA operations.

New Android Single Packet Authorization Client: Fwknop2

Fwknop2 on Android After a long while without updates to the fwknop clients on Android or iOS, I'm excited to announce that Jonathan Bennett has developed an entirely new client for Android called Fwknop2. This client adds significant features such as the ability to save configurations (including both encryption and authentication keys), proper handling of base64 encoded keys, and support for manually specified IP addresses to be allowed through the remote firewall. Further, Fwknop2 integrates with JuiceSSH so that an SSH connection can seamlessly be launched right after the SPA packet is sent. Finally, in an interesting twist, Fwknop2 will be able to read encryption and HMAC keys via a QR code via the camera. The QR code itself is generated from key material in --key-gen mode (which is currently available in the standard fwknop client and will be available in the fwknopd server in the next release).

Fwknop2 will be available on both the Google Play store and via F-Droid within the next few hours, and in the meantime the APK is available on github here.

Below are a couple of screenshots of the new Android client in action - these are from an Android VM running under Parallels on Mac OS X (Yosemite): fwknop Android app fwknop Android app

NAT and Single Packet Authorization

People who use Single Packet Authorization (SPA) or its security-challenged cousin Port Knocking (PK) usually access SSHD running on the same system where the SPA/PK software is deployed. That is, a firewall running on a host has a default-drop policy against all incoming SSH connections so that SSHD cannot be scanned, but a SPA daemon reconfigures the firewall to temporarily grant access to a passively authenticated SPA client: SPA and basic SSH access This works well enough, but both port knocking and SPA work in conjunction with a firewall, and "important" firewalls are usually gateways between networks as opposed to just being deployed on standalone hosts. Further, Network Address Translation (NAT) is commonly used on such firewalls (at least for IPv4 communications) to provide Internet access to internal networks that are on RFC 1918 address space, and also to allow external hosts access to services hosted on internal systems.

The prevalence of NAT suggests that SPA should integrate strongly with it. Properly done, this would extend the notion of SPA beyond just supporting the basic feature of granting access to a local service. To drive this point home, and to see what a NAT-enabled vision for SPA would look like, consider the following two scenarios:

  1. A user out on the Internet wants to leverage SPA to access an SSH daemon that is running on a system behind a firewall. One option is to just use SPA to access SSHD on the firewall first, and then initiate a new SSH connection to the desired internal host from the firewall itself. However, this is clunky and unnecessary. The SPA system should just integrate with the NAT features of the firewall to translate a SPA-authenticated incoming SSH connection through to the internal host and bypass the firewall SSH daemon altogether: SPA and DNAT SSH access to internal host

  2. A local user population is behind a firewall that is configured to block all access by default from the internal network out to the Internet. Any user can acquire an IP on the local network via DHCP, but gaining access to the Internet is only possible after authenticating to the SPA daemon running on the firewall. So, instead of gaining access to a specific service on a single IP via SPA, the local users leverage SPA to gain access to every service on every external IP. In effect, the firewall in this configuration does not trust the local user population, and Internet access is only granted for users that can authenticate via SPA. I.e., only those users who have valid SPA encryption and HMAC keys can access the Internet: SPA and SNAT to external hosts
(A quick note on the network diagrams above - each red line represents a connection that is only possible to establish after a proper SPA packet is sent.)

Both of the above scenarios are supported with fwknop-2.6.6, which has been released today. So far, to my knowledge, fwknop is the only SPA/PK implementation with any built-in NAT support. The first scenario of gaining access to an internal service through the firewall has been supported by fwknop for a long time, but the second is new for 2.6.6. The idea for using SPA to gain access to the Internet instead of just for a specific service was proposed by "spartan1833" to the fwknop mailing list, and is a powerful extension of SPA into the world of NAT - in this case, SNAT in iptables parlance.

Before diving into the technical specifics, below is a video demonstration of the NAT capabilities of fwknop being used within Amazon's AWS cloud. This shows fwknop using NAT to turn a VPC host into a new gateway within Amazon's own border controls for the purposes of accessing SSH and RDP running on other internal VPC hosts. So, this illustrates the first scenario above. In addition, the video shows the usage of SPA ghost services to NAT both SSH and RDP connections through port 80 instead of their respective default ports. This shows yet another twist that strong NAT integration makes possible in the SPA world.



Now, let's see a practical example. This is for the second scenario above where a system with the fwknop client installed is on a network behind a default-drop firewall running the fwknop daemon, and the new SNAT capabilities are used to grant access to the Internet.

First, we fire up fwknopd on the firewall after showing the access.conf and fwknopd.conf files (note that some lines have been abbreviated for space):
[firewall]# cat /etc/fwknop/access.conf
SOURCE                      ANY
KEY_BASE64                  wzNP62oPPgEc+k...XDPQLHPuNbYUTPP+QrErNDmg=
HMAC_KEY_BASE64             d6F/uWTZmjqYor...eT7K0G5B2W9CDn6pAqqg6Oek=
FW_ACCESS_TIMEOUT           1000
FORCE_SNAT                  123.1.2.3
DISABLE_DNAT                Y
FORWARD_ALL                 Y

[firewall]# cat /etc/fwknop/fwknopd.conf
ENABLE_IPT_FORWARDING       Y;
ENABLE_IPT_SNAT             Y;

[firewall]# fwknopd -i eth0 -f
Starting fwknopd
Added jump rule from chain: FORWARD to chain: FWKNOP_FORWARD
Added jump rule from chain: POSTROUTING to chain: FWKNOP_POSTROUTING
iptables 'comment' match is available
Sniffing interface: eth3
PCAP filter is: 'udp port 62201'
Starting fwknopd main event loop.
With the fwknopd daemon up and running on the firewall/SPA gateway system, we now run the client to gain access to the Internet after showing the "[internet]" stanza in the ~/.fwknoprc file:
[spaclient]$ cat ~/.fwknoprc
[internet]
KEY_BASE64                  wzNP62oPPgEc+k...XDPQLHPuNbYUTPP+QrErNDmg=
HMAC_KEY_BASE64             d6F/uWTZmjqYor...eT7K0G5B2W9CDn6pAqqg6Oek=
ACCESS                      tcp/22  ### ignored by FORWARD_ALL
SPA_SERVER                  192.168.10.1
USE_HMAC                    Y
ALLOW_IP                    source

[spaclient]$ nc -v google.com 80
### nothing comes back here because the SPA packet hasn't been sent yet
### and therefore everything is blocked (except for DNS in this case)
[spaclient]$ fwknop -n internet
[spaclient]$ nc -v google.com 80
Connection to google.com 80 port [tcp/http] succeeded!
Back on the server, below are the rules that are added to grant Internet access to the spaclient system. Note that all ports/protocols are accepted for forwarding through the firewall, and an SNAT rule has been applied to the spaclient source IP of 192.168.10.55:
(stanza #1) SPA Packet from IP: 192.168.10.55 received with access source match
Added FORWARD ALL rule to FWKNOP_FORWARD for 192.168.10.55 -> 0.0.0.0/0 */*, expires at 1429387561
Added SNAT ALL rule to FWKNOP_POSTROUTING for 192.168.10.55 -> 0.0.0.0/0 */*, expires at 1429387561
Conclusion
Most users think of port knocking and Single Packet Authorization as a means to passively gain access to a service like SSHD running on the same system as the PK/SPA software itself. This notion can be greatly extended through strong integration with NAT features in a firewall. If the SPA daemon integrates with SNAT and DNAT (in the iptables world), then both internal services can be accessed from outside the firewall, and Internet access can be gated by SPA too. The latest release of fwknop supports both of these scenarios, and please email me or send me a message on Twitter @michaelrash if you have any questions or comments.

Single Packet Authorization Threat Modeling

fwknop Threat Modeling Last week there was an interesting question posed by Peter Smith to the fwknop mailing list that focused on running fwknop in UDP listener mode vs. using libpcap. I thought it would be useful to turn this into a blog post, so here is Peter's original question along with my response:

"...I want to deploy fwknop on my server, but I'm not sure If I should use the UDP listener mode or libpcap. At first UDP listener mode seems to be the choice, because I don't have to compile libpcap. However, I then have to open a port in the firewall. Thinking about this, I get the feeling that I'm defeating the purpose of using SPA, by allowing Internet access to a privileged processe.

If an exploitable security issue is found, even though fwknop remains passive and undiscoverable, an attacker could blindly send his exploit to random ports on servers he suspects running fwknopd, and after maximum 65535 tries he would have root access. I'm not a programmer, so I can't review the code of fwknop or SSH daemon, but if both is equally likely of having security issues, I might as well just allow direct access to the SSH daemon and skip using SPA.

Is my point correct?..."


First, let me acknowledge your point as an important issue to bring up. If someone finds a vulnerability in fwknopd, it doesn't matter whether fwknopd is collecting packets via UDP sockets or from libpcap (assuming we're talking about a vulnerability in fwknopd itself). The mere processing of network traffic is the problem.

So, why run fwknopd at all?

This is something of a long-winded answer, but I want to be thorough. I'll likely not settle the issue with this response, but it's good to start the discussion.

Starting with your example above with SSHD open to the world, suppose there is a vulnerability in SSHD. What does the exploit model look like? Well, an attacker armed with an exploit can trivially find the SSH daemon with any scanner, and from there a single TCP connection is sufficient for exploitation. I would argue that a primary enabling factor benefiting the attacker in this scenario is that vulnerable targets that listen on TCP sockets are so easy to find. The combination of scannable services + zero-day exploits is just too nice for an attacker. Several years ago I personally had a system running SSHD compromised because of this. (At the time, it was my fault for not patching SSHD before I put it out on the Internet, but still - the system was compromised within about 12 hours of being online.)

Now, in the fwknopd world, assuming a vulnerability, what would exploitation look like? Well, targets aren't scannable as you say, so an attacker would have to guess that a target is running fwknopd and the corresponding port on which it watches/listens for SPA packets [1]. Forcing an attacker to send thousands of packets to different ports (assuming a custom non-default port instead of UDP port 62201 that fwknopd normally watches) is likely a security benefit in contrast to an obviously targetable service. That is, sending tons of SPA exploit packets to different ports is a common pattern that is quite noisy and is frequently flagged by IDS's, firewalls, SIEM's, and flow analysis engines. How many systems could an attacker target with such heavyweight scans before the attacker's own ISP would notice? Or before the attacker's IP is included within one of the Emerging Threats compromised hosts lists? Or within DShield as a known bad actor? 10 million? How many of these are actually running fwknopd? I know there are some spectacular scanning results out there, so it's really hard to quantify this, but either way there is a big difference between sending thousands of > 100-byte UDP packets to each target vs. a port sweep for TCP/22.

Further, when a system is literally blocking every incoming packet [2], an attacker can't even necessarily be sure there is any target connected to the network at all. Many attackers would likely move on fairly quickly from a system that is simply returning zero data to some other system.

In contrast, for a service that advertises itself via TCP, an attacker immediately knows - with 100% certainty - there is a userspace daemon with a set of functions that they can interact with. This presents a risk. In my view, this risk is greater than the risk of running fwknopd where an attacker gets no data.

Another fair question to ask is about the architecture of fwknopd. When an HMAC is used (this will be the default in a future release), fwknopd reads SPA packet data, computes an HMAC, and does nothing else if the HMAC does not match. This is an effort to try to stay faithful to a simplistic design, and to minimize the functions an attacker can interact with - even for packets that are blindly sent to the correct port where fwknopd is watching. Thus, most functions in fwknopd, including those that parse user-supplied data and hence where bugs are more likely to crop up, are not even accessible without first passing the HMAC which is a cryptographically strong test. When libpcap is also eliminated through the use of UDP, not even libpcap functions are in the mix [3]. In other words, the security of fwknop does not rely on not being discoverable or scannable - it is merely a nice side effect of not using TCP to gather data from the network.

To me, another way to think of fwknopd in UDP mode is that it provides a lightweight generic UDP authenticator for TCP-based services. Suppose that SSHD had a design built-in where it would bind to a UDP socket, wait for a similar authentication packet as SPA provides, and then switch over to TCP. In this case, there would not be a need for SPA because you would get the best of both worlds - non-scannability [4] plus everything that TCP provides for reliable data delivery at the same time. SPA in UDP listening mode is an effort to bridge these worlds. Further, there is nothing that ties fwknopd to SSH. I know people who expose RDP to the Internet for example. Personally, I'm quite confident there are fewer vulnerabilities in fwknopd than something like RDP. Even if there isn't a benefit in terms of the arguments above in terms of service concealment and forcing an attacker to be noisy, my bet is that fwknopd - even if it advertised itself via TCP - would provide better security than RDP or other services derived from massive code bases.

Now, beyond all of the above, there are some additional blog posts and other material that may interest some readers:


[1] If an attacker is in the position to watch legitimate SPA packets from an existing client then this guesswork can be largely eliminated. However, even in this case, if fwknopd is sniffing via libpcap (so not using the UDP only mode), there is no requirement for fwknopd to be running on the system where access is actually granted. It only needs to be able to sniff packets somewhere on the routing path of the destination IP chosen by the client. I mention this to illustrate that it may not be obvious to an attacker where a targetable fwknopd instance runs even when SPA packets can be monitored. Also, there aren't too many attackers who are in this position. At least, the number of attackers in this position is _far_ lower than those people (everyone) who are in a position to discover a vulnerable service from any system worldwide with a single TCP connection.

[2] As you point out, in UDP-only mode, the firewall must allow incoming packets to the UDP port where fwknopd is listening. But, given that fwknopd never sends anything back over this port, it remains indistinguishable to every other filtered port.

[3] There are more things built into the development process that may be worth noting too that heighten security such as the use of Coverity, AFL fuzzing, valgrind, etc., but these probably takes us too far afield from the topic at hand. Also, there are some roadmap items that have not been implemented yet (such as privilege separation) that will make the architecture even stronger.

[4] Assuming a strong firewall stance against incoming UDP packets to other ports so one cannot infer service existence because UDP/22 doesn't respond to a scan but other ports respond with an ICMP port unreachable, etc.

RAM Disks and Saving Your SSD From AFL Fuzzing

The American Fuzzy Lop fuzzer has become a critical tool for finding security vulnerabilities in all sorts of software. It has the ability to send fuzzing data through programs on the order of hundreds of millions of executions per day on a capable system, and can certainly put strain on your hardware and OS. If you are fuzzing a target program with the AFL mode where a file is written and the target binary reads from this file, then AFL is going to conduct a huge number of writes to the local disk. For a solid-state drive this can reduce its life expectancy because of write amplification.

My main workstation is a Mac running OS X Yosemite, and I run a lot of Linux, FreeBSD, and OpenBSD virtual machines under Parallels for development purposes. The drive on this system is an SSD which keeps everything quite fast, but I don't want to prematurely shorten its life through huge AFL fuzzing runs. Normally, I'm using AFL to fuzz fwknop from an Ubuntu-14.10 VM, and what is needed is a way to keep disk writes down. The solution is to use a RAM disk from within the VM.

First, from the Ubuntu VM, let's get set up for AFL fuzzing and show what the disk writes look like without using a RAM disk from the perspective of the host OS X system. This assumes AFL 0.89 has been installed already and is in the current path:
$ git clone https://github.com/mrash/fwknop.git fwknop.git
$ cd fwknop.git
$ ./autogen.sh
$ cd test/afl/
$ ./compile/afl-compile.sh
We're not running AFL yet. Now, from the Mac, launch the Activity Monitor (under Applications > Utilities) and look at current disk utilization: AFL not running disk writes So, not terrible - currently 31 writes per second at the time the snapshot was taken, and that includes OS X itself and the VM at the same time. But, now let's fire up AFL using the digest cache wrapper on the Ubuntu VM (the main AFL text UI is not shown for brevity):
$ ./fuzzing-wrappers/server-digest-cache.sh
[+] All right - fork server is up.
[+] All test cases processed.

[+] Here are some useful stats:

    Test case count : 1 favored, 0 variable, 1 total
           Bitmap range : 727 to 727 bits (average: 727.00 bits)
                   Exec timing : 477 to 477 us (average: 477 us)

[+] All set and ready to roll!
And now let's take a look at disk writes again from OS X: AFL no RAM disk writes Whoa, that's a massive difference - nearly two orders of magnitude. AFL has caused disk writes to spike to over 2,700 per second with total data written averaging at 19.5MB/sec. Long term fuzzing at this level of disk writes would clearly present a problem for the SSD - AFL frequently needs to be left running for days on end in order to be thorough. So, let's switch everything over to use a RAM disk on the Ubuntu VM instead and see if that reduces disk writes:
# mkdir /tmp/afl-ramdisk && chmod 777 /tmp/afl-ramdisk
# mount -t tmpfs -o size=512M tmpfs /tmp/afl-ramdisk
$ mv fwknop.git /tmp/afl-ramdisk
$ cd /tmp/afl-ramdisk/fwknop.git/test/afl/
$ ./fuzzing-wrappers/server-digest-cache.sh
Here is disk utilization once again from the Mac: AFL RAM disk writes We're back to less than 10 writes per second to the SSD even though AFL is going strong on the Ubuntu VM (not shown). The writes for the previous fuzzing run are still shown to the left of the graph (since they haven't quite aged out yet when the screenshot was taken), and new writes are so low they don't even make it above the X-axis at this scale. Although the total execs per second - about 2,000 - achieved by AFL is not appreciably faster under the RAM disk, the main benefit is that my SSD will last a lot longer. For those that don't run AFL underneath a VM, a similar strategy should still apply on the main OS. Assuming enough RAM is available for whatever software you want to fuzz, just create a RAM disk and run everything from it and extend the life of your hard drive in the process.

Integrating fwknop with the 'American Fuzzy Lop' Fuzzer

Over the past few months, the American Fuzzy Lop (AFL) fuzzer written by Michal Zalewski has become a tour de force in the security field. Many interesting bugs have been found, including a late breaking bug in the venerable cpio utility that Michal announced to the full-disclosure list. It is clear that AFL is succeeding perhaps where other fuzzers have either failed or not been applied, and this is an important development for anyone trying to maintain a secure code base. That is, the ease of use coupled with powerful fuzzing strategies offered by AFL mean that open source projects should integrate it directly into their testing systems. This has been done for the fwknop project with some basic scripting and one patch to the fwknop code base. The patch was necessary because according to AFL documentation, projects that leverage things like encryption and cryptographic signatures are not well suited to brute force fuzzing coverage, and fwknop definitely fits into this category. Crypto being a fuzzing barrier is not unique to AFL - other fuzzers have the same problem. So, similarly to the libpng-nocrc.patch included in the AFL sources, encryption, digests, and base64 encoding are bypassed when fwknop is compiled with --enable-afl-fuzzing on the configure command line. One doesn't need to apply the patch manually - it is built directly into fwknop as of the 2.6.4 release. This is in support of a major goal for the fwknop project which is comprehensive testing.

If you maintain an open source project that involves crypto and does any sort of encoding/decoding/parsing gated by crypto operations, then you should patch your project so that it natively supports fuzzing with AFL through an optional compile time step. As an example, here is a portion of the patch to fwknop that disables base64 encoding by just copying data manually:
diff --git a/lib/base64.c b/lib/base64.c
index b8423c2..5b51121 100644
--- a/lib/base64.c
+++ b/lib/base64.c
@@ -36,6 +36,7 @@
 int
 b64_decode(const char *in, unsigned char *out)
 {
-    int i, v;
+    int i;
     unsigned char *dst = out;
-
+#if ! AFL_FUZZING
+    int v;
+#endif
+
+#if AFL_FUZZING
+    /* short circuit base64 decoding in AFL fuzzing mode - just copy
+     * data as-is.
+    */
+    for (i = 0; in[i]; i++)
+        *dst++ = in[i];
+#else
     v = 0;
     for (i = 0; in[i] && in[i] != '='; i++) {
         unsigned int index= in[i]-43;
@@ -68,6 +80,7 @@ b64_decode(const char *in, unsigned char *out)
         if (i & 3)
             *dst++ = v >> (6 - 2 * (i & 3));
     }
+#endif

     *dst = '\0';
Fortunately, so far all fuzzing runs with AFL have turned up zero issues (crashes or hangs) with fwknop, but the testing continues.

Within fwknop, a series of wrapper scripts are used to fuzz the following four areas with AFL. These areas represent the major places within fwknop where data is consumed either via a configuration file or from over the network in the form of an SPA packet:

  1. SPA packet encoding/decoding: spa-pkts.sh
  2. Server access.conf parsing: server-access.sh
  3. Server fwknopd.conf parsing: server-conf.sh
  4. Client fwknoprc file parsing: client-rc.sh
Each wrapper script makes sure fwknop is compiled with AFL support, executes afl-fuzz with the corresponding test cases necessary for good coverage, archives previous fuzzing output data, and supports resuming of previously stopped fuzzing jobs. Here is an example where the SPA packet encoding/decoding routines are fuzzed with AFL after fwknop is compiled with AFL support:
$ cd fwknop.git/test/afl/
$ ./compile/afl-compile.sh
$ ./fuzzing-wrappers/spa-pkts.sh
+ ./fuzzing-wrappers/helpers/fwknopd-stdin-test.sh
+ SPA_PKT=1716411011200157:root:1397329899:2.0.1:1:127.0.0.2,tcp/22:AAAAA
+ LD_LIBRARY_PATH=../../lib/.libs ../../server/.libs/fwknopd -c ../conf/default_fwknopd.conf -a ../conf/default_access.conf -A -f -t
Warning: REQUIRE_SOURCE_ADDRESS not enabled for access stanza source: 'ANY'+ echo -n 1716411011200157:root:1397329899:2.0.1:1:127.0.0.2,tcp/22:AAAAA

SPA Field Values:
=================
   Random Value: 1716411011200157
       Username: root
      Timestamp: 1397329899
    FKO Version: 2.0.1
   Message Type: 1 (Access msg)
 Message String: 127.0.0.2,tcp/22
     Nat Access: <NULL>
    Server Auth: <NULL>
 Client Timeout: 0
    Digest Type: 3 (SHA256)
      HMAC Type: 0 (None)
Encryption Type: 1 (Rijndael)
Encryption Mode: 2 (CBC)
   Encoded Data: 1716411011200157:root:1397329899:2.0.1:1:127.0.0.2,tcp/22
SPA Data Digest: AAAAA
           HMAC: <NULL>
 Final SPA Data: 200157:root:1397329899:2.0.1:1:127.0.0.2,tcp/22:AAAAA

SPA packet decode: Success
+ exit 0
+ LD_LIBRARY_PATH=../../lib/.libs afl-fuzz -T fwknopd,SPA,encode/decode,00ffe19 -t 1000 -i test-cases/spa-pkts -o fuzzing-output/spa-pkts.out ../../server/.libs/fwknopd -c ../conf/default_fwknopd.conf -a ../conf/default_access.conf -A -f -t
afl-fuzz 0.66b (Nov 23 2014 22:29:48) by <lcamtuf@google.com>
[+] You have 1 CPU cores and 2 runnable tasks (utilization: 200%).
[*] Checking core_pattern...
[*] Setting up output directories...
[+] Output directory exists but deemed OK to reuse.
[*] Deleting old session data...
[+] Output dir cleanup successful.
[*] Scanning 'test-cases/spa-pkts'...
[*] Creating hard links for all input files...
[*] Validating target binary...
[*] Attempting dry run with 'id:000000,orig:spa.start'...
[*] Spinning up the fork server...
[+] All right - fork server is up.
And now AFL is up and running (note the git --abbrev-commit tag integrated into the text banner to make clear which code is being fuzzed): AFL SPA packets fuzzing If the fuzzing run is stopped by hitting Ctrl-C, it can always be resumed as follows:
$ ./fuzzing-wrappers/spa-pkts.sh resume
Although major areas in fwknop where data is consumed are effectively fuzzed with AFL, there is always room for improvement. With the wrapper scripts in place, it is easy to add new support for fuzzing other functionality in fwknop. For example, the digest cache file (used by fwknopd to track SHA-256 digests of previously seen SPA packets) is a good candidate.

UPDATE (11/30/2014): The suggestion above about fuzzing the digest cache file proved to be fruitful, and AFL discovered a bounds checking bug that has been fixed in this commit. The next release of fwknop (2.6.5) will contain this fix and will be made soon.

Taking things to the next level, another powerful technique that would make an interesting side project would be to build a syntax-aware version of AFL that handles SPA packets and/or configuration files natively. The AFL documentation hints at this type of modification, and states that this could be done by altering the fuzz_one() routine in afl-fuzz.c. There is already a fuzzer written in python for the fwknop project that is syntax-aware of the SPA data format (see: spa_fuzzing.py), but the mutators within AFL are undoubtedly much better than in spa_fuzzing.py. Hence, modifying AFL directly would be an effective strategy.

Please feel free to open an issue against fwknop in github if you have a suggestion for enhanced integration with AFL. Most importantly, if you find a bug please let me know. Happy fuzzing!

Software Release: fwknop-2.6.4

fwknop-2.6.4 software release The 2.6.4 release of fwknop is available for download. New functionality has been developed for 2.6.4, including a new UDP listener mode to remove libpcap as a dependency for fwknopd, support for firewalld on recent versions of Fedora, RHEL, and Centos (contributed by Gerry Reno), and support for Michal Zalewski's 'American Fuzzy Lop' fuzzer. Further, on systems where execvpe() is available, all system() and popen() calls have been replaced so that the shell is not invoked and no environment is used. As usual, fwknop has a Coverity Scan score of zero, and the code coverage report achieved by the 2.6.4 test suite is available here.

Here is the complete ChangeLog for fwknop-2.6.4:

  • [server] Added a UDP server mode so that SPA packets can be acquired via UDP directly without having to use libpcap. This is an optional feature since it opens a UDP port (and therefore requires the local firewall be opened for communications to this port), but fwknopd is careful to never send anything back to a client that sends data to this port. So, from the perspective of an attacker or scanner, fwknopd remains invisible. This feature is enabled in fwknopd either with a new command line argument --udp-server or in the fwknopd.conf file with the ENABLE_UDP_SERVER variable. When deployed in this mode, it is advisable to recompile fwknop beforehand with './configure --enable-udp-server' so that fwknopd does not link against libpcap.
  • [server] Replaced all popen() and system() calls with execvpe() with no usage of the environment. This is a defensive measure to not make use of the shell for firewall command execution, and is supported on systems where execvpe() is available.
  • (Gerry Reno) Added support for firewalld to the fwknopd daemon on RHEL 7 and CentOS 7. This is implemented using the current firewalld '--direct --passthrough' capability which accepts raw iptables commands. More information on firewalld can be found here: https://fedoraproject.org/wiki/FirewallD
  • [server] Added support for the 'American Fuzzy Lop' (AFL) fuzzer from Michal Zalewski. This requires that fwknop is compiled with the '--enable-afl-fuzzing' argument to the configure script as this allows encryption/digest short circuiting in a manner necessary for AFL to function properly. The benefit of this strategy is that AFL can fuzz the SPA packet decoding routines implemented by libfko. See the test/afl/ directory for some automation around AFL fuzzing.
  • (Bill Stubbs) submitted a patch to fix a bug where fwknopd could not handle Ethernet frames that include the Frame Check Sequence (FCS) header. This header is four bytes long, and is placed at the end of each Ethernet frame. Normally the FCS header is not visible to libpcap, but some card/driver combinations result in it being included. Bill noticed this on the following platform: BeagleBone Black rev C running 3.8.13-bone50 #1 SMP Tue May 13 13:24:52 UTC 2014 armv7l GNU/Linux
  • [client] Bug fix to ensure that a User-Agent string can be specified when the fwknop client uses wget via SSL to resolve the external IP address. This closes issue #134 on github reported by Barry Allard. The fwknop client now uses the wget '-U' option to specify the User-Agent string with a default of "Fwknop/<version>". In addition, a new command line argument "--use-wget-user-agent" to allow the default wget User-Agent string to apply instead.
  • [python module] When an HMAC key is passed to spa_data_final() then default to HMAC SHA256 if no HMAC mode was specified.

Code Coverage: Challenges For Open Source Projects

In this blog post I'll pose a few challenges to open source projects. These challenges, among other goals, are designed to push automated test suites to maximize code coverage. For motivation, I'll present code coverage stats from the OpenSSL and OpenSSH test suites, and contrast this with what I'm trying to achieve in the fwknop project.

Stated bluntly, if comprehensive code coverage is not achieved by your test suite, you're doing it wrong.

With major bugs like Heartbleed and "goto fail", there is clearly a renewed need for better automated testing. As Mike Bland demonstrated, the problem isn't so much about technology since unit tests can be built for both Heartbleed and "goto fail". Rather, the problem is that the whole software engineering life cycle needs to embrace better testing.

1) Publish your code coverage stats

Just as open source projects publish source code, test suite code coverage results should be published too. The mere act of publishing code coverage - just as open source itself - is a way to engage developers and users alike to improve such results. Clearly displayed should be all branches, lines, and functions that are exercised by the test suite, and these results should be published for every software release. Without such stats, how can users have confidence that the project test suite is comprehensive? Bugs still crop up even with good code coverage, but many many bugs will be squashed during the development of a test suite that strives to exercise every line of source code.

For a project written in C and compiled with gcc, the lcov tool provides a nice front-end for gcov coverage data. lcov was used to generate the following code coverage reports on OpenSSL-1.0.1h, OpenSSH-6.6p1, and fwknop-2.6.3 using the respective test suite in each project:

Hit Total Coverage
OpenSSL-1.0.1h Lines: 44604 96783 46.1 %
Full code coverage report Functions: 3518 6574 53.5 %
Branches: 22653 67181 33.7 %

OpenSSH-6.6p1 Lines: 18241 33466 54.5 %
Full code coverage report Functions: 1217 1752 69.5 %
Branches: 9362 22674 41.3 %

fwknop-2.6.3 Lines: 6971 7748 90.0 %
Full code coverage report Functions: 376 377 99.7 %
Branches: 4176 5239 79.7 %


The numbers speak for themselves. To be fair, fwknop has a much smaller code base than OpenSSL or OpenSSH - less than 1/10th the size of OpenSSL and about 1/5th the size of OpenSSH in terms of lines reported by gcov. So, presumaby it's a lot easier to reach higher levels of code coverage in fwknop than the other two projects. Nevertheless, it is shocking that the line coverage in the OpenSSL test suite is below 50%, and not much better in OpenSSH. What are the odds that the other half of the code is bug free? What are the odds that changes in newer versions won't break assumptions made in the untested code? What are the odds that one of the next security vulnerabilities announced in either project stems from this code? Of course, test suites are not a panacea, but there are almost certainly bugs lying in wait within the untested code. It is easier to have confidence in code that has at least some test suite coverage than code that has zero coverage - at least as far as the test suite is concerned (I'm not saying that other tools are not being used). Both OpenSSL and OpenSSH use tools beyond their respective test suites to try and maintain code quality - OpenSSL uses Coverity for example - but these tools are not integrated with test suite results and do not contribute to the code coverage stats above. What I'm suggesting is that the test suites themselves should get much closer to 100% coverage, and this may require the integration of infrastructure like a custom fuzzer or fault injection library (more on this below).

On another note, given that an explicit design goal of the fwknop test suite is to maximize code coverage, and given the results above, it is clear there is significant work left to do.

2) Make it easy to automatically produce code coverage results

Neither OpenSSL nor OpenSSH make it easy to automatically generate code coverage stats like those shown above. One can always Google for the appropriate CFLAGS and LDFLAGS settings, recompile, and run lcov, but you shouldn't have to. This should be automatic and built into the test suite as an option. If your project is using autoconf, then there should be a top level --enable-code-coverage switch (or similar) to the configure script, and the test suite should take the next steps to produce the code coverage reports. Without this, there is unnecessary complexity and manual work, and this affects users and developers alike. My guess is this lack of automation is a factor for why code coverage for OpenSSL and OpenSSH is not better. Of course, it takes a lot of effort to develop a test suite with comprehensive code coverage support, but automation is low hanging fruit.

If you want to generate the code coverage reports above, here are two trivial scripts - one for OpenSSH and another for OpenSSL. This one works for OpenSSH:
#!/bin/sh
#
# Basic script to compile OpenSSH with code coverage support via gcc
# gcov and build HTML reports with lcov.
#

LCOV_DIR=lcov-results
LCOV_FILE=coverage.info
LCOV_FILTERED=coverage_final.info
PREFIX=~/install/openssh  ### tmp path

mkdir $LCOV_DIR

make clean
./configure --with-cflags="-fprofile-arcs -ftest-coverage" --with-ldflags="-fprofile-arcs -lgcov" --prefix=$PREFIX
make
make tests

### build coverage info and filter /usr/include/ files
lcov --capture --directory . --output-file $LCOV_FILE
lcov -r $LCOV_FILE /usr/include/\* --output-file $LCOV_DIR/$LCOV_FILTERED

### create the HTML report
genhtml $LCOV_DIR/$LCOV_FILTERED --output-directory $LCOV_DIR

exit

3) Integrate a fuzzer into your test suite

If your test suite does not achieve 99%/95% function/line coverage, architectural changes should be made to reach these goals and beyond. This will likely require that test suite drive a fuzzer against your project and measure how well it exercises the code base.

Looking at code coverage results for older versions of the fwknop project was an eye opener. Although the test suite had hundreds of tests, there were large sections of code that were not exercised. It was for this reason the 2.6.3 release concentrated on more comprehensive automated test coverage. However, achieving better coverage was not a simple matter of executing fwknop components with different configuration files or command line arguments - it required the development of a dedicated SPA packet fuzzer along with a special macro -DENABLE_FUZZING built into the source code to allow the fuzzer to reach portions of code that would have otherwise been more difficult to trigger due to encryption and authentication requirements. This is a similar to the strategy proposed in Michal Zalewski's fuzzer American Fuzzy Lop here (see the "Known Limitations" section).

The main point is that fwknop was changed to support fuzzing driven by the test suite as a way to extend code coverage. It is the strong integration of fuzzing into the test suite that provides a powerful testing technique, and looking at code coverage results allows you to measure it.

Incidentally, an example double free() bug that the fwknop packet fuzzer triggered in conjunction with the test suite can be found here (fixed in 2.6.3).

4) Further extend your code coverage with fault injection

Any C project leveraging libc functions should implement error checking against function return values. The canonical example is checking to see whether malloc() returned NULL, and if so this is usually treated as an unrecoverable error like so:
char *buf = NULL;
int size = 100;

if ((buf = malloc(size)) == NULL) {
    clean_up();
    exit(EXIT_FAILURE);
}
Some projects elect to write a "safe_malloc()" wrapper for malloc() or other libc functions so that error handling can be done in one place, but it is not feasible to do this for every libc function. So, how to verify whether error conditions are properly handled at run time? For malloc(), NULL is typically returned under extremely high memory pressure, so it is hard to trigger this condition and still have a functioning system let alone a functioning test suite. In other words, in the example above, how can the test suite achieve code coverage for the clean_up() function? Other examples include filesystem or network function errors that are returned when disks fill up, or a network communication is blocked, etc.

What's needed is a mechanism for triggering libc faults artificially, without requiring the underlying conditions to actually exist that would normally cause such faults. This is where a fault injection library like libfiu comes in. Not only does it support fault injection at run time against libc functions without the need to link against libfiu (a dedicated binary "fiu-run" takes care of this), but it can also be used to trigger faults in arbitrary non-libc functions within a project to see how function callers handle errors. In fwknop, both strategies are used by the test suite, and this turned up a number of bugs like this one.

Full disclosure: libfiu does not yet support code coverage when executing a binary under fiu-run because there are problems interacting with libc functions necessary to write out the various source_file.c.gcno and source_file.c.gcda coverage files. This issue is being worked on for an upcoming release of libfiu. So, in the context of fwknop, libfiu is used to trigger faults directly in fwknop functions to see how calling functions handle errors, and this strategy is compatible with gcov coverage results. The fiu-run tool is also used, but more from the perspective of trying to crash one of the fwknop binaries since we can't (yet) see code coverage results under fiu-run. Here is an example fault introduced into the fko_get_username() function:
/* Return the current username for this fko context.
*/
int 
fko_get_username(fko_ctx_t ctx, char **username)
{   
#if HAVE_LIBFIU
    fiu_return_on("fko_get_username_init", FKO_ERROR_CTX_NOT_INITIALIZED);
#endif
With the fault set (there is a special command line argument --fault-injection-tag on the fwknopd server command line to enable the fault), the error handling code seen at the end of the example below is executed via the test suite. For proof of error handling execution, see the full coverage report (look at line 240).
/* Popluate a spa_data struct from an initialized (and populated) FKO context.
*/
static int
get_spa_data_fields(fko_ctx_t ctx, spa_data_t *spdat)
{   
    int res = FKO_SUCCESS;

    res = fko_get_username(ctx, &(spdat->username));
    if(res != FKO_SUCCESS)
            return(res);

Once again, it is the integration of fault injection with the test suite and corresponding code coverage reports that extends testing efforts in a powerful way. libfiu offers many nice features, including thread safey, the ability to enable a fault injection tag relative to other functions in the stack, and more.

5) Negative valgrind findings should force tests to fail

So far, a theme in this blog post has been better code coverage through integration. I've attempted to make the case for the integration of fuzzing and fault injection with project test suites, and code coverage stats should be produced for both styles of testing.

A third integration effort is to leverage valgrind. That is, the test suite should run tests underneath valgrind when possible (speed and memory usage may be constraints here depending on the project). If valgrind discovers a memory leak, double free(), or other problem, this finding should automatically cause corresponding tests to fail. In some cases valgrind suppressions will need to be created if a project depends on libraries or other code that is known to have issues under valgrind, but findings within project sources should cause tests to fail. For projects heavy on the crypto side, there are some instances where code is very deliberately built in a manner that triggers a valgrind error (see Tonnerre Lombard's write up on the old Debian OpenSSL vulnerability), but these are not common occurrences and suppressions can always be applied. The average valgrind finding in a large code base should cause test failure.

Although running tests under valgrind will not expand code coverage, valgrind is a powerful tool that should be tightly integrated with your test suite.

Summary

  • Publish your code coverage stats
  • Make it easy to automatically produce code coverage results
  • Integrate a fuzzer into your test suite
  • Further extend your code coverage with fault injection
  • Negative valgrind findings should automatically force tests to fail

Software Release: fwknop-2.6.3

fwknop-2.6.3 software release The 2.6.3 release of fwknop is available for download. The emphasis in this release is maximizing code coverage through a new python SPA packet fuzzer, and also on fault injection testing with the excellent fault injection library libfiu developed by Alberto Bertogli. Another important change in 2.6.3 is all IP resolution lookups in '-R' mode now happen over SSL to make it harder for an adversary to mount a MITM attack on the resolution lookup. As always, manually specifying the IP to allow through the remote firewall is safer than relying on any network communication - even when SSL would be involved.

Here is the complete ChangeLog for fwknop-2.6.3:

  • [client] External IP resolution via '-R' (or '--resolve-ip-http') is now done via SSL by default. The IP resolution URL is now 'https://www.cipherdyne.org/cgi-gin/myip', and a warning is generated in '-R' mode whenever a non-HTTPS URL is specified (it is safer just to use the default). The fwknop client leverages 'wget' for this operation since that is cleaner than having fwknop link against an SSL library.
  • Integrated the 'libfiu' fault injection library available from http://blitiri.com.ar/p/libfiu/ This feature is disabled by default, and requires the --enable-libfiu-support argument to the 'configure' script in order to enable it. With fwknop compiled against libfiu, fault injections are done at various locations within the fwknop sources and the test suite verifies that the faults are properly handled at run time via test/fko-wrapper/fko_fault_injection.c. In addition, the libfiu tool 'fiu-run' is used against the fwknop binaries to ensure they handle faults that libfiu introduces into libc functions. For example, fiu-run can force malloc() to fail even without huge memory pressure on the local system, and the test suite ensures the fwknop binaries properly handle this.
  • [test suite] Integrated a new python fuzzer for fwknop SPA packets (see test/spa_fuzzing.py). This greatly extends the ability of the test suite to validate libfko operations since SPA fuzzing packets are sent through libfko routines directly (independently of encryption and authentication) with a special 'configure' option --enable-fuzzing-interfaces. The python fuzzer generates over 300K SPA packets, and when used by the test suite consumes about 400MB of disk. For reference, to use both the libfiu fault injection feature mentioned above and the python fuzzer, use the --enable-complete option to the test suite.
  • [test suite] With the libfiu fault injection support and the new python fuzzer, automated testing of fwknop achieves 99.7% function coverage and 90.2% line coverage as determined by 'gcov'. The full report may be viewed here: http://www.cipherdyne.org/fwknop/lcov-results/
  • [server] Add a new GPG_FINGERPRINT_ID variable to the access.conf file so that full GnuPG fingerprints can be required for incoming SPA packets in addition to the abbreviated GnuPG signatures listed in GPG_REMOTE_ID. From the test suite, an example fingerprint is:
    GPG_FINGERPRINT_ID     00CC95F05BC146B6AC4038C9E36F443C6A3FAD56
    
  • [server] When validating access.conf stanzas make sure that one of GPG_REMOTE_ID or GPG_FINGERPRINT_ID is specified whenever GnuPG signatures are to be verified for incoming SPA packets. Signature verification is the default, and can only be disabled with GPG_DISABLE_SIG but this is NOT recommended.
  • [server] Bug fix for PF firewalls without ALTQ support on FreeBSD. With this fix it doesn't matter whether ALTQ support is available or not. Thanks to Barry Allard for discovering and reporting this issue. Closes issue #121 on github.
  • [server] Bug fix discovered with the libfiu fault injection tag "fko_get_username_init" combined with valgrind analysis. This bug is only triggered after a valid authenticated and decrypted SPA packet is sniffed by fwknopd:
    ==11181== Conditional jump or move depends on uninitialised value(s)
    ==11181==    at 0x113B6D: incoming_spa (incoming_spa.c:707)
    ==11181==    by 0x11559F: process_packet (process_packet.c:211)
    ==11181==    by 0x5270857: ??? (in /usr/lib/x86_64-linux-gnu/libpcap.so.1.4.0)
    ==11181==    by 0x114BCC: pcap_capture (pcap_capture.c:270)
    ==11181==    by 0x10F32C: main (fwknopd.c:195)
    ==11181==  Uninitialised value was created by a stack allocation
    ==11181==    at 0x113476: incoming_spa (incoming_spa.c:294)
    
  • [server] Bug fix to handle SPA packets over HTTP by making sure to honor the ENABLE_SPA_OVER_HTTP fwknopd.conf variable and to properly account for SPA packet lengths when delivered via HTTP.
  • [server] Add --test mode to instruct fwknopd to acquire and process SPA packets, but not manipulate firewall rules or execute commands that are provided by SPA clients. This option is mostly useful for the fuzzing tests in the test suite to ensure broad code coverage under adverse conditions.

Software Release: fwknop-2.6.0

fwknop-2.6.0 software release The 2.6.0 release of fwknop is available for download. This release incorporates a number of feature enhancements such as an AppArmor policy for fwknopd, HMAC authenticated encryption support for the Android client, new NAT criteria that are independently configurable for each access.conf stanza, and more rigorous valgrind verification powered by the CPAN Test::Valgrind module. A few bugs were fixed as well, and similarly to the 2.5 and 2.5.1 releases, the fwknop project has a Coverity defect count of zero. As proof of this, you can see the Coverity high-level defect stats for fwknop here (you'll need to sign up for an account): Coverity Scan Build Status I would encourage any open source project that is using Coverity to publish their scan results. At last count, it appears that over 1,100 projects are using Coverity, but OpenSSH is still not one of them.

Development on fwknop-2.6.1 will begin shortly, and here is the complete ChangeLog for fwknop-2.6.0:

  • (Radostan Riedel) Added an AppArmor policy for fwknopd that is known to work on Debian and Ubuntu systems. The policy file is available at extras/apparmor/usr.sbin/fwknopd.
  • [libfko] Nikolay Kolev reported a build issue with Mac OS X Mavericks where local fwknop copies of strlcat() and strlcpy() were conflicting with those that already ship with OS X 10.9. Closes #108 on github.
  • [libfko] (Franck Joncourt) Consolidated FKO context dumping function into lib/fko_util.c. In addition to adding a shared utility function for printing an FKO context, this change also makes the FKO context output slightly easier to parse by printing each FKO attribute on a single line (this change affected the printing of the final SPA packet data). The test suite has been updated to account for this change as well.
  • [libfko] Bug fix to not attempt SPA packet decryption with GnuPG without an fko object with encryption_mode set to FKO_ENC_MODE_ASYMMETRIC. This bug was caught with valgrind validation against the perl FKO extension together with the set of SPA fuzzing packets in test/fuzzing/fuzzing_spa_packets. Note that this bug cannot be triggered via fwknopd because additional checks are made within fwknopd itself to force FKO_ENC_MODE_ASYMMETRIC whenever an access.conf stanza contains GPG key information. This fix strengthens libfko itself to independently require that the usage of fko objects without GPG key information does not result in attempted GPG decryption operations. Hence this fix applies mostly to third party usage of libfko - i.e. stock installations of fwknopd are not affected. As always, it is recommended to use HMAC authenticated encryption whenever possible even for GPG modes since this also provides a work around even for libfko prior to this fix.
  • [Android] (Gerry Reno) Updated the Android client to be compatible with Android-4.4.
  • [Android] Added HMAC support (currently optional).
  • [server] Updated pcap_dispatch() default packet count from zero to 100. This change was made to ensure backwards compatibility with older versions of libpcap per the pcap_dispatch() man page, and also because some of a report from Les Aker of an unexpected crash on Arch Linux with libpcap-1.5.1 that is fixed by this change (closes #110).
  • [server] Bug fix for SPA NAT modes on iptables firewalls to ensure that custom fwknop chains are re-created if they get deleted out from under the running fwknopd instance.
  • [server] Added FORCE_SNAT to the access.conf file so that per-access stanza SNAT criteria can be specified for SPA access.
  • [test suite] added --gdb-test to allow a previously executed fwknop or fwknopd command to be sent through gdb with the same command line args as the test suite used. This is for convenience to rapidly allow gdb to be launched when investigating fwknop/fwknopd problems.
  • [client] (Franck Joncourt) Added --stanza-list argument to show the stanza names from ~/.fwknoprc.
  • [libfko] (Hank Leininger) Contributed a patch to greatly extend libfko error code descriptions at various places in order to give much better information on what certain error conditions mean. Closes #98.
  • [test suite] Added the ability to run perl FKO module built-in tests in the t/ directory underneath the CPAN Test::Valgrind module. This allows valgrind memory checks to be applied to libfko functions via the perl FKO module (and hence rapid prototyping can be combined with memory leak detection). A check is made to see whether the Test::Valgrind module has been installed, and --enable-valgrind is also required (or --enable-all) on the test-fwknop.pl command line.
Next »