cipherdyne.org

Michael Rash, Security Researcher



Code Coverage: Challenges For Open Source Projects

In this blog post I'll pose a few challenges to open source projects. These challenges, among other goals, are designed to push automated test suites to maximize code coverage. For motivation, I'll present code coverage stats from the OpenSSL and OpenSSH test suites, and contrast this with what I'm trying to achieve in the fwknop project.

Stated bluntly, if comprehensive code coverage is not achieved by your test suite, you're doing it wrong.

With major bugs like Heartbleed and "goto fail", there is clearly a renewed need for better automated testing. As Mike Bland demonstrated, the problem isn't so much about technology since unit tests can be built for both Heartbleed and "goto fail". Rather, the problem is that the whole software engineering life cycle needs to embrace better testing.

1) Publish your code coverage stats

Just as open source projects publish source code, test suite code coverage results should be published too. The mere act of publishing code coverage - just as open source itself - is a way to engage developers and users alike to improve such results. Clearly displayed should be all branches, lines, and functions that are exercised by the test suite, and these results should be published for every software release. Without such stats, how can users have confidence that the project test suite is comprehensive? Bugs still crop up even with good code coverage, but many many bugs will be squashed during the development of a test suite that strives to exercise every line of source code.

For a project written in C and compiled with gcc, the lcov tool provides a nice front-end for gcov coverage data. lcov was used to generate the following code coverage reports on OpenSSL-1.0.1h, OpenSSH-6.6p1, and fwknop-2.6.3 using the respective test suite in each project:

Hit Total Coverage
OpenSSL-1.0.1h Lines: 44604 96783 46.1 %
Full code coverage report Functions: 3518 6574 53.5 %
Branches: 22653 67181 33.7 %

OpenSSH-6.6p1 Lines: 18241 33466 54.5 %
Full code coverage report Functions: 1217 1752 69.5 %
Branches: 9362 22674 41.3 %

fwknop-2.6.3 Lines: 6971 7748 90.0 %
Full code coverage report Functions: 376 377 99.7 %
Branches: 4176 5239 79.7 %


The numbers speak for themselves. To be fair, fwknop has a much smaller code base than OpenSSL or OpenSSH - less than 1/10th the size of OpenSSL and about 1/5th the size of OpenSSH in terms of lines reported by gcov. So, presumaby it's a lot easier to reach higher levels of code coverage in fwknop than the other two projects. Nevertheless, it is shocking that the line coverage in the OpenSSL test suite is below 50%, and not much better in OpenSSH. What are the odds that the other half of the code is bug free? What are the odds that changes in newer versions won't break assumptions made in the untested code? What are the odds that one of the next security vulnerabilities announced in either project stems from this code? Of course, test suites are not a panacea, but there are almost certainly bugs lying in wait within the untested code. It is easier to have confidence in code that has at least some test suite coverage than code that has zero coverage - at least as far as the test suite is concerned (I'm not saying that other tools are not being used). Both OpenSSL and OpenSSH use tools beyond their respective test suites to try and maintain code quality - OpenSSL uses Coverity for example - but these tools are not integrated with test suite results and do not contribute to the code coverage stats above. What I'm suggesting is that the test suites themselves should get much closer to 100% coverage, and this may require the integration of infrastructure like a custom fuzzer or fault injection library (more on this below).

On another note, given that an explicit design goal of the fwknop test suite is to maximize code coverage, and given the results above, it is clear there is significant work left to do.

2) Make it easy to automatically produce code coverage results

Neither OpenSSL nor OpenSSH make it easy to automatically generate code coverage stats like those shown above. One can always Google for the appropriate CFLAGS and LDFLAGS settings, recompile, and run lcov, but you shouldn't have to. This should be automatic and built into the test suite as an option. If your project is using autoconf, then there should be a top level --enable-code-coverage switch (or similar) to the configure script, and the test suite should take the next steps to produce the code coverage reports. Without this, there is unnecessary complexity and manual work, and this affects users and developers alike. My guess is this lack of automation is a factor for why code coverage for OpenSSL and OpenSSH is not better. Of course, it takes a lot of effort to develop a test suite with comprehensive code coverage support, but automation is low hanging fruit.

If you want to generate the code coverage reports above, here are two trivial scripts - one for OpenSSH and another for OpenSSL. This one works for OpenSSH:
#!/bin/sh
#
# Basic script to compile OpenSSH with code coverage support via gcc
# gcov and build HTML reports with lcov.
#

LCOV_DIR=lcov-results
LCOV_FILE=coverage.info
LCOV_FILTERED=coverage_final.info
PREFIX=~/install/openssh  ### tmp path

mkdir $LCOV_DIR

make clean
./configure --with-cflags="-fprofile-arcs -ftest-coverage" --with-ldflags="-fprofile-arcs -lgcov" --prefix=$PREFIX
make
make tests

### build coverage info and filter /usr/include/ files
lcov --capture --directory . --output-file $LCOV_FILE
lcov -r $LCOV_FILE /usr/include/\* --output-file $LCOV_DIR/$LCOV_FILTERED

### create the HTML report
genhtml $LCOV_DIR/$LCOV_FILTERED --output-directory $LCOV_DIR

exit

3) Integrate a fuzzer into your test suite

If your test suite does not achieve 99%/95% function/line coverage, architectural changes should be made to reach these goals and beyond. This will likely require that test suite drive a fuzzer against your project and measure how well it exercises the code base.

Looking at code coverage results for older versions of the fwknop project was an eye opener. Although the test suite had hundreds of tests, there were large sections of code that were not exercised. It was for this reason the 2.6.3 release concentrated on more comprehensive automated test coverage. However, achieving better coverage was not a simple matter of executing fwknop components with different configuration files or command line arguments - it required the development of a dedicated SPA packet fuzzer along with a special macro -DENABLE_FUZZING built into the source code to allow the fuzzer to reach portions of code that would have otherwise been more difficult to trigger due to encryption and authentication requirements. This is a similar to the strategy proposed in Michal Zalewski's fuzzer American Fuzzy Lop here (see the "Known Limitations" section).

The main point is that fwknop was changed to support fuzzing driven by the test suite as a way to extend code coverage. It is the strong integration of fuzzing into the test suite that provides a powerful testing technique, and looking at code coverage results allows you to measure it.

Incidentally, an example double free() bug that the fwknop packet fuzzer triggered in conjunction with the test suite can be found here (fixed in 2.6.3).

4) Further extend your code coverage with fault injection

Any C project leveraging libc functions should implement error checking against function return values. The canonical example is checking to see whether malloc() returned NULL, and if so this is usually treated as an unrecoverable error like so:
char *buf = NULL;
int size = 100;

if ((buf = malloc(size)) == NULL) {
    clean_up();
    exit(EXIT_FAILURE);
}
Some projects elect to write a "safe_malloc()" wrapper for malloc() or other libc functions so that error handling can be done in one place, but it is not feasible to do this for every libc function. So, how to verify whether error conditions are properly handled at run time? For malloc(), NULL is typically returned under extremely high memory pressure, so it is hard to trigger this condition and still have a functioning system let alone a functioning test suite. In other words, in the example above, how can the test suite achieve code coverage for the clean_up() function? Other examples include filesystem or network function errors that are returned when disks fill up, or a network communication is blocked, etc.

What's needed is a mechanism for triggering libc faults artificially, without requiring the underlying conditions to actually exist that would normally cause such faults. This is where a fault injection library like libfiu comes in. Not only does it support fault injection at run time against libc functions without the need to link against libfiu (a dedicated binary "fiu-run" takes care of this), but it can also be used to trigger faults in arbitrary non-libc functions within a project to see how function callers handle errors. In fwknop, both strategies are used by the test suite, and this turned up a number of bugs like this one.

Full disclosure: libfiu does not yet support code coverage when executing a binary under fiu-run because there are problems interacting with libc functions necessary to write out the various source_file.c.gcno and source_file.c.gcda coverage files. This issue is being worked on for an upcoming release of libfiu. So, in the context of fwknop, libfiu is used to trigger faults directly in fwknop functions to see how calling functions handle errors, and this strategy is compatible with gcov coverage results. The fiu-run tool is also used, but more from the perspective of trying to crash one of the fwknop binaries since we can't (yet) see code coverage results under fiu-run. Here is an example fault introduced into the fko_get_username() function:
/* Return the current username for this fko context.
*/
int 
fko_get_username(fko_ctx_t ctx, char **username)
{   
#if HAVE_LIBFIU
    fiu_return_on("fko_get_username_init", FKO_ERROR_CTX_NOT_INITIALIZED);
#endif
With the fault set (there is a special command line argument --fault-injection-tag on the fwknopd server command line to enable the fault), the error handling code seen at the end of the example below is executed via the test suite. For proof of error handling execution, see the full coverage report (look at line 240).
/* Popluate a spa_data struct from an initialized (and populated) FKO context.
*/
static int
get_spa_data_fields(fko_ctx_t ctx, spa_data_t *spdat)
{   
    int res = FKO_SUCCESS;

    res = fko_get_username(ctx, &(spdat->username));
    if(res != FKO_SUCCESS)
            return(res);

Once again, it is the integration of fault injection with the test suite and corresponding code coverage reports that extends testing efforts in a powerful way. libfiu offers many nice features, including thread safey, the ability to enable a fault injection tag relative to other functions in the stack, and more.

5) Negative valgrind findings should force tests to fail

So far, a theme in this blog post has been better code coverage through integration. I've attempted to make the case for the integration of fuzzing and fault injection with project test suites, and code coverage stats should be produced for both styles of testing.

A third integration effort is to leverage valgrind. That is, the test suite should run tests underneath valgrind when possible (speed and memory usage may be constraints here depending on the project). If valgrind discovers a memory leak, double free(), or other problem, this finding should automatically cause corresponding tests to fail. In some cases valgrind suppressions will need to be created if a project depends on libraries or other code that is known to have issues under valgrind, but findings within project sources should cause tests to fail. For projects heavy on the crypto side, there are some instances where code is very deliberately built in a manner that triggers a valgrind error (see Tonnerre Lombard's write up on the old Debian OpenSSL vulnerability), but these are not common occurrences and suppressions can always be applied. The average valgrind finding in a large code base should cause test failure.

Although running tests under valgrind will not expand code coverage, valgrind is a powerful tool that should be tightly integrated with your test suite.

Summary

  • Publish your code coverage stats
  • Make it easy to automatically produce code coverage results
  • Integrate a fuzzer into your test suite
  • Further extend your code coverage with fault injection
  • Negative valgrind findings should automatically force tests to fail

Software Release: fwknop-2.6.3

fwknop-2.6.3 software release The 2.6.3 release of fwknop is available for download. The emphasis in this release is maximizing code coverage through a new python SPA packet fuzzer, and also on fault injection testing with the excellent fault injection library libfiu developed by Alberto Bertogli. Another important change in 2.6.3 is all IP resolution lookups in '-R' mode now happen over SSL to make it harder for an adversary to mount a MITM attack on the resolution lookup. As always, manually specifying the IP to allow through the remote firewall is safer than relying on any network communication - even when SSL would be involved.

Here is the complete ChangeLog for fwknop-2.6.3:

  • [client] External IP resolution via '-R' (or '--resolve-ip-http') is now done via SSL by default. The IP resolution URL is now 'https://www.cipherdyne.org/cgi-gin/myip', and a warning is generated in '-R' mode whenever a non-HTTPS URL is specified (it is safer just to use the default). The fwknop client leverages 'wget' for this operation since that is cleaner than having fwknop link against an SSL library.
  • Integrated the 'libfiu' fault injection library available from http://blitiri.com.ar/p/libfiu/ This feature is disabled by default, and requires the --enable-libfiu-support argument to the 'configure' script in order to enable it. With fwknop compiled against libfiu, fault injections are done at various locations within the fwknop sources and the test suite verifies that the faults are properly handled at run time via test/fko-wrapper/fko_fault_injection.c. In addition, the libfiu tool 'fiu-run' is used against the fwknop binaries to ensure they handle faults that libfiu introduces into libc functions. For example, fiu-run can force malloc() to fail even without huge memory pressure on the local system, and the test suite ensures the fwknop binaries properly handle this.
  • [test suite] Integrated a new python fuzzer for fwknop SPA packets (see test/spa_fuzzing.py). This greatly extends the ability of the test suite to validate libfko operations since SPA fuzzing packets are sent through libfko routines directly (independently of encryption and authentication) with a special 'configure' option --enable-fuzzing-interfaces. The python fuzzer generates over 300K SPA packets, and when used by the test suite consumes about 400MB of disk. For reference, to use both the libfiu fault injection feature mentioned above and the python fuzzer, use the --enable-complete option to the test suite.
  • [test suite] With the libfiu fault injection support and the new python fuzzer, automated testing of fwknop achieves 99.7% function coverage and 90.2% line coverage as determined by 'gcov'. The full report may be viewed here: http://www.cipherdyne.org/fwknop/lcov-results/
  • [server] Add a new GPG_FINGERPRINT_ID variable to the access.conf file so that full GnuPG fingerprints can be required for incoming SPA packets in addition to the abbreviated GnuPG signatures listed in GPG_REMOTE_ID. From the test suite, an example fingerprint is:
    GPG_FINGERPRINT_ID     00CC95F05BC146B6AC4038C9E36F443C6A3FAD56
    
  • [server] When validating access.conf stanzas make sure that one of GPG_REMOTE_ID or GPG_FINGERPRINT_ID is specified whenever GnuPG signatures are to be verified for incoming SPA packets. Signature verification is the default, and can only be disabled with GPG_DISABLE_SIG but this is NOT recommended.
  • [server] Bug fix for PF firewalls without ALTQ support on FreeBSD. With this fix it doesn't matter whether ALTQ support is available or not. Thanks to Barry Allard for discovering and reporting this issue. Closes issue #121 on github.
  • [server] Bug fix discovered with the libfiu fault injection tag "fko_get_username_init" combined with valgrind analysis. This bug is only triggered after a valid authenticated and decrypted SPA packet is sniffed by fwknopd:
    ==11181== Conditional jump or move depends on uninitialised value(s)
    ==11181==    at 0x113B6D: incoming_spa (incoming_spa.c:707)
    ==11181==    by 0x11559F: process_packet (process_packet.c:211)
    ==11181==    by 0x5270857: ??? (in /usr/lib/x86_64-linux-gnu/libpcap.so.1.4.0)
    ==11181==    by 0x114BCC: pcap_capture (pcap_capture.c:270)
    ==11181==    by 0x10F32C: main (fwknopd.c:195)
    ==11181==  Uninitialised value was created by a stack allocation
    ==11181==    at 0x113476: incoming_spa (incoming_spa.c:294)
    
  • [server] Bug fix to handle SPA packets over HTTP by making sure to honor the ENABLE_SPA_OVER_HTTP fwknopd.conf variable and to properly account for SPA packet lengths when delivered via HTTP.
  • [server] Add --test mode to instruct fwknopd to acquire and process SPA packets, but not manipulate firewall rules or execute commands that are provided by SPA clients. This option is mostly useful for the fuzzing tests in the test suite to ensure broad code coverage under adverse conditions.

Software Release: fwknop-2.6.0

fwknop-2.6.0 software release The 2.6.0 release of fwknop is available for download. This release incorporates a number of feature enhancements such as an AppArmor policy for fwknopd, HMAC authenticated encryption support for the Android client, new NAT criteria that are independently configurable for each access.conf stanza, and more rigorous valgrind verification powered by the CPAN Test::Valgrind module. A few bugs were fixed as well, and similarly to the 2.5 and 2.5.1 releases, the fwknop project has a Coverity defect count of zero. As proof of this, you can see the Coverity high-level defect stats for fwknop here (you'll need to sign up for an account): Coverity Scan Build Status I would encourage any open source project that is using Coverity to publish their scan results. At last count, it appears that over 1,100 projects are using Coverity, but OpenSSH is still not one of them.

Development on fwknop-2.6.1 will begin shortly, and here is the complete ChangeLog for fwknop-2.6.0:

  • (Radostan Riedel) Added an AppArmor policy for fwknopd that is known to work on Debian and Ubuntu systems. The policy file is available at extras/apparmor/usr.sbin/fwknopd.
  • [libfko] Nikolay Kolev reported a build issue with Mac OS X Mavericks where local fwknop copies of strlcat() and strlcpy() were conflicting with those that already ship with OS X 10.9. Closes #108 on github.
  • [libfko] (Franck Joncourt) Consolidated FKO context dumping function into lib/fko_util.c. In addition to adding a shared utility function for printing an FKO context, this change also makes the FKO context output slightly easier to parse by printing each FKO attribute on a single line (this change affected the printing of the final SPA packet data). The test suite has been updated to account for this change as well.
  • [libfko] Bug fix to not attempt SPA packet decryption with GnuPG without an fko object with encryption_mode set to FKO_ENC_MODE_ASYMMETRIC. This bug was caught with valgrind validation against the perl FKO extension together with the set of SPA fuzzing packets in test/fuzzing/fuzzing_spa_packets. Note that this bug cannot be triggered via fwknopd because additional checks are made within fwknopd itself to force FKO_ENC_MODE_ASYMMETRIC whenever an access.conf stanza contains GPG key information. This fix strengthens libfko itself to independently require that the usage of fko objects without GPG key information does not result in attempted GPG decryption operations. Hence this fix applies mostly to third party usage of libfko - i.e. stock installations of fwknopd are not affected. As always, it is recommended to use HMAC authenticated encryption whenever possible even for GPG modes since this also provides a work around even for libfko prior to this fix.
  • [Android] (Gerry Reno) Updated the Android client to be compatible with Android-4.4.
  • [Android] Added HMAC support (currently optional).
  • [server] Updated pcap_dispatch() default packet count from zero to 100. This change was made to ensure backwards compatibility with older versions of libpcap per the pcap_dispatch() man page, and also because some of a report from Les Aker of an unexpected crash on Arch Linux with libpcap-1.5.1 that is fixed by this change (closes #110).
  • [server] Bug fix for SPA NAT modes on iptables firewalls to ensure that custom fwknop chains are re-created if they get deleted out from under the running fwknopd instance.
  • [server] Added FORCE_SNAT to the access.conf file so that per-access stanza SNAT criteria can be specified for SPA access.
  • [test suite] added --gdb-test to allow a previously executed fwknop or fwknopd command to be sent through gdb with the same command line args as the test suite used. This is for convenience to rapidly allow gdb to be launched when investigating fwknop/fwknopd problems.
  • [client] (Franck Joncourt) Added --stanza-list argument to show the stanza names from ~/.fwknoprc.
  • [libfko] (Hank Leininger) Contributed a patch to greatly extend libfko error code descriptions at various places in order to give much better information on what certain error conditions mean. Closes #98.
  • [test suite] Added the ability to run perl FKO module built-in tests in the t/ directory underneath the CPAN Test::Valgrind module. This allows valgrind memory checks to be applied to libfko functions via the perl FKO module (and hence rapid prototyping can be combined with memory leak detection). A check is made to see whether the Test::Valgrind module has been installed, and --enable-valgrind is also required (or --enable-all) on the test-fwknop.pl command line.

Validating libfko Memory Usage with Test::Valgrind

Validating libfko Memory Usage with Test::Valgrind The fwknop project consistently uses valgrind to ensure that memory leaks, double free() conditions, and other problems do not creep into the code base. A high level of automation is built around valgrind usage with the fwknop test suite, and a recent addition extends this even further by using the excellent CPAN Test::Valgrind module. Even though the test suite has had the ability to run tests through valgrind, previous to this change these tests only applied to the fwknop C binaries when executed directly by the test suite. Further, some of the most rigorous testing is done through the usage of the perl FKO extension to fuzz libfko functions, so without the Test::Valgrind module these tests also could not take advantage of valgrind support. Now that the test suite supports Test::Valgrind (and a check is done to see if it is installed), all fuzzing tests can also be validated with valgrind. Technically, the fuzzing tests have been added as FKO built-in tests in the t/ directory, and the test suite runs them through Test::Valgrind like this:
# prove --exec 'perl -Iblib/lib -Iblib/arch -MTest::Valgrind' t/*.t
Here is a complete example - first, run the test suite like so:
# ./test-fwknop.pl --enable-all --include perl --test-limit 3

[+] Starting the fwknop test suite...

    args: --enable-all --include perl --test-limit 3

[+] Total test buckets to execute: 3

[perl FKO module] [compile/install] to: ./FKO...................pass (1)
[perl FKO module] [make test] run built-in tests................pass (2)
[perl FKO module] [prove t/*.t] Test::Valgrind..................pass (3)
[valgrind output] [flagged functions] ..........................pass (4)

    Run time: 1.27 minutes

[+] 0/0/0 OpenSSL tests passed/failed/executed
[+] 0/0/0 OpenSSL HMAC tests passed/failed/executed
[+] 4/0/4 test buckets passed/failed/executed
Note that all tests passed as shown above. This indicates that the test suite has not found any memory leaks through the fuzzing tests run via Test::Valgrind. But, let's validate this by artificially introducing a memory leak and see if the test suite can automatically catch it. For example, here is a patch that forces a memory leak in the validate_access_msg() libfko function. This function ensures that the shape of the access request conforms to something fwknop expects like "1.2.3.4,tcp/22". The memory leak happens because a new buffer is allocated from the heap but is never free()'d before returning from the function (obviously this patch is for illustration and testing purposes only):
$ git diff
diff --git a/lib/fko_message.c b/lib/fko_message.c
index fa6803b..c04e035 100644
--- a/lib/fko_message.c
+++ b/lib/fko_message.c
@@ -251,6 +251,13 @@ validate_access_msg(const char *msg)
     const char   *ndx;
     int     res         = FKO_SUCCESS;
     int     startlen    = strnlen(msg, MAX_SPA_MESSAGE_SIZE);
+    char *leak = NULL;
+
+    leak = malloc(100);
+    leak[0] = 'a';
+    leak[1] = 'a';
+    leak[2] = '\0';
+    printf("LEAK: %s\n", leak);

     if(startlen == MAX_SPA_MESSAGE_SIZE)
         return(FKO_ERROR_INVALID_DATA_MESSAGE_ACCESS_MISSING);
Now recompile fwknop and run the test suite again, after applying the patch (recompilation output is not shown):
# cd ../
# make
# test
# ./test-fwknop.pl --enable-all --include perl --test-limit 3
[+] Starting the fwknop test suite...

    args: --enable-all --include perl --test-limit 3

    Saved results from previous run to: output.last/

    Valgrind mode enabled, will import previous coverage from:
        output.last/valgrind-coverage/

[+] Total test buckets to execute: 3

[perl FKO module] [compile/install] to: ./FKO...................pass (1)
[perl FKO module] [make test] run built-in tests................pass (2)
[perl FKO module] [prove t/*.t] Test::Valgrind..................fail (3)
[valgrind output] [flagged functions] ..........................fail (4)

    Run time: 1.27 minutes

[+] 0/0/0 OpenSSL tests passed/failed/executed
[+] 0/0/0 OpenSSL HMAC tests passed/failed/executed
[+] 2/2/4 test buckets passed/failed/executed

This time two tests fail. The first is the test that runs the perl FKO module built-in tests under Test::Valgrind, and the second is the "flagged functions" test which compares test suite output looking for new functions that valgrind has flagged vs. the previous test suite execution. By looking at the output file of the "flagged functions" test it is easy to see the offending function where the new memory leak exists. This provides an easy, automated way of memory leak detection that is driven by perl FKO fuzzing tests.
# cat output/4.test
[+] fwknop client functions (with call line numbers):
       10 : validate_access_msg [fko_message.c:256]
        6 : fko_set_spa_message [fko_message.c:184]
        4 : fko_new_with_data [fko_funcs.c:263]
        4 : fko_decrypt_spa_data [fko_encryption.c:264]
        4 : fko_decode_spa_data [fko_decode.c:350]
Currently, there are no known memory leaks in the fwknop code, and automation built around the Test::Valgrind module will help keep it that way.

Port Knocking: Why You Should Give It Another Look

Port Knocking: Why you Should Give It Another Look (Update 10/20/2013: There is a Reddit comment thread going on this post here.)

It has been a decade since Port Knocking was first introduced to the security community in 2003, so it seemed fitting to recap how far the concept has evolved. Much effort has gone into solving architectural problems with PK systems, and today Single Packet Authorization (SPA) embodies the primary benefits of PK while fixing its limitations.

There are noted security researchers on both sides of the debate as to whether PK/SPA has any security value, but it is interesting that researchers who don't find value seem to concentrate on aspects of PK/SPA that have little to do with the chief benefit: cryptographically strong concealment. At least, this is the property offered by Single Packet Authorization but admittedly not necessarily with Port Knocking. Let's first go through some of the more common criticisms of PK/SPA, and show what the SPA answer is to each one. For those that haven't considered SPA in the past, perhaps it is time to give it a second look if for no other reason than to propose a method for breaking it.

1) Isn't PK/SPA just another password?

Suppose I hand you an arbitrary IP, say 2.2.2.2 that is running a default-drop firewall policy. As an attacker, you scan 2.2.2.2 and can't get any information back whatsoever. It doesn't respond to pings, Nmap cannot detect any TCP or UDP service under all scanning techniques, and any Metasploit module that relies on a TCP connect() call is ineffective. In the absence of a routing issue, it is safe to assume there is a firewall or ACL blocking all incoming scans. It is not feasible to tell whether there are any services listening (SSH or otherwise), and it also not feasible to tell whether there is a PK/SPA daemon either - the firewall is being used to do what it does best: block network traffic. The point is that there is no information coming back from the target. A PK/SPA daemon may be deployed, but the passive nature of PK/SPA makes it undiscoverable [1].

As a thought experiment, what would it take to make PK/SPA "just another password"? Well, if the PK/SPA daemon listened on a TCP socket and advertised itself via a server banner (like "fwknop-2.5.1, enter password for SSH access:") this would go a long way. Then Nmap would once again become an effective tool for finding the PK/SPA daemon, and an attacker could start to try different passwords. In other words, the daemon would no longer be passive, which is the whole point of PK/SPA to begin with.

But wait, you might say "but attackers can just try to brute force passive PK/SPA daemons anyway (even though they can't scan for them directly) and see if a port opens up", which brings us to:

2) Can PK/SPA be brute-forced?

Port Knocking implementations that use simple shared sequences are certainly vulnerable to brute forcing, and they are also vulnerable to replay attacks. These vulnerabilities (among other problems) were primary motivators for the development of SPA, and any modern SPA implementation is not vulnerable to either of these attacks. For example, fwknop uses AES in CBC mode authenticated with an HMAC SHA-256 in the encrypt-then-authenticate model, and both the encryption and HMAC keys (256 and 512 bits respectively for a total of 768 bits) are generated from random data in --key-gen mode. Further, fwknop can leverage GnuPG instead of AES, and 2048-bit GnuPG keys are fully supported. If it were practical to brute force fwknop encryption and authentication, then it would also be practical to brute force a lot of other cryptographic software too. Hence, fwknop is not vulnerable to such attacks in any practical sense [2].

Beyond this, from 1) above remember that the very existence of the SPA daemon is not discoverable by an attacker. For the average adversary, interacting with the SPA daemon must be done blindly "by chance". So, the target - even if it is running an SPA daemon which the attacker can't see - which implementation should the attacker try to brute force? If the target is running Moxie Marlinspike's knockknock (which uses AES in CTR mode authenticated with a truncated HMAC SHA-1), then the attacker needs to try and brute force the daemon with crafted TCP SYN packets via the following fields: TCP window size, TCP sequence, TCP acknowledgement, and the network layer IP ID. On the other hand, if the target is running fwknop, then the attacker would have to try and brute force the fwknopd daemon with UDP payloads to a port that the fwknopd pcap filter statement allows (although fwknopd can also be configured to only accept SPA payloads over ICMP instead). Should the attacker try to brute force the fwknop AES-CBC + HMAC SHA-256 mode? Or the GnuPG + HMAC SHA-256 mode? Further, the fwknopd daemon can place restrictions on the services that an authenticated client is authorized to request via the access.conf file. There are a lot of bits adding up, and the entire time the attacker doesn't even know whether an SPA daemon is actually running let alone which one.

Unfortunately for the attacker it gets worse. Even if the attacker could somehow brute force both the encryption and authentication steps in fwknop or other SPA software, to which service should the attacker try to make a connection? No service has to listen on the default port, so if a connection to SSH isn't answered should the attacker scan the target looking for a service? Maybe SPA is being used to conceal an IMAP daemon, a webserver, or OpenVPN instead of SSHD? Further, because the SPA daemon never acknowledges anything to begin with, the attacker can only infer that a brute force attempt was successful by seeing if a service is finally available after each attempt. So, in order for the attacker to be effective, the work flow for every brute force attempt should be: (1) brute force attempt, (2) scan for SSH (just because SPA is usually used to conceal SSH), (3) full scan if the SSH scan doesn't work. This starts to become extremely noisy to say the least. Even if full scans aren't also used, the volume of traffic just to attempt brute force operations by themselves is prohibitively huge.

Regardless of the attacker work flow, which service is concealed, or how heavily the attacker scans the target after every attempt, the brute force resistance offered by fwknop is fundamentally provided through the strength of cryptography. Getting past the authentication step alone would require breaking the 512-bit HMAC SHA-256 key (or forcing a hash collision against SHA-256), and fwknop even supports HMAC SHA-512 if one prefers that instead. Beyond this, the encryption key would also need to be brute-forced. For all intents and purposes it is not practical to brute force fwknop, and similar arguments apply to other SPA software.

3) Does PK/SPA add intolerable complexity?

Every security measure has some associated complexity. Firewalls add complexity. Encryption adds complexity. SSH itself adds complexity. If complexity were always the trump card against higher levels of security, then people would connect admin shells to sockets directly via netcat (or just run telnet) and not worry about encryption. People would not run firewalls because vanilla IP stacks without firewalling hooks would be simpler. Filesystem permissions overlays would be considered insecure because they add complexity. Obviously such viewpoints don't pass muster in the real world. The point is that using more complex code sometimes enables higher levels of security against widely understood threat models. For example, people use SSH and SSL because they want authentication and confidentiality over untrusted networks. Firewalls are used to reduce the attack surface that an adversary can easily communicate with. Application and/or filesystem layer policy controls are engineered to place restrictions on classes of users. The list continues.

This is not to say that complexity is not an important consideration - far from it. Rather, a real security benefit must be realized in order to justify increasing the complexity of a system. In the SPA community, we assert there is a security benefit afforded by passive, cryptographically strong service concealment. How would an attacker try to brute force user passwords via SSH when it is concealed by SPA? How would an attacker exploit even a zero-day vulnerability in a service protected by SPA? How would an attacker exploit a vulnerability in the SPA daemon itself when it is indistinguishable from a system that is running a default-drop firewall policy? (An attacker may interact with the SPA daemon, but it is more or less "by chance" [3].)

In the context of PK/SPA, the real issue is whether the complexity of the code that an attacker can interact with is more or less when PK/SPA is deployed. Anyone protecting networked services is probably already running a firewall, so the firewall usage by the PK/SPA software isn't adding to complexity that wasn't already there. Next, if PK/SPA is used to conceal multiple services (say, SSH, an IMAP daemon, and a webserver all at the same time), a would-be attacker cannot interact with any of those code bases without first getting past the PK/SPA daemon. It is a good bet that the complexity of the PK/SPA daemon is a lot less than the aggregate complexity of all three of those services if they were open to the world. Further, this may still be true even for a single daemon such as SSH as well. In essence, the effective complexity of code that an attacker can interact with may actually go down with PK/SPA deployed - that is, until the PK/SPA daemon can be circumvented (and hence a method for doing this becomes an important question for an attacker).

4) What happens if the PK/SPA daemon dies?

This is a solved problem. Process monitoring software has been around for decades. Many options exist for any OS on which a PK/SPA daemon is deployed. For example, on Ubuntu systems fwknopd is monitored by upstart. Having said this, fwknopd is extremely stable anyway so this feature is hardly ever needed in practice. Still, it is certainly important to ensure that PK/SPA usage does not cause a single point of failure, so using process monitoring software is a good idea.

Summary

There are other criticisms of SPA that are not included in this blog post, and certainly some of them are legitimate such as the fact that SPA requires a specialized client to access concealed services and the fact that "NAT piggybacking" is possible for users on the same network from which an SPA client is used when behind a NAT. However, these points don't generally rise to the level that they invalidate the SPA strategy. This blog post attempts to address those criticisms that could rise to this level were it not for the effort that has gone into solid SPA design by fwknop and other projects. More information on the design goals that guide fwknop can be found in the project tutorial.

In conclusion, SSL uses cryptography to provide authentication and confidentiality, Tor uses cryptography to provide anonymity, and SPA uses cryptography to conceal service existence. For those that assert there is no security value in the later strategy, it should consequently not be difficult to circumvent. To those in this camp, given the material in this post, please propose a method for breaking SPA.

[1] At least, a PK/SPA daemon is not discoverable by attackers who aren't already in a privileged position to sniff all traffic to and from the target. Clearly, most attackers - including password-guessing botnets - do not fall into this category.

[2] It is possible to weaken the security of fwknop SPA communications by not using --key-gen mode to generate random encryption and HMAC keys and thereby make them more susceptible to brute force attacks. However, this type of problems similarly affects other cryptographic software so it isn't unique to fwknop. And, even if a user doesn't use --key-gen mode, it is still not as easy to brute force fwknopd (which never confirms its existence to an attacker) as other software which an attacker can readily see is available to exploit.

[3] The security of fwknopd code itself is nonetheless quite important, and this is why the fwknop project uses static analysis provided by Coverity (and has a Coverity scan score of zero), the CLANG static analyzer, and also implements dynamic analysis with valgrind via a comprehensive test suite.

TCP Options and Detection of Masscan Port Scans

After Errata Security scanned port 22 across the entire Internet earlier this month, I thought I would go back through my iptables logs to see how the scan appeared against one of my systems. Errata Security published the IP they used for the scan as 71.6.151.167 so that it is easy to differentiate their scan from all of the other scans and probes:
[minastirith]# grep 71.6.151.167 /var/log/syslog | grep "DPT=22 "
Sep 12 21:19:15 minastirith kernel: [555953.034807] DROP IN=eth0 OUT= MAC=00:13:46:11:11:11:78:cd:11:6b:11:7e:11:00 SRC=71.6.151.167 DST=1.2.3.4 LEN=40 TOS=0x00 PREC=0x20 TTL=241 ID=17466 PROTO=TCP SPT=61000 DPT=22 WINDOW=0 RES=0x00 SYN URGP=0
Interestingly, the SYN packet that produced the log message above does not contain TCP options. The LOG rule in the iptables policy was built with the --log-tcp-options switch, and yet the OPT field for TCP options is not included. Looking through the Masscan sources, TCP SYN packets are created with the tcp_create_packet() function which does not appear to include code to set TCP options, and neither does the default template used for describing TCP packets. This is most likely done in order to maximize performance - not from the perspective of the sender since a static hard-coded TLV encoded buffer would have done nicely - but rather to minimize the time that scanned TCP stacks must spend processing the incoming SYN packets before a response is made. While this processing time is trivial for individual TCP connections, it would start to become substantial when trying to rapidly scan the entire IPv4 address space.

A consequence of this strategy is that SYN packets produced by Masscan look different on the wire from SYN packets produced by most operating systems (at least according to p0f), and they also differ from SYN scans produced by Nmap (which do include options as we'll see below). This is not to say that every SYN packet without options necessarily comes from Masscan. There are operating systems in the p0f signature set (such as Ultrix-4.4) that do not include options, and the Scapy project also seems not to set options by default when producing SYN scans like this. In addition it looks like Zmap also does not include TCP options in SYN scans. For reference, here are three iptables LOG messages for SYN packets produced by a standard TCP connect() call from an Ubuntu 12.04 system, an Nmap SYN (-sS) scan, and Scapy (source and destination IP's and MAC addresses have been obscured):
### TCP connect() SYN:
Sep 29 21:16:00 minastirith kernel: [171470.436701] DROP IN=eth0 OUT= MAC=00:13:46:11:11:11:78:cd:11:6b:11:7e:11:00 SRC=2.2.2.2 DST=1.2.3.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=15593 DF PROTO=TCP SPT=58884 DPT=12345 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A0CE97C070000000001030306)

### Nmap SYN (-sS):
Sep 29 21:16:12 minastirith kernel: [171482.094163] DROP IN=eth0 OUT= MAC=00:13:46:11:11:11:78:cd:11:6b:11:7e:11:00 SRC=2.2.2.2 DST=1.2.3.4 LEN=44 TOS=0x00 PREC=0x00 TTL=39 ID=26480 PROTO=TCP SPT=48271 DPT=12345 WINDOW=4096 RES=0x00 SYN URGP=0 OPT (020405B4)

### Scapy SYN via: sr1(IP(dst="1.2.3.4")/TCP(dport=12345,flags="S"))
Sep 29 21:35:15 minastirith kernel: [172625.207745] DROP IN=eth0 OUT= MAC=00:13:46:11:11:11:78:cd:11:6b:11:7e:11:00 SRC=2.2.2.2 DST=1.2.3.4 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=1 PROTO=TCP SPT=20 DPT=12345 WINDOW=8192 RES=0x00 SYN URGP=0
As a result, we can infer that SYN scans without options may originate from Masscan (the Scapy example not withstanding since Scapy's usage as a port scanner is a probably a lot less common than Masscan usage), and this will become more likely true over time as Masscan's popularity increases. (It is included as a tool leveraged by Rapid7's Sonar project for example.)

The upcoming 2.2.2 release of psad will include detection of scans that originate from Masscan, and if you check out the latest psad sources from git, this feature has already been added. To make this work, the iptables LOG rule needs to be instantiated with the --log-tcp-options switch, and a new psad.conf configuration variable "EXPECT_TCP_OPTIONS" has been added to assist. When looking for Masscan SYN scans, psad requires at least one TCP options field to be populated within a LOG message (so that it knows --log-tcp-options has been set for at least some logged traffic), and after seeing this then subsequent SYN packets with no options are attributed to Masscan traffic. All usual psad threshold variables continue to apply however, so (by default) a single Masscan SYN packet will not trigger a psad alert. Masscan detection can be disabled altogether by setting EXPECT_TCP_OPTIONS to "N", and this will not affect any other psad detection techniques such as passive OS fingerprinting, etc.

Design of a New 'xbits' Cross-Stream IDS Keyword

Snort and Suricata xbits implementation In the previous blog post a proposal was made for a new Snort and Suricata keyword "xbits" for cross-stream signature matching. This post had little discussion of implementation tradeoffs, and some have reacted to the blog post by saying that it is difficult to properly design and implement the type of cross-stream state tracking that would be necessary for xbits to work. I agree. However, the initial xbits proposal did not assume that such an implementation would be easy or straightforward, and this blog post will attempt to illustrate where some of the pitfalls are likely to be. In the end, I'm confident that is possible to develop something similar to xbits. Victor Julien, lead developer of Suricata, commented on my Google+ posting on the xbits keyword to state that Suricata has been considering implementing something similar to xbits for a while.

First, before diving into xbits itself, I would argue that there is already precedent in IDS/IPS engines for detecting an important class of communications that cross multiple transport layer "conversations" (using this term loosely for a moment): port scans and sweeps. Detecting such traffic is mostly about setting thresholds on various things such as the number of ports and IP addresses that are contacted within a given period of time, prioritizing on sets of ports that are usually associated with important services (with sweeps for certain ports sometimes spiking after a new vulnerability is discovered), and differentiating TCP flags that are used (a TCP FIN scan looks a lot different than a connect() scan and indicates some things about the adversary such as privileged OS access). By definition, the raw ability to differentiate scans and sweeps vs. normal traffic requires the capability of keeping some state across transport layer conversations. It just so happens that Snort and Suricata track this state within dedicated preprocessors and do not also expose port scan detection configuration in the signature language itself. By contrast, other preprocessors do offer signature language interfaces such as stream5 with the flow keyword. To be clear, I'm not at all advocating that configuration aspects of the sfPortscan preprocessor actually belong in the signature language (that would be unnatural to say the least) - I'm merely making the point that the idea of maintaining some state across transport layer conversations is something that Snort and Suricata already do. So, in this area at least, such a concept is not foreign.

Traffic Visibility

Now, in terms of factors that would affect an xbits implementation, it should be noted that not every IDS/IPS necessarily has a global view of all traffic on a given network. This can be the result of several different factors that depend not only on the physical hardware, but also how the IDS/IPS itself is developed (multi-threaded or not), and configured. Starting with the hardware, I've seen major IDS deployments that require multiple IDS appliances working together in order to inspect all of the network traffic. Essentially the IDS appliances form a cluster of systems (not in the HPC sense) where portions of the traffic are split across each appliance with a device such as a Gigamon tap. This allows each IDS appliance to handle a fraction of the traffic that it would otherwise have had to inspect, and this in turn enables the cluster as a whole to scale to massive amounts of network traffic. But, this also means portions of the traffic are physically separated from one IDS to the next, and therefore xbits on a single IDS can only apply to the set of transport layer conversations that actually traverse it. So, is there an opportunity for an attacker to evade xbits when deployed in this fashion? Sure, if the attacker knows how the traffic splitting is done, then attacks could most likely be sent against systems in ways that nullify xbits criteria just by using different source networks to force each individual IDS appliance into having only a limited view of a cross-stream attack. There are potential evasions at every level, but many attackers are not going to have access to such traffic splitting details.

Other IDS/IPS architectures such as those that rely on network processors or specialized packet acquisition hardware can also result in limited traffic visibility. Some organizations run multiple Snort processes on a single appliance, and have a network processor split packet data based on IP network ranges or transport layer port number across the Snort instances. Once again, each Snort process has a limited few of the traffic. Similarly, in a multi-threaded IDS/IPS such as Suricata, each thread may be tasked with processing a portion of the total network traffic, and this can result in a limited view within software even if there is no hardware enforced traffic splitting mechanism.

The Stream Preprocessor

Modifying the stream preprocessor to handle the xbits keyword could be tricky. By its nature, xbits would force the stream preprocessor to consider information that is not derived from single transport layer conversations, so locking issues against a global xbits tracking data structure in a multi-threaded context would become important. Also, it would be nice to not place onerous restrictions on xbits such as requiring that a connection close before a set xbit can be tested within a different connection. My guess is that stream preprocessor modifications have previously been a barrier to implementing something like xbits.

xbits Design

Given all of the above, what would be the ideal xbits design? Stepping back for a moment, when any signature language feature is implemented in an IDS, what should be the primary goal? Better attack detection. Performance is certainly a consideration too, and performance features sometimes bleed into the signature language (see the fast_pattern keyword for example), but usually a new signature language feature is added because it enables better detection of threats at acceptable performance levels. Further, some features of the language are important enough to expend lots of CPU cycles and consume precious memory anyway because the detection accuracy would be significantly harmed without them - see the pcre keyword for example. So, we should strive for the ideal xbits design from a detection perspective and let performance and other tradeoffs take place where they must:

  1. Allow an xbit to be set on one transport layer conversation and inspected in a different conversation before the first is closed.
  2. Allow an xbit to be set on a conversation that involves one IP protocol, and tested in a conversation that involves a different IP protocol. E.g. set a bit on a UDP flow and test it in a subsequent TCP connection.
  3. Interface with the current stream preprocessor to allow the setting and testing of xbits to take advantage of existing connection tracking capabilities. This most likely can be implemented as an extension to stream5 without requiring a wholly new preprocessor.
  4. For multi-threaded intrusion detection engines such as Suricata, some of the same tradeoffs that allow port scan detection to apply across threads could be used for xbits. Ideally, xbits would not be limited to traffic that is seen within a single thread.