cipherdyne.org

Michael Rash, Security Researcher



Programming

RPM and a perl.req Heredoc Bug

It looks like the Red Hat Package Manager (RPM) has a bug where, in some corner cases, the dependencies of a perl program are not properly resolved. This happens whenever the perl program contains a heredoc in which a line of the heredoc text begins with either the word "use" or "require" and is followed by any other word. Both "use" and "require" are perl import statements, and the /usr/lib/rpm/perl.req script bundled with the RPM software does not make an exception for these words when they appear within heredoc text. The end result is that whatever word follows either "use" or "require" is interpreted as a module dependency.

Here is an example of a heredoc section within a perl program that exposes the RPM dependency bug with line numbers included (this is derived from a modified version of the usage() text in the psad project, but please note that this has been fixed in psad; the example is just to show how to reproduce the bug):
1    print <<_HELP_;
2
3 psad: the Port Scan Attack Detector
4
5 Usage: psad [options]
6
7   --packets <number>      - Specify number of packets to
8                                   use in benchmark test (default is
9                                   10,000).
10  -U,  --USR1                   - Send a running psad process a USR1
11                                  signal (generates a dump of psad
12                                  data structures on STDOUT).
13
14 _HELP_
Note that line 8 above begins with "use in benchmark" (ignoring the line number and the leading whitespace). The word after "use" is "in", and perl.req interprets it as a dependency, as illustrated below when trying to install a psad RPM built with the heredoc text above:
[/usr/src/redhat/RPMS/i386]# rpm -ivvh psad-2.1.1-1.i386.rpm
D: package psad-2.1.1-1.i386 has unsatisfied Requires: perl(in)
D: Requires: perl(strict) YES (db provides)
D: Requires: perl(vars) YES (db provides)
D: Requires: perl(warnings) YES (db provides)
D: Requires: rpmlib(CompressedFileNames) <= 3.0.4-1 YES (rpmlib provides)
D: Requires: rpmlib(PayloadFilesHavePrefix) <= 4.0-1 YES (rpmlib provides)
D: Requires: rpmlib(VersionedDependencies) <= 3.0.3-1 YES (rpmlib provides)
D: Requires: rtld(GNU_HASH) YES (db provides)
D: opening db index /var/lib/rpm/Conflictname rdonly mode=0x0
D: closed db index /var/lib/rpm/Pubkeys
D: closed db index /var/lib/rpm/Depends
D: closed db index /var/lib/rpm/Conflictname
D: closed db index /var/lib/rpm/Providename
D: closed db index /var/lib/rpm/Basenames
D: closed db index /var/lib/rpm/Packages
D: closed db environment /var/lib/rpm/Packages
error: Failed dependencies:
perl(in) is needed by psad-2.1.1-1.i386
There is no "in" module, so it's no wonder the RPM installation fails. Unfortunately, this problem does not manifest itself when an RPM is built - only when it is installed.

The heredoc functionality is handy in perl, and allows a multi-line block of text to be faithfully reproduced without having to specify syntactic details such as newline characters in normal print statements. With a bug such as the one illustrated above, it can be hard to track down why certain RPMs do not install properly, since heredoc text can be complex and verbose. It is usually easy to restructure the text so that neither "use" nor "require" begins a line, but only if you know that this is the fundamental source of the problem.
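
The failure mode can be modeled in a few lines of C. The sketch below is a simplified illustration and not the actual perl.req logic (perl.req is itself a perl script): import_word() and scan_deps() are hypothetical names, and the heredoc detection is an assumption that only recognizes bare terminators of the form <<TAG, as in the usage() text above.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* If `line` (ignoring leading whitespace) starts with "use" or "require"
 * followed by whitespace and another word, copy that word into `out` and
 * return 1; otherwise return 0. This models the naive matching described
 * in the text; it is not the real perl.req code. */
static int import_word(const char *line, char *out, size_t outlen)
{
    static const char *keywords[] = { "use", "require" };
    size_t i, k, n;

    while (isspace((unsigned char)*line))
        line++;
    for (k = 0; k < 2; k++) {
        n = strlen(keywords[k]);
        if (strncmp(line, keywords[k], n) != 0
                || !isspace((unsigned char)line[n]))
            continue;
        line += n;
        while (isspace((unsigned char)*line))
            line++;
        /* Copy the module-like word (letters, digits, '_' and ':'). */
        for (i = 0; i + 1 < outlen && (isalnum((unsigned char)line[i])
                || line[i] == '_' || line[i] == ':'); i++)
            out[i] = line[i];
        out[i] = '\0';
        return i > 0;
    }
    return 0;
}

/* Scan `nlines` source lines (without trailing newlines) and print each
 * detected dependency. When `skip_heredocs` is nonzero, lines between a
 * "<<TAG" opener and the terminating TAG line are ignored. Returns the
 * number of dependencies found. */
static int scan_deps(const char *lines[], size_t nlines, int skip_heredocs)
{
    char tag[64] = "", word[64];
    int found = 0;
    size_t i;

    for (i = 0; i < nlines; i++) {
        if (tag[0] != '\0') {               /* inside a heredoc body */
            if (strcmp(lines[i], tag) == 0)
                tag[0] = '\0';              /* terminator reached */
            continue;
        }
        if (import_word(lines[i], word, sizeof(word))) {
            printf("perl(%s)\n", word);
            found++;
        }
        if (skip_heredocs) {
            /* Assumption: any "<<" followed by a bare word opens a heredoc. */
            const char *p = strstr(lines[i], "<<");
            if (p != NULL && sscanf(p + 2, "%63[A-Za-z0-9_]", tag) != 1)
                tag[0] = '\0';
        }
    }
    return found;
}
```

Calling scan_deps() with skip_heredocs set to 0 on a program that contains the usage() text reports perl(in) alongside the genuine imports, while passing 1 suppresses it. A real fix would also need to handle quoted terminators and the << shift operator, which this sketch ignores.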

Chances are that the RPM maintainers are already aware of this bug, but I will submit it to be sure that it hasn't slipped through the cracks.

Trailing Spaces and iptables LOG Prefixes

The iptables firewall offers verbose logging capabilities that generate syslog messages for IP packets, and these log messages include the values of nearly every interesting field in both the network and transport layer headers. The iptables logging function is implemented by two pieces of code: one is the iptables userland binary that accepts a rule specified by the user with the LOG target, and the other is the in-kernel code that generates the appropriate klogd message when a packet matches a LOG rule within the kernel. The userland code that handles LOG rules lives in the file iptables/extensions/libipt_LOG.c in the iptables sources, and the kernel code that implements the LOG target lives in the file linux/net/ipv4/netfilter/ipt_LOG.c in the kernel sources.

A useful feature is that iptables LOG rules accept up to a 29-character string that is set by the user in order to add descriptive information to a log message when a packet triggers a LOG match. This is used by many open source projects to make iptables logs more effective when packets are logged. For example, fwsnort creates log prefixes such as "[1] REJ SID2180 ESTAB" when the Snort rule ID 2180 "P2P BitTorrent announce request" is detected by the fwsnort iptables policy. This prefix contains the rule number (1 in this case), shows that the TCP connection was terminated with a RST (indicated by the REJ string), and the original signature match took place over an established TCP connection (indicated by ESTAB).

This is all fine, but there is one minor wrinkle that affects many open source projects based around iptables: the LOG target does not enforce a trailing space character on log prefixes. This results in an unfortunate situation where the user can cause iptables to create log messages that break its own notion of the tokens that define the shape of the log messages. For example, here is an iptables rule along with its resulting log message when a TCP SYN packet is sent to port 5001:
# iptables -I INPUT 1 -p tcp --dport 5001 -j LOG --log-prefix "TEST"
Aug 28 22:04:43 minastirith kernel: TESTIN=eth0 OUT= MAC=00:13:d3:38:b6:e4:00:13:46:c2:60:44:08:00 SRC=192.168.10.3 DST=192.168.10.1 LEN=60 TOS=0x10 PREC=0x00 TTL=63 ID=56577 DF PROTO=TCP SPT=53479 DPT=5001 WINDOW=5840 RES=0x00 SYN URGP=0
Note the "TESTIN=eth0" string above. This string only exists because neither the ipt_LOG.c code distributed with the Linux kernel nor the libipt_LOG.c code distributed with iptables adds a trailing space after the "TEST" string. If the rule had been built with --log-prefix "TEST " (note the trailing space), then the log message would not break its own syntax.

Patches to enforce a trailing space have been proposed before on the Netfilter development list, but they have not been accepted. The reason given is that the Netfilter maintainers don't want to break any parsers that are already coded to handle log prefixes without the space.

I understand this concern, but I respectfully submit that most major open source projects that build iptables policies add in the space already, and any parser that is worth its salt would therefore have to handle both cases. That is, parsers have to handle log prefixes that contain the trailing space and prefixes that omit it. In addition, all three of the shorewall, ipcop, and fwbuilder projects build in the trailing space to the log prefixes they generate, so parsers are already more likely to handle prefixes with the space than without it because these are popular projects for building iptables policies.
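
As a rough illustration of a parser that tolerates both forms, the C sketch below pulls the prefix out of a log line and strips any trailing spaces, so that "TEST" and "TEST " compare equal. The function name extract_log_prefix() is hypothetical, and the approach rests on two assumptions: the prefix sits between "kernel: " and the first "IN=" token, and the prefix itself never contains "IN=".

```c
#include <string.h>

/* Copy the user-supplied log prefix from an iptables syslog line into
 * `out`, with trailing spaces removed. Returns 0 on success, or -1 if the
 * line does not look like an iptables log message. */
static int extract_log_prefix(const char *line, char *out, size_t outlen)
{
    static const char marker[] = "kernel: ";
    const char *start = strstr(line, marker);
    const char *end;
    size_t len;

    if (start == NULL || outlen == 0)
        return -1;
    start += sizeof(marker) - 1;        /* skip past "kernel: " */
    end = strstr(start, "IN=");         /* first field emitted by LOG */
    if (end == NULL)
        return -1;
    while (end > start && end[-1] == ' ')
        end--;                          /* drop the optional trailing space */
    len = (size_t)(end - start);
    if (len >= outlen)
        len = outlen - 1;               /* truncate rather than overflow */
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Given the "TESTIN=eth0" message shown earlier, this yields "TEST", and it yields the same result for a message that contains "TEST IN=eth0".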

From now on, I will maintain a patch as a part of the psad project that modifies the iptables binary such that if the last character in a log prefix is not a space, then the code will add one automatically. A similar strategy could be implemented in the kernel, but it seems that a less invasive and more easily deployed solution is to patch the iptables userland binary. A patch that implements this for iptables-1.3.8 has been checked into the psad source tree, and you can download it here. You can apply it as follows, and then recompile and install iptables:
$ cd iptables-1.3.8
$ wget -O iptables-1.3.8_LOG_prefix_space.patch http://trac.cipherdyne.org/trac/psad/browser/psad/trunk/patches/iptables-1.3.8_LOG_prefix_space.patch?format=txt
$ patch -p0 < iptables-1.3.8_LOG_prefix_space.patch
patching file extensions/libipt_LOG.c
With this patch, all of your existing iptables scripts will function exactly as they always have; no additional space is added for any iptables log prefix that already has a trailing space - only those that are missing the space will be changed. It is my hope that the Netfilter project will incorporate this patch into iptables.
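
The idea behind the patch can be sketched in C as follows. This is not the patch itself: ensure_trailing_space() and LOG_PREFIX_MAX are hypothetical names, and the handling of a prefix that already fills all 29 characters is an assumption (the sketch simply leaves such a prefix untouched).

```c
#include <string.h>

#define LOG_PREFIX_MAX 29   /* maximum prefix length cited in the text */

/* Append a single space to `prefix` if it does not already end with one.
 * `prefix` must point to a buffer of at least LOG_PREFIX_MAX + 1 bytes.
 * Returns 1 if a space was added, 0 if nothing needed to change, and -1 if
 * the prefix is already at the maximum length so no space fits. */
static int ensure_trailing_space(char *prefix)
{
    size_t len = strlen(prefix);

    if (len == 0 || prefix[len - 1] == ' ')
        return 0;                   /* empty, or already ends in a space */
    if (len >= LOG_PREFIX_MAX)
        return -1;                  /* no room left for the extra space */
    prefix[len] = ' ';
    prefix[len + 1] = '\0';
    return 1;
}
```

Running this on "TEST" produces "TEST ", and running it a second time changes nothing, which matches the behavior described above for prefixes that already carry the space.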

Profiling psad with Devel::DProf

The Devel::DProf perl module is a code profiler that builds detailed information about a piece of perl code at run time. This handy tool allows the relative time perl devotes to each subroutine to be recorded, and this information is valuable in the optimization process. Using Devel::DProf it is easy to determine the sections of code where psad spends most of its time. Because psad is written to detect as many network attacks as possible via iptables log messages, one might expect that psad devotes significant processing time to signature matching. Let us verify this by using the --Benchmark mode, which creates and sends 10,000 iptables log messages through the psad parsing and signature matching routines like so (note that this is against psad-2.0.4):
# perl -d:DProf psad --Benchmark
Mon Feb 19 00:38:06 2007 [+] Entering benchmark mode.
Mon Feb 19 00:38:06 2007 [+] The --packets command line option was not specified.
Mon Feb 19 00:38:06 2007 [+] Defaulting to a 10,000 packet test.
Mon Feb 19 00:38:06 2007 [+] Creating packet array.
Mon Feb 19 00:38:06 2007 [+] check_scan()
Mon Feb 19 00:38:24 2007 [+] Err packets: -1.
Mon Feb 19 00:38:24 2007 [+] Packet creation and processing time: 18 sec.
Mon Feb 19 00:38:24 2007 [+] Exiting benchmark mode.
Now, a new file tmon.out is created in the local directory, and it can be examined using the dprofpp utility which displays perl code profile data:
# dprofpp tmon.out
Total Elapsed Time = 12.98063 Seconds
  User+System Time = 11.83063 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 24.9   2.950 10.806  10001   0.0003 0.0011  main::match_sigs
 11.2   1.329  1.329 501050   0.0000 0.0000  main::check_sig_int_range
 11.1   1.320  1.320 160016   0.0000 0.0000  Net::IPv4Addr::ipv4_chkip
 9.13   1.080  1.110  10001   0.0001 0.0001  main::parse_NF_pkt_str
 9.04   1.070  2.390 160016   0.0000 0.0000  Net::IPv4Addr::ipv4_parse
 8.71   1.030  1.960  80008   0.0000 0.0000  Net::IPv4Addr::ipv4_network
 8.28   0.980 13.104      1   0.9799 13.104  main::check_scan
 6.25   0.740  4.470 110011   0.0000 0.0000  main::check_sig_ip
 6.09   0.720  3.387  20127   0.0000 0.0002  main::match_snort_keywords
 5.91   0.699  1.929 150140   0.0000 0.0000  \
main::match_snort_ip_keywords
 3.71   0.439  0.738  80133   0.0000 0.0000  \
main::match_snort_tcp_keywords
 2.62   0.310  3.730  40004   0.0000 0.0001  \
Net::IPv4Addr::ipv4_in_network
 1.69   0.200  0.200 110136   0.0000 0.0000  main::check_sig_ipopts
 1.44   0.170  0.180      1   0.1700 0.1799  main::import_snort_rules
 1.18   0.140  0.140  20002   0.0000 0.0000  main::check_range
From the above output it is easy to see that the match_sigs() function takes the largest percentage of processing time (24.9%) of any psad function. This function makes a good candidate for any optimization efforts. The next two functions, check_sig_int_range() and Net::IPv4Addr::ipv4_chkip() (from the Net::IPv4Addr perl module), are dedicated to checking signature port numbers and IP addresses against iptables log messages. The parse_NF_pkt_str() function is the main routine for parsing iptables log messages and building a hash that describes each log message field. This routine takes over 9% of psad's processing time, so optimizing the regular expressions psad uses to parse iptables log messages might provide a significant additional speedup.
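
The percentages in the table are consistent with dividing each subroutine's seconds by the User+System total. The short C helper below (percent_of_total() is a hypothetical name, not part of dprofpp) reproduces that arithmetic for match_sigs() using the figures printed above.

```c
/* Share of the total CPU time, as a percentage.
 * `seconds` is a value from the ExclSec or CumulS column and
 * `user_plus_system` is the "User+System Time" line of the report. */
static double percent_of_total(double seconds, double user_plus_system)
{
    return 100.0 * seconds / user_plus_system;
}
```

With the numbers from this run, percent_of_total(2.950, 11.83063) comes to roughly 24.9, matching the %Time column, while percent_of_total(10.806, 11.83063) comes to roughly 91. In other words, once the subroutines it calls are included, match_sigs() accounts for about nine tenths of the measured CPU time, which supports the expectation that signature matching dominates.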

Website Editing from the Perl Command Line

The cipherdyne.org website is completely maintained by a set of custom perl scripts for automatic editing of HTML pages, verification of page structure with XML::Simple, building release tarballs and RPM files of the cipherdyne.org software projects, and deploying the site to both staging and production web servers. Because a key requirement for this site is to run as few pieces of code as possible for security reasons, CGI scripts are kept to a minimum. All of the blog postings and pages dedicated to the four software projects available here (psad, fwknop, fwsnort, and gpgdir) are pure HTML that is edited and updated by a set of perl scripts. Note that there are excellent content management solutions out there such as Plone, Drupal, and Joomla for building complete database-driven websites for online communities and the like, but I do not yet need such a comprehensive solution.

Sometimes I run into a situation where I need to apply a global change to all of the cipherdyne.org HTML pages, and at the same time it is not necessary to update my perl scripts to account for the change because it affects content that only changes once. In this situation a single perl command with a judiciously chosen regular expression can do the trick.

For example, I include the meta tag revisit-after at the top of many of the HTML pages. Initially, I set the revisit time interval to 31 days like so:
<meta name="revisit-after" content="31 days" />
However, because the cipherdyne.org site usually changes significantly more rapidly than this, I decided to shorten the time interval to two days. With a single perl command combined with the find and xargs commands, every .html page in a directory tree (including all subdirectories) can be updated:
$ find /path/to/webroot -name '*.html' | xargs perl -p -i -e 's|revisit-after"\s+content="31\s+days"|revisit-after" content="2 days"|'
All of the cipherdyne.org pages are stored within a Subversion repository, so checking that the above command did the right thing is easy:
$ svn diff 14.html
Index: 14.html
===================================================================
--- 14.html (revision 590)
+++ 14.html (working copy)
@@ -4,7 +4,7 @@
 <head>
 <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
 <meta name="robots" content="all" />
-<meta name="revisit-after" content="31 days" />
+<meta name="revisit-after" content="2 days" />

 <title>Website Editing from the Perl Command Line</title>
One could argue that this information should be kept in a global header that is applied programmatically to all HTML pages, so that the "revisit-after" attribute would only have to be changed in one place. This is a good argument for this particular class of change, but let us consider a more complex example. At one point I decided that I would like all links within the body of a blog post (such as the one you are reading now) to be highlighted in bold characters. I had been doing this for some links by manually including "<b>" and "</b>" (bold) tags within the text portion of the standard "<a href=...>" tags. After deciding to make all links bold, a simple change to the stylesheet used by cipherdyne.org could accomplish this without the bold tags. However, all of the existing blog posts still contained links with the bold tags embedded, so I needed a quick way to remove them. Again, perl to the rescue:
$ find /path/to/webroot/ -name '*.html' | xargs perl -p -i -e 'undef $/; s|<a\s+href=(.*?)>\s*<b>\s*(.*?)\s*</b>\s*</a>|<a href=$1>$2</a>|gs'
This time things are a bit more complicated. Note the usage of "undef $/" so that each .html file is slurped into a single string, which lets the regex match links that span multiple lines. (Because -p reads one line before the code runs, the first line of the first file is still processed on its own; starting perl with -0777 slurps every file whole.) Also, note the usage of the "?" which turns the .* into a non-greedy match so that only the minimal text that qualifies as a valid link and its descriptive text is matched for each link.

A quick check against Subversion is in order to make sure the command worked properly:
$ svn diff 2003/10/01.html
Index: 2003/10/01.html
===================================================================
--- 2003/10/01.html (revision 589)
+++ 2003/10/01.html (working copy)
@@ -42,7 +42,7 @@
 <a class="link" href="/gpgdir/download/">download</a>
 <br/><br/>
 <div class="createdate">Posted by Michael Rash on 2003/10/01
-| <a href="/blog/categories/software-releases.html"><b> Software Releases</b></a>
+| <a href="/blog/categories/software-releases.html">Software Releases</a>
 </div>
 <span class="article_separator">&nbsp;</span>
 </div>

Verifying Proper Memory Handling in psad with Valgrind

I have started using the excellent Valgrind project to ensure the proper handling of heap-allocated memory within the portions of psad that are written in C (kmsgsd and psadwatchd). The following example is from psad-2.0.3; please note that this bug is fixed in psad-2.0.4. For reference, kmsgsd is the psad daemon that is responsible for collecting iptables log messages from a named pipe that is written to by syslog.

Assuming that valgrind is installed and the psad-2.0.3 sources are unpacked in the local directory, the first step is to compile kmsgsd and psadwatchd in debug mode. This is just so that neither program calls fork() to become a daemon:
$ cd psad-2.0.3
$ make debug
/usr/bin/gcc -Wall -g -DDEBUG kmsgsd.c psad_funcs.c strlcpy.c strlcat.c -o kmsgsd
/usr/bin/gcc -Wall -g -DDEBUG psadwatchd.c psad_funcs.c strlcpy.c strlcat.c -o psadwatchd
Now, let us run kmsgsd (as root) underneath valgrind (some output has been abbreviated):
# valgrind --leak-check=full ./kmsgsd
==424== Memcheck, a memory error detector.
==424== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==424== Using LibVEX rev 1715, library for dynamic binary translation.
==424== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==424== Using valgrind-3.2.2, dynamic binary instrumentation framework.
==424== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==424== For more details, rerun with: -v
==424==
[+] Entering DEBUG mode
[+] Firewall messages will be written to both STDOUT _and_ to fwdata.

[+] parsing config_file: /etc/psad/kmsgsd.conf

==424== Invalid write of size 1
==424== at 0x804A165: strlcpy (strlcpy.c:52)
==424== by 0x8049763: parse_fw_search_file (kmsgsd.c:407)
==424== by 0x8048C53: main (kmsgsd.c:116)
==424== Address 0x415E350 is 0 bytes after a block of size 8 alloc'd
==424== at 0x401C5F1: malloc (vg_replace_malloc.c:149)
==424== by 0x8049736: parse_fw_search_file (kmsgsd.c:405)
==424== by 0x8048C53: main (kmsgsd.c:116)
So, it looks like the call to strlcpy() at kmsgsd.c line 407 is causing the problem. Indeed this code is the culprit:
fw_msg_search[num_fw_search_strings] = (char *) malloc(strlen(tmp_fw_search_buf));
strlcpy(fw_msg_search[num_fw_search_strings], tmp_fw_search_buf, MAX_GEN_LEN);
The problem is that the strlcpy() function treats its size parameter (MAX_GEN_LEN in this case) as the size of the destination buffer. But the code above allocates the destination buffer with the length returned by strlen(), which is not MAX_GEN_LEN and also leaves no room for the terminating NUL. Hence the fix is as follows (note the addition of the +1 to accommodate the terminating NUL character, and the use of safe_malloc(), which is a malloc() wrapper function - this is coming in psad-2.0.5):
fw_msg_search[num_fw_search_strings] = (char *) safe_malloc(strlen(tmp_fw_search_buf)+1);
strlcpy(fw_msg_search[num_fw_search_strings], tmp_fw_search_buf, strlen(tmp_fw_search_buf)+1);
Because (to my knowledge) there has never been a crash in kmsgsd as a result of the bug above, it would have been hard to track down without a tool like valgrind.
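
One way to keep the allocation size and the copy bound from drifting apart again is to compute the size once and use it for both. The C sketch below illustrates that pattern; checked_malloc() is a hypothetical stand-in for psad's safe_malloc() (whose real implementation is not shown here), dup_search_string() is likewise a made-up name, and memcpy() is used in place of strlcpy() only so that the example compiles without psad's bundled strlcpy.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a malloc() wrapper such as safe_malloc():
 * exit with a message instead of returning NULL. */
static void *checked_malloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL) {
        fprintf(stderr, "[*] Could not allocate %lu bytes\n",
                (unsigned long)size);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Return a heap copy of `src`. The size is computed once as
 * strlen(src) + 1, so the terminating NUL always fits and the copy can
 * never write past the end of the allocation. */
static char *dup_search_string(const char *src)
{
    size_t size = strlen(src) + 1;
    char *dst = checked_malloc(size);

    memcpy(dst, src, size);     /* copies the string plus its NUL */
    return dst;
}
```

The caller owns the returned buffer and releases it with free(). Running a program built this way under valgrind --leak-check=full is a quick way to confirm that the invalid write no longer appears.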