petalert.pl
-----------------------

This is a snmptrapd handler script that alerts when Platform Event Traps
(PET) occur. It was written because traptoemail, distributed with
net-snmp-5.3.2.2, cannot handle multi-line hex strings and is restricted
to email alerts.

This script operates in two modes, traphandle or embperl. In traphandle
mode, it concatenates the quoted hex string into one long line, then builds
structures that resemble embperl mode. Both modes then invoke the helper
decoder, ipmi-pet(8) from FreeIPMI, parse its output, and alert in the
configured way, such as email, Nagios external command, NSCA, etc.

1. REQUIREMENTS

freeipmi-1.1.1 or above is required for the script to function. Both
FreeIPMI and the script assume a Unix-like system, notably GNU/Linux;
Windows is not supported as of this writing, Dec 13, 2011.

Net-SNMP 5.3.2.2 or above is required for a complete alerting solution.
Only snmptrapd is actually involved; it acts as the trap receiver.

If you prefer running the script as an embedded perl handler, your version
of Net-SNMP must have embedded perl support enabled; see "Embedded Perl
Support" in snmpd.conf(8) for more information. It is usually enabled, and
you can verify with the following command:

    # net-snmp-config --configure-options | tr ' ' '\n' | grep perl
    '--enable-embedded-perl'

If you prefer invoking the handler directly rather than through perl(1),
make sure the script itself has execute permission. Both cases require a
working Perl installation, preferably Perl-5.8.8.

If you prefer the built-in email alerting, make sure Net::SMTP is installed.

If you prefer the Nagios monitoring system, make sure the Nagios process
and snmptrapd are on the same host. Usually you need not worry about write
permission on the Nagios external command file, because the handler is
invoked as root by snmptrapd. If that is not the case, you need to ensure
write permission on the command file.
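The requirement checks above can be gathered into a few small shell helpers.
This is only a sketch: the function names are hypothetical and are not part
of petalert.pl, and the checks assume that net-snmp-config and perl are on
your PATH.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; none of these names come from petalert.pl.

# Succeeds if the given configure-options text mentions embedded perl.
# Pass it the output of: net-snmp-config --configure-options
has_embedded_perl() {
    case "$1" in
        *--enable-embedded-perl*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if perl can load the named module, e.g. Net::SMTP.
has_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Succeeds if the current user may write to the given path,
# e.g. the Nagios external command file.
can_write() {
    [ -w "$1" ]
}
```

For example, has_embedded_perl "$(net-snmp-config --configure-options)"
followed by has_perl_module Net::SMTP covers the embedded perl and email
cases, and can_write with the path of your Nagios command file covers the
case where the handler does not run as root.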

You might prefer other alerting methods; the bad news is that none are
implemented yet. Please drop me a mail, and I might take the time to add
plugin support.

The cautious may want firewall rules that allow only traffic from trusted
hosts.

2. CONFIGURATION

(Note that backslash-newline joins adjacent lines, so put them on one line.)

Put a line like this in your snmptrapd.conf file:

    traphandle .1.3.6.1.4.1.3183.1.1 /usr/bin/perl \
      /usr/share/doc/freeipmi/contrib/pet/petalert.pl --mode=traphandle \
      --alert=email --sdrcache SDRCONF -- -f FROM -s SMTPSERVER ADDRESSES

Or, if you prefer embedded perl,

    perl do "/usr/share/doc/freeipmi/contrib/pet/petalert.pl";
    perl IpmiPET::main(qw(--mode=embperl --trapoid=OID --sdrcache=SDRCONF \
      --alert=email -- -s SMTPSERVER -f FROM ADDRESSES));

where only --mode is required; see "petalert.pl -h".

Make sure execute permission is granted so that handlers can run, for
example,

    authCommunity execute COMMUNITY_STRING

See "ACCESS CONTROL" in snmptrapd.conf(8) for more information.

The bad news is that you have to use numeric representation, so in addition
add "-Of -On" to the snmptrapd options.

You have to enable PET on the IPMI nodes as well, including LAN access, PEF
alerting, community, alert policy and destination. You may use ipmi-config
from FreeIPMI to do the configuration (use --category to check out the core
and pef categories of configuration). See "IPMI NODES".

You might wish to set up PTR records for the IPMI nodes; otherwise snmptrapd
cannot report a hostname to traphandle, and the script falls back to using
the IP address.

2.1 ACKNOWLEDGE

Platform event traps travel over UDP, so you might worry about trap loss.
The IPMI specification allows the trap receiver to acknowledge the trap.
Use --ack to acknowledge the trap before alerting. You may need workarounds
for acknowledgement; see BUGS.

So in an acknowledge setup, the configuration might look like this:

    perl do "/usr/share/doc/freeipmi/contrib/pet/petalert.pl";
    perl IpmiPET::main(qw(--mode=embperl --trapoid=OID --sdrcache=SDRCONF \
      --ack -W malformedack \
      --alert=email -- -s SMTPSERVER -f FROM ADDRESSES));

2.2 NAGIOS INTEGRATION

The Nagios monitoring system can be integrated by writing passive check
results to its external command file. See ipminodes.cfg and check_rmcping
for related Nagios configuration. Assuming the Nagios process is local, use:

    perl do "/usr/share/doc/freeipmi/contrib/pet/petalert.pl";
    perl IpmiPET::main(qw(--mode=embperl --trapoid=OID --sdrcache=SDRCONF \
      --alert=nagios -- -H short -S PET NAGIOS_COMMAND_FILE));

where "-H short" means that if 10.2.3.4 resolves to foo.example.com, the
Nagios passive check gets foo as the host; use "-H fqdn" to pass
foo.example.com to Nagios. In addition, "-S PET" sets the service
description.

If the Nagios process is on a remote host, you normally turn to NSCA, which
consists of the NSCA daemon on the Nagios host and the send_nsca client
program. To alert by send_nsca,

    perl do "/usr/share/doc/freeipmi/contrib/pet/petalert.pl";
    perl IpmiPET::main(qw(--mode=embperl --trapoid=OID --sdrcache=SDRCONF \
      --alert=nsca -- --prog /usr/bin/send_nsca -H short -S PET \
      -- -H NAGIOS_HOST -c SEND_NSCA_CONF));

Notice that the unattached -- appears twice in the configuration line,
separating three stages of argument processing, namely generic args,
alert-specific args, and external helper args.

3. SDR CACHE FILE MAPPING

Notice that the underlying helper program ipmi-pet(8) normally depends on an
sdr cache file, either preinitialized or created on demand. If no credential
is supplied, ipmi-pet(8) simply assumes localhost and creates an sdr cache,
which is usually ~/.freeipmi/sdr-cache/sdr-cache-.localhost. You may wish to
supply preinitialized ones; in that case use -c sdrmapping.conf to associate
them with IPMI nodes.

The sdr cache config syntax is: every unindented line starts an sdr cache
file, followed by any number of indented lines of IPMI nodes. Every IPMI
node line may hold multiple nodes delimited by whitespace. Comments follow
shell style, trailing whitespace is trimmed, and empty lines are skipped.
For example,

    |/path/to/sdr-cache-file-1
    |  10.2.3.10               # comment
    |
    |/path/to/sdr-cache-file-2
    |  10.2.3.4                # one node
    |  10.2.3.5 10.2.3.6       # two nodes
    |  10.2.3.[7-9]            # three nodes in range form
    |
    ^-- this is the beginning of lines

3.1 SDR CACHE INITIALIZATION

The sdr cache file can be initialized by ipmi-sel(8) with the
--sdr-cache-file option.

    # ipmi-sel -h 10.2.3.4 -u root -P --sdr-cache-file=/path/to/sdr-cache-file-X
    Password:
    Caching SDR repository information: /path/to/sdr-cache-file-X
    Caching SDR record 125 of 125 (current record ID 125)
    ID | Date        | Time     | Name | Type                   | Event
    1  | Dec-12-2011 | 16:41:51 | SEL  | Event Logging Disabled | Log Area Reset/Cleared
    ...

4. IPMI NODES

For PETs to be generated, the IPMI nodes have to be configured, including
LAN access, PEF alerting, trap community, alert policy and destination. You
may use ipmi-config from FreeIPMI to do the configuration (use --category to
check out the core and pef categories of configuration).

However, before configuring the nodes and facing unexpected firmware issues,
you had better verify that the trap receiver end works well. Simply modify
the following example traphandle input to match your setup, then feed it to
stdin of petalert.pl like this, assuming you prefer email alerts:

    # perl petalert.pl -D :all --mode=traphandle --sdrcache SDRCONF \
        --alert=email -- -f FROM -s SMTPSERVER ADDRESSES <; $x=eval "@v"; print join(" ", $x->[0], @{$x->[1]})."\n"'
    [
      '/usr/sbin/ipmi-pet',
      [ '--pet-acknowledge', '-h', '10.2.3.4', '356096', '44', '45', '4C', '4C', ... ]
    ]
    Ctrl-D
    /usr/sbin/ipmi-pet --pet-acknowledge -h 10.2.3.4 356096 44 45 4C 4C ...

Then you can simply paste the command into the shell to simulate a manual
acknowledge. It appears that acknowledge requests without a preceding PET
are also accepted and answered as usual.

snmptrapd(8) itself can log traps to syslog, which requires log permission;
see "ACCESS CONTROL" in snmptrapd.conf for more information.

The NSCA daemon logs to syslog; set "debug=1" in nsca.cfg to get detailed
connection handling.

Nagios is also able to log to syslog; set "use_syslog=1" in nagios.cfg to
help debug alerts.

6. PET TRAFFIC

In a no-acknowledge setup, there should usually be only one packet per PET,
sent from the IPMI node to the trap receiver; however, a firmware defect was
spotted that results in additional traffic, see BUGS.

In an acknowledge setup, there should be three packets per event: one PET,
one PET acknowledge request from the trap receiver to the IPMI node, and one
PET acknowledge response in the other direction. More bugs were spotted, see
BUGS.

In either setup, packets can be captured like this:

    # tcpdump -i any -nn -vvv -s0 -w pet.pcap 'host 10.2.3.4 and udp'

Then you can browse the interactions with the help of Wireshark.

7. BUGS

The factory default filter rules of iDRAC Express on Dell PowerEdge R610 do
not match software-generated events. You need to create a catch-all filter
rule to report those events. Hardware-generated events, however, are not
subject to this limitation. To verify this, open the case to trigger a
hardware-generated intrusion event. Dell PowerEdge 1950 with BMC has a
similar problem. The difference is that the 1950 has 31 filter rules, so you
need not worry about overwriting an existing one.

Dell PowerEdge 1950 with BMC suffers from clock drift, notably in SEL
timestamps.
bmc-device(8) from FreeIPMI can be used to adjust the SEL time and the SDR
repository time:

    # bmc-device --set-sdr-repository-time=now
    # bmc-device --set-sel-time=now

Notice that 'now' refers to the current timestamp on the host where the
commands are issued. bmc-device(8) works out of band, so simply issue the
commands on a host whose clock is synchronized.

iDRAC Express on Dell PowerEdge R610 generates two traps per hardware event.
Notice that the session ids of the two traps differ, so they are distinct
traps rather than duplicates, even though the rest of the payload is
identical.

iDRAC Express on Dell PowerEdge R610 produces malformed PET acknowledge
responses. In this case, ipmi-pet exits with the timeout error
"ipmi_cmd_pet_acknowledge: message timeout". You may use '-W malformedack',
which is simply passed through, to instruct the underlying helper
ipmi-pet(8) to disable such detection and return immediately. Timeouts hurt
snmptrapd because a slow handler holds up the main loop. To discover
potentially time-consuming cases, use "-D perf" and observe the log.

Some DNS servers return "localhost" for private IP addresses rather than
NXDOMAIN; in this case, snmptrapd(8) passes "localhost" as the resolved
hostname to petalert.pl, which gets confused. You had better switch to a
correctly configured DNS server, or contact the administrator to solve the
problem.

Kaiwang Chen kaiwang.chen@gmail.com