NAME
salinfo_decode − decode Itanium SAL error records
SYNOPSIS
salinfo_decode [−d] [−i pct] [−s pct] [−l limit] [−T filename] −t type −D directory
salinfo_decode [−d] filename
DESCRIPTION
salinfo_decode extracts and decodes CMC/CPE/MCA/INIT records from SAL. It can decode a saved record from a file, or it can request records from the kernel, decode them, save the raw and decoded records, and clear them from SAL.
OPTIONS
−d
Each −d increments the debug level.
−i pct
A persistent error such as bad memory can generate a large number of records. To prevent such an error from using all the inodes on the filesystem containing the SAL logs, specify −i pct. If the percentage of inodes used on that filesystem is above pct then salinfo_decode will stop writing records. The records are still cleared from SAL, so they are lost permanently. A count of the number of lost records is kept and written to syslog occasionally. This option can only be used with −D. Note: Not all filesystems have a fixed number of inodes; some add new inodes dynamically as required. Using −i pct on such filesystems makes little sense.
−s pct
Like −i pct, −s pct will stop writing records if the percentage of space used on the SAL log filesystem is above pct.
−l limit
If more than limit records of this type occur within a minute then the additional records are dropped. A count of the number of lost records is kept and written to syslog occasionally. This option can only be used with −D. Note: The limit should be larger than the number of cpus in your system, to cope with reading the saved SAL records at boot, when they are all processed at approximately the same time.
−T filename
For each record that is written to −D, write a trigger line to filename. The trigger line contains the base filename in the first field, followed by the options that were passed to salinfo_decode, with −t type and −D directory as the first two options. A post processing program can monitor the trigger file and perform any operation on the raw or decoded records, including erasing them. If the post processing program erases the records then it should erase the decoded record before the raw record, to avoid conflicts with the calculation of the filename suffix. If writing to filename would block then salinfo_decode discards the trigger line. A count of the number of lost triggers is kept and written to syslog occasionally. This option can only be used with −D.
−t type
Specifies the type of record to monitor. Must be one of "cmc", "cpe", "mca", or "init" in lower case. The type is used as the third qualifier to access /proc/sal/type/{event,data}.
−D directory
Specifies the directory where the raw records and the decoded text will be written. The raw record is written to directory/raw, the decoded text is written to directory/decoded. The filenames are constructed from the record timestamp (year, month, day, hours, minutes, seconds), the record type, the cpu number and a suffix starting at ’.0’ to separate multiple events with the same timestamp.
If either type or directory is specified, then both are required. If neither is supplied, then a filename must be supplied.
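The three throttling options (−i pct, −s pct and −l limit) can be pictured as a single keep-or-drop decision. The following Python sketch is an illustration only, not salinfo_decode's own code. The names DropPolicy and used_percentages are hypothetical, and several details are assumptions: the order of the checks, the use of a sliding 60 second window for the per-minute limit, and the use of f_bfree (rather than f_bavail) to compute the space used.

```python
import os
from collections import deque


def used_percentages(path):
    """Return (space_used_pct, inodes_used_pct) for the filesystem holding path."""
    st = os.statvfs(path)
    # f_blocks and f_files are totals; f_bfree and f_ffree are free counts.
    space = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks if st.f_blocks else 0.0
    # Filesystems without a fixed inode table may report f_files == 0.
    inodes = 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0
    return space, inodes


class DropPolicy:
    """Hypothetical model of the -i, -s and -l checks for one record type."""

    def __init__(self, inode_pct=None, space_pct=None, limit=None):
        self.inode_pct = inode_pct
        self.space_pct = space_pct
        self.limit = limit
        self.recent = deque()  # arrival times of records kept in the last minute
        self.dropped = {"inodes": 0, "space": 0, "limit": 0}

    def reason_to_drop(self, now, space_used, inodes_used):
        """Return None to keep the record, else the name of the limit exceeded."""
        if self.inode_pct is not None and inodes_used > self.inode_pct:
            reason = "inodes"
        elif self.space_pct is not None and space_used > self.space_pct:
            reason = "space"
        else:
            # Forget records that are at least 60 seconds old.
            while self.recent and now - self.recent[0] >= 60:
                self.recent.popleft()
            if self.limit is not None and len(self.recent) >= self.limit:
                reason = "limit"
            else:
                self.recent.append(now)
                return None
        self.dropped[reason] += 1  # counts that would later be reported to syslog
        return reason
```

A caller would obtain space_used and inodes_used from used_percentages(directory) and pass the current time in seconds as now. Keeping the filesystem query separate from the decision makes the decision easy to exercise without a nearly full disk.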
OPERATION
When type and directory are supplied, salinfo_decode will open /proc/sal/type/event and wait until the kernel supplies the number of a cpu that has a record of this type. salinfo_decode then :-
* Reads the record from the kernel.
* Extracts the timestamp.
* Generates a unique filename from the timestamp, type, and cpu number.
* If the raw record matches an entry in directory/raw then the new record is discarded, with a syslog entry listing the duplicate name, otherwise ...
* Writes the raw record to directory/raw.
* Decodes the raw record into directory/decoded, calling salinfo_decode_oem to decode any OEM data as required (only if salinfo_decode_oem exists).
* Clears the record from SAL.
* Waits for another record of this type.
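The filename and duplicate-detection steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the program's actual logic: record_stem and store_raw are hypothetical names, the exact layout of the filename is invented for the example (the text only says which fields it is built from), and the sketch assumes a duplicate is detected by comparing the new record with existing files that share the same stem.

```python
import os
from datetime import datetime


def record_stem(timestamp, rtype, cpu):
    """Build a name from timestamp, type and cpu (layout is an assumption)."""
    return f"{timestamp:%Y-%m-%d-%H_%M_%S}-cpu{cpu}-{rtype}"


def store_raw(raw_dir, stem, record):
    """Write record under the first free suffix, starting at .0.

    Returns the path written, or None if an identical record already exists.
    """
    suffix = 0
    while True:
        path = os.path.join(raw_dir, f"{stem}.{suffix}")
        try:
            with open(path, "rb") as existing:
                if existing.read() == record:
                    return None  # duplicate: the new record is discarded
        except FileNotFoundError:
            with open(path, "wb") as out:
                out.write(record)
            return path
        suffix += 1  # same timestamp, different content: try the next suffix
```

Because the suffix is chosen by looking at what already exists in the raw directory, removing raw files while decoded files with the same name remain could lead to a name being reused, which fits the advice given under −T to erase the decoded record before the raw record.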
When only a filename is specified, salinfo_decode assumes it is a raw record, reads it, and decodes it without invoking SAL.
The trigger filename is provided to make any post processing more efficient, by avoiding frequent polling in the post processing program. However, the post processing program should not assume that it receives a trigger line for every SAL record, because there are many cases where the trigger may be lost. These include any time that salinfo_decode is running but the post processing program is not, especially at boot, and any time that the post processing is slower than the rate at which SAL records are being generated. The post processor should therefore periodically scan the SAL log directories for any records that have not been processed yet, but this can be done every few hours instead of every few seconds.
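A post processing program needs to turn each trigger line back into the paths of the raw and decoded records. The Python sketch below shows one way this might be done. The names parse_trigger and erase are hypothetical, and the sketch assumes that fields are separated by whitespace, that the options appear with an ASCII hyphen ("-t", "-D"), and that the record files sit directly under directory/raw and directory/decoded.

```python
import os


def parse_trigger(line):
    """Split a trigger line into record paths and the remaining options."""
    fields = line.split()
    # Expected shape: <base filename> -t <type> -D <directory> [other options]
    if len(fields) < 5 or fields[1] != "-t" or fields[3] != "-D":
        raise ValueError(f"unrecognised trigger line: {line!r}")
    base, rtype, directory = fields[0], fields[2], fields[4]
    return {
        "type": rtype,
        "raw": os.path.join(directory, "raw", base),
        "decoded": os.path.join(directory, "decoded", base),
        "options": fields[5:],
    }


def erase(entry):
    """Remove both files, decoded first, as the -T description advises."""
    for key in ("decoded", "raw"):
        try:
            os.remove(entry[key])
        except FileNotFoundError:
            pass  # already removed, or never written
```

Since triggers can be lost, a program built on this would still need the periodic directory scan described above; the trigger only shortens the usual delay.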
SYSLOG MESSAGES
If salinfo_decode has to drop records for any reason, it records the number of dropped records and the reason that they were dropped. Every 30 minutes, or when an ALRM or HUP signal is received, salinfo_decode logs the number and reason of dropped records to syslog, but only if there have been any dropped records since the last time it checked. The syslog messages are of the form
salinfo_decode[<pid>]: <n> <type> records dropped since <date>,
followed by the number of records that were dropped due to restrictions set by -i pct, -s pct and -l limit. If -T was specified and any trigger records have been dropped (but the original record was processed) then the log entry reads
salinfo_decode[<pid>]: <n> <type> trigger records dropped since <date>
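A monitoring script might want to pick these messages out of the system log. The Python sketch below matches the two message forms shown above. It is an illustration only: parse_dropped is a hypothetical name, and because the format of the date and of the per-option breakdown is not specified here, everything after "since" is returned as uninterpreted text.

```python
import re

# Matches both the record form and the trigger form of the message.
DROPPED = re.compile(
    r"salinfo_decode\[(?P<pid>\d+)\]: (?P<count>\d+) (?P<type>cmc|cpe|mca|init) "
    r"(?P<trigger>trigger )?records dropped since (?P<rest>.*)"
)


def parse_dropped(line):
    """Return a dict describing a 'records dropped' message, or None."""
    m = DROPPED.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "count": int(m["count"]),
        "type": m["type"],
        "trigger": m["trigger"] is not None,  # True for lost trigger lines
        "rest": m["rest"],  # date and any breakdown, left uninterpreted
    }
```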
OEM DATA
The Itanium SAL specification defines the overall structure of SAL error records, but the records may contain platform-specific information. To decode platform-specific OEM data, salinfo_decode attempts to invoke a program called salinfo_decode_oem. If that program does not exist, or it exists but does not decode the OEM data, the OEM data is printed in hex. salinfo_decode_oem is invoked with the same file descriptors as salinfo_decode. In particular, both programs can access the file descriptor used to read the raw record.
Communication between salinfo_decode and salinfo_decode_oem is via a pair of pipes, which salinfo_decode_oem sees as file descriptors 0 and 1. To decode OEM data, salinfo_decode writes this data to salinfo_decode_oem via a pipe :-
* A line of "==== salinfo_decode_oem start ====".
* A variable number of lines of the form "key=value". salinfo_decode_oem must ignore keys that it does not recognise. At a minimum, values must be supplied for these keys :-
  - fd_data - file descriptor for accessing the raw record.
  - use_sal - 1 if the raw record is being accessed via SAL.
  - cpu - the cpu that the record belongs to, -1 for records that are not being read directly from SAL.
  - raw_length - the length of the raw record.
  - oem_section_offset - the offset of the section containing the OEM data to decode.
* A line of "==== salinfo_decode_oem record ====".
* The raw record, exactly raw_length bytes, followed by a newline.
* A line of "==== salinfo_decode_oem end ====".
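To make the layout of this message concrete, the following Python sketch assembles one from the pieces listed above. It is an illustration for testing a decoder, not the code salinfo_decode uses: build_request is a hypothetical name, the order of the keys is an assumption, and a real message may carry additional keys.

```python
def build_request(raw, fd_data, use_sal, cpu, oem_section_offset):
    """Return the bytes of one request for salinfo_decode_oem."""
    header = [
        "==== salinfo_decode_oem start ====",
        f"fd_data={fd_data}",
        f"use_sal={use_sal}",
        f"cpu={cpu}",
        f"raw_length={len(raw)}",  # the record is sent as exactly this many bytes
        f"oem_section_offset={oem_section_offset}",
        "==== salinfo_decode_oem record ====",
    ]
    return (
        ("\n".join(header) + "\n").encode("ascii")
        + raw          # binary data, may itself contain newline bytes
        + b"\n"        # the newline that follows the raw record
        + b"==== salinfo_decode_oem end ====\n"
    )
```

Because the raw record is binary and may contain newline bytes, a reader has to rely on raw_length rather than on line boundaries to find the end of the record.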
salinfo_decode_oem reads the above data on fd 0, decodes the OEM data if possible and writes the decoded output on its fd 1. Output from salinfo_decode_oem consists of :-
* A line of "==== salinfo_decode_oem start ====".
* The decoded OEM data. If salinfo_decode_oem cannot decode this OEM data, it returns no data here, and salinfo_decode will print the OEM data in hex.
* A line of "==== salinfo_decode_oem end ====".
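The exchange described above can be sketched from the salinfo_decode_oem side as follows. This is a minimal Python skeleton under stated assumptions, not a reference implementation: read_request and write_reply are hypothetical names, the sketch performs no actual OEM decoding, and it assumes the decoded output is plain text.

```python
START = b"==== salinfo_decode_oem start ====\n"
RECORD = b"==== salinfo_decode_oem record ====\n"
END = b"==== salinfo_decode_oem end ====\n"


def read_request(stream):
    """Read one request from a binary stream; return (keys, raw_record)."""
    if stream.readline() != START:
        raise ValueError("missing start marker")
    keys = {}
    while True:
        line = stream.readline()
        if line == RECORD:
            break
        if not line:
            raise ValueError("unexpected end of input")
        key, sep, value = line.rstrip(b"\n").partition(b"=")
        if sep:
            # All keys are stored; the caller simply ignores unknown ones.
            keys[key.decode("ascii")] = value.decode("ascii")
    length = int(keys["raw_length"])
    raw = stream.read(length)  # binary data, so read by length, not by line
    if len(raw) != length or stream.readline() != b"\n" or stream.readline() != END:
        raise ValueError("malformed record section")
    return keys, raw


def write_reply(stream, decoded_text=""):
    """Write the reply; an empty decoded_text means 'could not decode'."""
    if decoded_text and not decoded_text.endswith("\n"):
        decoded_text += "\n"
    stream.write(START)
    stream.write(decoded_text.encode("ascii", "replace"))
    stream.write(END)
    stream.flush()
```

A helper built on this would call read_request on the binary form of its fd 0 (sys.stdin.buffer in Python) and write_reply on the binary form of its fd 1 (sys.stdout.buffer). Replying with only the start and end markers leaves salinfo_decode to print the OEM data in hex.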