blob: 26185520f216034fe689c56c878a1df7079e2130 [file] [log] [blame]
**
** rteval-parsed - the rteval XML report parser
**
The purpose of the daemon is to off load the web server from the heavy duty
work of parsing and processing the rteval XML reports. The XML-RPC server
will receive the reports and put the files in a queue directory on the
file system and register the the submission in the database. This will notify
the rteval-parsed that a new report has been received and it will start
processing that file independently of the web/XML-RPC server.
** Installing the software
!! Please install also the rteval-xmlrpc package and read the !!
!! README.xmlrpc file also for setting up and preparing the !!
!! database which the rteval-parserd program will be using. !!
!! This file will also contain information regardingupgrading !!
!! the database. !!
When installing this application from a binary package, like RPM
files on Fedora/RHEL based boxes, you should have the rteval-parserd
in your $PATH. Otherwise, when installing from sources, the configure
script defines the default paths.
** Configure rteval-parsed
When starting the rteval-parserd via the init.d script (or via the 'service'
command on RHEL/Fedora distributions) it will use the values configured in
/etc/sysconfig/rteval-parserd.
The available parameters are:
- NUM_THREADS
When this is not defined, the default behaviour is to use the number
of available CPU cores. The init.d script will detect this
automatically.
- LOG
This defines how logging will be done. See the rteval-parserd
arguments description further down in the document for more
information.
- LOGLEVEL
Defines how verbose the logging will be. See the rteval-parserd
arguments description further down in the document for more
information.
- CONFIGFILE
The default configuration file rteval-parserd will try to read is
/etc/rteval.conf. See the next paragraph for more information about
this file. This argument let you override the default config file.
- PIDFILE
Defines where the init.d script will put the PID file for the
rteval-parserd process. The default is /var/run/rteval-parserd.pid
This daemon uses the same configuration file as the rest of the rteval program
suite, /etc/rteval.conf. It will parse the section named 'xmlrpc_parser'.
The default values are:
- xsltpath: /usr/share/rteval
Defines where it can find the xmlparser.xsl XSLT template
- db_server: localhost
Which database server to connect to
- db_port: 5432
Which port to use for the database connection
- database: rteval
Which database to make use of.
- db_username: rtevparser
Which user name to use for the connection
- db_password: rtevaldb_parser
Which password to use for the authentication
- reportdir: /var/lib/rteval/report
Where to save the parsed reports
- threads: 4
Number of worker threads. This defines how many reports you will
process in parallel. The recommended number here is the number
of available CPU cores, as having a higher thread number often
punishes the performance. The default value is 4 when rteval-parserd
is started directly. When started via the init.d script, the default
is to start one thread per CPU core.
- max_report_size: 2097152
Maximum file size of reports which the parser will process. The
default value is 2MB. The value must be given in bytes. Remember
that this value is per thread, and that XML and XSLT processing can
be quite memory hungry. If this value is set too high or you have too
many worker threads, your system might become unresponsive for a while
and the parser might be killed by the kernel (OOM).
- measurement_tables: cyclic_statistics, cyclic_histogram, hwlatdetect_summary, hwlatdetect_samples
Declares which measurement results will be parsed and stored in the
database. These names are referring to table definitions in the
xmlparser.xsl XSLT template. The definitions in this template tells
rteval-parsed which data to extract from the rteval summary.xml report
and where and how to store it in the database.
** rteval-parserd arguments
-d | --daemon Run as a daemon
-l | --log <log dest> Where to put log data
-L | --log-level <verbosity> What to log
-f | --config <config file> Which configuration file to use
-t | --threads <num. threads> How many worker threads to start (def: 4)
-h | --help This help screen
- Configuration file
By default the program will look for /etc/rteval.conf. This can be
overridden by using --config <config file>.
- Logging
When the program is started as a daemon, it will log to syslog by default.
The default log level is 'info'. When not started as a daemon, all logging
will go to stderr by default.
The --log argument takes either 'destination' or a file name. Unknown
destinations are treated as filenames. Valid 'destinations' are:
stderr: - Log to stderr
stdout: - Log to stdout
syslog:[facility] - Log to syslog
<file name> - Log to given file
For syslog the default facility is 'daemon', but can be overridden by using
one of the following facility values:
daemon, user and local0 to local7
Log verbosity is set by the --log-level. The valid values here are:
emerg, emergency - Only log errors which causes the program to stop
alert - Incidents which needs immediate attention
crit, critical - Unexpected incidents which is not urgent
err, error - Parsing errors. Issues with input data
warn, warning - Incidents which may influence performance
notice - Less important warnings
info - General run information
debug - Detailed run information, incl. thread operation
- Threads
By default, the daemon will use five threads. One for the main threads which
processes the submission queue and notifies the working threads. The four
other threads are worker threads, which will process the received reports.
Each of the worker threads will have its own connection to the database. This
connection will be connected to the database as long as the daemon is running.
It is therefore important that you do not have more worker threads than
available database connections.
** POSIX Message Queue
The daemon makes use of POSIX MQ for distributing work to the worker threads.
Each thread lives independently and polls the queue regularly for more work.
As the POSIX MQ has a pretty safe mechanism of not duplicating messages in the
implementation, no other locking facility is needed.
On Linux, the default value for maximum messages in the queue are set to 10.
If you receive a lot of reports and the threads do not process the queue
quickly enough, it will fill up pretty quickly. If the queue is filled up,
the main thread which populates the message queue will politely go to sleep
for one minute before attempting to send new messages. To avoid this, consider
to increase the queue size by modifying /proc/sys/fs/mqueue/msg_max.
When the daemon initialises itself, it will read this file to make sure it
uses the queue to the maximum, but not beyond that.
** PostgreSQL features
The daemon depends on the PostgreSQL database. It is written with an
abstraction layer so it should, in theory, be possible to easily adopt it to
different database implementation.
In the current implementation, it makes use of PostgreSQL's LISTEN, NOTIFY and
UNLISTEN features. A trigger is enabled on the submission queue table, which
sends a NOTIFY whenever a record is inserted into the table. The rteval-parser
daemon listens for these notifications, and will immediately poll the table
upon such a notification.
Whenever a notification is received, it will always parse all unprocessed
reports. In addition it will also only listen for notifications when there
are no unprocessed reports.
The core PostgreSQL implementation is only done in pgsql.[ch], which provides an
abstract API layer for the rest of the parser daemon.
** Submission queue status codes
In the rteval database's submissionqueue table there is a status field. The
daemon will only consider records with status == 0 for processing. It do not
consider any other fields. For a better understanding of the different status
codes, look into the file statuses.h.