NAME

       cube_dispatcher - PgQ consumer that writes source records into
       partitioned tables

SYNOPSIS

           cube_dispatcher.py [switches] config.ini

DESCRIPTION

       cube_dispatcher is a PgQ consumer that reads urlencoded records from
       a source queue and writes them into partitioned tables according to
       the configuration file. It is used to prepare data for business
       intelligence. The name of the table is read from the producer field
       of each event. The batch creation time is used for partitioning: all
       records created on the same day will go into the same table
       partition. If a partition does not exist, cube_dispatcher will
       create it according to the template.

       Events are usually produced by pgq.logutriga(), which adds all the
       data of the record into the event (also in the case of updates and
       deletes).

       cube_dispatcher can be used in two modes:

       keep_all
            keeps all the data that comes in. If a record is updated several
            times during one day, then the table partition for that day will
            contain several instances of that record.

       keep_latest
            only the last instance of each record is kept for each day. This
            also means that all tables must have primary keys, so
            cube_dispatcher can delete previous versions of records before
            inserting new data (see the example below).
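
       For example, if a row is inserted and then updated twice during one
       day, keep_all leaves three versions of that row in the day's
       partition, while keep_latest deletes the older version before each
       insert, leaving only the final one.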

QUICK-START

       Basic cube_dispatcher setup and usage can be summarized by the
       following steps:

         1.  PgQ and logutriga must be installed in the source databases.
            See the pgqadm man page for details. The target database must
            also have the pgq_ext schema.
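
            For example, PgQ can be installed into a source database with
            pgqadm (assuming a pgqadm configuration file ticker.ini):

                $ pgqadm.py ticker.ini install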

         2.  Edit a cube_dispatcher configuration file, say
            cube_dispatcher_sample.ini.

         3.  Create the source queue:

               $ pgqadm.py ticker.ini create <queue>

         4.  Create the target database and the parent tables in it.
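
            For example, a minimal parent table (hypothetical name and
            columns; the real table must match the structure of the source
            table):

                CREATE TABLE some_table (
                    id    integer NOT NULL,
                    name  text
                );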

         5.  Launch cube_dispatcher in daemon mode:

               $ cube_dispatcher.py cube_dispatcher_sample.ini -d

         6.  Start producing events (create logutriga triggers on tables):

                CREATE TRIGGER trig_cube_replica AFTER INSERT OR UPDATE
                    ON some_table
                    FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('<queue>')

CONFIG

   Common configuration parameters
        job_name
            Name for the particular job the script does. The script will log
            under this name to logdb/logserver. The name is also used as the
            default for the PgQ consumer name. It should be unique.

        pidfile
            Location for the pid file. If not given, the script is not
            allowed to daemonize.

       logfile
           Location for log file.

        loop_delay
            For a continuously running process, how long to sleep after each
            work loop, in seconds. Default: 1.

        connection_lifetime
            Close and reconnect database connections that are older than
            this, in seconds.

        use_skylog
            If set, use skylog for logging.

   Common PgQ consumer parameters
       pgq_queue_name
           Queue name to attach to. No default.

       pgq_consumer_id
            Consumer ID to use when registering. Default: %(job_name)s

   Config options specific to cube_dispatcher
       src_db
           Connect string for source database where the queue resides.

       dst_db
           Connect string for target database where the tables should be
           created.

       mode
           Operation mode for cube_dispatcher. Either keep_all or keep_latest.

        dateformat
            Optional parameter specifying how data tables are suffixed. The
            default is YYYY_MM_DD, which creates per-day tables; with
            YYYY_MM, per-month tables can be created. If explicitly set to
            an empty value, partitioning is disabled.
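
            For example, with the default format a batch created on
            2008-09-22 yields the table suffix 2008_09_22, and with YYYY_MM
            it yields 2008_09.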

       part_template
           SQL fragment for table creation. Various magic replacements are
           done there:

            _PKEY           comma-separated list of primary key columns.
            _PARENT         schema-qualified parent table name.
            _DEST_TABLE     schema-qualified partition table.
            _SCHEMA_TABLE   same as _DEST_TABLE, but with dots replaced
                            by "_" so it can be used in index names.
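
            For example, with a parent table public.some_table whose
            primary key is id (hypothetical names), dateformat YYYY_MM_DD
            and a batch created on 2008-09-22, the part_template from the
            example config below would expand to roughly:

                create table public.some_table_2008_09_22
                    (like public.some_table);
                alter table only public.some_table_2008_09_22
                    add primary key (id);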

   Example config file
           [cube_dispatcher]
           job_name          = some_queue_to_cube

           src_db            = dbname=sourcedb_test
           dst_db            = dbname=dataminedb_test

           pgq_queue_name    = udata.some_queue

           logfile           = ~/log/%(job_name)s.log
           pidfile           = ~/pid/%(job_name)s.pid

           # how many rows are kept: keep_latest, keep_all
           mode = keep_latest

           # to_char() fmt for table suffix
           #dateformat = YYYY_MM_DD
           # following disables table suffixes:
           #dateformat =

           part_template =
                create table _DEST_TABLE (like _PARENT);
                alter table only _DEST_TABLE add primary key (_PKEY);

LOGUTRIGA EVENT FORMAT

       The PgQ trigger function pgq.logutriga() sends a table change event
       into the queue in the following format:

        ev_type
            (op || ":" || pkey_fields), where op is either "I", "U" or "D",
            corresponding to insert, update or delete, and pkey_fields is a
            comma-separated list of the table's primary key fields. The
            operation type is always present, but the pkey_fields list can
            be empty if the table has no primary key. Example: I:col1,col2

        ev_data
            Urlencoded record of data. It uses db-specific urlencoding where
            the presence of "=" is meaningful: a missing "=" means NULL,
            while a present "=" means a literal (possibly empty) value.
            Example: id=3&name=str&nullvalue&emptyvalue=

       ev_extra1
           Fully qualified table name.
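
       To illustrate the encoding rules, here is a minimal Python sketch of
       a decoder (an illustration only, not the decoder that skytools
       itself ships):

            import urllib.parse

            def decode_record(ev_data):
                # Decode one logutriga-style urlencoded record: a field
                # without "=" decodes to None (NULL), while a trailing
                # "=" decodes to an empty string.
                rec = {}
                for item in ev_data.split('&'):
                    key, sep, val = item.partition('=')
                    key = urllib.parse.unquote_plus(key)
                    rec[key] = urllib.parse.unquote_plus(val) if sep else None
                return rec

            print(decode_record('id=3&name=str&nullvalue&emptyvalue='))
            # {'id': '3', 'name': 'str', 'nullvalue': None, 'emptyvalue': ''}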

COMMAND LINE SWITCHES

       The following switches are common to all skytools.DBScript-based
       Python programs.

       -h, --help
           show help message and exit

       -q, --quiet
           make program silent

       -v, --verbose
           make program more verbose

       -d, --daemon
            make the program run in the background

       The following switches are used to control an already running
       process. The pidfile is read from the config file, then the signal
       is sent to the process ID specified there (see the example after
       the list).

       -r, --reload
           reload config (send SIGHUP)

       -s, --stop
           stop program safely (send SIGINT)

       -k, --kill
            kill program immediately (send SIGTERM)
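
       For example, to make a running dispatcher re-read its configuration:

            $ cube_dispatcher.py cube_dispatcher_sample.ini -r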

                                  09/22/2008