PZLABRD - reduce the first NB rows and columns of a complex general M-

NAME

       PZLABRD  - reduce the first NB rows and columns of a complex general M-
       by-N distributed matrix sub( A ) = A(IA:IA+M-1,JA:JA+N-1) to  upper  or
       lower  bidiagonal  form  by  an  unitary transformation Q’ * A * P, and
       returns the matrices X and Y which are needed to  apply  the  transfor-
       mation to the unreduced part of sub( A )

SYNOPSIS

       SUBROUTINE PZLABRD( M,  N,  NB,  A, IA, JA, DESCA, D, E, TAUQ, TAUP, X,
                           IX, JX, DESCX, Y, IY, JY, DESCY, WORK )

           INTEGER         IA, IX, IY, JA, JX, JY, M, N, NB

           INTEGER         DESCA( * ), DESCX( * ), DESCY( * )

           DOUBLE          PRECISION D( * ), E( * )

           COMPLEX*16      A( * ), TAUP( * ), TAUQ( * ), X( * ), Y( * ), WORK(
                           * )

PURPOSE

       PZLABRD  reduces  the first NB rows and columns of a complex general M-
       by-N distributed matrix sub( A ) = A(IA:IA+M-1,JA:JA+N-1) to  upper  or
       lower  bidiagonal  form  by  an  unitary transformation Q’ * A * P, and
       returns the matrices X and Y which are needed to  apply  the  transfor-
       mation to the unreduced part of sub( A ).

       If  M  >= N, sub( A ) is reduced to upper bidiagonal form; if M < N, to
       lower bidiagonal form.

       This is an auxiliary routine called by PZGEBRD.

       Notes
       =====

       Each global data object  is  described  by  an  associated  description
       vector.   This  vector stores the information required to establish the
       mapping between an object element and  its  corresponding  process  and
       memory location.

       Let  A  be  a generic term for any 2D block cyclicly distributed array.
       Such a global array has an associated description vector DESCA.  In the
       following  comments,  the  character _ should be read as "of the global
       array".

       NOTATION        STORED IN      EXPLANATION
       ---------------  --------------  --------------------------------------
       DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
                                      DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
                                      the BLACS process grid A is distribu-
                                      ted over. The context itself is glo-
                                      bal, but the handle (the integer
                                      value) may vary.
       M_A    (global) DESCA( M_ )    The number of rows in the global
                                      array A.
       N_A    (global) DESCA( N_ )    The number of columns in the global
                                      array A.
       MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
                                      the rows of the array.
       NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
                                      the columns of the array.
       RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
                                      row  of  the  array  A  is  distributed.
       CSRC_A (global) DESCA( CSRC_ ) The process column over which the
                                      first column of the array A is
                                      distributed.
       LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
                                      array.  LLD_A >= MAX(1,LOCr(M_A)).

       Let K be the number of rows or columns of  a  distributed  matrix,  and
       assume that its process grid has dimension p x q.
       LOCr(  K  )  denotes  the  number of elements of K that a process would
       receive if K were distributed over  the  p  processes  of  its  process
       column.
       Similarly, LOCc( K ) denotes the number of elements of K that a process
       would receive if K were distributed over the q processes of its process
       row.
       The  values  of  LOCr()  and LOCc() may be determined via a call to the
       ScaLAPACK tool function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).  An  upper
       bound for these quantities may be computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A

ARGUMENTS

       M       (global input) INTEGER
               The  number  of rows to be operated on, i.e. the number of rows
               of the distributed submatrix sub( A ). M >= 0.

       N       (global input) INTEGER
               The number of columns to be operated on,  i.e.  the  number  of
               columns of the distributed submatrix sub( A ). N >= 0.

       NB      (global input) INTEGER
               The  number  of  leading  rows  and  columns  of sub( A ) to be
               reduced.

       A       (local input/local output) COMPLEX*16 pointer into the
               local memory to an array of dimension (LLD_A,LOCc(JA+N-1)).  On
               entry,  this  array  contains  the  local pieces of the general
               distributed matrix sub( A ) to be reduced. On exit,  the  first
               NB  rows and columns of the matrix are overwritten; the rest of
               the distributed matrix sub( A )  is  unchanged.   If  m  >=  n,
               elements  on  and  below  the diagonal in the first NB columns,
               with the array TAUQ,  represent  the  unitary  matrix  Q  as  a
               product  of  elementary  reflectors;  and  elements  above  the
               diagonal in the first NB rows, with the array  TAUP,  represent
               the unitary matrix P as a product of elementary reflectors.  If
               m < n, elements below the diagonal in  the  first  NB  columns,
               with  the  array  TAUQ,  represent  the  unitary  matrix Q as a
               product of elementary reflectors, and elements on and above the
               diagonal  in  the first NB rows, with the array TAUP, represent
               the unitary matrix P as a  product  of  elementary  reflectors.
               See  Further  Details.   IA      (global input) INTEGER The row
               index in the global array A indicating the first row of sub(  A
               ).

       JA      (global input) INTEGER
               The  column  index  in  the global array A indicating the first
               column of sub( A ).

       DESCA   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix A.

       D       (local output) DOUBLE PRECISION array, dimension
               LOCr(IA+MIN(M,N)-1) if M >= N;  LOCc(JA+MIN(M,N)-1)  otherwise.
               The  distributed  diagonal elements of the bidiagonal matrix B:
               D(i) = A(ia+i-1,ja+i-1). D is tied to the distributed matrix A.

       E       (local output) DOUBLE PRECISION array, dimension
               LOCr(IA+MIN(M,N)-1)  if  M >= N; LOCc(JA+MIN(M,N)-2) otherwise.
               The  distributed  off-diagonal  elements  of   the   bidiagonal
               distributed  matrix B: if m >= n, E(i) = A(ia+i-1,ja+i) for i =
               1,2,...,n-1;  if  m  <  n,  E(i)  =  A(ia+i,ja+i-1)  for  i   =
               1,2,...,m-1.  E is tied to the distributed matrix A.

       TAUQ    (local output) COMPLEX*16 array dimension
               LOCc(JA+MIN(M,N)-1).  The  scalar  factors  of  the  elementary
               reflectors which represent the unitary matrix Q. TAUQ  is  tied
               to  the  distributed  matrix  A.  See  Further  Details.   TAUP
               (local output) COMPLEX*16 array, dimension LOCr(IA+MIN(M,N)-1).
               The scalar factors of the elementary reflectors which represent
               the unitary matrix P. TAUP is tied to the distributed matrix A.
               See Further Details.  X       (local output) COMPLEX*16 pointer
               into the local memory to an array of dimension  (LLD_X,NB).  On
               exit,  the  local  pieces  of  the  distributed  M-by-NB matrix
               X(IX:IX+M-1,JX:JX+NB-1) required to update the  unreduced  part
               of sub( A ).

       IX      (global input) INTEGER
               The row index in the global array X indicating the first row of
               sub( X ).

       JX      (global input) INTEGER
               The column index in the global array  X  indicating  the  first
               column of sub( X ).

       DESCX   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix X.

       Y       (local output) COMPLEX*16 pointer into the local memory
               to an array of dimension (LLD_Y,NB).  On exit, the local pieces
               of  the  distributed  N-by-NB  matrix   Y(IY:IY+N-1,JY:JY+NB-1)
               required to update the unreduced part of sub( A ).

       IY      (global input) INTEGER
               The row index in the global array Y indicating the first row of
               sub( Y ).

       JY      (global input) INTEGER
               The column index in the global array  Y  indicating  the  first
               column of sub( Y ).

       DESCY   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix Y.

       WORK    (local workspace) COMPLEX*16 array, dimension (LWORK)
               LWORK >= NB_A + NQ, with

               NQ  =  NUMROC( N+MOD( IA-1, NB_Y ), NB_Y, MYCOL, IACOL, NPCOL )
               IACOL = INDXG2P( JA, NB_A, MYCOL, CSRC_A, NPCOL )

               INDXG2P and NUMROC are ScaLAPACK tool functions; MYROW,  MYCOL,
               NPROW  and  NPCOL  can  be determined by calling the subroutine
               BLACS_GRIDINFO.

FURTHER DETAILS

       The matrices  Q  and  P  are  represented  as  products  of  elementary
       reflectors:

          Q = H(1) H(2) . . . H(nb)  and  P = G(1) G(2) . . . G(nb)

       Each H(i) and G(i) has the form:

          H(i) = I - tauq * v * v’  and G(i) = I - taup * u * u’

       where  tauq  and  taup  are  complex  scalars,  and v and u are complex
       vectors.

       If m >= n, v(1:i-1) = 0, v(i) = 1, and v(i:m)  is  stored  on  exit  in
       A(ia+i-1:ia+m-1,ja+i-1); u(1:i) = 0, u(i+1) = 1, and u(i+1:n) is stored
       on exit in A(ia+i-1,ja+i:ja+n-1); tauq is stored  in  TAUQ(ja+i-1)  and
       taup in TAUP(ia+i-1).

       If  m  <  n,  v(1:i) = 0, v(i+1) = 1, and v(i+1:m) is stored on exit in
       A(ia+i+1:ia+m-1,ja+i-1); u(1:i-1) = 0, u(i) = 1, and u(i:n)  is  stored
       on  exit  in  A(ia+i-1,ja+i:ja+n-1); tauq is stored in TAUQ(ja+i-1) and
       taup in TAUP(ia+i-1).

       The elements of the vectors v and u together form the m-by-nb matrix  V
       and  the nb-by-n matrix U’ which are needed, with X and Y, to apply the
       transformation to the unreduced part  of  the  matrix,  using  a  block
       update of the form:  sub( A ) := sub( A ) - V*Y’ - X*U’.

       The  contents  of  sub(  A  )  on exit are illustrated by the following
       examples with nb = 2:

       m = 6 and n = 5 (m > n):          m = 5 and n = 6 (m < n):

         (  1   1   u1  u1  u1 )           (  1   u1  u1  u1  u1  u1 )
         (  v1  1   1   u2  u2 )           (  1   1   u2  u2  u2  u2 )
         (  v1  v2  a   a   a  )           (  v1  1   a   a   a   a  )
         (  v1  v2  a   a   a  )           (  v1  v2  a   a   a   a  )
         (  v1  v2  a   a   a  )           (  v1  v2  a   a   a   a  )
         (  v1  v2  a   a   a  )

       where a denotes an element of the original matrix which  is  unchanged,
       vi denotes an element of the vector defining H(i), and ui an element of
       the vector defining G(i).