Numer. Algorithms. 50 (2009), no. 1, 33–65.
An algorithm for computing the complete CS decomposition of a partitioned unitary matrix is developed. Although the existence of the CS decomposition (CSD) has been recognized since 1977, prior algorithms compute only a reduced version. This reduced version, which might be called a 2-by-1 CSD, is equivalent to two simultaneous singular value decompositions. The algorithm presented in this article computes the complete 2-by-2 CSD, which requires the simultaneous diagonalization of all four blocks of a unitary matrix partitioned into a 2-by-2 block structure. The algorithm appears to be the only fully specified algorithm available. The computation occurs in two phases. In the first phase, the unitary matrix is reduced to bidiagonal block form, as described by Sutton and Edelman. In the second phase, the blocks are simultaneously diagonalized using techniques from bidiagonal SVD algorithms of Golub, Kahan, Reinsch, and Demmel. The algorithm has a number of desirable numerical features.