Control-theoretic design of iterative methods for symmetric linear systems of equations

Amit Bhaya, Pierre-Alexandre Bliman and Fernando Pazos

Abstract — Iterative methods for linear systems with a symmetric positive definite coefficient matrix are designed from a control-theoretic viewpoint. In particular, it is shown that a control-theoretic approach loosely based on m-step dead beat control of the error or residual system, with a suitable definition of error norm, can be used to design new iterative methods that are competitive with the popular Barzilai–Borwein method, which is well known to be an efficient method with low computational cost. Numerical experiments are reported to confirm the claimed results.

I. INTRODUCTION

Consider a system of linear equations of the type $Ax = b$, where $A \in \mathbb{R}^{n \times n}$ is a symmetric positive definite matrix, and $x, b \in \mathbb{R}^n$ are, respectively, the vector of unknowns to be found by the iterative method and the given right-hand side. As is usual in the context of iterative methods, the dimension $n$ is supposed to be "large", so that Gaussian elimination or explicit inversion of the matrix $A$ would be prohibitive and alternative methods have to be found.

The classical steepest descent (SD) or Cauchy method can be applied by reformulating this problem as one of minimizing the quadratic form $V(x) := \frac{1}{2} x^T A x - b^T x$ and observing that its minimizer is the desired solution vector $x = A^{-1} b$, which is the unique zero of the gradient $\nabla V(x) = Ax - b$. Specifically, the steepest descent iterative method is gradient descent applied to the quadratic function $V(x)$, i.e., $x_{k+1} = x_k - \alpha_k \nabla V(x_k)$, which, in terms of the residual or error vector $r := b - Ax$, can be written as:

    x_{k+1} = x_k + \alpha_k r_k    (1)

which is equivalent to

    r_{k+1} = r_k - \alpha_k A r_k = (I - \alpha_k A) r_k    (2)

Here $\alpha_k$ is a (time-varying) step size to be chosen, and the steepest descent choice is:

    \alpha_k = \frac{r_k^T r_k}{r_k^T A r_k}    (3)

which is the step size that minimizes $V(x_{k+1})$. Equivalently, in control terms, regarding (2) as a bilinear system in which $r_k$ is the state and $\alpha_k$ the scalar control, and choosing the Liapunov function $V(r_k) := r_k^T r_k$, a simple calculation [1] shows that the choice (3) makes $\Delta V := V(r_{k+1}) - V(r_k)$ negative.

It is easy to see that, if the initial guess $x_0$ happens to be such that the initial residue $r_0 = b - A x_0$ is an eigenvector of $A$, then the steepest descent iteration (2), (3) converges to the desired solution in the next step. However, since such a serendipitous initialization is not generic, the steepest descent method is usually plagued by slow convergence, taking a large number of iterations to converge to the solution. The conjugate gradient (CG) algorithm, proposed by Hestenes and Stiefel [2], is the best known and most efficient way to fix the slow convergence of the steepest descent (SD) algorithm. It can also be represented as a pair of coupled bilinear systems (in the vectors $r$ and the conjugate direction $p$), with the scalar controls associated with each bilinear system designed using a similar Liapunov function approach [1]. There are, however, applications, e.g. in optimization, in which the application of the CG algorithm may turn out to be too expensive, so it is convenient to have efficient modifications of the SD algorithm.

This work was supported by INRIA, COPPE and the Brazilian funding agencies CAPES, CNPq and FAPERJ. A. Bhaya and F. Pazos are with the Department of Electrical Engineering, Federal University of Rio de Janeiro, PEE/COPPE/UFRJ, PO Box 68504, Rio de Janeiro, RJ 21945-970, Brazil, amit@nacad.ufrj.br, quini.coppe@gmail.com. P.-A. Bliman is with l'Institut National de Recherche en Informatique et en Automatique (INRIA), Domaine de Voluceau, Rocquencourt, B.P. 105, 78153, Le Chesnay Cedex, France, pierre-alexandre.bliman@inria.fr.
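As an illustrative sketch (not the authors' code), the steepest descent iteration (1)–(3) can be written in a few lines; the test matrix, right-hand side, tolerance and iteration cap below are arbitrary choices for the example.

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-4, max_iter=10000):
    """Steepest descent for SPD A: x_{k+1} = x_k + alpha_k r_k,
    with the exact line-search step alpha_k = r^T r / r^T A r, eq. (3)."""
    x = x0.astype(float).copy()
    r = b - A @ x                       # residual r = b - A x
    for k in range(max_iter):
        if np.linalg.norm(r) < tol:
            return x, k
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)      # eq. (3)
        x = x + alpha * r               # eq. (1)
        r = r - alpha * Ar              # eq. (2)
    return x, max_iter

# Example: a small SPD diagonal system (arbitrary illustration)
A = np.diag([1.0, 6.0, 23.0, 58.0])
b = np.array([1.0, 2.0, 3.0, 4.0])
x, iters = steepest_descent(A, b, np.zeros(4))
```

As the paper notes, convergence is slow relative to CG: the iteration count grows with the condition number of $A$.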
In 1988, a paper by Barzilai and Borwein [3] demonstrated, via a two-dimensional analysis and numerical examples, that, surprisingly, the following simple modification of (1):

    x_{k+1} = x_k + \alpha_{k-1} r_k    (4)

where $\alpha_k$ is still given by (3), is extraordinarily more efficient than the steepest descent method. A series of papers in the numerical analysis literature subsequently sought to analyze and take advantage of the Barzilai–Borwein (BB) method. Good accounts of all this activity and new results can be found in [4] and [5], as well as in the references therein.

The objective of this paper is to advance towards an understanding of how to design methods at least as efficient as the BB method (4), from a systematic control viewpoint.

Joint 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference, Shanghai, P.R. China, December 16–18, 2009. Paper WeA04.2. 978-1-4244-3872-3/09/$25.00 ©2009 IEEE.

II. CONTROL-INSPIRED VERSIONS OF THE BB METHOD

As explained in the introduction, the design of iterative methods for linear systems can be interpreted as the design of a control for the bilinear residue system (2). In particular, we propose two methods; the first is based on the choice of an error norm (or Liapunov function) and an m-step ahead dead beat control idea, while the second is based on the (tricky) choice of a different error norm for which a 1-step ahead control is chosen to make this norm as small as possible.

A. The m-step ahead deadbeat type method (DBm)

This method, which we will call the m-step ahead deadbeat type method (DBm), starts by defining an error or residual norm. The objective is to choose a control sequence in such a way that this norm is minimized a certain number ($m$) of steps ahead, where $m$ is chosen as a small number (i.e., smaller than the dimension $n$ of the matrix $A$).

The error norm is chosen as:

    V_z(r_{k+m}(\alpha_k, \alpha_{k+1}, \ldots, \alpha_{k+m-1})) = r_{k+m}^T A^z r_{k+m}    (5)

where $m$ is a positive integer and $z$ is an integer.
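As a quick aside (an illustration, not code from the paper), the error norm (5) for a given residual vector can be evaluated directly; note that `np.linalg.matrix_power` accepts negative integer exponents for invertible matrices, which covers negative choices of $z$.

```python
import numpy as np

def error_norm(A, r, z):
    """V_z(r) = r^T A^z r, the error norm of eq. (5); z may be any integer
    (negative powers use the matrix inverse, so A must be nonsingular)."""
    Az = np.linalg.matrix_power(A, z)
    return r @ Az @ r

# For a diagonal SPD matrix, V_z reduces to sum_i lambda_i^z r_i^2,
# which gives a quick sanity check (arbitrary example data):
A = np.diag([2.0, 5.0, 11.0])
r = np.array([1.0, -2.0, 3.0])
v = error_norm(A, r, -1)
```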
The goal is to choose the controls $\alpha_{k+i}$, $i \in \{0, \ldots, m-1\}$, in such a way that the error norm is minimized after the application of this sequence of controls. In control jargon, this is a type of dead-beat control in $m$ steps. Observe that, from (2):

    r_{k+m} = \prod_{i=0}^{m-1} (I - \alpha_{k+i} A) r_k    (6)

Taking the derivative of (5) with respect to $\alpha_{k+i}$ for all $i \in \{0, \ldots, m-1\}$ and setting it equal to zero:

    \frac{\partial V_z(r_{k+m})}{\partial \alpha_{k+i}} = -2 \Big[ \prod_{p=0}^{i-1} (I - \alpha_{k+p} A) (I - \alpha_{k+i} A) \prod_{p=i+1}^{m-1} (I - \alpha_{k+p} A) r_k \Big]^T A^z \Big[ \prod_{p=0}^{i-1} (I - \alpha_{k+p} A) \, A \prod_{p=i+1}^{m-1} (I - \alpha_{k+p} A) r_k \Big] = 0

Defining $M_{k+p} := I - \alpha_{k+p} A$ yields:

    \alpha_{k+i} = \frac{r_k^T \big(\prod_{p=0}^{i-1} M_{k+p}\big)^2 \big(\prod_{p=i+1}^{m-1} M_{k+p}\big)^2 A^{z+1} r_k}{r_k^T \big(\prod_{p=0}^{i-1} M_{k+p}\big)^2 \big(\prod_{p=i+1}^{m-1} M_{k+p}\big)^2 A^{z+2} r_k}    (7)

for all $i \in \{0, \ldots, m-1\}$. Concerning convergence, it is not difficult to show that it holds for $m = 1$. For $m > 1$, the same conclusion can be drawn by noticing that the decrease of $V_z$ between $r_k$ and $r_{k+m}$ is at least as large as the decrease obtained by $m$ applications of the one-step ahead method.

Remark: The choices $m = 1$ and $z = 0$ lead to the so-called Orthomin step size [6], while the choices $m = 1$ and $z = -1$ lead to the steepest descent method (1). As an illustration, the calculations that lead to the choice of controls are carried out below, assuming that $m = 4$.
    \alpha_k = \frac{r_k^T (I - \alpha_{k+1} A)^2 (I - \alpha_{k+2} A)^2 (I - \alpha_{k+3} A)^2 A^{z+1} r_k}{r_k^T (I - \alpha_{k+1} A)^2 (I - \alpha_{k+2} A)^2 (I - \alpha_{k+3} A)^2 A^{z+2} r_k}    (8)

    \alpha_{k+1} = \frac{r_k^T (I - \alpha_k A)^2 (I - \alpha_{k+2} A)^2 (I - \alpha_{k+3} A)^2 A^{z+1} r_k}{r_k^T (I - \alpha_k A)^2 (I - \alpha_{k+2} A)^2 (I - \alpha_{k+3} A)^2 A^{z+2} r_k}    (9)

    \alpha_{k+2} = \frac{r_k^T (I - \alpha_k A)^2 (I - \alpha_{k+1} A)^2 (I - \alpha_{k+3} A)^2 A^{z+1} r_k}{r_k^T (I - \alpha_k A)^2 (I - \alpha_{k+1} A)^2 (I - \alpha_{k+3} A)^2 A^{z+2} r_k}    (10)

    \alpha_{k+3} = \frac{r_k^T (I - \alpha_k A)^2 (I - \alpha_{k+1} A)^2 (I - \alpha_{k+2} A)^2 A^{z+1} r_k}{r_k^T (I - \alpha_k A)^2 (I - \alpha_{k+1} A)^2 (I - \alpha_{k+2} A)^2 A^{z+2} r_k}    (11)

Since each $\alpha_{k+i}$ depends on the others $\alpha_{k+p}$, $p \neq i$, in effect we have to solve a fixed point problem:

    \alpha_k = F_0(\alpha_{k+1}, \ldots, \alpha_{k+m-1})
    \alpha_{k+1} = F_1(\alpha_k, \alpha_{k+2}, \ldots, \alpha_{k+m-1})
    \vdots
    \alpha_{k+i} = F_i(\alpha_k, \ldots, \alpha_{k+i-1}, \alpha_{k+i+1}, \ldots, \alpha_{k+m-1})
    \vdots
    \alpha_{k+m-1} = F_{m-1}(\alpha_k, \ldots, \alpha_{k+m-2})    (12)

It is easy to see that, for all $i \in \{0, \ldots, m-1\}$, $F_i(\alpha_k, \ldots, \alpha_{k+i-1}, \alpha_{k+i+1}, \ldots, \alpha_{k+m-1}) > 0$. Let

    \alpha := [\alpha_k \; \cdots \; \alpha_{k+m-1}]^T, \qquad F := [F_0(\alpha) \; \cdots \; F_{m-1}(\alpha)]^T

be vectors in $\mathbb{R}^m$. Then the control vector $\alpha$ can be found by solving the fixed point equation $\alpha = F(\alpha)$, or by finding a zero of the function $G(\alpha) = \alpha - F(\alpha)$. In practice, since the function $G$ is quite complex, and it is desirable to keep the overall operation count as low as possible, we opted for the fixed point iteration $\alpha^{j+1} = F(\alpha^j)$, instead of using a Newton method. Furthermore, in the numerical implementation, we typically perform just a few iterations, to find an approximate control vector $\alpha$. A sufficient condition for convergence of the fixed point iteration is that $F(\alpha)$ be a contraction map but, again, in practice this condition is not checked.¹

¹ For example, for the matrix $A_3$ presented in the following section, calculating $m = n = 6$ step sizes ($\alpha \in \mathbb{R}^6$), and with a precision of $10^{-2}$, the sequence $\|\alpha^{j+1} - \alpha^j\|$ calculated by the fixed point algorithm is $\{0.0553, 0.0070, 0.0052, 0.0282, 0.0198, 8.65 \cdot 10^{-6}\}$; while not monotonically decreasing, it converges to zero.

The vector $\alpha^0$ (initial value for the fixed point iteration) can be chosen as a constant vector, at least for the first iteration of the outer loop (the first $m$ values of $\alpha_k$, $k \in \{0, \ldots, m-1\}$); for the following ones ($k \geq m$), the last values calculated by the fixed point algorithm in the previous outer loop iteration can be used. Another possibility is to choose the initial values $(\alpha_{k+p})^0$, $p \in \{0, \ldots, m-1\}$, so as to minimize $\|r_{k+p+1}\|^2 = \|\prod_{i=0}^{p} (I - \alpha_{k+i} A) r_k\|^2$, that is:

    (\alpha_{k+p})^0 = \frac{r_k^T \prod_{i=0}^{p-1} (I - \alpha_{k+i} A)^2 r_k}{r_k^T \prod_{i=0}^{p-1} (I - \alpha_{k+i} A)^2 A r_k}, \quad p > 0    (13)

and $(\alpha_k)^0 = \frac{r_k^T r_k}{r_k^T A r_k}$ for $p = 0$. Once the fixed point has been found to within some desired approximation, the iteration for the residual vector proceeds as follows:

    r_{k+i+1} = (I - \alpha_{k+i} A) r_{k+i}, \quad i \in \{0, \ldots, m-1\}    (14)

The resulting iterative method is denoted DBm(z), where $m, z$ are the integer values that define the error norm in (5).

B. The generalized Barzilai–Borwein (GBB) method

The second method proposed in this paper, referred to as the generalized Barzilai–Borwein (GBB) method, uses the bilinear system (2) with the corresponding iteration in $x$ (1). The choice of error or residual norm in this case is:

    V_{z_1}(g_{k+1}) = g_{k+1}^T A^{z_1} g_{k+1}    (15)

where $z_1$ is an integer and the vector $g_{k+1}$ is defined as follows:

    g_{k+1} = r_{k+1} - \Big( I - \frac{r_{k-1}^T A^{z_2} r_{k-1}}{r_{k-1}^T A^{z_3} r_{k-1}} A^{z_4} \Big) r_k = -\alpha_k A r_k + \frac{r_{k-1}^T A^{z_2} r_{k-1}}{r_{k-1}^T A^{z_3} r_{k-1}} A^{z_4} r_k    (16)

$z_2, z_3, z_4$ being integers.
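The algebra in (16) is easy to check numerically. The sketch below (an illustration, not the authors' code; all data values are arbitrary) builds $g_{k+1}$ both from the defining expression and from the simplified right-hand side of (16), using $r_{k+1} = (I - \alpha_k A) r_k$ from (2), and confirms the two agree.

```python
import numpy as np

def g_next(A, r_k, r_km1, alpha_k, z2, z3, z4):
    """g_{k+1} from the simplified form in eq. (16):
    -alpha_k A r_k + (r_{k-1}^T A^{z2} r_{k-1} / r_{k-1}^T A^{z3} r_{k-1}) A^{z4} r_k."""
    mp = np.linalg.matrix_power
    ratio = (r_km1 @ mp(A, z2) @ r_km1) / (r_km1 @ mp(A, z3) @ r_km1)
    return -alpha_k * (A @ r_k) + ratio * (mp(A, z4) @ r_k)

# Arbitrary example data (z2 = 0, z3 = 1, z4 = 1, the BB-related choice)
A = np.diag([1.0, 6.0, 23.0, 58.0])
r_km1 = np.ones(4)
r_k = np.array([0.5, -0.2, 0.1, 0.05])
alpha_k, z2, z3, z4 = 0.03, 0, 1, 1

mp = np.linalg.matrix_power
ratio = (r_km1 @ mp(A, z2) @ r_km1) / (r_km1 @ mp(A, z3) @ r_km1)
r_k1 = r_k - alpha_k * (A @ r_k)                        # eq. (2)
g_def = r_k1 - (np.eye(4) - ratio * mp(A, z4)) @ r_k    # definition in (16)
g_simpl = g_next(A, r_k, r_km1, alpha_k, z2, z3, z4)    # simplified form
```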
Thus:

    V_{z_1}(g_{k+1}) = \alpha_k^2 \, r_k^T A^{2+z_1} r_k - 2 \alpha_k \frac{r_{k-1}^T A^{z_2} r_{k-1}}{r_{k-1}^T A^{z_3} r_{k-1}} \, r_k^T A^{1+z_1+z_4} r_k + \Big( \frac{r_{k-1}^T A^{z_2} r_{k-1}}{r_{k-1}^T A^{z_3} r_{k-1}} \Big)^2 r_k^T A^{2 z_4 + z_1} r_k

The objective is to choose the control $\alpha_k$ so as to minimize $V_{z_1}(g_{k+1})$. To this end, we calculate:

    \frac{\partial V_{z_1}(g_{k+1})}{\partial \alpha_k} = 2 \alpha_k \, r_k^T A^{2+z_1} r_k - 2 \frac{r_{k-1}^T A^{z_2} r_{k-1}}{r_{k-1}^T A^{z_3} r_{k-1}} \, r_k^T A^{1+z_1+z_4} r_k = 0

and the minimizing choice of $\alpha_k$ is given as:

    \alpha_k = \frac{r_{k-1}^T A^{z_2} r_{k-1} \;\; r_k^T A^{1+z_1+z_4} r_k}{r_{k-1}^T A^{z_3} r_{k-1} \;\; r_k^T A^{2+z_1} r_k}    (17)

The following special cases are of note:
  • $z_1 = -1$, $z_4 = 0$, $z_2 = z_3$ → (1) (steepest descent)
  • $z_1 = 0$, $z_4 = 0$, $z_2 = z_3$ → [6] (Orthomin)
  • $z_4 = 1$, $z_2 = 0$, $z_3 = 1$ → (4) (Barzilai–Borwein)

Indeed, the last item provides the motivation for the nonintuitive choice of error norm (15)–(16), as well as for the name of this method. We adopt the notation GBB($z_1, z_2, z_3, z_4$) to denote the particular member of this class of methods in which the integers $z_i$ define the method according to equations (15) and (16).

Remark: The two new methods proposed in this section have an interpretation that is worth reflecting on. Both methods aim to minimize Liapunov functions (5), (15) of the residual vector $r$. This residual vector $r$ is itself the negative of the gradient of the quadratic form $V(x) := \frac{1}{2} x^T A x - b^T x$, the minimization of which (by a gradient method) gives rise to (1), the scalar $\alpha_k$ being interpreted as an appropriately chosen step size in the direction of the gradient. However, both of the new proposed methods are designed based on a type of m-step ahead deadbeat control idea applied, not to (1), but to the equivalent (bilinear) system (2), the control sequence of which is exactly the same step size sequence that will be applied to (1).
From this perspective, it could be said that, instead of an exact line search along $r$, the proposed methods choose the step size by an auxiliary calculation using different Liapunov functions (5), (15) of the residual vector $r$. One consequence of this is the nonmonotonic behavior of the residual norms, visible in the numerical experiments reported below.

III. NUMERICAL EXPERIMENTS

In this section, the two proposed algorithms, DBm and GBB (with different parameter choices, indicated by different suffixes (a, b, c, etc.) to the abbreviations DBm and GBB), are compared with the original Barzilai–Borwein (BB) algorithm. For the purposes of evaluation against a gold standard, results for the conjugate gradient (CG) iteration are also given, as well as for the three term recursion (TTR) algorithm (see appendix).

For simplicity, in all cases, the right-hand side $b$ is taken as the zero vector $0$, so that the desired solution is $x^* = 0$; the initial residue is chosen as $r_0 = [1\ 1\ \cdots\ 1]^T$; and the stopping criterion is $\|r_k\| < 10^{-4}$. For the DBm algorithm, the stopping criterion for the fixed point iteration that determines the step size (or control) vector $\alpha$ was chosen as $\|\alpha^{j+1} - \alpha^j\| < 10^{-2}$, where $\alpha^0 = [100 \cdots 100]^T$ for the first outer loop iteration.

The first test matrix chosen was $A = A_1 = \mathrm{diag}(1, 6, 23, 58) \in \mathbb{R}^{4 \times 4}$. It has condition number $\kappa = \lambda_{\max}(A_1)/\lambda_{\min}(A_1) = 58$, and the starting point was chosen as $x_0 = A_1^{-1} r_0 = [1\ 1.667\ 0.0453\ 0.0172]^T$. Figure 1 shows the norm of the residue as a function of the number of iterations for each algorithm.

Fig. 1 (semilog plot of $\|r\|$ vs. iteration number; system $Ax = b$, $A = A_1$, $b = 0$, $r_0 = [1\ 1\ 1\ 1]^T$). Test results with the matrix $A_1$, where DBa: DB3(0); DBb: DB4(−1); GBBa: BB; GBBb: GBB($z_1$, 1, 2, 1); GBBc: GBB(0, 1, 2, 0); GBBd: SD; GBBe: GBB(0, $z_2 = z_3$, 0) (OM).

The second test matrix in $\mathbb{R}^{4 \times 4}$ was chosen as:

    A = A_2 = \begin{bmatrix} 135 & 37 & 2 & 48 \\ 37 & 69 & 5 & -31 \\ 2 & 5 & 91 & -106.5 \\ 48 & -31 & -106.5 & 169.25 \end{bmatrix}

This is a symmetric positive definite matrix with condition number $\kappa = 50046$. The starting point was $x_0 = A_2^{-1} r_0 = [-79.3795\ 95.0206\ 164.1170\ 143.1424]^T$. The numerical results are shown in Figure 2.

The third test matrix $A_3 \in \mathbb{R}^{6 \times 6}$ was chosen as:

    A_3 = \begin{bmatrix} 119 & -25 & -28 & 46 & 86.9 & 97 \\ -25 & 400 & -169.5 & 31 & 25.5 & -347 \\ -28 & -169.5 & 225.25 & 7 & -77.85 & 22 \\ 46 & 31 & 7 & 103 & 63.5 & 5 \\ 86.9 & 25.5 & -77.85 & 63.5 & 189.34 & 93.5 \\ 97 & -347 & 22 & 5 & 93.5 & 486 \end{bmatrix}

The condition number of this matrix is $\kappa = 57.6043$ and the initial point chosen was $x_0 = A_3^{-1} r_0 = [0.004\ 0.0633\ 0.0477\ -0.0149\ -0.0027\ 0.045]^T$. Test results are shown in Figure 3. Table 1 summarizes the test results, including the results obtained using the "gold standard" CG algorithm and the TTR algorithm for the purposes of comparison.

Fig. 2 (semilog plot of $\|r\|$ vs. iteration number; system $Ax = b$, $A = A_2$, $b = 0$, $r_0 = [1\ 1\ 1\ 1]^T$). Test results with the matrix $A_2$, where DBa: DB3(−1); DBb: DB4(−1); GBBa: BB; GBBb: GBB($z_1$, 1, 2, 1).

Fig. 3 (semilog plot of $\|r\|$ vs. iteration number; system $Ax = b$, $A = A_3$, $b = 0$, $r_0 = [1\ 1\ 1\ 1\ 1\ 1]^T$). Test results with the matrix $A_3$, where DBa: DB5(0); DBb: DB6(−1); GBBa: SD; GBBb: GBB(0, $z_2 = z_3$, 0) (OM); GBBc: BB; GBBd: GBB($z_1$, 1, 2, 1); GBBe: GBB(−1, 0, 1, 0).
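To make the setup concrete, here is an unofficial sketch (not the paper's code) of the GBB step size (17) driving the residual iteration (2) on the diagonal test matrix $A_1$; the BB method corresponds to the parameters $z_4 = 1$, $z_2 = 0$, $z_3 = 1$ (any $z_1$). The iteration cap and the steepest-descent warm-up for the first step are choices made here, not specified in the excerpt above.

```python
import numpy as np

def gbb_alpha(A, r_k, r_km1, z1, z2, z3, z4):
    """GBB step size, eq. (17)."""
    mp = np.linalg.matrix_power
    num = (r_km1 @ mp(A, z2) @ r_km1) * (r_k @ mp(A, 1 + z1 + z4) @ r_k)
    den = (r_km1 @ mp(A, z3) @ r_km1) * (r_k @ mp(A, 2 + z1) @ r_k)
    return num / den

def gbb_solve(A, r0, z1, z2, z3, z4, tol=1e-4, max_iter=500):
    """Residual iteration (2) driven by the GBB step size; returns the
    iteration count. The first step uses the steepest-descent step size,
    since (17) needs the previous residual r_{k-1}."""
    r_prev = r0.copy()
    alpha0 = (r0 @ r0) / (r0 @ A @ r0)           # eq. (3)
    r = r0 - alpha0 * (A @ r0)                   # eq. (2)
    for k in range(1, max_iter):
        if np.linalg.norm(r) < tol:
            return k
        alpha = gbb_alpha(A, r, r_prev, z1, z2, z3, z4)
        r, r_prev = r - alpha * (A @ r), r
    return max_iter

# BB parameters on A1, with b = 0 and r0 = ones, as in the experiments
A1 = np.diag([1.0, 6.0, 23.0, 58.0])
r0 = np.ones(4)
iters_bb = gbb_solve(A1, r0, 0, 0, 1, 1)
```

With these parameters the numerator and denominator factors in $r_k$ cancel, so (17) reduces to $\alpha_{k-1}$ of steepest descent, which is exactly the BB rule (4).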
The results in Table 1 show that the DBm algorithm needs a fairly low number of iterations to converge. The "inner loop" fixed point iteration obviously increases the number of matrix-vector operations but, for these small test-matrix dimensions, the count is still lower for DBm than for the original BB algorithm. The proposed DBm algorithm has a greater advantage as the condition number of the test matrix increases. Another pertinent observation is that the number of iterations reduces as the number of steps ahead $m$ approaches the size $n$ of the matrix (4 for $A_1$ and $A_2$,