Control-theoretic design of iterative methods for symmetric linear systems of equations

Amit Bhaya, Pierre-Alexandre Bliman and Fernando Pazos

Abstract—Iterative methods for linear systems with a symmetric positive definite coefficient matrix are designed from a control-theoretic viewpoint. In particular, it is shown that a control-theoretic approach loosely based on m-step dead beat control of the error or residual system, with a suitable definition of error norm, can be utilized to design new iterative methods that are competitive with the popular Barzilai–Borwein method, which is well known to be an efficient method with low computational cost. Numerical experiments are reported to confirm the claimed results.
I. INTRODUCTION

Consider a system of linear equations of the type Ax = b, where A ∈ R^{n×n} is a symmetric positive definite matrix, and x, b ∈ R^n are, respectively, the vector of unknowns to be found by the iterative method and the given right-hand side. As is usual in the context of iterative methods, the dimension n is supposed to be "large", so that Gaussian elimination or explicit inversion of the matrix A would be prohibitive and alternative methods have to be found.

The classical steepest descent (SD) or Cauchy method can be applied by reformulating this problem as one of minimizing the quadratic form V(x) := (1/2) x^T A x − b^T x and observing that its minimizer is the desired solution vector x = A^{−1} b, which is the unique zero of the gradient function ∇V(x) = Ax − b. Specifically, the steepest descent iterative method is gradient descent applied to the quadratic function V(x), i.e., x_{k+1} = x_k − α_k ∇V(x_k), which, in terms of the residual or error vector r := b − Ax, can be written as:

x_{k+1} = x_k + α_k r_k    (1)

which is equivalent to

r_{k+1} = r_k − α_k A r_k = (I − α_k A) r_k    (2)
This work was supported by INRIA, COPPE and Brazilian funding agencies CAPES, CNPq, FAPERJ. A. Bhaya and F. Pazos are with the Department of Electrical Engineering, Federal University of Rio de Janeiro, PEE/COPPE/UFRJ, PO Box 68504, Rio de Janeiro, RJ 21945-970, Brazil, amit@nacad.ufrj.br, quini.coppe@gmail.com. P.-A. Bliman is with l'Institut National de Recherche en Informatique et en Automatique (INRIA), Domaine de Voluceau, Rocquencourt, B.P. 105, 78153 Le Chesnay Cedex, France, pierre-alexandre.bliman@inria.fr
where α_k is a (time-varying) step size to be chosen, and the steepest descent choice is:

α_k = (r_k^T r_k) / (r_k^T A r_k)    (3)

which is the step size that minimizes V(x_{k+1}). Equivalently, in control terms, regarding (2) as a bilinear system in which r_k is the state and α_k the scalar control, and choosing the Liapunov function V(r_k) := r_k^T r_k, a simple calculation [1] shows that the choice (3) makes ΔV := V(r_{k+1}) − V(r_k) negative.

It is easy to see that, if the initial guess x_0 happens to be such that the initial residue r_0 = b − A x_0 is an eigenvector of A, then the steepest descent iteration (2), (3) converges to the desired solution in the next step. However, since such a serendipitous initialization is not generic, the steepest descent method is usually plagued by slow convergence, taking a large number of iterations to converge to the solution. The conjugate gradient (CG) algorithm, proposed by Hestenes and Stiefel [2], is the best known and most efficient way to fix the slow convergence of the steepest descent (SD) algorithm. It can also be represented as a pair of coupled bilinear systems (in the vector r and the conjugate direction p), with the scalar controls associated with each bilinear system designed using a similar Liapunov function approach [1]. There are, however, applications, e.g. in optimization, in which the application of the CG algorithm may turn out to be too expensive, so it is convenient to have efficient modifications of the SD algorithm. In 1988, a paper by Barzilai and Borwein [3] demonstrated, via a two-dimensional analysis and numerical examples, that, surprisingly, the following simple modification of (1):
x_{k+1} = x_k + α_{k−1} r_k    (4)

where α_k is still given by (3), is extraordinarily more efficient than the steepest descent method. A series of papers in the numerical analysis literature subsequently sought to analyze and take advantage of the Barzilai–Borwein (BB) method.

Joint 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference, Shanghai, P.R. China, December 16-18, 2009. WeA04.2. 978-1-4244-3872-3/09/$25.00 ©2009 IEEE. 115

Good accounts of all this activity
and new results can be found in [4] and [5], as well as in the references therein.

The objective of this paper is to advance towards an understanding of how to design methods at least as efficient as the BB method (4), from a systematic control viewpoint.

II. CONTROL-INSPIRED VERSIONS OF THE BB METHOD

As explained in the introduction, the design of iterative methods for linear systems can be interpreted as the design of a control for the bilinear residue system (2). In particular, we propose two methods; the first is based on the choice of an error norm (or Liapunov function) and an m-step ahead dead beat control idea, while the second is based on the (tricky) choice of a different error norm for which a 1-step ahead control is chosen to make this norm as small as possible.
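Viewed this way, the residue dynamics (2) is a discrete-time system stabilized by the scalar control α_k. A minimal sketch (ours, not the paper's) simulates (2) with a constant stabilizing gain; the matrix and the gain 1.9/λ_max(A) are illustrative choices, whereas the methods proposed in this section choose α_k adaptively instead.

```python
import numpy as np

# Illustrative sketch: the residue system (2), r_{k+1} = (I - alpha A) r_k,
# driven by a constant scalar control alpha. Any 0 < alpha < 2/lambda_max(A)
# places the eigenvalues of (I - alpha A) inside the unit circle, so the
# state r is driven to zero. A and r0 are our own example choices.

A = np.diag([1.0, 6.0, 23.0, 58.0])           # small SPD test matrix
r = np.ones(4)                                 # initial residue
alpha = 1.9 / np.max(np.linalg.eigvalsh(A))    # stabilizing constant gain

norms = []
for _ in range(200):
    r = r - alpha * (A @ r)                    # one step of system (2)
    norms.append(np.linalg.norm(r))

print(norms[0], norms[-1])                     # the norm decays toward zero
```

The adaptive choices (3) and (4) below can be seen as replacing this fixed gain by a state-dependent control law.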
A. The m-step ahead deadbeat type method (DB_m)

This method, which we will call the m-step ahead deadbeat type method (DB_m), starts by defining an error or residual norm. The objective is to choose a control sequence in such a way that this norm is minimized in a certain number (m) of steps ahead, where m is chosen as a small number (i.e., smaller than the dimension n of the matrix A). The error norm is chosen as:
V_z(r_{k+m}(α_k, α_{k+1}, ..., α_{k+m−1})) = r_{k+m}^T A^z r_{k+m}    (5)

where m is a positive integer and z is an integer. The goal is to choose the controls α_{k+i}, i ∈ {0, ..., m−1}, in such a way that the error norm is minimized after the application of this sequence of controls. In control jargon, this is a type of deadbeat control in m steps. Observe that, from (2):
r_{k+m} = ∏_{i=0}^{m−1} (I − α_{k+i} A) r_k    (6)

Taking the derivative of (5) with respect to α_{k+i} for all i ∈ {0, ..., m−1} and setting it equal to zero:

∂V_z(r_{k+m})/∂α_{k+i} = −2 [ ∏_{p=0}^{i−1} (I − α_{k+p} A) (I − α_{k+i} A) ∏_{p=i+1}^{m−1} (I − α_{k+p} A) r_k ]^T A^z × [ ∏_{p=0}^{i−1} (I − α_{k+p} A) A ∏_{p=i+1}^{m−1} (I − α_{k+p} A) r_k ] = 0

Defining M_{k+p} := I − α_{k+p} A yields:
α_{k+i} = [ r_k^T ( ∏_{p=0}^{i−1} M_{k+p} )² ( ∏_{p=i+1}^{m−1} M_{k+p} )² A^{z+1} r_k ] / [ r_k^T ( ∏_{p=0}^{i−1} M_{k+p} )² ( ∏_{p=i+1}^{m−1} M_{k+p} )² A^{z+2} r_k ]    (7)

for all i ∈ {0, ..., m−1}. Concerning convergence, it is not difficult to show that it holds for m = 1. Now, for m > 1, the same conclusion can be drawn, noticing that the decrease of V_z between r_k and r_{k+m} is at least as large as the decrease obtained by m applications of the one-step ahead method.
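For concreteness, the m = 1 case of (7), where the empty products reduce to the identity, can be evaluated directly. The sketch below (our own naming, with illustrative A and r) computes this one-step step size and checks the two special cases z = −1 and z = 0 discussed in the Remark that follows.

```python
import numpy as np

# Sketch of the m = 1 case of (7): with m = 1 both products are empty, so
#   alpha_k = (r^T A^{z+1} r) / (r^T A^{z+2} r).
# z = -1 gives the steepest-descent step (3); z = 0 gives the Orthomin step.
# db1_step is our helper name; A and r are illustrative.

def db1_step(A, r, z):
    """One-step-ahead (m = 1) minimizing step size from (7)."""
    w = np.linalg.matrix_power(A, z + 1) @ r   # w = A^{z+1} r
    return (r @ w) / (r @ (A @ w))             # denominator is r^T A^{z+2} r

A = np.diag([2.0, 5.0, 9.0])                   # small SPD example
r = np.array([1.0, 1.0, 1.0])

print("SD step (z = -1):", db1_step(A, r, -1))   # (r^T r)/(r^T A r) = 3/16
print("OM step (z = 0): ", db1_step(A, r, 0))    # (r^T A r)/(r^T A^2 r) = 16/110
```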
Remark: The choices m = 1 and z = 0 lead to the so-called Orthomin step size [6], while the choices m = 1 and z = −1 lead to the steepest descent method (1). As an illustration, the calculations that lead to the choice of controls are carried out below, assuming that m = 4.
α_k = [ r_k^T (I − α_{k+1}A)² (I − α_{k+2}A)² (I − α_{k+3}A)² A^{z+1} r_k ] / [ r_k^T (I − α_{k+1}A)² (I − α_{k+2}A)² (I − α_{k+3}A)² A^{z+2} r_k ]    (8)

α_{k+1} = [ r_k^T (I − α_k A)² (I − α_{k+2}A)² (I − α_{k+3}A)² A^{z+1} r_k ] / [ r_k^T (I − α_k A)² (I − α_{k+2}A)² (I − α_{k+3}A)² A^{z+2} r_k ]    (9)

α_{k+2} = [ r_k^T (I − α_k A)² (I − α_{k+1}A)² (I − α_{k+3}A)² A^{z+1} r_k ] / [ r_k^T (I − α_k A)² (I − α_{k+1}A)² (I − α_{k+3}A)² A^{z+2} r_k ]    (10)

α_{k+3} = [ r_k^T (I − α_k A)² (I − α_{k+1}A)² (I − α_{k+2}A)² A^{z+1} r_k ] / [ r_k^T (I − α_k A)² (I − α_{k+1}A)² (I − α_{k+2}A)² A^{z+2} r_k ]    (11)
Since each α_{k+i} depends on the other α_{k+p}, p ≠ i, in effect, we have to solve a fixed point problem:

α_k = F_0(α_{k+1}, ..., α_{k+m−1})
α_{k+1} = F_1(α_k, α_{k+2}, ..., α_{k+m−1})
⋮
α_{k+i} = F_i(α_k, ..., α_{k+i−1}, α_{k+i+1}, ..., α_{k+m−1})
⋮
α_{k+m−1} = F_{m−1}(α_k, ..., α_{k+m−2})
(12)

It is easy to see that, for all i ∈ {0, ..., m−1}, F_i(α_k, ..., α_{k+i−1}, α_{k+i+1}, ..., α_{k+m−1}) > 0. Let

α := [α_k ⋯ α_{k+m−1}]^T,  F := [F_0(α) ⋯ F_{m−1}(α)]^T

be vectors in R^m. Then the control vector α can be found by solving the fixed point equation α = F(α), or by finding a zero of the function G(α) = α − F(α). In practice, since the function G is quite complex, and it is desirable to keep the overall operation count as low as possible, we opted for the fixed point iteration α^{j+1} = F(α^j), instead of using a Newton method. Furthermore, in the numerical implementation, we typically perform just a few iterations, to find an approximate control vector α. A sufficient convergence condition for the fixed point iteration is that F(α) is a contraction map, but again, in practice, this condition is not checked.¹
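As a rough sketch, with our own helper names F_map and fixed_point and an illustrative choice of A, r, m and z, the fixed point iteration α^{j+1} = F(α^j) built from (7) might look as follows:

```python
import numpy as np

# Sketch (ours, not the paper's code) of the fixed-point iteration
# alpha^{j+1} = F(alpha^j), where F_i evaluates the right-hand side of (7).
# A, r, m, z and the constant initial guess are illustrative.

def F_map(alpha, A, r, z):
    """Evaluate F_i(alpha) of (7) for i = 0, ..., m-1."""
    n, m = len(r), len(alpha)
    M = [np.eye(n) - a * A for a in alpha]       # M_{k+p} = I - alpha_{k+p} A
    Az1 = np.linalg.matrix_power(A, z + 1)
    out = np.empty(m)
    for i in range(m):
        P = np.eye(n)
        for p in range(m):
            if p != i:
                P = P @ M[p] @ M[p]              # squared products, p != i
        w = P @ Az1 @ r
        out[i] = (r @ w) / (r @ (A @ w))         # the ratio in (7)
    return out

def fixed_point(A, r, m, z, tol=1e-2, jmax=50):
    """A few sweeps of alpha^{j+1} = F(alpha^j) from a constant guess."""
    alpha = np.full(m, (r @ r) / (r @ (A @ r)))  # constant initial vector
    for _ in range(jmax):
        new = F_map(alpha, A, r, z)
        done = np.linalg.norm(new - alpha) < tol
        alpha = new
        if done:
            break
    return alpha

A = np.diag([1.0, 6.0, 23.0, 58.0])              # small SPD example
r = np.ones(4)
alpha = fixed_point(A, r, m=3, z=0)
print(alpha)
```

As noted above, contractivity of F is not verified here; the sweep count is simply capped.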
¹For example, for the matrix A_3 presented in the following section, calculating m = n = 6 step sizes (α ∈ R^6), and with a precision of 10^{−2}, the sequence ‖α^{j+1} − α^j‖ calculated by the fixed point algorithm is {0.0553, 0.0070, 0.0052, 0.0282, 0.0198, 8.65·10^{−6}}; while not monotonically decreasing, it converges to zero.
The vector α^0 (initial value for the fixed point iteration) can be chosen as a constant vector, at least for the first iteration of the outer loop (the first m values of α_k, k ∈ {0, ..., m−1}); for the following ones (k ≥ m), the last values calculated by the fixed point algorithm in the former outer loop iteration can be used. Another possibility is to choose the initial values of (α_{k+p})^0, p ∈ {0, ..., m−1}, so as to minimize ‖r_{k+p+1}‖² = ‖∏_{i=0}^{p} (I − α_{k+i}A) r_k‖², that is:

(α_{k+p})^0 = [ r_k^T ∏_{i=0}^{p−1} (I − α_{k+i}A)² r_k ] / [ r_k^T ∏_{i=0}^{p−1} (I − α_{k+i}A)² A r_k ],  p > 0    (13)

and (α_k)^0 = (r_k^T r_k)/(r_k^T A r_k) for p = 0. Once the fixed point has been found to within some desired approximation, the iteration for the residual vector proceeds as follows:
r_{k+i+1} = (I − α_{k+i} A) r_{k+i},  i ∈ {0, ..., m−1}    (14)

The resulting iterative method is denoted DB_m(z), where m, z are the integer values that define the error norm in (5).
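Putting the pieces together, a hedged end-to-end sketch of DB_m(z), with the greedy initialization (13), a few fixed point sweeps on (7), and the m residual updates (14), could read as below. The function name dbm and the test problem are ours, not the paper's, and the sweep and iteration caps are arbitrary choices.

```python
import numpy as np

# End-to-end sketch of DB_m(z): initialize the step-size vector via (13),
# refine it by fixed-point sweeps on (7), then advance the residue m steps
# with (14) while updating x via (1). Illustrative, not the paper's code.

def dbm(A, b, x0, m, z, tol=1e-4, kmax=200):
    n = len(b)
    x, r = x0.copy(), b - A @ x0
    Az1 = np.linalg.matrix_power(A, z + 1)
    iters = 0
    while np.linalg.norm(r) >= tol and iters < kmax:
        # initialization (13): greedy one-step step sizes
        alpha = np.empty(m)
        P = np.eye(n)
        for p in range(m):
            w = P @ r                            # prod_{i<p} (I - alpha_i A) r
            alpha[p] = (w @ w) / (w @ (A @ w))
            P = P @ (np.eye(n) - alpha[p] * A)
        # a few fixed-point sweeps on (7)
        for _ in range(10):
            new = np.empty(m)
            for i in range(m):
                Q = np.eye(n)
                for p in range(m):
                    if p != i:
                        Mp = np.eye(n) - alpha[p] * A
                        Q = Q @ Mp @ Mp          # squared products, p != i
                w = Q @ Az1 @ r
                new[i] = (r @ w) / (r @ (A @ w))
            done = np.linalg.norm(new - alpha) < 1e-2
            alpha = new
            if done:
                break
        # apply the control sequence via (1) and (14)
        for a in alpha:
            x = x + a * r
            r = r - a * (A @ r)
            iters += 1
    return x, iters

A = np.diag([1.0, 6.0, 23.0, 58.0])              # small SPD example
b = np.zeros(4)
x0 = -np.linalg.solve(A, np.ones(4))             # makes r_0 = [1 1 1 1]^T
x, iters = dbm(A, b, x0, m=3, z=0)
print(iters, np.linalg.norm(b - A @ x))
```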
B. The generalized Barzilai–Borwein (GBB) method

The second method proposed in this paper, referred to as the generalized Barzilai–Borwein (GBB) method, uses the bilinear system (2) with the corresponding iteration in x (1). The choice of error or residual norm in this case is:

V_{z_1}(g_{k+1}) = g_{k+1}^T A^{z_1} g_{k+1}    (15)

where z_1 is an integer and the vector g_{k+1} is defined as follows:

g_{k+1} = r_{k+1} − [ I − ((r_{k−1}^T A^{z_2} r_{k−1})/(r_{k−1}^T A^{z_3} r_{k−1})) A^{z_4} ] r_k = −α_k A r_k + ((r_{k−1}^T A^{z_2} r_{k−1})/(r_{k−1}^T A^{z_3} r_{k−1})) A^{z_4} r_k    (16)

z_2, z_3, z_4 being integers. Thus:
V_{z_1}(g_{k+1}) = α_k² r_k^T A^{2+z_1} r_k − 2 α_k [ (r_{k−1}^T A^{z_2} r_{k−1})/(r_{k−1}^T A^{z_3} r_{k−1}) ] r_k^T A^{1+z_1+z_4} r_k + [ (r_{k−1}^T A^{z_2} r_{k−1})/(r_{k−1}^T A^{z_3} r_{k−1}) ]² r_k^T A^{2z_4+z_1} r_k
The objective is to choose the control α_k so as to minimize V_{z_1}(g_{k+1}). To this end, we calculate:

∂V_{z_1}(g_{k+1})/∂α_k = 2 α_k r_k^T A^{2+z_1} r_k − 2 [ (r_{k−1}^T A^{z_2} r_{k−1})/(r_{k−1}^T A^{z_3} r_{k−1}) ] r_k^T A^{1+z_1+z_4} r_k = 0
and the minimizing choice of α_k is given as:

α_k = [ (r_{k−1}^T A^{z_2} r_{k−1}) (r_k^T A^{1+z_1+z_4} r_k) ] / [ (r_{k−1}^T A^{z_3} r_{k−1}) (r_k^T A^{2+z_1} r_k) ]    (17)

The following special cases are of note:
• z_1 = −1, z_4 = 0, z_2 = z_3 → (1) (steepest descent)
• z_1 = 0, z_4 = 0, z_2 = z_3 → [6] (Orthomin)
• z_4 = 1, z_2 = 0, z_3 = 1 → (4) (Barzilai–Borwein)

Indeed, the last item provides the motivation for the nonintuitive choice of error norm (15)–(16), as well as for the name of this method. We adopt the notation GBB(z_1, z_2, z_3, z_4) to denote the particular member of this class of methods in which the integers z_i define the method according to equations (15) and (16).
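A small sketch (our helper names, with illustrative data) of the step size (17), confirming the steepest-descent and Barzilai–Borwein special cases listed above:

```python
import numpy as np

# Sketch of the GBB(z1, z2, z3, z4) step size (17). The names quad and
# gbb_step are ours; A and the residues r_{k-1}, r_k are illustrative.

def quad(r, A, z):
    """The quadratic form r^T A^z r (A SPD, so negative z is allowed)."""
    return r @ (np.linalg.matrix_power(A, z) @ r)

def gbb_step(A, r_km1, r_k, z1, z2, z3, z4):
    """alpha_k from (17)."""
    return (quad(r_km1, A, z2) * quad(r_k, A, 1 + z1 + z4)) / \
           (quad(r_km1, A, z3) * quad(r_k, A, 2 + z1))

A = np.diag([1.0, 6.0, 23.0])
r_km1 = np.array([1.0, -2.0, 0.5])
r_k = np.array([0.3, 1.0, -1.2])

# z1 = -1, z4 = 0, z2 = z3: the steepest-descent step (3) at r_k
assert np.isclose(gbb_step(A, r_km1, r_k, -1, 2, 2, 0),
                  (r_k @ r_k) / (r_k @ (A @ r_k)))
# z4 = 1, z2 = 0, z3 = 1: the BB step, i.e. (3) evaluated at r_{k-1}
assert np.isclose(gbb_step(A, r_km1, r_k, 0, 0, 1, 1),
                  (r_km1 @ r_km1) / (r_km1 @ (A @ r_km1)))
print("special cases verified")
```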
Remark: The two new methods proposed in this section have an interpretation that is worth reflecting on. Both methods aim to minimize Liapunov functions (5), (15) of the residual vector r. This residual vector r is itself the negative of the gradient of the quadratic form V(x) := (1/2) x^T A x − b^T x, the minimization of which (by a gradient method) gives rise to (1), the scalar α_k being interpreted as an appropriately chosen step size in the direction of the gradient. However, both of the new proposed methods are designed based on a type of m-step ahead deadbeat control idea applied, not to (1), but to the equivalent (bilinear) system (2), the control sequence of which is exactly the same step size sequence that will be applied to (1). From this perspective, it could be said that, instead of exact line search along r, the proposed methods choose the step size by an auxiliary calculation using different Liapunov functions (5), (15) of the residual vector r. One consequence of this is the nonmonotonic behavior of the residual norms, visible in the numerical experiments reported on below.

III. NUMERICAL EXPERIMENTS
In this section, the two proposed algorithms, DB_m and GBB (with different parameter choices, which are indicated by different suffixes (a, b, c, etc.) to the abbreviations DB_m and GBB), are compared with the original Barzilai–Borwein (BB) algorithm. For the purposes of evaluation against a gold standard, results for the conjugate gradient (CG) iteration are also given, as well as for the three term recursion (TTR) algorithm (see appendix).

For simplicity, in all cases, the right-hand side b is taken as the zero vector 0, so that the desired solution is x* = 0; the initial residue is chosen as r_0 = [1 1 ⋯ 1]^T; and the stopping criterion is ‖r_k‖ < 10^{−4}. For the DB_m algorithm, the stopping criterion for the fixed point iteration, that determines the step size (or control) vector α, was chosen as ‖α^{j+1} − α^j‖ < 10^{−2}, where α^0 = [100 ⋯ 100]^T for the first outer loop iteration.

The first test matrix chosen was A = A_1 = diag(1, 6, 23, 58) ∈ R^{4×4}. It has a condition number
[Figure 1: log residual norm ‖r‖ vs. iteration number for DB_m, GBB and TTR; system Ax = b, A = A_1, b = 0, r_0 = [1 1 1 1]^T.]
Fig. 1. Test results with the matrix A_1, where DBa: DB_3(0); DBb: DB_4(−1); GBBa: BB; GBBb: GBB(z_1, 1, 2, 1); GBBc: GBB(0, 1, 2, 0); GBBd: SD; GBBe: GBB(0, z_2 = z_3, 0) (OM).
κ = λ_max(A_1)/λ_min(A_1) = 58, and the starting point was chosen as x_0 = A_1^{−1} r_0 = [1 0.1667 0.0435 0.0172]^T. Figure 1 shows the norm of the residue as a function of the number of iterations for each algorithm.

The second test matrix in R^{4×4} was chosen as:
A = A_2 =
[  135     37      2      48
    37     69      5     −31
     2      5     91   −106.5
    48    −31  −106.5  169.25 ]
This is a symmetric positive definite matrix and has a condition number κ = 50046. The starting point was x_0 = A_2^{−1} r_0 = [−79.3795 95.0206 164.1170 143.1924]^T. The numerical results are shown in Figure 2.

The third test matrix A_3 ∈ R^{6×6} was chosen as:
A = A_3 =
[  119     −25     −28     46    86.9     97
   −25     400  −169.5     31    25.5   −347
   −28  −169.5  225.25      7  −77.85     22
    46      31       7    103    63.5      5
  86.9    25.5  −77.85   63.5  189.34   93.5
    97    −347      22      5    93.5    486 ]
The condition number of this matrix is κ = 57.6043 and the initial point chosen was x_0 = A_3^{−1} r_0 = [0.004 0.0633 0.0477 −0.0149 −0.0027 0.045]^T. Test results are shown in Figure 3.

Table 1 summarizes the test results, including the results obtained using the "gold standard" CG algorithm and the TTR algorithm for the purposes of comparison.
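As a sanity check of this experimental setup, the sketch below (ours, not the paper's code) runs the plain BB iteration (4) under the same conventions (b = 0, r_0 all ones, stop when ‖r_k‖ < 10^{−4}) on the first test matrix A_1 = diag(1, 6, 23, 58). Since α_{k−1} is undefined at the first step, we assume the common convention of starting with the SD step (3).

```python
import numpy as np

# Sketch reproducing the experimental conventions above with the BB
# iteration (4) on A_1. The first step uses the SD step (3), an assumed
# convention since alpha_{-1} is undefined.

def bb_iterations(A, tol=1e-4, kmax=1000):
    """Number of BB iterations until ||r_k|| < tol."""
    n = A.shape[0]
    r = np.ones(n)                             # r_0 = [1 ... 1]^T
    alpha = (r @ r) / (r @ (A @ r))            # first step: SD step (3)
    k = 0
    while np.linalg.norm(r) >= tol and k < kmax:
        alpha_next = (r @ r) / (r @ (A @ r))   # (3) at the current residue
        r = r - alpha * (A @ r)                # (2) with the delayed step of (4)
        alpha = alpha_next
        k += 1
    return k

A1 = np.diag([1.0, 6.0, 23.0, 58.0])
print("BB iterations on A_1:", bb_iterations(A1))
```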
[Figure 2: log residual norm ‖r‖ vs. iteration number for DB_m, GBB and TTR; system Ax = b, A = A_2, b = 0, r_0 = [1 1 1 1]^T.]
Fig. 2. Test results with the matrix A_2, where DBa: DB_3(−1); DBb: DB_4(−1); GBBa: BB; GBBb: GBB(z_1, 1, 2, 1).
[Figure 3: log residual norm ‖r‖ vs. iteration number for DB_m, GBB and TTR; system Ax = b, A = A_3, b = 0, r_0 = [1 1 1 1 1 1]^T.]
Fig. 3. Test results with the matrix A_3, where DBa: DB_5(0); DBb: DB_6(−1); GBBa: SD; GBBb: GBB(0, z_2 = z_3, 0) (OM); GBBc: BB; GBBd: GBB(z_1, 1, 2, 1); GBBe: GBB(−1, 0, 1, 0).
The results in Table 1 show that the DB_m algorithm needs a fairly low number of iterations to converge. The "inner loop" fixed point iteration obviously increases the number of matrix-vector operations, but, for this small dimension of the test matrix, it is still lower for DB_m than for the original BB algorithm. The proposed DB_m algorithm has a greater advantage as the condition number of the test matrix increases.

Another pertinent observation is that the number of iterations reduces as the number of steps ahead m approaches the size n of the matrix (4 for A_1 and A_2,