The Moore—Penrose or
generalized inverse
1. Basic definitions
Equations of the form
Ax = b,AEC
lx
,xeC ,beC
m
1
ccur in many pure and applied problems. If
AEC
'
1
<'
1
and is invertible,
then the system of equations (1) is, in principle, easy to solve. The unique
solution is x =
A
b.
If
A is
an arbitrary matrix in C ' , then it becomes
more difficult to solve (1). There may be none, one, or an infinite number
of solutions depending on whether
beR(A)
and whether nrank
(A) > 0.
One would like to be able to find a matrix (or matrices) C, such that
solutions of (1) are of the form
Cb.
But if
b0R(A),
then (1) has no solution.
This will eventually require us to modify our concept of what a solution of
(1)
is. However, as the applications will illustrate, this is not as unnaturalas it sounds. But for now we retain the standard definition
of
solution.
To motivate our first definition
of
the generalized inverse, consider the
functional equation
Y
=f(x),xe
9
c III,
2)
where
f is
a realvalued function with domain
9ã
One procedure for solving
(2)
is to restrict the domain
off
to a smaller set
9' so
that
f=
s one to
one. Then an inverse function
f
' from
R(f)
to
9 is
defined by
f

'(y) = x
if x
e
S' and
f
(x) = y. Thus
f

'(y) is
a solution
of
(2)
for y
e R(f
). This is
how the arcsec, arcsin, and other inverse functions are normally defined.The same procedure can be used in trying to solve equation (1). Asusual, we let A be the linear function from C into
Cm
defined by Ax =
Ax
for
xEC . To
make A a one to one linear transformation it must berestricted to a subspace complementary to
N(A).
An obvious one is
N(A)
1
= R(A*).
This suggests the following definition
of
the generalized
inverse.
Definition 1.1.1
Functional definition
of
the generalized inverse
f
D o w n l o a d e d 0 4 / 1 9 / 1 6 t o 1 3 2 . 2 3 9 . 1 . 2 3 1 . R e d i s t r i b u t i o n s u b j e c t t o S I A M l i c e n s e o r c o p y r i g h t ; s e e h t t p : / / w w w . s i a m . o r g / j o u r n a l s / o j s a . p h p
THE
MOORE
PENROSE OR GENERALIZED INVERSE
9
AEC ' , define the linear transformation At :Cm +
C
by Atx =
0
if
xeR(A) and Atx = (A
R
A
*)
)
1
x
if
xeR(A). The matrix of At is denoted At
and is called the generalized inverse of A.
It is easy to check that
AAtx =
0
if xeR(A)l
and
AAtx =
x
if xER(A).
Similarly,
AtAx =
0
if xeN(A) = R(A*)
and
AtAx =
x
if xER(A*) = R(At).
Thus
AAt
is the orthogonal projector
of
C '
onto
R(A)
while
AtA
is the
orthogonal projector
of Cn
onto
R(A*) = R(At).
This suggests a second
definition
of
the generalized inverse due to
E. H.
Moore
Definition
1.1.2
Moore definition
of
the generalized inverse
f
A
u C
21
x
,
hen the generalized inverse
of
A is defined to be the unique
matrix At such that
(a)
AAt =
P
R(A)
, and
(b)
AtA =
P
R(A
,
)
.
Moore's definition was given in
1935
and then more or less forgotten.
This is possibly due to the fact that it was not expressed in the form
of
Definition
2
but rather in a more cumbersome (no pun intended) notation.
An algebraic form
of
Moore's definition was given in
1955
by Penrose who
was apparently unaware
of
Moore's work.
Definition
1. 1.3
Penrose definition
of
the generalized inverse
f
AeC' , then At is the unique matrix in
C
'2
such that
(i)
AAtA = A,
(ii)
AtAAt = At,
(iii)
(AAt)* = AAt,
(iv)
(AtA)* = AtA.
The first important fact to be established is the equivalence
of
thedefinitions.
Theorem
1.1.1
The functional, Moore and Penrose definitions
of
the
generalized inverse are equivalent.
Proof
We have already noted that if
At
satisfies Definition
1,
then it
satisfies equations (a) and
(b).
If
a matrix
At
satisfies (a) and
(b)
then it
immediately satisfies (iii) and (iv). Furthermore
(i)
follows from (a) by
observing that
AAtA =
P
R(A)
A = A. (ii) will
follow from
(b)
in a similar
manner. Since Definition
1
was constructive and the
At
it constructs
satisfies (a),
(b)
and
(i)
(iv), the question
of
existence in Definitions
2
and
3
is already taken care of. There are then two things remaining to be proven.
One is that a solution
of
equations
(i)
(iv) is a solution
of
(a) and
(b).
The
second is that a solution
of
(a) and
(b)
or
(i)
(iv) is unique.
Suppose then that
At is
a matrix satisfying
(i)
(iv). Multiplying (ii) onthe left by
A
gives
(AAt)
2
= (
AAt).
This and (iii) show that
AAt is
an
orthogonal projector. We must show that it has range equal to the range
D o w n l o a d e d 0 4 / 1 9 / 1 6 t o 1 3 2 . 2 3 9 . 1 . 2 3 1 . R e d i s t r i b u t i o n s u b j e c t t o S I A M l i c e n s e o r c o p y r i g h t ; s e e h t t p : / / w w w . s i a m . o r g / j o u r n a l s / o j s a . p h p
10 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
of A. Using
(i)
and the fact that R(BC) c R(B) for matrices B and C, we get
R(A) = R(AAtA) 9 R(AAt) c R(A), so that R(A) = R(AAt).
Thus AAt = P
R(A)
as desired. The proof that AtA = PR(A') is similar and is
left to the reader as an exercise. One way to show uniqueness is to show
that if At satisfies (a) and (b), or
(i)
—(iv), then it satisfies Definition 1.
Suppose then that At is a matrix satisfying
(i)
—(iv), (a), and (b). If xeR(A)
1
,
then by (a), AAtx = 0. Thus by (ii) Atx = AtAAtx = At0 = 0. If xeR(A),then there exist yeR(A*) such that Ay = x. But Atx = AtAy = y. The last
equality follows byobserving that taking the adjoint of both sides of
(i)
gives P
R(At)
A = A* so that R(A*) 9 R(At). But y = (AI
R(A*)
)

'x. Thus
At satisfies Definition 1.
As this proof illustrates, equations
(i)
and (ii) are, in effect, cancellation
laws. While we cannot say that AB = AC implies B = C, we can say that if
ATAB = AtAC then AB = AC. This type of cancellation will frequently
appear in proofs and the exercises.For obvious reasons, the generalized inverse is often referred to as theMoore—Penrose inverse. Note also that if A e C and A is invertible,then A
1
= At so that the generalized inverse lives up to its name.
2. Basic properties of the generalized inverse
Before proceeding to establish some of what is true about generalized
inverses, the reader should be warned about certain things that are not
true.
While it is true that R(A*) = R(At), if At is the generalized inverse,
condition (b) in Definition 2 cannot be replaced by AtA = P
R(A
,^.
Example 1.2.1
Let A =
[
0
, X =
1
,
01. Since XA = AX =
1
01
P _ P
satisfies AX = P
R(A)
and XA = P
ut
O O
J
(A)
(A')^
(Aã)'
At
= [1
/2
0] and hence X # At. Note that XA * P
R(X)
and thus
XAX * X. If XA = P
R(A*)
, AX = P
R(A)
, and in addition XAX = X, thenX = At. The proof of this last statement is left to the exercises.
In computations involving inverses one frequently uses (AB)

' _
B

'A

' if A and B are invertible. This fails to hold for generalized
inverses even if AB = BA.
Fact 1 2 1
If A E C ` , B c C
p,
then (AB)t is not necessarily the same
as BtAt. Furthermore (At)
2
is not necessarily equal to (AZ)t.
Example 1.2.2
Let A = [
]. Then At = 2
[
1 0
J
. Now
D o w n l o a d e d 0 4 / 1 9 / 1 6 t o 1 3 2 . 2 3 9 . 1 . 2 3 1 . R e d i s t r i b u t i o n s u b j e c t t o S I A M l i c e n s e o r c o p y r i g h t ; s e e h t t p : / / w w w . s i a m . o r g / j o u r n a l s / o j s a . p h p
THE
MOORE
—PENROSE OR GENERALIZED INVERSE
11
A
Z
= A while Ate = ?At. Thus (At)
2
A
2
= ? AtA which is not a projection.Thus (At)2 *
(
A')t
.
Ways of calculating At will be given shortly. The generalized inverses in
Examples
1
and
2
can be found directly from Definition
1
without too
much difficulty.
Examples
2
illustrates another way in which the properties of the
generalized inverse differ from those of the inverse. If A is invertible, then
2 ea(A) if and only if
1
ea(A
1
).
If A =
[
ß
I
as in Example
2,
then
6(A)
=
{1,0}
while a(At) = {
i
,
0
}
If A is similar to a matrix
C,
then A and
C
have the same eigenvalues,
the same Jordan form, and the same characteristic polynomial. None of
these are preserved by taking of the generalized inverse.
1 1 —1
Example
1 2 2
Let A =
0
2
. Then A
= BJB
where
—1 1
1 0 1
1
0
B= 01 1
nd
J=
10
00
The characteristic polynomial of A
1 0
oj
0 2
000
and
J
is
2
2
(2
—
2)
with elementary divisors
2
2
and I —
2.
Jt =
1 00
0 0 1/2
and the characteristic polynomial of Jt is
2
2
(1
—
1/2)
with elementary
1
1
divisors
2
2
,(2
—
1/2).
An easy computation gives At =
1/12
I
6
[12 1
But At has characteristic polynomial
2(1
—
(1
+ 13)/12)(A —
(1
—
13)/12)
and hence a diagonal Jordan form.
Thus, if A and
C
are similar, then about the only thing that one canalways say about At and Ct is that they have the same rank.
A type of inverse that behaves better with respect to similarity isdiscussed in Chapter VII. Since the generalized inverse does not have allthe properties of the inverse, it becomes important to know what propertiesit does have and which identities it does satisfy. There are, of course, anarbitrarily large number of true statements about generalized inverses. Thenext theorem lists some of the more basic properties.
Theorem
1 2 1
Suppose that
AEC' .
Then
(P1)
(At)t = A
(P2)
(At)* _ (A
*)t
(P3)
If
leC, (IA)t = ItAt
where
At = 
if I
0
and
At =
0
if2
=0.
(P4)
A* = A*AAt = AtAA*
(P5)
(A*A)t = AtA
*t
D o w n l o a d e d 0 4 / 1 9 / 1 6 t o 1 3 2 . 2 3 9 . 1 . 2 3 1 . R e d i s t r i b u t i o n s u b j e c t t o S I A M l i c e n s e o r c o p y r i g h t ; s e e h t t p : / / w w w . s i a m . o r g / j o u r n a l s / o j s a . p h p