# Linear Algebra: #21 Which Matrices can be Diagonalized?

Now we have seen that not all *orthogonal* matrices can be diagonalized. (Think about the rotations of ℝ².) On the other hand, we can prove that all *unitary*, and also all *Hermitian*, matrices can be diagonalized.

Of course, a matrix M is only a representation of a linear mapping f : **V** → **V** with respect to a given basis {**v**₁, . . . , **v**ₙ} of the vector space **V**. So the idea that the matrix can be diagonalized is that it is similar to a diagonal matrix. That is, there exists another matrix S such that S⁻¹MS is diagonal. But this means that there must be a basis for **V** consisting entirely of eigenvectors.

In this section we will consider complex vector spaces; that is, **V** is a vector space over the complex numbers ℂ. The vector space **V** will be assumed to have a scalar product associated with it, and the bases we consider will be orthonormal. We begin with a definition.

**Definition**

Let **W** ⊂ **V** be a subspace of **V**. Let

**W**^⊥ = {**v** ∈ **V** : <**v**, **w**> = 0, ∀ **w** ∈ **W**}.

Then **W**^⊥ is called the *perpendicular space* to **W**.

It is a rather trivial matter to verify that **W**^⊥ is itself a subspace of **V**, and furthermore **W** ∩ **W**^⊥ = {**0**}. In fact, we have:

**Theorem 53** **V** = **W** ⊕ **W**^⊥.

*Proof*

Let {**w**₁, . . . , **w**ₘ} be some orthonormal basis for the vector space **W**. This can be extended to a basis {**w**₁, . . . , **w**ₘ, **w**ₘ₊₁, . . . , **w**ₙ} of **V**. Assuming the Gram-Schmidt process has been used, we may assume that this is an orthonormal basis. The claim is then that {**w**ₘ₊₁, . . . , **w**ₙ} is a basis for **W**^⊥.

Now clearly, since <**w**ⱼ, **w**ₖ> = 0 for j ≠ k, we have {**w**ₘ₊₁, . . . , **w**ₙ} ⊂ **W**^⊥. If **u** ∈ **W**^⊥ is some arbitrary vector in **W**^⊥, then we have

**u** = <**w**₁, **u**>**w**₁ + · · · + <**w**ₙ, **u**>**w**ₙ = <**w**ₘ₊₁, **u**>**w**ₘ₊₁ + · · · + <**w**ₙ, **u**>**w**ₙ,

since <**w**ⱼ, **u**> = 0 if j ≤ m. (Remember, **u** ∈ **W**^⊥.) Therefore {**w**ₘ₊₁, . . . , **w**ₙ} is a linearly independent, orthonormal set which generates **W**^⊥, so it is a basis. And so we have **V** = **W** ⊕ **W**^⊥.
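The proof is constructive, and can be sketched numerically (the vectors in ℝ³ below are my own example, not from the text): extend an orthonormal basis of **W** to a basis of **V**, run Gram-Schmidt, and the tail of the resulting orthonormal basis spans **W**^⊥.

```python
import math

# W is spanned by w1 in R^3; extend {w1} to a basis of V = R^3, then
# Gram-Schmidt turns the extension {w2, w3} into an orthonormal basis of W-perp.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scale(c, v):
    return [c * x for x in v]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def normalize(v):
    return scale(1.0 / math.sqrt(dot(v, v)), v)

w1 = normalize([1.0, 1.0, 0.0])                        # orthonormal basis of W
basis = [w1, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]         # extended to a basis of V

# Gram-Schmidt: orthogonalize each vector against those already processed.
ortho = []
for v in basis:
    for u in ortho:
        v = sub(v, scale(dot(u, v), u))
    ortho.append(normalize(v))

w2, w3 = ortho[1], ortho[2]   # claimed orthonormal basis of W-perp
```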

**Theorem 54** Let f : **V** → **V** be a unitary mapping (**V** is a vector space over the complex numbers ℂ). Then there exists an orthonormal basis {**v**₁, . . . , **v**ₙ} for **V** consisting of eigenvectors under f. That is to say, the matrix of f with respect to this basis is a diagonal matrix.

*Proof*

If the dimension of **V** is zero or one, then obviously there is nothing to prove. So let us assume that the dimension n is at least two, and we prove things by induction on the number n. That is, we assume that the theorem is true for spaces of dimension less than n.

Now, according to the fundamental theorem of algebra, the characteristic polynomial of f has a zero, λ say, which is then an eigenvalue for f. So there must be some non-zero vector **v**ₙ ∈ **V** with f(**v**ₙ) = λ**v**ₙ. By dividing by the norm of **v**ₙ if necessary, we may assume that ||**v**ₙ|| = 1.

Let **W** ⊂ **V** be the 1-dimensional subspace generated by the vector **v**ₙ. Then **W**^⊥ is an (n−1)-dimensional subspace. We have that **W**^⊥ is invariant under f. That is, if **u** ∈ **W**^⊥ is some arbitrary vector, then f(**u**) ∈ **W**^⊥ as well. This follows since

λ<f(**u**), **v**ₙ> = <f(**u**), λ**v**ₙ> = <f(**u**), f(**v**ₙ)> = <**u**, **v**ₙ> = 0.

But we have already seen that for an eigenvalue λ of a unitary mapping, we must have |λ| = 1. Therefore we must have <f(**u**), **v**ₙ> = 0.

_{n}So we can consider f, restricted to

**W**

^{⊥}, and using the inductive hypothesis, we obtain an orthonormal basis of eigenvectors {

**v**, . . . ,

_{1}**v**} for

_{n-1}**W**

^{⊥}. Therefore, adding in the last vector

**v**, we have an orthonormal basis of eigenvectors {

_{n}**v**, . . . ,

_{1}**v**} for

_{n}**V**.
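Theorem 54 illuminates the opening remark: a rotation of ℝ², viewed as a unitary mapping of ℂ², does become diagonal in a basis of complex eigenvectors. A numerical sketch (the angle and matrices are my own example, not from the text):

```python
import cmath
import math

# Rotation of the plane by angle t; over C it has the orthonormal eigenvectors
# (1, -i)/sqrt(2) and (1, i)/sqrt(2), with eigenvalues e^{it} and e^{-it}.
t = 0.7
c, s = math.cos(t), math.sin(t)
R = [[c, -s],
     [s,  c]]

r = 1.0 / math.sqrt(2.0)
S = [[r,       r],        # columns are the complex eigenvectors
     [-1j * r, 1j * r]]
# S is unitary, so its inverse is the conjugate transpose S*.
Sinv = [[r, 1j * r],
        [r, -1j * r]]

def matmul(A, B):
    """Multiply two 2x2 complex matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

D = matmul(Sinv, matmul(R, S))   # should be diag(e^{it}, e^{-it})
```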

**Theorem 55** All Hermitian matrices can be diagonalized.

*Proof*

This is similar to the last one. Again, we use induction on n, the dimension of the vector space **V**. We have a self-adjoint mapping f : **V** → **V**. If n is zero or one, then we are finished. Therefore we assume that n ≥ 2.

Again, we observe that the characteristic polynomial of f must have a zero, hence there exists some eigenvalue λ, and an eigenvector **v**ₙ of f which has norm equal to one, where f(**v**ₙ) = λ**v**ₙ. Again take **W** to be the one dimensional subspace of **V** generated by **v**ₙ. Let **W**^⊥ be the perpendicular subspace. It is only necessary to show that, again, **W**^⊥ is invariant under f. But this is easy. Let **u** ∈ **W**^⊥ be given. Then we have

<f(**u**), **v**ₙ> = <**u**, f(**v**ₙ)> = <**u**, λ**v**ₙ> = λ<**u**, **v**ₙ> = λ · 0 = 0.

The rest of the proof follows as before.

In the particular case where we have only real numbers (which of course are a subset of the complex numbers), a self-adjoint matrix is simply a symmetric matrix.

**Corollary**

All real symmetric matrices can be diagonalized.

Note furthermore that even in the complex case of a Hermitian matrix, the symmetry condition, namely aⱼₖ = a̅ₖⱼ, implies that on the diagonal we have aⱼⱼ = a̅ⱼⱼ for all j. That is, the diagonal elements are all real numbers. But after diagonalization, the diagonal elements are precisely the eigenvalues. Therefore we have:

**Corollary**

The eigenvalues of a self-adjoint matrix — that is, a symmetric or a Hermitian matrix — are all real numbers.
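This corollary can be checked directly on a small example (the Hermitian matrix below is my own, not from the text): the roots of the characteristic polynomial x² − (trace)x + det come out real even though the entries are complex.

```python
import cmath

# A hypothetical 2x2 Hermitian matrix: a_jk equals the conjugate of a_kj,
# and the diagonal entries are real.
H = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]

# Eigenvalues via the quadratic formula on x^2 - (trace)x + det.
trace = H[0][0] + H[1][1]
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
disc = cmath.sqrt(trace * trace - 4 * det)
eig1 = (trace + disc) / 2
eig2 = (trace - disc) / 2
```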

**Orthogonal matrices revisited**

Let A be an n × n orthogonal matrix. That is, it consists of real numbers, and we have Aᵗ = A⁻¹. In general, it cannot be diagonalized. But on the other hand, it can be brought by means of similarity transformations into a block-diagonal form: some number of +1's and −1's on the diagonal, followed by a sequence of 2 × 2 rotation blocks of the form

( cos θ  −sin θ )
( sin θ   cos θ )

with all other entries zero.

To see this, start by imagining that A represents the orthogonal mapping f : ℝⁿ → ℝⁿ with respect to the canonical basis of ℝⁿ. Now consider the *symmetric matrix*

B = A + Aᵗ = A + A⁻¹.

This matrix represents another linear mapping, call it g : ℝⁿ → ℝⁿ, again with respect to the canonical basis of ℝⁿ.
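For a rotation of ℝ², this construction is degenerate in an instructive way (the angle below is my own example): B = A + Aᵗ collapses to 2 cos(t) times the identity, so *every* vector of ℝ² is an eigenvector of g, even though A itself has no real eigenvectors.

```python
import math

# A rotation A of R^2 by angle t; its transpose is its inverse.
t = 0.7
c, s = math.cos(t), math.sin(t)
A = [[c, -s],
     [s,  c]]
At = [[A[j][i] for j in range(2)] for i in range(2)]   # transpose of A

# B = A + A^t; for a rotation this is 2*cos(t) times the identity.
B = [[A[i][j] + At[i][j] for j in range(2)] for i in range(2)]
```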

But, as we have just seen, B *can* be diagonalized. In particular, there exists some vector **v** ∈ ℝⁿ with g(**v**) = λ**v**, for some λ ∈ ℝ. We now proceed by induction on the number n. There are two cases to consider:

- **v** is also an eigenvector for f, or
- it isn't.

In the first case, let **W** ⊂ **V** be simply **W** = [**v**], i.e. the set of all scalar multiples of **v**. Let **W**^⊥ be the perpendicular space to **W**. (That is, **w** ∈ **W**^⊥ means that <**w**, **v**> = 0.) But it is easy to see that **W**^⊥ is also invariant under f. This follows by observing first of all that f(**v**) = α**v**, with α = ±1. (Remember that the eigenvalues of orthogonal mappings have absolute value 1.) Now take **w** ∈ **W**^⊥. Then

<f(**w**), **v**> = α⁻¹<f(**w**), α**v**> = α⁻¹<f(**w**), f(**v**)> = α⁻¹<**w**, **v**> = α⁻¹ · 0 = 0.

Thus, by changing the basis of ℝⁿ to an orthonormal basis starting with **v** (which we can assume has been normalized), we obtain that the original matrix is similar to a matrix of the form

( ±1  0  )
( 0   A* )

where A* is an (n−1) × (n−1) orthogonal matrix, which, according to the inductive hypothesis, can be transformed into the required form.

If **v** is not an eigenvector of f, then, still, we know it is an eigenvector of g, and furthermore g = f + f⁻¹. In particular, g(**v**) = λ**v** = f(**v**) + f⁻¹(**v**). That is,

f(f(**v**)) = λf(**v**) − **v**.

So this time, let **W** = [**v**, f(**v**)]. This is a 2-dimensional subspace of **V**. Again, consider **W**^⊥. We have **V** = **W** ⊕ **W**^⊥. So we must show that **W**^⊥ is invariant under f. Now we have another two cases to consider:

- λ = 0, and
- λ ≠ 0.

In the first case, λ = 0 gives f(f(**v**)) = −**v**. Therefore, again taking **w** ∈ **W**^⊥, we have

<f(**w**), **v**> = <f(**w**), −f(f(**v**))> = −<**w**, f(**v**)> = 0.

(Remember that **w** ∈ **W**^⊥, so that <**w**, f(**v**)> = 0.) Of course we also have <f(**w**), f(**v**)> = <**w**, **v**> = 0.

On the other hand, if λ ≠ 0 then we have **v** = λf(**v**) − f(f(**v**)), so that

<f(**w**), **v**> = <f(**w**), λf(**v**) − f(f(**v**))> = λ<f(**w**), f(**v**)> − <f(**w**), f(f(**v**))>.

Here <f(**w**), f(f(**v**))> = <**w**, f(**v**)> = 0, since **w** ∈ **W**^⊥, and again <f(**w**), f(**v**)> = <**w**, **v**> = 0.

Therefore we have shown that **V** = **W** ⊕ **W**^⊥, where both of these subspaces are invariant under the orthogonal mapping f. By our inductive hypothesis, there is an orthonormal basis for f restricted to the (n−2)-dimensional subspace **W**^⊥ such that the matrix has the required form. As far as **W** is concerned, we are back in the simple situation of an orthogonal mapping ℝ² → ℝ², and the matrix for this has the form of one of our 2 × 2 blocks.
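For that 2 × 2 case, an orthogonal matrix is either a rotation (one of the 2 × 2 blocks) or a reflection, and a reflection is diagonalizable with eigenvalues ±1. A quick check (the reflection below is my own example, not from the text):

```python
import math

# Reflection of R^2 across the line at angle t/2: orthogonal, determinant -1,
# characteristic polynomial x^2 - (trace)x + det = x^2 - 1, so eigenvalues +-1.
t = 0.7
c, s = math.cos(t), math.sin(t)
F = [[c,  s],
     [s, -c]]

# Orthogonality check: F^t F should be the identity (F is symmetric here).
FtF = [[sum(F[k][i] * F[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]

trace = F[0][0] + F[1][1]                      # 0
det = F[0][0] * F[1][1] - F[0][1] * F[1][0]    # -1
eig1 = (trace + math.sqrt(trace * trace - 4 * det)) / 2
eig2 = (trace - math.sqrt(trace * trace - 4 * det)) / 2
```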
