
Thursday, June 26, 2014

Linear Algebra: #22 Dual Spaces


Again let V be a vector space over a field F (and, although it's not really necessary here, we continue to take F = ℜ or ℂ).

Definition
The dual space to V is the set of all linear mappings f : V → F. We denote the dual space by V*.

Examples
  • Let V = ℜn. Then let fi be the projection onto the i-th coordinate. That is, if ej is the j-th canonical basis vector, then

    fi(ej) = 1 if i = j, and fi(ej) = 0 otherwise.
    So each fi is a member of V*, for i = 1, . . . , n, and as we will see, these dual vectors form a basis for the dual space.


  • More generally, let V be any finite dimensional vector space, with some basis {v1, . . . , vn}. Let fi : V → F be defined as follows. For an arbitrary vector v ∈ V there is a unique linear combination

    v = a1v1 + · · · + anvn

    Then let fi(v) = ai. Again, fi ∈ V*, and we will see that the n vectors f1, . . . , fn form a basis of the dual space.


  • Let C0([0, 1]) be the space of continuous functions f : [0, 1] → ℜ. As we have seen, this is a real vector space, and it is not finite dimensional. For each f ∈ C0([0, 1]) let

    Λ(f) = ∫01 f(x) dx.
    This gives us a linear mapping Λ : C0([0, 1]) → ℜ. Thus it belongs to the dual space of C0([0, 1]).


  • Another vector in the dual space to C0([0, 1]) is given as follows. Let x ∈ [0, 1] be some fixed point. Then let Γx : C0([0, 1]) → ℜ be defined by Γx(f) = f(x), for all f ∈ C0([0, 1]).


  • For this last example, let us assume that V is a vector space with a scalar product. (Thus F = ℜ or ℂ.) For each v ∈ V, let φv(u) = <v, u>. Then φv ∈ V*.


Theorem 56
Let V be a finite dimensional vector space (over ℂ) and let V* be the dual space. For each v ∈ V, let φv : V → ℂ be given by φv(u) = <v, u>. Then given an orthonormal basis {v1, . . . , vn} of V, we have that {φv1, . . ., φvn} is a basis of V*. This is called the dual basis to {v1, . . . , vn}.

Proof
Let φ ∈ V* be an arbitrary linear mapping φ : V → ℂ. As always, we remember that φ is uniquely determined by its values φ(v1), . . . , φ(vn), which in this case are simply complex numbers. Say φ(vj) = cj ∈ ℂ, for each j. Now take some arbitrary vector v ∈ V. There is the unique expression

v = <v1, v>v1 + · · · + <vn, v>vn = φv1(v)v1 + · · · + φvn(v)vn,   so that   φ(v) = c1φv1(v) + · · · + cnφvn(v).

Therefore, φ = c1φv1 + · · · + cnφvn, and so {φv1, . . ., φvn} generates V*.

To show that {φv1, . . ., φvn} is linearly independent, let φ = c1φv1 + · · · + cnφvn be some linear combination, where cj ≠ 0, for at least one j. But then φ(vj) = cj ≠ 0, and thus φ ≠ 0 in V*.
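To make this concrete, here is a small numerical illustration (my own sketch, not part of Hemion's notes), using numpy and random test data. It checks that an arbitrary linear functional on ℂn really is the combination ∑j cjφvj with cj = φ(vj), once an orthonormal basis has been fixed.

```python
import numpy as np

# Sketch: verify Theorem 56 numerically for C^n with the standard scalar
# product <v, u> = conj(v) . u.  The orthonormal basis, the functional phi
# and the test vector are all randomly generated example data.

rng = np.random.default_rng(0)
n = 4

M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
V, _ = np.linalg.qr(M)                      # columns v_1,...,v_n: an orthonormal basis

w = rng.normal(size=n) + 1j * rng.normal(size=n)
phi = lambda u: w @ u                       # an arbitrary linear functional phi in V*

c = np.array([phi(V[:, j]) for j in range(n)])      # c_j = phi(v_j)
phi_dual = lambda u: sum(c[j] * np.vdot(V[:, j], u)  # sum_j c_j * phi_{v_j}(u)
                         for j in range(n))

u = rng.normal(size=n) + 1j * rng.normal(size=n)
print(np.isclose(phi(u), phi_dual(u)))      # True: the dual basis generates V*
```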

Corollary
dim(V*) = dim(V).

Corollary
More specifically, we have an isomorphism V → V*, such that v → φv for each v ∈ V.

But somehow, this isomorphism doesn’t seem to be very “natural”. It is defined in terms of some specific basis of V. What if V is not finite dimensional so that we have no basis to work with? For this reason, we do not think of V and V* as being “really” just the same vector space. [In case we have a scalar product, then there is a “natural” mapping V → V*, where v → φv, such that φv(u) = <v, u>, for all u ∈ V.]

On the other hand, let us look at the dual space of the dual space (V*)*. (Perhaps this is a slightly mind-boggling concept at first sight!) We imagine that “really” we just have (V*)* = V. For let Φ ∈ (V*)*. That means, for each φ ∈ V* we have Φ(φ) being some complex number. On the other hand, we also have φ(v) being some complex number, for each v ∈ V. Can we uniquely identify each v ∈ V with some Φ ∈ (V*)*, in the sense that both always give the same complex numbers, for all possible φ ∈ V*?

Let us say that there exists a v ∈ V such that Φ(φ) = φ(v), for all φ ∈ V*. In fact, if for a given v ∈ V we define Φv by Φv(φ) = φ(v), for each φ ∈ V*, then we certainly have a linear mapping Φv : V* → ℂ. On the other hand, given some arbitrary Φ ∈ (V*)*, do we have a unique v ∈ V such that Φ(φ) = φ(v), for all φ ∈ V*? At least in the case where V is finite dimensional, we can affirm that it is true by looking at the dual basis.


Dual mappings 
Let V and W be two vector spaces (where we again assume that the field is ℂ). Assume that we have a linear mapping f : V → W. Then we can define a linear mapping f* : W* → V* in a natural way as follows. For each φ ∈ W*, let f*(φ) = φ ◦ f. So it is obvious that f*(φ) : V → ℂ is a linear mapping. Now assume that V and W have scalar products, giving us the mappings s : V → V* and t : W → W*. So we can draw a little “diagram” to describe the situation.

    V --- f ---> W
    |            |
    s            t
    |            |
    V* <-- f* -- W*
The mappings s and t are isomorphisms, so we can go around the diagram, using the mapping f adj = s−1 ◦ f* ◦ t : W → V. This is the adjoint mapping to f. So we see that in the case V = W, a self-adjoint mapping f : V → V is one such that f adj = f.

Does this correspond with our earlier definition, namely that <u, f(v)> = <f(u), v> for all u and v ∈ V? To answer this question, look at the diagram, which now has the form

    V --- f ---> V
    |            |
    s            s
    |            |
    V* <-- f* -- V*
where s(v) ∈ V* is such that s(v)(u) = <v, u>, for all u ∈ V. Now f adj = s−1 ◦ f* ◦ s; that is, the condition f adj = f becomes s−1 ◦ f* ◦ s = f. Since s is an isomorphism, we can equally say that the condition is that f* ◦ s = s ◦ f. So let v be some arbitrary vector in V. We have s ◦ f(v) = f* ◦ s(v). However, remembering that this is an element of V*, we see that this means

(s ◦ f(v))(u) = (f* ◦ s)(v)(u), 

for all u ∈ V. But (s ◦ f(v))(u) = <f(v), u> and (f* ◦ s)(v)(u) = <v, f(u)>. Therefore we have

<f(v), u> = <v, f(u)>

for all v and u ∈ V, as expected.
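As a quick sanity check (a sketch with made-up matrices, not from the notes), one can verify numerically that for f(x) = Ax on ℂn with the standard scalar product, the adjoint mapping is given by the conjugate transpose of A, and that a Hermitian matrix is self-adjoint in the sense just derived.

```python
import numpy as np

# Sketch: for f(x) = A x on C^n with <v, u> = conj(v) . u, the adjoint
# satisfies <f(v), u> = <v, f_adj(u)>, and its matrix is the conjugate
# transpose of A.  The matrices and vectors below are random example data.

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A_adj = A.conj().T

v = rng.normal(size=n) + 1j * rng.normal(size=n)
u = rng.normal(size=n) + 1j * rng.normal(size=n)

print(np.isclose(np.vdot(A @ v, u), np.vdot(v, A_adj @ u)))   # True for any A

H = A + A.conj().T          # a Hermitian matrix built from A
print(np.isclose(np.vdot(H @ v, u), np.vdot(v, H @ u)))       # True: H is self-adjoint
```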




This is the last section for this series on Linear Algebra. But that is not to say that there is nothing more that you have to know about the subject. For example, when studying the theory of relativity you will encounter tensors, which are combinations of linear mappings and dual mappings. One speaks of “covariant” and “contravariant” tensors. That is, linear mappings and dual mappings.

But then, proceeding to the general theory of relativity, these tensors are used to describe differential geometry. That is, we no longer have a linear (that is, a vector) space. Instead, we imagine that space is curved, and in order to describe this curvature, we define a thing called the tangent vector space which you can think of as being a kind of linear approximation to the spatial structure near a given point. And so it goes on, leading to more and more complicated mathematical constructions, taking us away from the simple “linear” mathematics which we have seen in this semester.

After a few years of learning the mathematics of contemporary theoretical physics, perhaps you will begin to ask yourselves whether it really makes so much sense after all. Can it be that the physical world is best described by using all of the latest techniques which pure mathematicians happen to have been playing around with in the last few years — in algebraic topology, functional analysis, the theory of complex functions, and so on and so forth? Or, on the other hand, could it be that physics has been losing touch with reality, making constructions similar to the theory of epicycles of the medieval period, whose conclusions can never be verified using practical experiments in the real world?




IMPORTANT NOTE:
This series on Linear Algebra has been taken from the lecture notes prepared by Geoffrey Hemion. I used his notes when studying Linear Algebra for my physics course and it was really helpful. So, I thought that you could also benefit from his notes. The document can be found at his homepage.

Tuesday, June 24, 2014

Linear Algebra: #21 Which Matrices can be Diagonalized?


The complete answer to this question is a bit too complicated. It all has to do with a thing called the “minimal polynomial”.

Now we have seen that not all orthogonal matrices can be diagonalized. (Think about the rotations of ℜ2.) On the other hand, we can prove that all unitary, and also all Hermitian matrices can be diagonalized.

Of course, a matrix M is only a representation of a linear mapping f : V → V with respect to a given basis {v1, . . . , vn} of the vector space V. So to say that the matrix can be diagonalized is to say that it is similar to a diagonal matrix. That is, there exists another matrix S, such that S−1MS is diagonal.

S−1MS = diag(λ1, λ2, . . . , λn), a matrix whose only non-zero elements are the λj along the diagonal.

But this means that there must be a basis for V, consisting entirely of eigenvectors.

In this section we will consider complex vector spaces — that is, V is a vector space over the complex numbers ℂ. The vector space V will be assumed to have a scalar product associated with it, and the bases we consider will be orthonormal. We begin with a definition.

Definition
Let W ⊂ V be a subspace of V. Let

 W⊥ = {v ∈ V : <v, w> = 0, ∀w ∈ W}.

Then W⊥ is called the perpendicular space to W.

It is a rather trivial matter to verify that W⊥ is itself a subspace of V, and furthermore W ∩ W⊥ = {0}. In fact, we have:


Theorem 53
V = W ⊕ W⊥

Proof
Let {w1, . . . , wm} be some orthonormal basis for the vector space W. This can be extended to a basis {w1, . . . , wm, wm+1, . . . , wn} of V. Assuming the Gram-Schmidt process has been used, we may assume that this is an orthonormal basis. The claim is then that {wm+1, . . . , wn} is a basis for W⊥.

Now clearly, since <wj, wk> = 0, for j ≠ k, we have {wm+1, . . . , wn} ⊂ W⊥. If u ∈ W⊥ is some arbitrary vector in W⊥, then we have

u = <w1, u>w1 + · · · + <wn, u>wn = <wm+1, u>wm+1 + · · · + <wn, u>wn,

since <wj, u> = 0 if j ≤ m. (Remember, u ∈ W⊥.) Therefore, {wm+1, . . . , wn} is a linearly independent, orthonormal set which generates W⊥, so it is a basis. And so we have V = W ⊕ W⊥.
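A numerical sketch of Theorem 53 (illustrative only; the subspace below is random example data): starting from an orthonormal basis of W, an orthonormal basis of W⊥ can be read off, and together the two bases form an orthonormal basis of V.

```python
import numpy as np

# Sketch for V = R^5: given an orthonormal basis of a subspace W (columns of
# Wmat), the remaining columns of U in a full SVD give an orthonormal basis of
# the perpendicular space W_perp, so that V = W + W_perp.

rng = np.random.default_rng(2)
n, m = 5, 2
Wmat, _ = np.linalg.qr(rng.normal(size=(n, m)))      # orthonormal basis w1,...,wm of W

U, s, Vt = np.linalg.svd(Wmat, full_matrices=True)
W_perp = U[:, m:]                                    # orthonormal basis of W_perp

print(np.allclose(Wmat.T @ W_perp, 0))               # every w_j is orthogonal to W_perp
B = np.hstack([Wmat, W_perp])                        # combined basis of V
print(np.allclose(B.T @ B, np.eye(n)))               # it is an orthonormal basis of R^5
```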


Theorem 54
Let f : V → V be a unitary mapping (V is a vector space over the complex numbers ℂ). Then there exists an orthonormal basis {v1, . . . , vn} for V consisting of eigenvectors under f. That is to say, the matrix of f with respect to this basis is a diagonal matrix.

Proof
If the dimension of V is zero or one, then obviously there is nothing to prove. So let us assume that the dimension n is at least two, and we prove things by induction on the number n. That is, we assume that the theorem is true for spaces of dimension less than n.

Now, according to the fundamental theorem of algebra, the characteristic polynomial of f has a zero, λ say, which is then an eigenvalue for f. So there must be some non-zero vector vn ∈ V, with f(vn) = λvn. By dividing by the norm of vn if necessary, we may assume that ||vn|| = 1.

Let W ⊂ V be the 1-dimensional subspace generated by the vector vn. Then W⊥ is an (n−1)-dimensional subspace. We have that W⊥ is invariant under f. That is, if u ∈ W⊥ is some arbitrary vector, then f(u) ∈ W⊥ as well. This follows since

λ<f(u), vn> = <f(u), λvn> = <f(u), f(vn)> = <u, vn> = 0. 

But we have already seen that for an eigenvalue λ of a unitary mapping, we must have |λ| = 1. Therefore we must have <f(u), vn> = 0.

So we can consider f, restricted to W⊥, and using the inductive hypothesis, we obtain an orthonormal basis of eigenvectors {v1, . . . , vn-1} for W⊥. Therefore, adding in the last vector vn, we have an orthonormal basis of eigenvectors {v1, . . . , vn} for V.


Theorem 55
All Hermitian matrices can be diagonalized.

Proof
This is similar to the last one. Again, we use induction on n, the dimension of the vector space V. We have a self-adjoint mapping f : V → V. If n is zero or one, then we are finished. Therefore we assume that n ≥ 2.


Again, we observe that the characteristic polynomial of f must have a zero, hence there exists some eigenvalue λ, and an eigenvector vn of f, which has norm equal to one, where f(vn) = λvn. Again take W to be the one dimensional subspace of V generated by vn. Let W⊥ be the perpendicular space. It is only necessary to show that, again, W⊥ is invariant under f. But this is easy. Let u ∈ W⊥ be given. Then we have

 <f(u), vn> = <u, f(vn)> = <u, λvn> = λ<u, vn> = λ · 0 = 0.
The rest of the proof follows as before.

In the particular case where the matrix has only real entries (the real numbers being, of course, a subset of the complex numbers), a self-adjoint matrix is simply a symmetric matrix.

Corollary
All real symmetric matrices can be diagonalized.

Note furthermore that, even in the complex (Hermitian) case, the symmetry condition, namely ajk = ākj, implies that on the diagonal we have ajj = ājj for all j. That is, the diagonal elements are all real numbers. But after diagonalization, these diagonal elements are the eigenvalues. Therefore we have:

Corollary
The eigenvalues of a self-adjoint matrix — that is, a symmetric or a Hermitian matrix — are all real numbers.
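These statements are easy to check numerically. The sketch below (example data of my own) diagonalizes a randomly generated Hermitian matrix with numpy's eigh routine, which is intended for exactly this self-adjoint case, and confirms that the eigenvalues are real and the eigenvectors orthonormal.

```python
import numpy as np

# Sketch: a Hermitian matrix has real eigenvalues and can be diagonalized by
# a unitary matrix of eigenvectors (Theorem 55 and its corollaries).

rng = np.random.default_rng(3)
n = 4
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = B + B.conj().T                        # Hermitian by construction

print(np.allclose(np.linalg.eigvals(H).imag, 0))   # eigenvalues are (numerically) real

eigvals, Q = np.linalg.eigh(H)                     # routine for Hermitian matrices
print(np.allclose(Q.conj().T @ Q, np.eye(n)))              # Q is unitary
print(np.allclose(Q.conj().T @ H @ Q, np.diag(eigvals)))   # Q^-1 H Q is diagonal
```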


Orthogonal matrices revisited
Let A be an n × n orthogonal matrix. That is, it consists of real numbers, and we have At = A−1. In general, it cannot be diagonalized. But on the other hand, it can be brought into the following form by means of similarity transformations.

a block-diagonal matrix of the form diag(±1, . . . , ±1, R(θ1), . . . , R(θk)), where each 2 × 2 block

    R(θ) = ( cos θ   −sin θ )
           ( sin θ    cos θ )

is a rotation, and all other elements are zero.

To see this, start by imagining that A represents the orthogonal mapping f : ℜn → ℜn with respect to the canonical basis of ℜn. Now consider the symmetric matrix

B = A + At = A + A−1

This matrix represents another linear mapping, call it g : ℜn → ℜn, again with respect to the canonical basis of ℜn.

But, as we have just seen, B can be diagonalized. In particular, there exists some vector v ∈ ℜn with g(v) = λv, for some λ ∈ ℜ. We now proceed by induction on the number n. There are two cases to consider:
  • v is also an eigenvector for f, or 
  • it isn’t. 
The first case is easy. Let W ⊂ V be simply W = [v]; i.e. this is just the set of all scalar multiples of v. Let W⊥ be the perpendicular space to W. (That is, w ∈ W⊥ means that <w, v> = 0.) But it is easy to see that W⊥ is also invariant under f. This follows by observing first of all that f(v) = αv, with α = ±1. (Remember that the eigenvalues of orthogonal mappings have absolute value 1.) Now take w ∈ W⊥. Then <f(w), v> = α−1<f(w), αv> = α−1<f(w), f(v)> = α−1<w, v> = α−1 · 0 = 0. Thus, by changing the basis of ℜn to an orthonormal basis, starting with v (which we can assume has been normalized), we obtain that the original matrix is similar to the matrix

    ( α   0  )
    ( 0   A* )
where A* is an (n−1) ×(n−1) orthogonal matrix, which, according to the inductive hypothesis, can be transformed into the required form.

If v is not an eigenvector of f, then, still, we know it is an eigenvector of g, and furthermore g = f + f −1. In particular, g(v) = λv = f(v) + f −1(v). That is,

f(f(v)) = λf(v) − v.

So this time, let W = [v, f(v)]. This is a 2-dimensional subspace of V. Again, consider W⊥. We have V = W ⊕ W⊥. So we must show that W⊥ is invariant under f. Now we have another two cases to consider:
  • λ = 0, and 
  • λ ≠ 0. 
So if λ = 0 then we have f(f(v)) = −v. Therefore, again taking w ∈ W⊥, we have <f(w), v> = <f(w), −f(f(v))> = −<w, f(v)> = 0. (Remember that w ∈ W⊥, so that <w, f(v)> = 0.) Of course we also have <f(w), f(v)> = <w, v> = 0.

On the other hand, if λ ≠ 0 then we have v = λf(v) − f(f(v)) so that <f(w), v> = <f(w), λf(v) − f(f(v))> = λ<f(w), f(v)> − <f(w), f(f(v))>, and we have seen that both of these scalar products are zero. Finally, we again have <f(w), f(v)> = <w, v> = 0.

Therefore we have shown that V = W ⊕ W⊥, where both of these subspaces are invariant under the orthogonal mapping f. By our inductive hypothesis, there is an orthonormal basis for f restricted to the (n − 2)-dimensional subspace W⊥ such that the matrix has the required form. As far as W is concerned, we are back in the simple situation of an orthogonal mapping ℜ2 → ℜ2, and the matrix for this has the form of one of our 2 × 2 blocks.
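The block form can also be seen numerically. In the sketch below (a constructed example, not taken from the notes), the canonical block form D is built from ±1 entries and one 2 × 2 rotation block, hidden behind a random orthogonal change of basis, and the eigenvalues ±1 and e±iθ of the resulting orthogonal matrix are recovered.

```python
import numpy as np

# Sketch: an orthogonal matrix is orthogonally similar to a block-diagonal
# matrix with +1/-1 entries and 2x2 rotation blocks.  Here the block form D
# is built first and then conjugated by a random orthogonal matrix S.

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # one 2x2 rotation block

D = np.eye(4)
D[1, 1] = -1.0
D[2:, 2:] = R                                     # D = diag(+1, -1, R(theta))

rng = np.random.default_rng(4)
S, _ = np.linalg.qr(rng.normal(size=(4, 4)))      # a random orthogonal matrix
A = S @ D @ S.T                                   # similar to D, still orthogonal

print(np.allclose(A.T @ A, np.eye(4)))            # A is orthogonal
print(np.sort_complex(np.linalg.eigvals(A)))      # expect -1, 1 and exp(+-i*theta)
print(np.exp(1j * theta))                         # for comparison
```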

Sunday, June 22, 2014

Linear Algebra: #20 Characterizing Orthogonal, Unitary, and Hermitian Matrices


20.1 Orthogonal matrices
Let V be an n-dimensional real vector space (that is, over the real numbers ℜ), and let {v1, . . . , vn} be an orthonormal basis for V. Let f : V → V be an orthogonal mapping, and let A be its matrix with respect to the basis {v1, . . . , vn}. Then we say that A is an orthogonal matrix.


Theorem 50
The n × n matrix A is orthogonal ⇔ A−1 = At (Recall that if aij is the ij-th element of A, then the ij-th element of At is aji . That is, everything is “flipped over” the main diagonal in A.)

Proof
For an orthogonal mapping f, we have <vj, vk> = <f(vj), f(vk)>, for all j and k. But in the matrix notation, the scalar product becomes the inner product. That is, since the basis is orthonormal,

<vj, vk> = 1 if j = k, and <vj, vk> = 0 if j ≠ k.

In other words, the matrix whose jk-th element is always <vj , vk> is the n×n identity matrix In. On the other hand,

f(vj) = a1jv1 + a2jv2 + · · · + anjvn.

That is, we obtain the j-th column of the matrix A. Furthermore, since <vj , vk> = <f(vj), f(vk)>, we must have the matrix whose jk-th elements are <f(vj), f(vk)> being again the identity matrix. So

<f(vj), f(vk)> = a1ja1k + a2ja2k + · · · + anjank = 1 if j = k, and 0 otherwise.

But now, if you think about it, you see that this is just one part of the matrix multiplication AtA. All together, we have

AtA = In.

Thus we conclude that A−1 = At. (Note: this was only the proof that f orthogonal ⇒ A−1 = At. The proof in the other direction, going backwards through our argument, is easy, and is left as an exercise for you.)
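A quick numerical check of Theorem 50 (random example data, my own sketch): the Q factor of a QR decomposition is an orthogonal matrix, and indeed QtQ = In and Q−1 = Qt.

```python
import numpy as np

# Sketch: the matrix of an orthogonal mapping with respect to an orthonormal
# basis satisfies A^t A = I, i.e. A^{-1} = A^t.  The Q factor of a QR
# decomposition of a random real matrix is a convenient orthogonal example.

rng = np.random.default_rng(5)
A, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # A is orthogonal

print(np.allclose(A.T @ A, np.eye(4)))         # A^t A = I_n
print(np.allclose(np.linalg.inv(A), A.T))      # A^{-1} = A^t
# The columns are the images f(v_j); their pairwise scalar products vanish,
# exactly as in the proof above.
print(np.isclose(A[:, 0] @ A[:, 1], 0.0))
```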


20.2 Unitary matrices


Theorem 51
The n × n matrix A is unitary ⇔ A−1 = Āt. (The matrix Ā is obtained from A by taking the complex conjugates of all its elements.)

Proof
Entirely analogous with the case of orthogonal matrices. One must note however, that the inner product in the complex case is

<u, v> = ū1v1 + ū2v2 + · · · + ūnvn.

20.3 Hermitian and symmetric matrices


Theorem 52
The n × n matrix A is Hermitian ⇔ A = Āt.

Proof
This is again a matter of translating the condition <vj, f(vk)> = <f(vj), vk> into matrix notation, where f is the linear mapping which is represented by the matrix A, with respect to the orthonormal basis {v1, . . . , vn}. We have

<vj, f(vk)> = ajk   and   <f(vj), vk> = ākj,   so the condition becomes ajk = ākj for all j and k; that is, A = Āt.

In particular, we see that in the real case, self-adjoint matrices are symmetric.

Saturday, June 21, 2014

Linear Algebra: #19 "Classical Groups" often seen in Physics



  • The orthogonal group O(n): This is the set of all linear mappings f : ℜn → ℜn such that <u, v> = <f(u), f(v)>, for all u, v ∈ ℜn. We think of this as being all possible rotations and reflections of n-dimensional Euclidean space. 

  • The special orthogonal group SO(n): This is the subgroup of O(n), containing all orthogonal mappings whose matrices have determinant +1. 

  • The unitary group U(n): The analog of O(n), where the vector space is n-dimensional complex space ℂn. That is, <u, v> = <f(u), f(v)>, for all u, v ∈ ℂn

  • The special unitary group SU(n): Again, the subgroup of U(n) with determinant +1. 

Note that for orthogonal, or unitary mappings, all eigenvalues — if they exist — must have absolute value 1. To see this, let v be an eigenvector with eigenvalue λ. Then we have

<v, v> = <f(v), f(v)> = <λv, λv> = |λ|2<v, v>.

Since v is an eigenvector, and thus v ≠ 0, we have <v, v> > 0, and therefore we must have |λ| = 1.

We will prove that all unitary matrices can be diagonalized. That is, for every unitary mapping ℂn → ℂn, there exists a basis consisting of eigenvectors. On the other hand, as we have already seen in the case of simple rotations of 2-dimensional space, “most” orthogonal matrices cannot be diagonalized. Nevertheless, we can prove that every orthogonal mapping ℜn → ℜn, where n is an odd number, has at least one eigenvector. [For example, in our normal 3-dimensional space of physical reality, any rotating object — for example the Earth rotating in space — has an axis of rotation, which is an eigenvector.]

  • The self-adjoint mappings f (of ℜn → ℜn or ℂn → ℂn) are such that <u, f(v)> = <f(u), v>, for all u, v in ℜn or ℂn, respectively. As we will see, the matrices for such mappings are symmetric in the real case, and Hermitian in the complex case. In either case, the matrices can be diagonalized. Examples of Hermitian matrices are the Pauli spin matrices:

    σx = ( 0   1 )     σy = ( 0   −i )     σz = ( 1    0 )
         ( 1   0 )          ( i    0 )          ( 0   −1 )
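For readers who like to check such things by machine, here is a short numpy sketch (using the standard Pauli matrices written out above) confirming that each Pauli matrix is Hermitian, unitary, and has the real eigenvalues ±1.

```python
import numpy as np

# Sketch: the Pauli matrices are both Hermitian and unitary, so each can be
# diagonalized and has real eigenvalues of absolute value 1, namely +1 and -1.

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]])
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

for name, s in [("sigma_x", sigma_x), ("sigma_y", sigma_y), ("sigma_z", sigma_z)]:
    hermitian = np.allclose(s, s.conj().T)             # s equals its conjugate transpose
    unitary = np.allclose(s.conj().T @ s, np.eye(2))   # s^H s = I
    print(name, hermitian, unitary, np.linalg.eigvalsh(s))   # eigenvalues: [-1, 1]
```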

We also have the Lorentz group, which is important in the Theory of Relativity. Let us imagine that physical space is ℜ4, and a typical point is v = (tv, xv, yv, zv). Physicists call this Minkowski space, which they often denote by M4. A linear mapping f : M4 → M4 is called a Lorentz transformation if, for f(v) = (tv*, xv*, yv*, zv*), we have

  • − (tv*)2 + (xv*)2 + (yv*)2 +  (zv*)2 = − tv2 + xv2 + yv2 +  zv2 for  all v ∈ M4, and also the mapping is “time-preserving” in the sense that the unit vector in the time direction, (1, 0, 0, 0) is mapped to some vector (t*, x*, y*, z*), such that t* > 0.

The Poincaré group is obtained if we consider, in addition, translations of Minkowski space. But translations are not linear mappings, so I will not consider these things further in this lecture.

Friday, June 20, 2014

Linear Algebra: #18 Orthogonal Bases


Our vector space V is now assumed to be either Euclidean, or else unitary — that is, it is defined over either the real numbers ℜ, or else the complex numbers ℂ. In either case we have a scalar product <·,·> : V × V → F (here, F = ℜ or ℂ).

As always, we assume that V is finite dimensional, and thus it has a basis {v1, . . . , vn}. Thinking about the canonical basis for ℜn or ℂn, and the inner product as our scalar product, we see that it would be nice if we had
  • <vj , vj> = 1, for all j (that is, the basis vectors are normalized), and furthermore 
  • <vj , vk> = 0, for all j ≠ k (that is, the basis vectors are an orthogonal set in V). 
In other words, <vj, vk> = δjk, where δjk = 1 if j = k and δjk = 0 otherwise.


That is to say, {v1, . . . , vn} is an orthonormal basis of V. Unfortunately, most bases are not orthonormal. But this doesn’t really matter. For, starting from any given basis, we can successively alter the vectors in it, gradually changing it into an orthonormal basis. This process is often called the Gram-Schmidt orthonormalization process. But first, to show you why orthonormal bases are good, we have the following theorem.


Theorem 48
Let V have the orthonormal basis {v1, . . . , vn}, and let x ∈ V be arbitrary. Then

x = <v1, x>v1 + <v2, x>v2 + · · · + <vn, x>vn.

That is, the coefficients of x, with respect to the orthonormal basis, are simply the scalar products with the respective basis vectors.

Proof
This follows simply because if x = ∑ ajvj, with (j = 1, ... , n), then we have for each k,

<vk, x> = <vk, ∑j ajvj> = ∑j aj<vk, vj> = ak.
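Numerically, Theorem 48 says that if the orthonormal basis vectors are collected as the columns of a matrix V, the coefficient vector of x is simply Vtx. A small sketch with random data (my own example, not from the notes):

```python
import numpy as np

# Sketch: with respect to an orthonormal basis (columns of V), the
# coefficients of x are the scalar products <v_j, x>.

rng = np.random.default_rng(6)
n = 4
V, _ = np.linalg.qr(rng.normal(size=(n, n)))   # columns v_1,...,v_n: orthonormal basis
x = rng.normal(size=n)

a = V.T @ x                                    # a_j = <v_j, x>
print(np.allclose(V @ a, x))                   # x = sum_j a_j v_j
```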

So now to the Gram-Schmidt process. To begin with, if a non-zero vector v ∈ V is not normalized — that is, its norm is not one — then it is easy to multiply it by a scalar, changing it into a vector with norm one. For we have <v, v> > 0. Therefore ||v|| = √<v, v> > 0 and we have

|| (1/||v||) · v || = (1/||v||) · ||v|| = 1.

In other words, we simply multiply the vector by the inverse of its norm.


Theorem 49
Every finite dimensional vector space V which has a scalar product has an orthonormal basis.

Proof
The proof proceeds by constructing an orthonormal basis {u1, . . . , un} from a given, arbitrary basis {v1, . . . , vn}. To describe the construction, we use induction on the dimension, n. If n = 1 then there is almost nothing to prove. Any non-zero vector is a basis for V, and as we have seen, it can be normalized by dividing by the norm. (That is, scalar multiplication with the inverse of the norm.)

So now assume that n ≥ 2, and furthermore assume that the Gram-Schmidt process can be constructed for any n−1 dimensional space. Let U ⊂ V be the subspace spanned by the first n − 1 basis vectors {v1, . . . , vn−1}. Since U is only n − 1 dimensional, our assumption is that there exists an orthonormal basis {u1, . . . , un−1} for U. Clearly, adding in vn gives a new basis {u1, . . . , un−1, vn} for V.

[Since both {v1, . . . , vn−1} and {u1, . . . , un−1} are bases for U, we can write each vj as a linear combination of the uk’s. Therefore {u1, . . . , un−1, vn} spans V, and since the dimension is n, it must be a basis.]

Unfortunately, this last vector, vn, might disturb the nice orthonormal character of the other vectors. Therefore, we replace vn with a new vector, as follows. (A linearly independent set remains linearly independent if to one of the vectors we add some linear combination of the other vectors.)

un* = vn − <u1, vn>u1 − <u2, vn>u2 − · · · − <un−1, vn>un−1,   and then   un = un* / ||un*||.
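For completeness, here is a compact implementation of the Gram-Schmidt process just described (a sketch written for clarity rather than numerical robustness; the test basis is my own example):

```python
import numpy as np

# Sketch of the Gram-Schmidt process: subtract from each vector its components
# along the previously constructed orthonormal vectors, then normalize.

def gram_schmidt(vectors):
    """Turn a list of linearly independent real vectors into an orthonormal list."""
    ortho = []
    for v in vectors:
        u = np.array(v, dtype=float)
        for q in ortho:
            u = u - np.dot(q, u) * q          # remove the component along q
        ortho.append(u / np.linalg.norm(u))   # normalize
    return ortho

basis = [np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 0.0, 1.0]),
         np.array([0.0, 1.0, 1.0])]
Q = np.column_stack(gram_schmidt(basis))
print(np.allclose(Q.T @ Q, np.eye(3)))        # the result is an orthonormal basis
```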

Wednesday, June 18, 2014

Linear Algebra: #17 Scalar Products, Norms, etc.


So now we have arrived at the subject matter which is usually taught in the second semester of the beginning lectures in mathematics — that is in Linear Algebra II— namely, the properties of (finite dimensional) real and complex vector spaces. Finally now, we are talking about geometry. That is, about vector spaces which have a distance function. (The word “geometry” obviously has to do with the measurement of physical distances on the earth.)

So let V be some finite dimensional vector space over ℜ, or ℂ. Let v ∈ V be some vector in V. Then, since V ≅ ℜn, or ℂn, we can write v = ∑ ajej, (with j ranging from 1, ... , n) where {e1, . . . , en} is the canonical basis for ℜn or ℂn and aj ∈ ℜ or ℂ, respectively, for all j. Then the length of v is defined to be the non-negative real number

||v|| = √(|a1|2 + · · · + |an|2).

Of course, as these things always are, we will not simply confine ourselves to measurements of normal physical things on the earth. We have already seen that the idea of a complex vector space defies our normal powers of geometric visualization. Also, we will not always restrict things to finite dimensional vector spaces. For example, spaces of functions — which are almost always infinite dimensional — are also very important in theoretical physics. Therefore, rather than saying that ||v|| is the “length” of the vector v, we use a new word, and we say that ||v|| is the norm of v. In order to define this concept in a way which is suitable for further developments, we will start with the idea of a scalar product of vectors.

Definition
Let F = ℜ or ℂ and let V, W be two vector spaces over F. A bilinear form is a mapping s : V × W → F satisfying the following conditions with respect to arbitrary elements v, v1 and v2 ∈ V, w, w1 and w2 ∈ W, and a ∈ F.
  1. s(v1 + v2, w) = s(v1, w) + s(v2, w), 
  2. s(av, w) = as(v, w), 
  3. s(v, w1 + w2) = s(v, w1) + s(v, w2) and 
  4. s(v, aw) = as(v, w). 

If V = W, then we say that a bilinear form s : V × V → F is symmetric, if we always have s(v1, v2) = s(v2, v1). Also the form is called positive definite if s(v, v) > 0 for all v ≠ 0.

On the other hand, if F = ℂ and f : V → W is such that we always have
  1. f(v1 + v2) = f(v1) + f(v2) and 
  2. f(av) = āf(v), where ā denotes the complex conjugate of a,
then f is a semi-linear (not a linear) mapping. (Note: if F = ℜ then semi-linear is the same as linear.)

A mapping s : V × W → F such that
  1. The mapping given by s(·, w) : V → F, where v → s(v, w), is semi-linear for all w ∈ W, whereas 
  2. The mapping given by s(v, ·) : W → F, where w → s(v, w), is linear for all v ∈ V 
is called a sesqui-linear form.

In the case V = W, we say that the sesqui-linear form is Hermitian (or Euclidean, if we only have F = ℜ), if we always have s(v1, v2) equal to the complex conjugate of s(v2, v1). (Therefore, if F = ℜ, an Hermitian form is symmetric.)

Finally, a scalar product is a positive definite Hermitian form s : V × V → F. Normally, one writes <v1, v2>, rather than s(v1, v2).

Well, these are a lot of new words. To be more concrete, we have the inner products, which are examples of scalar products.


Inner products
Let u = (u1, u2, . . . , un)t and v = (v1, v2, . . . , vn)t be vectors in ℂn.
Thus, we are considering these vectors as column vectors, defined with respect to the canonical basis of ℂn. Then define (using matrix multiplication)
<u, v> = ūtv = ū1v1 + ū2v2 + · · · + ūnvn.

It is easy to check that this gives a scalar product on ℂn. This particular scalar product is called the inner product.
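In numpy terms (a small sketch with made-up vectors), the inner product on ℂn is the sum ∑j ūjvj; note that numpy's vdot conjugates its first argument, matching the sesqui-linear convention used here.

```python
import numpy as np

# Sketch: the complex inner product, conjugating the first argument.

u = np.array([1 + 2j, 3 - 1j])
v = np.array([2 - 1j, 1j])

by_hand = np.sum(np.conj(u) * v)     # <u, v> = sum_j conj(u_j) v_j
print(by_hand, np.vdot(u, v))        # the two agree

# Positive definiteness: <u, u> is a non-negative real number, the squared norm.
print(np.vdot(u, u).real, np.linalg.norm(u) ** 2)
```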

Remark
One often writes u · v for the inner product. Thus, considering it to be a scalar product, we just have u · v = <u, v>.

This inner product notation is often used in classical physics; in particular in Maxwell’s equations. Maxwell’s equations also involve the “vector product” u × v. However the vector product of classical physics only makes sense in 3-dimensional space. Most physicists today prefer to imagine that physical space has 10, or even more — perhaps even a frothy, undefinable number of — dimensions. Therefore it appears to be the case that the vector product might have gone out of fashion in contemporary physics. Indeed, mathematicians can imagine many other possible vector-space structures as well. Thus I shall dismiss the vector product from further discussion here.

Definition
A real vector space (that is, over the field of the real numbers ℜ), together with a scalar product is called a Euclidean vector space. A complex vector space with scalar product is called a unitary vector space.

Now, the basic reason for making all these definitions is that we want to define the length — that is the norm — of the vectors in V. Given a scalar product, then the norm of vV — with respect to this scalar product — is the non-negative real number

||v|| = √<v, v> .

More generally, one defines a norm-function on a vector space in the following way.

Definition
Let V be a vector space over ℂ (and thus we automatically also include the case ℜ ⊂ ℂ as well). A function || · || : V → ℜ is called a norm on V if it satisfies the following conditions.
  1. ||av|| = |a| ||v|| for all vV and for all a ∈ ℂ, 
  2. ||v1 + v2|| ≤ ||v1|| + ||v2|| for all v1, v2 ∈ V (the triangle inequality), and 
  3. ||v|| = 0 ⇔ v = 0.


Theorem 46 (Cauchy-Schwarz inequality)
Let V be a Euclidean or a unitary vector space, and let ||v|| = √<v, v> for all v ∈ V. Then we have

|<u, v>| ≤ ||u|| · ||v||

for all u and v ∈ V. Furthermore, the equality |<u, v>| = ||u|| · ||v|| holds if, and only if, the set {u, v} is linearly dependent.

Proof
It suffices to show that |<u, v>|2 ≤ <u, u><v, v>. Now, if v = 0, then — using the properties of the scalar product — we have both <u, v> = 0 and <v, v> = 0. Therefore the theorem is true in this case, and we may assume that v ≠ 0. Thus <v, v> > 0. Let

a = <v, u>/<v, v>. Then

0 ≤ <u − av, u − av> = <u, u> − a<u, v> − ā<v, u> + āa<v, v> = <u, u> − |<u, v>|2/<v, v>,

so that |<u, v>|2 ≤ <u, u><v, v>,

which gives the Cauchy-Schwarz inequality. When do we have equality?

If v = 0 then, as we have already seen, the equality |<u, v>| = ||u|| · ||v|| is trivially true, and {u, v} is linearly dependent. On the other hand, when v ≠ 0, then equality holds when <u − av, u − av> = 0. But since the scalar product is positive definite, this holds when u − av = 0. So in this case as well, {u, v} is linearly dependent.
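A quick numerical check of the Cauchy-Schwarz inequality (random complex vectors; my own sketch): the inequality holds, and becomes an equality when one vector is a scalar multiple of the other.

```python
import numpy as np

# Sketch: |<u, v>| <= ||u|| ||v||, with equality when {u, v} is linearly dependent.

rng = np.random.default_rng(7)
u = rng.normal(size=5) + 1j * rng.normal(size=5)
v = rng.normal(size=5) + 1j * rng.normal(size=5)

lhs = abs(np.vdot(u, v))
rhs = np.linalg.norm(u) * np.linalg.norm(v)
print(lhs <= rhs + 1e-12)                          # the inequality holds

w = (2 - 3j) * u                                   # w is a scalar multiple of u
print(np.isclose(abs(np.vdot(u, w)), np.linalg.norm(u) * np.linalg.norm(w)))
```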


Theorem 47
Let V be a vector space with scalar product, and define the non-negative function || · || : V → ℜ by  ||v|| = √<v, v> . Then || · || is a norm function on V.

Proof
The first and third properties in our definition of norms are obviously satisfied. As far as the triangle inequality is concerned, begin by observing that for arbitrary complex numbers z = x + yi ∈ ℂ we have

z + z̄ = 2x ≤ 2|z|.

Therefore, using this observation together with the Cauchy-Schwarz inequality,

||v1 + v2||2 = <v1 + v2, v1 + v2> = <v1, v1> + <v1, v2> + <v2, v1> + <v2, v2> ≤ ||v1||2 + 2|<v1, v2>| + ||v2||2 ≤ ||v1||2 + 2||v1|| · ||v2|| + ||v2||2 = (||v1|| + ||v2||)2.

Taking square roots gives the triangle inequality.

Tuesday, June 17, 2014

Linear Algebra: #16 Complex Numbers



On the other hand, looking at the characteristic polynomial, namely x2 − 2x cos θ + 1 in the previous example, we see that in the case θ = ±π/2 this reduces to x2 + 1. And in the realm of the complex numbers ℂ, this equation does have zeros, namely ±i. Therefore we have the seemingly bizarre situation that a “complex” rotation through a quarter of a circle has vectors which are mapped back onto themselves (multiplied by plus or minus the “imaginary” number i). But there is no need for panic here! We need not follow the example of numerous famous physicists of the past, declaring the physical world to be “paradoxical”, “beyond human understanding”, etc. No. What we have here is a purely algebraic result using the abstract mathematical construction of the complex numbers which, in this form, has nothing to do with rotations of real physical space!

So let us forget physical intuition and simply enjoy thinking about the artificial mathematical game of extending the system of real numbers to the complex numbers. I assume that you all know that the set of complex numbers ℂ can be thought of as being the set of numbers of the form x + yi, where x and y are elements of the real numbers ℜ and i is an abstract symbol, introduced as a “solution” to the equation x2 + 1 = 0. Thus i2  = −1. Furthermore, the set of numbers of the form x + 0 · i can be identified simply with x, and so we have an embedding ℜ ⊂ ℂ. The rules of addition and multiplication in ℂ are

(x1 + y1i) + (x2 + y2i) = (x1 + x2) + (y1 + y2)i   and   (x1 + y1i) · (x2 + y2i) = (x1x2 − y1y2) + (x1y2 + x2y1)i.

Let z = x + yi be some complex number. Then the absolute value of z is defined to be the (non-negative) real number |z| = √(x2 + y2). The complex conjugate of z is z̄ = x − yi. Therefore |z| = √(zz̄).

It is a simple exercise to show that ℂ is a field. The main result — called (in German) the Hauptsatz der Algebra — is that ℂ is an algebraically closed field. That is, let ℂ[z] be the set of all polynomials with complex numbers as coefficients. Thus, for P(z) ∈ ℂ[z] we can write P(z) = cnzn + cn−1zn−1 + · · · + c1z + c0, where cj ∈ ℂ, for all j = 0, . . . , n. Then we have:


Theorem 44 (Hauptsatz der Algebra)
Let P(z) ∈ ℂ[z] be an arbitrary polynomial with complex coefficients. Then P has a zero in ℂ. That is, there exists some λ ∈ ℂ with P(λ) = 0.

The theory of complex functions (Funktionentheorie in German) is an extremely interesting and pleasant subject. Complex analysis is quite different from real analysis.


Theorem 45
Every complex polynomial can be completely factored into linear factors. That is, for each P(z) ∈ ℂ[z] of degree n, there exist n complex numbers (perhaps not all different) λ1, . . . , λn, and a further complex number c, such that
P(z) = c(λ1 − z) · · · (λn − z).


Proof
Given P(z), theorem 44 tells us that there exists some λ1 ∈ ℂ, such that P(λ1) = 0. Let us therefore divide the polynomial P(z) by the polynomial (λ1− z). We obtain

P(z) =  (λ1− z) · Q(z) + R(z), 

where both Q(z) and R(z) are polynomials in ℂ[z]. However, the degree of R(z) is less than the degree of the divisor, namely  (λ1− z), which is 1. That is, R(z) must be a polynomial of degree zero, i.e. R(z) = r ∈ ℂ, a constant. But what is r? If we put λ1 into our equation, we obtain

0 = P(λ1) = (λ1 − λ1)Q(λ1) + r = 0 + r.

Therefore r = 0, and so

P(z) = (λ1− z)Q(z), 

where Q(z) must be a polynomial of degree n−1. Therefore we apply our argument in turn to Q(z), again reducing the degree, and in the end, we obtain our factorization into linear factors.

So the consequence is: let V be a vector space over the field of complex numbers ℂ. Then every linear mapping f : VV has at least one eigenvalue, and thus at least one eigenvector.
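Numerically, the factorization of Theorem 45 can be carried out with numpy's roots function, which returns the n complex zeros of a polynomial. The sketch below (with a polynomial chosen just for the example) checks the factorization at a sample point.

```python
import numpy as np

# Sketch: a complex polynomial factors completely into linear factors.

coeffs = [1, -2, 4, -8]                 # P(z) = z^3 - 2z^2 + 4z - 8
roots = np.roots(coeffs)                # the zeros: 2, 2i, -2i
print(roots)

z = 1.5 + 0.7j                          # check the factorization at a sample point
P_direct = np.polyval(coeffs, z)
P_factored = np.prod([z - r for r in roots])   # P is monic, so P(z) = prod (z - lambda_j)
print(np.isclose(P_direct, P_factored))
```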

Monday, June 16, 2014

Linear Algebra: #15 Why is the Determinant Important?


I am sure there are many points which could be advanced in answer to this question. But here I will concentrate on only two special points.
  • The transformation formula for integrals in higher-dimensional spaces.
    This is a theorem which is usually dealt with in the Analysis III lecture. Let G ⊂ ℜn be some open region, and let f : G → ℜ be a continuous function. Then the integral
    ∫G f(x) dx
    has some particular value (assuming, of course, that the integral converges). Now assume that we have a continuously differentiable injective mapping φ : G → ℜn and a continuous function F : φ(G) → ℜ. Then we have the formula
    ∫φ(G) F(u) du = ∫G F(φ(x)) |det(D(φ(x)))| dx.
    Here, D(φ(x)) is the Jacobi matrix of φ at the point x.

    This formula reflects the geometric idea that the determinant measures the change of the volume of n-dimensional space under the mapping φ.

    If φ is a linear mapping, then take Q ⊂ ℜn to be the unit cube: Q = {(x1, . . . , xn) : 0 ≤ xi ≤ 1, ∀i}. Then the volume of Q, which we can denote by vol(Q) is simply 1. On the other hand, we have vol(φ(Q)) = det(A), where A is the matrix representing φ with respect to the canonical coordinates for ℜn. (A negative determinant — giving a negative volume — represents an orientation-reversing mapping.)

  • The characteristic polynomial.
    Let f : V → V be a linear mapping, and let v be an eigenvector of f with f(v) = λv. That means that (f − λ·id)(v) = 0, where id is the identity mapping; therefore the mapping f − λ·id : V → V is singular. Now consider the matrix A, representing f with respect to some particular basis of V. Since λIn is the matrix representing the mapping λ·id, we must have that the difference A − λIn is a singular matrix. In particular, we have det(A − λIn) = 0.

    Another way of looking at this is to take a “variable” x, and then calculate (for example, using the Leibniz formula) the polynomial in x

    P(x) = det(A − xIn). 

    This polynomial is called the characteristic polynomial for the matrix A. Therefore we have the theorem:


    Theorem 41
    The zeros of the characteristic polynomial of A are the eigenvalues of the linear mapping f : V → V which A represents.

    Obviously the degree of the polynomial is n for an n × n matrix A. So let us write the characteristic polynomial in the standard form

    P(x) = cnxn + cn−1xn−1 + · · · + c1x + c0

    The coefficients c0, . . . , cn are all elements of our field F.

    Now the matrix A represents the mapping f with respect to a particular choice of basis for the vector space V. With respect to some other basis, f is represented by some other matrix A', which is similar to A. That is, there exists some C ∈ GL(n, F) with A' = C−1AC. But we have

    det(A' − xIn) = det(C−1AC − xIn) = det(C−1(A − xIn)C) = det(C−1) det(A − xIn) det(C) = det(A − xIn).
    Therefore we have:


    Theorem 42
    The characteristic polynomial is invariant under a change of basis; that is, under a similarity transformation of the matrix.

    In particular, each of the coefficients ci of the characteristic polynomial P(x) = cnxn + cn−1xn−1 + · · · + c1x + c0 remains unchanged after a similarity transformation of the matrix A.

    What is the coefficient cn? Looking at the Leibniz formula, we see that the term xn can only occur in the product

    (a11 − x)(a22 − x) · · · (ann − x) = (−1)nxn + (−1)n−1(a11 + a22 + · · · + ann)xn−1 + · · · . 

    Therefore cn = 1 if n is even, and cn = −1 if n is odd. This is not particularly interesting.

    So let us go one term lower and look at the coefficient cn−1. Where does xn−1 occur in the Leibniz formula? Well, as we have just seen, there certainly is the term

    (−1)n−1(a11 + a22 + · · · + ann)xn−1

    which comes from the product of the diagonal elements in the matrix A − xIn. Do any other terms also involve the power xn−1? Let us look at Leibniz formula more carefully in this situation. We have

    det(A − xIn) = ∑σ∈Sn sign(σ) (a1σ(1) − δ1σ(1)x)(a2σ(2) − δ2σ(2)x) · · · (anσ(n) − δnσ(n)x).

    Here, δij = 1 if i = j. Otherwise, δij = 0. Now if σ is a non-trivial permutation — not just the identity mapping — then obviously we must have two different numbers i1 and i2, with σ(i1) ≠ i1 and also σ(i2) ≠ i2. Therefore we see that these further terms in the sum can only contribute at most n − 2 powers of x. So we conclude that the (n − 1)-st coefficient is

    cn−1 = (−1)n−1(a11 + a22 + · · · + ann).

    Definition

    Let A be an n × n matrix. The trace of A (in German, the spur of A) is the sum of the diagonal elements:

    tr(A) = a11 + a22 + · · · + ann


    Theorem 43
    tr(A) remains unchanged under a similarity transformation.

An example
Let f : ℜ2 → ℜ2 be a rotation through the angle θ. Then, with respect to the canonical basis of ℜ2, the matrix of f is
A = ( cos θ   −sin θ )
    ( sin θ    cos θ )

and the characteristic polynomial is det(A − xI2) = (cos θ − x)2 + sin2θ = x2 − 2x cos θ + 1.
That is to say, if λ ∈ ℜ is an eigenvalue of f, then λ must be a zero of the characteristic polynomial. That is,

 λ2 − 2λ cos θ + 1 = 0. 

But, looking at the well-known formula for the roots of quadratic polynomials, we see that such a λ can only exist if |cos θ| = 1. That is, θ = 0 or π. This reflects the obvious geometric fact that a rotation through any angle other than 0 or π rotates any vector away from its original axis. In any case, the two possible values of θ give the two possible eigenvalues for f, namely +1 and −1.
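The invariance statements of Theorems 41-43 are easy to test numerically. In the sketch below (random example matrices, my own illustration), numpy's poly function returns the coefficients of the characteristic polynomial (numpy uses the convention det(xIn − A), which differs from det(A − xIn) only by the sign (−1)n and so has the same zeros); the coefficients, and in particular the trace, are unchanged under a similarity transformation, and the eigenvalues are the zeros.

```python
import numpy as np

# Sketch: the characteristic polynomial and the trace are similarity invariants,
# and the eigenvalues are the zeros of the characteristic polynomial.

rng = np.random.default_rng(8)
A = rng.normal(size=(3, 3))
C = rng.normal(size=(3, 3))
A_prime = np.linalg.inv(C) @ A @ C                        # a similar matrix

print(np.allclose(np.poly(A), np.poly(A_prime)))          # same characteristic polynomial
print(np.isclose(np.trace(A), np.trace(A_prime)))         # same trace

eigvals = np.linalg.eigvals(A)
print(np.allclose(np.polyval(np.poly(A), eigvals), 0))    # eigenvalues are its zeros
```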

Sunday, June 15, 2014

Linear Algebra: #14 Leibniz Formula


Definition
A permutation of the numbers {1, . . . , n} is a bijection
σ : {1, . . . , n} → {1, . . . , n}. 

The set of all permutations of the numbers {1, . . . , n} is denoted Sn. In fact, Sn is a group: the symmetric group of degree n. Given a permutation σ ∈ Sn, we will say that a pair of numbers (i, j), with i, j ∈ {1, . . . , n}, is a “reversed pair” if i < j, yet σ(i) > σ(j). Let s(σ) be the total number of reversed pairs in σ. Then the sign of σ is defined to be the number
sign(σ) = (−1)s(σ)


Theorem 37 (Leibniz)
Let the elements in the matrix A be aij , for i, j between 1 and n. Then we have

det(A) = ∑σ∈Sn sign(σ) a1σ(1)a2σ(2) · · · anσ(n).

As a consequence of this formula, the following theorems can be proved:


Theorem 38
Let A be a diagonal matrix
(that is, aii = λi for each i, and aij = 0 whenever i ≠ j).

Then det(A) = λ1 λ2 · · · λn .


Theorem 39
Let A be a triangular matrix
(say upper triangular: aij = 0 whenever i > j).
Then det(A) = a11a22 · · · ann .

Leibniz formula also gives:

Definition
Let A ∈ M(n × n, F). The transpose At of A is the matrix consisting of elements aijt  such that for all i and j we have aijt = aji, where aji are the elements of the original matrix A.


Theorem 40
det(At) = det(A).


14.1 Special rules for 2 × 2 and 3 × 3 matrices
Let A = (aij) denote a 2 × 2, respectively 3 × 3, matrix.
For the 2 × 2 matrix, the Leibniz formula reduces to the simple formula

det(A) =  a11a22 - a12a21 

For the 3 × 3 matrix, the formula is a little more complicated.

det(A) = a11a22a33 + a12a23a31 + a13a21a32 − a13a22a31 − a11a23a32 − a12a21a33
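The Leibniz formula can also be translated directly into code. The sketch below (my own implementation; it is O(n!) and only sensible for small n) computes the sign of each permutation by counting reversed pairs, exactly as in the definition above, and agrees with numpy's determinant on a small example.

```python
import itertools
import numpy as np

# Sketch: the Leibniz formula det(A) = sum over permutations of
# sign(sigma) * a_{1 sigma(1)} * ... * a_{n sigma(n)}.

def sign(perm):
    """Sign of a permutation, computed by counting reversed pairs."""
    reversed_pairs = sum(1 for i in range(len(perm))
                           for j in range(i + 1, len(perm))
                           if perm[i] > perm[j])
    return (-1) ** reversed_pairs

def leibniz_det(A):
    n = len(A)
    return sum(sign(p) * np.prod([A[i][p[i]] for i in range(n)])
               for p in itertools.permutations(range(n)))

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 5.0, 6.0]])
print(leibniz_det(A), np.linalg.det(A))   # both give -10.0
```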


14.2 A proof of Leibniz Formula 
Let the rows of the n × n identity matrix be ε1 , . . . ,εn. Thus

ε1 = (1 0 0 · · · 0), ε2 = (0 1 0 · · · 0), . . . , εn = (0 0 0 · · · 1). 

Therefore, given that the i-th row in a matrix is
ξi = (ai1 ai2 · · · ain),

 then we have
ξi = ai1ε1 + ai2ε2 + · · · + ainεn.
So let the matrix A be represented by its rows,
A = (ξ1; ξ2; . . . ; ξn), the matrix whose rows, from top to bottom, are ξ1, . . . , ξn.

It was an exercise to show that the determinant function is additive in each row separately. That is, if a single row of an n × n matrix is written as a sum of two row vectors, then the determinant is the sum of the two corresponding determinants, the other rows being left unchanged. Therefore we can write

det(A) = det(ξ1; ξ2; . . . ; ξn) = ∑j1 a1j1 det(εj1; ξ2; . . . ; ξn) = · · · = ∑j1 ∑j2 · · · ∑jn a1j1a2j2 · · · anjn det(εj1; εj2; . . . ; εjn).
To begin with, observe that if εjk = εjl for some jk ≠ jl, then two rows are identical, and therefore the determinant is zero. Thus we need only the sum over all possible permutations (j1, j2, . . . , jn) of the numbers (1, 2, . . . , n). Then, given such a permutation, we have the matrix
(εj1; εj2; . . . ; εjn), whose rows are εj1, . . . , εjn.

This can be transformed back into the identity matrix
In (whose rows are ε1, ε2, . . . , εn, in their natural order)
by means of successively exchanging pairs of rows.

Each time this is done, the determinant changes sign (from +1 to −1, or from −1 to +1), and the number of exchanges needed has the same parity as the number s(σ) of reversed pairs of the permutation. Finally, of course, we know that the determinant of the identity matrix is 1, so that det(εj1; εj2; . . . ; εjn) = sign(σ).

Therefore we obtain the Leibniz formula
det(A) = ∑σ∈Sn sign(σ) a1σ(1)a2σ(2) · · · anσ(n).

Saturday, June 14, 2014

Linear Algebra: #13 Determinant


Let M(n × n, F) be the set of all n × n matrices of elements of the field F.

Definition
A mapping det : M(n × n, F) → F is called a determinant function if it satisfies the following three conditions.

  1. det(In) = 1, where In is the identity matrix. 

  2. If A ∈ M(n×n, F) is changed to the matrix A' by multiplying all the elements in a single row with the scalar a ∈ F, then det(A') = a · det(A). (This is our row operation Si(a).) 

  3. If A' is obtained from A by adding one row to a different row, then det(A') = det(A). (This is our row operation Sij(1).) 

Simple consequences of this definition
Let A ∈ M(n × n, F) be an arbitrary n × n matrix, and let us say that A is transformed into the new matrix A' by an elementary row operation. Then we have:
  • If A' is obtained by multiplying row i by the scalar a ∈ F, then det(A') = a · det(A). This is completely obvious! It is just part of the definition of “determinants”.

  • Therefore, if A' is obtained from A by multiplying a row with −1 then we have det(A' ) = −det(A).

  • Also, it follows that a matrix containing a row consisting of zeros must have zero as its determinant. 

  • If A has two identical rows, then its determinant must also be zero. For we can multiply one of these rows with −1, then add it to the other row, obtaining a matrix with a zero row. 

  • If A ′ is obtained by exchanging rows i and j, then det(A') = −det(A). This is a bit more difficult to see. Let us say that A = (u1, . . . , ui, . . . , uj, . . . , un), where uk is the k-th row of the matrix, for each k. Then we can write
    det(u1, . . . , ui, . . . , uj, . . . , un)
    = det(u1, . . . , ui + uj, . . . , uj, . . . , un)
    = −det(u1, . . . , ui + uj, . . . , −uj, . . . , un)
    = −det(u1, . . . , ui + uj, . . . , ui, . . . , un)
    = det(u1, . . . , ui + uj, . . . , −ui, . . . , un)
    = det(u1, . . . , uj, . . . , −ui, . . . , un)
    = −det(u1, . . . , uj, . . . , ui, . . . , un).
    (This is the elementary row operation Sij.)

  • If A' is obtained from A by an elementary row operation of the form Sij(c), then det(A') = det(A). For we have: 
    det(u1, . . . , ui + cuj, . . . , uj, . . . , un)
    = (1/c) det(u1, . . . , ui + cuj, . . . , cuj, . . . , un)
    = (1/c) det(u1, . . . , ui, . . . , cuj, . . . , un)
    = (1/c) · c · det(u1, . . . , ui, . . . , uj, . . . , un)
    = det(A)
    (for c ≠ 0; if c = 0 the operation changes nothing).

Therefore we see that each elementary row operation has a well-defined effect on the determinant of the matrix. This gives us the following algorithm for calculating the determinant of an arbitrary matrix in M(n × n, F).

How to find the determinant of a matrix
Given: An arbitrary matrix A ∈ M(n × n, F).
Find: det(A).

Method:
  1. Using elementary row operations, transform A into a matrix in step form, keeping track of the changes in the determinant at each stage. 

  2. If the bottom line of the matrix we obtain only consists of zeros, then the determinant is zero, and thus the determinant of the original matrix was zero. 

  3. Otherwise, the matrix has been transformed into an upper triangular matrix, all of whose diagonal elements are 1. But now we can transform this matrix into the identity matrix In by elementary row operations of the type Sij(c). Since we know that det(In) must be 1, we then find a unique value for the determinant of the original matrix A. In particular, in this case det(A) ≠ 0.
Note that in this algorithm, as well as in the algorithm for finding the inverse of a regular matrix, the method of Gaussian elimination was used. Thus we can combine both ideas into a single algorithm, suitable for practical calculations in a computer, which yields both the matrix inverse (if it exists), and the determinant. This algorithm also proves the following theorem.
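Here is a sketch of such a determinant algorithm in Python (my own illustration of the method described above, with partial pivoting added for numerical stability): row exchanges flip the sign, scaling a row to make its pivot 1 multiplies the determinant by the pivot, and the row operations Sij(c) leave it unchanged.

```python
import numpy as np

# Sketch: compute det(A) by Gaussian elimination, keeping track of how each
# elementary row operation changes the determinant.

def det_by_elimination(A):
    A = np.array(A, dtype=float)
    n = len(A)
    det = 1.0
    for k in range(n):
        pivot = np.argmax(np.abs(A[k:, k])) + k
        if np.isclose(A[pivot, k], 0.0):
            return 0.0                          # no pivot: the matrix is singular
        if pivot != k:
            A[[k, pivot]] = A[[pivot, k]]       # exchanging rows changes the sign
            det = -det
        det *= A[k, k]                          # scaling the row by 1/pivot
        A[k] = A[k] / A[k, k]
        for i in range(k + 1, n):
            A[i] -= A[i, k] * A[k]              # operations S_ij(c): determinant unchanged
    return det

A = [[2.0, 1.0, 0.0], [1.0, 3.0, 4.0], [0.0, 5.0, 6.0]]
print(det_by_elimination(A), np.linalg.det(A))   # both give -10.0
```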


Theorem 34
There is only one determinant function and it is uniquely given by our algorithm. Furthermore, a matrix A ∈ M(n × n, F) is regular if and only if det(A) ≠ 0.

In particular, using these methods it is easy to see that the following theorem is true.


Theorem 35
Let A, B ∈ M(n×n, F). Then we have det(A· B) = det(A) · det(B).

Proof
If either A or B is singular, then A · B is singular. This can be seen by thinking about the linear mappings V → V which A and B represent. At least one of these mappings is singular. Thus the dimension of the image is less than n, so the dimension of the image of the composition of the two mappings must also be less than n. Therefore A · B must be singular. That means, on the one hand, that det(A · B) = 0. And on the other hand, that either det(A) = 0 or else det(B) = 0. Either way, the theorem is true in this case.

If both A and B are regular, then they are both in GL(n, F). Therefore, as we have seen, they can be written as products of elementary matrices. It suffices then to prove that det(S1)det(S2) = det(S1S2), where S1 and S2 are elementary matrices. But our arguments above show that this is, indeed, true.

Remembering that A is regular if and only if A ∈ GL(n, F), we have:

Corollary
If A ∈ GL(n, F) then det(A−1) = (det(A))−1

In particular, if det(A) = 1 then we also have det(A−1) = 1. The set of all such matrices must then form a group.

Another simple corollary is the following.

Corollary
Assume that the matrix A is in block form, so that the linear mapping which it represents splits into a direct sum of invariant subspaces (see theorem 29). Then det(A) is the product of the determinants of the blocks.

Proof
If A has the blocks A1, . . . , Ap along the diagonal then, for each i, define the matrix Ai* as follows.

That is, for the matrix Ai*, all the blocks except the i-th block are replaced with identity-matrix blocks. Then A = A1* · · · Ap*, and it is easy to see that det(Ai*) = det(Ai) for each i.

Definition
The special linear group of order n is defined to be the set
 SL(n, F) = {A ∈ GL(n, F) : det(A) = 1}. 


Theorem 36
Let A' = C−1AC. Then det(A') = det(A).

Proof
This follows from Theorem 35, since det(A') = det(C−1) det(A) det(C) = (det(C))−1 det(A) det(C) = det(A).