Eigenvectors and eigenvalues

👋 About the exercise session


Welcome to the 8th exercise session, I hope you enjoyed the fall break! In this session, we will be talking about eigenvectors and eigenvalues, as the title suggests. We will first talk about what these are, and then how we can compute them. Finally, in the last part, we will focus on their applications. Let’s get into it!


📓 Notes


🏷 Eigenvectors and eigenvalues

Definition

Recall that in the second lecture, we were talking about linear transformations. More specifically, let \(A\) be a matrix representing a given linear transformation; then to obtain the transformed vector \(b\) from some original vector \(x\), we write:

\[Ax = b\]

In simple words, we have some initial vector \(x\) and we map it to some other vector \(b\) using \(A\). It turns out that for some vectors \(x\), applying \(A\) is equivalent to scaling them by some scalar \(\lambda\), i.e., we can write:

\[\lambda x = b\]

In other words, for these special vectors, you can replace the whole matrix \(A\) by a single scalar \(\lambda\). Therefore, you can write:

\[Ax = \lambda x\]

In this case, we call \(x\) an eigenvector and \(\lambda\) an eigenvalue of \(A\).
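As a quick sanity check, here is a minimal sketch in Python (numpy is my tool choice here, not something the course prescribes; the matrix and vector are picked purely for illustration) verifying the definition on a concrete example:

```python
import numpy as np

# A concrete 2x2 matrix, chosen for illustration
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# x = (1, 1) happens to be an eigenvector of A with eigenvalue 3
x = np.array([1.0, 1.0])
lam = 3.0

# Both sides of Ax = lambda * x agree
print(A @ x)    # [3. 3.]
print(lam * x)  # [3. 3.]
```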

Eigenspace

A natural question arises: how many eigenvalues and eigenvectors can we have for a given square matrix \(A\)?

Let’s consider a simple example:

  • let \(c\) be a scalar
  • let \(x\) be an eigenvector of some square matrix \(A\) with corresponding eigenvalue \(\lambda\)

What eigenvalue will the vector \(cx\) have? Let’s try to derive it:

\[A(c \mathbf{x})=c(A \mathbf{x})=c(\lambda \mathbf{x})=\lambda(c \mathbf{x})\]

Therefore, we found that \(A(c \mathbf{x}) = \lambda(c \mathbf{x})\), i.e., the vector \(cx\) has the same eigenvalue as the vector \(x\), which is \(\lambda\). And the same applies to the sum of two eigenvectors sharing the eigenvalue \(\lambda\):

\[A\left(\mathbf{x}_1+\mathbf{x}_2\right)=A \mathbf{x}_1+A \mathbf{x}_2=\lambda \mathbf{x}_1+\lambda \mathbf{x}_2=\lambda\left(\mathbf{x}_1+\mathbf{x}_2\right)\]

Therefore, visually, you can imagine you are in 2D and draw a line through the origin along an eigenvector. Then all points (vectors) on this line will have the same eigenvalue.
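If you want to check the two identities above numerically, here is a minimal sketch (the matrix, scalar, and vectors are picked purely for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
x = np.array([1.0, 1.0])  # eigenvector of A with eigenvalue 3
c = 5.0

# Scaling: c*x is still an eigenvector with the same eigenvalue 3
print(A @ (c * x))    # [15. 15.], which equals 3 * (c * x)

# Sum: adding two vectors from the same eigenspace stays in it
x1, x2 = 2.0 * x, -0.5 * x
print(A @ (x1 + x2))  # [4.5 4.5], which equals 3 * (x1 + x2)
```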

So what is the implication? It means that for any given eigenvalue \(\lambda\), we can have infinitely many eigenvectors. We call the set of all eigenvectors corresponding to a given eigenvalue its eigenspace. The eigenspace becomes a subspace of the given vector space if we extend it by the zero vector. To be crystal clear, I wrote that we have to extend the eigenspace by the zero vector because the zero vector cannot be an eigenvector by definition. If we wrote:

\[A0 = \lambda 0\]

then this would mean we could have infinitely many eigenvalues for a single vector, since any real value \(\lambda\) would satisfy the condition. If, on the other hand, we write:

\[Ax = 0x\]

then this is possible, since we can have a set of (nonzero) eigenvectors that are mapped to the zero vector; in that case, the eigenvalue must be \(0\).
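In other words, any singular (non-invertible) matrix has \(0\) as an eigenvalue. A small sketch, with the matrix chosen for illustration:

```python
import numpy as np

# A singular matrix: the second row is twice the first
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# x = (2, -1) is mapped to the zero vector, so it is an
# eigenvector of A with eigenvalue 0
x = np.array([2.0, -1.0])
print(A @ x)             # [0. 0.]
print(np.linalg.det(A))  # 0.0, so A is not invertible
```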

Computation

Let’s say we are given some square matrix \(A\); how do we compute its eigenvalues and eigenvectors? Well, our starting point is the following equation:

\[Ax = \lambda x\]

If we now move all terms to one side, we obtain:

\[0 = \lambda x - Ax\]

Next, since \(Ix = x\), we can insert the identity matrix without changing anything:

\[0 = \lambda I x - A x\]

We can then factor out \(x\):

\[(\lambda I - A)x = 0\]

To find the eigenvector(s) \(x\), the matrix \((\lambda I - A)\) must be non-invertible: if an inverse existed, then \(x = (\lambda I - A)^{-1}\mathbf{0} = \mathbf{0}\), which is not possible as described in the previous section. We know from the third lecture that a matrix is not invertible if and only if its determinant is zero. Therefore, we need to find \(\lambda\) such that:

\[\det(\lambda I - A) = 0\]

After that, we can plug the found values into \((\lambda I - A)x = 0\) and solve for \(x\) (the corresponding eigenvectors). In summary:

  1. Form the characteristic equation \(\det(\lambda I-A) = 0\). It will be a polynomial equation of degree \(n\) in the variable \(\lambda\).

  2. Find the real roots of the characteristic equation. These are the eigenvalues of \(A\).

  3. For each eigenvalue \(\lambda_i\), find the eigenvectors corresponding to \(\lambda_i\) by solving the homogeneous system \(\left(\lambda_i I-A\right) \mathbf{x}=\mathbf{0}\). This can require row reducing an \(n \times n\) matrix. The reduced row-echelon form must have at least one row of zeros.

Last but not least, if the matrix is triangular, i.e., everything above or below the main diagonal is zero, then we can skip the first two steps, since the eigenvalues of such a matrix are exactly the entries on its main diagonal.
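To make the three steps concrete, here is a small sketch using sympy (one possible tool choice, not something the course requires) on a \(2 \times 2\) example of my own:

```python
import sympy as sp

# Step 1: form the characteristic equation det(lambda*I - A) = 0
lam = sp.symbols('lambda')
A = sp.Matrix([[2, 1],
               [1, 2]])
char_poly = (lam * sp.eye(2) - A).det()
print(sp.expand(char_poly))      # lambda**2 - 4*lambda + 3

# Step 2: the real roots of the polynomial are the eigenvalues
print(sp.solve(char_poly, lam))  # [1, 3]

# Step 3: for each eigenvalue, solve (lambda_i*I - A) x = 0;
# nullspace() returns a basis of each eigenspace
for lam_i in (1, 3):
    print(lam_i, (lam_i * sp.eye(2) - A).nullspace())
```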


🏷 Diagonalization

Motivation

Below, we will discuss a new concept called diagonalization. Before we dive into the details, I would like you to understand why such a concept is important. The simple answer is that there are many use cases; the most concrete one I can give you within the scope of this course is that it enables you to compute \(A^kx\) more efficiently. In the real world, this means fewer resources spent on computation and, as a result, more money saved. 💵

Definition

If a given square matrix \(A\) is diagonalizable, then we can write \(D = P^{-1}AP\), where \(D\) is a diagonal matrix. That’s it, there is nothing more to it. Of course, the question is how you find such a \(P\), which is the subject of the following subsection.
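This definition also explains the motivation above: rearranging it gives \(A = PDP^{-1}\), and the inner factors cancel when we take powers:

\[A^k = \underbrace{(PDP^{-1})(PDP^{-1}) \cdots (PDP^{-1})}_{k \text{ times}} = PD^kP^{-1}\]

Raising the diagonal matrix \(D\) to the \(k\)-th power only requires raising each diagonal entry to the \(k\)-th power, which is far cheaper than performing \(k\) full matrix multiplications.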

Computation

I will split this section into two based on what type of matrix we are trying to diagonalize:

  1. Non-symmetric matrix

  2. Symmetric matrix (\(A = A^T\))

Non-symmetric matrix. In this case, we can follow a simple procedure for any given square \(n \times n\) matrix \(A\):

  1. Find \(n\) linearly independent eigenvectors \(\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n\) for \(A\) (if possible) with corresponding eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_n\). If \(n\) linearly independent eigenvectors do not exist, then \(A\) is not diagonalizable.

  2. Let \(P\) be the \(n \times n\) matrix whose columns consist of these eigenvectors. That is, \(P=\left[\begin{array}{llll}\mathbf{p}_1 & \mathbf{p}_2 & \ldots & \mathbf{p}_n\end{array}\right]\).

  3. The diagonal matrix \(D=P^{-1} A P\) will have the eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_n\) on its main diagonal. Note that the order of the eigenvectors used to form \(P\) will determine the order in which the eigenvalues appear on the main diagonal of \(D\).

As we discussed in the previous section, to obtain eigenvectors, you first need to compute the eigenvalues of \(A\). According to Theorem 7.6 in the book, if an \(n \times n\) matrix has \(n\) distinct eigenvalues, then the corresponding eigenvectors are linearly independent and \(A\) is diagonalizable.
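Here is a minimal sketch of the procedure with numpy (the example matrix is my own, chosen so that the eigenvalues are distinct); conveniently, np.linalg.eig already returns the eigenvector matrix \(P\):

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [2.0, 2.0]])

# Steps 1 and 2: eig returns the eigenvalues and a matrix P
# whose columns are the corresponding eigenvectors
eigenvalues, P = np.linalg.eig(A)

# Step 3: D = P^{-1} A P is diagonal, with the eigenvalues on
# the main diagonal in the same order as the columns of P
D = np.linalg.inv(P) @ A @ P
print(eigenvalues)      # the eigenvalues 4 and -1 (in some order)
print(np.round(D, 10))  # the same values on the diagonal of D
```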

Symmetric matrix. If the given \(n \times n\) matrix \(A\) is symmetric, then it is always diagonalizable AND we can use a special method for diagonalization called orthogonal diagonalization:

  1. Find all eigenvalues of \(A\) and determine the multiplicity of each.

  2. For each eigenvalue of multiplicity 1, find a unit eigenvector. (Find any eigenvector and then normalize it.)

  3. For each eigenvalue of multiplicity \(k \geq 2\), find a set of \(k\) linearly independent eigenvectors. (You know from Theorem 7.7 that this is possible.)

  4. The results of Steps 2 and 3 produce an orthonormal set of \(n\) eigenvectors. Use these eigenvectors to form the columns of \(P\). The matrix \(P^{-1} A P=P^T A P=D\) will be diagonal. (The main diagonal entries of \(D\) are the eigenvalues of \(A\).)

The process is pretty similar to the one for the non-symmetric matrices, except:

  • in step 2, you have to normalize the given eigenvector. This means dividing the given vector \(v\) by its length (\(\vert v \vert\))
  • in the last step, you do not have to compute the inverse of \(P\); instead, you can use its transpose
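For symmetric matrices, numpy also offers the specialized np.linalg.eigh, which returns orthonormal eigenvectors directly. A small sketch, with the example matrix chosen for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric: A == A.T

# eigh is specialized for symmetric matrices; the columns of P
# form an orthonormal set of eigenvectors
eigenvalues, P = np.linalg.eigh(A)

# Orthogonality means the transpose replaces the inverse
D = P.T @ A @ P
print(np.round(D, 10))        # diag(1, 3), the eigenvalues of A
print(np.round(P.T @ P, 10))  # identity, confirming P is orthogonal
```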


🏷 Application

Eigenvalues and Eigenvectors

There are many applications of eigenvalues and eigenvectors. In this lecture, we focused on how to use them for the diagonalization of a given matrix. In the next lecture, we will see how eigenvalues can be applied to the PageRank problem. You can also check my network analysis notes, where I discuss how linear algebra can be used to compute useful attributes of a given network. These notes are based on Michele Coscia’s (an ITU professor) free book on network analysis.

Diagonalization

You might still be wondering what the application of diagonalization is. I have prepared a small notebook example where I explain its use case in detail. In a similar fashion, we can also use it to solve problems similar to the Fibonacci one, which was introduced by Rasmus in his notes. You can check exercise 8 for an example of how to solve this kind of problem.
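To give a flavor of the idea without spoiling the exercise, here is a minimal sketch of my own (not Rasmus’s derivation) that computes Fibonacci numbers through \(A^n = PD^nP^{-1}\):

```python
import numpy as np

# Fibonacci step as a linear map: (F_{n+1}, F_n) = A @ (F_n, F_{n-1})
A = np.array([[1.0, 1.0],
              [1.0, 0.0]])

eigenvalues, P = np.linalg.eig(A)

def fib(n):
    # A^n = P D^n P^{-1}, and D^n only needs elementwise powers
    D_n = np.diag(eigenvalues ** n)
    A_n = P @ D_n @ np.linalg.inv(P)
    # A^n @ (F_1, F_0) = (F_{n+1}, F_n)
    return (A_n @ np.array([1.0, 0.0]))[1]

print([round(fib(n)) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```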


⛳️ Learning goals checklist


What a week! There has been a lot of ground to cover, but very interesting stuff. Do not worry if you do not get all the nitty-gritty details; you should focus on mastering these skills:

  • Compute the eigenvalues and eigenvectors of a matrix using the characteristic polynomial

  • Diagonalize a matrix and test whether a matrix can be diagonalized

  • Use eigenvectors to compute closed formulas for sequences given by iteration as in the Fibonacci example

See you next week!