Systems of linear equations

👋 About the exercise session

What is Linear algebra

Let’s start with the word algebra, from Wikipedia:

Algebra is one of the broad areas of mathematics. Roughly speaking, algebra is the study of mathematical symbols and the rules for manipulating these symbols in formulas.

I assume that all of you are familiar with (in)equations and solving for one or more variables. So far nothing new right? Now, the word linear. You will hear it a lot throughout your studies but also in normal life. Let’s look at what Wikipedia has to say again:

Linearity is the property of a mathematical relationship (function) that can be graphically represented as a straight line. … Generalized for functions in more than one dimension, linearity means the property of a function of being compatible with addition and scaling…

So in practice, let me give you examples of some mathematical relationships:

Linear relationship: \(y = 2x + 1\), same thing expressed using function notation: \(f(x) = 2x + 1\)
Non-linear relationship: \(y = x^2 + 1\)

What is the difference between the two? The linear example meets the two most important criteria:

addition: we only add variable(s) with each other, therefore there can not be any thing like x^2, xz, …
scaling: we are allowed to scale the variables with some constants e.g. 2x

This is it. ☺️ So combining the two words together what is linear algebra? As Rasmus mentioned in his note, it is essentially a study of matrices. In this session, we will practice translating the linear equations to the matrix form. In the follow up lectures, we will then dive deep into the matrix properties.

Why is it useful (student perspective)

I believe Rasmus gave pretty good gist about what you can do with matrices in his note. To further motivate you, I can give you some concrete examples from the courses I have taken so far:

Machine learning: You have probably heard of the Neural Networks and perhaps now it can seem pretty fancy term, but in reality, you could say that from a high level, NN is just a series of matrix multiplications.
Introduction to Deep learning: in this course, we have covered the development of Natural language processing up until the most state of the art language models such as Bert. You can take a look in my notes about Bert where you can see the use of matrices quite extensively.
Network analysis: in this course, one of the lessons was even dedicated to linear algebra and its use is network analysis, you can check my notes from this lesson

📓 Notes

Recognize linear equation in n variables

First of all, it is important that you are able to tell linear equation from non-liner. I have already explained this in the intro section, so please refer there.

Number of solutions

There are three possible ways how the solution to the system of linear equations may look like after you reduce it as much as possible:

[consistent] exactly one solution: this means that for each variable you found corresponding value. Expected solution form: x = 1, y = 6, ...
[consistent] infinitely many solutions: this happens when you end up having more variables then equations. Expected solution form: see below
[inconsistent] no solution: this may happen if according to the one equation value of the given variable must be -1, and of the other 1. Expected solution form: The given system has no solution.

Solving a system of linear equations

The solution strategy introduced in the book has the following steps:

Use Gaussian elimination to reduce the equation into row echelon form
Use back substitution to solve the system given in the row echelon form

Let’s discuss this in more detail. The very first step is to always make sure that all variables are on the left and the constants on the right side of the equation. Now, what is row echelon form? Here is an example:

\[\begin{aligned} x + 2y = 1 \\ y = 3 \end{aligned}\]

You can see kind of a stair case form. Important thing is that the leading variable’s coefficient is always 1. And as we progress down the rows, we should have less and less variables. So how do we reach this form? There are three essential alowed operations:

You can interchange rows (useful when you have equations with unequal amount of variables)
You can multiply any row by some constant (useful when you for instance want certain leading varible to have coefficient equal to one)
You can multiply any row by a constant AND add it or subtract from any other row

If everything goes well, you will reach the row echelon form and then you can back substitute, so using my above example I back substitute 3 to the first row:

\[\begin{aligned} x + 2(3) = 1 \end{aligned}\]

Of course, as we know, it does not always go well and you may encounter some special situations. For instance:

\[\begin{aligned} x + 2y = 1 0 = -6 \end{aligned}\]

This is the case where there is no solution. Another situation may look like this:

\[\begin{aligned} x + 2y + z = 1 \\ y + 3z = 6 \end{aligned}\]

In this case, there is infinitely many solutions. Instead of back substitution, you separate x and y on the left site and the rest, including z on the right side. The resulting solution is then:

\[\begin{aligned} x &= -7z - 11 \\ y &= 3z + 6 \end{aligned}\]

But what about z? We turn it into a free variable and write:

\[\begin{aligned} x &= -7t - 11 \\ y &= 3t + 6 \\ z &= t \text{ where } t \in \mathbb{R} \end{aligned}\]

And this it!

Representing system of linear equations via matrix

At this point, you have been introduced to just part of the solution toolbox. This is because in practice, it is easier to work with systems of linear equations represented as matrices. So how do we do that?

First of all, you should make sure that all variables are on the left side and constants are on the right side of the equation. For instance:

\[\begin{aligned} x + 2y + z = 1 \\ y + 3z = 6 \\ 3y + z = 1 \end{aligned}\]

If it helps, you can add 0 for the missing variables (and 1 for variables with this coefficient) as follows:

\[\begin{aligned} 1x + 2y + 1z = 1 \\ 0x + 1y + 3z = 6 \\ 0x + 3y + 1z = 1 \end{aligned}\]

Notice that I have also aligned same variables into the same columns. We have the columns as well as rows, so if you picture the system just with the numbers, it should be now clear how to transfer into matrix. However, there are two options to do so. First, we can only obtain coefficient matrix:

\[\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 3 \\ 0 & 3 & 1 \end{bmatrix}\]

As the name suggests, you only focus on the left side of equations and ignore right side, i.e., constants. If you add the constants, then you have augmented matrix:

\[\left[\begin{array}{ccc|c} 1 & 2 & 1 & 1 \\ 0 & 1 & 3 & 6 \\ 0 & 3 & 1 & 1 \end{array}\right]\]

We will get back to when is useful either of the matrices. For now, it is important to know the above mentioned difference and stick to the correct notation. For solving the system of linear equations, augmented matrix is used.

Gauss-Jordan elimination and reduced row-echelon form

In practice, the process of reaching the row echelon form in matrix form is the same as how you would do it in a regular system of linear equations, i.e., use row equivalent operations. Recall that at this point we switched to back substitution to find the final solution. However, if we are using matrix form, this would be inpractical since we would have to translate the system back from matrix form to the equations form. Instead, it is better to reduce the matrix into reduced row-echelon form. How do you do this?

The exact same way as you have reached row echelon form, but now you are going from the bottom to the top such that you are left with a matrix where in each row leading 1 has zeros below and above it. This will perhaps make more sense when you see it being used in practice, so check the solution sheet for this week.

Applications: polynomial curve fitting

Let me first start by repeating the form of an n-th degree polynomial:

\[y = a_0 + a_1x + a_2x^2 + \dots a_nx^n\]

At first look this does not look linear, does it? It depends on the perspective. More specifically, we already know what are the values of x and y, so the variables in this case are actully the terms \(a_0\), \(a_1\) etc.

So in practice, you are given two information:

What kind of polynomial you are supposed to fit, i.e., what degree
And then you are given bunch of points in the form (x, y) that the polynomial line should go through

Therefore, to solve this problem, you apply the following strategy:

Write down the general form of the given polynomial. For instance, If I need to fit polynomial of degree 1, I write: \(y = a_0 + a_1x^1\)
You substitute each point in the equation from the first step, which should give you system of linear equations
Transfer the system to the matrix form
Now it is up to you which strategy to use: reduce to row echelon form and back substitute, or continue reducing to the reduced row-echelon form

Again, to see the practical example, see the solution sheet.

⛳️ Learning goals checklist

Congrats, you have made it through the first week! Here are most important learning objectives:

Solve systems of linear equations using row operations and back substitutions - we will practice this a lot in following weeks, so do not worry if this feels still quite new
Reduce a matrix to echelon form and reduced echelon form - again we will practice this in the following weeks, but it is important that you remember what are these forms
Construct the coefficient matrix and the augmented matrix for a given set of linear equations
Be able to fit polynomial to given set of points