Linear Algebra

Linear algebra has a deceptively simple premise: the study of linear systems. However, this gives rise to many interesting, unexpected, and extremely useful results. Let’s take a look!

The goal of this post is to touch upon introductory concepts, rather than to memorize or derive equations. Please see my references at the end for more advanced topics, as well as formulas for computations.

Vectors & Matrices

In physics, a vector is something with magnitude and direction. In linear algebra, it is commonly represented in two ways. The first is as a column of numbers. The second is as an arrow, with its tail at the origin and its head at the coordinates described by that column of numbers. For example:

Vector [2;3], represented in two different ways

Vectors typically have 1 column, and have the same number of rows as dimensions. The example above is a 2×1 vector, which means it has 2 rows and 1 column, so it shows a vector in two dimensions. A matrix is similar, but has more than one column; a 2×2 matrix, for example, has 2 rows and 2 columns.

For this blog post, we’ll focus on two-dimensional vectors and matrices since they are easiest to plot and understand. Also note that vectors in the text of this post will be written in Matlab notation.

Matrix Multiplication

Let’s take a look at the example above again. The vector [2;3] has its tail at the origin, and its head at coordinates (2,3). How did we get the coordinates for the head?

A unit vector is a vector with length 1. The +x unit vector points in the positive x-axis direction by 1 unit, or [1;0]. Likewise, the +y unit vector points in the positive y-axis direction by 1 unit, or [0;1]. These unit vectors are the building blocks for all other vectors. In our case, the 2 in [2;3] means go two times the +x unit vector, and the 3 means go three times the +y unit vector. Numerically, we get 2*[1;0], and 3*[0;1]. Combining the two gives 2*[1;0]+3*[0;1]=[2;3]. The result is shown below, graphically:

[2;3] vector, drawn with +x and +y unit vectors

Let’s recap. For [2;3], we take 2 times a vector, [1;0], and take 3 times another vector, [0;1], and add them together.
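If you want to check this numerically, here’s a minimal sketch in Python with NumPy (any numerical language would do; the post itself uses Matlab-style notation):

```python
import numpy as np

x_hat = np.array([1, 0])   # +x unit vector
y_hat = np.array([0, 1])   # +y unit vector

# Scale each unit vector by the corresponding entry and add them
v = 2 * x_hat + 3 * y_hat
print(v)                   # [2 3]
```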

But why use the +x and +y unit vectors? Why use [1;0] and [0;1], instead of something else? Matrix multiplication allows us to do just that:

Matrix multiplication in 2 dimensions

Above is the equation for multiplying a vector by a matrix in 2 dimensions. The vector has components v1 and v2, and the matrix has entries a, b, c, and d. The equation demonstrates what we said previously: v1 scales the first column of the matrix, and v2 scales the second. These two scaled vectors are then added together, resulting in the final vector. Let’s plug [2;3], the +x unit vector, and the +y unit vector into our equation:

Multiplication equation applied to previous example

Now we return to my previous question. What if we don’t want to use [1;0] and [0;1]? In that case, we change the entries in the matrix! Let’s try an arbitrary example:

Here, instead of 2 times the +x unit vector and 3 times the +y unit vector, we get 2*[2;2] and 3*[1;3]. Adding them together, we get the final vector, [7;13].

[7;13], composed of [2;2] and [1;3]

The [7;13] vector is shown above in black. There are two [2;2] vectors shown in red and three [1;3] vectors shown in blue. The number of times each vector occurs corresponds to the entries of the initial vector, [2;3].
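Here’s a short NumPy sketch of the same computation, showing that multiplying by the matrix is the same as combining its columns:

```python
import numpy as np

A = np.array([[2, 1],
              [2, 3]])          # columns are [2;2] and [1;3]
v = np.array([2, 3])

by_columns = 2 * A[:, 0] + 3 * A[:, 1]   # 2 times the first column plus 3 times the second
print(by_columns)                # [ 7 13]
print(A @ v)                     # [ 7 13] -- matrix multiplication gives the same result
```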

Eigenvalue & Eigenvector

Matrix multiplication often causes vectors to point in different directions. In the previous example, [2;3] became [7;13], which has a different slope, meaning the vector has changed direction. But there are some vectors that do not change direction upon multiplication. This is mathematically described as Av = λv, where A is a matrix, v is a vector, and λ is a scalar. This equation means that for a given matrix A, there may be a vector v that, upon multiplication, is the same as the original vector, just multiplied by a scalar. v is called an eigenvector of A, and λ is its eigenvalue.

Let’s look at the matrix from the previous example, and see what happens when we multiply it by its eigenvectors:

In the first example, the final vector is the same as the initial vector: [1;-1]. This means the eigenvalue for that eigenvector is 1. In the second example, the final vector is 4 times the initial vector, so the eigenvalue is 4.
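We can verify these eigenvalues with NumPy; the eigenvectors work out to multiples of [1;-1] and [1;2]:

```python
import numpy as np

A = np.array([[2, 1],
              [2, 3]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)               # [1. 4.] (possibly in a different order)
# Each column of `eigenvectors` is a unit-length eigenvector,
# proportional to [1;-1] (eigenvalue 1) and [1;2] (eigenvalue 4).

print(A @ np.array([1, -1]))     # [ 1 -1] -- unchanged, so the eigenvalue is 1
print(A @ np.array([1, 2]))      # [4 8]   -- 4 times [1;2], so the eigenvalue is 4
```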

Let’s look at another matrix:

Matrix to examine

This matrix has entries 5/6 and 1/6, written with the common factor of 1/6 pulled out front.

Let’s take a look at how this matrix transforms a grid:

Grid, before and after transformation

On the left is the original grid, with evenly spaced points. The origin is marked with a circle, the +x axis is blue, and the +y axis is red.

The grid is transformed by multiplying each coordinate by the matrix. The new coordinate is then used for the new grid, shown on the right.

The new grid has some interesting properties. The +x axis has been rotated up, and the +y axis has been rotated to the right. The end result is that the first quadrant looks squeezed. The fourth quadrant, on the other hand, looks stretched out: the -y axis and the +x axis are farther apart now.
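To see where the axes go numerically, here is a small NumPy sketch (the grid here is just a hypothetical set of points for illustration):

```python
import numpy as np

A = np.array([[5, 1],
              [1, 5]]) / 6

# Where do the axis unit vectors end up?
print(A @ np.array([1, 0]))    # ≈ [0.83 0.17] -- the +x axis rotates up, toward [1;1]
print(A @ np.array([0, 1]))    # ≈ [0.17 0.83] -- the +y axis rotates right, toward [1;1]

# Transforming a grid: each column is one point, multiplied by the matrix
grid = np.array([[x, y] for x in range(-2, 3) for y in range(-2, 3)]).T
new_grid = A @ grid
```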

The eigenvectors for this matrix are [1;1], with an eigenvalue of 1, and [1;-1], with an eigenvalue of 2/3. Let’s see how they transform (note that eigenvectors have been scaled up for easier viewing):

Grid and eigenvectors, before and after transformation

It looks like the +x axis and the +y axis are converging on the eigenvector in the first quadrant, [1;1]. Conversely, the +x axis and -y axis are diverging from the eigenvector in the fourth quadrant, [1;-1]. Both of these changes compress space toward a single line. This phenomenon becomes apparent if we multiply the grid by the matrix multiple times:

Transforming grid repeatedly
Grid converges to [1;1] line

Eigenvectors and eigenvalues are powerful tools for understanding a matrix because they tell us how the matrix affects space. In this example, we can see that vectors along [1;1] are unchanged, while components along [1;-1] shrink; the end result is that space is compressed onto a single line.
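A quick way to see this compression is to hit an arbitrary vector with the matrix many times; a sketch, with an arbitrarily chosen starting vector:

```python
import numpy as np

A = np.array([[5, 1],
              [1, 5]]) / 6

v = np.array([2.0, -1.0])      # arbitrary starting vector, chosen for illustration
for _ in range(50):
    v = A @ v                  # apply the matrix repeatedly
print(v)                       # ≈ [0.5 0.5] -- the components converge, landing on the [1;1] line
```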

This behavior, where space converges onto the [1;1] line, is very useful for understanding Markov chains:

Simple Markov chain

Imagine you have two nodes, v1 and v2. Each node starts off with some number of (say) chips, for a total of 100 chips between them. At every time increment, each node gives away 1/6th of its chips to the other node and keeps the rest. This is shown in the diagram above. After an infinite amount of time, how many chips will each node have?

For A, the element in row i, column j is the proportion of chips going to node i from node j. For example, the elements in the first row are the proportion of chips going to v1 from v1 (first column) and to v1 from v2 (second column)

As it turns out, this is an eigenvector/eigenvalue problem. The system at a given time can be described by a vector v = [v1;v2], where v1 and v2 are how many chips those nodes have. The proportion of chips given away or kept can be described by the matrix A, which is the same matrix we used in the previous example. Then, to calculate how many chips each node has after one time increment, we multiply A and v. Since we want a large amount of time to pass, we multiply v by A repeatedly.

We already saw that multiplying by A repeatedly causes v to converge onto the [1;1] line, regardless of what v actually is. Landing on the [1;1] line means that, over a long enough time, v1 = v2. Since the total is 100 chips, each node will end up with 50 chips, regardless of the starting condition!

Distribution of chips with 3 different starting conditions
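Here is a small simulation sketch of the chip-passing process; the starting split of 80/20 is just one hypothetical choice:

```python
import numpy as np

A = np.array([[5, 1],
              [1, 5]]) / 6     # column j: fraction of node j's chips kept / given away

v = np.array([80.0, 20.0])     # hypothetical starting split of the 100 chips
for _ in range(100):
    v = A @ v                  # one time increment
print(v)                       # ≈ [50. 50.] -- both nodes converge to 50 chips
```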

Solving differential equations involving matrices and vectors can also be an eigenvector problem.
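As a rough sketch of the idea: for a system of the form dv/dt = Av, writing the starting vector in terms of the eigenvectors of A turns the coupled equations into independent exponentials:

```latex
\frac{d\mathbf{v}}{dt} = A\mathbf{v}, \qquad
\mathbf{v}(0) = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2
\quad\Longrightarrow\quad
\mathbf{v}(t) = c_1 e^{\lambda_1 t} \mathbf{v}_1 + c_2 e^{\lambda_2 t} \mathbf{v}_2
```

Here v1 and v2 are eigenvectors of A with eigenvalues λ1 and λ2; each eigenvector simply grows or decays on its own, which is why finding the eigenvectors essentially solves the system.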

Eigenvectors and eigenvalues are supremely important for describing how a matrix behaves and how it transforms vectors; they are essential to understanding linear algebra.

Change of Basis

When describing a vector or coordinate, we almost always use the x axis and y axis, using [1;0] and [0;1] as a basis for making more complicated vectors. However, it is sometimes necessary to describe a vector from another perspective; this is called a change of basis.

Vector. In standard basis, it is [6;12]

Examine the vector above. Using the standard basis, it consists of 6 times [1;0] and 12 times [0;1], so we would describe the vector as [6;12].

Instead of using [1;0] and [0;1], let’s try [1;1] and [1;-1]. By inspection, we see that to construct [6;12], we need 9 times [1;1] and -3 times [1;-1]. Using the new basis of [1;1] and [1;-1], the vector is [9;-3]:

Vector in new basis. Note [1;-1] is drawn backwards due to negative scalar (-3)

If the vector or basis is more complicated, then inspection won’t work, so let’s take a look at the procedure for calculating the new coordinates:

Here, C is the matrix whose columns are the new basis vectors. Multiplying by its inverse, C^-1, converts coordinates from the standard basis into the new basis, and multiplying by C converts them back to the standard basis. Let’s apply this to our example.
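As a check, here is a NumPy sketch of that computation, with the new basis vectors [1;1] and [1;-1] as the columns of C:

```python
import numpy as np

C = np.array([[1,  1],
              [1, -1]])              # columns are the new basis vectors
v = np.array([6, 12])                # the vector in the standard basis

new_coords = np.linalg.solve(C, v)   # same as C^-1 @ v
print(new_coords)                    # [ 9. -3.] -- the vector in the new basis
print(C @ new_coords)                # [ 6. 12.] -- converting back to the standard basis
```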

One instance where a change of basis may be useful is changing perspective. Say a chef has a fridge with mangos and peaches. The chef makes peach smoothies (2 mangos and 3 peaches each) and mango juice (3 mangos and 1 peach each). To calculate how many smoothies and juices can be made with a given amount of each fruit, or how much fruit a given number of smoothies and juices requires, a change of basis is necessary, as sketched below.
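Using the recipes above as the columns of a matrix (rows are mangos and peaches), and a made-up order of 4 smoothies and 2 juices:

```python
import numpy as np

recipes = np.array([[2, 3],    # mangos per smoothie, mangos per juice
                    [3, 1]])   # peaches per smoothie, peaches per juice

drinks = np.array([4, 2])                 # 4 smoothies and 2 juices (hypothetical order)
print(recipes @ drinks)                   # [14 14] -- fruit needed, in the fruit basis

fruit = np.array([14, 14])                # fruit on hand
print(np.linalg.solve(recipes, fruit))    # [4. 2.] -- drinks possible, in the recipe basis
```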

Another example of a change in perspective is someone on the ground versus someone in a plane. The person on the ground, facing north, might say, “I see something 30 degrees to my right, 3 km away.” The plane might be heading west; to calculate where the spotted object is, the pilot must convert from the ground observer’s basis to the plane’s basis.

Another instance where change of basis is useful is for simplifying computations, which we’ll see in the next section.

Diagonalization

Let’s return to our favorite matrix:

Try to compute A^5. That is, multiply A by itself 5 times. It’s very time-consuming to do by hand. Assuming a computer isn’t available, is there a better way to do this?

It’s too bad A isn’t simpler. For instance, if A merely doubled the x coordinate of a vector, and halved the y coordinate, then computing A^5 would be trivial…

But wait a second! Eigenvectors have the property of only scaling when transformed! Could we use that to our advantage? Let’s say we have a vector v, and we can decompose it into scaled eigenvectors: v = a*v1 + b*v2, where v1 and v2 are eigenvectors with eigenvalues λ1 and λ2. What would the transformation of v look like?

Applying A once gives Av = a*λ1*v1 + b*λ2*v2, and applying it five times gives A^5*v = a*λ1^5*v1 + b*λ2^5*v2. That means if we know A‘s eigenvectors and eigenvalues, then we can easily compute A^5, and what it does to a vector.

The next question is how do we compute the scaling values for the eigenvectors; that is, how do we determine a and b? This is actually a change of basis problem! Instead of describing v using a standard basis, you can describe it using a new basis, in this case the eigenvectors of A!

Let’s formalize our process:

We have the matrix C, whose columns are the eigenvectors and which handles the change of basis (C^-1 converts standard coordinates into eigenvector coordinates, and C converts back), and the diagonal matrix D, which scales each eigenvector by its eigenvalue.

The third-to-last line and the last line are the main takeaways. Av = (CDC^-1)v means that transforming v by A is the same as decomposing v into eigenvector coordinates (applying C^-1), scaling those by the eigenvalues (applying D), then converting the result back to the standard basis (applying C). Let’s look at an example below, graphing each step:

Equations for our example
Vector, [6;12]
Vector with change of basis using eigenvectors. Now it is [9;-3]
Eigenvectors scaled by eigenvalues; original vectors shown as dotted
9*1=9
-3*2/3=-2
Transformed vector, restored to standard basis
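Here is a NumPy sketch of those steps, one line per picture:

```python
import numpy as np

A = np.array([[5, 1], [1, 5]]) / 6
C = np.array([[1, 1], [1, -1]])       # columns: eigenvectors [1;1] and [1;-1]
D = np.diag([1, 2/3])                 # eigenvalues on the diagonal
v = np.array([6, 12])

step1 = np.linalg.solve(C, v)         # change to the eigenvector basis: [ 9. -3.]
step2 = D @ step1                     # scale by the eigenvalues:      ≈ [ 9. -2.]
step3 = C @ step2                     # back to the standard basis:    ≈ [ 7. 11.]
print(step3, A @ v)                   # both give ≈ [ 7. 11.]
```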

Now let’s examine A^n = C(D^n)C^-1. Computing A^5 directly takes a lot of multiplications. However, D^5 is very simple: D is a diagonal matrix, so multiplying D by itself just multiplies each eigenvalue by itself:

The equation for D^n explains why everything converges to the [1;1] line when a vector is multiplied by A repeatedly. In our case, λ1 is equal to 1, so λ1^n stays at 1 as n increases. However, since λ2 = 2/3, λ2^n gets smaller as n increases. In other words, when vectors are decomposed into the eigenvectors [1;1] and [1;-1], the [1;1] component stays the same (multiplied by 1), while the [1;-1] component shrinks (multiplied by a value less than 1) with each transformation.
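To see the payoff, here is a sketch comparing A^5 computed the easy way (through D^5) against computing it directly:

```python
import numpy as np

A = np.array([[5, 1], [1, 5]]) / 6
C = np.array([[1, 1], [1, -1]])
D = np.diag([1, 2/3])
n = 5

D_n = np.diag([1.0**n, (2/3)**n])          # powering a diagonal matrix just powers its entries
A_n = C @ D_n @ np.linalg.inv(C)           # A^n = C D^n C^-1
print(np.allclose(A_n, np.linalg.matrix_power(A, n)))   # True

# As n grows, (2/3)^n -> 0, so D^n -> diag(1, 0): everything collapses onto the [1;1] line.
```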

Conclusion

We looked at how matrices transform space, and how we can use eigenvectors and their eigenvalues to understand the transformation. We also learned change of basis and diagonalization, useful tools to simplify problems, as well as further examine the nature of matrices.

References

Essence of Linear Algebra, YouTube playlist by 3Blue1Brown

Interactive Linear Algebra by Dan Margalit and Joseph Rabinoff

