Linear algebra has a deceptively simple premise: the study of linear systems. However, this gives rise to many interesting, unexpected, and extremely useful results. Let’s take a look!
The goal of this post is to touch upon introductory concepts rather than to memorize or derive equations. Please see the references at the end for more advanced topics, as well as formulas for computations.
Vectors & Matrices
According to physics, a vector is something with magnitude and direction. In linear algebra, it is commonly represented in two ways. The first is as a column of numbers. The second is as an arrow, with its tail at the origin and its head at the coordinates described by that column of numbers. For example:

Vectors typically have 1 column and as many rows as there are dimensions. The example above is a 2×1 vector, meaning it has 2 rows and 1 column, so it represents a vector in two dimensions. A matrix is similar, but typically has more than one column, such as a 2×2 matrix.
For this blog post, we’ll focus on two-dimensional vectors and matrices, since they are the easiest to plot and understand. Also note that vectors in the text of this post will be written in Matlab notation.
Matrix Multiplication
Let’s take a look at the example above again. The vector [2;3] has its tail at the origin, and its head at coordinates (2,3). How did we get the coordinates for the head?
A unit vector is a vector with length 1. The +x unit vector points in the positive x-axis direction by 1 unit, or [1;0]. Likewise, the +y unit vector points in the positive y-axis direction by 1 unit, or [0;1]. These unit vectors are the building blocks for all other vectors. In our case, the 2 in [2;3] means go two times the +x unit vector, and the 3 means go three times the +y unit vector. Numerically, we get 2*[1;0], and 3*[0;1]. Combining the two gives 2*[1;0]+3*[0;1]=[2;3]. The result is shown below, graphically:

Let’s recap. For [2;3], we take 2 times a vector, [1;0], and take 3 times another vector, [0;1], and add them together.
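If you want to check this numerically, here is a minimal sketch using NumPy (NumPy is just my choice for illustration; the post itself only uses Matlab-style notation):

```python
import numpy as np

# Unit vectors along +x and +y.
x_hat = np.array([1, 0])
y_hat = np.array([0, 1])

# [2;3] is 2 copies of the +x unit vector plus 3 copies of the +y unit vector.
v = 2 * x_hat + 3 * y_hat
print(v)  # [2 3]
```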
But why use the +x and +y unit vectors? Why use [1;0] and [0;1] instead of something else? Matrix multiplication lets us use any vectors we like as building blocks:

Above is the equation for multiplying a vector by a matrix in 2 dimensions. The vector has components v1 and v2, and the matrix has entries a, b, c, and d. The equation demonstrates what we said previously: v1 scales one vector (the first column of the matrix, [a;c]) and v2 scales another (the second column, [b;d]). These two scaled vectors are then added together, producing the final vector. Let’s plug [2;3], the +x unit vector, and the +y unit vector into our equation:

Now we return to my previous question. What if we don’t want to use [1;0] and [0;1]? In that case, we change the entries in the matrix! Let’s try an arbitrary example:

Here, instead of 2 times the +x unit vector and 3 times the +y unit vector, we get 2*[2;2] and 3*[1;3]. Adding them together, we get the final vector, [7;13].

The [7;13] vector is shown above in black. There are two [2;2] vectors shown in red and three [1;3] vectors shown in blue. The number of copies of each vector corresponds to the entries of the initial vector, [2;3].
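Here is a small NumPy sketch of that column-by-column view of matrix multiplication, using the matrix and vector from this example:

```python
import numpy as np

A = np.array([[2, 1],
              [2, 3]])  # columns are [2;2] and [1;3]
v = np.array([2, 3])

# The product is 2 copies of the first column plus 3 copies of the second.
by_columns = v[0] * A[:, 0] + v[1] * A[:, 1]
print(by_columns)  # [ 7 13]
print(A @ v)       # [ 7 13], the same result
```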
Eigenvalue & Eigenvector
Matrix multiplication often causes vectors to point in different directions. In the previous example, [2;3] became [7;13], whose slope is different (3/2 versus 13/7), so the vector has changed orientation. But some vectors do not change orientation upon multiplication. This is mathematically described as Av = λv, where A is a matrix, v is a vector, and λ is a scalar. The equation means that for a given matrix A, there may be a vector v that, upon multiplication, is the same as the original vector, just multiplied by a scalar. v is called an eigenvector of A, and λ is its eigenvalue.
Let’s look at the matrix from the previous example, and see what happens when we multiply it by its eigenvectors:


In the first example, the final vector is the same as the initial vector: [1;-1]. This means the eigenvalue for that eigenvector is 1. In the second example, the final vector is 4 times the initial vector, so the eigenvalue is 4.
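To verify these eigenpairs numerically, here is a short sketch; an eigenvector for eigenvalue 4 is [1;2] (any scalar multiple works, so the exact vector in the figure may be scaled differently):

```python
import numpy as np

A = np.array([[2, 1],
              [2, 3]])

v1 = np.array([1, -1])  # eigenvector with eigenvalue 1
v2 = np.array([1, 2])   # eigenvector with eigenvalue 4 (any scalar multiple works)

print(A @ v1)  # [ 1 -1] -> unchanged, so its eigenvalue is 1
print(A @ v2)  # [4 8]   -> 4 * [1;2], so its eigenvalue is 4
```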
Let’s look at another matrix:

This matrix has entries 5/6 and 1/6; the common factor of 1/6 has been factored out and written in front.
Let’s take a look at how this matrix transforms a grid:

On the left is a grid with evenly spaced points. The origin is marked with a circle, the +x axis is blue, and the +y axis is red.
The grid is transformed by multiplying each coordinate by the matrix. The new coordinate is then used for the new grid, shown on the right.
The new grid has some interesting properties. The +x axis has been rotated up, and the +y axis has been rotated to the right. The end result is that the first quadrant looks squeezed. The fourth quadrant, on the other hand, looks stretched out: the -y axis and the +x axis are farther apart now.
The eigenvectors for this matrix are [1;1], with an eigenvalue of 1, and [1;-1], with an eigenvalue of 2/3. Let’s see how they transform (note that eigenvectors have been scaled up for easier viewing):

It looks like the +x axis and the +y axis are converging on the eigenvector in the first quadrant, [1;1]. Conversely, the +x axis and the -y axis are diverging from the eigenvector in the fourth quadrant, [1;-1]. Both of these changes compress space toward a single line. This phenomenon becomes apparent if we multiply the grid by the matrix multiple times:

Grid converges to [1;1] line
Eigenvectors and eigenvalues are powerful tools for understanding a matrix because they tell us how the matrix affects space. In this example, we can see that vectors along [1;1] are unchanged and vectors along [1;-1] shrink; the end result is that space is compressed onto the [1;1] line.
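A quick NumPy sketch of that compression, multiplying an arbitrary vector by the matrix many times:

```python
import numpy as np

A = np.array([[5, 1],
              [1, 5]]) / 6

v = np.array([2.0, 3.0])  # an arbitrary starting vector
for _ in range(30):
    v = A @ v

print(v)            # approximately [2.5 2.5]
print(v[1] / v[0])  # approximately 1.0: the direction has converged to the [1;1] line
```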
This understanding, that space converges onto the [1;1] line, is very useful for understanding Markov chains:

Imagine you have two nodes, v1 and v2. Each node starts off with some number of (say) chips, 100 chips in total between them. At every time increment, each node gives away 1/6 of its chips to the other node and keeps the rest. This is shown in the diagram above. After an infinite amount of time, how many chips will each node have?

As it turns out, this is an eigenvector/eigenvalue problem. The system at a given time can be described by a vector v = [v1;v2], where v1 and v2 are how many chips each node has. The proportion of chips given away or kept can be described by the matrix A, which is the same matrix we used in the previous example. To calculate how many chips each node has after one time increment, we multiply A and v. Since we want a large amount of time to pass, we multiply by A repeatedly.
We already saw that repeatedly multiplying by A causes v to converge onto the [1;1] line, regardless of what v actually is. That means that over a long enough time, v1 = v2. Since the two nodes hold 100 chips in total, each node will end up with 50 chips, regardless of the starting condition!
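Here is a small simulation of the chip-passing process; the starting split of 83 and 17 is just an arbitrary choice:

```python
import numpy as np

A = np.array([[5, 1],
              [1, 5]]) / 6  # each node keeps 5/6 and gives away 1/6

chips = np.array([83.0, 17.0])  # an arbitrary starting split of 100 chips
for _ in range(50):
    chips = A @ chips

print(chips)  # approximately [50. 50.]
```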

Solving differential equations involving matrices and vectors can also be an eigenvector problem:

Eigenvectors and eigenvalues are supremely important for describing how a matrix behaves and how it transforms vectors. To understand linear algebra, a solid grasp of eigenvectors and eigenvalues is mandatory.
Change of Basis
When describing a vector or coordinate, we almost always use the x axis and y axis, using [1;0] and [0;1] as a basis for making more complicated vectors. However, it is sometimes necessary to describe a vector from another perspective; this is called a change of basis.

Examine the vector above. Using the standard basis, it consists of 6 times [1;0] and 12 times [0;1], so we would describe the vector as [6;12].
Instead of using [1;0] and [0;1], let’s try [1;1] and [1;-1]. By inspection, we see that to construct [6;12], we need 9 times [1;1] and -3 times [1;-1]. Using the new basis of [1;1] and [1;-1], the vector is [9;-3]:

If the vector or basis is more complicated, then inspection won’t work, so let’s take a look at the procedure for calculating the new coordinates:

The matrix C has the new basis vectors as its columns. Multiplying by C converts coordinates written in the new basis back to the standard basis; multiplying by its inverse, C^-1, converts standard-basis coordinates into the new basis, which is what we want here. Let’s apply the equation above:
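As a numerical check of that procedure, here is a brief NumPy sketch, with C built from the new basis vectors [1;1] and [1;-1]:

```python
import numpy as np

C = np.array([[1,  1],
              [1, -1]])  # columns are the new basis vectors [1;1] and [1;-1]
v = np.array([6, 12])

new_coords = np.linalg.solve(C, v)  # same as C^-1 @ v
print(new_coords)                   # [ 9. -3.]
print(C @ new_coords)               # [ 6. 12.], back in the standard basis
```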

One instance where a change of basis may be useful is changing perspective. Say a chef has a fridge with mangoes and peaches. The chef makes peach smoothies (2 mangoes and 3 peaches each) and mango juice (3 mangoes and 1 peach each). To calculate how many smoothies and juices can be made with a given amount of each fruit, or how much fruit it takes to make a given number of smoothies and juices, a change of basis is necessary.
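For instance, here is a sketch of the chef’s problem; the fruit counts (12 mangoes and 11 peaches) are made up for illustration:

```python
import numpy as np

# Columns: one smoothie uses 2 mangoes and 3 peaches;
#          one juice uses 3 mangoes and 1 peach.
recipes = np.array([[2, 3],
                    [3, 1]])

fruit = np.array([12, 11])  # hypothetical stock: 12 mangoes, 11 peaches
servings = np.linalg.solve(recipes, fruit)
print(servings)  # [3. 2.] -> 3 smoothies and 2 juices

# Going the other way: fruit needed for 3 smoothies and 2 juices.
print(recipes @ np.array([3, 2]))  # [12 11]
```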
Another example of a change in perspective would be someone on the ground versus someone on a plane. The person on the ground, facing north, might say, “I see something 30 degrees to my right, 3 km away.” The plane might be heading west; to calculate where the spotted object is relative to the plane, its crew must convert from the ground observer’s basis to the plane’s basis.
Another instance where change of basis is useful is for simplifying computations, which we’ll see in the next section.
Diagonalization
Let’s return to our favorite matrix:

Try to compute A^5; that is, multiply five copies of A together. It’s very time-consuming if you try to do it by hand. Assuming a computer isn’t available, is there a better way to do this?
It’s too bad A isn’t simpler. For instance, if A merely doubled the x coordinate of a vector, and halved the y coordinate, then computing A^5 would be trivial…
But wait a second! Eigenvectors have the property of only scaling when transformed! Could we use that to our advantage? Let’s say we have a vector v that we can decompose into scaled eigenvectors: v = a*v1 + b*v2, where v1 and v2 are eigenvectors of A with eigenvalues λ1 and λ2. What would the transformation of v look like?

Applying A scales each eigenvector component by its eigenvalue, so Av = a*λ1*v1 + b*λ2*v2, and applying A five times gives A^5*v = a*λ1^5*v1 + b*λ2^5*v2. That means if we know A’s eigenvectors and eigenvalues, then we can easily compute A^5, and what it does to a vector.
The next question is how to compute the scaling values for the eigenvectors; that is, how do we determine a and b? This is actually a change of basis problem! Instead of describing v using the standard basis, we describe it using a new basis, in this case the eigenvectors of A!
Let’s formalize our process:

We have the matrix C, whose columns are the eigenvectors of A: multiplying by C^-1 converts a vector’s coordinates from the standard basis into the eigenvector basis, and multiplying by C converts them back. We also have the matrix D, a diagonal matrix of the eigenvalues, which scales each eigenvector component.
The third-to-last line and the last line are the main takeaways. Av = (CDC^-1)v means that transforming v by A is the same as decomposing v into eigenvectors, scaling them by their eigenvalues, and then converting back to the standard basis. Let’s look at an example below, graphing each step:




The [1;1] component is scaled by λ1: 9*1 = 9. The [1;-1] component is scaled by λ2: -3*(2/3) = -2.
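Tracing those same steps numerically for the vector [6;12]:

```python
import numpy as np

A = np.array([[5, 1],
              [1, 5]]) / 6
C = np.array([[1,  1],
              [1, -1]])   # columns are the eigenvectors [1;1] and [1;-1]
D = np.diag([1, 2/3])     # the corresponding eigenvalues

v = np.array([6, 12])

coords = np.linalg.solve(C, v)  # decompose into eigenvectors: [ 9. -3.]
scaled = D @ coords             # scale by the eigenvalues:    [ 9. -2.]
result = C @ scaled             # back to the standard basis:  [ 7. 11.]

print(result)  # [ 7. 11.]
print(A @ v)   # [ 7. 11.], the same as transforming directly
```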

Now let’s examine A^n = C(D^n)C^-1. Computing A^5 directly is a lot of work, since it takes many multiplications. However, D^5 is very simple: D is a diagonal matrix, so raising D to a power just raises each eigenvalue to that power:

The equation for D^n explains why everything converges to the [1;1] line when a vector is multiplied by A repeatedly. In our case, λ1 is equal to 1, so λ1^n stays 1 as n increases. However, since λ2 = 2/3, λ2^n gets smaller as n increases. In other words, when vectors are decomposed into the eigenvectors [1;1] and [1;-1], the [1;1] component remains the same (multiplied by 1), while the [1;-1] component shrinks (multiplied by a value less than 1) with each transformation.
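Finally, a sketch of computing A^5 through the diagonalization, compared against multiplying A out directly:

```python
import numpy as np

A = np.array([[5, 1],
              [1, 5]]) / 6
C = np.array([[1,  1],
              [1, -1]])
D = np.diag([1, 2/3])

# D^5 just raises each eigenvalue to the 5th power.
D5 = np.diag([1**5, (2/3)**5])

A5 = C @ D5 @ np.linalg.inv(C)
print(A5)
print(np.linalg.matrix_power(A, 5))  # matches A5
```

Raising D to a power costs only two scalar exponentiations, which is why the diagonalized form is so much cheaper than repeated matrix multiplication.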
Conclusion
We looked at how matrices transform space, and how we can use eigenvectors and their eigenvalues to understand those transformations. We also learned about change of basis and diagonalization, useful tools for simplifying problems and for further examining the nature of matrices.
References
Essence of Linear Algebra, YouTube playlist by 3Blue1Brown
Interactive Linear Algebra by Dan Margalit and Joseph Rabinoff