Applications to Linear Models (Orthogonality and Least Squares)
Notes:
In this section, we will apply least-squares problems to economics.
Instead of finding the least-squares solution of $Ax=b$, we will find it for $X\beta = y$, where
- $X$ → design matrix
- $\beta$ → parameter vector
- $y$ → observation vector
Least-Squares Line
Suppose we are given data points, and we want to find a line that best fits them. Let the best-fit line be the linear equation
$y = \beta_0 + \beta_1 x$
and let the data points be $(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)$.
(figure: scatter plot of the data points with the best-fit line through them)
Our goal is to determine the parameters $\beta_0$ and $\beta_1$. If every data point were exactly on the line, then
$$\beta_0 + \beta_1 x_1 = y_1,\qquad \beta_0 + \beta_1 x_2 = y_2,\qquad \dots,\qquad \beta_0 + \beta_1 x_n = y_n.$$
This is a linear system, which we can write as $X\beta = y$ with
$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix},\qquad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix},\qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$
Then the least-squares solution of $X\beta = y$ satisfies the normal equations $X^T X \beta = X^T y$.
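As a minimal sketch of this procedure, the following NumPy snippet builds the design matrix for a line and solves the normal equations $X^T X \beta = X^T y$ directly. The data points here are made up for illustration; any small dataset works.

```python
import numpy as np

# Hypothetical data points (x_i, y_i).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])

# Design matrix X: a column of ones (for beta_0) and the x-values (for beta_1).
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations X^T X beta = X^T y.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # [beta_0, beta_1] -> [0.1, 0.97]
```

In practice one would usually call a library routine such as `np.linalg.lstsq`, which is numerically more stable than forming $X^T X$ explicitly, but solving the normal equations mirrors the derivation above.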
General Linear Model
Since the data points are not actually on the line, there are residual values, also known as errors. So we introduce a vector called the residual vector $\epsilon$, where
$\epsilon = y - X\beta$,
so that $y = X\beta + \epsilon$.
Our goal is to minimize the length of $\epsilon$ (the error), so that $X\beta$ is approximately equal to $y$. This means we are finding a least-squares solution of $y = X\beta$ using $X^T X\beta = X^T y$.
Least-Squares Fitting of Other Curves
Let the data points be $(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)$, and suppose we want to find the best fit using the function $y=\beta_0+\beta_1 x+\beta_2 x^2$, where $\beta_0,\beta_1,\beta_2$ are parameters. We are now fitting a best-fit quadratic function instead of a line.
Again, the data points don't actually lie on the curve, so we add residual values $\epsilon_1,\epsilon_2,\dots,\epsilon_n$, where
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i,\qquad i = 1,\dots,n,$$
or in matrix form $y = X\beta + \epsilon$ with
$$X = \begin{bmatrix} 1 & x_1 & x_1^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{bmatrix}.$$
Since we are minimizing the length of $\epsilon$, we can find the least-squares solution $\beta$ using $X^T X\beta=X^T y$. This can also be applied to other functions.
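The only change from the straight-line case is an extra column $x_i^2$ in the design matrix. As a sketch, the data below are chosen (hypothetically) to lie exactly on a quadratic, so the normal equations recover its coefficients:

```python
import numpy as np

# Points lying exactly on y = 1 + 2x + 3x^2 (chosen for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1 + 2*x + 3*x**2

# Design matrix with columns 1, x, x^2.
X = np.column_stack([np.ones_like(x), x, x**2])

# Normal equations give the parameter vector [beta_0, beta_1, beta_2].
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # -> [1, 2, 3], recovering the quadratic's coefficients
```

Note the model is still *linear in the parameters* $\beta_0,\beta_1,\beta_2$ even though the curve is quadratic in $x$; that is why the same normal equations apply.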
Multiple Regression
Let the data points be $(u_1,v_1,y_1),(u_2,v_2,y_2),\dots,(u_n,v_n,y_n)$, and suppose we want to use the best-fit function $y=\beta_0+\beta_1 u+\beta_2 v$, where $\beta_0,\beta_1,\beta_2$ are parameters.
Again, the data points don't actually lie on this surface, so we add residual values $\epsilon_1,\epsilon_2,\dots,\epsilon_n$, where
$$y_i = \beta_0 + \beta_1 u_i + \beta_2 v_i + \epsilon_i,\qquad i = 1,\dots,n,$$
or in matrix form $y = X\beta + \epsilon$ with
$$X = \begin{bmatrix} 1 & u_1 & v_1 \\ \vdots & \vdots & \vdots \\ 1 & u_n & v_n \end{bmatrix}.$$
Since we are minimizing the length of $\epsilon$, we can find the least-squares solution $\beta$ using $X^T X\beta=X^T y$. This can also be applied to other multivariable functions.
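A multiple-regression fit follows the same pattern, with one design-matrix column per predictor. In this hypothetical sketch the observations lie exactly on a plane, so the fit recovers its coefficients; here we use `np.linalg.lstsq`, which solves the same least-squares problem without forming $X^T X$ explicitly:

```python
import numpy as np

# Hypothetical predictors u, v and observations on y = 0.5 + 1.5u - 2v.
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 0.5 + 1.5*u - 2.0*v

# Design matrix with columns 1, u, v.
X = np.column_stack([np.ones_like(u), u, v])

# Least-squares solution of X beta = y.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # -> [0.5, 1.5, -2.0]
```

With noisy real data the recovered coefficients would only approximate the true plane, and the residual vector $\epsilon = y - X\beta$ would be nonzero but of minimal length.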