Least-squares problem

Intro Lessons
  1. Least Squares Problem Overview:
  2. The Least Squares Solution
    • Ax = b gives no solution
    • Approximate the closest solution \hat{x}
    • The least-squares solution \hat{x} = (A^T A)^{-1} A^T b
    • Not always a unique solution
  3. The Least Squares Error
    • Finding the error of the solution \hat{x}
    • Use the formula \lVert b - A\hat{x} \rVert
  4. Alternative Calculation to Least-Squares Solutions
    • Orthogonal columns of A: A\hat{x} = \hat{b}
    • QR factorization A = QR: R\hat{x} = Q^T b
Example Lessons
  1. Finding the Least Squares Solutions with A^T A\hat{x} = A^T b
    • Find a least-squares solution of Ax = b.
    • Describe all least-squares solutions of the equation Ax = b.
  2. Finding the Least Squares Error
    • You are given the least-squares solution of Ax = b. Compute the least-squares error.
  3. Finding the Least Squares Solutions with Alternative Ways
    • Find the orthogonal projections of b onto the columns of A and find a least-squares solution of Ax = b.
  4. Use the factorization A = QR to find the least-squares solution of Ax = b.
            Topic Notes

            Introduction to the Least-Squares Problem

The least-squares problem is a fundamental concept in linear algebra with wide-ranging applications in data analysis and scientific computing. Our introduction video provides a comprehensive overview of this crucial topic, serving as an essential foundation for understanding more advanced mathematical concepts. The least-squares problem extends basic linear algebra principles, focusing on finding the best approximation to a system of linear equations that may not have an exact solution. It involves minimizing the sum of squared differences between observed and predicted values, typically represented as a matrix equation. This method is particularly useful when dealing with overdetermined systems, where there are more equations than unknowns. By mastering the least-squares problem, students gain valuable insights into data fitting, regression analysis, and optimization techniques. The concept's significance extends beyond mathematics, finding applications in fields such as engineering, physics, and economics, making it an indispensable tool for researchers and professionals alike.

            Understanding the Least-Squares Problem

The least-squares problem is a fundamental concept in linear algebra and optimization, widely used in data analysis, curve fitting, and various scientific applications. Unlike traditional methods that seek exact solutions to systems of equations, the least-squares approach aims to find the best approximation when an exact solution may not exist or when dealing with overdetermined systems.

            In a typical linear algebra problem, we encounter equations of the form Ax = b, where A is a matrix, x is the unknown vector, and b is a known vector. The goal is to find x that satisfies this equation exactly. However, in many real-world scenarios, an exact solution may not exist due to measurement errors, inconsistencies in data, or simply because the system is overdetermined (more equations than unknowns).

            This is where the least-squares problem comes into play. Instead of seeking an exact solution, it aims to find an x that minimizes the sum of the squared differences between Ax and b. Mathematically, we can express this as finding x that minimizes ||Ax - b||², where ||·|| denotes the Euclidean norm.

            The least-squares solution is defined by a crucial inequality. For any vector y, the least-squares solution x satisfies:

||Ax - b|| ≤ ||Ay - b||

            This inequality essentially states that the least-squares solution x produces the smallest possible error term when compared to any other vector y. In other words, it provides the best approximation to the solution in terms of minimizing the overall error.

            To illustrate this concept, let's consider a simple example. Imagine we're trying to fit a straight line to a set of data points that don't perfectly align. The traditional approach might struggle to find an exact solution that passes through all points. The least-squares method, however, would find the line that minimizes the sum of the squared distances between the line and each data point, providing the best overall fit.

            The power of the least-squares approach lies in its ability to handle overdetermined systems. For instance, in a scenario where we have more equations than unknowns, an exact solution might not exist. The least-squares method provides a way to find the best compromise, minimizing the overall error across all equations.

            One of the key advantages of the least-squares problem is its mathematical tractability. The solution can often be found analytically using the normal equations: A^T Ax = A^T b, where A^T is the transpose of A. This formulation transforms the problem into a square system that can be solved using standard linear algebra techniques.
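To make this concrete, here is a minimal NumPy sketch (not from the original material; the matrix A and vector b are made-up illustrative values) that solves the normal equations and cross-checks the result against NumPy's built-in least-squares routine:

```python
import numpy as np

# A hypothetical overdetermined system: 4 equations, 2 unknowns.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

# Solve the normal equations A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check with NumPy's general least-squares solver.
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]

print(x_normal)   # best-fit coefficients
print(x_lstsq)    # agrees with the normal-equations result
```

In practice, library solvers usually work from a QR or SVD factorization rather than forming A^T A explicitly, which is numerically more stable.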

            In practice, the least-squares problem finds applications in various fields. In statistics, it forms the basis of linear regression. In signal processing, it's used for noise reduction and signal estimation. In computer vision, least-squares solutions help in camera calibration and image reconstruction.

            It's important to note that while the least-squares approach is powerful, it has limitations. It assumes that errors are normally distributed and can be sensitive to outliers. In cases where these assumptions don't hold, more robust methods might be necessary.

            The concept of least-squares extends beyond linear problems. Non-linear least-squares problems arise in many complex systems, requiring iterative methods like the Gauss-Newton algorithm or Levenberg-Marquardt algorithm for solution.

            In conclusion, the least-squares problem represents a paradigm shift from seeking exact solutions to finding best approximations. Its ability to handle overdetermined systems, provide meaningful solutions in the presence of noise, and its wide applicability make it an indispensable tool in modern mathematics and data analysis. By understanding the underlying principles and the defining inequality, we gain insight into how this method balances accuracy and practicality in solving real-world problems.

            Calculating the Least-Squares Solution

            The least-squares solution is a fundamental concept in linear algebra and statistics, used to find the best-fitting line or curve for a set of data points. This method minimizes the sum of the squared differences between observed and predicted values. Let's walk through the process of calculating the least-squares solution, focusing on the formula for x-hat and its components.

            Step 1: Understand the formula for x-hat

            The least-squares solution, denoted as x-hat, is given by the formula:

x-hat = (A^T A)^(-1) A^T b

            Where:

• A is the matrix of independent variables
• A^T is the transpose of matrix A
• (A^T A)^(-1) is the inverse of the product of A^T and A
• b is the vector of dependent variables

            Step 2: Prepare the data

            Organize your data into matrix A for independent variables and vector b for dependent variables. Ensure that A has the correct dimensions: m rows (number of observations) and n columns (number of variables).

Step 3: Calculate A transpose (A^T)

            Transpose matrix A by switching its rows and columns. This operation is crucial for the subsequent matrix multiplication.

Step 4: Multiply A^T and A

Perform matrix multiplication of A^T and A. This results in a square matrix of size n x n.

Step 5: Calculate the inverse of A^T A

            Find the inverse of the product obtained in step 4. This step can be computationally intensive for large matrices and may require numerical methods.

Step 6: Multiply (A^T A)^(-1) with A^T

Perform matrix multiplication between the inverse calculated in step 5 and A^T.

            Step 7: Multiply the result with b

            Finally, multiply the result from step 6 with vector b to obtain x-hat.

            Example calculation:

            Let's consider a simple example to demonstrate the process. Suppose we have the following data:

            A = [1 2; 3 4; 5 6], b = [3; 7; 11]

Step 1: A^T = [1 3 5; 2 4 6]

Step 2: A^T A = [35 44; 44 56]

Step 3: (A^T A)^(-1) = (1/24) [56 -44; -44 35] ≈ [2.3333 -1.8333; -1.8333 1.4583]

Step 4: (A^T A)^(-1)A^T ≈ [-1.3333 -0.3333 0.6667; 1.0833 0.3333 -0.4167]

Step 5: x-hat = (A^T A)^(-1)A^T * [3; 7; 11] = [1; 1]

This means the least-squares solution is x-hat = [1; 1]. In this particular example, A x-hat reproduces b exactly, so the system happens to be consistent and the least-squares solution coincides with the exact solution.
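The arithmetic above is easy to verify numerically. The short NumPy sketch below (not part of the original material) reproduces each step for this A and b:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
b = np.array([3.0, 7.0, 11.0])

AtA = A.T @ A                      # [[35, 44], [44, 56]]
AtA_inv = np.linalg.inv(AtA)       # explicit inverse is fine for a 2x2 example
x_hat = AtA_inv @ A.T @ b

print(x_hat)                       # [1., 1.]
print(np.allclose(A @ x_hat, b))   # True: b lies in Col(A), so the fit is exact here
```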

            Uniqueness of the Least-Squares Solution

            Understanding the conditions under which the least-squares solution is unique is crucial in linear algebra and data analysis. The least-squares method is widely used to find approximate solutions to overdetermined systems, but the uniqueness of this solution depends on specific conditions. In this section, we'll explore the three key conditions that guarantee a unique least-squares solution and provide examples to illustrate each case.

The first condition for a unique least-squares solution is that the columns of the coefficient matrix A must be linearly independent. Linear independence means that no column can be expressed as a linear combination of the other columns. When this condition is met, it ensures that each predictor variable contributes unique information to the model. For example, consider a matrix A with columns [1 0 2], [0 1 1], and [2 1 4]. These columns are linearly independent because none of them can be formed by combining the others, thus satisfying the first condition for uniqueness.

            The second condition is that the matrix A^T A (where A^T is the transpose of A) must be invertible. This condition is closely related to the first, as linear independence of A's columns guarantees that A^T A is invertible. An invertible matrix has a non-zero determinant and a full rank equal to its dimension. For instance, if A is a 3x2 matrix with linearly independent columns, A^T A will be a 2x2 invertible matrix, fulfilling this condition.

            The third condition, which is essentially a restatement of the first two in matrix terms, is that the rank of matrix A must be equal to the number of columns in A. This condition ensures that there are no redundant or dependent columns in the matrix. For example, if A is a 4x3 matrix and its rank is 3, it satisfies this condition, indicating that all three columns contribute unique information to the system.

            When these three conditions are met, the least-squares solution x = (A^T A)^(-1) A^T b is guaranteed to be unique. This unique solution minimizes the sum of squared residuals between the observed values and the values predicted by the linear model. It's important to note that these conditions are interconnected; if one is satisfied, the others typically follow.
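As a hedged illustration (not from the original material), these conditions can be checked numerically by comparing the rank of A with its number of columns, here using the example columns [1 0 2], [0 1 1], and [2 1 4] mentioned above:

```python
import numpy as np

# Stack the example columns into a matrix A.
A = np.column_stack(([1, 0, 2], [0, 1, 1], [2, 1, 4])).astype(float)

n_cols = A.shape[1]
rank = np.linalg.matrix_rank(A)

# Full column rank <=> linearly independent columns <=> A^T A invertible,
# which is exactly when the least-squares solution is unique.
print(rank == n_cols)                   # True
print(np.linalg.det(A.T @ A) != 0.0)    # True: A^T A is invertible
```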

            Understanding these conditions is crucial for several reasons. Firstly, it helps in assessing the reliability and stability of the least-squares solution. A unique solution implies that small changes in the input data will result in small changes in the solution, making the model more robust. Secondly, it aids in diagnosing and addressing issues in the data or model specification. If uniqueness is not achieved, it may indicate multicollinearity among predictors or other structural problems in the data.

            In practical applications, such as regression analysis or signal processing, ensuring these conditions are met can significantly improve the quality and interpretability of results. For instance, in multiple regression analysis, checking for multicollinearity (a violation of linear independence) is a standard practice to ensure the reliability of coefficient estimates.

            In conclusion, the uniqueness of the least-squares solution hinges on the linear independence of the columns of A, the invertibility of A^T A, and the full column rank of A. These conditions collectively ensure that each predictor variable contributes unique information to the model, leading to a stable and interpretable solution. Practitioners in fields ranging from statistics to engineering should be aware of these conditions to effectively apply and interpret least-squares methods in their work.

            Least-Squares Error

            Least-squares error is a fundamental concept in data analysis and statistical modeling, widely used to measure the discrepancy between observed values and predicted values in a dataset. This method is crucial for optimizing models and making accurate predictions across various fields, including science, engineering, and economics.

            The least-squares error is calculated by summing the squared differences between each observed value and its corresponding predicted value. The formula for calculating the least-squares error (E) is:

E = Σ(yi - ŷi)²

            Where yi represents the observed value, and ŷi represents the predicted value for each data point i.

            To demonstrate the use of this formula, let's consider a simple example. Suppose we have a dataset with five observed values: [2, 4, 5, 7, 9] and corresponding predicted values: [1.5, 3.8, 5.2, 6.5, 8.7]. We can calculate the least-squares error as follows:

E = (2 - 1.5)² + (4 - 3.8)² + (5 - 5.2)² + (7 - 6.5)² + (9 - 8.7)²
            E = 0.25 + 0.04 + 0.04 + 0.25 + 0.09
            E = 0.67
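As a quick numerical cross-check (an illustrative sketch, not part of the original material), the same error can be computed as the squared length of the residual vector:

```python
import numpy as np

observed  = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
predicted = np.array([1.5, 3.8, 5.2, 6.5, 8.7])

residual = observed - predicted
E = np.sum(residual ** 2)                  # sum of squared differences

# Equivalently, E is the squared Euclidean norm of the residual vector.
print(E, np.linalg.norm(residual) ** 2)    # both print 0.67 (up to rounding)
```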

            The significance of minimizing the least-squares error in practical applications cannot be overstated. By reducing this error, we can:

            1. Improve model accuracy: A lower error indicates that our predictions are closer to the actual observed values, resulting in more reliable models.
            2. Optimize parameter estimation: In regression analysis, minimizing the least-squares error helps in finding the best-fit line or curve that represents the relationship between variables.
            3. Enhance decision-making: More accurate models lead to better-informed decisions in various fields, such as finance, healthcare, and engineering.
            4. Facilitate model comparison: The least-squares error provides a standardized metric for comparing different models and selecting the most appropriate one for a given dataset.

            In the context of vector subtraction and length calculation, the least-squares error can be interpreted as the squared Euclidean distance between the observed and predicted vectors. This geometric interpretation helps in visualizing the error and understanding its properties in multidimensional spaces.

            Minimizing the least-squares error often involves iterative algorithms, such as gradient descent, which adjust model parameters to reduce the error systematically. These optimization techniques are crucial in machine learning and artificial intelligence applications, where complex models with numerous parameters need to be fine-tuned for optimal performance.
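To illustrate that idea, here is a rough gradient-descent sketch for the squared-error objective ||Ax - b||² (not from the original material; the data, step size, and iteration count are arbitrary choices for the example):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([4.0, 5.0, 6.0])

x = np.zeros(A.shape[1])     # start from the zero vector
lr = 0.05                    # step size, chosen by hand for this sketch

for _ in range(5000):
    gradient = 2 * A.T @ (A @ x - b)   # gradient of ||Ax - b||^2
    x -= lr * gradient

print(x)                                       # approaches the least-squares solution
print(np.linalg.lstsq(A, b, rcond=None)[0])    # direct solution for comparison
```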

            It's important to note that while the least-squares method is widely used and effective in many scenarios, it can be sensitive to outliers. In cases where outliers are a concern, alternative error metrics or robust regression analysis techniques may be more appropriate.

            In conclusion, understanding and effectively utilizing the least-squares error concept is essential for anyone working with data analysis, predictive modeling, or optimization problems. By mastering this fundamental tool, practitioners can develop more accurate models, make better predictions, and ultimately drive more informed decision-making processes across various domains.

            Alternative Methods for Solving Least-Squares Problems

            When it comes to solving least-squares problems, two alternative methods stand out for their efficiency and applicability in different scenarios: the Orthogonal Set Method and the QR Factorization Method. Both approaches offer unique advantages and can be employed under specific conditions to find optimal solutions to overdetermined systems of linear equations.

            1. Orthogonal Set Method

            The Orthogonal Set Method is particularly useful when dealing with systems where the columns of the coefficient matrix form an orthogonal set. This method leverages the properties of orthogonality to simplify calculations and provide a straightforward solution.

            Conditions for Use:

            • The columns of the coefficient matrix A must form an orthogonal set.
            • The system Ax = b is overdetermined (more equations than unknowns).

            Step-by-Step Instructions:

            1. Verify that the columns of A are orthogonal.
            2. Compute the dot product of each column of A with itself (ai·ai).
            3. Calculate the dot product of each column of A with the vector b (ai·b).
            4. For each unknown xi, compute: xi = (ai·b) / (ai·ai)

Example:

Consider the system Ax = b, where:

A = [1 1; 1 -1; 1 0], b = [2; 3; 4]

The columns of A, a1 = [1; 1; 1] and a2 = [1; -1; 0], are orthogonal, since a1·a2 = 1 - 1 + 0 = 0. Following the steps:

x1 = (a1·b) / (a1·a1) = (2 + 3 + 4) / (1 + 1 + 1) = 3

x2 = (a2·b) / (a2·a2) = (2 - 3 + 0) / (1 + 1 + 0) = -0.5

The least-squares solution is x = [3; -0.5].
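A quick NumPy cross-check of this shortcut (an illustrative sketch, not from the original material) confirms that the per-column weights agree with a general least-squares solver whenever the columns are orthogonal:

```python
import numpy as np

A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [1.0,  0.0]])
b = np.array([2.0, 3.0, 4.0])

# Per-column weights (a_i . b) / (a_i . a_i); valid because A's columns are orthogonal.
x_orth = (A.T @ b) / np.sum(A * A, axis=0)

# General least-squares solver for comparison.
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]

print(x_orth)    # [ 3.  -0.5]
print(x_lstsq)   # same result
```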

            2. QR Factorization Method

            The QR Factorization Method is a more general approach that can be applied to any overdetermined system, regardless of whether the columns of A are orthogonal. This method involves decomposing the coefficient matrix A into an orthogonal matrix Q and an upper triangular matrix R.

            Conditions for Use:

            • The system Ax = b is overdetermined.
            • No specific requirements for the structure of A.

            Step-by-Step Instructions:

            1. Perform QR factorization on A to obtain Q and R: A = QR
2. Compute Q^T b
3. Solve the upper triangular system Rx = Q^T b using back-substitution

            Example:

            Consider the system Ax = b, where:

            A = [1 1; 1 2; 1 3], b = [4; 5; 6]

After QR factorization:

Q ≈ [0.577 -0.707; 0.577 0; 0.577 0.707]

R ≈ [1.732 3.464; 0 1.414]

Then Q^T b ≈ [8.660; 1.414]. Back-substitution on Rx = Q^T b gives x2 = 1.414 / 1.414 = 1 and x1 = (8.660 - 3.464(1)) / 1.732 = 3, so the least-squares solution is x = [3; 1].
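The same computation can be reproduced with NumPy's built-in QR factorization (an illustrative sketch, not from the original material; note that np.linalg.qr may return Q and R with some signs flipped relative to the hand computation, which does not change the solution):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([4.0, 5.0, 6.0])

Q, R = np.linalg.qr(A)                 # reduced QR: Q is 3x2, R is 2x2 upper triangular
x_hat = np.linalg.solve(R, Q.T @ b)    # solves R x = Q^T b

print(x_hat)   # [3., 1.]
```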

            Conclusion and Practical Applications

The least-squares problem is a fundamental concept in data fitting and optimization. As introduced in the video, it involves minimizing the sum of squared differences between observed and predicted values. This method is crucial for finding the best-fit line or curve through a set of data points. Least-squares has wide-ranging applications across various fields, including engineering, physics, economics, and machine learning. It's used in regression analysis, signal processing, and parameter estimation. In engineering, it helps in system identification and control. Economists use it for trend analysis and forecasting. In physics, it's applied to experimental data analysis. The introduction video provides a solid foundation for understanding this concept, but further exploration is encouraged to grasp its full potential. As you delve deeper, you'll discover how least-squares optimization shapes our understanding of data and drives decision-making in numerous industries.

            Least Squares Problem Overview:

Least Squares Problem Overview: The Least Squares Solution
• Ax = b gives no solution
• Approximate the closest solution \hat{x}
• The least-squares solution \hat{x} = (A^T A)^{-1} A^T b
• Not always a unique solution

            Step 1: Introduction to the Least Squares Problem

The least squares problem is a fundamental concept in linear algebra and numerical analysis. It arises when we have a system of linear equations Ax = b that does not have an exact solution. In such cases, we aim to find an approximate solution that minimizes the error. This approximate solution is known as the least squares solution.

In linear algebra, we often encounter the matrix equation Ax = b. When solving for x, we may find that there are either infinitely many solutions, a unique solution, or no solutions at all. In cases where Ax = b has no solution, we do not simply discard the problem. Instead, we seek an approximate solution x such that Ax is as close as possible to b. This is the essence of the least squares problem.

            Step 2: Understanding the Approximate Solution

When Ax = b has no exact solution, we approximate a solution x such that Ax is approximately equal to b. This means that the difference between Ax and b is minimized. Mathematically, we denote this approximate solution as \hat{x}.

The goal is to find \hat{x} such that the length (or norm) of the vector b - A\hat{x} is minimized. This length represents the error between the actual vector b and the product A\hat{x}. The smaller this error, the better the approximation.

            Step 3: The Least Squares Solution Formula

The least squares solution \hat{x} can be found using the formula: \[ \hat{x} = (A^T A)^{-1} A^T b \] Here, A^T denotes the transpose of the matrix A, and (A^T A)^{-1} is the inverse of the matrix A^T A. This formula provides the best approximation for x in the least squares sense.

To derive this formula, we start with the matrix equation A^T A \hat{x} = A^T b. By multiplying both sides of this equation by the inverse of A^T A, we isolate \hat{x} and obtain the least squares solution.

            Step 4: Calculating the Least Squares Solution

To find the least squares solution \hat{x}, follow these steps:

1. Compute the transpose of the matrix A, denoted as A^T.
2. Multiply A^T by A to obtain the matrix A^T A.
3. Find the inverse of the matrix A^T A, denoted as (A^T A)^{-1}.
4. Multiply (A^T A)^{-1} by A^T to obtain (A^T A)^{-1} A^T.
5. Finally, multiply (A^T A)^{-1} A^T by the vector b to obtain the least squares solution \hat{x}.

            Step 5: Example Calculation

Let's consider an example to illustrate the calculation of the least squares solution. Suppose we have the following matrix A and vector b:

\[ A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} \]

First, compute the transpose of A: \[ A^T = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \]

Next, multiply A^T by A: \[ A^T A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \]

Find the inverse of A^T A: \[ (A^T A)^{-1} = \frac{1}{3} \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} \]

Multiply (A^T A)^{-1} by A^T: \[ (A^T A)^{-1} A^T = \frac{1}{3} \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} = \frac{1}{3} \begin{pmatrix} 1 & 2 & -1 \\ 1 & -1 & 2 \end{pmatrix} \]

Finally, multiply (A^T A)^{-1} A^T by b to obtain \hat{x}: \[ \hat{x} = \frac{1}{3} \begin{pmatrix} 1 & 2 & -1 \\ 1 & -1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} = \frac{1}{3} \begin{pmatrix} 1 + 2 - 2 \\ 1 - 1 + 4 \end{pmatrix} = \frac{1}{3} \begin{pmatrix} 1 \\ 4 \end{pmatrix} = \begin{pmatrix} \frac{1}{3} \\ \frac{4}{3} \end{pmatrix} \]

            Step 6: Uniqueness of the Least Squares Solution

The least squares solution \hat{x} is not always unique. There are certain conditions under which the solution is unique:

1. The equation Ax = b has a unique least squares solution for each b in \Bbb{R}^m.
2. The columns of A are linearly independent.
3. The matrix A^T A is invertible.

If any of these conditions holds, then the least squares solution \hat{x} is unique. The most straightforward condition to check is whether A^T A is invertible. If you can find the inverse of A^T A, then the least squares solution is unique.

            FAQs

            1. What is the least-squares problem?

              The least-squares problem is a mathematical approach used to find the best-fitting solution to an overdetermined system of equations. It minimizes the sum of the squared differences between observed and predicted values, making it ideal for data fitting and optimization tasks.

            2. How is the least-squares solution calculated?

The least-squares solution is typically calculated using the formula x-hat = (A^T A)^(-1) A^T b, where A is the coefficient matrix, A^T is its transpose, and b is the vector of observed values. This involves matrix operations such as multiplication, transposition, and inversion.

            3. What conditions ensure a unique least-squares solution?

A unique least-squares solution is guaranteed when: 1) the columns of the coefficient matrix A are linearly independent, 2) the matrix A^T A is invertible, and 3) the rank of matrix A equals the number of its columns. These conditions ensure that each predictor variable contributes unique information to the model.

            4. What are some alternative methods for solving least-squares problems?

              Two alternative methods for solving least-squares problems are the Orthogonal Set Method and the QR Factorization Method. The Orthogonal Set Method is useful when the columns of the coefficient matrix form an orthogonal set, while the QR Factorization Method is more general and can be applied to any overdetermined system.

            5. What are some practical applications of the least-squares method?

              The least-squares method has numerous practical applications across various fields. It's widely used in regression analysis for trend prediction, in signal processing for noise reduction, in engineering for system identification and control, in physics for experimental data analysis, and in economics for forecasting and trend analysis. It's also fundamental in many machine learning algorithms for parameter estimation and model optimization.

            Prerequisite Topics for Understanding Least-Squares Problem

            To fully grasp the concept of the least-squares problem, it's crucial to have a solid foundation in several key areas of mathematics. Understanding these prerequisite topics will not only make learning about least-squares problems easier but also provide valuable context for its applications and significance in various fields.

            One of the fundamental concepts to master is the applications of linear equations. This knowledge forms the backbone of many mathematical models used in least-squares problems, particularly in data fitting and optimization scenarios. Closely related to this is the ability to determine the number of solutions to linear equations, which is crucial when dealing with systems of equations in least-squares problems.

            A strong grasp of matrix equations, especially in the form Ax=b, is essential. This format is frequently encountered in least-squares problems, where we seek to find the best solution that minimizes the sum of squared residuals. Understanding matrix operations and their properties, particularly matrix multiplication, is vital for manipulating these equations efficiently.

            The concept of linear independence plays a significant role in least-squares problems, especially when dealing with overdetermined systems. It helps in understanding the uniqueness and existence of solutions, which is crucial for interpreting the results of least-squares methods.

            Furthermore, familiarity with regression analysis provides valuable context for the practical applications of least-squares problems. Regression analysis is a statistical method that uses least-squares to model the relationship between variables, making it a prime example of how least-squares techniques are applied in real-world scenarios.

            By mastering these prerequisite topics, students will be well-equipped to tackle the complexities of least-squares problems. They'll understand not just the mathematical mechanics but also the underlying principles and real-world applications. This comprehensive foundation will enable them to approach least-squares problems with confidence, whether in academic settings or practical applications in fields such as engineering, economics, and data science.

            Remember, the journey to understanding complex mathematical concepts like least-squares problems is built upon a solid grasp of these fundamental topics. Each prerequisite serves as a building block, contributing to a deeper, more intuitive understanding of the subject matter. As you progress in your studies, you'll find that these interconnected concepts continually reinforce each other, creating a robust framework for advanced mathematical thinking and problem-solving.

In linear algebra, we have dealt with questions in which Ax = b does not have a solution. When a solution does not exist, the best thing we can do is to approximate x. In this section, we will learn how to find an x that makes Ax as close as possible to b.

If A is an m \times n matrix and b is a vector in \Bbb{R}^m, then a least-squares solution of Ax = b is an \hat{x} in \Bbb{R}^n such that
\lVert b - A\hat{x} \rVert \leq \lVert b - Ax \rVert

for all x in \Bbb{R}^n.

The smaller the distance, the smaller the error, and thus the better the approximation. The smallest distance therefore gives the best approximation for x, which is why we call this best approximation \hat{x}.

            The Least-Squares Solution

The set of least-squares solutions of Ax = b coincides with the non-empty set of solutions of the normal equation A^T A\hat{x} = A^T b.

In other words,
A^T A\hat{x} = A^T b
\hat{x} = (A^T A)^{-1} A^T b

where \hat{x} is a least-squares solution of Ax = b.

Keep in mind that \hat{x} is not always unique. However, it is unique if any one of the following conditions holds:
1. The equation Ax = b has a unique least-squares solution for each b in \Bbb{R}^m.
2. The columns of A are linearly independent.
3. The matrix A^T A is invertible.

            The Least-Squares Error
To find the least-squares error of the least-squares solution of Ax = b, we compute

\lVert b - A\hat{x} \rVert

Alternative Calculations to Least-Squares Solutions
Let A be an m \times n matrix with columns a_1, \cdots, a_n. If the columns \{a_1, \cdots, a_n\} of A form an orthogonal set, then we can find the least-squares solution using the equation
A\hat{x} = \hat{b}

where \hat{b} = \text{proj}_{Col(A)}\, b.

Let A be an m \times n matrix with linearly independent columns, and let A = QR be the QR factorization of A. Then for each b in \Bbb{R}^m, the equation Ax = b has a unique least-squares solution, given by
\hat{x} = R^{-1} Q^T b
which in practice is computed by solving
R\hat{x} = Q^T b