Master the Equation of the Best Fit Line: Key to Data Analysis

Equation of the best fit line

This lesson continues our previous two lessons, which covered bivariate data, scatter plots and correlation, and then regression analysis. We will build on the concepts from those lessons to study the definition and characteristics of the line of best fit.

What is a line of best fit

As we saw in our previous lesson, a line of best fit (or best fit line) is simply a straight line that represents the data points in a scatter plot as closely as possible. This doesn't mean that the line will touch every point in the plot; a line of best fit may touch a few, all, or none of the data points. For that reason, the line of best fit is also called the trend line: instead of representing each point of the data set exactly, it presents the overall trend that the data points follow and shows how the variables are correlated with each other.

Figure 1: Examples of lines of best fit in bivariate data scatter plots

How to find line of best fit

Since the line of best fit is simply a straight line, it can be mathematically defined through the equation for a straight line:

y=mx+b \; \rightarrow \; y=ax+b

Equation 1: Equation for the best fit line (equation for a straight line)

Where we know that:

$y$ = dependent variable

$x$ = independent variable

$m = a$ = slope of the line (the name can be different depending on the textbook you are using)

$b$ = $y$-intercept (point in the graph where the line crosses the $y$ axis)

Notice the slope can have either one of two names, $m$ or $a$; the name differs depending on which textbook you are using in your class or to study. For this lesson we will keep the name $a$; just remember that we are talking about the slope of the best fit line.

When we look at a linear regression graph in which a bivariate data set has been plotted, we always have the values of the variables $x_i$ and $y_i$ (since these are the values given in the bivariate data set), so we will usually have to solve for the slope and the y-intercept of the line of best fit.

In other words, when a bivariate data set is given, $x_i$ and $y_i$ are provided, so $a$ and $b$ have to be calculated. This is not always the case: the line of best fit equation can also be used to solve for the values of the variables themselves when the slope and the y-intercept are given. But if the data table is provided, we will be solving for $a$ and $b$.

Figure 2: Example of Bivariate set of data in a scatter plot

The formulas for the slope and the y-intercept are as follows:

\large a = \frac{n \sum_{i=1}^{n}x_iy_i \;- \; \sum_{i=1}^{n}x_i \sum_{i=1}^{n}y_i}{n \sum_{i=1}^{n}x_i^2 \; - \; (\sum_{i=1}^{n}x_i)^2}

and

b = \overline{y} - a\overline{x}

Equation 2: formulas for slope and y-intercept

Where:

$n$ = number of data points

$y_i$ = dependent variable data value

$x_i$ = independent variable data value

$a$ = slope of the best fit line

$b$ = $y$-intercept

$\overline{x}$ = mean for the sample of $x$ values

$\overline{y}$ = mean for the sample of $y$ values

$\sum_{i=1}^{n}$ is the symbol for summation, therefore:

\large \sum_{i=1}^{n}x_i = x_1 \;+\; x_2 \; + \, ... \, + \, x_n
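As a small illustration of the summation symbol, the sketch below computes the sums that appear in equation 2 for a made-up set of three data points. The values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical data points, chosen only for illustration.
xs = [1, 2, 3]
ys = [2, 4, 5]

n = len(xs)                                   # number of data points
sum_x = sum(xs)                               # x_1 + x_2 + ... + x_n
sum_y = sum(ys)                               # y_1 + y_2 + ... + y_n
sum_xy = sum(x * y for x, y in zip(xs, ys))   # x_1*y_1 + ... + x_n*y_n
sum_x2 = sum(x * x for x in xs)               # x_1^2 + ... + x_n^2

print(n, sum_x, sum_y, sum_xy, sum_x2)  # prints: 3 6 11 25 14
```

Note that the sum of the squares (14) is not the same as the square of the sum (6 squared, which is 36); equation 2 uses both, so it is worth keeping them apart.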

In equation 2, notice that $b$ is defined in terms of $a$; therefore, you will always solve for $a$ first. $b$ is also defined in terms of the means $\overline{x}$ and $\overline{y}$, which takes us to an important realization: the data points shown in a regression analysis scatter plot count as a sample, not as a whole population. This makes sense, since a regression analysis scatter plot is usually used to estimate points that have not been graphed but can be inferred from the relationship shown by the given data points.
Therefore, when obtaining the mean of the values of each variable used in the analysis, we are taking the mean of sample data points, and so we use the notation for the mean of a sample: $\overline{x}$.
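One consequence of the formula for $b$ is worth spelling out: the best fit line always passes through the point of means $(\overline{x}, \overline{y})$. Substituting $x = \overline{x}$ into equation 1 and using $b = \overline{y} - a\overline{x}$ gives:

```latex
y = a\overline{x} + b = a\overline{x} + (\overline{y} - a\overline{x}) = \overline{y}
```

This gives a quick check on any calculation: if the line you obtained does not pass through $(\overline{x}, \overline{y})$, there is an arithmetic slip somewhere.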

After solving for $a$ and $b$, we can substitute these values into the best fit line equation shown in equation 1 and plot the best fit line in the scatter plot.
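The procedure can be sketched in a few lines of Python. This is only a minimal illustration of equation 2, not a definitive implementation: the function name `best_fit_line` and the sample data are assumptions made for this example, and the guard against a zero denominator is an addition not discussed in the lesson.

```python
def best_fit_line(xs, ys):
    """Return (a, b) for y = a*x + b using the formulas in equation 2."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # Added guard: happens when every x value is the same.
        raise ValueError("all x values are equal; the slope is undefined")

    a = (n * sum_xy - sum_x * sum_y) / denominator   # slope first
    b = sum_y / n - a * (sum_x / n)                  # then the y-intercept
    return a, b


# Hypothetical data lying exactly on y = 2x + 1, used as a sanity check.
a, b = best_fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # prints: 2.0 1.0
```

Because the sample points lie exactly on a line, the function recovers that line's slope and y-intercept, which is a convenient way to confirm the formulas were typed correctly.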

How to draw a line of best fit

Let us use the method described above to obtain the best fit line of the bivariate data scatter plot shown in figure 2. We start by producing its corresponding data table so we know the values of $x_i$ and $y_i$:

Figure 3: Data table for bivariate data set in figure 2

So let us solve for a by making the calculations in pieces:

\large a = \frac{n \sum_{i=1}^{n}x_iy_i \;- \; \sum_{i=1}^{n}x_i \sum_{i=1}^{n}y_i}{n \sum_{i=1}^{n}x_i^2 \; - \; (\sum_{i=1}^{n}x_i)^2}

where:

n =13

\large \sum_{i=1}^{13}x_iy_i = x_1y_1 \;+\; x_2y_2 \; + \, ... \, + \, x_{13}y_{13}

= 9+14+24+20+40+30+28+32+9+50+33+12+26=327

\large \sum_{i=1}^{13}x_i= x_1 \;+\; x_2 \; + \, ... \, + \, x_{13}

= 1+2+3+4+5+6+7+8+9+10+11+12+13=91

\large \sum_{i=1}^{13}y_i= y_1 \;+\; y_2 \; + \, ... \, + \, y_{13}

= 9+7+8+5+8+5+4+4+1+5+3+1+2=62

\large \sum_{i=1}^{13}x_i^2

= 1+4+9+16+25+36+49+64+81+100+121+144+169=819

\large (\sum_{i=1}^{13}x_i)^2

=(91)^{2}=8,281

therefore:

\large a = \frac{13(327)-(91)(62)}{13(819)-8,281} = \frac{4,251-5,642}{10,647-8,281} = \frac{-1,391}{2,366} = - 0.59

Equation 3: Solving for the slope of the best fit line
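The same arithmetic can be checked with a short script. This is a sketch only: the lists `xs` and `ys` are assumed to hold the values written out in the sums above, and the variable names are chosen for this example.

```python
# Data values taken from the sums written out above.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
ys = [9, 7, 8, 5, 8, 5, 4, 4, 1, 5, 3, 1, 2]

n = len(xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)

numerator = n * sum_xy - sum_x * sum_y
denominator = n * sum_x2 - sum_x ** 2
a = numerator / denominator

print(n, sum_xy, sum_x, sum_y, sum_x2)   # prints: 13 327 91 62 819
print(numerator, denominator)            # prints: -1391 2366
print(round(a, 2))                       # prints: -0.59
```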

Now we solve for b:

b = \overline{y}- a\overline{x}

where:

\overline{x}= \frac{1+2+3+4+5+6+7+8+9+10+11+12+13}{13} = \frac{91}{13} =7

\overline{y}= \frac{9+7+8+5+8+5+4+4+1+5+3+1+2}{13} = \frac{62}{13} =4.77

therefore:

\; b =4.77 - (-0.59)(7) = 4.77 + 4.13 = 8.9

Equation 4: Solving for the y-intercept
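As a check on this step, the sketch below computes $b$ twice: once with the rounded values used above, and once with the unrounded slope. The variable names are assumptions made for this example. It suggests that rounding $a$ and $\overline{y}$ first moves the intercept slightly, from about 8.88 to 8.9.

```python
sum_x, sum_y, n = 91, 62, 13       # sums from the worked example above
a_exact = -1391 / 2366             # slope before rounding
a_rounded = -0.59                  # slope as rounded in equation 3

x_mean = sum_x / n                 # 7.0
y_mean = sum_y / n                 # about 4.77

# Intercept from the rounded values, as in equation 4.
b_from_rounded = round(y_mean, 2) - a_rounded * x_mean
# Intercept when nothing is rounded until the end.
b_from_exact = y_mean - a_exact * x_mean

print(round(b_from_rounded, 2))    # prints: 8.9
print(round(b_from_exact, 2))      # prints: 8.88
```

The difference is small enough not to matter when drawing the line by hand, but it is a reason to keep extra decimal places in intermediate steps when more precision is needed.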

And so, we can obtain the points for our trend line using the line of best fit formula from equation 1:

y = ax +b

when $x = 0$ → $y = 0 + 8.9 = 8.9$

when $x = 13$ → $y = (-0.59)(13) + 8.9 = -7.67 + 8.9 = 1.23$

Equation 5: Obtaining two points for the line of best fit
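Evaluating the line at two values of $x$ can be sketched as follows. The helper name `predict` is an assumption made for this example; the slope and y-intercept are the rounded values from equations 3 and 4.

```python
a, b = -0.59, 8.9    # rounded slope and y-intercept from equations 3 and 4


def predict(x):
    """Evaluate the best fit line y = a*x + b at x."""
    return a * x + b


# The two end points used to draw the line.
points = [(x, round(predict(x), 2)) for x in (0, 13)]
print(points)  # prints: [(0, 8.9), (13, 1.23)]
```

Any two distinct values of $x$ would do, since two points determine a straight line; choosing values near the ends of the data range simply makes the line easier to draw accurately.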

And now we can graph the two points found above: (0, 8.9) and (13, 1.23); we connect them with a straight line and we find the line of best fit!

Figure 4: Plotting the best fit line

In the scatter plot in figure 4, the points (0, 8.9) and (13, 1.23) are shown in green, and the best fit line is shown in blue.

Let us work through another example so you can get more practice:

Example 1

Given the following bivariate data, what is the line of best fit?
Use the equation for the line of best fit and plot it in the diagram provided.

Figure 5: Data table for bivariate data set

Figure 6: Bivariate set of data in a scatter plot

We start by doing the calculation for the slope of the line of best fit:

\large a = \frac{n \sum_{i=1}^{n}x_iy_i \;- \; \sum_{i=1}^{n}x_i \sum_{i=1}^{n}y_i}{n \sum_{i=1}^{n}x_i^2 \; - \; (\sum_{i=1}^{n}x_i)^2}

where:

n = 4

\large \sum_{i=1}^{4}x_iy_i = x_1y_1 \;+\; x_2y_2 \; + \, x_3y_3 \, + \, x_{4}y_{4}

=(1)(2)+(2)(2)+(3)(4)+(4)(5)=38

\large \sum_{i=1}^{4}x_i = x_1 \;+\; x_2 \; + \, x_3 \, + \, x_{4}

=1+2+3+4=10

\large \sum_{i=1}^{4}y_i = y_1 \;+\; y_2 \; + \, y_3 \, + \, y_{4}

=2+2+4+5=13

\large \sum_{i=1}^{4}x_i^2

= 1+4+9+16=30

\large (\sum_{i=1}^{4}x_i)^2

=(10)^{2}=100

therefore:

\large a = \frac{4(38)-(10)(13)}{4(30)-100} = \frac{152-130}{120-100} = \frac{22}{20} = 1.1

Equation 6: Solving for the slope of the best fit line

Now we solve for b:

b = \overline{y}- a\overline{x}

where:

\overline{x}= \frac{1+2+3+4}{4} = \frac{10}{4} =2.5

\overline{y}= \frac{2+2+4+5}{4} = \frac{13}{4} =3.25

therefore:

\; b =3.25 - (1.1)(2.5) = 3.25 - 2.75 = 0.5

Equation 7: Solving for the y-intercept

And so, we can obtain the points for our trend line using the line of best fit formula from equation 1:

y = ax +b

when $x = 0$ → $y = 0 + 0.5 = 0.5$

when $x = 4$ → $y = (1.1)(4) + 0.5 = 4.4 + 0.5 = 4.9$

Equation 8: Obtaining two points for the line of best fit
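The whole of Example 1 can be reproduced in a few lines. This is a sketch under the assumption that the data points are the ones written out in the sums above, namely (1, 2), (2, 2), (3, 4) and (4, 5); the variable names are chosen for this example.

```python
xs = [1, 2, 3, 4]
ys = [2, 2, 4, 5]

n = len(xs)                                    # 4 data points
sum_xy = sum(x * y for x, y in zip(xs, ys))    # 38
sum_x, sum_y = sum(xs), sum(ys)                # 10 and 13
sum_x2 = sum(x * x for x in xs)                # 30

a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = sum_y / n - a * (sum_x / n)

print(round(a, 2), round(b, 2))                  # prints: 1.1 0.5
print(round(a * 0 + b, 2), round(a * 4 + b, 2))  # prints: 0.5 4.9
```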

And now we can graph the two points found above: (0, 0.5) and (4, 4.9); we connect them with a straight line and we obtain the line of best fit:

Figure 7: Plotting the best fit line

Now we end this lesson with a few recommendations: this lesson on the equation of the line of best fit provides many more examples that you can work through to keep practicing what you learned today. And for even more practice on your own, this lines of best fit worksheet can be printed out and worked through!

That is it for today's lesson, see you in the next one!

The best fit line has the equation $y=ax+b$, where $a$ and $b$ are given as:

• $a=\frac{n\sum xy-\sum x \sum y}{n\sum x^2-(\sum x)^2}$

• $b=\overline{y}-a\overline{x}$

# of dragons killed	Corresponding level
1	Level 4
2	Level 5
3	Level 6
4	Level 6
5	Level 7
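
The table above can serve as extra practice data. The sketch below assumes that is its purpose, with the number of dragons killed as $x$ and the level as $y$ (read as a plain number); neither the variable names nor this reading of the table come from the lesson itself.

```python
# Values read from the table above; levels are taken as plain numbers.
dragons = [1, 2, 3, 4, 5]
levels = [4, 5, 6, 6, 7]

n = len(dragons)
sum_xy = sum(x * y for x, y in zip(dragons, levels))   # 91
sum_x, sum_y = sum(dragons), sum(levels)               # 15 and 28
sum_x2 = sum(x * x for x in dragons)                   # 55

a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = sum_y / n - a * (sum_x / n)

print(round(a, 2), round(b, 2))  # prints: 0.7 3.5
```

Under these assumptions the best fit line is $y = 0.7x + 3.5$.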

Mastering the Equation of the Best Fit Line
Unlock the power of data analysis by learning to calculate and interpret the line of best fit equation. Improve your statistical skills and make accurate predictions with our step-by-step guide.


