Measures of relative standing - z-score, quartiles, percentiles

Introduction
Lessons
  1. Z-Score
  2. Quartiles
  3. InterQuartile Range
  4. Percentiles
Examples
Lessons
  1. Using Z-score to Compare the Variation in Different Populations
    Charlie got a mark of 85 on a math test which had a mean of 75 and a standard deviation of 5. Daisy got a mark of 75 on an English test which had a mean of 69 and a standard deviation of 2. Relative to their respective mean and standard deviation, who got the better grade?
    1. Determining the Quartiles
      Find the quartiles for each data set:
      1. {9, 3, 7, 5, 2, 8, 12}
      2. {2, 3, 5, 7, 8, 9, 12, 15}
      3. {2, 3, 5, 7, 8, 9, 12, 15, 35}
    2. Interquartile Range & Box-and-Whisker Plot
      For the data set: {8, 2, 20, 4, 9, 5, 6, 12, 10, 1}
      1. Determine the quartiles.
      2. Find the interquartile range.
      3. Construct a box-and-whisker plot.
      4. Which data points, if any, are outliers?
    3. Determining the Percentile
      Sidney is taking a biology course in university. She got a mark of 78% and the list of all marks from her class (including her mark) is given by {56, 83, 74, 67, 47, 54, 82, 78, 86, 90}.
      1. What percentile did she score in?
      2. Sidney's friend Billy knows he scored in the 70th percentile. What was his mark?
    Topic Notes

    Measures of relative standing


    A measure of relative standing describes where a specific value sits in relation to the rest of the values in its data set, or lets you compare values that come from different data sets. It does this by re-scaling a data set and its distribution so that values can be compared meaningfully, either within the set itself or against other data sets scaled the same way. Because they focus on the relative position of a data value within the data set, these measures are also called measures of location or measures of position.
    The three basic measures of relative standing are the z-score (also called the standard score), the percentiles (and their percentile rank) and quartiles.

    Figure 1: Representation of the measures of relative standing on a normal distribution


    What is a z score?


    The z score tells us how far away a data point is from the mean of its set, in units of the standard deviation of the set. In other words, once you have calculated the mean and the standard deviation of a data set, you can calculate how many standard deviations separate each particular data point from the mean. That number is the z score of the value.

    What does z score mean?

    The z score definition above may seem simple, but the process is quite remarkable, so let us expand on it. The z score, also called the standard score or the standardized score, is used to re-scale a data set and its distribution so we can meaningfully compare it with others. What does that mean? Imagine you are collecting statistical data about the people in the city of Richmond by checking records kept by official government agencies or specialized companies. After some research you obtain data on the ages of the city population, the rate of car ownership, how many people have a professional degree and how many own a house. For each of these data sets (which we assume are normally distributed) you can create frequency distributions and histograms, and once you do, you run into an issue: no matter how well drawn your distribution graphs are, you cannot accurately compare them, because each data set comes from a different sample of the population and has its own scale, so the graphs do not line up with one another. Here is where normalization and the z-score come into play! By calculating the z score of the values in each data set, you produce re-scaled distributions that can literally be overlaid on each other for comparison.

    The process can get quite complicated, so let us start with the basic calculation of the z score. Once we have learned more about the normal distribution, we can come back to using the z score for more difficult comparisons between unrelated data sets.

    How to calculate a z score

    To calculate the z score of a data value in a population, we use the following formula:

    $$Z_{x} = \frac{x - \mu}{\sigma}$$
    Equation 1: Z score of a population

    Where:
    $Z_x$ = z score
    $x$ = the data value
    $\mu$ = mean of the data set
    $\sigma$ = standard deviation of the data set (which is a population in this case)

    Equation 1 is also called the standard score formula and it represents the mathematical z-score definition.
    Accordingly, the z score equation for a sample is defined as:

    $$Z_{x} = \frac{x - \bar{x}}{s}$$
    Equation 2: Z score of a sample

    Where:
    $Z_x$ = z score
    $x$ = the data value
    $\bar{x}$ = mean of the data set
    $s$ = standard deviation of the data set (which is a sample in this case)
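
As a rough illustration of Equations 1 and 2, the sketch below computes the z score of every value in a small list with Python's standard `statistics` module. The function name `z_scores`, its `population` flag and the `marks` list are made up for this illustration and are not part of the lesson.

```python
from statistics import mean, pstdev, stdev

def z_scores(values, population=True):
    """Return the z score of every value in `values`.

    population=True divides by the population standard deviation (Equation 1);
    population=False divides by the sample standard deviation (Equation 2).
    """
    center = mean(values)
    spread = pstdev(values) if population else stdev(values)
    return [(x - center) / spread for x in values]

# Made-up data with mean 5 and population standard deviation 2.
marks = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(marks))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

With `population=False` the same values get slightly smaller z scores in absolute value, because the sample standard deviation divides by n - 1 and is therefore a little larger.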

    Let us look at the usage of the z score in the next example:

    Example 1

    Using the z-score to compare the variation in different populations:
    Charlie got a mark of 85 on a math test which had a mean of 75 and a standard deviation of 5. Daisy got a mark of 75 on an English test which had a mean of 69 and a standard deviation of 2. Relative to their respective mean and standard deviation, who got the better grade?

    We need to calculate the z-score for the grades of Charlie and Daisy and see who did better relative to their class. We have the following information:

    $Z_{Ch}$ = z score for Charlie
    $x$ = 85
    $\mu$ = 75
    $\sigma$ = 5

    $Z_D$ = z score for Daisy
    $x$ = 75
    $\mu$ = 69
    $\sigma$ = 2

    Therefore, using the z score formula from equation 1, we calculate the z scores for each student and find:

    $$Z_{Ch} = \frac{x - \mu}{\sigma} = \frac{85 - 75}{5} = \frac{10}{5} = 2$$

    $$Z_{D} = \frac{x - \mu}{\sigma} = \frac{75 - 69}{2} = \frac{6}{2} = 3$$
    Equation 3: Z scores for Charlie and Daisy

    So after we have gotten the corresponding z scores, how do we know which of their grades is better? Well, the results from equation 3 tell us that Charlie got a test mark 2 standard deviations higher than the mean of the class, while Daisy got a mark that is 3 standard deviations higher than the mean in her class. Therefore, proportionally speaking, Daisy did better within her class in comparison to Charlie.

    NOTICE: Daisy did better WITHIN her class than Charlie did WITHIN his class. The z score tells us how each student did relative to their own class (meaning that Daisy was probably among the people with the highest marks for that test in her class). This does not mean that Charlie's grade is worse than Daisy's in absolute terms: Charlie's mark of 85 is still higher than Daisy's 75. However, the marks in Charlie's class were more spread out (a standard deviation of 5 against 2), so his mark stands out less from his class mean than Daisy's does from hers.
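
The arithmetic of Example 1 can be checked with a few lines of Python. This is only a sketch: the helper name `z_score` and the variable names are chosen for this illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

charlie = z_score(85, mean=75, sd=5)
daisy = z_score(75, mean=69, sd=2)

print(charlie, daisy)    # 2.0 3.0
print(daisy > charlie)   # True: Daisy did better relative to her class
```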

    What is a percentile?


    Now let us talk about another measure of relative standing, the percentile. Percentiles indicate the percentage of data outcomes in a set which fall under a certain value.

    How do percentiles work?

    Percentiles divide the whole data set into a hundred equal parts. On a distribution graph, this produces 99 division marks, and each of them is what we call a percentile. The percentile mark at a specific data value tells us the percentage of data found below (or up to) that value. Because each part holds the same share of the data rather than the same width of values, percentiles do not necessarily lie equally spaced on a distribution (look at the bottom of figure 1 to see for yourself).

    How to calculate percentiles

    To calculate the percentile of a certain value $X$ from the data set, we use the following equation:

    $$\text{Percentile of } X = \frac{\text{number of data points less than } X}{\text{total number of data points}} \times 100$$
    Equation 4: Percentiles formula

    Let us look at an example so you see the process of finding percentiles in action:

    Example 2

    Sidney is taking a biology course in university. She got a mark of 78% and the list of all marks from her class (including her mark) is given by {56, 83, 74, 67, 47, 54, 82, 78, 86, 90}.
    1. What percentile did she score in?
    2. Sidney's friend Billy knows he scored in the 70th percentile. What was his mark?

    First we order the scores from lowest to highest: {47, 54, 56, 67, 74, 78, 82, 83, 86, 90}. Sidney's score, 78, is the sixth value, so five scores are lower than hers. Now, solving for the percentile Sidney scored in, we use the percentile formula shown in equation 4:

    $$\text{Percentile for Sidney's score} = \frac{5}{10} \times 100 = 50$$
    Equation 5: Sidney's percentile

    So Sidney scored in the 50th percentile: 50% of the marks in her class are below hers.
    Now, to answer the second question, let us find Billy's mark if he is in the 70th percentile. Using the percentile equation (equation 4), we solve for the number of data points less than $X$, and then check which score in the ordered set meets this condition:

    $$\frac{\text{Percentile of } X \times \text{total number of data points}}{100} = \text{number of data points less than } X$$

    $$\frac{70 \times 10}{100} = 7$$
    Equation 6: Finding Billy's score

    Therefore, there are 7 data values in the set below Billy's score. His score is the eighth value in the ordered set, which means Billy got 83% in his Biology course.
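
Both parts of Example 2 can be reproduced with a short Python sketch of equation 4 and its rearranged form. The function names `percentile_of` and `value_at_percentile` are hypothetical, and the second function assumes that the percentile times the number of data points divided by 100 comes out as a whole number smaller than the size of the set, as it does here.

```python
def percentile_of(x, data):
    """Equation 4: percentage of data points strictly less than x."""
    below = sum(1 for value in data if value < x)
    return below / len(data) * 100

def value_at_percentile(p, data):
    """Rearranged equation 4: the value that has p% of the data below it.

    Assumes p * len(data) / 100 is a whole number smaller than len(data).
    """
    ordered = sorted(data)
    below = round(p * len(ordered) / 100)  # number of data points below the value
    return ordered[below]

marks = [56, 83, 74, 67, 47, 54, 82, 78, 86, 90]
print(percentile_of(78, marks))        # 50.0 (Sidney)
print(value_at_percentile(70, marks))  # 83   (Billy)
```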

    What are quartiles?


    Just as the name indicates, quartiles divide the data distribution into four parts: each quartile is the specific point marking the division between the first quarter and the second, the second quarter and the third, or the third quarter and the fourth. In simple words, quartiles are values that divide a data set into quarters after the data set has been ordered. The three quartiles are named $Q_1$, $Q_2$ and $Q_3$.

    Where:
    $Q_1$ = splits off the lowest 25% of the sorted data
    $Q_2$ = median = splits off the lowest 50% of the sorted data
    $Q_3$ = splits off the lowest 75% of the sorted data

    The middle 50% of the data lies in the interval between the first and third quartiles. The width of that interval is called the interquartile range, and it is obtained by subtracting the first quartile from the third quartile.
    Do not confuse a quartile with a quarter: each quarter is a whole portion of the data containing 25% of it, while a quartile is the point that marks the division between one quarter and the next.

    How to calculate quartiles


    Let us explain the method to calculate quartiles with the next example:

    Example 3

    Find the quartiles for each data set:

    a) {9, 3, 7, 5, 2, 8, 12}
    We first find the median: order the data values from lowest to highest and then find the value at the midpoint.

    {9, 3, 7, 5, 2, 8, 12} = {2, 3, 5, 7, 8, 9, 12}

    The median represents the second (or middle) quartile; in this case $Q_2 = 7$.
    Then we obtain the median of each half of the data values, to the left and to the right of 7:

    {2, 3, 5, 7, 8, 9, 12} = {2, 3, 5}, {7}, {8, 9, 12}

    And we obtain $Q_1 = 3$ and $Q_3 = 9$.

    b) {2, 3, 5, 7, 8, 9, 12, 15}
    This particular data set has its values already ordered from lowest to highest, therefore, we just find the median:

    {2, 3, 5, 7, 8, 9, 12, 15}

    Since the data set has an even amount of values, we obtain the median by averaging the two center values on the set:

    $$Q_2 = \frac{7 + 8}{2} = 7.5$$
    Equation 7: Second quartile

    Therefore $Q_2 = 7.5$.
    And then we find the median for the range of values on each half of the data set:

    {2, 3, 5, 7, 8, 9, 12, 15} = {2, 3, 5, 7},{ 8, 9, 12, 15}

    Calculating first and third quartiles:

    $$Q_1 = \frac{3 + 5}{2} = 4$$

    $$Q_3 = \frac{9 + 12}{2} = 10.5$$
    Equation 8: First and third quartiles

    Therefore $Q_1 = 4$ and $Q_3 = 10.5$.

    c) {2, 3, 5, 7, 8, 9, 12, 15, 35}
    Data set c is already ordered too, and since it has an odd number of values we can easily find its median:

    {2, 3, 5, 7, 8, 9, 12, 15, 35}

    And so $Q_2 = 8$.
    Now we get the median of each half of the data set, on each side of the median we just found:

    {2, 3, 5, 7, 8, 9, 12, 15, 35} = {2, 3, 5, 7}, {8}, {9, 12, 15, 35}

    Calculate the first and third quartiles:

    $$Q_1 = \frac{3 + 5}{2} = 4$$

    $$Q_3 = \frac{12 + 15}{2} = 13.5$$
    Equation 9: First and third quartiles

    Therefore $Q_1 = 4$ and $Q_3 = 13.5$.

    Therefore, the steps for finding quartiles are:
    • Find the median of the data set.
      • If the set has an odd number of values, then the median is the value in the middle and is equal to the second quartile.
      • If the set has an even number of values, then the median is obtained by averaging the two middle values.

    • If the set had an odd number of values, then the first and third quartile will be the median of the values before the middle value, and the median of the values after the middle value respectively.
      • If the values before and after the middle value are an odd number of values, then their middle values will be the first and third quartiles.
      • If the values before and after the middle value are an even number of values, then the median of each side is obtained by averaging the pair of middle values on each side. These will be the first and third quartile.

    • If the set had an even number of values, the second quartile is calculated by averaging the two middle terms (obtaining the median of the set).
      • Then, the set is divided by a midpoint. The whole first half is used to obtain the first quartile, and the whole second half is used to obtain the third quartile.
      • If the values before and after the midpoint are an odd number of values, then their middle values will be the first and third quartiles.
      • If the values before and after the midpoint are an even number of values, then the median of each side is obtained by averaging the pair of middle values on each side. These will be the first and third quartile.

    The process is summarized in the next diagram for each type of data set you might find:

    Figure 2: Process to calculate quartiles
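
The steps listed above can be sketched in Python as follows. The function name `quartiles` is made up for this illustration, and the code follows the median-of-halves method used in this lesson. Statistical libraries often use other conventions (for example, interpolating between values), so their results may differ slightly for the same data.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method from the steps above."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    # With an odd number of values, the middle value belongs to neither half.
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

print(quartiles([9, 3, 7, 5, 2, 8, 12]))           # (3, 7, 9)
print(quartiles([2, 3, 5, 7, 8, 9, 12, 15]))       # (4.0, 7.5, 10.5)
print(quartiles([2, 3, 5, 7, 8, 9, 12, 15, 35]))   # (4.0, 8, 13.5)
```

The three calls reproduce the results of parts a), b) and c) of Example 3.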


    ***

    In summary, the measures of relative standing are those point marks or calculations that allow you to see where a particular data value sits within the complete data set (or its distribution). The z-score tells you how many standard deviations a certain value is away from the mean (either above or below it). The percentiles tell you at which of the 99 points that divide the data set into 100 equal parts your data point is located, which gives you a rank of how much data is above or below it. The quartiles do the same as the percentiles, but divide the data into four equal parts only.

    Now, we recommend that you take a look at the following links so you can continue your independent studies of what you learned today. This lesson covers the most important measure of relative standing, the z-score; this short article explains what percentile rank is and how it differs from percentage; and this page talks about other locations in a distribution, describing not only quartiles but deciles too! We suggest you take a look at them so you can see more example problems.

    This is it for the lesson of today, see you in the next one!
    • $z_x$: z-score, a measure of how many standard deviations a data item $x$ is from the mean.

    population: $z_x = \frac{x - \mu}{\sigma}$

    sample: $z_x = \frac{x - \overline{x}}{s}$

    z-score allows comparison of the variation in different populations/samples.

    • Quartiles: values that divide the data set into quarters.

    $Q_1$ = bottom 25% of data
    $Q_2$ = median = bottom 50% of data
    $Q_3$ = bottom 75% of data

    • InterQuartile Range (IQR): represents the middle 50% of the data set.

    $IQR = Q_3 - Q_1$

    • Percentiles: indicate what percentage of the data falls below a certain value.

    $$\text{Percentile of } X = \frac{\text{number of data points less than } X}{\text{total number of data points}} \times 100$$

    • Outliers: an outlier is a data point which lies an abnormal distance from all other data points.

    Outliers are either

    a) above $Q_3 + 1.5(IQR)$
    or
    b) below $Q_1 - 1.5(IQR)$
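
As an illustration of the interquartile range and the outlier rule, here is a minimal Python sketch applied to the data set from the "Interquartile Range & Box-and-Whisker Plot" example listed at the top of this page. The function names `quartiles` and `outliers` are made up for this sketch, and the quartiles are computed with the median-of-halves method described in the notes.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

def outliers(data):
    """Return the values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

data = [8, 2, 20, 4, 9, 5, 6, 12, 10, 1]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)      # 4 7.0 10
print(q3 - q1)         # 6  (interquartile range)
print(outliers(data))  # [20]
```

For this data set the fences are 4 - 1.5(6) = -5 and 10 + 1.5(6) = 19, so 20 is the only outlier.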