CJ 301 Measures of Dispersion/Variability
Think back to the description of measures of central tendency as statistics that describe how the data in a distribution are clustered: around what summary measure are most of the data points clustered?
But when it comes to descriptive statistics and describing the characteristics of a distribution, averages are only half the story. The other half is measures of variability.
In the simplest terms, variability reflects how scores differ from one another. For example, the following set of scores shows some variability:
7, 6, 3, 3, 1
The following set of scores has the same mean (4) and has less variability than the previous set:
3, 4, 4, 5, 4
The next set has no variability at all (the scores do not differ from one another), but it also has the same mean as the other two sets we just showed you.
4, 4, 4, 4, 4
Variability (also called spread or dispersion) can be thought of as a measure of how different scores are from one another. It is even more accurate (and maybe even easier) to think of variability as how different scores are from one particular score. And which score do you think that might be? Instead of comparing each score to every other score in a distribution, the one score that can serve as the comparison is (that's right) the mean. So, variability becomes a measure of how much each score in a group of scores differs from the mean.
Remember what you already know about computing averages: an average (whether it is the mean, the median, or the mode) is a representative score in a set of scores. Now add your new knowledge about variability: it reflects how different scores are from one another. Each is an important descriptive statistic. Together, these two (average and variability) can be used to describe the characteristics of a distribution and show how distributions differ from one another.
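To see "how much each score differs from the mean" concretely, here is a minimal Python sketch (Python is our choice; the handout itself uses no code) that takes the three example sets above, all with a mean of 4, and lists each score's deviation from its set's mean. The function name `deviations_from_mean` is hypothetical, chosen only for illustration.

```python
from statistics import mean

# The three example sets from the text: same mean (4), different spread.
score_sets = {
    "first set": [7, 6, 3, 3, 1],
    "second set": [3, 4, 4, 5, 4],
    "third set": [4, 4, 4, 4, 4],
}

def deviations_from_mean(scores):
    """Return how far each score lies from the mean of its own set."""
    m = mean(scores)
    return [x - m for x in scores]

for name, scores in score_sets.items():
    devs = deviations_from_mean(scores)
    # Largest distance from the mean, ignoring direction (above or below).
    biggest = max(abs(d) for d in devs)
    print(f"{name}: mean = {mean(scores)}, deviations = {devs}, largest distance = {biggest}")
```

The first set's scores sit up to 3 points from the mean, the second set's at most 1 point, and the third set's not at all, which matches the ordering of variability described in the text.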
Measures of dispersion/variability describe how the data in a distribution are scattered or dispersed around, or from, the central point represented by the measure of central tendency.
We will discuss four different measures of dispersion: the range, the mean deviation, the variance, and the standard deviation.
RANGE
The range is a very simple measure of dispersion to calculate and interpret. The range is simply the difference between the highest score and the lowest score in a distribution.
Consider the following distribution that measures the Age of a random sample of eight police officers in a small rural jurisdiction.
Officer   X = Age
1         41
2         20
3         35
4         25
5         23
6         30
7         21
8         32
First, let's calculate the mean as our measure of central tendency by adding the individual ages of each officer and dividing by the number of officers. The calculation is 227/8 = 28.375 years.
In general, the formula for the range is:
R = h - l
Where:
R is the range
h is the highest score in the data set
l is the lowest score in the data set
The range of this distribution is the difference between 41 and 20, or 21 years; the variable Age has a range of 21 years. We can say that this sample has a mean age of 28.375 years with a range of 21 years, from 20 to 41 years. The range was easy to find here using the eyeball technique, but with a sample of 1,000 officers from Phoenix, Arizona, the eyeball technique would be much more difficult. We would instead ask the computer for the range, which it would compute and report quickly and accurately, giving us a quick sense of how the data in the variable Age are dispersed.
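Asking the computer for the range amounts to applying R = h - l with a maximum and a minimum. A minimal Python sketch using the eight officer ages from the table above; `score_range` is a hypothetical helper name, not a standard function.

```python
# Ages of the eight officers in the rural sample (from the table above).
ages = [41, 20, 35, 25, 23, 30, 21, 32]

def score_range(scores):
    """Range R = h - l: the highest score minus the lowest score."""
    return max(scores) - min(scores)

print(f"mean = {sum(ages) / len(ages)} years")  # 227 / 8
print(f"range = {score_range(ages)} years (from {min(ages)} to {max(ages)})")

# The same helper works for any list, e.g. the scores 98, 86, 77, 56, 48.
print(score_range([98, 86, 77, 56, 48]))
```

Because `max` and `min` each scan the list once, the same two lines would handle 1,000 Phoenix officers as easily as eight.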
Take the following set of scores, for example:
98, 86, 77, 56, 48
In this example, 98 - 48 = 50, so the range is 50. In a set of 500 numbers where the largest is 98 and the smallest is 37, the range would be 61.
The range is used almost exclusively to get a very general estimate of how wide or different scores are from one another; that is, the range shows how much spread there is from the lowest to the highest point in a distribution.
STANDARD DEVIATION
Now we get to the most frequently used measure of variability, the standard deviation. Just think about what the term implies: it's a deviation from something (guess what?) that is standard. Actually, the standard deviation (sd) represents the average amount of variability in a set of scores. In practical terms, it is roughly the average distance of the scores from the mean. The larger the standard deviation, the larger the average distance each data point is from the mean of the distribution.
LET'S LEARN HOW TO CALCULATE THE STANDARD DEVIATION:
1. First, you need to determine the mean. The mean of a list of numbers is the sum of those numbers divided by the quantity of items in the list (read: add all the numbers up and divide by how many there are).
2. Then, subtract the mean from every number to get the list of deviations. Create a list of these numbers. It's OK to get negative numbers here.
3. Next, square each deviation (read: multiply it by itself) to get the squared deviations.
4. Add up all of the resulting squares to get their total sum.
5. To get the standard deviation, divide your result by one less than the number of items in the list (this quotient is the variance), then take the square root of the resulting number.
I know this sounds confusing, but work through the examples below, practice each of them, and you will be able to calculate the standard deviation easily:
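The five steps above can be sketched in Python as one function, with each step marked in a comment. This is an illustrative sketch, not an official course tool; the name `sample_standard_deviation` is hypothetical, and it uses the N - 1 (sample) divisor described in step 5.

```python
import math

def sample_standard_deviation(scores):
    """Standard deviation following the five hand-calculation steps (divides by N - 1)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to divide by N - 1")
    # Step 1: the mean.
    mean = sum(scores) / n
    # Step 2: deviations from the mean (negative values are fine).
    deviations = [x - mean for x in scores]
    # Step 3: square each deviation.
    squared = [d * d for d in deviations]
    # Step 4: add up the squared deviations.
    sum_of_squares = sum(squared)
    # Step 5: divide by N - 1 (the variance), then take the square root.
    variance = sum_of_squares / (n - 1)
    return math.sqrt(variance)

print(round(sample_standard_deviation([1, 3, 4, 6, 9, 19]), 2))  # 6.48
```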
Example 1:
Your list of numbers:
X     (X - Mean)         (X - Mean)²
1     (1 - 7) = -6       36
3     (3 - 7) = -4       16
4     (4 - 7) = -3        9
6     (6 - 7) = -1        1
9     (9 - 7) = 2         4
19    (19 - 7) = 12     144
      Σ = 0              Σ = 210
Explanations below:
1. Mean: (1+3+4+6+9+19) / 6 = 42 / 6 = 7
2. List of deviations: -6, -4, -3, -1, 2, 12
3. Squares of deviations: 36, 16, 9, 1, 4, 144
4. Sum of squared deviations: 36+16+9+1+4+144 = 210
5. Standard deviation: s = √(210 / 5) = √42 = 6.48
Explanation: divide by one less than the number of items in the list: 210 / 5 = 42.
Then take the square root of this number: √42 ≈ 6.48.
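A hand calculation like Example 1 can be cross-checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the same N - 1 sample formula as the steps above. A minimal sketch:

```python
import statistics

scores = [1, 3, 4, 6, 9, 19]  # Example 1

# variance() and stdev() divide the sum of squared deviations by N - 1.
print(statistics.mean(scores))             # 42 / 6 = 7
print(statistics.variance(scores))         # value equals 210 / 5 = 42
print(round(statistics.stdev(scores), 2))  # 6.48
```

If your hand result disagrees with `stdev`, recheck the mean first, then the individual squared deviations.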
LET'S GO BACK TO OUR FIRST EXAMPLE (WHEN WE CALCULATED THE RANGE):
Consider the following distribution that measures the Age of a random sample of eight police officers in a small rural jurisdiction.
Officer   X = Age    (X - Mean) = Deviation     (X - Mean)² = Squared Deviation
1         41         (41 - 28.375) = 12.625     159.391
2         20         (20 - 28.375) = -8.375      70.141
3         35         (35 - 28.375) = 6.625       43.891
4         25         (25 - 28.375) = -3.375      11.391
5         23         (23 - 28.375) = -5.375      28.891
6         30         (30 - 28.375) = 1.625        2.641
7         21         (21 - 28.375) = -7.375      54.391
8         32         (32 - 28.375) = 3.625       13.141
          ΣX = 227   Σ = 0.000                  Σ = 383.878
Let's calculate the mean as our measure of central tendency by adding the individual ages of each officer and dividing by the number of officers. The calculation is 227/8 = 28.375 years.
It is important to note from this calculation that the sum of the deviations of each score from the mean is equal to zero. When doing hand calculations of the mean deviation, variance, and standard deviation this is an excellent place to check your math. If the sum of the deviations of each score from the mean does not equal zero (or a number very, very close to zero in situations when you are rounding decimal places) then you have made a mathematical error either in your subtractions or your calculation of the mean.
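The "deviations sum to zero" check can be automated. This is a minimal Python sketch under our own assumptions: `deviations_check` is a hypothetical helper, and the tolerance of 0.01 is an arbitrary allowance for rounding in hand work. It takes the mean you computed by hand, so a wrong mean shows up as a nonzero total.

```python
import math

def deviations_check(scores, hand_mean, tolerance=0.01):
    """Return (passes, total): do deviations from a hand-computed mean sum to about zero?"""
    total = sum(x - hand_mean for x in scores)
    # The tolerance (an assumed value) allows for rounding in hand calculations.
    return math.isclose(total, 0.0, abs_tol=tolerance), total

ages = [41, 20, 35, 25, 23, 30, 21, 32]
print(deviations_check(ages, 28.375))  # correct mean: passes
print(deviations_check(ages, 28.5))    # mistaken mean: fails
```

With the mistaken mean of 28.5, the deviations total -1.0 (that is, 227 - 8 × 28.5), which flags the error before any squaring is done.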
Then, subtract the mean from every number to get the list of deviations. Create a list of these numbers. It's OK to get negative numbers here.
Next, square each deviation (read: multiply it by itself) to get the squared deviations.
Add up all of the resulting squares to get their total sum.
To get the standard deviation, divide your result by one less than the number of items in the list, then take the square root of the resulting number.
It is important for you to note that the last step in the calculation of the variance that I have described requires you to reduce the sample size by 1. This is done because we are using a sample rather than the entire population of officers. If our data were the entire population, we would not subtract 1 from N; we would simply divide by the size of the entire population. In this example, the eight observations of the age of police officers are a sample from the total population of police officers in this jurisdiction. By subtracting 1 from the sample size (N - 1), we adjust the final value of the variance (s²), resulting in a value that is larger than if we were to divide by N. When using sample data, it is better to overstate the measure of dispersion than to understate it.
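To see how much the N - 1 adjustment matters for the officer ages, the sketch below compares Python's sample functions (`statistics.variance`, `statistics.stdev`, which divide by N - 1) with its population functions (`statistics.pvariance`, `statistics.pstdev`, which divide by N). Computed exactly, the sum of squared deviations is 383.875; the table's 383.878 comes from rounding each squared deviation to three decimals.

```python
import statistics

ages = [41, 20, 35, 25, 23, 30, 21, 32]

# Sample formulas divide the sum of squared deviations by N - 1 ...
sample_var = statistics.variance(ages)
sample_sd = statistics.stdev(ages)
# ... population formulas divide by N.
pop_var = statistics.pvariance(ages)
pop_sd = statistics.pstdev(ages)

print(f"sample (N - 1): s^2 = {sample_var:.2f}, s = {sample_sd:.2f}")
print(f"population (N): variance = {pop_var:.2f}, sd = {pop_sd:.2f}")
```

The sample version gives s = 7.41 years against 6.93 years with N, which is the "overstate rather than understate" effect described above.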
STANDARD DEVIATION - s
The standard deviation (s) is very simple to calculate.
In our example with the sample of 8 police officers and their Age, the standard deviation is:
s = √(383.878 / 7) = √54.84 = 7.41 years
The standard deviation has another very important advantage over the other measures of dispersion: we can use it to estimate how many of the values fall within certain areas under the curve that represents those values.
Using the Standard Deviation
I have posted a pdf file under the notes link that outlines the areas falling under/within the normal curve/bell curve (or normal distribution). Please refer to that graphical display for the remainder of this discussion.
Please read:
Our calculated mean is 28.375 years. When using the normal curve/bell curve (or normal distribution) to represent our variable, we would place the mean, 28.375 years, at the center of the distribution, above X bar.
The numbers that correspond to -1s and +1s are 20.965 (mean - standard deviation) and 35.785 (mean + standard deviation), respectively. These numbers are calculated by adding one standard deviation unit (7.41 years) to the mean of 28.375 years and subtracting one standard deviation unit (7.41 years) from the mean of 28.375 years. This represents the range of ages within which we would expect to find approximately 68.26% of the total population of police officers in this jurisdiction. We would expect approximately 34.13% to have an age between 20.965 years and 28.375 years. Similarly, we would expect approximately 34.13% to have an age between 28.375 years and 35.785 years.
The numbers that correspond to -2s and +2s are 13.555 and 43.195, respectively. These numbers are calculated by adding two standard deviation units (7.41 years x 2 = 14.82 years) to the mean of 28.375 years and subtracting two standard deviation units (7.41 years x 2 = 14.82 years) from the mean of 28.375 years. This represents the range of ages within which we would expect to find approximately 95.44% of the total population of police officers in this jurisdiction. We would expect approximately 47.72% to have an age between 13.555 years and 28.375 years. Similarly, we would expect approximately 47.72% to have an age between 28.375 years and 43.195 years.
The numbers that correspond to the -3s and +3s are 6.145 and 50.605 respectively. These numbers are calculated by adding three standard deviation units (7.41 years x 3 = 22.23 years) to the mean of 28.375 years and subtracting three standard deviation units (7.41 years x 3 = 22.23 years) from the mean of 28.375 years. This represents the range of ages between which we would expect to find approximately 99.74% of the total population of police officers in this jurisdiction. We would expect approximately 49.87% to have an age between 6.145 and 28.375 years. Similarly, we would expect 49.87% to have an age between 28.375 years and 50.605 years.
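The ±1s, ±2s, and ±3s boundaries and the normal-curve areas can be reproduced with Python's `statistics.NormalDist`. This sketch assumes, as the discussion above does, that ages are approximately normally distributed, and it uses the rounded s = 7.41 from the text. It prints the area between the mean and each boundary; doubling those rounded half-areas gives the 68.26%, 95.44%, and 99.74% quoted above (unrounded, the totals are about 68.27%, 95.45%, and 99.73%).

```python
from statistics import NormalDist

mean_age = 28.375
s = 7.41  # standard deviation from the text, rounded to two decimals

standard_normal = NormalDist()  # mean 0, standard deviation 1
for k in (1, 2, 3):
    low, high = mean_age - k * s, mean_age + k * s
    # Area between the mean and +k standard deviations (one half of the band).
    half_area = standard_normal.cdf(k) - 0.5
    print(f"±{k}s: {low:.3f} to {high:.3f} years; "
          f"{half_area:.2%} on each side of the mean")
```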
More examples for practice:
STEP 1: Find the Mean for the distribution.
X
9
8
6
4
2
1
ΣX = 30
Mean = ΣX / N = 30 / 6 = 5
STEP 2: Subtract the Mean from each raw score to get the DEVIATION.
X   (X - Mean) = Deviation
9 (9-5) +4
8 (8-5) +3
6 (6-5) +1
4 (4-5) -1
2 (2-5) -3
1 (1-5) -4
STEP 3: Square each deviation before adding the SQUARED DEVIATIONS TOGETHER.
X   (X - Mean) = Deviation   (X - Mean)² = Squared Deviation
9 (9-5) +4 16
8 (8-5) +3 9
6 (6-5) +1 1
4 (4-5) -1 1
2 (2-5) -3 9
1 (1-5) -4 16
Σ(X - Mean)² = 52
STEP 4: Divide by N-1 and get the SQUARE ROOT OF THE RESULT FOR THE STANDARD DEVIATION.
s = √(52 / 5) = √10.4 = 3.22
More examples: The following data represent the number of crime calls at hot spots in a year. Calculate and interpret the standard deviation of crime calls at these hot spots.
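For practice, a short Python sketch can print the same layout as the worked steps (X, deviation, squared deviation) for any list and finish with s using N - 1, so you can compare it line by line with your own table. `worked_table` is a hypothetical helper written for this handout, not a library function.

```python
import math

def worked_table(scores):
    """Print X, deviation, and squared deviation, then return s (divides by N - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    print(f"Mean = {sum(scores)}/{n} = {mean:g}")
    print(f"{'X':>4} {'X - Mean':>10} {'(X - Mean)^2':>14}")
    total_sq = 0.0
    for x in scores:
        dev = x - mean
        total_sq += dev ** 2
        print(f"{x:>4} {dev:>+10g} {dev ** 2:>14g}")
    s = math.sqrt(total_sq / (n - 1))
    print(f"Sum of squares = {total_sq:g}; s = sqrt({total_sq:g}/{n - 1}) = {s:.2f}")
    return s

worked_table([9, 8, 6, 4, 2, 1])
```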
Hot Spot Number   # of Calls (X)   (X - Mean) = Deviation   (X - Mean)² = Squared Deviation
1 2 -19.5 380.25
2 9 -12.5 156.25
3 11 -10.5 110.25
4 13 -8.5 72.25
5 20 -1.5 2.25
6 20 -1.5 2.25
7 20 -1.5 2.25
8 24 2.5 6.25
9 27 5.5 30.25
10 29 7.5 56.25
11 31 9.5 90.25
12 52 30.5 930.25
Σ = 258   Σ = 0   Σ = 1839
Mean = 258/12= 21.5 crime calls
s = √(1839 / 11) = √167.18 = 12.93 crime calls
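One way to put this standard deviation to work when interpreting it is to count how many hot spots fall within one s of the mean. A minimal Python sketch using the call counts from the table; with only 12 hot spots, the count describes this data set and is not a check that the calls follow a normal curve.

```python
import statistics

calls = [2, 9, 11, 13, 20, 20, 20, 24, 27, 29, 31, 52]  # calls per hot spot

mean_calls = statistics.mean(calls)  # 258 / 12
s = statistics.stdev(calls)          # sqrt(1839 / 11), sample formula
low, high = mean_calls - s, mean_calls + s

within = [x for x in calls if low <= x <= high]
print(f"mean = {mean_calls}, s = {s:.2f}")
print(f"within one s ({low:.2f} to {high:.2f}): {len(within)} of {len(calls)} hot spots")
print("outside:", [x for x in calls if not low <= x <= high])
```

Here 10 of the 12 hot spots lie between about 8.57 and 34.43 calls; the two outside are the hot spots with 2 and 52 calls. The 52-call hot spot alone contributes 930.25 of the 1839 total squared deviation, which shows how a single extreme value inflates s.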