what is meant by a variable |
a characteristic of people or things |
what is the difference between a categorical variable and a quantitative variable |
categorical (qualitative): places an individual into a group or category, e.g. gender, eye color. quantitative: takes numerical values for which arithmetic makes sense, e.g. age, height |
what is meant by exploratory data analysis |
Exploratory data analysis uses graphs and numerical summaries to describe the variables in a data set and the relations among them. |
what is meant by the distribution of the variable |
The distribution of a variable tells us what values the variable takes and how often it takes these values. |
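As a small illustration of "what values and how often", the sketch below tabulates a distribution as counts and proportions. The function name `distribution` and the eye-color data are made up for this example.

```python
from collections import Counter

def distribution(values):
    """Return {value: (count, proportion)} for a list of observations."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, c / n) for v, c in counts.items()}

# Hypothetical sample of a categorical variable
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]
dist = distribution(eye_colors)
```

The same counts are what a bar graph or pie chart would display for a categorical variable.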
what two types of charts/graphs are usually most appropriate for categorical data |
bar graph, pie chart |
when describing the overall pattern of a distribution of a quantitative variable what 3 features should you mention |
shape, center, and spread |
what is a simple way to describe the center of a distribution of a quantitative variable |
for a roughly symmetric distribution (such as a normal distribution) the mean describes the center; for a skewed distribution the median is the better choice |
how do you describe the spread of a distribution of a quantitative variable |
the standard deviation (paired with the mean), or the range and IQR (paired with the median) |
informally define an outlier |
data value that is either much smaller or much larger than the rest of the data |
list four graphs that are used for quantitative data |
dot plot, histogram, stem and leaf plot, box plot |
what information is lost when you choose a histogram over a dot plot or stem plot |
individual data points |
in statistics what are the most common measures of the center |
mean and median |
explain how to calculate the mean |
add the data values then divide by the sample size |
explain how to find the median |
put the data values in order from smallest to largest. if there is an odd number of values, the median is the middle value; if there is an even number, it is the average of the two middle values |
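The two calculations described in these cards can be sketched in a few lines. The function names `mean` and `median` are chosen for this example; they are not from the card set.

```python
def mean(values):
    """Add the data values, then divide by the sample size."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2
```

Sorting first matters only for the median; the mean gives the same result in any order.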
explain why the median is resistant to extreme observations, but the mean is non resistant |
the median is always the middle value, so extreme values do not move it; the mean uses every value, so outliers pull it toward them |
the mean and the median are close together if the distribution is what? |
symmetric (for example, a normal distribution) |
in a skewed distribution which will be farther towards the long tail the mean or the median |
mean |
which measure is most appropriate for a highly skewed distribution the mean or the median? |
median |
what is the definition of the range |
the distance spanned by the entire data set: the maximum minus the minimum |
explain how to calculate the first quartile q1 and the third quartile q3 |
you find the median of the entire data set, then q1 is the median of the lower half and q3 is the median of the upper half |
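A minimal sketch of the median-of-halves method follows. It assumes one common textbook convention: when the number of values is odd, the overall median is left out of both halves. Other conventions (and most software defaults) can give slightly different quartiles. The function names are illustrative.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumption: for an odd number of values the overall median is
    excluded from both the lower and the upper half.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)
```

For example, `quartiles([1, 2, 3, 4, 5, 6, 7])` splits the data into `[1, 2, 3]` and `[5, 6, 7]` around the median 4.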
what is the interquartile range |
the range of the middle 50% of the data |
explain why it might be better to use the IQR instead of the range to describe the spread of the distribution |
the IQR depends only on the middle 50% of the data, so it is not sensitive to outliers; the range depends entirely on the two most extreme values |
what is the IQR based "rule of thumb" |
a potential outlier is a data value more than 1.5 interquartile ranges below the first quartile or above the third quartile, where IQR = Q3 - Q1. that is, any value below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR) |
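The rule of thumb can be sketched as two small helpers. The quartiles are passed in rather than computed, since different quartile conventions exist; the function names and the sample numbers are made up for this example.

```python
def iqr_fences(q1, q3):
    """Return the (lower, upper) cutoffs Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(values, q1, q3):
    """Return the values that fall outside the 1.5*IQR fences."""
    low, high = iqr_fences(q1, q3)
    return [x for x in values if x < low or x > high]

# Hypothetical data with Q1 = 10 and Q3 = 20 supplied by the caller
flagged = potential_outliers([1, 12, 15, 18, 40], q1=10, q3=20)
```

Here the IQR is 10, so the fences are -5 and 35, and only 40 is flagged.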
what is the five number summary |
minimum, 1st quartile, median, 3rd quartile, maximum |
what type of graph gives the picture of the five number summary |
box plot |
the box in a box plot represents what percentage of the data |
50% |
the middle line of a box plot represents |
the median |
can the value of the mean be identified from a box plot |
no |
what does standard deviation measure |
the spread of the data around the mean (it is the square root of the variance) |
can the standard deviation ever be negative |
no |
is the standard deviation resistant or nonresistant to extreme observations |
nonresistant, outliers do affect it |
when is it better to use the five number summary versus the mean and standard deviation |
when the distribution is skewed left or skewed right |
the box of a box plot contains about half the data |
true |
when a distribution is strongly skewed to the right, the median is less than the mean |
true |
a data set always has a mode |
false |
when a distribution is strongly skewed to the right the five number summary is a better description of the center and spread than the mean and standard deviation |
true |
How can we use IQR to determine outliers? |
An observation is an outlier if it is more than 1.5*IQR above the third quartile or below the first quartile. |
Explain why the median is resistant to extreme observations, but the mean is nonresistant. |
The median is resistant because it is only based on the middle one or two observations of the ordered list. The mean is sensitive to the influence of a few extreme observations. Even if there are no outliers a skewed distribution will pull the mean toward the long tail. |
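A quick numerical check of this idea, using Python's standard `statistics` module and made-up data in which one value is replaced by an extreme one:

```python
from statistics import mean, median

typical = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]  # made-up data: largest value replaced by an extreme one

# How far each measure of center moves when the outlier is introduced
mean_shift = mean(with_outlier) - mean(typical)        # 102 - 3
median_shift = median(with_outlier) - median(typical)  # 3 - 3
```

The mean jumps from 3 to 102 while the median stays at 3, which is what "resistant" means here.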
When does standard deviation equal zero? |
The standard deviation = 0 only when there is no spread. This happens only when all observations have the same value. Otherwise s > 0. As the observations become more spread out about their mean, s gets larger. |
What is the relationship between variance and standard deviation? |
The standard deviation s is the square root of the variance s^2. |
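What does standard deviation measure? How do we calculate it? |
The standard deviation is a measure of spread. It measures spread around the mean and should only be used when the mean is chosen as the measure of center. To calculate it, find the mean, add up the squared deviations of the observations from the mean, divide by n - 1 to get the variance s^2, and take the square root. |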
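A minimal sketch of the calculation, assuming the sample version of the formula (dividing by n - 1, which matches the s notation used in these cards). The function names are chosen for this example.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Standard deviation s: the square root of the variance s^2."""
    return math.sqrt(sample_variance(values))
```

For `[1, 2, 3, 4, 5]` the squared deviations sum to 10, so the variance is 10 / 4 = 2.5. For a list of identical values every deviation is zero, so s is 0.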
Is standard deviation resistant or nonresistant to extreme observations? Explain. |
s, like the mean, is not resistant. Strong skewness or a few outliers can make s very large. |
stats ch. 1-ch. 5 sec. 2