Talked a bit about the project - supernatural_fic is so going to be my assignment topic. Lecturer recommends regression analysis... there are NUMBERS EVERYWHERE.
Back to Probability. We can draw up a Sample Space for the sum of two dice, which gives us a pretty good idea of which outcomes are likely.
e.g. Sample Space for Sum of Two Dice
| D1/D2 | 1 | 2 | 3 | 4 | 5 | 6 |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |
Analysis: Pretty! Symmetrical! 7 is the most likely sum.
We can also represent this as a Probability Distribution
| Sum | P(Sum) |
| 2 | 1/36 |
| 3 | 2/36 |
| 4 | 3/36 |
| 5 | 4/36 |
| 6 | 5/36 |
| 7 | 6/36 |
| 8 | 5/36 |
| 9 | 4/36 |
| 10 | 3/36 |
| 11 | 2/36 |
| 12 | 1/36 |
| Total | 36/36 |
Analysis: Pretty! Symmetrical! 7 is the most likely sum.
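Not from the lecture, but the probability distribution can be rebuilt by brute force as a check. This is a minimal Python sketch; the function name `two_dice_distribution` is my own invention, and it assumes both dice are fair.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution(sides=6):
    """Return {sum: probability} for two fair dice (fairness is an assumption)."""
    # Every (D1, D2) pair is one equally likely cell of the sample space.
    outcomes = list(product(range(1, sides + 1), repeat=2))
    counts = Counter(d1 + d2 for d1, d2 in outcomes)
    # Fraction reduces automatically, so 6/36 is stored as 1/6.
    return {total: Fraction(count, len(outcomes))
            for total, count in sorted(counts.items())}


for total, prob in two_dice_distribution().items():
    print(total, prob)
```

The fractions print in lowest terms, so they look different from the table but are the same values.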
We could also do a Histogram, which I shall not draw, but Wikipedia talks about histograms in detail.
Summary Measures:
- Expected value (the mean), which is the weighted average of the outcomes, using their probabilities as the weights
- Variance, which is the weighted average of the squared deviations about the mean; the Standard Deviation is its square root
- Covariance, which measures how X and Y vary together (the weighted average of the products of their deviations from their own means)
- the mean represents the expected return on an investment
- the standard deviation is a measure of the associated risk
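To make the summary measures concrete, here is a small Python sketch applied to the two-dice sum. The helper names (`expected_value`, `variance`, `covariance`) are mine, not the lecturer's, and the dice are assumed fair and independent.

```python
from fractions import Fraction
from math import sqrt

# Distribution of the sum of two fair dice: P(s) = (6 - |s - 7|) / 36
dist = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}


def expected_value(dist):
    """Weighted average of the outcomes, weights = probabilities."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Weighted average of squared deviations about the mean."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def covariance(joint):
    """joint maps (x, y) pairs to probabilities."""
    mx = sum(x * p for (x, _), p in joint.items())
    my = sum(y * p for (_, y), p in joint.items())
    return sum((x - mx) * (y - my) * p for (x, y), p in joint.items())


mean = expected_value(dist)   # 7
var = variance(dist)          # 35/6
sd = sqrt(var)                # about 2.415

# Joint distribution of the two individual dice (assumed independent).
joint = {(d1, d2): Fraction(1, 36) for d1 in range(1, 7) for d2 in range(1, 7)}
cov = covariance(joint)       # 0, because the dice do not influence each other
```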
Binomial Probability Distribution; characteristics of
- 'n' identical trials e.g. 15 tosses of a coin, 10 light bulbs taken from a warehouse
- two mutually exclusive outcomes on each trial e.g. head or tail in each toss of a coin
- trials are independent e.g. what happens previously does not affect next outcome
- constant probability for each trial e.g. probability of getting a tail is the same each time we toss (assumes 'fair' coin)
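When those four characteristics hold, the probability of exactly k successes in n trials is C(n, k) p^k (1 - p)^(n - k). A minimal Python sketch, using the 15 coin tosses from the list above; `binomial_pmf` is a name I made up, and the fair coin (p = 0.5) is an assumption.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# 15 tosses of a fair coin: chance of exactly 7 heads
p_exactly_7_heads = binomial_pmf(7, 15, 0.5)

# The probabilities over all possible head counts add up to 1
total = sum(binomial_pmf(k, 15, 0.5) for k in range(16))
```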
The Normal Distribution is regarded as the most important theoretical distribution in business statistics. It approximates the observed frequency distributions of many natural and physical measurements such as height, weight, sales, IQ, product lifetimes and the variability of human and machine outputs.
We can find the probability of an event by looking at the area underneath the bell curve, and we have tables that let us look those areas up. The tables only cover one curve, the standard normal (mean 0, standard deviation 1), so we use z-scores, z = (x - mean) / standard deviation, to standardise the data first - yes, I got scaled in high school too.
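The table lookup can be imitated in Python with the standard library's `statistics.NormalDist`. This is only a sketch: the IQ figures (mean 100, standard deviation 15) are numbers I picked for illustration, not values from the lecture.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Standardise x: how many standard deviations it sits from the mean."""
    return (x - mean) / sd


standard_normal = NormalDist()  # mean 0, sd 1: the curve the tables describe

# Assumed example: scores with mean 100 and sd 15, looking at a score of 130
z = z_score(130, 100, 15)             # 2.0
p_below = standard_normal.cdf(z)      # area under the curve to the left of z
p_above = 1 - p_below                 # area to the right
```

`cdf` gives the area to the left of z, which is what most printed tables report; check which area your table uses before comparing.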
Discussion of how to use tables to look up probabilities.
Pause to work through examples - have very messy scribbled notes for this :)
Assessing Normality:
- Construct charts
  - For small datasets, do the stem-and-leaf display and box-and-whisker display look symmetric?
  - For large datasets, does the histogram or polygon appear bell shaped? (FYI supernatural_fic doesn't)
- Compute descriptive summary measures
  - Do the mean, median and mode have similar values?
  - Is the interquartile range approximately 1.35 standard deviations?
  - Is the range approximately 6 standard deviations?
- Observe the distribution of the dataset
  - Do approximately 2/3 of the observations lie within 1 standard deviation of the mean?
  - Do approximately 4/5 of the observations lie within 1.28 standard deviations of the mean?
  - Do approximately 19/20 of the observations lie within 2 standard deviations of the mean?
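The numeric checks can be scripted. For a normal dataset, roughly 2/3 of values fall within 1 standard deviation of the mean, 4/5 within 1.28 and 19/20 within 2, and the interquartile range is about 1.35 standard deviations. Below is a rough Python sketch; `normality_checks` is a hypothetical helper of my own, and the demo data is artificially generated to be normal rather than taken from any real dataset.

```python
from statistics import NormalDist, mean, median, quantiles, stdev


def normality_checks(data):
    """Return the summary ratios used to eyeball whether data looks normal."""
    mu = mean(data)
    sd = stdev(data)
    q1, _, q3 = quantiles(data, n=4)  # quartile cut points

    def share_within(k):
        # Proportion of observations within k standard deviations of the mean
        return sum(abs(x - mu) <= k * sd for x in data) / len(data)

    return {
        "mean": mu,
        "median": median(data),
        "iqr_over_sd": (q3 - q1) / sd,                    # about 1.35 if normal
        "range_over_sd": (max(data) - min(data)) / sd,    # about 6 if normal
        "within_1_sd": share_within(1),                   # about 2/3
        "within_1.28_sd": share_within(1.28),             # about 4/5
        "within_2_sd": share_within(2),                   # about 19/20
    }


# Made-up demo data: 1000 evenly spaced quantiles of a standard normal curve
demo = [NormalDist().inv_cdf((i + 0.5) / 1000) for i in range(1000)]
report = normality_checks(demo)
for name, value in report.items():
    print(name, round(value, 3))
```

The mode is left out because it is not very meaningful for continuous data with no repeated values.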
Off to Chapter 7 Sampling and Sample Distributions p 7-9.
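As far as I can make out, if you take a sample and then take the mean of that sample, the result can vary a bit depending on which sample you happened to draw. If you take many samples and average their sample means, you get a better estimate of the population mean. The bigger the sample size, the less the sample means vary: their standard deviation (the standard error) is the population standard deviation divided by the square root of n.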
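A quick simulation makes this easier to believe. This is a minimal Python sketch under my own assumptions: the "population" is a single fair die, samples are drawn with replacement, and `sample_means` is a name I invented.

```python
import random
from statistics import mean, pstdev


def sample_means(population, n, trials, rng):
    """Draw `trials` samples of size n (with replacement) and return their means."""
    return [mean(rng.choices(population, k=n)) for _ in range(trials)]


rng = random.Random(42)           # fixed seed so reruns give the same numbers
population = [1, 2, 3, 4, 5, 6]   # one fair die, population mean 3.5

for n in (2, 10, 50):
    means = sample_means(population, n, trials=2000, rng=rng)
    # Columns: sample size, mean of the sample means, spread of the sample means
    print(n, round(mean(means), 2), round(pstdev(means), 2))
```

The middle column should hover near 3.5 for every n, while the last column shrinks as n grows.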
How large is large enough?
- for most distributions, n greater than or equal to 30
- for 'fairly symmetric' distributions, n is greater than or equal to 15
- for a normally distributed population, the sampling distribution of the mean is always normally distributed, whatever the value of n (n greater than or equal to 1)
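- the sample proportion applies to a Categorical variable e.g. gender, voted in last election, pregnant
- ps = X / n, where X is the number of sampled items with the characteristic and n is the sample size; if there are only two outcomes, X has a binomial distribution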