samvara wrote on 2010-06-16, 02:57 pm

Data Analysis #5: Confidence Interval Estimation

The test covered the previous four weeks' worth of material and, weirdly enough, was on exactly what I thought it would be on: I knew a lot of it and didn't know some of it. Results next week! We had a discussion about the next assignment and how to write up our project proposals. It's comforting to know we don't have to commit to exactly what goes in; we can develop it further over time.

Big discussion of multiple regression, plus a workshop on how to use PHStat to calculate regression coefficients (Data > Data Analysis > Correlation > Select data), how to produce scatter diagrams (Add-Ins > PHStat > Regression > Simple linear > Data) and how to make them pretty:
  • always label the axes
  • make sure font is readable
  • can pick most appropriate trendline
  • visual explanation of data is always going to be better
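For anyone without PHStat, here is a minimal Python sketch of what a straight-line trendline boils down to: a least-squares slope and intercept, plus R². The function name fit_line and the data are made up for illustration; this is not what PHStat does internally, just the textbook formulas.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up data lying exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept, r_squared = fit_line(xs, ys)
print(slope, intercept, r_squared)
```

Real data won't sit exactly on the line, so R² will come out below 1; that gap is what the scatter diagram shows visually.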
Discussion of how to eliminate data that correlates too highly to be useful (alas, this describes most of the SPN_fic data). A 'safe' VIF (variance inflation factor) is less than 2, and we're looking for the highest R² but the lowest standard deviation.
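A rough sketch of the VIF check in Python, assuming the simplest case of just two explanatory variables, where the VIF for either one is 1 / (1 - r²) and r is the correlation between them. With more variables, r² would instead come from regressing each variable on all the others. The names and data are invented, and the cut-off of 2 is simply the rule of thumb from class.

```python
def vif_two_predictors(x1, x2):
    """VIF for the two-predictor case: 1 / (1 - r^2)."""
    n = len(x1)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    s11 = sum((a - mean1) ** 2 for a in x1)
    s22 = sum((b - mean2) ** 2 for b in x2)
    s12 = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    r_squared = s12 ** 2 / (s11 * s22)  # squared correlation of x1 and x2
    return 1 / (1 - r_squared)


# Hypothetical predictors that move together fairly closely (r = 0.8)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x1, x2)
print(round(vif, 2), "safe" if vif < 2 else "too correlated")
```

Perfectly correlated variables make r² equal to 1, so the division blows up; that is the extreme version of "too correlated to be useful".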

Off to Chapter 8 Confidence Interval Estimation

We like to be able to say that we have a level of confidence in our results, e.g. "I am 95% confident that the mean is between 40 and 60".
  • X = 50 ± 10, i.e. (40, 60)
Estimation is the use of a sample statistic to estimate a population parameter. Statistics involves two types of estimates:
  • A point estimate is a single value measured from a sample and is used as an estimate of the corresponding population parameter
  • An interval estimate establishes an interval within which it is quite likely that the population parameter lies. The likelihood is expressed by the confidence level (90%, 95%, 99%)
  • takes into consideration variation in sample statistics from sample to sample
  • based on observation from one sample
  • gives information about closeness to unknown population parameters
  • never 100% sure
Confidence interval for mean (std dev known)

e.g. A sample of 100 MBA graduates from UWA had a mean age of 33.5 years. Assuming a population standard deviation of 3 years, construct a 95% CI for the average age of all MBA graduates in Australia.

...lots of maths...

= 33.5 ± 0.59, i.e. (32.91, 34.09)
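The result can be checked with a short Python sketch. I'm assuming the working was the standard z-interval, mean ± z × σ / √n, with the population standard deviation treated as known; the function name z_confidence_interval is my own invention.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for a mean with known population sigma."""
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin


# Numbers from the MBA example: n = 100, mean = 33.5, sigma = 3
lower, upper, margin = z_confidence_interval(33.5, 3, 100)
print(f"33.5 ± {margin:.2f} ({lower:.2f}, {upper:.2f})")
# prints: 33.5 ± 0.59 (32.91, 34.09)
```

Changing confidence to 0.99 widens the interval, which matches the "never 100% sure" point: more confidence costs precision.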

...and then we ran away