Tuesday, June 1st, 2010 11:30 pm
I turned up bright and early to bully support A. into working on one of his assignments (I provide many vital services) which lent itself to doing my readings and chatting to other students about our projects. I think I'm going to do mine using supernatural_fic - am running it by lecturer to see if it's a suitable data set.

Quick review of stuff we had to stop on last week
  • InterQuartile Range
  • Variance (population and sample)
  • Standard Deviation (effect of adding N to each data value and effect of multiplying)
  • Z-Scores = (Score - Mean) / Std. Dev. = measures of how far a given data value is from the mean
  • Identifying Outliers - more than 3 standard deviations from the mean (could be measurement errors) - important for quality control
  • Coefficient of variation - measures relative variation and shows variation relative to mean, used to compare two or more sets of data measured in different units.
e.g. Evaluate risk of trading two different currencies relative to the Australian dollar

Currency                  Rupee   Riyal
Mean                      32.50   3.25
Standard Deviation        2.80    0.35
Coefficient of Variation  10%     12.5%

Analysis: Even though the Rupee's Standard Deviation is higher, its lower Coefficient of Variation (10% versus 12.5%) makes it the less risky currency.
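A minimal sketch of the calculation, assuming the usual definition (standard deviation divided by mean, as a percentage). The function name is made up for illustration. Plugging in the table's means and standard deviations gives roughly 8.6% and 10.8% rather than the 10% and 12.5% in my notes, so a figure may have been copied down imprecisely, but the Rupee still comes out as the less variable currency.

```python
def coefficient_of_variation(mean, std_dev):
    """Return the standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


# (mean, standard deviation) pairs copied from the table above.
currencies = {"Rupee": (32.50, 2.80), "Riyal": (3.25, 0.35)}

for name, (mean, std_dev) in currencies.items():
    cv = coefficient_of_variation(mean, std_dev)
    print(f"{name}: CV = {cv:.1f}%")
```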

Skewness of Data
  • Right skew (positive): house prices, salaries at major corporations, ages of people in nightclubs
  • Zero skew (normal): height, weight, IQ
  • Left skew (negative): incomes of members of Royal Perth Club (wealthy), age of bowling club patrons
Huge discussion of mean, mode, median and box-and-whisker plots ensued; mostly to do with skew.
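A quick sketch of why skew matters for mean versus median, using made-up house prices (not data from the lecture). With a right skew, one very large value drags the mean well above the median.

```python
import statistics

# Hypothetical house prices in $1000s: mostly modest, plus one mansion.
prices = [300, 320, 350, 380, 400, 450, 2500]

mean = statistics.mean(prices)
median = statistics.median(prices)

# The single large value pulls the mean up; the median barely notices it.
print(f"mean = {mean:.1f}, median = {median}")
```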

Examples of dot-scale diagrams provided plus Frequency Distribution.
Example of calculation of Coefficient of Correlation along with assurances we will not have to use it.
Exercise calculating the number of people you need in a room before the chance of at least two sharing a birthday passes 50% (23)
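The birthday exercise can be sketched as below, assuming 365 equally likely birthdays and ignoring leap years (the usual simplification). The function name is just for illustration. It works out the chance that everyone's birthday is different and subtracts that from 1.

```python
def prob_shared_birthday(n, days=365):
    """Chance that at least two of n people share a birthday."""
    p_all_different = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1 - p_all_different


# Find the smallest group size where the chance reaches 50%.
n = 1
while prob_shared_birthday(n) < 0.5:
    n += 1

print(n, round(prob_shared_birthday(n), 4))  # 23 0.5073
```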
Pause to work out the odds of an infinite number of monkeys with typewriters randomly producing 'Hamlet'

Ch #4 Probability: the numerical measure of the likelihood that an event will occur.
  • A Priori Classical (Theoretical): from mathematical theory, given a set of assumptions (ace in a deck = 1 in 13)
  • Empirical Classical (Relative Frequency): from observation given collected data (based on records, 31% chance of rain)
  • Subjective: anyone's opinion, perhaps even without data or theory (my gut says chance of nuclear war is 3%)
I remember Alison Parmeter in British Summertime by Paul Cornell making me very happy with her ability to calculate odds.

A Sample Space is a collection of all possible outcomes (heads and tails are the two outcomes for flipping a coin)
  • Simple event: outcome from a sample space with one characteristic (red card from a deck)
  • Joint event: two outcomes simultaneously (a red ace from a deck)
  • Complementary event: for event A, all events not in A (queen of diamonds; all cards that are not the queen of diamonds)
  • Mutually exclusive events: two events that cannot happen together (queen of diamonds and queen of clubs)
You can draw up tables! You can call them Contingency Tables
        Ace  Non-Ace  Total
Red     2    24       26
Black   2    24       26
Total   4    48       52
  • Probability of an event = number of event outcomes / total number of outcomes (ace = 4/52 or 1/13)
  • Joint probability = number of event outcomes for A and B / total number of outcomes (red and an ace 2/52 = 1/26)
  • Compound probability = number of outcomes from A or B or both / total number of outcomes (red or an ace 28/52 = 7/13)
  • Conditional probability = number of outcomes from A given B / total number of outcomes from B (red given ace 2/4 = 1/2)
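The four formulas above can be checked against the card table with a short sketch. The dictionary layout and the `count` helper are my own illustration, not anything from the lecture. Fractions keep the answers exact.

```python
from fractions import Fraction

# Cell counts from the contingency table: (colour, rank) -> number of cards.
table = {
    ("Red", "Ace"): 2,
    ("Red", "Non-Ace"): 24,
    ("Black", "Ace"): 2,
    ("Black", "Non-Ace"): 24,
}
total = sum(table.values())  # 52 cards


def count(colour=None, rank=None):
    """Add up the cells matching the given colour and/or rank."""
    return sum(
        n
        for (c, r), n in table.items()
        if (colour is None or c == colour) and (rank is None or r == rank)
    )


p_ace = Fraction(count(rank="Ace"), total)
p_red_and_ace = Fraction(count("Red", "Ace"), total)
# "Or" counts red cards plus aces, minus the red aces counted twice.
p_red_or_ace = Fraction(
    count(colour="Red") + count(rank="Ace") - count("Red", "Ace"), total
)
# "Given ace" shrinks the denominator to the 4 aces.
p_red_given_ace = Fraction(count("Red", "Ace"), count(rank="Ace"))

print(p_ace, p_red_and_ace, p_red_or_ace, p_red_given_ace)  # 1/13 1/26 7/13 1/2
```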
Fun exercise practicing using a Beer Drinkers versus Gender table.

Ch #5 Discrete Probability Distributions
  • Random Variable: outcomes of an experiment expressed numerically (throw a die twice and count the number of times 4 appears). Obtained by counting.
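As a sketch of that example, assuming a fair six-sided die: list all 36 equally likely pairs of throws, count the 4s in each, and tally how often each count turns up. The variable names are just for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely results of throwing a fair die twice.
outcomes = list(product(range(1, 7), repeat=2))

# The random variable: how many 4s appear in each pair of throws.
counts = Counter(throws.count(4) for throws in outcomes)

# Turn the tallies into exact probabilities.
distribution = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
# P(X = 0) = 25/36
# P(X = 1) = 5/18
# P(X = 2) = 1/36
```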
Then we ran away...
Saturday, June 12th, 2010 03:10 am (UTC)
how complicated were the coefficient of variation setups? was it for a single independently sampled variable? (I'm trying to find anything on estimating it when there is related data in the sample, such as running duplicates on the same sample, and nothing seems to talk about it explicitly)