Wednesday, May 19th, 2010 10:47 am
Introduction was pleasant. The presenter is funny and well spoken, and my impression is that he knows his stuff and is looking to make it easy for us to learn it. He was very clear about his expectations relating to attendance, assignments, and the final exam, and gave us a lot of ways to contact him. He loses points[1] for using 'Gestapo' to describe the exam monitors, printing half a tree of handouts, not using the smart-board (prefers white-boards and has a vast pen collection) and for being relentlessly blokey.

The text is Statistics for Managers Using Microsoft Excel and you need an Excel plugin called PHStat which I will muck around with this evening. I look forward to dumping a really big dataset in it and clicking 'Go.'

Why do Managers need to know about Sadistics, er, Statistics?
  • Presenting information
  • Drawing conclusions about information
  • Forecasting information
  • Using information to improve things (like picking out problem areas of productivity that fall outside normal variation and resolving them)
Yes, you're surprised too, aren't you :)
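For that last point (spotting productivity figures that fall outside the normal range), one common approach is a z-score check: flag anything more than some number of standard deviations from the mean. Here's a minimal Python sketch. The shift data is invented, `flag_outliers` is a hypothetical helper name, and the cut-off of 2 standard deviations is my assumption, not something from the lecture.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=2.0):
    """Return (index, value, z) for every value more than `threshold`
    sample standard deviations from the mean. The default of 2.0 is an
    assumed cut-off, not a fixed rule."""
    mu = mean(values)
    sigma = stdev(values)
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, v, round(z, 2)))
    return flagged

# Hypothetical units produced per shift; the seventh shift (index 6) looks off.
units_per_shift = [102, 98, 101, 99, 100, 97, 64, 103, 100, 99]
print(flag_outliers(units_per_shift))
```

One caveat: a single extreme value inflates the standard deviation it is being measured against, so with small datasets it's worth eyeballing the numbers as well.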

Terminology:
  • A population (universe) is the collection of units under consideration. eg Entire Australian population
  • A sample is a portion of the population selected for analysis. eg MBA students in Australia
  • A parameter is a summary measure computed to describe a characteristic of the population. eg: the proportion of all Australian MBA students who are female
  • A statistic is a summary measure computed from a sample to describe a characteristic of that sample (and estimate the matching parameter) eg: the proportion of sampled female MBA students who graduate within 3 years
  • The same dataset can be a population in one study and a sample in another.
Thankfully these terms make sense to me so I'm not going to be frantically trying to translate them from 'natural language' to 'statistics' in my head all the time.
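To make the parameter/statistic distinction concrete, here's a minimal Python sketch. The population of 10,000 students and its 60% graduate-within-3-years rate are invented purely for illustration, and `proportion` is a hypothetical helper. The parameter is the proportion across the whole population; the statistic is the same measure on a random sample of 200, used to estimate it.

```python
import random

def proportion(flags):
    """Share of True values in a list of booleans."""
    return sum(flags) / len(flags)

rng = random.Random(2010)  # fixed seed so the sketch is repeatable

# Hypothetical population: one True/False per student, True meaning
# "graduated within 3 years". The 60% rate is made up.
population = [rng.random() < 0.6 for _ in range(10_000)]

# Parameter: describes the whole population (usually unknown in practice).
parameter = proportion(population)

# Statistic: the same measure computed on a simple random sample,
# used as an estimate of the parameter.
sample = rng.sample(population, 200)
statistic = proportion(sample)

print(f"parameter = {parameter:.3f}, statistic = {statistic:.3f}")
```

Rerun it with a different seed and the statistic wobbles around while the parameter barely moves, which is the whole point of the distinction.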

Data sources:
  • Primary
    • Observation - look at it
    • Experimentation - poke it
    • Survey - ask it questions
  • Secondary
    • Print - read it
    • Electronic - read it some more but with less photocopying
Types of Variables:
  • Categorical (qualitative)
    • Nominal (has no logical ranking eg: eye colour)
    • Ordinal (ranked eg: Likert scale)
  • Numerical (quantitative)
    • Interval (equal spacing between values but no true zero eg: temperature in Celsius)
    • Ratio (interval scale with a meaningful zero, so 'half as many' makes sense eg: number of people in a room)
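The interval/ratio split comes down to whether zero means "none of the thing". A small Python sketch using the Celsius-to-Kelvin conversion (Kelvin starts at absolute zero) shows why 20 degrees Celsius isn't "twice as hot" as 10, while differences still work on both scales. The temperatures and head counts are arbitrary example values, and `celsius_to_kelvin` is just an illustrative helper.

```python
import math

def celsius_to_kelvin(celsius):
    """Kelvin starts at absolute zero, so zero really means 'no heat'."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0

# The ratio on the Celsius scale suggests "twice as hot"...
celsius_ratio = warm_c / cool_c
# ...but on the Kelvin scale the same two temperatures differ by only ~3.5%.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Differences are meaningful on both scales (a 10-degree gap either way).
celsius_gap = warm_c - cool_c
kelvin_gap = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)
same_gap = math.isclose(celsius_gap, kelvin_gap)

# A head count has a true zero, so 20 people really is twice as many as 10.
people_ratio = 20 / 10

print(celsius_ratio, round(kelvin_ratio, 3), same_gap, people_ratio)
```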
Pause for a class exercise in identifying variables: which variables could help predict median house prices in a suburb?

Ways to slice data:
  • Time-Series Data: values recorded in a meaningful sequence such as days, quarters or years.
    • Y (forecast variable) = T x S x C x I
    • T = trend (long-term change over time, eg: do Facebook users post more photos per month as they age?)
    • S = seasonal (is more fanfiction written on hiatus?)
    • C = cyclical (is porn writing in fandom cyclical?)
    • I = irregular / random (acts of god - or terrorism)
  • Cross-Sectional Data: values with no meaningful time-based sequence, such as sales figures for multiple companies over the same period
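The Y = T x S x C x I line is the multiplicative time-series model. Below is a minimal Python sketch of one standard way to pull the seasonal part out of it, classical multiplicative decomposition; it's my own illustration, not necessarily the method the course or PHStat uses. Assumptions: quarterly data (an even period of 4) starting at the first quarter, a made-up noise-free series with known seasonal factors so the output can be checked, and hypothetical function names `centred_moving_average` and `seasonal_indices`. Note that the moving average estimates trend and cycle together (T x C) rather than separating them.

```python
from statistics import mean

def centred_moving_average(y, period):
    """2 x period centred moving average for an even period (e.g. 4 quarters).

    Estimates the combined trend-cycle (T x C). The first and last
    period // 2 points have no estimate and come back as None.
    """
    half = period // 2
    out = [None] * len(y)
    for t in range(half, len(y) - half):
        # Average two overlapping period-length windows so the
        # result is centred on observation t.
        first = mean(y[t - half:t + half])
        second = mean(y[t - half + 1:t + half + 1])
        out[t] = (first + second) / 2
    return out

def seasonal_indices(y, period):
    """Estimate one multiplicative seasonal index per season.

    Y / (T x C) leaves S x I; averaging those ratios season by season
    smooths out I, and the indices are rescaled to average exactly 1.
    """
    trend_cycle = centred_moving_average(y, period)
    ratios = {s: [] for s in range(period)}
    for t, (obs, tc) in enumerate(zip(y, trend_cycle)):
        if tc is not None:
            ratios[t % period].append(obs / tc)
    raw = [mean(ratios[s]) for s in range(period)]
    scale = mean(raw)
    return [r / scale for r in raw]

# Hypothetical noise-free quarterly series: linear trend times a known
# seasonal pattern, so the recovered indices can be checked by eye.
true_seasonal = [0.8, 1.1, 1.3, 0.8]
sales = [(100 + 2 * t) * true_seasonal[t % 4] for t in range(12)]

print([round(s, 3) for s in seasonal_indices(sales, period=4)])
```

Dividing the original series by these indices gives seasonally adjusted figures (T x C x I), which is usually what you want before looking for trend.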
Discussion of the project assignment (2,000 words): we can pick any dataset or create our own, and can work in pairs or solo. Examples of previous assignment topics include house buying, bank queue waiting times, wine sales, first-week takings for Johnny Depp movies, Eurovision rankings and bar waiting times (I'm picking out the more amusing ones here).

Sampling Methods:
  • Probability Samples
    • Simple Random (lotto!) - simple to use, but may not represent smaller subgroups of the population well
    • Systematic (grab every 5th person)
    • Stratified (divide the population into subgroups by some significant characteristic eg: gender, nationality, location, then randomly sample within each) - may be time consuming and costly
    • Cluster (divide the population into clusters that each represent the larger population, then randomly select whole clusters) - may be more cost-effective, but less efficient
  • Non-Probability Samples
    • Judgment
    • Quota (must fill a set number per group, eg: ask 500 men and 500 women!)
    • Chunk
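Here's a rough Python sketch of the four probability sampling methods, assuming a hypothetical population of 200 people, each tagged with a gender and one of four made-up suburbs. The helper names (`simple_random`, `systematic`, `stratified`, `cluster`) are invented for illustration. Judgment, quota and chunk samples rely on someone's choice rather than chance, so they aren't sketched here.

```python
import random

def simple_random(population, n, rng):
    """Every unit has the same chance of selection (like a lotto draw)."""
    return rng.sample(population, n)

def systematic(population, k, rng):
    """Random starting point, then every k-th unit (e.g. every 5th person)."""
    start = rng.randrange(k)
    return population[start::k]

def stratified(population, key, n_per_stratum, rng):
    """Split into strata by `key`, then take a simple random sample within each."""
    strata = {}
    for unit in population:
        strata.setdefault(key(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster(population, key, n_clusters, rng):
    """Split into clusters by `key`, randomly pick whole clusters, keep every unit in them."""
    clusters = {}
    for unit in population:
        clusters.setdefault(key(unit), []).append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]

rng = random.Random(2010)  # fixed seed so the sketch is repeatable
suburbs = ["suburb_a", "suburb_b", "suburb_c", "suburb_d"]
# Hypothetical people as (id, gender, suburb) tuples.
people = [(i, rng.choice("FM"), rng.choice(suburbs)) for i in range(200)]

srs = simple_random(people, 20, rng)
every_fifth = systematic(people, 5, rng)
by_gender = stratified(people, key=lambda p: p[1], n_per_stratum=10, rng=rng)
by_suburb = cluster(people, key=lambda p: p[2], n_clusters=2, rng=rng)

print(len(srs), len(every_fifth), len(by_gender), len(by_suburb))
```

The stratified version guarantees both genders appear in equal numbers, whereas the cluster version only ever visits two suburbs, which is where the cost saving (and the loss of efficiency) comes from.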
Stuff to review for this week: 1.2, 1.3, 2.1, 7.1 & 7.2; do questions 2.6 and 2.12 (p. 47) and 7.12 (p. 259)
Stuff to read for next week: 2.4 - 2.7 & 3.1 - 3.4

Then we ran away.

[1] Points will be added/subtracted continuously; current score is +3. The purpose of my points system is to compare first impressions with final impressions, so stay tuned for the 12 week update :p.
Wednesday, May 19th, 2010 09:21 am (UTC)
Yes, you're surprised too, aren't you :)

Shocked.

And possibly stunned.
Wednesday, May 19th, 2010 10:45 am (UTC)
- I think you have misunderstood the term 'parameter'. I'm happy to explain my ideas if you don't think I'm right.
- categorical is not necessarily qualitative, especially if you are talking Likert scales
- ratio - interval data with a meaningful zero, so that the term 'half as many' has real meaning (ie Celsius is not ratio, as 10 degrees isn't half as hot as 20 degrees).
- trend - is there a consistent change over time (as facebook users age, do they put up more photos per month)
- cross sectional data - no meaningful *time based* sequence.

as for data sets - I've been collecting fuelwatch data (as emailed to me daily) for a few years - that would have interesting weekly cycles, possible seasonals, and long-term trends (and deviations that can be linked to world wide events...) if you are in need of something.

also - Excel? eeeewwww. @#$@#% unreliable piece of @@@@@, addin or no addin.

(let me show you this loooovely open source data manipulation program I have over here)
Wednesday, May 19th, 2010 10:47 am (UTC)
hmm, my problem with your definition of parameter may be me being unable to read. I keep reading it, and not actually understanding what you have said, which caused me to come to the conclusion that the presentation was faulty, but an equally likely answer is that I'm tired, and shouldn't be on the internet unsupervised.
Wednesday, May 19th, 2010 01:58 pm (UTC)
for the first three points, I've sent you an email with an attachment.

for the last one, hmm. Because it is *specifically* about time series data, the term trend refers to the long-term change over time. Eg. the price of oil has been trending up over the last 50 years, even though there have been short-term drops.

Trend gets used in other contexts as well - age related trends are certainly talked about. Also, when a pattern is seen in the data, but it isn't quite extreme enough to be statistically significant, this is sometimes called a trend in the data.

re supernatural fic output wodging - you are welcome to run it past me to see if you have workable ideas.

as to open source data manipulation programs - see http://cran.r-project.org/
can be run under most systems. Has a learning curve that feels like running into a brick wall repeatedly and then running straight through said brick wall and into freedom, but for anyone who has any real coding experience, it should be very quick.

When are you on campus this trimester?