samvara


The introduction was pleasant and the presenter is funny and well spoken. My impression is that he knows his stuff and is looking to make it easy for us to learn it. He was very clear about his expectations relating to attendance, assignments, and the final exam, and gave us a lot of ways to contact him. He loses points¹ for using 'gestapo' to describe the exam monitors, printing half a tree of handouts, not using the smart-board (he prefers white-boards and has a vast pen collection) and for being relentlessly blokey.

The text is Statistics for Managers Using Microsoft Excel and you need an Excel plugin called PHStat which I will muck around with this evening. I look forward to dumping a really big dataset in it and clicking 'Go.'

Why do Managers need to know about ~~Sadistics~~ Statistics?

Presenting information
Drawing conclusions about information
Forecasting information
Using information to improve things (like picking out problem areas of productivity that are outside normal deviation and resolving them)

Yes, you're surprised too, aren't you :)

Terminology:

A population (universe) is the collection of units under consideration. eg Entire Australian population
A sample is a portion of the population selected for analysis. eg MBA students in Australia
A parameter is a summary measure computed to describe a characteristic of the population. eg: Female MBA students in Australia
A statistic is a summary measure computed to describe a characteristic of the sample eg: How many Female MBA students in Australia graduate within 3 years
A dataset can be both a population and a sample.
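
To make the parameter/statistic split concrete, here is a minimal sketch in Python. It assumes the textbook sense of the terms (a parameter summarises the whole population, a statistic summarises a sample drawn from it). The ages are made-up numbers, and every variable name here is an assumption for illustration, not something from the lecture.

```python
import random
import statistics

# Hypothetical population: ages of 10,000 people (made-up numbers, not real data).
rng = random.Random(42)
population = [rng.randint(18, 90) for _ in range(10_000)]

# A parameter is computed from the whole population.
population_mean = statistics.mean(population)

# A statistic is computed from a sample drawn from that population.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two numbers will usually be close but not identical, which is the whole reason the two terms exist.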

Thankfully these terms make sense to me so I'm not going to be frantically trying to translate them from 'natural language' to 'statistics' in my head all the time.

Data sources:

Primary
- Observation - look at it
- Experimentation - poke it
- Survey - ask it questions
Secondary
- Print - read it
- Electronic - read it some more but with less photocopying

Type of Variables:

Categorical (qualitative)
- Nominal (has no logical ranking eg: eye colour)
- Ordinal (ranked eg: Likert scale)
Numerical (quantitative)
- Interval (continuous measure eg: temperature)
- Ratio (discrete count eg: number of people in a room)

Pause for class exercise to practice identifying variables - potential variables to investigate to be able to predict median house prices in a suburb.

Ways to slice data:

Time-Series Data: values recorded in a meaningful sequence such as days, quarters or years.
- Y (forecast variable) = T x S x C x I
- T = trend (is the response to some facebook photos linked to age?)
- S = seasonal (is more fanfiction written on hiatus?)
- C = cyclical (is porn writing in fandom cyclical?)
- I = irregular / random (acts of god - or terrorism)
Cross-Sectional Data: data has no meaningful sequence such as sales figures for multiple companies
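
The multiplicative model above (Y = T x S x C x I) can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the component shapes (a straight-line trend, a 12-month seasonal wave, a slow 48-month cycle, one irregular shock) and all the numbers are invented, not taken from the lecture or the text.

```python
import math

# Hypothetical components for 24 months of made-up data.
months = range(24)

# T: a steady long-term rise.
trend = [100 + 2 * t for t in months]
# S: a pattern that repeats every 12 months, expressed as a factor around 1.
seasonal = [1 + 0.2 * math.sin(2 * math.pi * t / 12) for t in months]
# C: a slower swing with an assumed 48-month period.
cyclical = [1 + 0.05 * math.sin(2 * math.pi * t / 48) for t in months]
# I: irregular factor, 1 means "nothing unusual happened".
irregular = [1.0] * 24
irregular[10] = 0.8  # a single one-off shock in month 10

# Y = T x S x C x I, computed month by month.
y = [t * s * c * i for t, s, c, i in zip(trend, seasonal, cyclical, irregular)]

for month, value in list(zip(months, y))[:6]:
    print(month, round(value, 2))
```

Because the components multiply, S, C and I are written as factors around 1 rather than as amounts added to the trend.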

Discussion of project assignment (2,000 words): we can pick any data set or create it ourselves, and can work in pairs or solo. Examples of previous assignment topics include: House buying, bank queue waiting times, wine sales, first week of takings for Johnny Depp movies, Eurovision rankings and bar waiting times - I'm picking out the more amusing ones here.

Sampling Methods:

Probability Samples
- Simple Random (lotto!) - simple to use but may not be a good representation
- Systematic (grab every 5th person)
- Stratified (select representatives based on some significant quality of the population eg: gender, nationality, location) - may be time consuming and costly
- Cluster (select clusters based on their representing larger population) - may be more cost-effective, but less efficient
Non-Probability Samples
- Judgment
- Quota (must ask 1,000 people!)
- Chunk
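
The four probability sampling methods can be sketched with Python's standard library. This is a rough illustration, not the textbook's or PHStat's procedure: the 100-person sampling frame, the "campus" grouping and the sample sizes are all hypothetical.

```python
import random
from collections import defaultdict

rng = random.Random(0)

# Hypothetical sampling frame: 100 people, each assigned a made-up campus.
campuses = ["north", "south", "east", "west"]
people = [{"id": i, "campus": campuses[i % 4]} for i in range(100)]

# Simple random: every person has the same chance of being picked.
simple = rng.sample(people, 20)

# Systematic: pick a random starting point, then grab every 5th person.
k = 5
start = rng.randrange(k)
systematic = people[start::k]

# Stratified: split into groups (strata), then sample within every group.
strata = defaultdict(list)
for person in people:
    strata[person["campus"]].append(person)
stratified = [p for group in strata.values() for p in rng.sample(group, 5)]

# Cluster: pick whole groups at random and keep everyone inside them.
chosen = rng.sample(campuses, 2)
cluster = [p for p in people if p["campus"] in chosen]

print(len(simple), len(systematic), len(stratified), len(cluster))
```

The difference between the last two is the part worth noticing: stratified sampling takes a few people from every group, while cluster sampling takes everyone from a few groups.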

Stuff to review for this week: 1.2, 1.3, 2.1, 7.1 & 7.2, do questions 2.6 and 2.12 pp47 & 7.12 pp 259
Stuff to read for next week: 2.4 - 2.7 & 3.1 - 3.4

Then we ran away.

¹Points will be added/subtracted continuously; the current score is +3. The purpose of my points system is to measure first impressions versus final impressions, so stay tuned for the 12 week update :p.


no subject

ironed_orchid

Wednesday, May 19th, 2010 09:21 am (UTC)

Yes, you're surprised too, aren't you :)

Shocked.

And possibly stunned.

samvara

Wednesday, May 19th, 2010 09:27 am (UTC)

*grins at you*

*offers smelling salts*

quibbling

fred_mouse

Wednesday, May 19th, 2010 10:45 am (UTC)

- I think you have misunderstood the term 'parameter'. I'm happy to explain my ideas if you don't think I'm right.
- categorical is not qualitative necessarily, especially if you are talking Likert scales
- ratio - interval data with a meaningful zero, so that the term 'half as many' has real meaning (ie Celsius is not ratio, as 10 degrees isn't half as hot as 20 degrees).
- trend - is there a consistent change over time (as facebook users age, do they put up more photos per month)
- cross sectional data - no meaningful *time based* sequence.

as for data sets - I've been collecting fuelwatch data (as emailed to me daily) for a few years - that would have interesting weekly cycles, possible seasonals, and long-term trends (and deviations that can be linked to world wide events...) if you are in need of something.

also - Excel? eeeewwww. @#$@#% unreliable piece of @@@@@, addin or no addin.

(let me show you this loooovely open source data manipulation program I have over here)

Re: quibbling

Wednesday, May 19th, 2010 10:47 am (UTC)

hmm, my problem with your definition of parameter may be me being unable to read. I keep reading it, and not actually understanding what you have said, which caused me to come to the conclusion that the presentation was faulty, but an equally likely answer is that I'm tired, and shouldn't be on the internet unsupervised.

Wednesday, May 19th, 2010 11:35 am (UTC)

*hugs*

Thank you for quibbling at me!

My interpretation of parameter is that it could be gender/age/nationality/income/distance from the beach etc. I have probably expressed it badly by putting an example of a gender rather than saying gender - or have I got the wrong end of this stick?
categorical is not qualitative necessarily... Likert scales measure the level to which you agree with something, don't they? I notice Wikipedia says they are regarded as ordered-categorical or interval-level, which is interesting. Lecturer presented them as ordered-categorical.
ratio - interval data with a meaningful zero *sweats* that's a lot more complex than what I was given - so a ratio is a continuous scale for which zero is the lowest value?
trends - I was thinking of responses per month to different age groups - does that apply?

Datasets - I'm trying to think how to wodge the output from Supernatural_fic in to get something meaningful out...

You can totally show me loooovely open source data manipulation programs ;)

Edited 2010-05-19 11:40 am (UTC)

Wednesday, May 19th, 2010 01:58 pm (UTC)

for the first three points, I've sent you an email with an attachment.

for the last one, hmm. Because it is *specifically* about time series data, the term trend refers to the long term change over time. Eg. the price of oil has been trending up over the last 50 years, even though there have been short term drops.

Trend gets used in other contexts as well - age related trends are certainly talked about. Also, when a pattern is seen in the data, but it isn't quite extreme enough to be statistically significant, this is sometimes called a trend in the data.

re supernatural fic output wodging - you are welcome to run it past me to see if you have workable ideas.

as to open source data manipulation programs - see http://cran.r-project.org/
It can be run under most systems. It has a learning curve that feels like running into a brick wall repeatedly and then running straight through said brick wall and into freedom, but for anyone who has any real coding experience, it should be very quick.

When are you on campus this trimester?

Wednesday, May 19th, 2010 03:13 pm (UTC)

I luuuuuuuurve you!

I'm at uni 6-9 on Tue/Fri and can be there earlier if there is the possibility of company :)

Data Analysis #1: Data Collection and Sampling