I've been looking forward to this one for some time now. I have a fairly good background in stats; we were pretty stats-heavy during our time at LSE, looking at econometric data and learning all about variance, standard deviation, sampling, significance, etc. It will be interesting to see how CFAI approaches statistics in general and how it applies them to finance. Glancing at the learning objectives, there are a few concepts I've never covered in detail, so I should learn some new things: frequency polygons, the harmonic mean, Chebyshev's Inequality, and kurtosis.
The total stats material is about 90 pages, so we'll see how these go.
Start time - 2:45 pm
Nature of stats
- Can refer to data or to method
- Can be descriptive (summarizing) or inferential (making judgments about a larger group from observations on a smaller group)
- Inferential is based in probability theory
- This first reading covers only descriptive stats, so it should be fairly simple
- Population - all members of a specified group
- Parameter - any descriptive measure of a population
- Sample - a subset of a population
- Sample Statistic - a quantity computed from or used to describe a sample
Measurement scales, weakest to strongest
- Nominal - categorizes but does not rank
- Ordinal - categorizes and ranks, possibly even using numbers, but the numbers say nothing about the size of the differences between ranks
- Interval - ranks and assures that differences between scale values are equal (e.g. temperature in Celsius or Fahrenheit)
- Ratio Scales - same as interval, except it also has a true zero point as the origin (e.g. rates of return and quantities of money)
- Note here - it is really hard to think of other interval examples
Frequency distributions
- Tabular display of data summarized into relatively small intervals
- Sort data ascending, decide on a number of intervals k
- Count number of observations in each interval
- Deciding on k requires judgment: too few intervals summarize away important detail, too many fail to summarize the data enough to be useful (ex. when examining the distribution of S&P 500 returns from 1926-2002)
- Relative Frequency - absolute frequency of each interval divided by total number of observations
- Cumulative relative frequency - running total of the relative frequencies as you move up through the intervals
- Cumulative frequency - tells the number of observations below the upper limit of the given interval
- Knowing detail about the tails can be important
- Histogram - bar chart of data grouped into a frequency distribution
- Frequency Polygon - replace the histogram's bars with points at the midpoint and connect the dots (meh)
- Cumulative Frequency Distribution - tells how many (or what percent) of values lie beneath a given value, i.e. it can use relative or absolute frequency.
- The slope of each segment is proportional to the number of observations in that interval
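To make the frequency distribution steps concrete (sort, pick k, count per interval, then relative and cumulative frequencies), here's a minimal Python sketch. The function name `frequency_distribution`, the equal-width intervals, and the returns data are my own hypothetical choices for illustration; I'm assuming half-open intervals [lower, upper) with the last one closed so the maximum gets counted.

```python
def frequency_distribution(data, k):
    """Return rows of (lower, upper, abs_freq, rel_freq, cum_freq, cum_rel_freq).

    Hypothetical helper: k equal-width intervals, each [lower, upper),
    except the last, which is closed so the maximum value is counted.
    """
    xs = sorted(data)                 # step 1: sort ascending
    n = len(xs)
    lo, hi = xs[0], xs[-1]
    width = (hi - lo) / k             # equal-width intervals (an assumption)
    rows = []
    cum = 0
    for i in range(k):
        lower = lo + i * width
        upper = lo + (i + 1) * width
        if i == k - 1:
            count = sum(1 for x in xs if lower <= x <= hi)   # last interval is closed
        else:
            count = sum(1 for x in xs if lower <= x < upper)
        cum += count                  # cumulative (absolute) frequency
        rows.append((lower, upper, count, count / n, cum, cum / n))
    return rows


# Hypothetical annual returns in percent, purely for illustration
returns = [-8, -3, 0, 2, 4, 5, 7, 12]
for lower, upper, f, rel, cf, crel in frequency_distribution(returns, 4):
    print(f"{lower:6.1f} to {upper:6.1f}: f={f} rel={rel:.3f} cum={cf} cum_rel={crel:.3f}")
```

The last row's cumulative relative frequency is always 1, since every observation falls at or below the top of the final interval.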
Measures of central tendency - Mean
- Mean - sum of observations divided by # of observations
- Population mean - μ (mu) and capital N
- Sample mean - x̄ (x-bar) and lowercase n
- Cross-sectional - measuring across units at a specific period in time (e.g. the average ROE of 300 companies) - this is the 'cross sectional mean'
- Time-series - examining one unit over time (e.g. monthly returns over 5 years - take the average and this is called the 'time series mean')
- Properties of a mean
- Can be likened to the center of gravity of an object
- Deviation - Distance from mean to an observation
- Sum of deviations around a mean always equals 0, mathematically
- Deviations indicate risk
- Advantages
- Uses all info about size and magnitude of all observations
- Easy to work with
- Disadvantages
- Sensitive to extreme values
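A quick Python sketch of the mean properties above: deviations from the mean sum to zero (up to floating-point rounding), and a single extreme value drags the mean a long way. The helper names and the returns are hypothetical, just for illustration.

```python
def mean(xs):
    """Arithmetic mean: sum of observations divided by their number."""
    return sum(xs) / len(xs)


def deviations(xs):
    """Distance of each observation from the mean."""
    m = mean(xs)
    return [x - m for x in xs]


# Hypothetical monthly returns (decimals)
returns = [0.05, -0.02, 0.08, 0.01, 0.03]

print(mean(returns))              # about 0.03
print(sum(deviations(returns)))   # 0, up to floating-point rounding

# One extreme value pulls the mean up sharply
print(mean(returns + [0.90]))     # about 0.175
```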
Measures of central tendency - Median
- Median - the middle value of the sorted data when n is odd, or the mean of the two middle values when n is even
- Advantages
- Extreme values do not affect it
- Disadvantage
- Does not use all the information available - only focuses on relative position of observations
- More complex to calculate - "less mathematically tractable"
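The median rule (odd vs. even n) in a short Python sketch; `median` here is my own hypothetical helper, and the last example shows that an extreme value doesn't move it.

```python
def median(xs):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2    # even n: mean of the two middle values


print(median([3, 1, 2]))        # 2
print(median([4, 1, 3, 2]))     # 2.5
print(median([1, 2, 3, 100]))   # still 2.5: the extreme value doesn't affect it
```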
Measures of central tendency - Mode
- Mode - most frequently occurring value in a distribution
- Distribution can have more than one mode or even no mode
- One mode - unimodal
- Two modes - bimodal
- Three - trimodal
- When all are different, no mode - no value occurs more frequently than any other
- When data are grouped e.g. in a histogram the highest freq. interval(s) is(are) the modal interval(s)
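Here's a small Python sketch of the mode rules (unimodal, bimodal, no mode). The helper `modes` is hypothetical and follows the notes' convention that when every value occurs equally often there is no mode. Python's `statistics.multimode` behaves differently: it returns every value in that case.

```python
from collections import Counter


def modes(xs):
    """Most frequent value(s); empty list if all values are equally frequent."""
    counts = Counter(xs)
    top = max(counts.values())
    winners = [v for v, c in counts.items() if c == top]
    if len(winners) == len(counts) and len(counts) > 1:
        return []            # no value occurs more often than any other -> no mode
    return sorted(winners)


print(modes([1, 2, 2, 3]))      # [2]     unimodal
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  bimodal
print(modes([1, 2, 3]))         # []      no mode
```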
Other Means
- Weighted mean
- In the context of a portfolio, e.g. weight each stock's return by its share of portfolio value (short positions have negative weights)
- Notion of a constant weighted portfolio - PM adjusts to keep a constant blend of stocks vs. bonds - in this case you can calc returns of stocks and bonds each individually and then weight them
- Weighted mean of forward looking data is called expected value
- Geometric mean
- Mostly used to average growth rates over time or compute growth
- G = (X1 * X2 * ... * Xn)^(1/n), i.e. the nth root of the product
- Only works when the product under the root sign is nonnegative, i.e. all Xs are >= 0
- For returns, work with (1 + r) for each period: the worst possible value is 0 (a -100% return), which makes the geometric mean 0 (makes sense - the fund goes to 0 that year and cannot recover)
- Multiply all of them together, then raise to the 1/n power
- For returns, add 1 to all returns, then subtract 1 at the end
- Geometric mean is always LESS or EQUAL TO arithmetic mean
- only equal when there is no variation in returns - difference increases with variability of observations
- Harmonic Mean
- Special type of weighted mean used only in a few applications
- Take the reciprocal of each observation, sum them, divide by n, and then take reciprocal of that
- Example: Cost averaging - periodic investment of a fixed amount of money
- Ex. you buy $1,000 of a security one month at $10 and the next month at $15 - what is the average price paid per share?
- (1/10 + 1/15) = (3/30 + 2/30) = 5/30
- Divide by 2 -> 5/60
- Reciprocal -> 60/5 = 12
- 12 is the harmonic mean
- Only works when investing the same amount each time
- Harmonic will be less than or equal to the Geometric (which is less than/equal to arithmetic mean)
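A minimal Python sketch of the three "other means". All function names and the 60/40 returns are hypothetical. I'm assuming the portfolio convention that weights sum to 1, so a short position just carries a negative weight. The cost-averaging part recomputes the $10/$15 example directly (total dollars divided by total shares) and checks it against the harmonic mean.

```python
import math


def weighted_mean(values, weights):
    """Sum of w_i * x_i, assuming (portfolio convention) the weights sum to 1."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights should sum to 1")
    return sum(w * x for w, x in zip(weights, values))


def geometric_mean_return(returns):
    """Add 1 to each return, multiply, take the nth root, then subtract 1."""
    growth = 1.0
    for r in returns:
        if r < -1:
            raise ValueError("1 + r must not be negative")
        growth *= 1 + r
    return growth ** (1 / len(returns)) - 1


def harmonic_mean(xs):
    """Reciprocal of the average of the reciprocals (all xs assumed > 0)."""
    return len(xs) / sum(1 / x for x in xs)


# Hypothetical 60/40 stock/bond blend returning 10% and 5%
print(weighted_mean([0.10, 0.05], [0.6, 0.4]))   # about 0.08

# +10% then -10%: arithmetic mean is 0, geometric mean is slightly negative
print(geometric_mean_return([0.10, -0.10]))      # about -0.005

# Cost averaging: $1,000 at $10, then $1,000 at $15
prices = [10, 15]
shares = sum(1000 / p for p in prices)           # 100 + 66.67 shares
print(2000 / shares, harmonic_mean(prices))      # both about 12
```

For the prices above the ordering harmonic (12) <= geometric (about 12.25) <= arithmetic (12.5) holds, as the notes say.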
Calculating quantiles (aka Fractiles)
- Quartile - divides the distribution into quarters, e.g. 25% of observations lie at or below the first quartile
- Quintile - 20% (1/5), Decile - 10%, Percentile = 1%
- Estimating a given percentile:
- First locate the position of the percentile in the sorted observations
- Next determine (or estimate) the value associated with that position
- Ly = (n + 1) * (y/100)
- This gives the location for a given y percentile
- Might not be a whole number
- Ex. we have 16 observations and are seeking the 75%ile, so Ly = (16 + 1) * (75/100) = 12.75, which is between the 12th and 13th observation
- The .75 on the 12.75 means the observation is 75% of the distance from observation 12 to 13
- We use linear interpolation to find this
- Simply find the distance from observation 12 to observation 13, then multiply that distance by 0.75 and add that to observation 12
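The location formula plus linear interpolation, sketched in Python. The helper names are hypothetical, and clamping to the smallest/largest observation when Ly falls outside 1..n is my own assumption (the notes don't cover that case). Positions in the formula are 1-based, so the code shifts by one for Python's 0-based lists.

```python
def percentile_location(n, y):
    """Ly = (n + 1) * (y / 100): 1-based position of the yth percentile."""
    return (n + 1) * y / 100


def percentile(data, y):
    xs = sorted(data)
    n = len(xs)
    loc = percentile_location(n, y)
    if loc <= 1:          # assumption: clamp to the smallest observation
        return xs[0]
    if loc >= n:          # assumption: clamp to the largest observation
        return xs[-1]
    whole = int(loc)      # e.g. 12 for 12.75
    frac = loc - whole    # e.g. 0.75
    below = xs[whole - 1] # observation #whole (1-based)
    above = xs[whole]     # observation #whole + 1
    return below + frac * (above - below)   # linear interpolation


# 16 hypothetical observations: 10, 20, ..., 160
obs = [10 * i for i in range(1, 17)]
print(percentile_location(16, 75))   # 12.75
print(percentile(obs, 75))           # 120 + 0.75 * (130 - 120) = 127.5
```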
Measures of Dispersion
- Dispersion - variability around the central tendency - measure of risk
- Four measures of dispersion
- Range - difference between max and min
- Mean Absolute Deviation - calculate the mean, then the absolute distances from the mean, and average these
- Population Variance
- Average of the squared deviations from the population mean
- Standard Deviation
- Square root of the population variance
- At the sample level, things are slightly different
- Variance gets divided by sample size minus 1 (n-1)
- Reflects losing 1 degree of freedom - because the sample mean is calculated from the same observations, only n-1 of the deviations from it are free to vary (the last one is fixed, since deviations sum to 0)
- Different from the population case because there you know the true mean, whereas here you are estimating it
- Standard deviation is again the root of the variance (variance = sdev^2)
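The four dispersion measures, plus the sample (n-1) version, in a short Python sketch. The helper names and data set are hypothetical; `statistics.pvariance` and `statistics.variance` from the standard library are used only as a cross-check.

```python
import math
import statistics


def data_range(xs):
    return max(xs) - min(xs)


def mean_absolute_deviation(xs):
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def population_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    # n - 1 divisor: one degree of freedom is used up estimating the mean
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations, mean = 5
print(data_range(data))                                                  # 7
print(mean_absolute_deviation(data))                                     # 1.5
print(population_variance(data), math.sqrt(population_variance(data)))  # 4.0 2.0
print(sample_variance(data), math.sqrt(sample_variance(data)))

# Cross-check against the standard library
print(statistics.pvariance(data), statistics.variance(data))
```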
Semivariance, Semideviation and Related
- Analysts developed these measures because many investors are primarily concerned with downside risk (ok...)
- Semivariance - average squared deviation below the mean
- Calculate the mean and keep only the observations that are smaller than or equal to it
- Compute the sum of squared deviations using only that subsample
- Divide by (n* - 1), where n* is the number of observations in that subsample
- Semideviation - sqrt of semivariance
- Slight variant - the 'target semivariance'
- Define a target, e.g. a 10% return, and use that instead of the mean as your basis
- Calculate squared deviations in relation to the target (not the mean), using only the observations below the target
- When returns are symmetric around the mean, semivariance and variance are effectively equivalent
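A minimal Python sketch of semivariance and semideviation. The helper names are hypothetical, and the conventions are assumptions I'm stating explicitly: only observations at or below the basis (the mean, or a target if given) are used, and the divisor is n* - 1, where n* is how many observations are in that downside subsample.

```python
import math
import statistics


def semivariance(xs, target=None):
    """Sample semivariance, or target semivariance if a target is given.

    Assumed conventions: basis = sample mean unless a target is supplied;
    only observations <= basis are used; divisor is n* - 1.
    """
    basis = sum(xs) / len(xs) if target is None else target
    downside = [x for x in xs if x <= basis]
    if len(downside) < 2:
        raise ValueError("need at least two observations at or below the basis")
    return sum((x - basis) ** 2 for x in downside) / (len(downside) - 1)


def semideviation(xs, target=None):
    return math.sqrt(semivariance(xs, target))


symmetric = [1, 2, 3, 4, 5]   # hypothetical, symmetric around 3
print(semivariance(symmetric), statistics.variance(symmetric))   # 2.5 for both
print(semivariance(symmetric, target=2))                         # 1.0
```

For this symmetric set, semivariance and sample variance come out identical under the n* - 1 convention; for skewed returns they would rank risk differently.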
Chebyshev's Inequality
- Uses standard deviation as a measure of dispersion
- For any distribution with finite variance, the proportion of observations within k standard deviations of the arithmetic mean is at least the proportion = (1 - 1/k^2) for all k>1
- Ex. within 2 sdevs, there must be at least (1 - 1/(2^2)) = 1 - 1/4 = 75%
- Means that at least 75% must be within 2 standard devs and at least 89% within 3
- This does NOT depend on the data being normally distributed - which is why this is so useful
- Only gives the minimum that must be within the band, says nothing of maximum
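A quick Python check of Chebyshev's Inequality on deliberately non-normal data. The helper names and the data set (nine 1s and one 50) are hypothetical. I'm using the population standard deviation of the data and, matching the notes, only allowing k > 1. The actual share inside each band should never fall below the 1 - 1/k^2 floor.

```python
import math


def chebyshev_min_proportion(k):
    """Minimum share of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("stated here for k > 1")
    return 1 - 1 / k ** 2


def proportion_within(xs, k):
    """Actual share of observations within k population standard deviations."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return sum(1 for x in xs if abs(x - m) <= k * sd) / len(xs)


print(chebyshev_min_proportion(2))   # 0.75
print(chebyshev_min_proportion(3))   # about 0.889

# Hypothetical, very non-normal data: nine 1s and one 50
skewed = [1] * 9 + [50]
for k in (1.5, 2, 2.5):
    print(k, proportion_within(skewed, k), chebyshev_min_proportion(k))
```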
Brain is full. Going to take a break.
5:30 pm
About 2.75 hours