My CFA Journal: Quantitative Methods - Statistical Concepts and Market Returns

I've been looking forward to this one for some time now. I have a fairly good background in stats; we were pretty stats-heavy during our time at LSE, looking at econometric data and learning all about variance, standard deviation, sampling, significance, etc. It will be interesting to see how CFAI approaches statistics in general and how they decide to apply it to finance. Glancing at the learning objectives, there are a few concepts I've never done in detail, so I should learn some new things: the frequency polygon, the harmonic mean, Chebyshev's Inequality, and kurtosis.

The total stats material is about 90 pages, so we'll see how these go.

Start time - 2:45 pm

Nature of stats

Can refer to data or to method
Can be descriptive (summarizing) or inferential (making judgments about a larger group from observations on a smaller group)

Inferential is based in probability theory

This first reading is only about descriptive stats, meaning it should be fairly simple

Populations and Samples

Population - all members of a specified group

Parameter - any descriptive measure of a population

Sample - a subset of a population

Sample Statistic - a quantity computed from or used to describe a sample

Measurement scales, weakest to strongest

Nominal - categorizes but does not rank
Ordinal - categorizes and ranks (possibly with numbers), but the numbers say nothing about the size of the differences between ranks
Interval - ranks and assures that differences between scale values are equal (e.g. temperature in Celsius or Fahrenheit)
Ratio - same as interval, except it also has a true zero point as its origin (e.g. rates of return and amounts of money)

Note here - it is really hard to think of other interval examples

Frequency distributions

Tabular display of data summarized into relatively small intervals

Sort data ascending, decide on a number of intervals k
Count number of observations in each interval
Deciding on k requires judgment: too few intervals hide important detail, too many fail to summarize the data (ex. when examining the distribution of S&P 500 returns from 1926-2002)

Relative Frequency - absolute frequency of each interval divided by total number of observations

Cumulative relative frequency - running total of the relative frequencies, from the lowest interval up to the current one
Cumulative frequency - tells the number of observations below the upper limit of the given interval
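To check I have the steps straight, here is a minimal Python sketch that builds a frequency distribution with absolute, relative and cumulative frequencies. Equal-width intervals (half-open except the last one), the helper name and the return figures are all my own assumptions for illustration, not anything from the reading.

```python
def frequency_distribution(data, k):
    """Tabulate data into k equal-width intervals (interval rule is an assumption)."""
    xs = sorted(data)                      # step 1: sort ascending
    lo, hi = xs[0], xs[-1]
    width = (hi - lo) / k
    n = len(xs)
    rows = []
    cumulative = 0
    for i in range(k):
        lower = lo + i * width
        upper = hi if i == k - 1 else lo + (i + 1) * width
        if i == k - 1:
            # Last interval is closed so the maximum gets counted
            count = sum(1 for x in xs if lower <= x <= upper)
        else:
            # Other intervals are half-open: [lower, upper)
            count = sum(1 for x in xs if lower <= x < upper)
        cumulative += count
        # (lower, upper, absolute, relative, cumulative, cumulative relative)
        rows.append((lower, upper, count, count / n, cumulative, cumulative / n))
    return rows

returns = [-8.0, -3.5, 0.5, 1.2, 4.0, 6.5, 7.1, 9.9, 12.0, 15.3]  # hypothetical returns (%)
for lower, upper, f, rel, cum, cum_rel in frequency_distribution(returns, 4):
    print(f"{lower:6.2f} to {upper:6.2f}  f={f}  rel={rel:.2f}  cum={cum}  cum_rel={cum_rel:.2f}")
```

Changing k here is an easy way to see the judgment call: more intervals show more detail in the tails, fewer give a tidier summary.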

Knowing detail about the tails can be important

Graphic presentation

Histogram - bar chart of data grouped into a frequency distribution

Frequency Polygon - replace the histogram's bars with points at each interval's midpoint (height = frequency) and connect the dots (meh)

Cumulative Frequency Distribution - shows how many (or what percent) of the values lie below a given value, so it can be plotted with either absolute or relative frequencies.

The slope of the graph over any interval is proportional to the number of observations in that interval

Measures of central tendency - Mean

Mean - sum of observations divided by # of observations

Population mean - μ (mu), computed over all N observations
Sample mean - x̄ (x-bar), computed over the n observations in the sample

Cross-sectional - measuring across units at a specific period in time (e.g. the average ROE of 300 companies) - this is the 'cross-sectional mean'
Time-series - examining one unit over time (e.g. monthly returns over 5 years - take the average and this is called the 'time series mean')

Properties of a mean

Can be likened to the center of gravity of an object

Deviation - Distance from mean to an observation

Sum of deviations around a mean always equals 0, mathematically
Deviations indicate risk
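A quick Python sketch of the mean and its deviations, using hypothetical ROE figures for five companies (a tiny cross-sectional mean), to confirm the deviations really do sum to 0. The function names are just mine.

```python
def mean(xs):
    """Arithmetic mean: sum of observations divided by their number."""
    return sum(xs) / len(xs)

def deviations(xs):
    """Signed distance of each observation from the mean."""
    m = mean(xs)
    return [x - m for x in xs]

roe = [12.0, 8.5, 15.0, 10.5, 9.0]  # hypothetical ROEs (%) for five companies
devs = deviations(roe)
print(mean(roe))   # 11.0
print(devs)        # [1.0, -2.5, 4.0, -0.5, -2.0]
print(sum(devs))   # 0.0
```

With messier numbers, floating-point rounding can leave a tiny residue (something like 1e-15) instead of an exact 0.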

Advantages

Uses all info about size and magnitude of all observations
Easy to work with

Disadvantages

Sensitive to extreme values

Measures of central tendency - Median

Median - the middle value of the sorted data when there is an odd number of observations, or the mean of the two middle values when there is an even number
Advantages

Extreme values do not affect it

Disadvantage

Does not use all the information available - only focuses on relative position of observations
More complex to calculate - "less mathematically tractable"
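A sketch of the median rule, plus a check of the "extreme values don't affect it" point by swapping one observation for an outlier. The numbers are hypothetical.

```python
def median(xs):
    """Middle value of sorted data; mean of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def mean(xs):
    return sum(xs) / len(xs)

print(median([3, 1, 2]))        # 2
print(median([4, 1, 3, 2]))     # 2.5

# Replace one observation with an extreme value: the mean jumps, the median stays put
base = [12.0, 8.5, 15.0, 10.5, 9.0]
extreme = [12.0, 8.5, 90.0, 10.5, 9.0]
print(mean(base), median(base))         # 11.0 10.5
print(mean(extreme), median(extreme))   # 26.0 10.5
```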

Measures of central tendency - Mode

Mode - most frequently occurring value in a distribution
Distribution can have more than one mode or even no mode

One mode - unimodal
Two modes - bimodal
Three - trimodal
When all are different, no mode - no value occurs more frequently than any other

When data are grouped (e.g. in a histogram), the interval with the highest frequency is the modal interval (there can be more than one)
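A small sketch of the mode using Python's collections.Counter; the helper name `modes` is just mine. It returns every value tied for the highest count, so a bimodal distribution comes back with two values, and one where no value occurs more often than any other comes back empty.

```python
from collections import Counter

def modes(xs):
    """Return every most-frequent value; [] when no value beats any other."""
    counts = Counter(xs)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                  # e.g. all values different: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3]))        # [2]     unimodal
print(modes([1, 1, 2, 2, 3]))     # [1, 2]  bimodal
print(modes([1, 2, 3]))           # []      no mode
```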

Other Means

Weighted mean

In the context of a portfolio, weight each security's return by its share of the portfolio's value (short positions get negative weights)
Notion of a constant-weighted portfolio - the PM rebalances to keep a constant blend of stocks vs. bonds; in this case you can calculate the returns of stocks and bonds individually and then weight them
A weighted mean of forward-looking data (possible outcomes weighted by their probabilities) is called an expected value
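A sketch of the weighted mean, assuming weights that sum to 1. The 60/40 portfolio, the long/short position and the return scenarios are all hypothetical numbers made up for illustration.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i; weights are assumed to sum to 1 (shorts may be negative)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, values))

# Hypothetical 60/40 stock/bond portfolio with returns of 10% and 5%
print(weighted_mean([10.0, 5.0], [0.6, 0.4]))   # 8.0

# Hypothetical long/short: 130% long stock A (10%), 30% short stock B (4%)
print(weighted_mean([10.0, 4.0], [1.3, -0.3]))

# Expected value: probability-weighted mean of hypothetical return scenarios
print(weighted_mean([-5.0, 8.0, 20.0], [0.25, 0.5, 0.25]))  # 7.75
```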

Geometric mean

Mostly used to average growth rates over time or compute growth
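G = (X1 * X2 * ... * Xn)^(1/n), i.e. the nth root of the product of the n observations
Only defined when all Xs are ≥ 0 (so the product under the root sign can't be negative)
For returns, add 1 to each return, i.e. use (1 + r); the worst possible value is 0 (a -100% return), which makes the geometric mean of the (1 + r) terms 0, i.e. a -100% average return (makes sense - funds all go to 0 in that year and cannot recover)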
Multiply all of them together, then raise to the 1/n power

For returns, add 1 to all returns, then subtract 1 at the end

Geometric mean is always LESS or EQUAL TO arithmetic mean

Only equal when there is no variation in returns; the gap grows as the observations become more variable
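A sketch of the add 1 / multiply / root / subtract 1 recipe for returns, compared against the arithmetic mean. It assumes Python 3.8+ for math.prod; the returns are hypothetical.

```python
import math

def geometric_mean_return(returns):
    """Geometric mean return: add 1, multiply, take the nth root, subtract 1."""
    growth = [1 + r for r in returns]          # each (1 + r) must be >= 0
    if any(g < 0 for g in growth):
        raise ValueError("a return below -100% is impossible")
    return math.prod(growth) ** (1 / len(growth)) - 1

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

volatile = [0.50, -0.50]   # hypothetical: +50% then -50%
steady = [0.10, 0.10]      # hypothetical: no variation

print(arithmetic_mean(volatile), geometric_mean_return(volatile))  # 0.0 vs about -0.134
print(arithmetic_mean(steady), geometric_mean_return(steady))      # both about 0.10
print(geometric_mean_return([-1.0, 0.20]))                         # -1.0: wiped out
```

The volatile pair shows the gap: +50% then -50% leaves you with 75% of your money, so the true average growth is negative even though the arithmetic mean is 0.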

Harmonic Mean

Special type of weighted mean in which each observation's weight is inversely proportional to its size; used only in a few applications
Take the reciprocal of each observation, sum them, divide by n, and then take reciprocal of that
Example: Cost averaging - periodic investment of a fixed amount of money

Ex: you buy $1,000 of a security one month at $10 per share and the next month at $15. What is the average price paid per share?
(1/10 + 1/15) = (3/30 + 2/30) = 5/30
Divide by 2 -> 5/60
Reciprocal -> 60/5 = 12
12 is the harmonic mean
Only works when investing the same amount each time
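Checking the cost-averaging example in Python: the harmonic mean of the two prices should match total dollars spent divided by total shares bought. The helper name is mine; the $1,000 and $10/$15 figures come from the example.

```python
def harmonic_mean(xs):
    """Reciprocal of the average of reciprocals (all xs must be positive)."""
    return len(xs) / sum(1 / x for x in xs)

prices = [10.0, 15.0]   # price per share in month 1 and month 2
budget = 1000.0         # same dollar amount invested each month

shares = sum(budget / p for p in prices)        # 100 + 66.67 shares in total
avg_cost = budget * len(prices) / shares        # total dollars spent / total shares

arithmetic = sum(prices) / len(prices)          # 12.5
geometric = (prices[0] * prices[1]) ** 0.5      # about 12.25 (two observations)

print(harmonic_mean(prices), avg_cost)          # both about 12.0
print(harmonic_mean(prices) <= geometric <= arithmetic)  # True
```

Using the arithmetic mean (12.5) here would overstate the cost, because the cheaper month bought more shares.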

The harmonic mean is less than or equal to the geometric mean (which in turn is less than or equal to the arithmetic mean), assuming all observations are positive

Calculating quantiles (aka Fractiles)

Quartiles - divide the data into quarters, e.g. 25% of observations lie at or below the first quartile
Quintiles - fifths (20%), Deciles - tenths (10%), Percentiles - hundredths (1%)
Estimating a given percentile:

First locate the position of the percentile within the data sorted in ascending order
Next determine (or estimate) the value associated with that position
Ly = (n + 1) * (y/100)

This gives the position of the yth percentile in the sorted data, where n is the number of observations
Might not be a whole number

Ex. we have 16 observations and are seeking the 75th percentile, so Ly = (16 + 1) * (75/100) = 12.75, which falls between the 12th and 13th observations

The .75 in 12.75 means the percentile lies 75% of the way from observation 12 to observation 13

We use linear interpolation to find this

Simply find the distance from observation 12 to observation 13, then multiply that distance by 0.75 and add that to observation 12
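A sketch of the location formula plus linear interpolation. Clamping to the first or last observation when Ly falls outside 1..n is my own assumption. As a cross-check, Python's statistics.quantiles with method="exclusive" (3.8+) uses the same (n + 1) positioning, so its third quartile should match the 75th percentile here. The data are hypothetical.

```python
import statistics

def percentile(data, y):
    """yth percentile via L_y = (n + 1) * y / 100 and linear interpolation."""
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * y / 100          # 1-based position, may be fractional
    if loc <= 1:                     # assumption: clamp below the first observation
        return xs[0]
    if loc >= n:                     # assumption: clamp above the last observation
        return xs[-1]
    i = int(loc)                     # whole part: observation number i
    frac = loc - i                   # fraction of the way toward observation i + 1
    lower, upper = xs[i - 1], xs[i]  # convert 1-based positions to 0-based indexes
    return lower + frac * (upper - lower)

data = [2 * k for k in range(1, 17)]   # 16 hypothetical observations: 2, 4, ..., 32
print(percentile(data, 75))            # 25.5 (L_75 = 12.75: 24 + 0.75 * (26 - 24))
print(statistics.quantiles(data, n=4, method="exclusive")[2])  # 25.5 as well
```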

Measures of Dispersion

Dispersion - variability around the central tendency - measure of risk

Four measures of dispersion

Range - difference between max and min
Mean Absolute Deviation - calculate the mean, then the absolute distance of each observation from it, and average these
Population Variance

Average of the squared deviations from the population mean (divide by N)

Standard Deviation

Square root of the population variance
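The four population measures in one small Python sketch (population version, dividing by N). The data set is a hypothetical one chosen so the answers come out round.

```python
import math

def population_dispersion(xs):
    """Range, mean absolute deviation, population variance (divide by N), std dev."""
    n = len(xs)
    mu = sum(xs) / n
    rng = max(xs) - min(xs)
    mad = sum(abs(x - mu) for x in xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return rng, mad, var, math.sqrt(var)

data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical population, mean = 5
print(population_dispersion(data))    # (7, 1.5, 4.0, 2.0)
```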

At the sample level, things are slightly different

Variance gets divided by sample size minus 1 (n-1)
Reflects losing 1 degree of freedom: because the sample has already been used once to calculate the sample mean, only n-1 of the deviations from that mean are free to vary

Different from the population case because there you know the true mean, whereas here you are estimating it

Standard deviation is again the root of the variance (variance = sdev^2)
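The same idea at the sample level, dividing by n - 1. Python's statistics module has variance (n - 1) and pvariance (N), which makes a handy cross-check; the data are the same hypothetical numbers as a population example would use, now treated as a sample.

```python
import math
import statistics

def sample_variance(xs):
    """Sample variance: squared deviations from x-bar divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    return math.sqrt(sample_variance(xs))

sample = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical sample, x-bar = 5
print(sample_variance(sample))        # 32 / 7, about 4.571
print(statistics.variance(sample))    # standard-library cross-check (also n - 1)
print(statistics.pvariance(sample))   # the divide-by-N population version, for contrast
```

Dividing by 7 instead of 8 makes the sample variance a bit bigger, which compensates for measuring deviations from an estimated mean.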

Semivariance, Semideviation and Related

Analysts developed these measures because investors are mainly concerned with downside risk (ok...)
Semivariance - average squared deviation below the mean

Calculate the mean and keep only the observations that are less than or equal to it
Compute the sum of squared deviations from the mean using only that subsample
Divide by (n* - 1), where n* is the number of observations in that subsample

Semideviation - sqrt of semivariance
Slight variant - the 'target semivariance'

Define a target B (e.g. a 10% return) and use it instead of the mean as your basis
Calculate squared deviations from the target (not the mean), using only the observations at or below the target

When returns are symmetric around the mean, semivariance is approximately equal to the variance
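A sketch of semivariance, semideviation and target semivariance. Two assumptions worth flagging: I divide by (n* - 1), where n* is the number of observations at or below the mean (or target), and only those observations are counted. The returns and the 10% target are hypothetical.

```python
import math

def semivariance(xs):
    """Average squared deviation below the mean, dividing by (n* - 1)."""
    m = sum(xs) / len(xs)
    below = [x for x in xs if x <= m]          # subsample at or below the mean
    if len(below) < 2:
        raise ValueError("need at least two observations at or below the mean")
    return sum((x - m) ** 2 for x in below) / (len(below) - 1)

def semideviation(xs):
    return math.sqrt(semivariance(xs))

def target_semivariance(xs, target):
    """Same idea, but deviations are measured from a target B instead of the mean."""
    below = [x for x in xs if x <= target]     # only outcomes at or below the target
    if len(below) < 2:
        raise ValueError("need at least two observations at or below the target")
    return sum((x - target) ** 2 for x in below) / (len(below) - 1)

returns = [-10.0, -2.0, 4.0, 8.0, 15.0]  # hypothetical annual returns (%), mean = 3
print(semivariance(returns))               # 194.0: (-13)^2 + (-5)^2, divided by 1
print(semideviation(returns))
print(target_semivariance(returns, 10.0))  # target B = 10%
```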

Chebyshev's Inequality

Uses standard deviation as a measure of dispersion
For any distribution with finite variance, the proportion of observations within k standard deviations of the arithmetic mean is at least 1 - 1/k^2, for all k > 1

Ex: within 2 standard deviations, there must be at least 1 - 1/(2^2) = 1 - 1/4 = 75%

So at least 75% of observations must lie within 2 standard deviations, and at least 1 - 1/9 ≈ 88.9% (roughly 89%) within 3

This does NOT depend on the data being normally distributed - which is why this is so useful

Only gives the minimum that must be within the band, says nothing of maximum
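A quick sanity check of Chebyshev's bound on a deliberately skewed, hypothetical data set (nothing like a normal distribution). It uses the population standard deviation; the function names are mine.

```python
def chebyshev_min_share(k):
    """Minimum share of observations within k std devs of the mean: 1 - 1/k^2."""
    if k <= 1:
        raise ValueError("the inequality is stated for k > 1")
    return 1 - 1 / k ** 2

def share_within(xs, k):
    """Actual share of observations within k population std devs of the mean."""
    n = len(xs)
    mu = sum(xs) / n
    sd = (sum((x - mu) ** 2 for x in xs) / n) ** 0.5
    return sum(1 for x in xs if abs(x - mu) <= k * sd) / n

# Hypothetical, heavily skewed data: 19 zeros and one big outlier
skewed = [0.0] * 19 + [100.0]
for k in (2, 3):
    print(k, chebyshev_min_share(k), share_within(skewed, k))
```

The actual share (0.95 here) beats the minimum for both k = 2 and k = 3; the bound is only a floor.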

Brain is full. Going to take a break.

5:30 pm

About 2.75 hours

Wednesday, September 12, 2012