Suppose $ \{X_1, X_2, \ldots\} $ is a sequence of i.i.d. random variables with $ E[X_i] = \mu $ and $ \mathrm{Var}(X_i) = \sigma^2 $. Let $ S_n = \frac{X_1 + X_2 + \cdots + X_n}{n} $. Then, for large $ n $, $ S_n $ is approximately distributed as $ N(\mu, \sigma^2 / n) $.
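A quick simulation makes this concrete (a minimal sketch, not part of the original notes; it assumes Exponential(1) samples, so $ \mu = 1 $ and $ \sigma^2 = 1 $): the histogram of sample means should track the $ N(\mu, \sigma^2 / n) $ density.

import numpy as np
import matplotlib.pyplot as plt

# Sketch: sample means of n i.i.d. Exponential(1) variables (mu = 1, sigma^2 = 1)
# are approximately N(mu, sigma^2 / n) for large n.
rng = np.random.default_rng(0)
n, trials = 50, 10000
means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

plt.hist(means, bins=60, density=True, alpha=0.6, label="sample means")
x = np.linspace(means.min(), means.max(), 200)
s = 1.0 / np.sqrt(n)                              # std of the sample mean
plt.plot(x, np.exp(-(x - 1.0)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi)), label="N(1, 1/n)")
plt.legend()
plt.show()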
$ P[X = x] = C x^{- \alpha} \quad \quad x > x_{min} $
where,
$ C = (\alpha - 1) x_{min}^{(\alpha - 1)} $ and $ \alpha > 1 $
Q1. What happens to expectation when $\alpha < 2$ ?
Q2. What happens to variance when $ \alpha < 3$ ?
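A simulation hints at the answers to Q1 and Q2 (a minimal sketch, not from the notes; it assumes $ x_{min} = 1 $ and uses inverse-transform sampling): watch how the running sample mean behaves as $ \alpha $ varies.

import numpy as np
import matplotlib.pyplot as plt

# Sketch: inverse-transform sampling from the power law above with x_min = 1 (assumed).
# CDF: F(x) = 1 - x**(-(alpha - 1))  =>  x = (1 - u)**(-1 / (alpha - 1)) for u ~ Uniform(0, 1)
def power_law_samples(alpha, size, rng):
    u = rng.random(size)
    return (1 - u) ** (-1.0 / (alpha - 1))

rng = np.random.default_rng(0)
n = 100000
for alpha in [1.5, 2.5, 3.5]:
    x = power_law_samples(alpha, n, rng)
    plt.plot(np.cumsum(x) / np.arange(1, n + 1), label=f"alpha = {alpha}")
plt.xlabel("number of samples")
plt.ylabel("running sample mean")
plt.legend()
plt.show()

For $ \alpha = 1.5 $ the running mean never settles down, while for $ \alpha = 2.5 $ it settles only slowly and with large jumps.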
$$ P[\#\;\text{heads} = 80 \;\text{out of}\; 100] = \binom{100}{80} \, p^{80} \cdot (1 - p)^{20} \;\propto\; p^{80} \cdot (1 - p)^{20} $$
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

p = np.arange(0, 1, 1/2000)        # grid of candidate values for p

# Likelihood of 8 heads and 2 tails (10 tosses)
L = (p**8) * ((1-p)**2)
plt.plot(p, L)
plt.show()

# Likelihood of 800 heads and 200 tails (1000 tosses): same peak near p = 0.8, but much narrower
L = (p**800) * ((1-p)**200)
plt.plot(p, L)
plt.show()
Wikipedia: https://en.wikipedia.org/wiki/Bias_of_an_estimator
Definition:
Suppose we have a statistical model, parameterized by a real number $\theta$, giving rise to a probability distribution for observed data, $ P_\theta(X) = P(X\mid\theta) $, and a statistic $\hat\theta$ which serves as an estimator of $\theta$ based on any observed data $x$. That is, we assume that our data follow some unknown distribution $ P(X\mid\theta) $ (where $\theta$ is a fixed constant that is part of this distribution, but is unknown), and then we construct some estimator $ \hat\theta $ that maps observed data to values that we hope are close to $\theta$. The bias of $ \hat\theta $ relative to $ \theta $ is defined as
$ \operatorname{Bias}_\theta[\,\hat\theta\,] = \operatorname{E}_{X\mid\theta}[\,\hat{\theta}\,]-\theta = \operatorname{E}_{X\mid\theta}[\, \hat\theta - \theta \,], $
where $ \operatorname{E}_{X\mid\theta} $ denotes expected value over the distribution $ P(X\mid\theta) $, i.e. averaging over all possible observations $ x $. The second equation follows since $\theta$ is measurable with respect to the conditional distribution $ P(X\mid\theta) $.
An estimator is said to be **unbiased** if its bias is equal to zero for all values of the parameter $\theta$.
Let $ \hat{\theta} $ be an estimate for a parameter $ \theta $.
Bias: $ E_{X|\theta}[\hat{\theta} - \theta] = E_{X|\theta}[\hat{\theta}] - \theta$
Variance: $ E_{X|\theta}[(\hat{\theta} - E_{X|\theta}[\hat{\theta}])^2] $
Unbiased estimators: Bias = 0 i.e. $ E_{X|\theta}[\hat{\theta}] = \theta$
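As a concrete illustration (a sketch, not from the notes; the normal distribution and sample size are arbitrary choices): the sample variance with a $ 1/n $ factor is a biased estimator of $ \sigma^2 $, while the $ 1/(n-1) $ version is unbiased.

import numpy as np

# Sketch: estimate the bias of two variance estimators by simulation.
rng = np.random.default_rng(0)
mu, sigma2, n, trials = 0.0, 4.0, 10, 100000

samples = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
var_mle = samples.var(axis=1, ddof=0)        # divides by n (biased)
var_unbiased = samples.var(axis=1, ddof=1)   # divides by n - 1 (unbiased)

print("E[1/n estimator]     ~", var_mle.mean())        # close to sigma2 * (n-1)/n = 3.6
print("E[1/(n-1) estimator] ~", var_unbiased.mean())   # close to sigma2 = 4.0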
Let $ x_1, x_2, \ldots, x_n $ be a sample from a normal distribution with parameters $ \mu $ and $ \sigma^2 $. Derive the maximum likelihood estimates of $ \mu $ and $ \sigma^2 $.
[Hint: log transform ]
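Once derived, the answer can be checked numerically (a sketch, not part of the notes; the synthetic data and the use of scipy.optimize.minimize are my own choices): numerically maximizing the log-likelihood should agree with the closed forms $ \hat{\mu} = \bar{x} $ and $ \hat{\sigma}^2 = \frac{1}{n} \sum_i (x_i - \bar{x})^2 $.

import numpy as np
from scipy.optimize import minimize

# Sketch: numerically maximize the normal log-likelihood and compare with
# the closed-form MLEs  mu_hat = mean(x),  sigma2_hat = mean((x - mean(x))**2).
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=500)

def neg_log_likelihood(params):
    mu, log_sigma = params                  # parameterize sigma on the log scale
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (x - mu)**2 / sigma**2)

res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1])**2

print(mu_hat, x.mean())              # both ~ the sample mean
print(sigma2_hat, x.var(ddof=0))     # both ~ (1/n) * sum (x_i - mean)^2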
$$ P[X = i] = \theta_{i} \quad \quad (2) $$
where, $ i = 1, 2, 3, 4, 5, 6 $ and $ \sum_{i = 1}^{6} \; \theta_{i} = 1 $
A mathematically compact way to write (2) is
$$ P[X = x] = \prod_{i=1}^{6} \theta_{i} ^ {I(i=x)} $$
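A quick check of what maximizing this likelihood gives (a sketch, not from the notes; the true $ \theta $ values and the number of rolls below are made up): the empirical frequencies $ \hat{\theta}_i = \frac{\#\{x_j = i\}}{n} $ beat, for example, the uniform guess.

import numpy as np

# Sketch: die rolls under P[X = i] = theta_i, and the empirical-frequency
# estimate theta_hat_i = count(i) / n.
rng = np.random.default_rng(0)
true_theta = np.array([0.1, 0.1, 0.2, 0.2, 0.2, 0.2])
rolls = rng.choice(np.arange(1, 7), size=1000, p=true_theta)

counts = np.bincount(rolls, minlength=7)[1:]      # counts for faces 1..6
theta_hat = counts / counts.sum()
print(theta_hat)                                  # close to true_theta

# Log-likelihood of the data under any theta, using the indicator form of (2)
def log_likelihood(theta):
    return np.sum(counts * np.log(theta))

print(log_likelihood(theta_hat) >= log_likelihood(np.full(6, 1/6)))   # True: beats the uniform guess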
Q1. What is the support of a 2D Gaussian distribution?
Q2. Write the expression for $ P[X = (x, y)] $ for a 2D Gaussian distribution with mean $ \mu = (\mu_1, \mu_2) $ and $ \Sigma = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} $.
Q3. What do the terms $ a_{11}, a_{12}, a_{21}, a_{22} $ represent?
Q4. Is the matrix $ \Sigma $ symmetric?
Q5. When the matrix $ \Sigma $ is diagonal, list some of the properties of the distribution.
Q6. Given observations $ x_1, x_2, \ldots, x_n $ from an $ N $-dimensional Gaussian distribution with parameters $ \mu $ and $ \Sigma $, find the MLE for $ \mu $ and $ \Sigma $. (First obtain the result for $ N = 2 $ and then generalize.)
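A numerical sanity check for Q6 (a sketch, not from the notes; the true parameters below are made up): the sample mean and the $ \frac{1}{n} $-scaled sample covariance recover $ \mu $ and $ \Sigma $.

import numpy as np

# Sketch: MLE for an N-dimensional Gaussian is the sample mean and the
# (1/n)-scaled sample covariance  Sigma_hat = (1/n) * sum (x_i - mu_hat)(x_i - mu_hat)^T.
rng = np.random.default_rng(0)
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=5000)   # shape (n, 2)

mu_hat = X.mean(axis=0)
centered = X - mu_hat
Sigma_hat = centered.T @ centered / X.shape[0]                 # note 1/n, not 1/(n-1)

print(mu_hat)      # ~ mu_true
print(Sigma_hat)   # ~ Sigma_true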
The formulation.
Probabilistic Interpretation.
Fit a line whose equation is of the form $ \hat{Y} = a + b X $
Minimise $ L = \frac{1}{n} \sum_i d_i^2 = \frac{1}{n} \sum_i (Y_i - \hat{Y}_i)^2 $
where the probabilistic model is $ Y_i = a + b X_i + \epsilon_i $ with $$ \epsilon_i \sim N(0, \sigma^2) $$
[Pause after you equate $ \frac{\partial}{\partial a} LL $ and $ \frac{\partial}{\partial b} LL $ to $ 0 $, where $ LL $ is the log-likelihood.]
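For later reference, a minimal numerical sketch (synthetic data, assumed here, not from the notes) of the slope and intercept that solving those two equations produces, checked against np.polyfit:

import numpy as np

# Sketch: minimize L = (1/n) * sum (Y_i - a - b*X_i)^2 on synthetic data.
# The closed form below solves dL/da = 0 and dL/db = 0.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0, 10, size=n)
Y = 2.0 + 0.5 * X + rng.normal(0, 1.0, size=n)    # epsilon ~ N(0, sigma^2)

b_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
a_hat = Y.mean() - b_hat * X.mean()

print(a_hat, b_hat)                # ~ (2.0, 0.5)
print(np.polyfit(X, Y, deg=1))     # same line: returns [slope, intercept]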
... To be continued in next class