Suppose I have three dogs with the following attributes:
    > data
      Age Height
    1   1     15
    2   3     20
    3   5     17
Covariance
Covariance measures how two variables vary together; in our case, age and height. A positive covariance means the two tend to increase together, while a negative covariance means one tends to decrease as the other increases.
$$Cov_{x,y}=\frac{\sum_{i=1}^n(x_{i}-\mu_{x})(y_{i}-\mu_{y})}{n}$$
$$\begin{align*}
&i_{1}=(1-3)(15-17.3)=4.7\\
&i_{2}=(3-3)(20-17.3)=0\\
&i_{3}=(5-3)(17-17.3)=-0.7\\
\end{align*}$$
$$Cov_{x,y}=\frac{4.7+0-0.7}{3}\approx 1.3$$
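To double-check the arithmetic, here is a minimal Python sketch of the population formula. It assumes the heights used in the worked steps (15, 20, 17); the function name `population_covariance` is only illustrative.

```python
# Ages and heights of the three dogs, as used in the worked steps.
ages = [1, 3, 5]
heights = [15, 20, 17]


def population_covariance(xs, ys):
    """Average product of deviations from the means, dividing by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


cov_pop = population_covariance(ages, heights)
print(round(cov_pop, 1))  # 1.3
```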
Correlation
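The issue with covariance is that its sign tells us whether the variables are positively or negatively related, but its magnitude depends on the units of the variables, so it cannot tell us how strong the relationship is. Correlation fixes this by dividing the covariance by both standard deviations, which gives a value between -1 and 1, where 1 is a perfect positive correlation and -1 is a perfect negative correlation.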
$$\begin{align*}
&\rho_{x,y}=\frac{Cov_{x,y}}{\sigma_{x}\sigma_{y}} \\
&\rho_{x,y}=\frac{1.3}{(1.6)(2.1)}\\
&\rho_{x,y}=0.4
\end{align*}$$
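The standard deviations 1.6 and 2.1 above are population standard deviations. As a rough check, the sketch below recomputes them and the correlation with Python's `statistics.pstdev`, again assuming heights of 15, 20 and 17; the variable names are my own.

```python
import statistics

ages = [1, 3, 5]
heights = [15, 20, 17]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(heights) / n

# Population covariance: divide the summed products by n.
cov_pop = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, heights)) / n

# Population standard deviations (also divide by n under the square root).
sigma_x = statistics.pstdev(ages)
sigma_y = statistics.pstdev(heights)

rho = cov_pop / (sigma_x * sigma_y)
print(round(sigma_x, 1), round(sigma_y, 1), round(rho, 1))  # 1.6 2.1 0.4
```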
Population or Sample
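So, are these calculations different for a population and a sample, as they are for standard deviation (SD)? The answer is yes. Let's say I have 5 dogs and would like to use these 3 as a sample to represent all of them. The covariance then divides by n-1 instead of n, and the correlation uses the sample standard deviations: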
Covariance
$$Cov_{x,y}=\frac{\sum_{i=1}^n(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1}$$
$$\begin{align*}
&i_{1}=(1-3)(15-17.3)=4.7\\
&i_{2}=(3-3)(20-17.3)=0\\
&i_{3}=(5-3)(17-17.3)=-0.7
\end{align*}$$
$$Cov_{x,y}=\frac{4.7+0-0.7}{2}=2$$
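The only change from the population version is the divisor. A small sketch, assuming the same three dogs (heights 15, 20, 17) and an illustrative function name:

```python
ages = [1, 3, 5]
heights = [15, 20, 17]


def sample_covariance(xs, ys):
    """Sum of deviation products divided by n - 1 instead of n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


cov_sample = sample_covariance(ages, heights)
print(round(cov_sample, 1))  # 2.0
```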
Correlation
$$\begin{align*}
&r_{x,y}=\frac{Cov_{x,y}}{s_{x}s_{y}}\\
&r_{x,y}=\frac{2}{(2.0)(2.5)}\\
&r_{x,y}=0.4
\end{align*}$$
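Both versions give 0.4, and that is not a coincidence: the n-1 in the sample covariance cancels against the n-1 inside the two sample standard deviations, just as n cancels in the population version. The sketch below illustrates this with `statistics.stdev` and `statistics.pstdev`; the `correlation` helper and its `sample` flag are hypothetical names for this example, and the heights are assumed to be 15, 20 and 17.

```python
import math
import statistics

ages = [1, 3, 5]
heights = [15, 20, 17]


def correlation(xs, ys, sample=True):
    """Covariance divided by the product of the matching standard deviations."""
    n = len(xs)
    divisor = n - 1 if sample else n
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / divisor
    # Use the standard deviation that matches the chosen divisor.
    sd = statistics.stdev if sample else statistics.pstdev
    return cov / (sd(xs) * sd(ys))


r_sample = correlation(ages, heights, sample=True)
rho_pop = correlation(ages, heights, sample=False)
print(round(r_sample, 1), round(rho_pop, 1))  # 0.4 0.4
print(math.isclose(r_sample, rho_pop))  # True
```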