In this page we will look at some concepts behind Pearson correlation. Pearson correlation is a way to estimate the strength of the linear relationship between two variables, one of which is treated as dependent and the other as independent. Let us start by looking at a scatter plot.
What do we see in the above plot?
- There are points plotted in green. This random data set was generated in Mathematica.
- There is a circle of radius 14 units. It is there just for reference.
- There are reference X and Y axes.
- There are numeric values of the correlation and the covariance shown at the top of the graph.
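For readers who want to reproduce the two numbers shown on the plot, here is a minimal sketch of how covariance and Pearson correlation can be computed by hand. It is written in Python rather than Mathematica, the function names `covariance` and `pearson` are my own, and the sample (n - 1) denominator is an assumption about the convention used; the small data set is made up purely for illustration.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equal-length lists (n - 1 denominator, an assumption)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # covariance of a variable with itself is its variance
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

# Illustrative data only, not the data behind the plots in this article.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

print("covariance :", covariance(xs, ys))   # 1.5
print("correlation:", pearson(xs, ys))      # about 0.77
```

The covariance carries the units of both variables, while the correlation is the same quantity rescaled to lie between -1 and 1. That rescaling is why the two numbers can react differently when the points are manipulated.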
Our goal in this article is to observe what happens to the covariance and the correlation when the data points are manipulated. Let us start our observations.
Observation 01: Changing the scatter of the points. I have included in the program a way to control the scatter of the data points. We will observe that the Pearson correlation increases as the scatter is reduced, while the covariance stays more or less the same as the points collectively move closer to a line.
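The same observation can be tried numerically. The sketch below is not the Mathematica program used for the plots; it assumes a simple model in which y equals x plus Gaussian noise, and the helper `noisy_line` and its `noise_sd` parameter are hypothetical names introduced only for this illustration.

```python
import math
import random

def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def noisy_line(n, noise_sd, seed=0):
    """Hypothetical generator: points on y = x with Gaussian scatter of size noise_sd."""
    rng = random.Random(seed)  # fixed seed so every scatter level reuses the same x values
    xs = [rng.uniform(-10, 10) for _ in range(n)]
    ys = [x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

results = {}
for noise_sd in (8.0, 4.0, 1.0):  # high scatter first, then lower
    xs, ys = noisy_line(2000, noise_sd)
    results[noise_sd] = (covariance(xs, ys), pearson(xs, ys))
    print(f"scatter={noise_sd:4.1f}  covariance={results[noise_sd][0]:6.2f}  "
          f"correlation={results[noise_sd][1]:.3f}")
```

Under this model the covariance is driven mostly by the spread of x, so it hovers around the same value at every scatter level, while the correlation climbs toward 1 as the noise shrinks.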
The following set of images shows the Pearson correlation and covariance for various levels of scatter in the data. We will start with high scatter and slowly go to lower levels of scatter.
Observation 02: We will now keep the level of scatter about the same and see what happens to the correlation when we change the slope, or the angle, of the points with respect to the axes.
From the series of images, we see that whatever the direction in which the points progress, the correlation is NOT DEPENDENT on the slope of the points. It instead depends on how strongly one variable (the dependent variable) varies with the other (the independent variable).
As I have mentioned before, the goal of this article is observation and inference. This fact is not so apparent when there is too much variation in the data. It becomes apparent when the data points are closer together (less scattered).
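A rough numerical version of this observation is sketched below. It assumes the points are spread along a line through the origin and scattered perpendicular to it, and that changing the slope means rotating the whole cloud. The helper `rotated_cloud` and its parameters are hypothetical, chosen only for this illustration, and are not taken from the original program.

```python
import math
import random

def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def rotated_cloud(n, angle_deg, scatter_sd, spread_sd=5.0, seed=0):
    """Hypothetical generator: a cloud along a line at angle_deg to the X axis."""
    rng = random.Random(seed)
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    xs, ys = [], []
    for _ in range(n):
        t = rng.gauss(0, spread_sd)   # position along the line
        p = rng.gauss(0, scatter_sd)  # scatter perpendicular to the line
        xs.append(t * c - p * s)      # rotate (t, p) by the chosen angle
        ys.append(t * s + p * c)
    return xs, ys

table = {}
for scatter_sd in (3.0, 0.2):         # loose cloud first, then a tight one
    for angle in (20, 45, 70):
        xs, ys = rotated_cloud(2000, angle, scatter_sd)
        table[(scatter_sd, angle)] = (covariance(xs, ys), pearson(xs, ys))
        print(f"scatter={scatter_sd:3.1f}  angle={angle:2d}  "
              f"covariance={table[(scatter_sd, angle)][0]:6.2f}  "
              f"correlation={table[(scatter_sd, angle)][1]:.3f}")
```

With the tight cloud, the correlation stays close to 1 at every angle even though the covariance changes with the angle. With the loose cloud, the correlation still moves noticeably with the angle, which is why the effect is hard to see when the data is too scattered.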
The following are two conditions where the correlation should be zero (if the data is perfectly correlated, which is the ideal case).
Please let me know.