Interpretation of a scatter-gram
Definition of simple linear correlation
Interpretation of the Pearson linear correlation coefficient
Words and Terms:
Correlation, correlation coefficient
Correlation, linear correlation
Correlation, negative correlation
Correlation, perfect correlation
Correlation, positive correlation
Relation, linear relation
Relation, non-linear relation
Correlation analysis is used as a preliminary data analysis before more sophisticated methods are applied. Correlation
describes the relation between two random variables (a bivariate relation) measured on the same person or object, with no prior evidence
of inter-dependence. Correlation indicates only association; the association is not necessarily causal. The objectives of correlation
analysis are to describe the relation between x and y, to predict y if x is known, to predict x if y is known,
to study trends, and to study the effect of a third factor on the relation between x and y.
The first step in correlation analysis is to inspect a scatter plot of the data to obtain a visual impression of
the data layout and to identify outliers. The next step is to compute Pearson’s coefficient of correlation (the product-moment
correlation), r, which is the commonest statistic for linear correlation. Its formula is complicated, but it is easily computed by software.
It is essentially a measure of how closely the data points scatter around a straight line.
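As an illustration of how r is computed, the following is a minimal Python sketch that works directly from the product-moment definition (the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations). The function name `pearson_r` and the height and weight values are hypothetical and chosen only for demonstration; they do not come from any real data set.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation computed from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the two means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements on the same five people
heights = [150, 160, 165, 170, 180]   # x
weights = [52, 58, 63, 70, 77]        # y

r = pearson_r(heights, weights)
print(round(r, 3))  # about 0.99, a strong positive linear association
```

In practice a statistical package would normally be used; the sketch only makes the arithmetic behind r visible.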
PEARSON'S CORRELATION COEFFICIENT, r
Inspecting a scatter-gram helps interpret the coefficient. The correlation is not interpretable for small samples.
Values of r (ignoring sign) of 0.25 - 0.50 indicate a fair degree of association. Values of 0.50 - 0.75 indicate a moderate to good
relation. Values above 0.75 indicate a good to excellent relation. A value of r = 0 indicates either no correlation or that the two
variables are related in a non-linear way. In perfect positive correlation, r = 1. In perfect negative correlation, r = -1.
When there is no correlation and r = 0, the scatter plot is roughly circular. The linear correlation coefficient is not used when
the relation is non-linear, when outliers exist, when the observations are clustered in 2 or 4 groups, or when one of the variables is fixed in advance.
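The special values of r described above can be checked with a small Python sketch. It uses made-up data: one perfectly increasing straight line, one perfectly decreasing straight line, and one symmetric curve (y equal to x squared) in which y depends completely on x and yet r is 0. The helper `pearson_r` is a hypothetical name for a direct implementation of the product-moment definition.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]

y_rising = [2 * v + 1 for v in x]   # exact straight line, positive slope
y_falling = [-v for v in x]         # exact straight line, negative slope
y_curve = [v ** 2 for v in x]       # exact but non-linear (U-shaped) relation

r_pos = pearson_r(x, y_rising)      # perfect positive correlation
r_neg = pearson_r(x, y_falling)     # perfect negative correlation
r_curve = pearson_r(x, y_curve)     # zero, although y is fully determined by x

print(r_pos, r_neg, r_curve)
```

The third case shows why a scatter plot should be inspected before r is interpreted: a value of 0 rules out a linear relation only, not a relation of some other shape.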
NON-PARAMETRIC CORRELATION ANALYSIS
The Spearman rank correlation coefficient is used for small data sets
for which the Pearson linear correlation coefficient would be invalid.
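One common way to obtain the Spearman coefficient is to replace each observation by its rank (tied values share the average of their ranks) and then apply the Pearson formula to the ranks. The Python sketch below follows that approach under the assumption that average ranks are an acceptable treatment of ties; the function names `average_ranks`, `pearson_r` and `spearman_rho` and the data are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson product-moment correlation computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical small data set: y always rises with x, but not along a line
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

rho = spearman_rho(x, y)   # 1.0, because the ordering agrees perfectly
r = pearson_r(x, y)        # below 1, because the relation is curved
print(rho, round(r, 3))
```

Because only the ordering of the values enters the calculation, the rank coefficient is less affected by extreme values than r is.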