Correlation is a statistical technique for determining the extent to which variations in the values of one variable are associated with variations in the values of another.
For example, if we found that relatively high values of one variable tended to be associated with relatively high values of another, and that relatively low values also tended to occur together, we would say that the variables were closely correlated or associated.
Statisticians have made this notion precise and have devised methods of measuring the degree of association, the most frequently used of which is the correlation coefficient (strictly, the product-moment correlation coefficient). This coefficient measures the degree of association on a scale running from −1 to +1 inclusive. If the coefficient, measured for a set of pairs of values of two variables, is negative, relatively high values of one variable tend to be associated with relatively low values of the other, and vice versa; i.e. there is an inverse association. If the coefficient is positive, relatively high values of both variables tend to occur together, as do relatively low values. (Throughout this explanation, 'relatively high' and 'relatively low' mean 'above average' and 'below average' respectively.)
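In symbols, for n pairs (x, y) with means x̄ and ȳ, the product-moment coefficient is r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]. The deviations from the means correspond to the 'relatively high' (positive) and 'relatively low' (negative) values described above. A minimal Python sketch of this formula follows; the function name pearson_r and the sample pairs are illustrative assumptions, not part of the original entry.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of paired values xs, ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations from the mean: positive = 'relatively high', negative = 'relatively low'
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-products: positive when high goes with high and low with low
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    # Dividing by the product of the spreads scales the result into [-1, +1]
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))   # ≈ 0.775 (positive association)
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0 (perfect inverse association)
```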
The actual value of the coefficient tells us how strong the association is. A value close to +1 tells us that relatively high values of one variable are very often associated with relatively high values of the other, and similarly for relatively low values. A value close to −1 tells us that relatively high values of one variable are very often associated with relatively low values of the other, and vice versa. A value close to zero, whether positive or negative, indicates that relatively high values of one variable are about as often associated with relatively high values of the other as with relatively low ones. Thus, the association becomes stronger as the coefficient moves from zero towards ±1.
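The same coefficient can be written as the average product of standardised values (z-scores: deviations from the mean divided by the standard deviation), which makes the reading of its size explicit. Pairs on the same side of their averages push r towards +1, pairs on opposite sides push it towards −1, and a balanced mixture leaves it near zero. The sketch below assumes population standard deviations (statistics.pstdev) and uses three small invented data sets; the function name r_from_z_scores is hypothetical.

```python
from statistics import mean, pstdev

def r_from_z_scores(xs, ys):
    """Correlation as the average product of paired z-scores (hypothetical helper)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # population standard deviations
    # z > 0 means 'above average', z < 0 means 'below average'
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    # Same-side pairs add to the total, opposite-side pairs subtract from it
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Invented illustrative data
x = [1, 2, 3, 4, 5, 6]
close_to_plus_one = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
close_to_minus_one = list(reversed(close_to_plus_one))
near_zero = [3, 1, 4, 4, 1, 3]  # high x goes with high y as often as with low y

for label, y in [("close to +1", close_to_plus_one),
                 ("close to -1", close_to_minus_one),
                 ("near zero", near_zero)]:
    print(f"{label:>12}: r = {r_from_z_scores(x, y):+.3f}")
```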
The usefulness of correlation analysis lies in testing hypotheses about the relationships between variables. For example, we could put forward the following hypotheses: (a) the higher the household income, the higher the household expenditure; (b) the higher the rate of interest, the lower the level of business investment; (c) the greater the rate of cigarette smoking, the greater the incidence of lung cancer; and (d) the larger the family, the shorter the duration of each child's full-time education (given the statutory minimum). These hypotheses could be tested by measuring the variables for, respectively, households; years; groups of smokers (classified by consumption) and non-smokers; and families, and then calculating the correlation coefficients. These would show how closely the variables were associated in practice, and hence how confident we could be that the hypotheses were correct (or, at least, not clearly wrong).
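The entry does not specify how that confidence is to be judged. One common approach, assumed here rather than taken from the text, is the t statistic t = r√(n − 2)/√(1 − r²), compared with tables of Student's t at n − 2 degrees of freedom; it presumes roughly normally distributed data. The household figures below are invented purely to illustrate hypothesis (a), and the names pearson_r and t_statistic are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical data (invented): income and expenditure of ten households, in thousands
income = [12, 15, 18, 22, 25, 30, 34, 40, 45, 50]
expenditure = [11, 13, 15, 18, 20, 23, 26, 29, 32, 35]

r = pearson_r(income, expenditure)
t = t_statistic(r, len(income))
T_CRIT_8DF = 2.306  # two-sided 5% critical value of Student's t, 8 degrees of freedom

print(f"r = {r:.3f}, t = {t:.1f}")
if r > 0 and t > T_CRIT_8DF:
    print("association is positive and unlikely to be chance: (a) not clearly wrong")
else:
    print("no clear support for hypothesis (a)")
```

A large t says only that an association this strong would rarely arise by chance in a sample of this size; as the following paragraphs stress, it says nothing about causation.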
Statisticians stress several limitations of correlation analysis based on the coefficient described here, the most important of which is that the coefficient does not in itself prove anything about causation: the values of two variables may be associated without any causal connection running from one to the other. One reason may be that both variables are in fact determined by some third variable: changes in the third variable cause the first two to move together, even though there is no causal relationship between them.
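The third-variable problem can be illustrated by simulation. In the hypothetical set-up below, z drives both x and y, which have no direct link to each other. Their correlation is nonetheless strong, but the partial correlation of x and y holding z constant, r_xy·z = (r_xy − r_xz·r_yz)/√[(1 − r_xz²)(1 − r_yz²)], falls to roughly zero. Partial correlation is a standard technique not discussed in the original text, and the seed, sample size and noise levels are arbitrary choices.

```python
import math
import random

def pearson_r(xs, ys):
    """Product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear influence of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 2000
# Hypothetical set-up: z drives both x and y; x and y never influence each other
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)
r_xy_given_z = partial_r(r_xy, pearson_r(x, z), pearson_r(y, z))
print(f"r(x, y)         = {r_xy:.2f}")          # strong (0.8 in the underlying model)
print(f"r(x, y | z)     = {r_xy_given_z:.2f}")  # close to zero
```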
An important special case of this is where the third variable is time: two variables may have strong time-trends that make them highly correlated without there necessarily being any causal relation between them. Alternatively, a high correlation may arise purely by chance, as in the well-known high correlation between the number of storks nesting in Scandinavia and the birth rate in London. Thus, correlation does not prove causation, and we are invariably thrown back on theoretical arguments for the interpretation of the facts.
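The time-trend case can be sketched in the same way. Two hypothetical series share nothing but an upward trend over 100 'years', yet their levels are highly correlated. Correlating year-on-year changes (first differences) instead removes a linear trend, and the association largely disappears. Differencing is one common check, assumed here rather than drawn from the text; it does not address purely chance correlations such as the storks example.

```python
import math
import random

def pearson_r(xs, ys):
    """Product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)  # fixed seed so the run is reproducible
years = range(100)
# Two hypothetical series linked only by an upward time-trend;
# their random components are drawn independently
a = [2.0 * t + rng.gauss(0, 5) for t in years]
b = [0.5 * t + rng.gauss(0, 2) for t in years]

r_levels = pearson_r(a, b)

# First differences (year-on-year changes) strip out a linear trend
da = [a[i] - a[i - 1] for i in range(1, len(a))]
db = [b[i] - b[i - 1] for i in range(1, len(b))]
r_changes = pearson_r(da, db)

print(f"correlation of levels:  {r_levels:.2f}")   # high, driven by the shared trend
print(f"correlation of changes: {r_changes:.2f}")  # small, no real link remains
```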