New -- Kwikstat
Generate data sets with continuous, categorical, grouped and correlated data. Great for creating data sets for examples and for quality assurance testing. Data sets are created in standard .CSV format so they can be opened in most statistical programs such as SAS, SPSS, WINKS and into spreadsheet programs such as Microsoft Excel. Order now.
Calculate Pearson's Correlation
Using Microsoft Excel
These tutorials briefly explain the use and interpretation of standard statistical analysis techniques. The examples include how-to instructions for Excel. Although there are different version of Excel in use, these should work about the same for most recent versions. They also assume that you have installed the Excel Analysis Pak which is free and comes with Excel (Go to Tools, Addins... if it is not already installed in your version of Excel.)
See www.stattutorials.com/EXCELDATA for files mentioned in this tutorial. © TexaSoft, 2008
Correlation Analysis using Excel
The correlation coefficient allows researchers to determine if there is a possible linear relationship between two variables measured on the same subject (or entity). When these two variables are of a continuous nature (they are measurements such as weight, height, length, etc.) the measure of association most often used is Pearson’s correlation coefficient.
This association may be expressed as a number (the correlation coefficient) that ranges from –1 to +1. The population correlation is usually expressed as the Greek letter rho (r) and the sample statistic (correlation coefficient) is r.
The correlation measures how well a straight line fits through a scatter of points when plotted on an x – y axis. If the correlation is positive, it means that when one variable increases, the other tends to increase. If the correlation is negative, it means that when one variable increases, the other tends to decrease. When a correlation coefficient is close to +1 (or –1), it means that there is a strong correlation – the points are scattered along a straight line. For example, a correlation r = 0.7 may be considered strong. However, the closer a correlation coefficient gets to 0, the weaker the relationship, where the cloud (scatter) of points is not close to a straight line. For example, a correlation r = 0.1 might be considered weak. For scientific purposes, a t-test is utilized to determine if the correlation coefficient is “strong” or “significant” or not. This will be discussed later.
Assumptions: Before using the Pearson correlation coefficient as a measure of association, you should be aware of its assumptions and limitations. As mentioned earlier, this correlation coefficient measures a linear relationship. That is, the relationship between the two variables measures how close the two measurements form a straight line when plotted on an x-y chart. Therefore, it is important that data be graphed before the correlation is interpreted. For example, it is possible that data, when plotted, may show a curved relationship instead of a straight line. When this is the case, a Pearson correlation may not be the best measure of association. There are other conditions when a correlation coefficient may appear important, but when considered in light of a graph, is not a good measure of relationship. In the following graphs, all of them have a correlation coefficient of about 0.72, yet most do not fit the assumption of a linear relationship. To avoid misinterpreting a correlation, always accompany the calculation with a graph.
Another assumption of correlation is that the both of the variables (the measurements) be of continuous data measured on an interval/ratio scale. Data that are not continuous, such as categorical (i.e. hair color) or binomial (i.e., gender) data would not be acceptable. Also, each variable should be approximately normally distributed.
For this example, we’ll look at the data set called EXAMPLE.XLS. The first few records are shown here:
Suppose you want to want to find the Pearson’s correlation between the variables TIME1 and TIME2.
Note: Example 1 requires that you’ve installed the Analysis Toolpak (Tools/Addins/Analysis Toolpak)
Step1: To make this calculation select Tools/Data Analysis/Correlation… The following dialog box is displayed:
Step 2: In the input range textbox enter the range of the data (include the first row containing the variable name) or click on the data selection icon and mark the range to use.
Step 3: Notice that the “Labels in First Row” checkbox is checked.
Step 4: Click on OK and the following information will appear in a new worksheet:
The Pearson’s correlation for these two variables is 0.764 (rounded.)
A second way to calculate the correlation is with a function.
Step1: In the Example worksheet, enter some labels in column I to indicate that you are calculating a correlation.
Step 2: In the J3 (or wherever you want it) cell, you will enter an Excel function that will calculate the desired correlation.
Step 3: Enter the formula
Note that it is of the form, =CORREL(array1,array2)
Where the first array and second array contain the paired numbers to correlate. It is IMPORTANT that the numbers be paired correctly.)
The answer will appear in the cell. In this case, the Pearson’s correlation is 0.764 (rounded.)
Example 3: Visualize the relationship between the two variable by displaying a scatterplot.
Step 1: Select Insert/Chart (or click chart icon)
Step 2: From the Chart Wizard step 1, select XY(Scatter) and select the top chart sub-type. Click Next.
Step 3: Indicate the data range as =EXAMPLE!$C$1:$D$51 (either type it in or use the data selector to highlight the values in the spreadsheet – include the column names). Indicate that the data are in columns as shown here. Click Next.
Step 4: Indicate an appropriate chart title and axis labels. Click Next.
Step 5: Indicate to either place the chart in the same worksheet or in another. The initial scatterplot looks like this:
Step 6: Click on the axis labels for both the x and y axis and adjust the scale to display the data better. For example, by adjusting the axes to begin at 15 instead of 0, the plot becomes:
The scatterplot helps you understand the correlation and to determine if it is indeed a linear relationship or something else.
End of tutorial
For more information... we recommend:
- WINKS -- a simple to use and affordable statistical software program that will help you analyze, interpret and write-up your results. Download a free trial copy. WINKS reads Excel files, so you can read your Excel data into WINKS and do many more types of analysis than in Excel alone.
- SAS Essentials: Mastering SAS for Research - provides an introduction to SAS statistical software, the premiere statistical data analysis tool for scientific research. Through its straightforward approach, the text presents SAS with step-by-step examples. SAS Essentials introduces a step-by-step approach to mastering SAS software for statistical data analysis. It's also a valuable reference tool for any researcher currently using SAS. Designed for those new to SAS and filled with illustrative examples, the book shows how to read, write and import data; prepare data for analysis; use SAS procedures; evaluate quantitative data; analyze counts and crosstabulation tables; and compare means using the t-test. The book also provides instruction and examples on analysis of variance, correlation and regression, nonparametric analysis, logistic regression, creating graphs, controlling outputs using ODS, as well as advanced topics in SAS programming.ISBN: 0470461292. Order from publisher -Jossey-Bass/Wiley. Barnes & Noble. Amazon.
- Against All Odds VIDEOS - Now in DVD format -- Teaching Videos from Annenberg/PBS --"This highly engaging primer on statistical methods and inference introduces the practical applications of statistics. Produced by the Consortium for Mathematics and Its Applications and Chedd-Angier." Click here for info
- BeSmartNotes Reference sheets for SAS, SAS ODS, SAS Functions, SPSS and WINKS - Click here for info. (www.besmartnotes.com)
- Statistical Analysis Quick Reference Guidebook: With SPSS Examples is a practical "cut to the chase" handbook that quickly explains the when, where, and how of statistical data analysis as it is used for real-world decision-making in a wide variety of disciplines. It contains examples using SPSS Statistics software. In this one-stop reference, the authors provide succinct guidelines for performing an analysis, avoiding pitfalls, interpreting results, and reporting outcomes. Paperback. Sage Publishers ISBN: 1412925606 Order book from Amazon
© Copyright TexaSoft, 1996-2008