Import data into R from an Excel File
These R statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for R Software. Although there are millions of R users around the world, there is a substantial learning curve involved in mastering the program.These tutorials are an introduction to using R statistical software that could be used in an applied statistics course or as your own self-paced tutorial.
If you have suggestions, or if you encounter errors in any of these tutorials, please contact us.
See www.stattutorials.com/RDATA for files mentioned in these tutorials, © TexaSoft, 2007-11. All rights reserved.
Import Excel Data into R
Creating and distributing data in an Excel file is common, and when you use this data in R, you must import it into R before using it in any analysis. This tutorial describes how to prepare your data in Excel and then how to import it into R.
Preparing an Excel File to Import into R
- Row 1 of your Excel spreadsheet should contain variable names that are compatible with R naming conventions. Variable names should only be on row 1. Do not extend names to row 2. Here are some guidelines: (I’ve made these guidelines general in nature so that the Excel data could also be imported into SAS, SPSS or WINKS.)
- Begin variable names with a letter.
- Keep names short (no more than 64 characters.)
- Do not use blanks or special characters in names (for example, !, ?, ', and *). Underscore is okay.
- Variable names must be unique; duplication is not allowed.
- Do not use typical keywords as names: ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH.
- Case matters in R. NAME is different from Name or name.
- Each row (line) in the Excel spreadsheet (other than the variable row) should represent a single subject or observed entity.
- Avoid blank rows – it will complicate your import and analysis
- If you have missing data, do not leave that cell in Excel blank. Define a missing value code and place that code in any cell that contains missing data. Or use the code NA.
- Always use date variables with 4 digit year formats. That is, enter date in Excel using the format 01/01/2005 and not 01/01/05. Otherwise the old Y2K problem can still be a problem for date calculations.
- For more information, including how to do “best practice) double data entry in Excel, refer to the paper:
Importing an Excel file into R
Use an intermediate “.CSV” file. (Recommended.)
Step 1. Open your data file in Excel. For example, open the file CARSMPG.XLS (download it here). It is shown here in Execl 2007:
Step 2. Save the date as a .CSV (common separated variables) file. By choosing File, Save As…, Excel Workbook. In the “Files of Type” option on the Save dialog box, select the option “Comma delimited” .CSV. Save the file under the name CARSMPG.CSV.
The first few records in this file are as follows
BRAND,MODEL,MINIVAN,WAGON,PICKUP, ... etc...
Note that the data are listed one record per row, with commas separating the data values. The first line contains the names of the varables (appbrviated here.)
Step 3. In R, use the function read.csv() to read the data into a data set. For example, if your data are in the Windows folder C:\MYRDATA, enter the following command at the R > prompt to import the data into an R data set named cars.
>cars <- read.csv(file="C:\\MYRDATA\\CARSMPG.CSV",head=TRUE,sep=",")
This imports the data into the cars dataset. To list the contents of the data set enter
BRAND MODEL MINIVAN WAGON PICKUP
1 TOYOTA Prius 0 0 0
2 HONDA Civic Hybrid 0 0 0
3 HONDA Civic Hybrid 0 0 0
4 HONDA Civic Hybrid 0 0 0
5 HONDA Civic Hybrid 0 0 0
Note: Instead of using the path as a part of the data file name, you can change the default path in R first using
then use the simpler command
>cars <- read.csv(file=”CARSMPG.CSV",head=TRUE,sep=",")
To import the data.
- End of Tutorial -
(c) Alan C. Elliott, 2011