Describing and Examining Measurement (Quantitative) data using R
These R statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for R Software. Although there are millions of R users around the world, there is a substantial learning curve involved in mastering the program. These tutorials are an introduction to using R statistical software that could be used in an applied statistics course or as your own self-paced tutorial.
If you have suggestions, or if you encounter errors in any of these tutorials, please contact us.
See www.stattutorials.com/RDATA for files mentioned in these tutorials, © TexaSoft, 2007-11. All rights reserved.
Describing and Examining Measurement data in R (Means, Standard Deviations, etc)
To describe and evaluate numbers that are measurement values such as weight, height, and volume, we usually look at means (average), data spread, and the shape of the data distribution. This tutorial shows methods in R to describe and evaluate this type of data.
The most common method of summarizing measurement data is with descriptive statistics and graphs. Even if you’re planning to analyze your data using a statistical technique such as a t-test, analysis of variance, or logistic regression, you should always begin by examining your data. This preliminary step helps you determine which statistical analysis technique should be used to answer your research questions. This tutorial covers these procedures:
- Calculating mean, standard deviation and other descriptive statistics
- Creating a graph (histogram) of the data to visually examine its distribution
- Performing a statistical test to assess normality
- Examining data by group.
Descriptive Statistics in R Part 1
Calculating mean, standard deviation and other descriptive statistics
(This tutorial uses the raw data file CARSMPG.CSV. The program assumes the data file is in the folder C:\RDATA. Change the path in the R code if you save the file to a different folder.) See also Import Data into R from Excel
The following code creates an object named cars, then uses the summary function to produce summary statistics.
> cars <- read.csv(file="C:\\RDATA\\CARSMPG.CSV", header=TRUE, sep=",")
> summary(cars)
Produces the following output (of all variables in the data set):

(Note that there is a missing value, coded as -1, in the CYLINDERS field. That will be handled in a later tutorial.)
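The reading step can be tried without the original file. The sketch below writes a tiny stand-in CSV to a temporary folder, reads it with read.csv, and shows one possible way to treat the -1 code as missing. The file name cars_demo.csv, the object cars_demo and all data values are invented for illustration; this is not necessarily the approach the later tutorial takes.

```r
# Hypothetical miniature stand-in for CARSMPG.CSV (values invented for illustration).
csv_path <- file.path(tempdir(), "cars_demo.csv")
writeLines(c(
  "CITYMPG,HWYMPG,ENGINESIZE,CYLINDERS",
  "19,26,3.0,6",
  "24,32,2.0,4",
  "60,51,1.3,-1",
  "16,22,4.2,8"
), csv_path)

# header = TRUE treats the first line of the file as variable names.
cars_demo <- read.csv(csv_path, header = TRUE, sep = ",")

# One possible treatment of the -1 code (an assumption, not the tutorial's method):
# recode it as NA so it is not counted as a real cylinder count.
cars_demo$CYLINDERS[cars_demo$CYLINDERS == -1] <- NA

# summary() now reports the recoded value in a separate NA's column.
summary(cars_demo$CYLINDERS)
```

Building the path with file.path() avoids the doubled backslashes that a Windows path needs inside an R string.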
To calculate statistics for the variables CITYMPG, HWYMPG and ENGINESIZE only, you can create a new object called vars that contains just those variables, using the following code.
Note: The attach() function lets you refer to the variables in cars by name (as in the subsequent cbind call). The cbind function binds the named variables together as the columns of a matrix (the object vars).
> attach(cars)
> vars<-cbind(CITYMPG, HWYMPG, ENGINESIZE)
> summary(vars)
    CITYMPG          HWYMPG        ENGINESIZE
 Min.   :10.00   Min.   :13.00   Min.   :1.300
 1st Qu.:16.00   1st Qu.:22.00   1st Qu.:2.400
 Median :19.00   Median :26.00   Median :3.000
 Mean   :19.29   Mean   :25.67   Mean   :3.307
 3rd Qu.:21.00   3rd Qu.:29.00   3rd Qu.:4.200
 Max.   :60.00   Max.   :51.00   Max.   :8.300
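The same subset can also be taken without attach(), by selecting columns by name. The sketch below is one alternative, not a replacement for the code above; cars_demo and vars_demo are hypothetical objects with invented values so that the example runs on its own.

```r
# Hypothetical data frame standing in for cars (values invented for illustration).
cars_demo <- data.frame(
  CITYMPG    = c(19, 24, 60, 16),
  HWYMPG     = c(26, 32, 51, 22),
  ENGINESIZE = c(3.0, 2.0, 1.3, 4.2),
  CYLINDERS  = c(6, 4, 4, 8)
)

# Select columns by name. No attach() is needed, and the result
# stays a data frame rather than becoming a matrix.
vars_demo <- cars_demo[, c("CITYMPG", "HWYMPG", "ENGINESIZE")]

summary(vars_demo)
```

Selecting by name keeps the variable names out of the search path, which avoids clashes when several data sets share column names.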
To limit the number of significant digits shown in the output, use
> options(digits=2)
> summary(vars)
    CITYMPG         HWYMPG       ENGINESIZE
 Min.   :10.0   Min.   :13.0   Min.   :1.30
 1st Qu.:16.0   1st Qu.:22.0   1st Qu.:2.40
 Median :19.0   Median :26.0   Median :3.00
 Mean   :19.3   Mean   :25.7   Mean   :3.31
 3rd Qu.:21.0   3rd Qu.:29.0   3rd Qu.:4.20
 Max.   :60.0   Max.   :51.0   Max.   :8.30
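The digits option applies to the whole R session, so it stays in effect until it is changed again. The sketch below shows one way to set it temporarily and then restore the earlier value, plus signif() as an alternative that leaves the global option alone. The vector x holds invented values.

```r
x <- c(19.2857, 25.6714, 3.3071)   # invented values for illustration

# options() returns the previous settings, so they can be restored later.
old <- options(digits = 3)
print(x)
options(old)                       # put the earlier digits setting back

# Alternative: round to 3 significant digits without touching global options.
rounded <- signif(x, 3)
rounded
```

The rest of this tutorial keeps the digits=2 setting in effect; the restore step here is only a demonstration.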
Many add-on packages are available for R. The following example uses the describe function in the psych package to calculate descriptive statistics. See the psych package documentation for more information.
If you have not already installed the psych package, do so with this command:
> install.packages("psych")
--- Messages from R abbreviated here ---
package 'psych' successfully unpacked and MD5 sums checked
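If you are not sure whether psych is already installed, a quick check can save a repeated download. This is a minimal sketch using base R's requireNamespace(); the object name has_psych is made up for the example.

```r
# TRUE if psych is already installed. This check does not attach the
# package to the search path and does not install anything.
has_psych <- requireNamespace("psych", quietly = TRUE)

if (has_psych) {
  message("psych is already installed; library(psych) should work.")
} else {
  message("psych is not installed; run install.packages(\"psych\") first.")
}
```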
Use the library function to load the psych package, then use the describe function to produce descriptive statistics for the variables in the vars object (from the cars dataset). Note that the previously specified digits=2 option is still in effect.
> library(psych)
> describe(vars)

(Note that the describe function in the psych package reports n by default, whereas the summary function does not.)
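For comparison, statistics such as n, mean and standard deviation can also be computed with base R alone. The sketch below is an illustration under stated assumptions, not what describe does internally: basic_stats is a hypothetical helper, and vars_demo holds invented values, including one missing value.

```r
# Hypothetical matrix standing in for vars (values invented for illustration).
vars_demo <- cbind(
  CITYMPG = c(19, 24, 60, 16, NA),
  HWYMPG  = c(26, 32, 51, 22, 29)
)

# Hypothetical helper: a few descriptive statistics for one numeric vector,
# ignoring missing values.
basic_stats <- function(x) {
  c(n      = sum(!is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE))
}

# Apply the helper to each column (MARGIN = 2), then transpose
# so that each variable becomes one row of the result.
desc_table <- t(apply(vars_demo, 2, basic_stats))
desc_table
```

Here n counts only the non-missing values, so CITYMPG has n = 4 and HWYMPG has n = 5.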
Options for the describe function are:
describe(x, na.rm = TRUE, interp = FALSE, skew = TRUE, ranges = TRUE, trim = .1)
Arguments
x        A data frame or matrix
na.rm    The default is to delete missing data. na.rm=FALSE will delete the case.
interp   Should the median be standard or interpolated?
skew     Should the skew and kurtosis be calculated?
ranges   Should the range be calculated?
trim     With trim=.1, means are trimmed by dropping the top and bottom trim fraction of the values
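To see what trimming a mean does, base R's mean() has its own trim argument. The sketch below uses it on invented values; it is assumed here, not confirmed by this tutorial, that the trimmed mean reported by describe is computed the same way.

```r
# Ten invented values with one extreme observation.
x <- c(1:9, 100)

plain_mean   <- mean(x)              # uses all ten values
trimmed_mean <- mean(x, trim = 0.1)  # drops the lowest and highest 10% (one value each)

plain_mean     # 14.5
trimmed_mean   # 5.5, the mean of 2 through 9
```

The single extreme value pulls the ordinary mean well above most of the data, while the trimmed mean is unaffected by it.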
For example, use the following command to leave out skew and kurtosis and produce a more compact output listing.
> describe(vars, skew=FALSE)

- End of Tutorial -
coming soon...
Part 2: Creating a graph (histogram) of the data to visually examine its distribution
Part 3: Performing a statistical test to assess normality
Part 4: Examining data by group.
(c) Alan C. Elliott, 2011


