Graphing Measurement (Quantitative) data using R - Histograms
These R statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for R Software. Although there are millions of R users around the world, there is a substantial learning curve involved in mastering the program. These tutorials are an introduction to using R statistical software that could be used in an applied statistics course or as your own self-paced tutorial.
If you have suggestions, or if you encounter errors in any of these tutorials, please contact us.
See www.stattutorials.com/RDATA for files mentioned in these tutorials, © TexaSoft, 2007-11. All rights reserved.
Basic Histograms in R
This tutorial illustrates some of the ways you can create histograms from numeric data, to describe the distribution of a set of data.
(This tutorial uses the raw data file CARSMPG.CSV, which is available at www.stattutorials.com/RDATA. The program assumes the data file is in the folder C:\RDATA; change the path in the R program if you save the file to a different folder.) See also the tutorial Import Data into R from Excel.
The following code reads the data file into an object named cars, then uses the hist() function to draw a histogram of the HWYMPG (highway miles per gallon) variable.
cars<-read.csv(file="C:\\RDATA\\CARSMPG.CSV",header=TRUE,sep=",")
hist(cars$HWYMPG)
Or, to avoid having to use cars$variable name, you can first use the attach() function.
attach(cars)
hist(HWYMPG)
Either way will create the following basic histogram:

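Because attach() leaves the columns of the data frame on the search path until detach() is called, with() is a commonly used alternative. The following is a minimal sketch, not part of the original program: it assumes a small made-up data frame named demo_cars (the name and all values are invented for illustration) in place of CARSMPG.CSV, so that it runs without the data file.

```r
# Hypothetical stand-in for CARSMPG.CSV; the values are invented for illustration.
demo_cars <- data.frame(
  HWYMPG = c(24, 31, 27, 19, 35, 22, 29, 26, 33, 21),
  SUV    = c(0, 0, 0, 1, 0, 1, 0, 0, 0, 1)
)

# with() evaluates hist() inside the data frame, so HWYMPG can be
# referred to by name without attaching the data frame.
h <- with(demo_cars, hist(HWYMPG))

# hist() invisibly returns the bin boundaries and counts it plotted.
h$breaks
h$counts
```

Saving the result of hist() in an object such as h is optional; it is shown here only as a way to check which bins were used.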
Adding Labels to the histogram
Use the xlab= and ylab= options to define the x-axis and y-axis labels, and the main= option to define a title. You can also control the limits of the x-axis by using the xlim= option, as shown in the example.
hist(cars$HWYMPG, xlab="Highway Miles Per Gallon", main="2005 Cars Database", xlim=c(0,60), ylab="Count")

Plot a subset of the data
By defining the variable as
cars$HWYMPG[cars$SUV=="1"]
you limit the data from the cars dataset to only those cars where SUV equals 1.
hist(cars$HWYMPG[cars$SUV=="1"], xlab="Highway Miles Per Gallon", main="SUVs Only", xlim=c(10,40),ylab="Count")
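The subsetting step can be viewed on its own as logical indexing. The sketch below again assumes a made-up data frame named demo_cars (invented values, not the contents of CARSMPG.CSV) and stores the intermediate results in the hypothetical names is_suv and suv_mpg to show what is passed to hist().

```r
# Hypothetical stand-in for CARSMPG.CSV; the values are invented for illustration.
demo_cars <- data.frame(
  HWYMPG = c(24, 31, 27, 19, 35, 22, 29, 26, 33, 21),
  SUV    = c(0, 0, 0, 1, 0, 1, 0, 0, 0, 1)
)

# The comparison returns one TRUE/FALSE value per row.
is_suv <- demo_cars$SUV == 1

# Indexing with that logical vector keeps only the rows where it is TRUE.
suv_mpg <- demo_cars$HWYMPG[is_suv]
suv_mpg

# The histogram is then drawn from the reduced vector.
hist(suv_mpg, xlab="Highway Miles Per Gallon", main="SUVs Only", ylab="Count")
```

When SUV is stored as a number, comparing it with 1 or with "1" selects the same rows, because R converts the number to text before comparing it with a character string.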

Add a normal curve to a histogram
To add a normal curve to the plot, you must first calculate the mean and standard deviation of the data. The following hist() function options are used:
prob=TRUE scales the histogram to show densities rather than counts, so that the total area of the bars is 1 and the bars are on the same scale as the normal curve
density=20 sets the density of the shading lines for the bars, in lines per inch. The default value of NULL means that no shading lines are drawn. Non-positive values of density also inhibit the drawing of shading lines.
The statement
curve(dnorm(x, mean=m, sd=std),add=TRUE)
defines the curve that will be superimposed onto the graph. The dnorm() function produces values of the probability density function of the normal distribution with the calculated mean and standard deviation. The add=TRUE option causes the curve to be added to the existing plot.
m<-mean(cars$HWYMPG)
std<-sd(cars$HWYMPG)
hist(cars$HWYMPG, density=20, prob=TRUE,
main="Histogram with normal curve")
curve(dnorm(x, mean=m, sd=std), add=TRUE)
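To see why prob=TRUE is needed before overlaying a density curve, the following sketch checks that the bars of such a histogram have a total area of 1. It is an illustration under stated assumptions: the data are simulated with rnorm() rather than read from CARSMPG.CSV, and the name mpg and the chosen mean and standard deviation are arbitrary.

```r
# Simulated values standing in for cars$HWYMPG (not the real data).
set.seed(1)
mpg <- rnorm(200, mean = 27, sd = 5)

# Draw the density-scaled histogram and keep the object that hist() returns.
h <- hist(mpg, density = 20, prob = TRUE,
          main = "Histogram with normal curve")

# Overlay the normal density using the sample mean and standard deviation.
curve(dnorm(x, mean = mean(mpg), sd = sd(mpg)), add = TRUE)

# Bar height times bar width, summed over all bars, gives the total area.
sum(h$density * diff(h$breaks))
```

Without prob=TRUE the bar heights would be counts, and the normal curve, whose height is a density, would appear as a nearly flat line along the bottom of the plot.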

Create a lattice plot and define value labels for the SUV variable
# If needed, read in the data
cars<-read.csv(file="C:\\RDATA\\CARSMPG.CSV",header=TRUE,sep=",")
# Define labels for the values of the SUV variable
cars$SUV <- factor(cars$SUV,levels = c(0,1),labels = c("Non-SUV", "SUV"))
library(lattice)
histogram(~ HWYMPG | SUV, data=cars, type="count", col="red")
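The factor() call is what gives each panel of the lattice plot a readable title. The sketch below shows the relabeling step by itself, assuming a short made-up vector of 0/1 codes (suv_codes and suv_factor are hypothetical names, not part of the dataset).

```r
# Made-up 0/1 codes standing in for the SUV column.
suv_codes <- c(0, 1, 1, 0, 0, 1)

# Map each code to a label: 0 becomes "Non-SUV" and 1 becomes "SUV".
suv_factor <- factor(suv_codes, levels = c(0, 1),
                     labels = c("Non-SUV", "SUV"))

# The labels now appear wherever the values are printed or tabulated.
as.character(suv_factor)
table(suv_factor)

# Applying the same call to the already labeled factor yields only NA,
# because its values are no longer 0 and 1.
relabeled <- factor(suv_factor, levels = c(0, 1),
                    labels = c("Non-SUV", "SUV"))
all(is.na(relabeled))
```

This is one reason to read the data in again before running the factor() line a second time in the same session.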

For more information on the hist() function, type ?hist at the R prompt.
For more information on the curve() function, type ?curve at the R prompt.
- End of Tutorial -
See also...
Part 1:
Describing and Examining Measurement (Quantitative) data using R
Part 3: Performing a statistical test to assess normality
Part 4: Examining data by group.
(c) Alan C. Elliott, 2011


