Overview
The lattice package for R provides several convenient functions for plotting data that has some kind of internal structure, usually in the form of groups. Lattice plotting functions create common visualizations of data (scatter plots, box-and-whisker plots, etc.) within a grid of panels defined by one or more grouping variables. See the manual page for xyplot for documentation and examples. The author of the lattice package has posted some excellent examples with code snippets from the upcoming definitive book on the topic (Lattice: Multivariate Data Visualization with R, by Deepayan Sarkar).
Complex Plots with Lattice
The convenience of lattice graphics comes at a price: complex plots cannot be built up element by element, as they can with base graphics. Instead, one of several panel functions must be used, or a customized panel function must be written. There is extensive documentation on this, but not nearly enough for the special case of a graph that includes both lines and point symbols. In addition, lattice functions require that all data to be plotted occur in the same dataframe. This example presents one possible solution for plotting grouped data (via lattice) that combines different symbolization (lines and points) and different source dataframes.
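As a minimal sketch of the "customized panel function" route mentioned above (not the approach taken in the rest of this example), the following draws a line and point symbols in each panel by calling two stock lattice panel functions in turn. The data frame toy, its columns, and the function name panel.both are made up for illustration.

```r
library(lattice)

## hypothetical toy data: two groups, five x values each
toy <- data.frame(
  g = rep(c('A', 'B'), each=5),
  x = rep(1:5, times=2),
  y = c(1, 3, 2, 5, 4, 2, 2, 4, 3, 6)
)

## a custom panel function: each call receives the x/y values for one panel
panel.both <- function(x, y, ...) {
  panel.lines(x, y, col=1, lwd=2)    # line through the data
  panel.points(x, y, col=2, pch=16)  # point symbols drawn on top
}

## one panel per level of g, each drawn by panel.both
p.toy <- xyplot(y ~ x | g, data=toy, panel=panel.both)
print(p.toy)
```

This works when lines and points come from the same rows of one data frame; the example below deals with the case where they do not.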
Example
The following example was created to illustrate the changes in the shape of the logistic function that occur with 3 possible 'slope' terms (b) and 3 possible 'intercept' terms (a). Slope and intercept are used as grouping variables, so the resulting figure contains 9 panels, one for each slope/intercept combination. The panels are labeled with the respective slope (green panel title) and intercept (yellow panel title). To demonstrate plotting of mixed symbol types, an unrelated set of binary data was generated and split into the same 9 groupings. See the R code below for the full story.
Lattice Plot Example: Panels illustrate the effect of different slope and intercept terms for the logistic function.
Generate 9 Possible Versions of the Logistic Function
library(lattice)
##
## the logistic function
## a is the intercept, b the slope, and x the value
## returns the probability p for each value of x
f <- function(a, b, x) exp(a + b*x) / (1 + exp(a + b*x))
## a data vector
x.seq <- seq(-5, 5, by=0.5)
x <- rep(x.seq, 9)
## generate some
## slope and intercept possibilities
a.seq <- c(-2, 0, 2)
b.seq <- c(-1, -0.5, -0.25)
## create data frame of all possible combinations
## i.e. all slope/intercept combinations
d <- expand.grid(a=a.seq, b=b.seq, x=x.seq, KEEP.OUT.ATTRS=FALSE)
## add the probability values back to the main DF
d$p <- f(d$a,d$b,d$x)
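A quick sanity check on this step, as a sketch: base R's plogis() computes the same inverse-logit transform, so it should agree with the logistic function defined above, and the grid should hold one row for every intercept/slope/x combination (3 x 3 x 21 = 189). The block repeats the definitions so that it runs on its own; the names f.check and check.grid are introduced here for illustration only.

```r
## self-contained re-statement of the logistic function used above
f.check <- function(a, b, x) exp(a + b*x) / (1 + exp(a + b*x))

## same intercept, slope, and x values as in the example
check.grid <- expand.grid(a=c(-2, 0, 2), b=c(-1, -0.5, -0.25),
                          x=seq(-5, 5, by=0.5), KEEP.OUT.ATTRS=FALSE)
check.grid$p <- f.check(check.grid$a, check.grid$b, check.grid$x)

## one row per combination: 3 intercepts x 3 slopes x 21 x values
nrow(check.grid)

## plogis() is the built-in inverse logit; the two should agree
all.equal(check.grid$p, plogis(check.grid$a + check.grid$b * check.grid$x))

## with a = 0 and x = 0 the slope has no effect on the probability
unique(check.grid$p[check.grid$x == 0 & check.grid$a == 0])
```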
Generate Some Fake Binary Data
rx <- runif(min=-5, max=5, n=20)
## using the same groupings from the above example
## combine into a DF
rd <- expand.grid(a=a.seq, b=b.seq, x=rx, KEEP.OUT.ATTRS=FALSE)
## add pretend binary data
rd$rp <- rbinom(n=length(rd$x), 1, 0.5)
Merge the Two Dataframes
## add dummy col for pretend binary data
d$rp <- NA
## add dummy col for real probability
rd$p <- NA
## combine the first set of data with the pretend probabilities
dd <- make.groups(d, rd)
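To see what the merge produces, the sketch below builds a small stand-in for each of the two data frames (the names d.small and rd.small, and their contents, are hypothetical) and inspects the result. make.groups() stacks its arguments and adds a factor column, which, recording the source of each row.

```r
library(lattice)

## hypothetical stand-ins for the curve data and the binary data
d.small  <- data.frame(x=c(-1, 0, 1), p=c(0.27, 0.50, 0.73), rp=NA)
rd.small <- data.frame(x=c(-0.5, 0.5), p=NA, rp=c(0, 1))

## stack the two; a 'which' factor records where each row came from
dd.small <- make.groups(d.small, rd.small)

## rows per source data frame
table(dd.small$which)

## each response column is NA for the rows from the other source
colSums(is.na(dd.small[, c('p', 'rp')]))
```

Because each response is NA wherever the other is present, plotting both responses against x draws each one only from its own source rows.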
Plot Lines and Points with xyplot
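## both responses on the left-hand side, 'x' as the shared predictor,
## one panel per intercept (a) / slope (b) combination;
## wrapping the conditioning variables in factor() is assumed here
xyplot(p + rp ~ x | factor(a) + factor(b),
  data=dd,
  ylab='Probability',
  xlab='Predictor Variable',
  panel=panel.superpose,
  distribute.type=TRUE,
  col=c(1,2), lwd=c(2,1), type=c('l','p'), pch=c(NA, '|')
)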
Conclusion
The trick to plotting multiple symbol types can be summarized with:
- include all response variables to be plotted on the left-hand side of the plotting formula: p + rp ~ .... Note that they must share a common predictor variable; in this case the column "x" was used.
- use the panel function panel=panel.superpose and its argument distribute.type=TRUE to allow for more than one plotting style
- specify plotting styles (line style, symbol type, line width, color, etc.) as vectors containing as many elements as there are response variables in the plotting formula: col=c(1,2), lwd=c(2,1), type=c('l','p'), pch=c(NA, '|')
With this approach in mind, it is possible to generate complicated plots using lattice graphics when the data are of multiple types (line vs. point) and come from multiple source dataframes. A common example of this scenario might involve plotting the continuous predictions from a linear model and the original points used to create the model.
Lattice Plot Example 2: Data from a logistic regression model, including fitted response, standard error, and original data points.
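The linear-model scenario described above can be sketched along the same lines. This is an illustrative sketch with simulated data, not the code behind the figure; the data frame names (obs, pred, both), the column names, and the simulated relationship are all assumptions made for the example.

```r
library(lattice)

## simulated observations: two groups with different slopes (made-up data)
set.seed(1)
obs <- data.frame(g=factor(rep(c('A', 'B'), each=25)), x=runif(50, 0, 10))
obs$y <- ifelse(obs$g == 'A', 1 + 0.5*obs$x, 4 - 0.3*obs$x) + rnorm(50, sd=0.5)

## fit a linear model with a separate line for each group
m <- lm(y ~ x * g, data=obs)

## continuous predictions on a regular grid of x values
pred <- expand.grid(g=c('A', 'B'), x=seq(0, 10, by=0.5), KEEP.OUT.ATTRS=FALSE)
pred$fit <- predict(m, newdata=pred)

## dummy columns so that both data frames share the same set of columns
pred$y  <- NA
obs$fit <- NA

## stack predictions and observations, then plot lines + points
both <- make.groups(pred, obs)
p.lm <- xyplot(fit + y ~ x | g, data=both,
  panel=panel.superpose, distribute.type=TRUE,
  col=c(1, 2), lwd=c(2, 1), type=c('l', 'p'), pch=c(NA, 1),
  xlab='Predictor Variable', ylab='Response')
print(p.lm)
```

The order of the vectors passed to col, lwd, type, and pch follows the order of the responses in the formula: the first element styles fit (drawn as a line), the second styles y (drawn as points).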