Miskatonic University Press

ARL statistics visualized with R and Google Motion Charts

I fixed a couple of things after the last post, and now we can use motion charts to visualize all of the Association of Research Libraries (ARL) statistics using R and googleVis. I've got a small example below, set up so that it starts out showing four particularly interesting things, and you can just press play to see something good:

  • along the y-axis: TOTSTU, total number of students at the universities (see Principles of Membership to see what it means to be in the ARL for a library and for its parent university; serious research will be happening there)
  • along the x-axis: FAC, number of faculty
  • colour of the circles: TYPE, blue for Canadian universities (C); green for private American universities (P); yellow for public state American universities (S)
  • size of the circles: TOTEXP, total expenditures of the libraries

(Again, if you're viewing this through an RSS feed and not seeing a fancy graph with a lot of coloured circles on it, come to this page and try out the motion chart.)

R version 2.12.1 (2010-12-16) • googleVis-0.2.7

Notice how the green (private American) universities are lower down because they generally have fewer students. The shape of the line they make is closer to the x-axis because they generally have more faculty per student. The yellow (American state universities) are angled up much higher because they generally have more students and fewer faculty per student. The blue Canadian universities are mixed in with the yellow ones.
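The faculty-per-student pattern described above can also be checked with numbers rather than by eye. Here is a minimal sketch in base R. The data frame is made up purely for illustration (the institution names and figures are not ARL data), and the column name STU_PER_FAC is my own invention. With the real data you would swap in the arl data frame from the recipe further down.

```r
# Toy rows with made-up numbers, only to show the shape of the calculation.
# Replace toy_fac with the real arl data frame to run it on the ARL statistics.
toy_fac <- data.frame(
  YEAR   = c(2009, 2009, 2009, 2009),
  TYPE   = c("P", "P", "S", "C"),
  INAM   = c("Private A", "Private B", "State A", "Canadian A"),
  FAC    = c(2000, 1000, 1500, 1200),
  TOTSTU = c(20000, 12000, 45000, 30000),
  stringsAsFactors = FALSE
)

# Students per faculty member for each institution in each year.
# STU_PER_FAC is a hypothetical column name, not one from the ARL file.
toy_fac$STU_PER_FAC <- toy_fac$TOTSTU / toy_fac$FAC

# Median ratio within each TYPE and YEAR; a lower value means
# more faculty per student.
ratio_by_type <- aggregate(STU_PER_FAC ~ TYPE + YEAR, data = toy_fac, FUN = median)
print(ratio_by_type)
```

The median is used instead of the mean so that one very large institution doesn't dominate its group, but that is a judgment call.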

Pennsylvania State is a big outlier among the American universities, far up and to the right with lots of faculty and lots of students. U of Toronto is the outlier among Canadian universities: it's the biggest in this country. When the graph stops in 2009 there are three large green circles with just over 2,000 faculty, running up the centre of the chart: Yale, Harvard and Columbia.

A related pair of variables to chart is TOTSTU vs PRFSTF (professional staff). A really interesting pair is EXPSER (expenditures for current serials) vs SERPUR (current serials purchased). EXPSER/SERPUR is how much a library spends per year on each serial, on average, and the motion chart of the last twenty years of EXPSER to SERPUR shows how crazy all this has become and why serials purchasing is such a problem for libraries now.
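To put a number on the EXPSER/SERPUR ratio, it can be added as its own column. This is a minimal sketch in base R. The rows are made up for illustration (they are not real ARL figures) and COST_PER_SERIAL is a hypothetical column name. The guard against zero or missing serial counts is an assumption about what the real file might contain.

```r
# Toy rows with made-up numbers for one imaginary library over three years.
toy_ser <- data.frame(
  YEAR   = c(1989, 1999, 2009),
  INAM   = "Example U",
  EXPSER = c(1000000, 2500000, 6000000),  # serials expenditures
  SERPUR = c(10000, 10000, 12000),        # serials purchased
  stringsAsFactors = FALSE
)

# Average spend per serial title; NA where the count is missing or zero,
# so that division by zero doesn't produce Inf.
toy_ser$COST_PER_SERIAL <- ifelse(
  is.na(toy_ser$SERPUR) | toy_ser$SERPUR == 0,
  NA,
  toy_ser$EXPSER / toy_ser$SERPUR
)

print(toy_ser[, c("YEAR", "INAM", "COST_PER_SERIAL")])
```

Run on the real data, a column like this could be handed to the motion chart alongside the others, so the per-title cost could be put on an axis directly.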

How to recreate this in R:

> arl <- read.csv("http://www.miskatonic.org/files/arl-1989-2009.csv")
> install.packages("googleVis")
> library(googleVis)
> arl.toplot <- subset(arl, subset = TYPE %in% c("C", "P", "S"),
+     select = c(YEAR, TYPE, INAM, FAC, TOTSTU, TOTEXP, VOLS, SERPUR,
+                TOTCIRC, PRFSTF, TOTSTF, EXPMONO, EXPSER, SALPRF))
> M <- gvisMotionChart(arl.toplot, idvar="INAM", timevar="YEAR")
> plot(M)

To keep this page small enough that my CMS could deal with it (there's about 300K of data embedded in it) I'm only showing a very small subset of all available variables:

  • FAC: instructional faculty
  • TOTSTU: total full-time student enrolment
  • TOTEXP: total library expenditures
  • VOLS: volumes held
  • SERPUR: current serials purchased
  • TOTCIRC: total circulations
  • PRFSTF: professional staff (librarians and others)
  • TOTSTF: total professional and support staff
  • EXPMONO: expenditures for monographs
  • EXPSER: expenditures for current serials
  • SALPRF: professional salaries

Using the code above it's easy to recreate this chart at home. If you do, you can leave out the select argument and it will chart all the variables. As well, the subset command picks out three kinds of institutions and leaves out national libraries like Library and Archives Canada and the Library of Congress, which in many ways aren't comparable to university libraries, but you can leave out the subsetting to see what happens. To visualize the entire ARL data set, run this:

> arl <- read.csv("http://www.miskatonic.org/files/arl-1989-2009.csv")
> install.packages("googleVis")
> library(googleVis)
> M <- gvisMotionChart(arl, idvar="INAM", timevar="YEAR")
> plot(M)

(Aside from loading in the googleVis package, that's three lines: get the data, prep the data, show the data. Powerful!)
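To see what the subset() step in the first recipe does before pointing it at the full file, here is a minimal sketch on a toy data frame. The rows are made up, and the type code "N" for the national library is only a placeholder I chose for the toy data. I am not asserting that it is the code the ARL file uses.

```r
# Toy rows with made-up values; "N" is a placeholder type code for a
# non-university library, not necessarily what appears in the real file.
toy_all <- data.frame(
  YEAR   = 2009,
  TYPE   = c("C", "P", "S", "N"),
  INAM   = c("Canadian A", "Private A", "State A", "National Library A"),
  FAC    = c(1200, 2000, 1500, NA),
  TOTSTU = c(30000, 20000, 45000, NA),
  VOLS   = c(5e6, 9e6, 7e6, 2e7),
  stringsAsFactors = FALSE
)

# Keep only the three university types and a handful of columns,
# mirroring the shape of the subset() call in the recipe.
toy_kept <- subset(toy_all,
                   subset = TYPE %in% c("C", "P", "S"),
                   select = c(YEAR, TYPE, INAM, FAC, TOTSTU))

# Which type codes were in the data, and which survived the filter.
print(table(toy_all$TYPE))
print(table(toy_kept$TYPE))
```

Running table() on the TYPE column of the real data first is a quick way to see which codes are actually there before deciding what to filter out.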