Miskatonic University Press

Academic year

code code4lib r

I work at a university library, and when I analyse data I like to arrange things by academic year (September to August) so I often need to find the academic year for a given date. Here are Ruby and R functions I made to do that. Both are pretty simple—they could be better, I’m sure, but they’re good enough for now. They use the same method: subtract eight months and then find the year you’re in.

The Ruby is the shorter of the two, and uses the Date class. First, subtract eight months, with <<.

d << n: Returns a date object pointing n months before self. The n should be a numeric value.

Rather cryptic. Then we find the year with .year, which is pretty clear. This is the function:

require 'date'

def academic_year(date)
  (Date.parse(date) << 8).year
end

Example:

> academic_year("2016-09-22")
=> 2016
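A quick check of the boundaries, as a sketch that reuses the function above (the dates are only illustrative): the last day of August should still belong to the previous academic year, and the first of September should start a new one.

```ruby
require 'date'

def academic_year(date)
  (Date.parse(date) << 8).year
end

# 31 August still belongs to the academic year that began the previous September.
puts academic_year("2016-08-31")  # 2015
# 1 September starts a new academic year.
puts academic_year("2016-09-01")  # 2016
# January falls in the academic year that began the previous calendar year.
puts academic_year("2017-01-15")  # 2016
```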

The function is very short because Ruby nicely handles leap years and months of varying lengths. What is 30 October 2015 - eight months?

> Date.parse("2015-10-30") << 8
=> #<Date: 2015-02-28 ((2457082j,0s,0n),+0s,2299161j)>

2016 is a leap year—what is 30 October 2016 - eight months?

> Date.parse("2016-10-30") << 8
=> #<Date: 2016-02-29 ((2457448j,0s,0n),+0s,2299161j)>

Sensible. And the function returns a number (a Fixnum, or an Integer in Ruby 2.4 and later), not a string, which is what I want.
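The same answer can also be had without any month arithmetic, by comparing the month to September. This is only a sketch of an alternative, not the function above: academic_year_for is a hypothetical name, and accepting Date objects as well as strings is an assumption about what a caller might want to pass in.

```ruby
require 'date'

# Hypothetical alternative: compare the month to September instead of
# subtracting eight months. Accepts a Date or a parseable date string.
def academic_year_for(date)
  d = date.is_a?(Date) ? date : Date.parse(date.to_s)
  # September to December belong to the year they are in;
  # January to August belong to the academic year that began the year before.
  d.month >= 9 ? d.year : d.year - 1
end

puts academic_year_for("2016-09-22")          # 2016
puts academic_year_for(Date.new(2016, 8, 31)) # 2015
```

It should agree with the subtract-eight-months version for every date, since << only ever changes the year when the month is January to August.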

In R things are more complicated. "How to subtract months from a date in R?" gives a few answers, but none are pretty. Using lubridate makes things much easier (and besides, I use lubridate in pretty much everything anyway).

library(lubridate)

academic_year <- function(date) {
  as.integer(format(floor_date(floor_date(as.Date(date), "month") - months(8), "year"), "%Y"))
}

Example:

> academic_year("2016-09-22")
[1] 2016

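The floor_date function gets called twice, the first time to drop back to the start of the month. That avoids a problem with month arithmetic in lubridate, which gives NA when the resulting date does not exist (30 October minus eight months would be 30 February):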

> as.Date("2016-10-30") - months(8)
[1] NA

But you can always subtract 8 months from the first of a month. Then the function goes to 01 January of that year, pulls out just the year (“%Y”) and returns it as an integer. I’m sure it could be faster.

And once the academic year is identified, when making charts it’s nice to have September–August on the x axis. I often do something like this, with a data frame called data that has a date column:

library(dplyr) # I always use it
library(lubridate)

data <- data %>% mutate(month_name = month(date, label = TRUE))
data$month_name <- factor(data$month_name, levels = c("Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug"))
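On the Ruby side, the same September-to-August ordering can be sketched by working out each month's position in the academic year. ACADEMIC_MONTHS and academic_month_index are hypothetical names made up for this example, and the dates are illustrative.

```ruby
require 'date'

# Month abbreviations in academic-year order, September first.
ACADEMIC_MONTHS = %w[Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul Aug].freeze

# Position of a date's month within the academic year: Sep => 0, ..., Aug => 11.
# Ruby's % returns a non-negative result for a positive divisor, so January
# gives (1 - 9) % 12 = 4.
def academic_month_index(date)
  (date.month - 9) % 12
end

dates = [Date.new(2017, 3, 1), Date.new(2016, 9, 1), Date.new(2016, 12, 1)]
sorted = dates.sort_by { |d| academic_month_index(d) }
puts sorted.map { |d| ACADEMIC_MONTHS[academic_month_index(d)] }.join(" ")
# Sep Dec Mar
```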

Finding the academic year of a date could be a code golf thing, but Stack Overflow has too many rules.