Introduction to lubridate

  • Dates are a special type of data. Just like strings and integers, dates are their own data type. There are special packages and functions that work with dates.
  • One such package is lubridate. In this tutorial, we will see why lubridate is so helpful and important.
  • If extracting dates were a simple one-time task, we could use regular expressions (RegEx) to extract the dates and times from a given text or data frame.
  • However, it gets tricky when dealing with numerous dates.
    • For example, RegEx cannot tell us what a given date and time would be in another time zone.
    • Also, RegEx cannot derive any meaning from the data that it extracts. It simply extracts and presents.
      • There is no information about what day of the week or quarter of the year that date is in.
      • We also cannot tell how many days are in the month just by looking at it or even using RegEx.
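To make the contrast concrete, here is a minimal sketch. The sentence and the date in it are made up for illustration. A regular expression can pull the date text out of a sentence. Only after parsing that text into a Date object can we ask which weekday it falls on or how many days its month has.

```r
library(lubridate)

# A made-up sentence containing a date
text <- "The report was filed on 2020-02-14 after a long review."

# RegEx extracts the date-like substring, but it is still just text
date_string <- regmatches(text, regexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", text))
date_string       # "2020-02-14"

# Parsing it into a Date object lets us derive meaning from it
d <- ymd(date_string)
wday(d)           # 6: day of the week, counting Sunday as 1 (Friday)
quarter(d)        # 1: first quarter of the year
days_in_month(d)  # Feb 29: 2020 is a leap year
```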



Key terms for dates and times in R:

  • Dates are represented by the “Date” class.
    • By default, dates are printed in the format Year-Month-Day, where Year is a 4-digit year, Month is a 2-digit month, and Day is a 2-digit day.
    • Dates are stored internally as the number of days since 1970-01-01.
  • Times are represented using the “POSIXct” and “POSIXlt” classes.
    • “POSIXct” times are stored internally as the number of seconds since 1970-01-01 00:00:00 UTC.
  • The “POSIXct” class stores each time as a single number, which makes it a good choice for storing times in a column of a data frame.
  • The “POSIXlt” class stores a time as a list of components. These include the seconds, minutes, hours, day of the month, month, year, day of the week, and day of the year, which is useful when you want to pull those components out.
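You can inspect these internal representations directly. This small sketch uses arbitrary dates chosen so the numbers are easy to check. It shows a Date stored as a day count, a POSIXct stored as a second count, and a POSIXlt stored as a list of components.

```r
# A Date is stored as the number of days since 1970-01-01
d <- as.Date("1970-01-11")
as.numeric(d)        # 10

# A POSIXct is stored as the number of seconds since 1970-01-01 00:00:00 UTC
t_ct <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
as.numeric(t_ct)     # 60

# A POSIXlt is a list of components (sec, min, hour, mday, mon, year, ...)
t_lt <- as.POSIXlt(t_ct, tz = "UTC")
t_lt$min             # 1
names(unclass(t_lt))
```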

A helpful video by Professor Peng explains the above terms (item 2 in the Sources List below).



Let’s get started!

First, we need to install the necessary packages and load them into RStudio. lubridate is the essential one for this tutorial.
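The original list of packages is not shown here. At a minimum, the examples that use lubridate functions need the package installed once and loaded in each session. A sketch:

```r
# Install once per machine (uncomment if lubridate is not yet installed)
# install.packages("lubridate")

# Load lubridate for the current session
library(lubridate)
```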



as.Date(): Converts character dates to Date objects

The format argument of as.Date() tells R which format the input date is in, so that it is parsed correctly.

## [1] "2012-01-10"

This returns the same date, printed in the standard “YYYY-MM-DD” format.
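The call that produced the output above is not shown. Here is a sketch with an assumed input string written in US month/day/year order. The format codes %m, %d, and %Y describe that order, which tells as.Date() how to read the string.

```r
# Assumed input: a US-style month/day/year string
d <- as.Date("01/10/2012", format = "%m/%d/%Y")
d          # "2012-01-10"
class(d)   # "Date"
```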

Some useful extraction functions:

  • weekdays(): returns the day of the week. With RegEx, we would have to 1. list every possible way of writing each day (Monday, ..., Sunday; monday, ..., sunday; Mon, ..., Sun; and so on) and 2. check whether any of them appears in the date string (e.g., “October 9th Friday”). That only works if the day is written in the text; if it is not, RegEx cannot extract it. weekdays() needs only the date itself, so the day does not have to appear in the text. Super convenient!
  • months(): returns the month of the year
  • quarters(): returns the quarter of the year
  • year(): returns the year of the date

Note: lubridate's today() returns the current date as a Date object, printed as YYYY-MM-DD.

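Together, these functions make it easy to find the weekday, month, quarter, and year of any date. weekdays(), months(), and quarters() come with base R, while year() and today() come from lubridate. lubridate's days_in_month() also tells us how many days are in a date's month.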

## [1] "Monday"
## [1] "April"
## [1] "Q2"
## [1] 2021
## Apr 
##  30
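Here is a sketch of calls that produce this kind of output. It uses a fixed date (2021-04-26, the date the outputs above appear to come from) instead of today(), so the result is reproducible. Day and month names depend on the system locale; the comments assume English.

```r
library(lubridate)

# Fixed date instead of today() so the output does not change
d <- as.Date("2021-04-26")

weekdays(d)       # "Monday"  (base R)
months(d)         # "April"   (base R)
quarters(d)       # "Q2"      (base R)
year(d)           # 2021      (lubridate)
days_in_month(d)  # Apr 30    (lubridate)
```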

Sys.time()

  • Sys.time() returns the current time according to the system running the code, displayed in that system's time zone. For example, if the code runs on a computer set to New York time, the result is shown in Eastern time (EST, or EDT during daylight saving time).
  • The result is a time object of class “POSIXct” “POSIXt”.
## [1] "2021-04-26 18:32:52 PDT"

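Suppose we store the result as time_now and try time_now$sec to get the exact second of the current minute. It won't work: R returns “Error in time_now$sec : $ operator is invalid for atomic vectors”, because a POSIXct object is just a number under the hood. Converting it with as.POSIXlt() first gives a list whose components, such as $sec, we can access.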

## [1] "2021-04-26 18:32:52 PDT"
## [1] 52.62111
## [1] 1
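The following sketch reproduces the error and the fix. The name time_now matches the text. time_lt and msg are names introduced here for illustration. The exact values depend on when the code runs.

```r
time_now <- Sys.time()
class(time_now)    # "POSIXct" "POSIXt"

# $ fails because a POSIXct object is a plain number with attributes
msg <- tryCatch(time_now$sec, error = function(e) conditionMessage(e))
msg                # "$ operator is invalid for atomic vectors"

# Converting to POSIXlt gives a list whose components can be accessed with $
time_lt <- as.POSIXlt(time_now)
time_lt$sec        # seconds, including the fractional part
time_lt$wday       # day of the week, 0 = Sunday
```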

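We can also list all the components of a POSIXlt object, for example with unclass(). Note the offsets in the raw fields:

  • $mon counts months from 0 (so 3 is April).
  • $year counts years since 1900 (so 121 is 2021).
  • $wday counts days of the week from Sunday = 0.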

## $sec
## [1] 52.62111
## 
## $min
## [1] 32
## 
## $hour
## [1] 18
## 
## $mday
## [1] 26
## 
## $mon
## [1] 3
## 
## $year
## [1] 121
## 
## $wday
## [1] 1
## 
## $yday
## [1] 115
## 
## $isdst
## [1] 1
## 
## $zone
## [1] "PDT"
## 
## $gmtoff
## [1] -25200
## 
## attr(,"tzone")
## [1] ""    "PST" "PDT"
## [1] "list"
## $sec
## [1] 52.62111
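Because POSIXlt counts months from 0, years from 1900, and days of the year from 0, the raw components need offsets. This sketch uses the same clock time as the output above, but in UTC. UTC is an assumption made so the result does not depend on the local time zone. The sketch compares the raw fields with lubridate's helpers, which apply the offsets for us.

```r
library(lubridate)

lt <- as.POSIXlt("2021-04-26 18:32:52", tz = "UTC")

# Raw POSIXlt fields use offsets
lt$mon     # 3   (months counted from 0)
lt$year    # 121 (years counted from 1900)
lt$yday    # 115 (day of the year counted from 0)

# lubridate's helpers return the familiar values
month(lt)  # 4
year(lt)   # 2021
yday(lt)   # 116
```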

Why lubridate is better than RegEx for parsing dates

There are several properties of dates that we can extract with lubridate (and R's built-in date-time classes) that are difficult, or impossible, to get with RegEx.

  • Date-time objects keep fractional seconds, as we saw above, and lubridate's second() returns them as well. If air traffic controllers need to monitor flight paths down to the millisecond to prevent crashes, this precision helps.
  • lubridate can easily identify:
    • a date’s day of the year (e.g., 300 = the 300th day of the year)
    • a date’s day of the week (e.g., 5 = the 5th day of the week)
    • whether the date falls in daylight saving time (the isdst component of a POSIXlt object is 1 if so, and lubridate's dst() returns TRUE)
    • the duration between two dates
    • Many of the above tasks are not possible or are tedious with RegEx.
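Here is a short sketch of three of these checks, using a made-up timestamp in Los Angeles. Fractional seconds survive parsing, dst() reports daylight saving time, and subtracting two dates gives a difference with units.

```r
library(lubridate)

# Hypothetical timestamp with a fractional second
x <- ymd_hms("2019-10-31 01:00:00.25", tz = "America/Los_Angeles")

second(x)   # fractional seconds are kept (about 0.25)
dst(x)      # TRUE: Los Angeles is on daylight saving time on this date

# Difference between two dates, with units
ymd("2019-12-25") - ymd("2019-10-31")   # Time difference of 55 days
```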



Conversion of character strings to Date/Time objects using strptime()

strptime() converts a date string to a “POSIXlt” “POSIXt” object.

Note: This is different from “Date” object.

## [1] "2012-01-15 10:40:00 PST" "2014-09-20 11:08:00 PDT"
## [1] "POSIXlt" "POSIXt"
  • Once we have converted characters into dates, we can do important calculations on them: we can add to, subtract, and compare dates.
  • One thing to note is that “Date” and “POSIXlt” objects can’t be mixed. For example, it is not possible to subtract a “Date” object from a “POSIXlt” object. Instead, convert one of them first (for example, with as.POSIXlt()) so that both have the same class.
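Here is a minimal sketch of the workaround. The two times are made up, and UTC is assumed so the result does not depend on the local time zone. Converting the Date to POSIXlt first puts both objects in the same class, and the subtraction then works.

```r
d  <- as.Date("2012-01-01")
dt <- strptime("2011-01-09 11:34:21", "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(dt)   # "POSIXlt" "POSIXt"

# d - dt would fail: the "-" methods for Date and POSIXt are incompatible

# Convert the Date to POSIXlt first, then subtract
gap <- as.POSIXlt(d, tz = "UTC") - dt
gap         # a difftime of about 356.5 days
```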



Operations on Dates and Times

Operations on dates and times also keep track of:

  • leap years
  • leap seconds
  • daylight saving time
  • and time zones!

We can perform calculations on “Date” objects and “POSIXlt” / “POSIXct” objects.

  1. Leap years
## Time difference of 16 days

Note: If the years were 2019 instead of 2020, the result would be “Time difference of 15 days”, because 2019 is not a leap year.
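The dates behind the output above are not shown. This sketch assumes they were 14 February and 1 March, which reproduces both the 16-day and the 15-day results. lubridate's leap_year() confirms the cause.

```r
library(lubridate)

# 2020 is a leap year, so February has 29 days
as.Date("2020-03-01") - as.Date("2020-02-14")   # Time difference of 16 days
as.Date("2019-03-01") - as.Date("2019-02-14")   # Time difference of 15 days

leap_year(2020)   # TRUE
leap_year(2019)   # FALSE
```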

  2. Time differences across time zones (examples include “GMT” and “America/Los_Angeles”).
  • We can easily find the differences between dates and times using lubridate.
  • We just need to make sure that the objects we are comparing are of the same class.
## Time difference of -2 hours
## Time difference of 2 hours
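Here is a sketch of a time-zone difference, using two made-up times. 01:00 in Los Angeles during daylight saving time (PDT, UTC-7) is 08:00 GMT. That makes it two hours after 06:00 GMT, even though its clock reading is smaller.

```r
library(lubridate)

la  <- ymd_hms("2019-10-31 01:00:00", tz = "America/Los_Angeles")  # 08:00 GMT
gmt <- ymd_hms("2019-10-31 06:00:00", tz = "GMT")

la - gmt   # Time difference of 2 hours
gmt - la   # Time difference of -2 hours
```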

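We could try some calculations like these without lubridate, but they do not behave as we want:

  • (2019-08-09) - (2019-08-02): R treats the dates as plain numbers and computes 2002 - 2009. It returns -7 with no units, rather than a difference of 7 days.
  • (2019-08-09 11:02:33) - (2019-08-02 02:19:33): returns a syntax error, because a bare date-time is not valid R code.
  • lubridate enables us to perform these calculations easily.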
## [1] "POSIXct" "POSIXt"
## Time difference of -2 hours
## Time difference of 2 hours
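This sketch shows the contrast with bare numbers directly. The first line is ordinary arithmetic, while the parsed versions return real time differences.

```r
library(lubridate)

(2019-08-09) - (2019-08-02)   # -7: R computes 2002 - 2009

ymd("2019-08-09") - ymd("2019-08-02")   # Time difference of 7 days

# Date-times parse too (UTC by default): about 7.36 days apart
ymd_hms("2019-08-09 11:02:33") - ymd_hms("2019-08-02 02:19:33")
```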

We can also get the second, day of the month, month (as a number or a label), day of the week (as a number or a label), day of the year, and week of the year from POSIXct and POSIXlt objects.

## [1] 51
## [1] 31
## [1] 10
## [1] Oct
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
## [1] 5
## [1] Thu
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
## [1] 304
## [1] 44
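Here is a sketch of extraction functions that give output like the above, using a made-up POSIXct time. The label = TRUE versions return ordered factors whose labels depend on the locale, so the tests below check only the numeric results.

```r
library(lubridate)

# Hypothetical time
x <- ymd_hms("2019-10-31 01:00:51", tz = "America/Los_Angeles")

second(x)                # 51
day(x)                   # 31
month(x)                 # 10
month(x, label = TRUE)   # ordered factor, e.g. Oct
wday(x)                  # 5 (Sunday = 1, so Thursday)
wday(x, label = TRUE)    # ordered factor, e.g. Thu
yday(x)                  # 304
week(x)                  # 44
```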

We can also express the current time (or any POSIXlt/POSIXct object) in another time zone, for example with lubridate's with_tz():

## [1] "2021-04-26 21:32:52 EDT"
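This sketch contrasts two lubridate time-zone functions, using a fixed time in place of Sys.time(). with_tz() shows the same instant on another clock. force_tz() keeps the clock reading and changes the instant. That distinction matters when a time was recorded with the wrong zone.

```r
library(lubridate)

x <- ymd_hms("2021-04-26 18:32:52", tz = "America/Los_Angeles")

with_tz(x, tzone = "America/New_York")   # 21:32:52 EDT: same moment
force_tz(x, tzone = "America/New_York")  # 18:32:52 EDT: a different moment
```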

parse_date_time() parses dates

parse_date_time() makes its best guess at each date-time, using the formats we list. It returns POSIXct date-times, printed in the standard ISO format (“YYYY-mm-dd HH:MM:SS”) and in UTC by default.

  • The orders argument lists the possible date-time formats (such as “ymd” or “mdy HMS”), so that R can parse all sorts of dates and times correctly.
##  [1] "2020-01-02 11:00:02 UTC" "2011-02-19 00:00:00 UTC"
##  [3] "2003-02-19 00:00:00 UTC" "1993-06-14 00:00:00 UTC"
##  [5] "2006-09-14 00:00:00 UTC" "2003-11-20 00:00:00 UTC"
##  [7] "2009-10-01 00:00:00 UTC" "2019-12-20 00:00:00 UTC"
##  [9] "2021-04-26 00:00:00 UTC" "2021-04-26 00:00:00 UTC"
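Here is a sketch with made-up input in three layouts. The orders argument lists the layouts to try, and each string is parsed with an order that fits it. The results are POSIXct values in UTC.

```r
library(lubridate)

# Hypothetical messy input: dashes, US month/day/year, and no separators
messy <- c("2011-02-19", "06/14/1993", "20031120")

parsed <- parse_date_time(messy, orders = c("ymd", "mdy"))
parsed
```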

We can extract the month, year, and other components with the matching functions:

## [1] FALSE  TRUE
##  [1] 1 1 1 1 1 1 1 1 1 1 1
##  [1] 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013
##  [1] 6 6 6 6 6 6 6 6 6 6 6
##  [1] 18 18 18 18 18 18 18 18 18 18 18

We can then add those values to new columns of a data frame, named however we wish. This helps with analyzing questions such as:

  • when is traffic likely to be heaviest?
  • at what times of day are shoppers more likely to shop in store vs. online?
  • at what times am I more likely to win the lottery?
  • which months are safer for driving to a certain city?
  • and much more!
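Here is a sketch of the idea. The timestamps are hypothetical, for example from a traffic counter. We add month, weekday, and hour columns to a data frame, then count observations per hour.

```r
library(lubridate)

# Hypothetical timestamps, e.g. cars passing a traffic counter
traffic <- data.frame(
  time = ymd_hms(c("2013-06-18 07:45:00", "2013-06-18 08:10:00",
                   "2013-06-18 17:30:00", "2013-06-19 08:05:00"))
)

# New columns derived from the date-time
traffic$month <- month(traffic$time)
traffic$wday  <- wday(traffic$time, label = TRUE)
traffic$hour  <- hour(traffic$time)

# How many observations fall in each hour of the day?
table(traffic$hour)
```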

In short, character strings of dates can be converted to Date/Time classes using the strptime(), as.Date(), as.POSIXlt(), or as.POSIXct() functions.

Side note: Plots also change their axis formatting when the values being graphed are Date/Time objects rather than regular text.

## [1] "America/Los_Angeles"

Intervals

Intervals are their own class of object (“Interval”, from the lubridate package).

  • Knowing how to work with intervals is important when studying weather patterns and how they change over seasons from year to year.
  • It is also important when checking which events co-occur, so that important events, such as meetings or consecutive flights, do not overlap.
## [1] 2019-10-31 06:00:00 GMT--2019-10-31 08:00:00 GMT
## [1] "Interval"
## attr(,"package")
## [1] "lubridate"
## [1] 1
## Time difference of 7200 secs
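Here is a sketch of creating an interval like the one above. interval() and the %--% operator are two lubridate ways to build the same object, and int_length() returns its length in seconds.

```r
library(lubridate)

start <- ymd_hms("2019-10-31 06:00:00", tz = "GMT")
end   <- ymd_hms("2019-10-31 08:00:00", tz = "GMT")

meeting <- interval(start, end)   # equivalent to start %--% end
class(meeting)                    # "Interval" (from package "lubridate")
int_length(meeting)               # 7200 seconds
```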

We can also apply functions to intervals!

  • such as checking whether two intervals overlap, with int_overlaps()
## [1] TRUE
  • retrieving the start and end times of an interval, with int_start() and int_end()
## [1] "2019-10-31 01:00:00 PDT"
## [1] "2019-10-31 08:00:00 GMT"
## [1] 2018-12-25 06:00:00 PST--2019-12-25 08:00:00 PST
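Here is a sketch with two hypothetical consecutive flights. The first lands at 08:00 GMT and the second departs at 09:00 GMT, so the intervals do not overlap.

```r
library(lubridate)

flight1 <- interval(ymd_hms("2019-10-31 06:00:00", tz = "GMT"),
                    ymd_hms("2019-10-31 08:00:00", tz = "GMT"))
flight2 <- interval(ymd_hms("2019-10-31 09:00:00", tz = "GMT"),
                    ymd_hms("2019-10-31 11:00:00", tz = "GMT"))

int_overlaps(flight1, flight2)   # FALSE: one hour between landing and departure
int_start(flight1)               # 06:00 GMT
int_end(flight1)                 # 08:00 GMT
```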

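Durations tell us how many seconds are in “X” minutes, hours, days, weeks, months, or years. Because months and years vary in length, lubridate uses averages for them: a year is 365.25 days, and a month is one twelfth of a year.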

## [1] "86400s (~1 days)"
## [1] "604800s (~1 weeks)"
## [1] "2629800s (~4.35 weeks)"
## [1] "31557600s (~1 years)"
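Here is a sketch of duration constructors that produce output like the above. Durations can also be divided by one another to convert between units.

```r
library(lubridate)

ddays(1)     # "86400s (~1 days)"
dweeks(1)    # "604800s (~1 weeks)"
dmonths(1)   # "2629800s (~4.35 weeks)"
dyears(1)    # "31557600s (~1 years)"

# Dividing durations converts between units
dminutes(90) / dhours(1)   # 1.5
```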

Moreover, the package makes it easier to split a data frame column that contains both a date and a time into separate columns for day, month, second, and so on.

  • If traffic analysts are evaluating traffic lights and the number of accidents nearby, they would want details down to the second. That detail lets them decide whether to display the red light for more seconds or the green light for fewer.
  • If a person is tracking their mood throughout a day, they would probably want hourly data.

## [1] "character"
## [1] "2012-01-09 11:20:44 PST"
## [1] "POSIXlt" "POSIXt"
## [1] "2012-01-09 11:20:44 PST"
## [1] "2012-09-24"
## [1] "Date"
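Here is a sketch of splitting a combined date-time column into parts. The column name stamp and its values are hypothetical. ymd_hms() parses the strings to POSIXct, in UTC unless a tz is given.

```r
library(lubridate)

# Hypothetical data frame with a combined date-time column stored as text
events <- data.frame(stamp = c("2012-01-09 11:20:44", "2012-09-24 08:05:10"),
                     stringsAsFactors = FALSE)
class(events$stamp)                     # "character"

events$stamp  <- ymd_hms(events$stamp)  # character -> POSIXct
events$date   <- as_date(events$stamp)  # Date part only
events$hour   <- hour(events$stamp)
events$minute <- minute(events$stamp)
events$second <- second(events$stamp)
events
```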



Potential Real-Life Applications of lubridate:

  1. Using lubridate’s time zone functions in airline booking websites
  • Customers can see their departure and arrival times in any time zone they wish.
    • For example, if a passenger is traveling from NYC to Auckland, lubridate would let the booking program show the arrival time in both the Eastern Time Zone (EST) and Auckland time. This can help passengers plan their time at a layover airport, or even decide when to sleep on the plane to minimize jet lag!
  2. Using lubridate’s interval functions in calendar apps
  • Users can be warned about overlapping events with the help of the int_overlaps() function.
    • For example, suppose a user wants to schedule a meeting with a client at 2:30pm, but an earlier meeting runs from 1:30pm until 3:30pm. They would want the app to warn them before they book the 2:30pm meeting.
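Here is a sketch of the calendar check from the second application. The date, the 30-minute length of the new meeting, and the America/New_York time zone are all assumptions made for illustration.

```r
library(lubridate)

zone <- "America/New_York"   # assumed time zone

# Existing meeting 1:30pm-3:30pm; proposed client meeting 2:30pm-3:00pm (assumed length)
existing <- interval(ymd_hm("2021-05-03 13:30", tz = zone), ymd_hm("2021-05-03 15:30", tz = zone))
proposed <- interval(ymd_hm("2021-05-03 14:30", tz = zone), ymd_hm("2021-05-03 15:00", tz = zone))

if (int_overlaps(existing, proposed)) {
  message("Warning: the new meeting overlaps an existing one.")
}
```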

Final Words

Overall, I find lubridate very useful and applicable to real-life situations. I find dates, especially when presented as strings, really hard to work with. Regular expressions are great for extracting dates, but when it comes to the details of dates, lubridate can be a better choice! From working with time zones and durations to simply extracting months from a list of dates, lubridate can save the day!

Credits

I researched lubridate extensively using the following sources. I highly recommend reading them; they taught me a lot that I could not easily have found in the RDocumentation.

Sources List

  1. https://www.fabianheld.com/lubridate/
  2. https://www.youtube.com/watch?v=8HENCYXwZoU (Thank you Prof. Peng!)
  3. https://rawgit.com/rstudio/cheatsheets/master/lubridate.pdf

Thank you to Dr. Lecy who gave me an opportunity to write about an awesome package!