lubridate
In this tutorial, we will see why the lubridate package is so helpful and important.
In lubridate terms, a date has the form Year-Month-Day, where Year is a 4-digit year, Month is a 2-digit month, and Day is a 2-digit day.
as.Date(): converts character dates to Date objects.
The format argument of as.Date() tells R what format the input date is in, so that it is parsed properly.
## [1] "2012-01-10"
This outputs the input date in the format “YYYY-MM-DD”.
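The input that produced the output above is not shown, so the following is only a minimal sketch. The input string "01/10/2012" and the format "%m/%d/%Y" are assumptions chosen to reproduce the same result.

```r
# Hypothetical input: the original string is not shown in the tutorial.
date_string <- "01/10/2012"

# format tells as.Date() how to read the input: month/day/4-digit year
parsed_date <- as.Date(date_string, format = "%m/%d/%Y")

class(parsed_date)   # "Date"
format(parsed_date)  # "2012-01-10"
```

Whatever the input format is, a Date object always prints as YYYY-MM-DD.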
Some useful extraction functions:
weekdays(): returns the day of the week. With RegEx, we would have to pass in all possible values of days (Monday, ..., Sunday; monday, ..., sunday; mon, ..., sun; and other possibilities) and then check whether any one value is found in the date string (ex: “October 9th Friday”). This is possible only if a day is included in the text. If there is no day, then RegEx cannot extract it for us. This is when date functions are useful, because we only need to input the date. A day name is not required. Super convenient!
months(): returns the month of the year
quarters(): returns the quarter of the year
year(): returns the year of the date
Note: today() returns the current date in the format YYYY-MM-DD. weekdays(), months(), and quarters() come with base R, while year() and today() come from lubridate.
lubridate does this very easily. It allows us to find the weekday, month, and more from an input date.
## [1] "Monday"
## [1] "April"
## [1] "Q2"
## [1] 2021
## Apr
## 30
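The calls that produced the outputs above are not shown. The sketch below gives calls that would return this kind of output. It assumes a fixed date of 2021-04-26 in place of today(), so the results do not change from day to day. The day and month names depend on the locale; the comments assume an English one.

```r
library(lubridate)

# Fixed example date (an assumption) so the results are reproducible
d <- as.Date("2021-04-26")

weekdays(d)       # base R: day of the week, "Monday" in an English locale
months(d)         # base R: month name, "April" in an English locale
quarters(d)       # base R: "Q2"
year(d)           # lubridate: 2021
days_in_month(d)  # lubridate: named vector, Apr = 30
```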
Sys.time() returns the current time as understood by the system that is running the code. For example, if the code is run in New York, it returns the current time in the Eastern time zone.
time_now <- Sys.time() # `time_now` is NOT a list
time_now # stores a value: "2020-10-09 09:37:27.064 PDT"
## [1] "2021-04-26 18:32:52 PDT"
We can try time_now$sec to get the exact second of the current minute. But it won’t work. It returns “Error in time_now$sec : $ operator is invalid for atomic vectors”, because a POSIXct value is stored as a single number rather than as a list.
time_now1 <- as.POSIXlt(time_now)
# now, time_now1 is a list.
time_now1 # returns exact date and time in format "YYYY-MM-DD HH:MM:SS.ssss TZ"
## [1] "2021-04-26 18:32:52 PDT"
## [1] 52.62111
## [1] 1
We can return all the components of a POSIXlt object.
## $sec
## [1] 52.62111
##
## $min
## [1] 32
##
## $hour
## [1] 18
##
## $mday
## [1] 26
##
## $mon
## [1] 3
##
## $year
## [1] 121
##
## $wday
## [1] 1
##
## $yday
## [1] 115
##
## $isdst
## [1] 1
##
## $zone
## [1] "PDT"
##
## $gmtoff
## [1] -25200
##
## attr(,"tzone")
## [1] "" "PST" "PDT"
# ex: mday = day of month
class(unclass(time_now1)) # list - this makes it possible to index and retrieve required information!
## [1] "list"
## $sec
## [1] 52.62111
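The component listing above can be reproduced with a fixed timestamp. This is a sketch: the timestamp and the "America/Los_Angeles" time zone are assumptions standing in for Sys.time(), and the fractional part of the seconds is left out.

```r
# Fixed timestamp and time zone (assumptions) instead of Sys.time()
t_ct <- as.POSIXct("2021-04-26 18:32:52", tz = "America/Los_Angeles")
t_lt <- as.POSIXlt(t_ct)

t_lt$sec    # 52  (seconds within the minute)
t_lt$wday   # 1   (0 = Sunday, so 1 = Monday)
t_lt$mon    # 3   (months are counted from 0, so 3 = April)
t_lt$year   # 121 (years since 1900)
t_lt$yday   # 115 (days since January 1, counted from 0)

# unclass() exposes the underlying list of components
components <- unclass(t_lt)
names(components)
```

Several components are counted from 0 or from 1900, so they need an offset before they are shown to a reader.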
lubridate is more efficient than RegEx to parse dates
There are several components of dates that we can extract with lubridate that we cannot easily extract with RegEx (or extract at all!).
lubridate makes it possible to extract fractional seconds like we did above. If air traffic controllers need to monitor flight paths down to the millisecond to prevent crashes, lubridate can help.
lubridate can easily identify:
whether Daylight Saving Time is in effect (the isdst component returns 1 if true)
strptime converts the date string to a POSIXlt / POSIXt object.
Note: This is different from a “Date” object.
datestring <- c("January 15, 2012 10:40", "September 20, 2014 11:08")
strptime_1 <- strptime(x= datestring,
format = "%B %d, %Y %H:%M")
strptime_1
## [1] "2012-01-15 10:40:00 PST" "2014-09-20 11:08:00 PDT"
## [1] "POSIXlt" "POSIXt"
as.Date converts a string to a “Date” object.
## [1] "character"
date_1 <- strptime(x = "09 Jan 2012 11:20:44",
format = "%d %b %Y %H:%M:%S")
date_1 # "2012-01-09 11:20:44 PST"
## [1] "2012-01-09 11:20:44 PST"
## [1] "POSIXlt" "POSIXt"
## [1] "2012-09-24"
## [1] "Date"
## [1] "2012-09-24 UTC"
## [1] "POSIXlt" "POSIXt"
## Time difference of 258.1939 days
## Time difference of -258.1939 days
## [1] FALSE
## [1] TRUE
We can perform calculations on “Date” objects and “POSIXlt” / “POSIXct” objects.
leapYearX <- as.Date("2020-02-14")
leapYearY <- as.Date("2020-03-01")
leapYearY - leapYearX # Time difference of 16 days
## Time difference of 16 days
Note: If the year were 2019 instead of 2020, the result would be “Time difference of 15 days”, because February 2019 has only 28 days.
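The leap-year effect can be checked directly. This is a small sketch using base R only; the variable names are illustrative.

```r
# Same span of dates in a leap year and in a non-leap year
diff_2020 <- as.Date("2020-03-01") - as.Date("2020-02-14")
diff_2019 <- as.Date("2019-03-01") - as.Date("2019-02-14")

diff_2020              # Time difference of 16 days
diff_2019              # Time difference of 15 days

# as.numeric() drops the units when a plain number is needed
as.numeric(diff_2020)  # 16
```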
halloweenX <- as.POSIXct(x = "2019-10-31 1:00:00") # by default, it uses the current time zone (in this case, Pacific Time)
halloweenY <- as.POSIXct(x = "2019-10-31 6:00:00",
                         tz = "GMT")
# the magnitude of the difference is only 2,
# since GMT is 7 hours ahead of Pacific Time during
# Daylight Saving Time (PDT).
# So, halloweenX in GMT would be 8:00:00 GMT
# instead of 1:00:00 PDT.
# If both were in the same time zone, then it would return 5.
halloweenY - halloweenX # Time difference of -2 hours
## Time difference of -2 hours
## Time difference of 2 hours
We could try some calculations like these without lubridate:
(2019-08-09) - (2019-08-02): returns -7 without units, because R evaluates each side as plain subtraction (2002 - 2009) rather than as dates.
(2019-08-09 11:02:33) - (2019-08-02 02:19:33): returns an error.
lubridate enables us to perform these calculations easily.
halloweenA <- as.POSIXct(x = "2019-10-31 6:00:00",
tz = "GMT")
class(halloweenA) # "POSIXct" "POSIXt"
## [1] "POSIXct" "POSIXt"
halloweenB <- as.POSIXct(x = "2019-10-31 1:00:00",
tz = "America/Los_Angeles")
halloweenA - halloweenB # Time difference of -2 hours
## Time difference of -2 hours
## Time difference of 2 hours
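When the units of the result matter, base R's difftime() lets us request them explicitly. The sketch below repeats the Halloween comparison with both time zones stated, so that it does not depend on the system time zone. The names a and b are illustrative.

```r
a <- as.POSIXct("2019-10-31 06:00:00", tz = "GMT")
b <- as.POSIXct("2019-10-31 01:00:00", tz = "America/Los_Angeles")

# difftime() lets us choose the units explicitly
hours_apart <- difftime(a, b, units = "hours")
hours_apart                                   # Time difference of -2 hours
mins_apart <- difftime(a, b, units = "mins")  # the same gap in minutes
as.numeric(mins_apart)                        # -120

# Printing b in GMT shows why: 01:00 Pacific Daylight Time is 08:00 GMT
format(b, tz = "GMT", usetz = TRUE)           # "2019-10-31 08:00:00 GMT"
```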
## [1] 51
## [1] 31
## [1] 10
## [1] Oct
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
## [1] 5
## [1] Thu
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
## [1] 304
## [1] 44
## [1] "2021-04-26 21:32:52 EDT"
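The calls behind the outputs above are not shown. The following sketch uses lubridate's accessor functions on the Halloween timestamp and should give values of the same kind. The object name h and the choice of "America/New_York" for the conversion are assumptions.

```r
library(lubridate)

h <- ymd_hms("2019-10-31 01:00:00", tz = "America/Los_Angeles")

day(h)                  # 31
month(h)                # 10
month(h, label = TRUE)  # Oct (an ordered factor)
wday(h)                 # 5 (with the default week start, 1 = Sunday)
wday(h, label = TRUE)   # Thu
yday(h)                 # 304
week(h)                 # 44

# with_tz() shows the same instant on another time zone's clock
h_east <- with_tz(h, "America/New_York")
h_east
```

with_tz() changes only how the instant is displayed. The underlying moment in time stays the same.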
parse_date_time() parses dates
parse_date_time() makes its best guess as to what the date-time is and outputs it in the standard ISO format (“YYYY-mm-dd HH:MM:SS UTC”).
The orders = argument enables us to list all possible date-time formats, so that R can parse all sorts of dates and times correctly.
lubridate::parse_date_time(x = c("2020-01-02 11:00:02", "19-02-2011", "02-19-2003", "931406", "091406", "20112003", "Oct 1, 2009", "Nov 19 2012", "Monday", "Wed"),
orders = c("dmy", "dym",
"ymd", "ydm",
"mdy", "myd",
"%m%d%Y", "%Y%m%d %H%M%S",
"%b", "%a", "%T", "%d%b%Y"))
## [1] "2020-01-02 11:00:02 UTC" "2011-02-19 00:00:00 UTC"
## [3] "2003-02-19 00:00:00 UTC" "1993-06-14 00:00:00 UTC"
## [5] "2006-09-14 00:00:00 UTC" "2003-11-20 00:00:00 UTC"
## [7] "2009-10-01 00:00:00 UTC" "2019-12-20 00:00:00 UTC"
## [9] "2021-04-26 00:00:00 UTC" "2021-04-26 00:00:00 UTC"
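Some results above are guesses: a bare day name such as "Monday" carries no date, and the output shows the date on which the code was run. A smaller sketch with three of the same inputs and one order per format is easier to check. The choice of orders here is an assumption, and the month name assumes an English locale.

```r
library(lubridate)

# Three unambiguous inputs, each matched by exactly one of the orders
mixed <- c("2020-01-02 11:00:02", "19-02-2011", "Oct 1, 2009")

parsed <- parse_date_time(mixed, orders = c("ymd HMS", "dmy", "mdy"))
parsed

# The result is a POSIXct vector in UTC unless tz = is supplied
class(parsed)
```

Listing fewer, more specific orders reduces the chance that an ambiguous string such as "091406" is read in an unintended way.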
We can extract month, year, etc using the right functions as below:
bdt1 <- data.frame(Variable = c("Person","BirthDate/Time"),
Person1 = c("paul", "090594559103"),
Person2 = c("carrie", "2009-05-10"),
Person3 = c("susan", "08/06/04"))
bdt1
## [1] FALSE TRUE
## [1] 1 1 1 1 1 1 1 1 1 1 1
## [1] 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013
## [1] 6 6 6 6 6 6 6 6 6 6 6
## [1] 18 18 18 18 18 18 18 18 18 18 18
We can then add those values to a new column in a data frame, created and named as we wish. This helps with better analysis of:
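Adding the extracted values as new columns might look like the sketch below. The data frame events and its stamp column are hypothetical and are not taken from the tutorial's data.

```r
library(lubridate)

# Hypothetical data: three timestamps stored as text
events <- data.frame(
  stamp = c("2013-01-06 18:05:10", "2014-07-19 09:30:00", "2015-12-01 23:59:59"),
  stringsAsFactors = FALSE
)

# Parse once, then pull out each component into its own column
parsed <- ymd_hms(events$stamp, tz = "UTC")
events$year  <- year(parsed)
events$month <- month(parsed)
events$day   <- day(parsed)
events$hour  <- hour(parsed)

events
```

Parsing once and reusing the parsed vector avoids re-reading the strings for every new column.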
Basically, character strings of dates can be converted to Date/Time classes using the strptime, as.Date, as.POSIXlt, or as.POSIXct functions.
Side note: Plots also change their formatting when using Date/Time objects for graphing. Compare with regular text!
## [1] "America/Los_Angeles"
## [1] 2019-10-31 06:00:00 GMT--2019-10-31 08:00:00 GMT
int.halloween.1 <- halloweenB %--% halloweenA
int.halloween.2 <- halloweenY %--% halloweenX #automatically converts to GMT for both!
class(halloweenA %--% halloweenB) # interval
## [1] "Interval"
## attr(,"package")
## [1] "lubridate"
## [1] 1
## Time difference of 7200 secs
## [1] TRUE
## [1] "2019-10-31 01:00:00 PDT"
## [1] "2019-10-31 08:00:00 GMT"
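The calls behind the interval outputs above are not shown. The following sketch covers the same ground with lubridate's interval helpers. The object names and the mid test point are assumptions.

```r
library(lubridate)

a <- ymd_hms("2019-10-31 06:00:00", tz = "GMT")
b <- ymd_hms("2019-10-31 01:00:00", tz = "America/Los_Angeles")

iv <- a %--% b        # same as interval(a, b)
int_length(iv)        # 7200 (seconds)
as.duration(iv)       # the same length as a Duration
tz(b)                 # "America/Los_Angeles"
with_tz(b, "GMT")     # b shown on a GMT clock: 08:00

mid <- a + dhours(1)  # 07:00 GMT
mid %within% iv       # TRUE
```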
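christmas2018 <- as.POSIXct(x = "2018-12-25 6:00:00", tz = "America/Los_Angeles")
# December is outside Daylight Saving Time, so this prints as PST.
# as.POSIXct() ignores a time zone written inside the string, so we pass it through tz instead.
christmas2019 <- as.POSIXct(x = "2019-12-25 8:00:00", tz = "America/Los_Angeles")
# this also prints as PST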
# creating a Christmas interval object
interval.christmas <- interval(christmas2018, christmas2019)
interval.christmas # 2018-12-25 06:00:00 PST--2019-12-25 08:00:00 PST
## [1] 2018-12-25 06:00:00 PST--2019-12-25 08:00:00 PST
We can find how many seconds are in “X” number of minutes / hours / days / weeks / months / years.
## [1] "86400s (~1 days)"
## [1] "604800s (~1 weeks)"
## [1] "2629800s (~4.35 weeks)"
## [1] "31557600s (~1 years)"
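Outputs like the ones above come from lubridate's duration constructors. The calls themselves are not shown, so the following is a sketch of a few of them.

```r
library(lubridate)

dminutes(1)   # 60s
dhours(1)     # 3600s (~1 hours)
ddays(1)      # 86400s (~1 days)
dweeks(1)     # 604800s (~1 weeks)

# Durations are stored as seconds and can be converted to other units
as.numeric(dweeks(2))          # 1209600
as.numeric(ddays(1), "hours")  # 24
```

Durations always use a fixed number of seconds, so months and years are averages, as the "~4.35 weeks" label above suggests.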
Moreover, the package makes it easier to split a data frame column that contains both a date and a time into separate columns for day, month, second, etc.
- If street traffic analysts are evaluating traffic lights and the corresponding number of accidents in the area, they would want details down to the second, to check whether they need to increase the number of seconds that a red light is displayed or decrease the amount of time that a green light is displayed.
- If a person is tracking their mood throughout a day, they would probably want hourly data.
## [1] "character"
date_1 <- strptime(x = "09 Jan 2012 11:20:44",
format = "%d %b %Y %H:%M:%S")
date_1 # "2012-01-09 11:20:44 PST"
## [1] "2012-01-09 11:20:44 PST"
## [1] "POSIXlt" "POSIXt"
## [1] "2012-01-09 11:20:44 PST"
## [1] "2012-09-24"
## [1] "Date"
lubridate:
lubridate’s time zone functionalities in airline booking websites
lubridate enables a flight booking program to show passengers when they will arrive in Auckland in both Eastern Time (ET) and Auckland time. This can help passengers decide how to spend time at the layover airport, or even decide when to sleep on the airplane so that they can minimize jet lag!
lubridate’s interval functionalities in calendar apps
int_overlaps() functions.
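Both applications can be sketched in a few lines. The arrival time, the calendar entries, and all object names below are hypothetical. They only illustrate with_tz() and int_overlaps().

```r
library(lubridate)

# Hypothetical arrival time, expressed in Auckland local time
arrival_akl <- ymd_hm("2021-04-27 06:30", tz = "Pacific/Auckland")

# The same instant on a New York (Eastern Time) clock
arrival_nyc <- with_tz(arrival_akl, "America/New_York")
arrival_nyc

# Hypothetical calendar entries as intervals
meeting <- interval(ymd_hm("2021-04-27 09:00", tz = "UTC"),
                    ymd_hm("2021-04-27 10:00", tz = "UTC"))
lunch   <- interval(ymd_hm("2021-04-27 09:30", tz = "UTC"),
                    ymd_hm("2021-04-27 10:30", tz = "UTC"))
gym     <- interval(ymd_hm("2021-04-27 18:00", tz = "UTC"),
                    ymd_hm("2021-04-27 19:00", tz = "UTC"))

int_overlaps(meeting, lunch)  # TRUE: the two entries clash
int_overlaps(meeting, gym)    # FALSE: no clash
```

A calendar app could run a check like int_overlaps() before saving a new entry and warn the user about a clash.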
Overall, I find lubridate to be very useful and applicable to real-life situations. I find dates, especially when presented as strings, really hard to work with. Regular expressions are great for extracting dates, but when it comes to the details of dates, lubridate can be a better choice! From working with time zones and durations to simply extracting months out of a list of dates, lubridate can save the day!
I researched lubridate extensively using the following sources. I highly recommend reading them, because I learned so much that I could not have easily found in the RDocumentation.
Sources List
Thank you to Dr. Lecy who gave me an opportunity to write about an awesome package!