R - Filtering datetimes that are less than a set of datetimes -
i have 2 dataframes.
one data
have several variable columns , several datetime related columns (datetimes, week #, dates, hour, minute, second), has data everyday in 2017. example,
> glimpse(data) observations: 8,001,013 variables: 12 $ id <chr> "(2, 3, 4)", "(5,)", "(6,)", "(7,)", "(8,)", "(9,)", "(10,)", "(11,)", "(12,)", "(13,)", "(14,)", "(15,)", "(16,)", "(17,)", "(18,)", "(19,)", "(20,)", "(21,... $ x <int> 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1... $ num <chr> "set([4225])", "set([4712])", "set([5271])", "set([5334])", "set([5395])", "set([5658])", "set([5889])", "set([6020])", "set([6063])", "set([6090])", "set([6... $ w <int> 4, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 7, 1, 3, 2, 1, 1, 3, 2, 3, 2, 1, 1, 2, 1, 1, 4, 1, 2, 3, 1, 1, 1, 1, 3, 1, 1, 1, 2, 3, 1, 1, 4, 1, 2, 1... $ z <int> 4, 6, -1, -1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -2, -1, -1, -2, 2, 7, 1, -3, -2, 1, -1, 3, 2, 3, -2, -1, -1, -2, -1, -1, 4, 1, 2, -3, 1, 1, 1, 1, -3, 1, 1, 1... $ datetime <dttm> 2017-02-19 18:00:00, 2017-02-19 18:00:00, 2017-02-19 18:00:00, 2017-02-19 18:00:00, 2017-02-19 18:00:00, 2017-02-19 18:00:01, 2017-02-19 18:00:01, 2017-02-1... $ date <date> 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, ... $ day_of_week <ord> sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, su... $ week <dbl> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8... $ hour <int> 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 1... $ minute <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1... $ second <dbl> 0.1187501, 0.3406179, 0.7030604, 0.7431633, 0.7939658, 1.0090485, 1.1624568, 1.2924566, 1.3619752, 1.3922081, 1.4920712, 1.5121725, 1.5621316, 1.6688271, 1.7...
the other dataframe small list of 8 key datetimes, key_datetimes
, e.g.
> key_datetimes # tibble: 9 x 2 code keydatetime <chr> <chr> 1 tail1 2017-01-12 08:30:00 2 tail2 2017-02-09 11:40:00 3 tail3 2017-03-22 08:30:01 4 tail4 2017-04-13 10:30:00 5 tail5 2017-05-19 08:30:00 6 tail6 2017-06-13 08:35:00 7 tail7 2017-07-28 09:30:00 8 tail8 2017-08-23 06:30:00 9 tail9 2017-09-13 07:30:00
i want label data each week specific datetimes in key_datetimes
. want create new column in data
called before_key_datetime
true if data$datetime < key_datetimes
otherwise false.
how can accomplish this?
in other words, want keep weeks of interest (this works), group each week (this works) each group label/mutate new column says rows before/after keydatetime second dataframe (can't part).
things have tried:
- doing
ifelse
on 1 datetime works. - doing
ifelse
vector vector doesn't vector:data %>% filter(week %in% lubridate::week(as.date(key_datetimes$keydatetime))) %>% group_by(week) %>% filter(datetime %in% c(as.posixct(key_datetimes$keydatetime)))
not 100% sure want, try using zoo::na.locf
after merging key data , arranging date
to fill in before each key date
library(dplyr) library(zoo) df %>% full_join(key, by="date") %>% arrange(date) %>% mutate_at(vars(code, key), funs(zoo::na.locf(., na.rm=false, fromlast=true)))
output (head)
date code key 1 2017-01-02 00:00:01 1 tail1 2 2017-01-03 00:00:01 1 tail1 3 2017-01-04 00:00:01 1 tail1 4 2017-01-05 00:00:01 1 tail1 5 2017-01-06 00:00:01 1 tail1 6 2017-01-07 00:00:01 1 tail1 7 2017-01-08 00:00:01 1 tail1 8 2017-01-09 00:00:01 1 tail1
simpler example
simple <- head(df) ans <- simple %>% full_join(key, by="date") %>% arrange(date)
i left out last mutate_at
step. full_join meant combine data both (without losing data). data simple
, key
still present in output. point combine data , sort them. easy way see order of entries.
date code key 1 2017-01-02 00:00:01 na <na> # simple 2 2017-01-03 00:00:01 na <na> # simple 3 2017-01-04 00:00:01 na <na> # simple 4 2017-01-05 00:00:01 na <na> # simple 5 2017-01-06 00:00:01 na <na> # simple 6 2017-01-07 00:00:01 na <na> # simple 7 2017-01-12 08:30:00 1 tail1 # key 8 2017-02-09 11:40:00 2 tail2 9 2017-03-22 08:30:01 3 tail3 10 2017-04-13 10:30:00 4 tail4 11 2017-05-19 08:30:00 5 tail5 12 2017-06-13 08:35:00 6 tail6 13 2017-07-28 09:30:00 7 tail7 14 2017-08-23 06:30:00 8 tail8 15 2017-09-13 07:30:00 9 tail9
the mutate_at
fill in na
s nearest value traveling backwards. code , key in rows 1-6 take on values in row 7.
you can use code or key filter dates interested in or determine whether date in full data frame before key date. instance,
ans <- df %>% full_join(key, by="date") %>% arrange(date) %>% mutate_at(vars(code, key), funs(zoo::na.locf(., na.rm=false, fromlast=true)))
to find dates before key[1,] 1 tail1 2017-01-12 08:30:00
, can do
ans %>% filter(code==1) # last row key data frame
or
ans %>% filter(key=="tail1")
to determine if date in data frame before key date,
ans[3, ] # date code key # 3 2017-01-04 00:00:01 1 tail1
tells 3rd entry in data frame before key[1,] 1 tail1 2017-01-12 08:30:00
just in case
to fill in after each key date
df %>% full_join(key, by="date") %>% arrange(date) %>% mutate_at(vars(code, key), funs(zoo::na.locf(., na.rm=false, fromlast=false)))
output (tail)
363 2017-12-21 00:00:01 9 tail9 364 2017-12-22 00:00:01 9 tail9 365 2017-12-23 00:00:01 9 tail9 366 2017-12-24 00:00:01 9 tail9 367 2017-12-25 00:00:01 9 tail9 368 2017-12-26 00:00:01 9 tail9 369 2017-12-27 00:00:01 9 tail9 370 2017-12-28 00:00:01 9 tail9 371 2017-12-29 00:00:01 9 tail9 372 2017-12-30 00:00:01 9 tail9 373 2017-12-31 00:00:01 9 tail9 374 2018-01-01 00:00:01 9 tail9
data
df <- data.frame(date = ymd_hms("2017-01-01 00:00:01") + days(x=1:365)) key <- structure(list(code = 1:9, key = c("tail1", "tail2", "tail3", "tail4", "tail5", "tail6", "tail7", "tail8", "tail9"), date = structure(c(1484209800, 1486640400, 1490171401, 1492079400, 1495182600, 1497342900, 1501234200, 1503469800, 1505287800), tzone = "utc", class = c("posixct", "posixt"))), class = "data.frame", .names = c("code", "key", "date"), row.names = c(na, -9l))
Comments
Post a Comment