R - Filtering datetimes that are less than a set of datetimes -


i have 2 dataframes.

one data have several variable columns , several datetime related columns (datetimes, week #, dates, hour, minute, second), has data everyday in 2017. example,

> glimpse(data) observations: 8,001,013 variables: 12  $ id                 <chr> "(2, 3, 4)", "(5,)", "(6,)", "(7,)", "(8,)", "(9,)", "(10,)", "(11,)", "(12,)", "(13,)", "(14,)", "(15,)", "(16,)", "(17,)", "(18,)", "(19,)", "(20,)", "(21,... $ x                  <int> 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1... $ num                <chr> "set([4225])", "set([4712])", "set([5271])", "set([5334])", "set([5395])", "set([5658])", "set([5889])", "set([6020])", "set([6063])", "set([6090])", "set([6... $ w                  <int> 4, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 7, 1, 3, 2, 1, 1, 3, 2, 3, 2, 1, 1, 2, 1, 1, 4, 1, 2, 3, 1, 1, 1, 1, 3, 1, 1, 1, 2, 3, 1, 1, 4, 1, 2, 1... $ z                  <int> 4, 6, -1, -1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -2, -1, -1, -2, 2, 7, 1, -3, -2, 1, -1, 3, 2, 3, -2, -1, -1, -2, -1, -1, 4, 1, 2, -3, 1, 1, 1, 1, -3, 1, 1, 1... $ datetime           <dttm> 2017-02-19 18:00:00, 2017-02-19 18:00:00, 2017-02-19 18:00:00, 2017-02-19 18:00:00, 2017-02-19 18:00:00, 2017-02-19 18:00:01, 2017-02-19 18:00:01, 2017-02-1... $ date               <date> 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, 2017-02-19, ... $ day_of_week        <ord> sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, sun, su... $ week               <dbl> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8... $ hour               <int> 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 1... $ minute             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1... $ second             <dbl> 0.1187501, 0.3406179, 0.7030604, 0.7431633, 0.7939658, 1.0090485, 1.1624568, 1.2924566, 1.3619752, 1.3922081, 1.4920712, 1.5121725, 1.5621316, 1.6688271, 1.7... 

the other dataframe small list of 8 key datetimes, key_datetimes, e.g.

> key_datetimes # tibble: 9 x 2          code         keydatetime         <chr>               <chr> 1       tail1 2017-01-12 08:30:00 2       tail2 2017-02-09 11:40:00 3       tail3 2017-03-22 08:30:01 4       tail4 2017-04-13 10:30:00 5       tail5 2017-05-19 08:30:00 6       tail6 2017-06-13 08:35:00 7       tail7 2017-07-28 09:30:00 8       tail8 2017-08-23 06:30:00 9       tail9 2017-09-13 07:30:00 

i want label data each week specific datetimes in key_datetimes. want create new column in data called before_key_datetime true if data$datetime < key_datetimes otherwise false.

how can accomplish this?

in other words, want keep weeks of interest (this works), group each week (this works) each group label/mutate new column says rows before/after keydatetime second dataframe (can't part).

things have tried:

  • doing ifelse on 1 datetime works.
  • doing ifelse vector vector doesn't vector: data %>% filter(week %in% lubridate::week(as.date(key_datetimes$keydatetime))) %>% group_by(week) %>% filter(datetime %in% c(as.posixct(key_datetimes$keydatetime)))

not 100% sure want, try using zoo::na.locf after merging key data , arranging date

to fill in before each key date

library(dplyr) library(zoo) df %>%   full_join(key, by="date") %>%   arrange(date) %>%   mutate_at(vars(code, key), funs(zoo::na.locf(., na.rm=false, fromlast=true))) 

output (head)

                   date code   key 1   2017-01-02 00:00:01    1 tail1 2   2017-01-03 00:00:01    1 tail1 3   2017-01-04 00:00:01    1 tail1 4   2017-01-05 00:00:01    1 tail1 5   2017-01-06 00:00:01    1 tail1 6   2017-01-07 00:00:01    1 tail1 7   2017-01-08 00:00:01    1 tail1 8   2017-01-09 00:00:01    1 tail1 

simpler example

simple <- head(df) ans <- simple %>%   full_join(key, by="date") %>%   arrange(date) 

i left out last mutate_at step. full_join meant combine data both (without losing data). data simple , key still present in output. point combine data , sort them. easy way see order of entries.

                  date code   key 1  2017-01-02 00:00:01   na  <na>    # simple 2  2017-01-03 00:00:01   na  <na>    # simple 3  2017-01-04 00:00:01   na  <na>    # simple 4  2017-01-05 00:00:01   na  <na>    # simple 5  2017-01-06 00:00:01   na  <na>    # simple 6  2017-01-07 00:00:01   na  <na>    # simple 7  2017-01-12 08:30:00    1 tail1    # key 8  2017-02-09 11:40:00    2 tail2 9  2017-03-22 08:30:01    3 tail3 10 2017-04-13 10:30:00    4 tail4 11 2017-05-19 08:30:00    5 tail5 12 2017-06-13 08:35:00    6 tail6 13 2017-07-28 09:30:00    7 tail7 14 2017-08-23 06:30:00    8 tail8 15 2017-09-13 07:30:00    9 tail9 

the mutate_at fill in nas nearest value traveling backwards. code , key in rows 1-6 take on values in row 7.

you can use code or key filter dates interested in or determine whether date in full data frame before key date. instance,

ans <- df %>%   full_join(key, by="date") %>%   arrange(date) %>%   mutate_at(vars(code, key), funs(zoo::na.locf(., na.rm=false, fromlast=true))) 

to find dates before key[1,] 1 tail1 2017-01-12 08:30:00, can do

ans %>%    filter(code==1) # last row key data frame 

or

ans %>%    filter(key=="tail1") 

to determine if date in data frame before key date,

ans[3, ] #                  date code   key # 3 2017-01-04 00:00:01    1 tail1 

tells 3rd entry in data frame before key[1,] 1 tail1 2017-01-12 08:30:00

just in case

to fill in after each key date

df %>%   full_join(key, by="date") %>%   arrange(date) %>%   mutate_at(vars(code, key), funs(zoo::na.locf(., na.rm=false, fromlast=false))) 

output (tail)

363 2017-12-21 00:00:01    9 tail9 364 2017-12-22 00:00:01    9 tail9 365 2017-12-23 00:00:01    9 tail9 366 2017-12-24 00:00:01    9 tail9 367 2017-12-25 00:00:01    9 tail9 368 2017-12-26 00:00:01    9 tail9 369 2017-12-27 00:00:01    9 tail9 370 2017-12-28 00:00:01    9 tail9 371 2017-12-29 00:00:01    9 tail9 372 2017-12-30 00:00:01    9 tail9 373 2017-12-31 00:00:01    9 tail9 374 2018-01-01 00:00:01    9 tail9 

data

df <- data.frame(date = ymd_hms("2017-01-01 00:00:01") + days(x=1:365))  key <- structure(list(code = 1:9, key = c("tail1", "tail2", "tail3",  "tail4", "tail5", "tail6", "tail7", "tail8", "tail9"), date = structure(c(1484209800,  1486640400, 1490171401, 1492079400, 1495182600, 1497342900, 1501234200,  1503469800, 1505287800), tzone = "utc", class = c("posixct",  "posixt"))), class = "data.frame", .names = c("code", "key",  "date"), row.names = c(na, -9l)) 

Comments

Popular posts from this blog

ZeroMQ on Windows, with Qt Creator -

unity3d - Unity SceneManager.LoadScene quits application -

python - Error while using APScheduler: 'NoneType' object has no attribute 'now' -