Rolling Average in Microsoft Data Lake Analytics -


i running rolling average on value1 field. window of size 11 , centered. code below (i have omitted output statement). have included input file.

declare @window_size int = 11; declare @window_lag int = (@window_size - 1) / 2;  @data =    extract timestamp datetime,         site string,         value1 double    @input    using extractors.csv(quoting : true, skipfirstnrows : 1, silent : false);  // adjust timestamp utc @data =    select timestamp.touniversaltime() timestamp,        site,        value1    @data;  @avgdata =    select timestamp,        site,        value1,         avg(value1) over(partition site order timestamp rows between      @window_lag preceding , @window_lag following) rolling_avg  @data; 

the calculated values seem ok. match in r (d data frame containing loaded value of input file):

window_size = 11   d %>%   group_by(site) %>%   mutate(      rolling_avg = zoo::rollapply(data = value1, fun = mean, align = "center", width = window_size, partial = t)    ) 

but last 5 values of each of groups, different values. below last 6 values site1 calculated values in r , dla.

# tibble: 6 x 3     value1 rolling_avg_r rolling_avg_dla      <dbl>         <dbl>           <dbl> 1 21.87613      46.64412        46.64412 2 76.77852      48.48727        50.47584 3 10.81344      52.50682        55.75313 4 70.84038      50.77922        56.11984 5 86.66709      53.73964        59.78395 6 70.36151      56.22284        62.64951 

for the last line, rolling average should be:

 mean(c(21.87613, 76.77852, 10.81344, 70.84038, 86.66709, 70.36151)) 

56.22284

and different value returned dla (which 62.64951).


Comments

Popular posts from this blog

ZeroMQ on Windows, with Qt Creator -

unity3d - Unity SceneManager.LoadScene quits application -

python - Error while using APScheduler: 'NoneType' object has no attribute 'now' -