Rolling Average in Microsoft Data Lake Analytics -
i running rolling average on value1
field. window of size 11 , centered. code below (i have omitted output
statement). have included input file.
declare @window_size int = 11; declare @window_lag int = (@window_size - 1) / 2; @data = extract timestamp datetime, site string, value1 double @input using extractors.csv(quoting : true, skipfirstnrows : 1, silent : false); // adjust timestamp utc @data = select timestamp.touniversaltime() timestamp, site, value1 @data; @avgdata = select timestamp, site, value1, avg(value1) over(partition site order timestamp rows between @window_lag preceding , @window_lag following) rolling_avg @data;
the calculated values seem ok. match in r (d
data frame containing loaded value of input file):
window_size = 11 d %>% group_by(site) %>% mutate( rolling_avg = zoo::rollapply(data = value1, fun = mean, align = "center", width = window_size, partial = t) )
but last 5 values of each of groups, different values. below last 6 values site1
calculated values in r , dla.
# tibble: 6 x 3 value1 rolling_avg_r rolling_avg_dla <dbl> <dbl> <dbl> 1 21.87613 46.64412 46.64412 2 76.77852 48.48727 50.47584 3 10.81344 52.50682 55.75313 4 70.84038 50.77922 56.11984 5 86.66709 53.73964 59.78395 6 70.36151 56.22284 62.64951
for the last line, rolling average should be:
mean(c(21.87613, 76.77852, 10.81344, 70.84038, 86.66709, 70.36151))
56.22284
and different value returned dla (which 62.64951
).
Comments
Post a Comment