Speeding up SQL Server cross apply to get aggregated data -
in sql server, trying put single query grabs row , includes aggregated data 2 hour window before row aggregated data 1 hour window after. how can make run faster?
the rows have time stamps millisecond precision, , not evenly spaced. have on 50 million rows in table, , query not seem completing. there indexes in many places, don't seem help. thinking using window function, not sure possible have sliding window unevenly distributed rows. also, future 1 hour window, not sure how done sql window.
box string , has 10 unique values. process string , has 30 unique values. average duration_ms 200 ms. errors account less 0.1% of data. 50 million rows describes years worth of data.
select c1.start_time, c1.end_time, c1.box, c1.process, datediff(ms,c1.start_time,c1.end_time) duration_ms, datepart(dw,c1.start_time) day_of_week, datepart(hour,c1.start_time) hour_of_day, c3.*, c5.* metrics_table c1 cross apply (select avg(cast(datediff(ms,c2.start_time,c2.end_time) numeric)) avg_ms, count(1) num_process_total, count(distinct process) num_process_unique, count(distinct box) num_box_unique metrics_table c2 datediff(minute,c2.start_time,c1.start_time) <= 120 , c1.start_time> c2.start_time , c2.error_code = 0 ) c3 cross apply (select avg(case when datediff(ms,c4.start_time,c4.end_time)>1000 1.0 else 0.0 end) percent_over_thresh metrics_table c4 datediff(hour,c1.start_time,c4.start_time) <= 1 , c4.start_time> c1.start_time , c4.error_code= 0 ) c5 c1.error_code= 0
edit
version: sql azure 12.0
the following should step in right direction... note: c2.start_time & c4.start_time no longer wrappen in datediff functions making them sargable...
select c1.start_time, c1.end_time, c1.box, c1.process, datediff(ms, c1.start_time, c1.end_time) duration_ms, datepart(dw, c1.start_time) day_of_week, datepart(hour, c1.start_time) hour_of_day, --c3.*, avg_ms = case when c5.* dbo.metrics_table c1 cross apply ( select avg(cast(datediff(ms, c2.start_time, c2.end_time) numeric)) avg_ms, count(1) num_process_total, count(distinct process) num_process_unique, count(distinct box) num_box_unique dbo.metrics_table c2 --datediff(minute,c2.start_time,c1.start_time) <= 120 c2.start_time <= dateadd(minute, -120, c1.start_time) --and c1.start_time> c2.start_time , c2.error_code = 0 ) c3 cross apply ( select avg(case when datediff(ms, c4.start_time, c4.end_time) > 1000 1.0 else 0.0 end ) percent_over_thresh dbo.metrics_table c4 --datediff(hour, c1.start_time, c4.start_time) <= 1 c4.start_time >= dateadd(hour, 1, c1.start_time) --and c4.start_time> c1.start_time , c4.error_code = 0 ) c5 c1.error_code = 0;
of course, making query sargable doesn't unless there's appropriate index available. following should 3 metrics_table references... (see indexes available, there's chance may not need create new index)
create nonclustered index ixf_metricstable_errorcode_starttime on dbo.metrics_table ( error_code, start_time ) include ( end_time, box, process ) error_code = 0;
Comments
Post a Comment