python - Pandas - DateTime within X amount of minutes from row -


i not entirely positive best way ask or phrase question highlight problem, dataset, thoughts on method , end goal , clear end.

my problem:

my company dispatches workers , load dispatches single employee if on current dispatch. due limitation in software use. if employee receives 2 dispatches within 30 minutes, call double dispatch.

we analyzing our dispatching efficiency , running bit of head scratcher. need run through our 100k row database , add additional column read dummy variable 1 double 0 normal. have multiple people dispatch , b our records not start ordered dispatch, need determine how dispatch occurs same person within 30 minutes.

dataset:

the dataset incredibly massive due poor organization in our data warehouse terms of items need these columns need calc.

tech name         |  dispatch time (pst)    john smith        |  1/1/2017 12:34 jane smith        |  1/1/2017 12:46 john smith        |  1/1/2017 18:32 john smith        |  1/1/2017 18:50 

my thoughts: how clunky , work 1 way not backwards. more or less write code as:

import pandas pd  df = pd.read_excel('data.xlsx') df.sort('dispatch time (pst)', inplace = true)  tech_name = none dispatch_time = pd.to_datetime('1/1/1900 00:00:00')  index, row in df.iterrows():     if tech_name none:         tech_name = row['tech name']     else:         if dispatch_time.pd.time_delta('0 days 00:30:00') > row['tech dispatch time (pst)'] , row['tech name'] = tech_name:             row['double dispatch'] = 1             dispatch_time = row['tech dispatch time (pst)']         else:             dispatch_time = row['tech dispatch time (pst)']             tech_name = row['tech name'] 

this has many problems being slow, tracking dates going backwards , not forwards missing many dispatches.

end goal:

my goal have dataset can plug tableau report adding on 1 column reads dummy variable can filter , calculate on that.

i appreciate time , , let me know if more details necessary.

thank you!

------------------ edit ------------- added edit make question clear failed earlier.

question: pandas best tool use iterate on dataframe see each each datetime dispatch, there record matches tech's name , less 30 minutes away record.

if so, how improve algorithm or theory, if not best tool be.

desired output - additional column records if dispatch happened within 30 minute window dummy variable 1 true 0 false. need see when double dispatches occuring , how many records true double dispatches, , not count says there 100 instances of double dispatch, involved on 200 records. need able sort , see each record.

hello think found solution. slow, compares 1 index before or after, in terms of cases have 3 dispatches within thirty minutes, represents less .5 % us.

import pandas pd import numpy np import datetime dt  dispatch = 'tech dispatched date-time (pst)' tech = 'combinedtech' df = pd.read_excel('combined_data.xlsx') df.sort_values(dispatch, inplace=true) df.reset_index(inplace = true) df['double dispatch'] = np.nan  writer = pd.excelwriter('final_output.xlsx', engine='xlsxwriter')  dispatch_count = 0 time = dt.timedelta(minutes = 30)  index, row in df.iterrows():     try:         tech_one = df[tech].loc[(index - 1)]         dispatch_one = df[dispatch].loc[(index - 1)]     except keyerror:         tech_one = none         dispatch_one = pd.to_datetime('1/1/1990 00:00:00')     try:         tech_two = df[tech].loc[(index + 1)]         dispatch_two = df[dispatch].loc[(index + 1)]     except keyerror:         tech_two = none         dispatch_two = pd.to_datetime('1/1/2020 00:00:00')     first_time = dispatch_one + time     second_time = pd.to_datetime(row[dispatch]) + time      dispatch_pd = pd.to_datetime(row[dispatch])     if tech_one == row[tech] or tech_two == row[tech]:         if first_time > row[dispatch] or second_time > dispatch_two:             df.set_value(index, 'double dispatch', 1)             dispatch_count += 1         else:             df.set_value(index, 'double dispatch', 0)             dispatch_count += 1     print(dispatch_count) # monitor total # of records being pushed through  df.to_excel(writer,sheet_name='sheet1') writer.save() writer.close() 

Comments

Popular posts from this blog

ZeroMQ on Windows, with Qt Creator -

unity3d - Unity SceneManager.LoadScene quits application -

python - Error while using APScheduler: 'NoneType' object has no attribute 'now' -