python - Pandas rolling window with custom look back length based on column sum -


given pandas dataframe 2 columns, "atbats" , "hits", indexed date, possible recent historical batting average (average number of hits per atbat)? example, historical batting average fewest atbats greater 10. sort of rolling window conditional number of look-back periods. example, given:

      date, atbats, hits,  2017-01-01,      5,    2, 2017-01-02,      6,    3, 2017-01-03,      1,    1, 2017-01-04,      12,   3, 2017-01-04,      1,    0, 

on first day, there have been no historical atbats. on second day, 6. since both less 10, can nan or 0.

on third day, on last 2 days , see 5+6 atbats average of (2+3)/(5+6) = 0.45 hits/atbat.

on third day, on last 3 days , (2+3+1)/(5+6+1) = 0.5 hits/atbat.

on fourth day, on last day , 4/16 = 0.25 hits/atbat. since last day has more 10 (16), don't need further.

the final dataframe like:

      date, atbats, hits,  pastatbats, pasthits, avg, 2017-01-01,      5,    2,           0,       0,   0, 2017-01-02,      6,    3,           0,       0,   0, 2017-01-03,      1,    1,          11,       5,   0.45, 2017-01-04,      16,   4,          12,       6,   0.50, 2017-01-04,      1,    0,          16,       4,   0.25, 

is sort of calculation possible in pandas?

the solution can think of pure brute force - divide hits atbats in each row, replicate each row x times, x = atbats, , rolling window of 10. in dataframe, "atbats" average 80 per day, massively increase size of dataframe , total number of windows calculate.

use iterrows achieve need. see below:

original dataframe:

index atbats  hits 1       5     2 2       6     3 3       1     1 4      16     4 4       1     0 5       1     0 6      14     2 7       5     1 

code:

data = [] last = [0,0] past_atbats = 0 past_hits = 0 i, row in df.iterrows():     if( last[0] >= 10):         data.append(last.copy())     else:         data.append([0,0])      if(row['atbats'] >= 10):         last[0] = row['atbats']         last[1] = row['hits']     else:         last[0] += row['atbats']         last[1] += row['hits']  df_past = pd.dataframe(data=data,index=df.index,columns=['past_atbats','past_hits']) df = df.merge(df_past,left_index=true,right_index=true) df['avg'] = df['past_hits'].divide(df['past_atbats']) 

result in:

index atbats  hits  past_atbats  past_hits       avg 1       5     2            0          0       nan 2       6     3            0          0       nan 3       1     1           11          5  0.454545 4      16     4           12          6  0.500000 4      16     4           16          4  0.250000 4       1     0           12          6  0.500000 4       1     0           16          4  0.250000 5       1     0           17          4  0.235294 6      14     2           18          4  0.222222 7       5     1           14          2  0.142857 

probably optimization can done think helps you.


Comments

Popular posts from this blog

ZeroMQ on Windows, with Qt Creator -

unity3d - Unity SceneManager.LoadScene quits application -

python - Error while using APScheduler: 'NoneType' object has no attribute 'now' -