python - Pandas rolling window with custom look back length based on column sum -
given pandas dataframe 2 columns, "atbats" , "hits", indexed date, possible recent historical batting average (average number of hits per atbat)? example, historical batting average fewest atbats greater 10. sort of rolling window conditional number of look-back periods. example, given:
date, atbats, hits, 2017-01-01, 5, 2, 2017-01-02, 6, 3, 2017-01-03, 1, 1, 2017-01-04, 12, 3, 2017-01-04, 1, 0,
on first day, there have been no historical atbats. on second day, 6. since both less 10, can nan or 0.
on third day, on last 2 days , see 5+6 atbats average of (2+3)/(5+6) = 0.45 hits/atbat.
on third day, on last 3 days , (2+3+1)/(5+6+1) = 0.5 hits/atbat.
on fourth day, on last day , 4/16 = 0.25 hits/atbat. since last day has more 10 (16), don't need further.
the final dataframe like:
date, atbats, hits, pastatbats, pasthits, avg, 2017-01-01, 5, 2, 0, 0, 0, 2017-01-02, 6, 3, 0, 0, 0, 2017-01-03, 1, 1, 11, 5, 0.45, 2017-01-04, 16, 4, 12, 6, 0.50, 2017-01-04, 1, 0, 16, 4, 0.25,
is sort of calculation possible in pandas?
the solution can think of pure brute force - divide hits atbats in each row, replicate each row x times, x = atbats, , rolling window of 10. in dataframe, "atbats" average 80 per day, massively increase size of dataframe , total number of windows calculate.
use iterrows achieve need. see below:
original dataframe:
index atbats hits 1 5 2 2 6 3 3 1 1 4 16 4 4 1 0 5 1 0 6 14 2 7 5 1
code:
data = [] last = [0,0] past_atbats = 0 past_hits = 0 i, row in df.iterrows(): if( last[0] >= 10): data.append(last.copy()) else: data.append([0,0]) if(row['atbats'] >= 10): last[0] = row['atbats'] last[1] = row['hits'] else: last[0] += row['atbats'] last[1] += row['hits'] df_past = pd.dataframe(data=data,index=df.index,columns=['past_atbats','past_hits']) df = df.merge(df_past,left_index=true,right_index=true) df['avg'] = df['past_hits'].divide(df['past_atbats'])
result in:
index atbats hits past_atbats past_hits avg 1 5 2 0 0 nan 2 6 3 0 0 nan 3 1 1 11 5 0.454545 4 16 4 12 6 0.500000 4 16 4 16 4 0.250000 4 1 0 12 6 0.500000 4 1 0 16 4 0.250000 5 1 0 17 4 0.235294 6 14 2 18 4 0.222222 7 5 1 14 2 0.142857
probably optimization can done think helps you.
Comments
Post a Comment