python - Activity log into Pandas Series
I have an input CSV with two columns (userid, datetime). To analyze user activity, I want to build a pandas DataFrame (i.e. multiple Series) whose index is the date and whose columns are the users. Each value should be the sum of that user's activities, aggregated over a date period (for instance, per day).
Thanks in advance.
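The CSV described above has only `userid` and `datetime`, so if each row is one activity event (an assumption, since the file itself isn't shown), "sum of activities" per user per day is simply a row count. A minimal sketch, using a small hypothetical inline CSV in place of the real file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file: one row per activity event.
csv_text = """userid,datetime
1,2016-01-01 08:15:00
2,2016-01-01 09:30:00
1,2016-01-01 17:45:00
2,2016-01-02 11:00:00
3,2016-01-02 12:10:00
1,2016-01-03 07:05:00
"""

# Parse the datetime column so pandas treats it as timestamps.
events = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

# Truncate each timestamp to midnight so events on the same day share a key.
events["date"] = events["datetime"].dt.normalize()

# Count events per (date, user), then pivot users into columns.
daily_counts = (
    events.groupby(["date", "userid"])
    .size()
    .unstack(fill_value=0)
)
print(daily_counts)
```

If the CSV does carry a numeric activity column, replacing `.size()` with `["activity"].sum()` gives summed values instead of counts.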
Fake data:

    import numpy as np
    import pandas as pd

    n = 100
    np.random.seed(1)
    userid = np.random.randint(0, 10, n)
    # Random day offsets 0-9 added to 2016-01-01 (pd.Timestamp no longer accepts freq=)
    datetime = pd.Timestamp('2016-01-01') + pd.to_timedelta(np.random.randint(0, 10, n), unit='D')
    activity = np.random.randint(0, 1000, n)
    df = pd.DataFrame({'datetime': datetime, 'userid': userid, 'activity': activity})
    df.head(10)

         datetime  userid  activity
    0  2016-01-10       5       788
    1  2016-01-01       8        44
    2  2016-01-03       9       271
    3  2016-01-01       5       670
    4  2016-01-08       0       475
    5  2016-01-02       0       910
    6  2016-01-08       1       499
    7  2016-01-10       7       787
    8  2016-01-09       6       251
    9  2016-01-05       9       666
Solution:

Group by date and user, sum the activity, then unstack so that each user becomes a column (missing date/user combinations are filled with 0):

    df.groupby(['datetime', 'userid'])['activity'].sum().unstack(fill_value=0)

    userid         0    1     2     3     4     5     6     7     8     9
    datetime
    2016-01-01     0  166  1091  1878   583   670     0  1524   577   881
    2016-01-02   910    0     0   810  2146   706   182   138  1157     0
    2016-01-03     0    0     0     0   433     0  1955  1914   566   561
    2016-01-04    51  407   598     0     0     0   440   783     0     0
    2016-01-05     0  324   662     0     0     0     0   990    79  2849
    2016-01-06     0  959     0   230   878     0     0   656   879   300
    2016-01-07  1390  100     0   575     0     0     0   806    87  1243
    2016-01-08   975  499   503     0   657     0   403   755     0  1271
    2016-01-09   342    0   739   617     0  1297   251  1207   324   458
    2016-01-10   963  832     0     0   975  1179     0   787   717   145
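The question mentions aggregating over a date period ("for instance: day"). A sketch of a coarser period, assuming weekly bins are wanted: `pd.Grouper` bins the datetime column by a frequency string, and `"W"` (weeks ending on Sunday, labelled by that Sunday) is an illustrative choice. The fake data is recreated so the block runs on its own, and `weekly_from_daily` shows an equivalent route via `resample` on the daily table.

```python
import numpy as np
import pandas as pd

# Recreate the fake data (same seed and draw order as in the question).
n = 100
np.random.seed(1)
userid = np.random.randint(0, 10, n)
offsets = np.random.randint(0, 10, n)
activity = np.random.randint(0, 1000, n)
datetime = pd.Timestamp("2016-01-01") + pd.to_timedelta(offsets, unit="D")
df = pd.DataFrame({"datetime": datetime, "userid": userid, "activity": activity})

# Daily table, as in the solution.
daily = df.groupby(["datetime", "userid"])["activity"].sum().unstack(fill_value=0)

# Weekly table: pd.Grouper bins the datetime column by the given frequency.
weekly = (
    df.groupby([pd.Grouper(key="datetime", freq="W"), "userid"])["activity"]
    .sum()
    .unstack(fill_value=0)
)

# Equivalent route: resample the daily table's DatetimeIndex into weeks.
weekly_from_daily = daily.resample("W").sum()
print(weekly)
```

Since 2016-01-01 is a Friday, the data falls into two weekly bins labelled 2016-01-03 and 2016-01-10. Swapping `"W"` for another frequency alias (e.g. `"D"`) changes the period without touching the rest of the pipeline.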