python - How to generate n-level hierarchical JSON from pandas DataFrame? -


is there efficient way create hierarchical json (n-levels deep) parent values keys , not variable label? i.e:

{"2017-12-31":     {"junior":         {"electronics":             {"a":                 {"sales": 0.440755                 }             },             {"b":                 {"sales": -3.230951                 }             }         }, ...etc...     }, ...etc... }, ...etc...  

1. testing dataframe:

colindex=pd.multiindex.from_product([['new york','paris'],                                      ['electronics','household'],                                      ['a','b','c'],                                      ['junior','senior']],                                names=['city','department','team','job role'])  rowindex=pd.date_range('25-12-2017',periods=12,freq='d')  df1=pd.dataframe(np.random.randn(12, 24), index=rowindex, columns=colindex) df1.index.name='date' df2=df1.resample('m').sum() df3=df2.stack(level=0).groupby('date').sum() 

source dataframe


2. transformation i'm making seems logical structure build json from:

df4=df3.stack(level=[0,1,2]).reset_index() \     .set_index(['date','job role','department','team']) \     .sort_index() 

transformed dataframe


3. attempts-so-far

i came across very helpful question solves problem 1 level of nesting using code along lines of:

j =(df.groupby(['id','location','country','latitude','longitude'],as_index=false) \     .apply(lambda x: x[['timestamp','tide']].to_dict('r'))\     .reset_index()\     .rename(columns={0:'tide-data'})\     .to_json(orient='records')) 

...but can't find way nested .groupby()s working:

j=(df.groupby('date', as_index=true).apply(     lambda x: x.groupby('job role', as_index=true).apply(         lambda x: x.groupby('department', as_index=true).apply(             lambda x: x.groupby('team', as_index=true).to_dict())))  \                 .reset_index().rename(columns={0:'sales'}).to_json(orient='records')) 

you can use itertuples generate nested dict, , dump json. this, need change date timestamp string

df4=df3.stack(level=[0,1,2]).reset_index()  df4['date'] = df4['date'].dt.strftime('%y-%m-%d') df4 = df4.set_index(['date','job role','department','team']) \     .sort_index() 

create nested dict

def nested_dict():     return collections.defaultdict(nested_dict) result = nested_dict() 

use itertuples populate it

for row in df4.itertuples():     result[row.index[0]][row.index[1]][row.index[2]][row.index[3]]['sales'] = row._1     # print(row) 

and use json module dump it.

import json json.dumps(result) 

'{"2017-12-31": {"junior": {"electronics": {"a": {"sales": -0.3947134370101142}, "b": {"sales": -0.9873530754403204}, "c": {"sales": -1.1182598058984508}}, "household": {"a": {"sales": -1.1211850078098677}, "b": {"sales": 2.0330914483907847}, "c": {"sales": 3.94762379718749}}}, "senior": {"electronics": {"a": {"sales": 1.4528493451404196}, "b": {"sales": -2.3277322345261005}, "c": {"sales": -2.8040263791743922}}, "household": {"a": {"sales": 3.0972591929279663}, "b": {"sales": 9.884565742502392}, "c": {"sales": 2.9359830722457576}}}}, "2018-01-31": {"junior": {"electronics": {"a": {"sales": -1.3580300149125217}, "b": {"sales": 1.414665000013205}, "c": {"sales": -1.432795129108244}}, "household": {"a": {"sales": 2.7783259569115346}, "b": {"sales": 2.717700275321333}, "c": {"sales": 1.4358377416259644}}}, "senior": {"electronics": {"a": {"sales": 2.8981726774941485}, "b": {"sales": 12.022897003654117}, "c": {"sales": 0.01776855733076088}}, "household": {"a": {"sales": -3.342163776613092}, "b": {"sales": -5.283208386572307}, "c": {"sales": 2.942580121975619}}}}}'


Comments

Popular posts from this blog

ZeroMQ on Windows, with Qt Creator -

unity3d - Unity SceneManager.LoadScene quits application -

python - Error while using APScheduler: 'NoneType' object has no attribute 'now' -