python - How to generate n-level hierarchical JSON from pandas DataFrame? -
is there efficient way create hierarchical json (n-levels deep) parent values keys , not variable label? i.e:
{"2017-12-31": {"junior": {"electronics": {"a": {"sales": 0.440755 } }, {"b": {"sales": -3.230951 } } }, ...etc... }, ...etc... }, ...etc...
1. testing dataframe:
colindex=pd.multiindex.from_product([['new york','paris'], ['electronics','household'], ['a','b','c'], ['junior','senior']], names=['city','department','team','job role']) rowindex=pd.date_range('25-12-2017',periods=12,freq='d') df1=pd.dataframe(np.random.randn(12, 24), index=rowindex, columns=colindex) df1.index.name='date' df2=df1.resample('m').sum() df3=df2.stack(level=0).groupby('date').sum()
2. transformation i'm making seems logical structure build json from:
df4=df3.stack(level=[0,1,2]).reset_index() \ .set_index(['date','job role','department','team']) \ .sort_index()
3. attempts-so-far
i came across very helpful question solves problem 1 level of nesting using code along lines of:
j =(df.groupby(['id','location','country','latitude','longitude'],as_index=false) \ .apply(lambda x: x[['timestamp','tide']].to_dict('r'))\ .reset_index()\ .rename(columns={0:'tide-data'})\ .to_json(orient='records'))
...but can't find way nested .groupby()
s working:
j=(df.groupby('date', as_index=true).apply( lambda x: x.groupby('job role', as_index=true).apply( lambda x: x.groupby('department', as_index=true).apply( lambda x: x.groupby('team', as_index=true).to_dict()))) \ .reset_index().rename(columns={0:'sales'}).to_json(orient='records'))
you can use itertuples generate nested dict
, , dump json
. this, need change date timestamp string
df4=df3.stack(level=[0,1,2]).reset_index() df4['date'] = df4['date'].dt.strftime('%y-%m-%d') df4 = df4.set_index(['date','job role','department','team']) \ .sort_index()
create nested dict
def nested_dict(): return collections.defaultdict(nested_dict) result = nested_dict()
use itertuples
populate it
for row in df4.itertuples(): result[row.index[0]][row.index[1]][row.index[2]][row.index[3]]['sales'] = row._1 # print(row)
and use json
module dump it.
import json json.dumps(result)
'{"2017-12-31": {"junior": {"electronics": {"a": {"sales": -0.3947134370101142}, "b": {"sales": -0.9873530754403204}, "c": {"sales": -1.1182598058984508}}, "household": {"a": {"sales": -1.1211850078098677}, "b": {"sales": 2.0330914483907847}, "c": {"sales": 3.94762379718749}}}, "senior": {"electronics": {"a": {"sales": 1.4528493451404196}, "b": {"sales": -2.3277322345261005}, "c": {"sales": -2.8040263791743922}}, "household": {"a": {"sales": 3.0972591929279663}, "b": {"sales": 9.884565742502392}, "c": {"sales": 2.9359830722457576}}}}, "2018-01-31": {"junior": {"electronics": {"a": {"sales": -1.3580300149125217}, "b": {"sales": 1.414665000013205}, "c": {"sales": -1.432795129108244}}, "household": {"a": {"sales": 2.7783259569115346}, "b": {"sales": 2.717700275321333}, "c": {"sales": 1.4358377416259644}}}, "senior": {"electronics": {"a": {"sales": 2.8981726774941485}, "b": {"sales": 12.022897003654117}, "c": {"sales": 0.01776855733076088}}, "household": {"a": {"sales": -3.342163776613092}, "b": {"sales": -5.283208386572307}, "c": {"sales": 2.942580121975619}}}}}'
Comments
Post a Comment