dataframe - How to add a column to an exploded struct in Spark?
Say I have the following data:
{"id":1, "payload":[{"foo":1, "lol":2},{"foo":2, "lol":2}]}
I explode payload and add a column to it, like this:
from pyspark.sql import functions as f

df = df.select('id', f.explode('payload').alias('data'))
df = df.withColumn('data.bar', f.col('data.foo') * 2)
However, this results in a DataFrame with 3 columns: id, data and data.bar. I expected data.bar to be part of the data struct.
How can I add a column to the exploded struct instead of adding a top-level column?
You can rebuild the struct with f.struct, listing each field you want to keep plus the new one:
df = df.withColumn('data', f.struct(
    df['data']['foo'].alias('foo'),
    (df['data']['foo'] * 2).alias('bar')
))
This results in:
root
 |-- id: long (nullable = true)
 |-- data: struct (nullable = false)
 |    |-- col1: long (nullable = true)
 |    |-- bar: long (nullable = true)
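Update: to modify an existing field instead (here foo is multiplied by 100), one option is a UDF that returns the whole struct: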
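from pyspark.sql import Row, functions as f
from pyspark.sql.types import LongType, StructField, StructType

def func(x):
    # Copy the struct's fields into a dict and change 'foo'
    tmp = x.asDict()
    tmp['foo'] = tmp.get('foo', 0) * 100
    # Rebuild a Row with the same field names, in the same order
    keys, values = zip(*tmp.items())
    return Row(*keys)(*values)

# The declared schema must match what func returns (foo and lol are longs)
schema = StructType([StructField('foo', LongType()), StructField('lol', LongType())])
df = df.withColumn('data', f.udf(func, schema)(df['data']))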
P.S. Spark does not support in-place operations, so whenever you want to change a column in place, you actually need to replace it.