dataframe - How to add a column to an exploded struct in Spark?

- January 15, 2013

Say I have the following data:

{"id":1, "payload":[{"foo":1, "lol":2},{"foo":2, "lol":2}]}

I want to explode payload and add a column to it, like this:

from pyspark.sql import functions as f

df = df.select('id', f.explode('payload').alias('data'))
df = df.withColumn('data.bar', f.col('data.foo') * 2)

However, this results in a dataframe with 3 columns:

id
data
data.bar

I expected data.bar to be part of the data struct...

How can I add a column to the exploded struct, instead of adding a top-level column?

# Rebuild the struct, listing each field it should contain
df = df.withColumn('data', f.struct(
    df['data']['foo'].alias('foo'),
    (df['data']['foo'] * 2).alias('bar')
))

This will result in:

root
 |-- id: long (nullable = true)
 |-- data: struct (nullable = false)
 |    |-- col1: long (nullable = true)
 |    |-- bar: long (nullable = true)
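To make the two steps concrete, here is a minimal plain-Python sketch (no Spark involved) of what exploding payload and then rebuilding the struct does to each record. Dicts stand in for Spark rows and structs, and the names explode_payload and add_bar are hypothetical helpers made up for this illustration. Unlike the f.struct call above, which keeps only the fields it lists, this sketch copies every existing field and adds bar.

```python
import json

# The sample record from the question.
record = json.loads('{"id":1, "payload":[{"foo":1, "lol":2},{"foo":2, "lol":2}]}')


def explode_payload(rec):
    """Yield one row per payload element, similar to select('id', explode('payload'))."""
    for item in rec["payload"]:
        yield {"id": rec["id"], "data": item}


def add_bar(row):
    """Return a new row whose 'data' struct is rebuilt with an extra 'bar' field."""
    data = row["data"]
    new_data = {**data, "bar": data["foo"] * 2}  # a new struct; the old one is untouched
    return {**row, "data": new_data}


rows = [add_bar(r) for r in explode_payload(record)]
for r in rows:
    print(r)
```

The point of the sketch is that bar ends up inside data only because the whole data value is replaced by a new one; nothing is attached to the existing struct.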

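Update:

from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, StringType

def func(x):
    # Copy the struct into a dict, change a field, and build a new Row from it
    tmp = x.asDict()
    tmp['foo'] = tmp.get('foo', 0) * 100
    res = list(zip(*tmp.items()))  # list() is needed on Python 3, where zip is lazy
    return Row(*res[0])(*res[1])

# Wrap func as a UDF that returns a struct, then replace the data column
df = df.withColumn('data', f.udf(func, StructType(
    [StructField('foo', StringType()), StructField('lol', StringType())]))(df['data']))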

P.S.

Spark does not support in-place operations.

So every time you want to change something in place, you actually need to replace it.
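As a rough analogy in plain Python (an illustration only, not Spark code), a namedtuple behaves the same way: its fields cannot be assigned, so an "update" means building a replacement value. The Data type here is a hypothetical stand-in for the struct with fields foo and lol.

```python
from collections import namedtuple

# Hypothetical stand-in for the exploded struct.
Data = namedtuple("Data", ["foo", "lol"])

original = Data(foo=1, lol=2)

# Assigning to a field fails because the record is immutable.
try:
    original.foo = 100
    mutated = True
except AttributeError:
    mutated = False

# The "in-place" change is really a replacement with a new record.
updated = original._replace(foo=original.foo * 100)

print(mutated, original, updated)
```

This mirrors the withColumn('data', ...) calls above, which overwrite the data column with a newly built struct rather than editing the old one.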
