apache spark - How to get this using scala

- April 15, 2010

    df1         df2     output_df
    120  d      a       120  null  a
    120  e      b       120  null  b
    125  f      c       120  null  c
                d       120  d     d
                e       120  e     e
                f       120  null  f
                g       120  null  g
                h       120  null  h
                        125  null  a
                        125  null  b
                        125  null  c
                        125  null  d
                        125  null  e
                        125  f     f
                        125  null  g
                        125  null  h

From dataframes 1 and 2 I need the final output dataframe, in spark-shell. a, b, c, d, e, f are in date format (yyyy-mm-dd), and 120, 125 are values of the ticket_id column. There are thousands of ticket_ids; I have extracted just one sample here.

Do a full join of the unique ids with all possible dates, then left join the original dataframe onto the result:

    import org.apache.spark.sql.functions.col
    import hiveContext.implicits._

    val df1Data = List((120, "d"), (120, "e"), (125, "f"))
    val df2Data = List("a", "b", "c", "d", "e", "f", "g", "h")
    val df1 = sparkContext.parallelize(df1Data).toDF("id", "date")
    val df2 = sparkContext.parallelize(df2Data).toDF("date")

    // unique id: 120, 125
    val uniqueIdDF = df1.select(col("id")).distinct()
    // every (id, date) combination: a join without a condition is a cartesian product
    val fullJoin = uniqueIdDF.join(df2)
    // keep all combinations, pulling in df1's date only where it matches
    val result = fullJoin.as("full").join(df1.as("df1"),
      col("full.id") === col("df1.id") && col("full.date") === col("df1.date"),
      "left_outer")

    val sorted = result.select(col("full.id"), col("df1.date"), col("full.date"))
      .sort(col("full.id"), col("full.date"))
    sorted.show(false)

Output:

    +---+----+----+
    |id |date|date|
    +---+----+----+
    |120|null|a   |
    |120|null|b   |
    |120|null|c   |
    |120|d   |d   |
    |120|e   |e   |
    |120|null|f   |
    |120|null|g   |
    |120|null|h   |
    |125|null|a   |
    |125|null|b   |
    |125|null|c   |
    |125|null|d   |
    |125|null|e   |
    |125|f   |f   |
    |125|null|g   |
    |125|null|h   |
    +---+----+----+

The sorting is only there to show the rows in the same order as in the question; it can be skipped.
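For newer Spark versions, here is a rough sketch of the same idea with `SparkSession`. As far as I know, Spark 2.x rejects a join without a condition by default unless `spark.sql.crossJoin.enabled` is set, so the sketch uses the explicit `crossJoin` (available since Spark 2.1). The names `TicketDateGrid`, `build` and `matched_date` are made up for illustration, and the local master is only an assumption so that it runs standalone.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object; the names are illustrative, not from the original answer.
object TicketDateGrid {

  // df1 is expected to have columns (id, date), df2 a single column (date).
  def build(df1: DataFrame, df2: DataFrame): DataFrame = {
    // Every (id, date) combination: explicit cartesian product.
    val grid = df1.select("id").distinct().crossJoin(df2)

    // Copy df1's date under another name so it survives a join on (id, date).
    val matched = df1.withColumn("matched_date", col("date"))

    // Left outer join keeps all combinations; matched_date is null where df1 has no row.
    grid.join(matched, Seq("id", "date"), "left_outer")
      .select(col("id"), col("matched_date"), col("date"))
      .orderBy(col("id"), col("date"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, just to make the sketch runnable on its own.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ticket-date-grid")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq((120, "d"), (120, "e"), (125, "f")).toDF("id", "date")
    val df2 = Seq("a", "b", "c", "d", "e", "f", "g", "h").toDF("date")

    build(df1, df2).show(false)
    spark.stop()
  }
}
```

Joining on the column names `Seq("id", "date")` avoids the two ambiguous `date` columns, which is why df1's date is copied to `matched_date` first. In spark-shell the `spark` session already exists, so only the body of `build` is needed there.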
