python - Pandas add column from one dataframe to another based on a join -


assume have 2 dataframes. want add column of dataframe 1 dataframe 2 based on column lookup. if join not possible, want in column constant (so can filter that).

graphically:

enter image description here

code:

import pandas pd import numpy np  data = np.array([['','col1','col2'],                 ['row1','2','two'],                 ['row2','1','one']]             )  data2 = np.array([['','col3','col4'],                 ['row1','1','t1'],                 ['row2','2','t2'],                 ['row3','3','t3']]             )  df = pd.dataframe(data=data[1:,1:],                   index=data[1:,0],                   columns=data[0,1:])  df2 = pd.dataframe(data=data2[1:,1:],                   index=data2[1:,0],                   columns=data2[0,1:])  result_df = df2 + join col2 based on df2.col3 = df.col1. add string constant if join fails.   print(df) print(df2) print(result_df) 

use join or map:

df = df2.join(df.set_index('col1'), on='col3') print (df)      col3 col4 col2 row1    1   t1  1 row2    2   t2  2 row3    3   t3  nan 

df2['col2'] = df2['col3'].map(df.set_index('col1')['col2']) print (df2)      col3 col4 col2 row1    1   t1  1 row2    2   t2  2 row3    3   t3  nan 

Comments

Popular posts from this blog

ZeroMQ on Windows, with Qt Creator -

unity3d - Unity SceneManager.LoadScene quits application -

python - Error while using APScheduler: 'NoneType' object has no attribute 'now' -