pentaho - Remove duplicated row from 2 tables in different database -


i have 2 users table in different databases , unique rows 1 tables.

on following example, need list of emails have no duplicate name.

i using pentaho di kettle.

table users database 1

id  | name        | email --- | ----------- | ------------- 1   | jonh snow   | jonhs@got.com   --- | ----------- | ------------- 2   | sansa stark | sansas@got.com   --- | ----------- | ------------- 3   | ayra stark  | ayras@got.com   

table users database 2

id  | name        | email --- | ----------- | ------------- 1   | jonh stott  | jonhs@example.com   --- | ----------- | ------------- 2   | jonh jonh   | jonhj@example.com   --- | ----------- | ------------- 3   | ayra stark  | ayras@got.com   

desired result

id  | name        | email --- | ----------- | ------------- 1   | jonh snow   | jonhs@got.com   --- | ----------- | ------------- 2   | sansa stark | sansas@got.com 

as far understand question, need keep emails not duplicates in db1 union db2?

well, follow logic: data in (with 1 input table db connection), count number of records per emails (memory group by) , filter out emails count greater 1.

use memory group by, not requires sorting. in group field put key: email. , in aggregates put number of rows (in type drop down), , first value (or last value) of name otherwise column disappear stream.

and add sequence if need create id on output.

enter image description here


Comments

Popular posts from this blog

ZeroMQ on Windows, with Qt Creator -

unity3d - Unity SceneManager.LoadScene quits application -

python - Error while using APScheduler: 'NoneType' object has no attribute 'now' -