pentaho - Remove duplicated row from 2 tables in different database -


i have 2 users table in different databases , unique rows 1 tables.

on following example, need list of emails have no duplicate name.

i using pentaho di kettle.

table users database 1

id  | name        | email --- | ----------- | ------------- 1   | jonh snow   | jonhs@got.com   --- | ----------- | ------------- 2   | sansa stark | sansas@got.com   --- | ----------- | ------------- 3   | ayra stark  | ayras@got.com   

table users database 2

id  | name        | email --- | ----------- | ------------- 1   | jonh stott  | jonhs@example.com   --- | ----------- | ------------- 2   | jonh jonh   | jonhj@example.com   --- | ----------- | ------------- 3   | ayra stark  | ayras@got.com   

desired result

id  | name        | email --- | ----------- | ------------- 1   | jonh snow   | jonhs@got.com   --- | ----------- | ------------- 2   | sansa stark | sansas@got.com 

as far understand question, need keep emails not duplicates in db1 union db2?

well, follow logic: data in (with 1 input table db connection), count number of records per emails (memory group by) , filter out emails count greater 1.

use memory group by, not requires sorting. in group field put key: email. , in aggregates put number of rows (in type drop down), , first value (or last value) of name otherwise column disappear stream.

and add sequence if need create id on output.

enter image description here


Comments