pentaho - Remove duplicated row from 2 tables in different database -
i have 2 users table in different databases , unique rows 1 tables.
on following example, need list of emails have no duplicate name.
i using pentaho di kettle.
table users database 1
id | name | email --- | ----------- | ------------- 1 | jonh snow | jonhs@got.com --- | ----------- | ------------- 2 | sansa stark | sansas@got.com --- | ----------- | ------------- 3 | ayra stark | ayras@got.com table users database 2
id | name | email --- | ----------- | ------------- 1 | jonh stott | jonhs@example.com --- | ----------- | ------------- 2 | jonh jonh | jonhj@example.com --- | ----------- | ------------- 3 | ayra stark | ayras@got.com desired result
id | name | email --- | ----------- | ------------- 1 | jonh snow | jonhs@got.com --- | ----------- | ------------- 2 | sansa stark | sansas@got.com
as far understand question, need keep emails not duplicates in db1 union db2?
well, follow logic: data in (with 1 input table db connection), count number of records per emails (memory group by) , filter out emails count greater 1.
use memory group by, not requires sorting. in group field put key: email. , in aggregates put number of rows (in type drop down), , first value (or last value) of name otherwise column disappear stream.
and add sequence if need create id on output.

Comments
Post a Comment