pentaho - Remove duplicated row from 2 tables in different database -
i have 2 users table in different databases , unique rows 1 tables.
on following example, need list of emails have no duplicate name.
i using pentaho di kettle.
table users database 1
id | name | email --- | ----------- | ------------- 1 | jonh snow | jonhs@got.com --- | ----------- | ------------- 2 | sansa stark | sansas@got.com --- | ----------- | ------------- 3 | ayra stark | ayras@got.com
table users database 2
id | name | email --- | ----------- | ------------- 1 | jonh stott | jonhs@example.com --- | ----------- | ------------- 2 | jonh jonh | jonhj@example.com --- | ----------- | ------------- 3 | ayra stark | ayras@got.com
desired result
id | name | email --- | ----------- | ------------- 1 | jonh snow | jonhs@got.com --- | ----------- | ------------- 2 | sansa stark | sansas@got.com
as far understand question, need keep emails not duplicates in db1 union db2?
well, follow logic: data in (with 1 input table
db connection), count number of records per emails (memory group by
) , filter
out emails count greater 1.
use memory group by
, not requires sorting. in group field
put key: email
. , in aggregates
put number of rows
(in type drop down), , first value
(or last value
) of name
otherwise column disappear stream.
and add sequence
if need create id on output.
Comments
Post a Comment