R - Filtering records that occur more than once with multiple variables



I have a sample dataset. The objective is to keep only the records whose combination of user_id and plan_id occurs more than once. I understand that I can count the frequency of a variable in a single column:

n_occur <- data.frame(table(test$user_id)) 
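For a single column, that frequency table can then be used to filter the data. This is a minimal sketch, assuming the `test` data frame given later in the question (rebuilt here so the snippet runs on its own). The name `repeated_users` is introduced only for illustration.

```r
# Sample data from the question, rebuilt so the snippet is self-contained
test <- data.frame(
  user_id = c(1, 2, 3, 4, 5, 1, 5, 1, 1, 5, 1),
  plan_id = c(10, 10, 20, 20, 10, 10, 20, 20, 20, 10, 30),
  hour    = c(2, 4, 23, 12, 8, 10, 6, 5, 18, 7, 6)
)

# Frequency of each user_id: Var1 holds the value (as a factor), Freq its count
n_occur <- data.frame(table(test$user_id))

# user_id values that occur more than once (hypothetical helper name)
repeated_users <- as.numeric(as.character(n_occur$Var1[n_occur$Freq > 1]))

# Keep only the rows belonging to those users
single_col <- test[test$user_id %in% repeated_users, ]
single_col
```

This filters on user_id alone, so it also keeps rows such as user 1 with plan 30, which is why it does not yet solve the two-column case.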

But how can I count the frequency of value combinations across two columns and then filter the original dataset down to the combinations that occur more than once? For example, here is the test dataset:

> test
   user_id plan_id hour
1        1      10    2
2        2      10    4
3        3      20   23
4        4      20   12
5        5      10    8
6        1      10   10
7        5      20    6
8        1      20    5
9        1      20   18
10       5      10    7
11       1      30    6

And here is the intended output:

> output
  user_id plan_id hour
1       1      10    2
2       5      10    8
3       1      10   10
4       1      20    5
5       1      20   18
6       5      10    7

And the data:

> dput(test)
structure(list(user_id = c(1, 2, 3, 4, 5, 1, 5, 1, 1, 5, 1),
    plan_id = c(10, 10, 20, 20, 10, 10, 20, 20, 20, 10, 30),
    hour = c(2, 4, 23, 12, 8, 10, 6, 5, 18, 7, 6)), .Names = c("user_id",
"plan_id", "hour"), row.names = c(NA, -11L), class = "data.frame")

Any suggestions are appreciated!

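You can use duplicated() to check the ID columns twice, once scanning from the beginning and once from the end (fromLast = TRUE). On its own, duplicated() flags only the second and later occurrences of a combination, and the reverse scan flags the earlier ones. If either call returns TRUE for a row, its (user_id, plan_id) combination appears more than once. You can then use the resulting logical vector to subset the data frame: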

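# Keep only the two ID columns that define a "record"
ids <- test[c('user_id', 'plan_id')]
# TRUE where the combination is repeated, scanning forwards or backwards
test[duplicated(ids) | duplicated(ids, fromLast = TRUE), ]

#    user_id plan_id hour
# 1        1      10    2
# 5        5      10    8
# 6        1      10   10
# 8        1      20    5
# 9        1      20   18
# 10       5      10    7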
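A count-based alternative, closer to the frequency idea in the question, might look like the sketch below. It is not part of the original answer. It assumes the same `test` data and uses base R's ave() to attach a per-row count of each (user_id, plan_id) pair. The names `pair_count` and `repeated` are introduced only for illustration.

```r
# Sample data from the question, rebuilt so the snippet is self-contained
test <- data.frame(
  user_id = c(1, 2, 3, 4, 5, 1, 5, 1, 1, 5, 1),
  plan_id = c(10, 10, 20, 20, 10, 10, 20, 20, 20, 10, 30),
  hour    = c(2, 4, 23, 12, 8, 10, 6, 5, 18, 7, 6)
)

# For every row, count how many rows share its (user_id, plan_id) pair.
# ave() applies FUN within each group and returns a vector aligned with the rows.
pair_count <- ave(seq_len(nrow(test)), test$user_id, test$plan_id, FUN = length)

# Keep the rows whose pair occurs more than once
repeated <- test[pair_count > 1, ]
repeated
```

Because the count is kept as a vector, the threshold is easy to change, for example to `pair_count > 2` for pairs that occur at least three times, which the duplicated() approach does not express directly.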
