dataframe - Optimize R code that uses a for loop
I am trying to convert data, which may or may not contain duplicates, into wide format (my code is below). I tried using foreach and parallel, but it still takes too long. Can anyone suggest a change?

This is the data to be processed:
```
param1: 1, , 753360af0c8949c0aeab64d520599656
param2: value2
param3: value3
param4: value4
param1: 2, , 8c8c659813d842c5bab2ddba9483ea5a
param2: value5
param4: value6
param3: value7
```
So I need it in wide format. The example above contains 4 parameters, but a file can have 10 or some other number of parameters, since the text files come from different sources.
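Since the number of parameters varies from file to file, one option is to parse the lines into long format first (one row per key/value pair) and number the records by where each one starts. A minimal base-R sketch, assuming that every record begins with the same first key (`param1` here) and that every line contains at least one `:`; the names `lines`, `long` and `record` are illustrative, not from the original code:

```r
# Sample input: one "key: value" pair per line (illustrative data from above)
lines <- c(
  "param1: 1, , 753360af0c8949c0aeab64d520599656",
  "param2: value2",
  "param3: value3",
  "param4: value4",
  "param1: 2, , 8c8c659813d842c5bab2ddba9483ea5a",
  "param2: value5",
  "param4: value6",
  "param3: value7"
)

# Split each line at the FIRST ':' only, so values containing ':' stay whole
pos   <- regexpr(":", lines, fixed = TRUE)
key   <- trimws(substr(lines, 1, pos - 1))
value <- trimws(substr(lines, pos + 1, nchar(lines)))

# Assumption: a new record starts wherever the first key of the file appears
record <- cumsum(key == key[1])

long <- data.frame(record = record, key = key, value = value,
                   stringsAsFactors = FALSE)
print(long)
```

Splitting at the first `:` only keeps a value that itself contains `:` intact, which appears to be what the `paste(f[,2], f[,3], f[,4])` step in the code below is meant to patch up.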
The result should look like this:
```
param1 param2 param3 param4
1 753360af0c8949c0aeab64d520599656 value2 value3 value4
2 8c8c659813d842c5bab2ddba9483ea5a value5 value6 value7
```
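The wide result can be filled in one vectorised step instead of row by row: pre-allocate a character matrix and assign every value at once through a two-column (row, column) index matrix. A sketch, assuming the data is already in a long data frame with the hypothetical columns `record`, `key` and `value`:

```r
# Long format: one row per (record, key, value); assumed input shape
long <- data.frame(
  record = c(1, 1, 1, 1, 2, 2, 2, 2),
  key    = c("param1", "param2", "param3", "param4",
             "param1", "param2", "param4", "param3"),
  value  = c("1, , 753360af0c8949c0aeab64d520599656", "value2", "value3", "value4",
             "2, , 8c8c659813d842c5bab2ddba9483ea5a", "value5", "value6", "value7"),
  stringsAsFactors = FALSE
)

keys <- unique(long$key)   # output columns, in first-seen order
n    <- max(long$record)   # one output row per record

# Pre-allocate, then place every value by (record, column-of-its-key)
wide <- matrix(NA_character_, nrow = n, ncol = length(keys),
               dimnames = list(NULL, keys))
wide[cbind(long$record, match(long$key, keys))] <- long$value

result <- as.data.frame(wide, stringsAsFactors = FALSE)
print(result)
```

Because each value is placed by its key, the second record ends up with `value7` under `param3` and `value6` under `param4`, following its input lines (`param4: value6`, `param3: value7`). If a record repeats a key, the last assignment wins in this sketch, whereas `match()` in the loop keeps the first.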
Here is my code for this:
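```r
f <- read.table("./sample.txt", header = FALSE, sep = ":", fill = TRUE,
                row.names = NULL, stringsAsFactors = FALSE)
# Re-join the value fields (assumes values may contain extra ':' characters)
f[, 2] <- paste(f[, 2], f[, 3], f[, 4])
keys <- unique(f[, 1])                 # parameter names become the columns
rw <- round(nrow(f) / length(keys)) + 1
result <- data.frame(matrix(0, ncol = length(keys), nrow = rw))
colnames(result) <- keys
# Rows where a record starts; nrow(f) + 1 closes the last record
wh <- c(which(f[, 1] == keys[1]), nrow(f) + 1)
for (i in seq_len(length(wh) - 1)) {
  # print(i)  # printing on every iteration slows the loop down
  tmp <- f[wh[i]:(wh[i + 1] - 1), ]
  result[i, ] <- tmp[, 2][match(colnames(result), tmp[, 1])]
}
```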
I have more than 10,00,000 (one million) rows to process, and the above code does not complete even after a day.
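Putting the parsing and reshaping together gives a loop-free replacement for the code above. The original loop subsets `f` and assigns into a data-frame row once per record, and `result[i, ] <-` on a data frame is comparatively expensive, so with a million rows that per-record overhead likely dominates. The sketch below uses only base R; `to_wide` is a hypothetical helper name, and it assumes that records start with the first key in the file, that every non-blank line contains `:`, and that duplicate keys within a record should keep the first value (as `match()` does in the loop):

```r
# Hypothetical helper: "key: value" text file -> wide data frame, no loop
to_wide <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(trimws(lines))]          # drop blank lines

  # Split at the first ':' only, keeping any ':' inside values
  pos   <- regexpr(":", lines, fixed = TRUE)
  key   <- trimws(substr(lines, 1, pos - 1))
  value <- trimws(substr(lines, pos + 1, nchar(lines)))

  record <- cumsum(key == key[1])                # record number for each line
  keys   <- unique(key)
  idx    <- cbind(record, match(key, keys))      # (row, column) per value

  # Duplicate key within a record: keep the first, like match() in the loop
  keep <- !duplicated(idx)                       # compares rows of the matrix

  wide <- matrix(NA_character_, nrow = max(record), ncol = length(keys),
                 dimnames = list(NULL, keys))
  wide[idx[keep, , drop = FALSE]] <- value[keep] # single vectorised fill
  as.data.frame(wide, stringsAsFactors = FALSE)
}

# Usage with the sample data written to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("param1: 1, , 753360af0c8949c0aeab64d520599656",
             "param2: value2", "param3: value3", "param4: value4",
             "param1: 2, , 8c8c659813d842c5bab2ddba9483ea5a",
             "param2: value5", "param4: value6", "param3: value7"), tmp)
result <- to_wide(tmp)
print(result)
```

Parameters missing from a record come out as `NA`. If adding a dependency is acceptable, the same reshaping is also available as `dcast()` in the data.table package.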
Thanks in advance.