dataframe - Optimize R code that uses a for loop


Here is my code. I am trying to convert data to wide format; the data may or may not contain duplicates. I tried using foreach and parallel, but it still takes too long. Can anyone suggest a change?

This is the data to be processed:

    param1: 1, , 753360af0c8949c0aeab64d520599656
    param2: value2
    param3: value3
    param4: value4
    param1: 2, , 8c8c659813d842c5bab2ddba9483ea5a
    param2: value5
    param4: value6
    param3: value7

So I need this in wide format. The example above contains 4 parameters, but a file can have 10 or some other varying number of parameters, because the text files come from different sources.

The result should look like this:

    param1                                    param2  param3  param4
    1 753360af0c8949c0aeab64d520599656        value2  value3  value4
    2 8c8c659813d842c5bab2ddba9483ea5a        value5  value7  value6

Here is the code for it:

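    # Read "key: value" lines; fill = TRUE pads rows that have fewer fields
    f <- read.table("./sample.txt", header = FALSE, sep = ":", fill = TRUE, row.names = NULL)
    # Rejoin value pieces that were split on extra colons
    f[,2] <- paste(f[,2], f[,3], f[,4])
    # Distinct parameter names become the output columns
    c <- unique(f[,1])
    rw <- round(nrow(f) / length(c)) + 1
    result <- data.frame(matrix(0, ncol = length(c), nrow = rw))
    colnames(result) <- t(c)
    # Row positions where a new record starts (first parameter name)
    wh <- which(f[,1] == c[1])
    for(i in 1:(length(wh)-1)) {
      print(i)
      # Rows of record i: from its start up to just before the next start
      tmp <- f[(wh[i]:(wh[i+1]-1)),]
      result[i,] <- t(tmp[,2][match(colnames(result), tmp[,1])])
    }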

I have more than 1,000,000 rows to process, and the above code does not complete even after a day.
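For comparison, one loop-free direction might look like the sketch below. It is not a line-for-line rewrite of the code above. It assumes that every line contains at least one colon, that each record starts with a line whose key matches the first key in the file, and that a key repeated inside one record may simply keep its last value. The names `long_to_wide` and `sample_lines` are hypothetical and exist only for this illustration.

```r
# Hypothetical helper (not from the original post): reshape "key: value"
# lines into one row per record without an explicit loop.
# Assumption: each record begins with a line whose key equals the first key seen.
long_to_wide <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]        # drop blank lines
  pos   <- regexpr(":", lines, fixed = TRUE)   # first colon only; later colons stay in the value
  key   <- trimws(substr(lines, 1, pos - 1))
  value <- trimws(substr(lines, pos + 1, nchar(lines)))

  rec  <- cumsum(key == key[1])                # record id increases at every record-start line
  cols <- unique(key)                          # output columns, in order of first appearance

  # Fill a character matrix in a single assignment using (row, column) index pairs.
  # If a key repeats within one record, the last value wins.
  wide <- matrix(NA_character_, nrow = max(rec), ncol = length(cols),
                 dimnames = list(NULL, cols))
  wide[cbind(rec, match(key, cols))] <- value
  as.data.frame(wide, stringsAsFactors = FALSE)
}

# The sample data from the question, as a character vector of lines.
sample_lines <- c(
  "param1: 1, , 753360af0c8949c0aeab64d520599656",
  "param2: value2",
  "param3: value3",
  "param4: value4",
  "param1: 2, , 8c8c659813d842c5bab2ddba9483ea5a",
  "param2: value5",
  "param4: value6",
  "param3: value7"
)

result <- long_to_wide(sample_lines)
print(result)
```

For a real file, the lines could be read with `readLines("./sample.txt")` and passed to the same function. The sketch avoids growing or assigning into a data frame row by row, and the only per-record work is the matrix index assignment. Parameters that are missing from a record are left as `NA`.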

Thanks in advance.

