json - Process files fast in a loop with functions in R
I have to read in 88 thousand JSON files and process them. The R script looks like this:
```r
library(jsonlite)  # fromJSON() (assumed: the post does not name the JSON package)
library(plyr)      # rbind.fill()
library(tictoc)    # tic() / toc()

webshop_id = numeric()
serving = data.frame(value = numeric(), unit = character())
alcohol = data.frame(alcoholpercentage = numeric())
voeding = data.frame(character())

# pattern is a regular expression, not a glob: match names ending in .json
filenames <- list.files("d:/thijs/documents/ah/productfeatures/allproducts",
                        pattern = "\\.json$", full.names = TRUE)

tic()
for (i in seq_along(filenames)) {
  df = fromJSON(filenames[i])
  webshop_id = c(webshop_id, df$product$webshopid)

  # Alcohol percentage, or NA when the label is missing
  if (is.null(df$productlabels$alcoholpercentage)) {
    alcohol = rbind(alcohol, alcoholpercentage = NA)
  } else {
    alcohol = suppressWarnings(rbind(alcohol, alcoholpercentage =
      as.numeric(gsub("%", "", df$productlabels$alcoholpercentage))))
  }

  if (length(df$productlabels$nutrition$nutritionvalues[[1]]) == 0) {
    # No nutrition table: append a row of NA to voeding and serving
    temp_row = data.frame(matrix(rep.int(NA, length(voeding)),
                                 nrow = 1, ncol = length(voeding)))
    colnames(temp_row) = colnames(voeding)
    voeding = rbind(voeding, temp_row)
    serving = rbind(serving, data.frame(value = NA, unit = NA))
  } else {
    # One column per nutrient name; decimal comma replaced by a point
    temp_row = suppressWarnings(t(as.character(sub(",", ".",
      df$productlabels$nutrition$nutritionvalues[[1]]$description, fixed = TRUE))))
    colnames(temp_row) = t(df$productlabels$nutrition$nutritionvalues[[1]]$name)
    temp_row = data.frame(temp_row)
    voeding = rbind.fill(voeding, temp_row)

    if (length(df$productlabels$nutrition$servingsizecode) == 0) {
      serving = rbind(serving, data.frame(value = NA, unit = NA))
    } else {
      serving = rbind(serving, data.frame(
        value = as.numeric(df$productlabels$nutrition$servingsizevalue[1]),
        unit = df$productlabels$nutrition$servingsizecode[1]))
    }
  }
}
toc()

nutrition = data.frame(webshop_id = webshop_id, serving, alcohol, voeding)
```
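Part of the slowdown likely comes from the loop pattern itself: `c()`, `rbind()` and `rbind.fill()` build a new object containing everything collected so far on every iteration, so the total copying grows roughly with the square of the number of files. A minimal base-R sketch of the difference, using the hypothetical functions `grow()` and `prealloc()`:

```r
# Growing inside the loop: each c() call allocates a new vector and copies
# everything collected so far into it.
grow <- function(n) {
  out <- numeric()
  for (i in seq_len(n)) out <- c(out, i)
  out
}

# Preallocating: the vector is created once at full length and filled by index.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i
  out
}

# The same idea for data frames: one small data frame per file goes into a
# preallocated list, and rbind() runs once at the end instead of once per file.
rows <- vector("list", 3)
for (i in seq_along(rows)) {
  rows[[i]] <- data.frame(webshop_id = i, alcoholpercentage = NA_real_)
}
combined <- do.call(rbind, rows)
```

Comparing `system.time(grow(1e5))` with `system.time(prealloc(1e5))` should make the gap visible; the exact timings depend on the machine.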
When I'm running a few files, there is little to no problem speed-wise. However, with 88 thousand files it takes a long time to load them in and process them. A lot of the files have missing information, hence the if statements.
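The repeated `is.null()` / `length()` checks can be folded into a small helper that returns `NA` for a missing field. A sketch, assuming absent fields arrive as `NULL` or zero-length vectors (which is what the checks in the question test for); `first_or_na()` and `parse_percentage()` are hypothetical names:

```r
# Hypothetical helper: first element of x, or NA when x is NULL or empty.
first_or_na <- function(x) {
  if (length(x) == 0) NA else x[[1]]
}

# Hypothetical helper: "5%" -> 5; a missing label -> NA.
# suppressWarnings() turns unparseable strings into NA quietly, as in the question.
parse_percentage <- function(x) {
  suppressWarnings(as.numeric(gsub("%", "", first_or_na(x), fixed = TRUE)))
}
```

Inside the loop this could become something like `alcohol_pct[i] <- parse_percentage(df$productlabels$alcoholpercentage)`, where `alcohol_pct` is a hypothetical vector preallocated with `numeric(length(filenames))`, so no `if` branch and no `rbind()` is needed for that field.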
Is there a more efficient way to load in and process these files?
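One possible restructuring, sketched under assumptions: each file is parsed with `jsonlite::fromJSON()` into a list with the same nested fields the question accesses, and the nutrition table is the first element of `nutritionvalues`, a data frame with `name` and `description` columns. The hypothetical `extract_row()` turns one parsed file into a one-row data frame, `lapply()` collects the rows without growing anything, and the hypothetical `bind_fill()` combines them in one step, filling nutrient columns a product lacks with `NA`:

```r
# Hypothetical helper: first element of x, or NA when x is NULL or empty.
first_or_na <- function(x) if (length(x) == 0) NA else x[[1]]

# Hypothetical: one parsed product -> one-row data frame; missing fields -> NA.
extract_row <- function(df) {
  lab <- df$productlabels
  row <- data.frame(
    webshop_id = first_or_na(df$product$webshopid),
    serving_value = suppressWarnings(as.numeric(first_or_na(lab$nutrition$servingsizevalue))),
    serving_unit = as.character(first_or_na(lab$nutrition$servingsizecode)),
    alcoholpercentage = suppressWarnings(as.numeric(
      gsub("%", "", first_or_na(lab$alcoholpercentage), fixed = TRUE))),
    stringsAsFactors = FALSE
  )
  nv <- first_or_na(lab$nutrition$nutritionvalues)
  if (is.data.frame(nv) && nrow(nv) > 0) {
    # One column per nutrient name; decimal comma replaced by a point
    vals <- sub(",", ".", nv$description, fixed = TRUE)
    row[as.character(nv$name)] <- as.list(vals)
  }
  row
}

# Hypothetical base-R stand-in for plyr::rbind.fill(): give every row the union
# of all column names (absent ones set to NA), then bind once.
bind_fill <- function(rows) {
  cols <- unique(unlist(lapply(rows, names)))
  rows <- lapply(rows, function(r) {
    missing_cols <- setdiff(cols, names(r))
    if (length(missing_cols) > 0) r[missing_cols] <- NA
    r[cols]
  })
  do.call(rbind, rows)
}

# Driver: parse each file once, extract one row, bind everything at the end.
process_files <- function(filenames) {
  rows <- lapply(filenames, function(f) extract_row(jsonlite::fromJSON(f)))
  bind_fill(rows)
}
```

`bind_fill()` only keeps the sketch free of extra packages; `do.call(plyr::rbind.fill, rows)`, with plyr already loaded in the question, should do the same job. Each file still has to be parsed once, so reading 88 thousand files remains a fixed cost; this change removes the repeated copying on top of it.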