Spark/Scala quote issue
I have input data encoded as ISO-8859-1 in a cedilla-delimited file, and the data itself contains double quotes. I am converting the file to UTF-8. When doing so, Spark inserts an escape character and extra quotes around the affected fields. How can I make sure that quotes and escape characters are not added to the output?
Sample input:
xyzÇvib bros crane , big "tonyÇ1961-02-23Ç00:00:00
Sample output:
xyzÇ"vib bros crane , big \"tony"Ç1961-02-23Ç00:00:00
Code:
val inputFormatDataFrame = sparkSession.sqlContext.read
  .format("com.databricks.spark.csv")
  .option("delimiter", delimiter)
  .option("charset", input_format)
  .option("header", "false")
  .option("treatEmptyValuesAsNulls", "true")
  .option("nullValue", " ")
  .option("quote", "")
  .option("quoteMode", "NONE")
  //.option("escape", "\"")
  .option("ignoreLeadingWhiteSpace", "true")
  .option("ignoreTrailingWhiteSpace", "true")
  .option("mode", "FAILFAST")
  .load(input_location)

inputFormatDataFrame.write
  .mode("overwrite")
  .option("delimiter", delimiter)
  .option("charset", "utf-8")
  .csv(output_location)
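A workaround that is often suggested for this situation is to set both the quote and the escape character to the null character (\u0000) on the writer as well as the reader, so Spark neither wraps fields in quotes nor emits escape characters. The sketch below assumes Spark 2.x with the built-in CSV source (the successor to com.databricks.spark.csv); the paths and the delimiter value are placeholders, and the exact behavior should be verified against your Spark version:

```scala
import org.apache.spark.sql.SparkSession

object CsvRecodeNoQuotes {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-recode")
      .master("local[*]")
      .getOrCreate()

    val delimiter = "\u00c7" // cedilla delimiter, as in the sample data

    // Read the ISO-8859-1 file; a null quote character effectively
    // disables quote handling, so embedded double quotes pass through.
    val df = spark.read
      .option("sep", delimiter)
      .option("encoding", "ISO-8859-1")
      .option("header", "false")
      .option("quote", "\u0000")
      .option("mode", "FAILFAST")
      .csv("input_location") // placeholder path

    // Write UTF-8 output with quoting and escaping disabled the same way,
    // so no quotes or backslashes are added around fields.
    df.write
      .mode("overwrite")
      .option("sep", delimiter)
      .option("encoding", "UTF-8")
      .option("quote", "\u0000")
      .option("escape", "\u0000")
      .csv("output_location") // placeholder path

    spark.stop()
  }
}
```

Note that with quoting disabled, any field that itself contains the delimiter can no longer be round-tripped safely, so this only works if Ç never appears inside field values.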