Apache Spark: how to set the outputCommitterClass property in Hadoop 2
I have been researching this problem for the past few weeks, and I haven't found a clear answer.
Here is the problem:
For Hadoop 1.x (the mapred lib), you can use a customized output committer with:

spark.conf.set("spark.hadoop.mapred.output.committer.class", "some committer")

or by calling JobConf.setOutputCommitter.
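As a minimal sketch of the Hadoop 1.x approach above (com.example.MyOutputCommitter is a hypothetical committer class standing in for "some committer"):

```scala
import org.apache.spark.sql.SparkSession

// Spark forwards any "spark.hadoop.*" key into the underlying Hadoop
// Configuration, so the old mapred API picks the committer up from there.
val spark = SparkSession.builder()
  .appName("custom-committer-demo")
  .config("spark.hadoop.mapred.output.committer.class",
          "com.example.MyOutputCommitter") // hypothetical class name
  .getOrCreate()
```

This only affects jobs that go through the old mapred output path; it is ignored by the Hadoop 2.x mapreduce lib, which is the subject of the question.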
However, in Hadoop 2.x (the mapreduce lib), the committer is obtained from OutputFormat.getOutputCommitter, and there is no clear answer on how to set the output committer.
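Because the mapreduce lib asks the OutputFormat for its committer, one workaround is to subclass an OutputFormat and override that method. A sketch, where MyCommitter is a hypothetical subclass of org.apache.hadoop.mapreduce.OutputCommitter that you would supply:

```scala
import org.apache.hadoop.mapreduce.{OutputCommitter, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, TextOutputFormat}

// A custom OutputFormat that returns your committer instead of the
// default FileOutputCommitter. Use this format class when writing.
class MyOutputFormat[K, V] extends TextOutputFormat[K, V] {
  override def getOutputCommitter(context: TaskAttemptContext): OutputCommitter = {
    // MyCommitter is hypothetical; replace with your committer implementation.
    new MyCommitter(FileOutputFormat.getOutputPath(context), context)
  }
}
```

The drawback is that this only helps for code paths where you control which OutputFormat is used, which is not the case for most of Spark SQL's built-in writers.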
I found that Databricks sets the output committer using the property spark.sql.sources.outputCommitterClass.
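Setting that property would look like this (a sketch; as the answer below notes, it only applies to SQL/DataFrame writes, and the committer class shown is Netflix's, from later in the question):

```scala
// Applies only to Spark SQL / DataFrame output paths.
spark.conf.set(
  "spark.sql.sources.outputCommitterClass",
  "com.netflix.bdp.s3.S3DirectoryOutputCommitter")
```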
I tried Netflix's S3 committer (com.netflix.bdp.s3.S3DirectoryOutputCommitter), but in the log, Spark still uses the default committer:

17/09/13 22:39:36 INFO FileOutputCommitter: File Output Committer Algorithm version is 2
17/09/13 22:39:36 INFO DirectFileOutputCommitter: Nothing to clean up since no temporary files were written.
17/09/13 22:39:36 INFO CSEMultipartUploadOutputStream: close closed:false s3://xxxx/testtable3/.hive-staging_hive_2017-09-13_22-39-34_140_3769635956945982238-1/-ext-10000/_SUCCESS
I'm wondering: is it possible to overwrite the default FileOutputCommitter and use a customized committer with the mapreduce lib?
How do I do it?
Not no; it's that I'm trying to fix MAPREDUCE-6823, where you'll be able to set the committer per filesystem schema. That won't surface for a while (Hadoop 3.1?).
You should be able to get away with setting the SQL output committer, though I'd check the path; it only kicks in for SQL/DataFrame work. You can set the Parquet one separately, though the committer you declare there must be a subclass of ParquetOutputCommitter, which the Netflix one isn't.
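The Parquet-specific setting mentioned above can be sketched as follows (com.example.MyParquetCommitter is a hypothetical class; whatever you use must extend org.apache.parquet.hadoop.ParquetOutputCommitter, which rules out Netflix's S3 committer):

```scala
// Committer used only for Parquet output; must subclass ParquetOutputCommitter.
spark.conf.set(
  "spark.sql.parquet.output.committer.class",
  "com.example.MyParquetCommitter") // hypothetical class name
```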