performance - How can we optimize CPU/core/executor for different stages of a Spark job?
As the picture below shows, my Spark job has 3 stages:

0. groupBy
1. repartition
2. collect

Stages 0 and 1 are pretty lightweight, but stage 2 is quite CPU intensive.

Is it possible to have different configurations for different stages of one Spark job?

I thought about splitting the Spark job into two sub-jobs, but that defeats the purpose of using Spark, which keeps intermediate results in memory, and it would also extend our job time.

Any ideas, please?
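For context, a minimal sketch of a pipeline with that shape; the input path, key function, and partition count are placeholders, not from the original question:

```scala
import org.apache.spark.sql.SparkSession

// Sketch of a job with the three stages described above.
// The input path, key function, and partition count are placeholders.
val spark = SparkSession.builder().appName("three-stage-job").getOrCreate()
val rdd = spark.sparkContext.textFile("hdfs:///path/to/input")

val grouped       = rdd.groupBy(line => line.take(8))  // stage 0: groupBy
val repartitioned = grouped.repartition(200)           // stage 1: repartition
val result        = repartitioned.collect()            // stage 2: collect to the driver
```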
No, it is not possible to change Spark configurations at runtime. See the documentation for SparkConf:

"Note that once a SparkConf object is passed to Spark, it is cloned and can no longer be modified by the user. Spark does not support modifying the configuration at runtime."
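As a minimal sketch of what that means in practice (the app name and resource values below are hypothetical), the executor settings are fixed when the conf is passed to Spark and then apply to every stage of the job:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Resources are chosen once, before the job starts; the values are examples only.
val conf = new SparkConf()
  .setAppName("my-job")
  .set("spark.executor.cores", "4")
  .set("spark.executor.memory", "8g")

val spark = SparkSession.builder().config(conf).getOrCreate()
// From here on, Spark works with a clone of the conf: all stages of this job
// (groupBy, repartition, collect) run with the same executor settings.
```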
However, I guess you do not need to repartition before collect if there are no other operations in between. repartition moves data around between the nodes, which is unnecessary if you only want to collect the data onto the driver node.
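A minimal sketch of the two variants (the dataset and key function are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("collect-sketch").master("local[*]").getOrCreate()
val grouped = spark.sparkContext.parallelize(1 to 1000).groupBy(_ % 10)

// With repartition: an extra shuffle across the executors...
val withRepartition = grouped.repartition(200).collect()

// ...which is wasted work, because collect() pulls all partitions to the
// driver anyway. Dropping it gives the same result with one less shuffle.
val withoutRepartition = grouped.collect()

spark.stop()
```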