performance - How can we optimize CPU/core/executor for different stages of a Spark job?


As the picture below shows:

[image: stages of the Spark job]

My Spark job has 3 stages:

0. groupBy
1. repartition
2. collect

Stages 0 and 1 are pretty lightweight, but stage 2 is quite CPU intensive.
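Roughly, the pipeline looks like the sketch below (the data and column names are made up for illustration):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    val spark = SparkSession.builder().appName("stage-demo").getOrCreate()
    import spark.implicits._

    // Hypothetical input: one million (key, value) rows
    val df = spark.range(0, 1000000L)
      .select(($"id" % 100).as("key"), $"id".as("value"))

    // Stage 0: groupBy forces a shuffle
    val grouped = df.groupBy($"key").agg(sum($"value").as("total"))

    // Stage 1: repartition forces another shuffle
    val repartitioned = grouped.repartition(10)

    // Stage 2: collect pulls everything back to the driver
    val result = repartitioned.collect()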

Is it possible to have different configurations for different stages of one Spark job?

I thought about separating the Spark job into two sub-jobs, but that defeats the purpose of using Spark, which keeps intermediate results in memory, and it would extend our total job time.

Any ideas, please?

No, it's not possible to change Spark configurations at runtime. See the documentation for SparkConf:

Note that once a SparkConf object is passed to Spark, it is cloned and can no longer be modified by the user. Spark does not support modifying the configuration at runtime.
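In practice that means executor and core settings are fixed when the application starts and apply to every stage of the job. A rough sketch (the values are just placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // These settings apply to the whole application -- every stage --
    // and must be set before the SparkSession/SparkContext is created.
    val conf = new SparkConf()
      .setAppName("whole-job-config")
      .set("spark.executor.cores", "4")    // placeholder value
      .set("spark.executor.memory", "8g")  // placeholder value

    val spark = SparkSession.builder().config(conf).getOrCreate()

    // From here on, the conf has been cloned by Spark; changing it
    // no longer affects the running application.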


However, I guess you do not need the repartition before collect if there are no other operations in between. Repartitioning moves data around between the nodes, which is unnecessary if you just want to collect it onto the driver node.
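Something like this (a sketch, assuming grouped is the result of the groupBy above):

    // Instead of shuffling first and then collecting:
    //   val result = grouped.repartition(10).collect()

    // collect() already brings every partition to the driver,
    // so the extra repartition shuffle buys nothing here:
    val result = grouped.collect()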

