performance - How can we optimize CPU/core/executor for different stages of a Spark job?
As the picture below shows, my Spark job has 3 stages:

0. groupBy
1. repartition
2. collect

Stages 0 and 1 are pretty lightweight, but stage 2 is quite CPU intensive.

Is it possible to have different configurations for different stages of one Spark job?

I thought about separating the Spark job into two sub-jobs, but that defeats the purpose of using Spark, which keeps the intermediate result in memory, and it would also extend our job time.

Any ideas, please?
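For reference, here is a rough sketch of the job shape; the input path, column names, and the expensive function are placeholders, not our actual code:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("three-stage-job").getOrCreate()
import spark.implicits._

val df = spark.read.parquet("/path/to/input")                      // placeholder input

// Stage 0: groupBy (lightweight)
val grouped = df.groupBy($"key").agg(sum($"value").as("total"))

// Stage 1: repartition (lightweight)
val repartitioned = grouped.repartition(200)

// Stage 2: CPU-intensive computation, then collect to the driver
val expensive = udf((x: Double) => math.pow(x, 3))                 // stand-in for the heavy work
val result = repartitioned.withColumn("score", expensive($"total")).collect()
```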
No, it is not possible to change Spark configurations at runtime. See the documentation for SparkConf:

"Note that once a SparkConf object is passed to Spark, it is cloned and can no longer be modified by the user. Spark does not support modifying the configuration at runtime."
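In other words, executor resources are fixed when the SparkConf/SparkSession is created (or passed via spark-submit), so they apply to every stage of the job. A minimal sketch, with example values only:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("three-stage-job")
  .config("spark.executor.instances", "10")   // example values; set once for the whole job
  .config("spark.executor.cores", "4")
  .config("spark.executor.memory", "8g")
  .getOrCreate()

// From this point on the configuration is cloned by Spark. Trying to change a
// static setting later, e.g. spark.conf.set("spark.executor.cores", "8"),
// is not supported (it is rejected or ignored, depending on the Spark version).
```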
However, I guess you do not need the repartition before collect if there are no other operations in between. A repartition moves data around between the nodes, which is unnecessary if you just want to collect it onto the driver node.
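A minimal before/after sketch of that suggestion, assuming `grouped` is the output of your groupBy stage:

```scala
// Before: the repartition shuffles data across the cluster only for it
// to be pulled back to the driver immediately afterwards.
val resultBefore = grouped.repartition(200).collect()

// After: collect directly and skip the extra shuffle.
val resultAfter = grouped.collect()
```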
