performance - How can we optimize CPU/core/executor for different stages of a Spark job?
As the picture below shows, my Spark job has 3 stages:

0. groupBy
1. repartition
2. collect

Stages 0 and 1 are pretty lightweight, but stage 2 is quite CPU intensive.

Is it possible to have different configurations for different stages of one Spark job?

I thought about splitting the Spark job into two sub-jobs, but that defeats the purpose of using Spark, which keeps intermediate results in memory, and it would also extend our job time.

Any ideas, please?
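For context, a minimal sketch of a pipeline with that shape; the input path, key function, and partition count are placeholders, not from the original question:

```scala
import org.apache.spark.sql.SparkSession

// Sketch of a job with the three stages described above.
// The input path, key function, and partition count are placeholders.
val spark = SparkSession.builder().appName("three-stage-job").getOrCreate()
val rdd = spark.sparkContext.textFile("hdfs:///path/to/input")

val grouped       = rdd.groupBy(line => line.take(8))  // stage 0: groupBy
val repartitioned = grouped.repartition(200)           // stage 1: repartition
val result        = repartitioned.collect()            // stage 2: collect to the driver
```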
No, it is not possible to change Spark configurations at runtime. See the documentation for SparkConf:

"Note that once a SparkConf object is passed to Spark, it is cloned and can no longer be modified by the user. Spark does not support modifying the configuration at runtime."
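As a minimal sketch of what that means in practice (the app name and resource values below are hypothetical), the executor settings are fixed when the conf is passed to Spark and then apply to every stage of the job:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Resources are chosen once, before the job starts; the values are examples only.
val conf = new SparkConf()
  .setAppName("my-job")
  .set("spark.executor.cores", "4")
  .set("spark.executor.memory", "8g")

val spark = SparkSession.builder().config(conf).getOrCreate()
// From here on, Spark works with a clone of the conf: all stages of this job
// (groupBy, repartition, collect) run with the same executor settings.
```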
However, I guess you do not need to repartition before collect if there are no other operations in between. repartition moves data around between the nodes, which is unnecessary if you only want to collect the data onto the driver node.
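A minimal sketch of the two variants (the dataset and key function are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("collect-sketch").master("local[*]").getOrCreate()
val grouped = spark.sparkContext.parallelize(1 to 1000).groupBy(_ % 10)

// With repartition: an extra shuffle across the executors...
val withRepartition = grouped.repartition(200).collect()

// ...which is wasted work, because collect() pulls all partitions to the
// driver anyway. Dropping it gives the same result with one less shuffle.
val withoutRepartition = grouped.collect()

spark.stop()
```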