Spark Production setup and its execution -


If Spark is running in its own cluster (not on a Hadoop cluster), and a Spark application does:

rdd = sparkContext.textFile("hdfs://.../file.txt")
  • When is the data (for example, 1 TB of data) actually transferred to the Spark executor nodes for processing?
  • How is the data split across the nodes?
  • Is HA / fault tolerance available by way of replication on the executor nodes as well?
  • Is it feasible to peek into the worker nodes and see the location of the data files?
  • Should Spark be installed on the distributed cluster itself to bring processing closer to the data, e.g. on a Cassandra cluster, HBase cluster, or Hadoop cluster?
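On the splitting question: as I understand it, Spark delegates the splitting of an HDFS file to the Hadoop InputFormat, so `textFile` yields roughly one partition per HDFS block (128 MB by default). A rough back-of-the-envelope sketch for the 1 TB example above (the exact count depends on the cluster's configured block size, so treat the numbers as illustrative):

```python
import math

# Assumed HDFS default block size; clusters may configure a different value.
BLOCK_SIZE = 128 * 1024 ** 2   # 128 MB
file_size = 1 * 1024 ** 4      # the 1 TB file from the example

# One input split (and hence one RDD partition / task) per block.
num_splits = math.ceil(file_size / BLOCK_SIZE)
print(num_splits)  # → 8192 partitions, i.e. ~8192 tasks scheduled across executors
```

Each of these partitions becomes a task that the scheduler assigns to an executor, preferring executors close to the block's replicas when locality information is available.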

I am a bit confused about Spark setup and execution. Can anyone point me to links that clear up the above queries?

