Spark production setup and execution
If Spark is running in its own cluster (not on the Hadoop cluster), a Spark application might read data like this:
rdd = sparkContext.textFile("hdfs://.../file.txt")
- When is the data (for example, 1 TB of data) actually transferred to the Spark executor nodes for processing?
- How is the data split across the nodes?
- Is HA/fault tolerance available by way of replication on the executor nodes as well?
- Is it feasible to peek into the worker nodes and see the location of the data files?
- Should Spark be installed on the distributed data cluster to bring processing closer to the data, for example on a Cassandra cluster, HBase cluster, or Hadoop cluster?
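On the splitting question: by default, `textFile` on an HDFS path creates roughly one input partition per HDFS block (128 MB is the default block size in Hadoop 2+), and each partition becomes one task on an executor. A minimal plain-Python sketch of that arithmetic (the function name and the exact one-partition-per-block assumption are illustrative simplifications, not Spark's actual API):

```python
import math

HDFS_BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size: 128 MB


def num_input_partitions(file_size_bytes, block_size=HDFS_BLOCK_SIZE):
    """Rough estimate of how many input partitions sc.textFile()
    creates for an HDFS file: about one per HDFS block (at least one)."""
    return max(1, math.ceil(file_size_bytes / block_size))


# A 1 TB file at the default block size is read as ~8192 partitions,
# i.e. ~8192 tasks of ~128 MB each spread across the executors.
one_tb = 1 * 1024**4
print(num_input_partitions(one_tb))  # -> 8192
```

This is also why the transfer happens lazily: nothing is read when the RDD is defined; each task pulls only its own block from HDFS when an action (e.g. `count()`) runs.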
I am a bit confused about Spark setup and execution. Can anyone point me to links that clear up the above queries?