Spark production setup and execution
If Spark is running in its own cluster (not on the Hadoop cluster), a Spark application might read data like this:
rdd = sparkContext.textFile("hdfs://.../file.txt")
- When is the data (for example, 1 TB of data) actually transferred to the Spark executor nodes for processing?
- How is the data split across the nodes?
- Is HA/fault tolerance available by way of replication on the executor nodes as well?
- Is it feasible to peek into the worker nodes and see the location of the data files?
- Should Spark be installed on the distributed data cluster to bring processing closer to the data, for example on a Cassandra cluster, HBase cluster, or Hadoop cluster?
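On the splitting question: by default, `textFile` on an HDFS path creates roughly one input partition per HDFS block (128 MB is the default block size in Hadoop 2+), and each partition becomes one task on an executor. A minimal plain-Python sketch of that arithmetic (the function name and the exact one-partition-per-block assumption are illustrative simplifications, not Spark's actual API):

```python
import math

HDFS_BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size: 128 MB


def num_input_partitions(file_size_bytes, block_size=HDFS_BLOCK_SIZE):
    """Rough estimate of how many input partitions sc.textFile()
    creates for an HDFS file: about one per HDFS block (at least one)."""
    return max(1, math.ceil(file_size_bytes / block_size))


# A 1 TB file at the default block size is read as ~8192 partitions,
# i.e. ~8192 tasks of ~128 MB each spread across the executors.
one_tb = 1 * 1024**4
print(num_input_partitions(one_tb))  # -> 8192
```

This is also why the transfer happens lazily: nothing is read when the RDD is defined; each task pulls only its own block from HDFS when an action (e.g. `count()`) runs.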
I am a bit confused about Spark setup and execution. Can anyone point me to links that clear up the above queries?