hadoop - PySpark es.query only works when default
In PySpark, the only way I can get data back from Elasticsearch is by leaving es.query at its default. Why is this?
es_query = {"match" : {"key" : "value"}} es_conf = {"es.nodes" : "localhost", "es.resource" : "index/type", "es.query" : json.dumps(es_query)} rdd = sc.newapihadooprdd(inputformatclass="org.elasticsearch.hadoop.mr.esinputformat",keyclass="org.apache.hadoop.io.nullwritable",valueclass="org.elasticsearch.hadoop.mr.linkedmapwritable", conf=es_conf) ... rdd.count() 0 rdd.first() valueerror: rdd empty
Yet this query (the default) seems to work:
es_query = {"match_all" : {}} ... rdd.first() (u'2017-01-01 23:59:59)
*I have tested the queries by running them directly against Elasticsearch, and they work, so something seems to be wrong with Spark/es-hadoop.
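For reference, the direct test was roughly the following (a minimal sketch using the requests library; the localhost:9200 endpoint and index/type names mirror the placeholders above):

    import json
    import requests

    # Direct verification against the plain _search API, where the
    # query DSL is wrapped in a "query" element as the API requires.
    body = {"query" : {"match" : {"key" : "value"}}}
    resp = requests.post(
        "http://localhost:9200/index/type/_search",
        data=json.dumps(body),
        headers={"Content-Type": "application/json"})
    print(resp.json()["hits"]["total"])  # non-zero hits when run directly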