python - Scrapy yield request from one spider to another


I have the following code:

#firstspider.py
import scrapy
from scrapy import Item
from scrapy.loader import ItemLoader


class FirstSpider(scrapy.Spider):

    name = 'first'
    start_urls = ['https://www.basesite.com']
    next_urls = []

    def parse(self, response):
        for url in response.css('bunch > of > css > here'):
            self.next_urls.append(url.css('more > css > here'))
            l = ItemLoader(item=Item(), selector=url.css('more > css'))
            l.add_css('add', 'more > css')
            ...
            ...
            yield l.load_item()
        for url in self.next_urls:
            new_urls = self.start_urls[0] + url
            yield scrapy.Request(new_urls, callback=SecondSpider.parse_url)

#secondspider.py
class SecondSpider(scrapy.Spider):

    name = 'second'
    start_urls = ['https://www.basesite.com']

    def parse_url(self):
        """Parse team data."""
        return self
        # here, self is an HtmlResponse, not a 'Response' object

    def parse(self, response):
        """Parse all."""
        summary = self.parse_url(response)
        return summary

#thirdspider.py
class ThirdSpider(scrapy.Spider):
    # take the links from the second spider, and continue:

I want to be able to pass the URLs scraped in spider 1 to spider 2 (in a different script). I'm curious why, when I do, 'response' is an HtmlResponse and not a Response object (when using a similar method within the same class as spider 1, I don't have this issue).

What am I missing here? How do I pass the original response(s) to the second spider? (And from the second onto the third, etc.?)

You can use Redis as a shared resource between spiders: https://github.com/rmax/scrapy-redis

  • Run N spiders (don't close them on idle state), each of them connected to the same Redis and waiting for tasks (URL, request headers) there;

  • As a side effect, push the task data to Redis from x_spider under a specific key (y_spider's name). A minimal sketch of this setup is shown below.
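For illustration, here is a rough sketch of that pattern. It assumes scrapy-redis is installed and a Redis server is running on localhost; the key name 'second:start_urls' and the push_next_url helper are invented for the example (scrapy-redis defaults to a '<spider name>:start_urls' key):

# secondspider.py - sits idle and consumes URLs pushed into its Redis list
from scrapy_redis.spiders import RedisSpider

class SecondSpider(RedisSpider):
    name = 'second'
    redis_key = 'second:start_urls'  # Redis list this spider blocks on

    def parse(self, response):
        # 'response' is a fresh download of a URL popped from Redis
        yield {'url': response.url}

# firstspider.py - instead of yielding a Request with a callback on another
# class, push each discovered URL onto the second spider's Redis key
import redis

r = redis.Redis(host='localhost', port=6379)

def push_next_url(url):
    r.lpush('second:start_urls', url)  # scrapy-redis pops and schedules it

With this setup the second spider stays idle until work arrives, so each spider in the chain only needs to know the next spider's key; this replaces the cross-class callback=SecondSpider.parse_url call from the question.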

