Scrapy Basic Concepts: Scrapy shell

Founder

2024-02-23 12:38:36

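The Scrapy shell is an interactive shell where you can test and debug your data-extraction code without running a Spider. In fact, it can be used to test any kind of code, since it is also a regular Python shell.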

I. Using the Scrapy shell

1. Launching the Scrapy shell

scrapy shell 'https://scrapy.org' --nolog

2. Debugging with the available objects and functions

(1) Extract data with the response object

>>> response.xpath('//title/text()').get()
'Scrapy | A Fast and Powerful Scraping and Web Crawling Framework'

(2) Fetch a new response with fetch()

>>> fetch("https://old.reddit.com/")

(3) Extract data from the new response

>>> response.xpath('//title/text()').get()
'reddit: the front page of the internet'

(4) Change the request method through the request object

>>> request = request.replace(method="POST")

(5) Fetch again with the modified request

>>> fetch(request)

(6) Check the response status

>>> response.status
404
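`response.status` is a plain integer. If a readable label is useful while debugging, the standard library can map it to its reason phrase. The helper name `describe_status` below is hypothetical and is not part of Scrapy.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the status code followed by its standard reason phrase."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Codes outside the standard registry have no known phrase.
        return f"{code} (non-standard status code)"
    return f"{code} {status.phrase}"


print(describe_status(404))
print(describe_status(200))
```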

(7) Pretty-print the response headers

>>> from pprint import pprint
>>> pprint(response.headers)
{'Accept-Ranges': ['bytes'],
 'Cache-Control': ['max-age=0, must-revalidate'],
 'Content-Type': ['text/html; charset=UTF-8'],
 'Date': ['Thu, 08 Dec 2016 16:21:19 GMT'],
 'Server': ['snooserv'],
 'Set-Cookie': ['loid=KqNLou0V9SKMX4qb4n; Domain=reddit.com; Max-Age=63071999; Path=/; expires=Sat, 08-Dec-2018 16:21:19 GMT; secure',
                'loidcreated=2016-12-08T16%3A21%3A19.445Z; Domain=reddit.com; Max-Age=63071999; Path=/; expires=Sat, 08-Dec-2018 16:21:19 GMT; secure',
                'loid=vi0ZVe4NkxNWdlH7r7; Domain=reddit.com; Max-Age=63071999; Path=/; expires=Sat, 08-Dec-2018 16:21:19 GMT; secure',
                'loidcreated=2016-12-08T16%3A21%3A19.459Z; Domain=reddit.com; Max-Age=63071999; Path=/; expires=Sat, 08-Dec-2018 16:21:19 GMT; secure'],
 'Vary': ['accept-encoding'],
 'Via': ['1.1 varnish'],
 'X-Cache': ['MISS'],
 'X-Cache-Hits': ['0'],
 'X-Content-Type-Options': ['nosniff'],
 'X-Frame-Options': ['SAMEORIGIN'],
 'X-Moose': ['majestic'],
 'X-Served-By': ['cache-cdg8730-CDG'],
 'X-Timer': ['S1481214079.394283,VS0,VE159'],
 'X-Ua-Compatible': ['IE=edge'],
 'X-Xss-Protection': ['1; mode=block']}

II. Objects and functions available in the Scrapy shell

[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    
[s]   item       {}
[s]   request    
[s]   response   <200 https://scrapy.org/>
[s]   settings   
[s]   spider     
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser

III. Invoking the shell from a Spider to inspect responses

Inside Spider code, the shell can be invoked through scrapy.shell.inspect_response:

import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = [
        "http://example.com",
        "http://example.org",
        "http://example.net",
    ]

    def parse(self, response):
        # We want to inspect one specific response.
        if ".org" in response.url:
            from scrapy.shell import inspect_response

            inspect_response(response, self)

        # Rest of parsing code.

When the Spider runs, the Scrapy shell starts automatically once inspect_response is called, for example:

2014-01-23 17:48:31-0400 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: None)
2014-01-23 17:48:31-0400 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: None)
[s] Available Scrapy objects:
[s]   crawler    
...
>>> response.url
'http://example.org'
