How to use Scrapy with a list of entries from a text file

Problem description

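Hi everyone, I'm working on a new project that uses Scrapy to convert IP addresses to domain names.

I can't figure out how to feed a text list (ip.txt) into my start URLs, so that the (+ ip) part is replaced with each entry from the list.

Example: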

start_urls = [
    "https://api.hackertarget.com/reverseiplookup/?q=" + ip]

My code:

# -*- coding: utf-8 -*-
import scrapy

lists = open(raw_input('IP list file name: '), 'r').read().split('\n')

class jeffbullasSpider(scrapy.Spider):
    name = "iptohost"
    allowed_domains = ["api.hackertarget.com"]
    start_urls = [
    "https://api.hackertarget.com/reverseiplookup/?q=" + str(lists) ] 

    def parse(self, response):
       print response.xpath('//body//text()').get()

(I'm new to Python. Thank you very much.)

Tags: python, web-scraping, scrapy, python-requests

Solution


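Try this. Instead of putting the whole list into a single start URL, override start_requests and yield one request per IP:

Edit: also strip each IP before sending the request.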

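import scrapy

# Read the IP list once at import time.
# raw_input is Python 2 (as in the question); use input() on Python 3.
with open(raw_input('IP list file name: '), 'r') as f:
    lists = f.read().split('\n')

class jeffbullasSpider(scrapy.Spider):
    name = "iptohost"
    allowed_domains = ["api.hackertarget.com"]
    url = "https://api.hackertarget.com/reverseiplookup/?q={}"

    def start_requests(self):
        # Build one request per IP instead of a single hard-coded start URL.
        for ip in lists:
            ip = ip.strip()
            if not ip:
                continue  # skip blank lines, e.g. the one left by a trailing newline
            yield scrapy.Request(url=self.url.format(ip), callback=self.parse)

    def parse(self, response):
        # Print the first text node found under <body>.
        print(response.xpath('//body//text()').get())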
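
To see why the code in the question fails, and to check the URL-building step without running a crawl, here is a minimal standalone sketch in Python 3. The helper name build_lookup_urls and the sample IPs are assumptions made for illustration; they are not part of Scrapy or of the original answer. The use of quote() to escape each entry is also an addition, not something the answer does.

```python
from urllib.parse import quote

# Same endpoint as in the answer above.
BASE_URL = "https://api.hackertarget.com/reverseiplookup/?q={}"


def build_lookup_urls(lines):
    """Return one lookup URL per non-blank line (hypothetical helper)."""
    urls = []
    for line in lines:
        ip = line.strip()  # drop surrounding whitespace and stray '\r'
        if not ip:
            continue  # ignore empty lines
        urls.append(BASE_URL.format(quote(ip)))
    return urls


# Illustrative file contents: two IPs, trailing whitespace, blank lines.
sample = "8.8.8.8\n1.1.1.1 \n\n".split("\n")

# What the question's code did: str() of the whole list is glued onto
# the URL, producing a single request for a nonsensical query.
broken = "https://api.hackertarget.com/reverseiplookup/?q=" + str(sample)

urls = build_lookup_urls(sample)
for u in urls:
    print(u)
```

Inside the spider, start_requests could then loop over build_lookup_urls(lists) and yield scrapy.Request(url, callback=self.parse) for each URL, which keeps the parsing of the file separate from the crawling logic.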
