Find_between function with indexed output?

Problem description

I want to use the find_between function to retrieve indexable values from a specific web server.

I am using the requests module to fetch the page source of a specific website:

response = requests.get("https://www.shodan.io/search?query=Server%3A+SQ-WEBCAM")

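I want to call find_between with the specified parameters to retrieve all of the values (every item on the page, with each item identified by an incrementing value of 'n'):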

x = find_between(response.content,'/></a><a href="/host/','">---')

Does anyone know how to solve this?

import sys
import requests
from time import sleep

# Find between page tags on page.
def find_between( s, tag1, tag2 ):
    try:
        start = s.index( tag1 ) + len( tag1 )
        end = s.index( tag2, start )
        return s[start:end]
    except ValueError:
        return ""

def main():
    # Default value for 'n' index value (item on page) is 0
    n = 0

    # Enter the command 'go' to start
    cmd = raw_input("Enter Command: ")
    if cmd == "go":
        print "go!"

        # Go to this page for page item gathering.
        response = requests.get("https://www.shodan.io/search?query=Server%3A+SQ-WEBCAM")

        # Initial source output...
        print response.content

        # Find between value of 'x' sources between two tags
        x = find_between(response.content,'/></a><a href="/host/','">---')
        while(True):

            # Wait one second before continuing...
            sleep(1)
            n = n + 1

            # Display find_between data in 'x'
            print "\nindex: %s\n\n%s\n" % (n, x)

    # Enter 'exit' to exit script
    if cmd == "exit":
        sys.exit()

# Recursive function call
while(True):
    main()

Tags: python

Solution


A few things in your code need to be addressed:

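  1. The value of x is set outside (before) the while loop, so the loop increments the index n but prints the same text over and over, because x never changes.
  2. find_between() returns only a single match, and you want all of the matches.
  3. Your while loop never ends.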
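
To make the second point concrete, here is a minimal sketch that calls find_between() twice on the same text. The sample string is a made-up stand-in, not real output from the site, and the sketch uses print() as a function so that it runs on Python 3 (the question's script is Python 2):

```python
def find_between(s, tag1, tag2):
    """Return the text between the first tag1 and the next tag2, or "" if absent."""
    try:
        start = s.index(tag1) + len(tag1)
        end = s.index(tag2, start)
        return s[start:end]
    except ValueError:
        return ""

# Hypothetical sample text standing in for the downloaded page source.
sample = "<a>one</a><a>two</a><a>three</a>"

# Both calls search from the start of the same string,
# so both return the first match only.
first = find_between(sample, "<a>", "</a>")
again = find_between(sample, "<a>", "</a>")
print(first, again)
```

Because nothing about the input changes between calls, repeating the call can never reach the second or third item.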

Suggestions:

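  1. Put the call to find_between() inside the while loop.
  2. On each successive call to find_between(), pass only the part of the text that comes after the previous match.
  3. Exit the while loop when find_between() finds no match.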

Something like this:

text_to_search = response.content
while(True):
    # Find between value of 'x' sources between two tags
    x = find_between(text_to_search, '/></a><a href="/host/', '">---')
    if not x:
        break

    # Wait one second before continuing...
    sleep(1)

    # Increment 'n' for index value of item on page
    n = n + 1

    # Display find_between data in 'x'
    print "\nindex: %s\n\n%s\n" % (n, x)

    # Remove text already searched
    found_text_pos = text_to_search.index(x) + len(x)
    text_to_search = text_to_search[found_text_pos:]
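
The same idea can also be packaged as a helper that returns every match as a list, so each value can be retrieved by index. This is only a sketch under a few assumptions: find_all_between is a hypothetical name, the sample string and its addresses are invented for illustration, and the code is written for Python 3. Note that on Python 3 response.content is bytes, so either pass response.text or use bytes tags.

```python
def find_all_between(s, tag1, tag2):
    """Collect every substring that sits between tag1 and the next tag2."""
    results = []
    pos = 0
    while True:
        start = s.find(tag1, pos)
        if start == -1:
            break  # no further opening tag
        start += len(tag1)
        end = s.find(tag2, start)
        if end == -1:
            break  # opening tag without a closing tag
        results.append(s[start:end])
        pos = end + len(tag2)  # resume the search after this match
    return results

# Hypothetical sample text; the addresses are placeholders, not real hosts.
sample = ('/></a><a href="/host/192.0.2.1">---junk'
          '/></a><a href="/host/192.0.2.2">---')

hosts = find_all_between(sample, '/></a><a href="/host/', '">---')
for n, host in enumerate(hosts, start=1):
    print("index: %s  value: %s" % (n, host))
```

Tracking a search position instead of slicing the text avoids calling index(x) on the matched value, which could land on an earlier occurrence of the same string elsewhere in the page. It also lets the loop continue past an empty match rather than stopping on it.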
