Is there a way to send/view data from a website using sockets?

Question

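Is there a way to use sockets to send data to, and view data from, a website such as Google Search? I'd like a program that searches for some fixed value and then prints the result in the Python shell.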

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server = 'google.com'
port = 80
server_ip = socket.gethostbyname(server)
s.connect((server, port))

Tags: python, sockets

Answer


Yes. Try sending an HTTP GET request, for example this minimal request to an HTTP/1.1 server:

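import socket

s = socket.socket()
s.connect(('httpbin.org', 80))

# Minimal HTTP/1.1 request: request line, Host header, then an empty line
request = '\r\n'.join(('GET /get HTTP/1.1', 'Host: httpbin.org', '', ''))
s.sendall(request.encode('ascii'))  # in Python 3, sockets send bytes, not str
response = s.recv(1024)             # reads at most 1024 bytes

>>> print(response.decode())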
HTTP/1.1 200 OK
Connection: keep-alive
Server: gunicorn/19.7.1
Date: Thu, 03 May 2018 22:40:59 GMT
Content-Type: application/json
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
X-Powered-By: Flask
X-Processed-Time: 0
Content-Length: 159
Via: 1.1 vegur

{
  "args": {}, 
  "headers": {
    "Connection": "close", 
    "Host": "httpbin.org"
  }, 
  "origin": "220.233.14.203", 
  "url": "http://httpbin.org/get"
}
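One caveat with the snippet above: `recv(1024)` returns at most 1024 bytes, so a larger page gets cut off. A minimal sketch that instead reads until the server closes the connection is shown here. It assumes the server honors a `Connection: close` header, and it only splits the headers from the body; it does not decode chunked transfer encoding. The helper names `build_get_request`, `recv_all` and `http_get` are hypothetical.

```python
import socket


def build_get_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request as bytes (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close the socket after replying
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def recv_all(sock, bufsize=4096):
    """Keep calling recv() until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # b"" means the other side closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)


def http_get(host, path="/", port=80):
    """Send a GET over a raw socket; return (header text, body bytes)."""
    with socket.create_connection((host, port)) as s:
        s.sendall(build_get_request(host, path))
        raw = recv_all(s)
    # Headers and body are separated by the first empty line (CRLF CRLF)
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), body
```

For example, `headers, body = http_get('httpbin.org', '/get')` should give the status line and headers as text and the JSON body as bytes. Looping on `recv()` is needed because TCP is a byte stream: a single call may return only part of what the server sent.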

However, this is much more work than you need to do. Consider using a library such as requests:

import requests
r = requests.get('http://httpbin.org/get')
>>> print(r.text)
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "origin": "220.233.14.203", 
  "url": "http://httpbin.org/get"
}

Or the standard library's `urllib.request.urlopen()` function.
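A minimal sketch of the standard-library route, which also covers the "search for a fixed value" part of the question by URL-encoding query parameters. The `q` parameter and the httpbin.org `/get` endpoint (which echoes the query string back under `"args"`) are illustrative assumptions; a real search site defines its own parameters. `build_url` and `fetch_json` are hypothetical helpers.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base, params):
    """Append URL-encoded query parameters to base (spaces become '+')."""
    return f"{base}?{urlencode(params)}" if params else base


def fetch_json(base, params=None, timeout=10):
    """GET a URL with urlopen() and decode the JSON response body."""
    url = build_url(base, params or {})
    with urlopen(url, timeout=timeout) as resp:
        # Use the charset from Content-Type if present, else assume UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        return json.loads(resp.read().decode(charset))
```

For example, `fetch_json('http://httpbin.org/get', {'q': 'python sockets'})['args']` should come back as `{'q': 'python sockets'}`. Unlike the raw-socket version, `urlopen()` builds the request, handles chunked responses and redirects, and closes the connection for you.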

