How can I save results to a text file or Excel without square brackets?

Problem description

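I am working on web scraping. I read names line by line from a text file, search for each one on Google, and scrape the address from the result. I want to write each result next to its name. This is my text file, a.txt: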

0.5BN FINHEALTH PRIVATE LIMITED
01 SYNERGY CO.
1 BY 0 SOLUTIONS

Here is my code:

import requests
from bs4 import BeautifulSoup

USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"

out_fl = open('a.txt','r')
for line in out_fl:
    query = line
    query = query.replace(' ', '+')
    print(line)
    URL = f"https://google.com/search?q={query}"
    print(URL)
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(URL, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        newline = '\n'
        for g in soup.find_all('span', class_="i4J0ge"):
            x = f'{line}:{g.text}{newline}'
            results.append(x)
        print(results)

        with open("results.txt","a") as result:
            result.write(str(results))

I do get results, but they are not formatted correctly. Please help. My expected result is:

  0.5BN FINHEALTH PRIVATE LIMITED : Address: 2nd Floor, BHIVE Forum, GNS Towers #18, Dairy 
  Circle Road, Adugodi, Koramangala, Bengaluru, Karnataka 560029Hours: Closed ⋅ Opens 9:30AM 
  MonSaturdayClosedSundayClosedMonday9:30am–7:30pmTuesday9:30am–7:30pmWednesday9:30am– 
  7:30pmThursday9:30am–7:30pmFriday9:30am–7:30pmSuggest an editUnable to add this file. 
  Please check that it is a valid photo

  01 SYNERGY CO. : 01 SYNERGY CO.\n:Located in: Punjab Agricultural UniversityAddress: 3rd 
  Floor Kartar Bhawan, Ferozpur Rd, Ludhiana, Punjab 141001Hours: Closes soon ⋅ 5PM ⋅ Opens 
  9:30AM MonSaturday10am–5pmSundayClosedMonday9:30am–7:30pmTuesday9:30am– 
  7:30pmWednesday9:30am–7:30pmThursday9:30am–7:30pmFriday9:30am–7:30pmSuggest an editUnable 
  to add this file. Please check that it is a valid photo.Phone: 098159 18807

Or written to Excel. Thanks.

Tags: python-3.x, beautifulsoup

Solution


You can load the results into a pandas DataFrame and then write it to Excel or CSV.

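import pandas as pd

# Build a one-column DataFrame from the results list; rename the column as required
df = pd.DataFrame(results, columns=["result"])

# Writing .xlsx needs an Excel engine such as openpyxl to be installed
df.to_excel('filename.xlsx', sheet_name='sheet name', index=False)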
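
For the plain-text case, the square brackets come from `result.write(str(results))`, which writes the list's printed representation rather than its contents. Below is a minimal sketch of one way around that, assuming `results` holds one string per scraped span. The helper names `format_rows` and `append_rows`, the sample data, and the file name `results_demo.txt` are made up for illustration and are not part of the original code.

```python
def format_rows(name, snippets):
    """Return one 'name : snippet' string per scraped snippet."""
    clean_name = name.strip()  # drop the trailing newline read from a.txt
    return [f"{clean_name} : {s.strip()}" for s in snippets]


def append_rows(path, rows):
    """Append each row on its own line, then a blank separator line."""
    if not rows:
        return  # nothing scraped for this name, so write nothing
    with open(path, "a", encoding="utf-8") as fh:
        # Joining the strings writes only their text, with no list brackets or quotes
        fh.write("\n".join(rows) + "\n\n")


# Hypothetical sample data standing in for the scraped span texts
rows = format_rows(
    "01 SYNERGY CO.\n",
    ["Address: 3rd Floor Kartar Bhawan", "Phone: 098159 18807"],
)
append_rows("results_demo.txt", rows)
```

In the question's loop this would amount to calling `format_rows(line, [g.text for g in soup.find_all('span', class_="i4J0ge")])` and passing the result to `append_rows`. Stripping `line` also removes the stray `\n` that otherwise ends up in the middle of each entry.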
