Get URLs from a column and paste them into Chrome

Question

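I have an Excel file with a column of 4000+ URLs, each in a separate cell. I need to use Python to open each URL in Chrome, scrape some data from the website, and paste that data into Excel.

Then repeat the same steps for the next URL. Can you help me with this?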

Tags: python, web, screen-scraping

Solution


Export the Excel file to a CSV file and read the data from it:

import csv


def data_collector(url):
    # Do the scraping here and return the data that should replace the URL.
    return url


# Read every row of the exported CSV. csv.reader yields each row as a list
# of cell strings, so the URLs come out one per cell rather than as one
# long string that would be iterated character by character.
with open("myfile.csv", newline="") as fobj:
    rows = list(csv.reader(fobj))

# Replace each non-empty cell (a URL) with the data collected for it.
for row in rows:
    for i, url in enumerate(row):
        if url.strip():
            row[i] = data_collector(url)

# csv.writer quotes any value containing commas or quotes, so the scraped
# data cannot break the file's columns.
with open("new_file.csv", "w", newline="") as fnew:
    csv.writer(fnew).writerows(rows)
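The `data_collector` placeholder is where the actual scraping happens. Below is a minimal sketch of one possible body, assuming the data you want is the page's `<title>` and that the pages are plain HTML. It fetches each URL with the standard library's `urllib.request` instead of driving Chrome, so pages that build their content with JavaScript would still need a browser-automation tool such as Selenium. `TitleParser` and `extract_title` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text inside the first <title> tag."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    # Parse an HTML string and return its <title> text ("" if there is none).
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def data_collector(url):
    # Assumption: the wanted data is the page title. A browser-like
    # User-Agent is sent because some sites reject urllib's default one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_title(html)
```

With 4000+ URLs, catching `urllib.error.URLError` and `TimeoutError` around each call keeps one dead link from stopping the whole run.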

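After running this code, open new_file.csv in Excel and you will find the scraped data in place of each URL.

If you want to keep each URL together with its data, join them into one string separated by a colon.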
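A minimal sketch of that variant, using a hypothetical `combine` helper whose result would be written to the cell instead of the data alone. Because URLs contain colons themselves (`https://`), splitting a combined value back apart is safest from the right with `rsplit(":", 1)`, and only works if the scraped data has no colon of its own.

```python
def combine(url, data, sep=":"):
    # Hypothetical helper: keep the original URL next to the scraped data
    # in one cell, separated by `sep` (a colon, as suggested above).
    return f"{url}{sep}{data}"


cell = combine("https://example.com", "Example Domain")
print(cell)  # https://example.com:Example Domain

# URLs contain ":" themselves, so split once from the right to recover the parts.
url, data = cell.rsplit(":", 1)
print(url)   # https://example.com
print(data)  # Example Domain
```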

