How do I export data to a new row in a CSV file on each loop iteration?

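Problem description

I am scraping data from multiple URLs, and the text I get back is split into words. Inside a for loop, I try to append the data to an empty list, build a DataFrame, and then export it to a CSV file. The problem is that the export overwrites the previous column, so I only ever see one column. How can I export the data from each iteration to its own row?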

import urllib.request
from inscriptis import get_text
import pandas as pd
from googletrans import Translator
from time import sleep

url_list = pd.read_csv("/home/user/Downloads/warrior_categories.alcohol.csv")
urls = url_list['domain']


def dataextraction():
    df = pd.DataFrame()
    for url in urls:
        final_url = 'http://' + url
        try:
            html = urllib.request.urlopen(final_url).read().decode('utf-8')
            text = get_text(html)
            extracted_data = text.split()
            refined_data = []
            SYMBOLS = '{}()[].,:;+-*/&|<>=~0123456789'
            for i in extracted_data:
                if i not in SYMBOLS:
                    refined_data.append(i)
            print("\n", "$" * 50, "HEYAAA we got arround: ", len(refined_data), " of keywords! Here are they: ",
                  "$" * 50, "\n")
            print(type(refined_data))
            empty=[]
            for data in refined_data:
                empty.append(data)
            df.append(empty)
        except:
            pass

    df.to_csv('alcohol.csv', index=False)

print(dataextraction())

Tags: python-3.x, pandas, for-loop

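Solution

Your question needs more explanation, but as I understand it, you want everything built inside the for loop to show up in the CSV. That can be done like this: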

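import urllib.request

import pandas as pd
from inscriptis import get_text

SYMBOLS = '{}()[].,:;+-*/&|<>=~0123456789'


def dataextraction():
    # `urls` is the 'domain' column loaded in the question.
    rows = []  # one list of keywords per URL
    for url in urls:
        final_url = 'http://' + url
        try:
            html = urllib.request.urlopen(final_url).read().decode('utf-8')
        except Exception as exc:
            # Skip pages that cannot be fetched or decoded, but report them.
            print("skipping", final_url, exc)
            continue
        text = get_text(html)
        # Drop tokens that are a bare symbol or digit (same test as before).
        refined_data = [word for word in text.split() if word not in SYMBOLS]
        print(len(refined_data), "keywords from", final_url)
        rows.append(refined_data)

    # DataFrame.append returned a new frame and never changed df in place
    # (it was removed in pandas 2.0), so the original df stayed empty.
    # Build the frame once instead: one row per URL, short rows are padded.
    df = pd.DataFrame(rows)
    # header=False leaves out the automatic 0, 1, 2, ... column names.
    df.to_csv('alcohol.csv', index=False, header=False)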
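
If the goal is literally one CSV line per loop iteration, the standard-library csv module may be a simpler fit than a DataFrame, since each row can be written as soon as it is produced. The following is only a minimal sketch: the helper name write_rows, the file name keywords_demo.csv, and the sample word lists are made up for illustration and stand in for the keywords scraped from each URL.

```python
import csv


def write_rows(path, rows):
    """Write each item of `rows` (a list of strings) as its own CSV line."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)  # one call per iteration, one line per call


# Made-up stand-ins for the keyword lists scraped from each URL.
sample = [["beer", "wine"], ["whisky", "gin", "rum"]]
write_rows("keywords_demo.csv", sample)
```

Rows written this way do not need to have the same length, so pages with different numbers of keywords simply produce lines of different widths.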
