Adding a new row to a CSV when extracting HTML elements

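Problem description

After a few days of research and hundreds of errors, I have almost reached my goal for this code, but a few details are still missing. I am scraping a website to get some information and export it to Excel. The problem I am trying to solve is creating a new line for each carrier. Right now the output is a single list, and I can't figure out how to separate each carrier's information into its own row.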

import csv
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By  # needed for the By.XPATH locators below

c_list = ['282131','365370','551712'] 
headers = ['Name','Unsafe Driving','d1','Crash Indicator','Hours Of Service','d2','Vehicle Maintenance','d3','CS/Alcohol','d4', 'HazMat','Driver Fitness','d5']
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu') 
driver = webdriver.Chrome(executable_path = 'mypath/chromedriver.exe') 
a=[]
c=[]
for i in c_list:
    driver.get("https://ai.fmcsa.dot.gov/SMS")
    wait = WebDriverWait(driver, 20)
    wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@title='Close']"))).click()
    wait.until(EC.element_to_be_clickable((By.XPATH, "(//input[@name='MCSearch'])[2]"))).send_keys(i)
    wait.until(EC.element_to_be_clickable((By.XPATH, "(//input[@name='search'])[2]"))).click()
    wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='BASICs']/p[2]/a"))).click()
    carrier = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="basicInfo"]/div/h3')))
    c = carrier.text
    tbl = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//tr[@class='valueRow sumData']")))
    tab = tbl.text.replace("\n",','.strip())
    tab = tab.split(",")
    a.append(c)
    for x in tab:
        a.append(x)  

with open('table.csv','w', encoding='utf8') as myFile:
     writer = csv.writer(myFile)
     writer.writerow(headers)
     writer.writerow(a)

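Right now, the output is the header row followed by one long row holding the values for every carrier.

What I need instead is one row per carrier.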

Tags: python, csv

Solution


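This question is answered well in "Can you encode CR/LF in into CSV files?".

You can also reverse-engineer how Excel handles multi-line cells. To embed a line break in an Excel cell, press Alt+Enter, then save the file as .csv. You will see that the value opens with a double quote on one line, and every following line in the file up to the closing quote is treated as a line break embedded in that cell.

To write such values to a .csv file yourself, wrap each value in double quotes so that an embedded comma or line break, if present, does not break your columns, and escape any " inside a value the CSV way, as "".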
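The same behaviour can be reproduced without Excel. The following is a minimal sketch using Python's standard `csv` module; the cell value and column names are invented for illustration. It writes a value containing a line break and reads it back, showing that the quoted field spans two physical lines but is still a single cell.

```python
import csv
import io

# Hypothetical cell value with an embedded line break,
# similar to what Alt+Enter produces in Excel.
cell = "first line\nsecond line"

# newline="" keeps the writer's line endings untouched.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["id", "notes"])
writer.writerow(["1", cell])

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back gives two records (header + one data row),
# even though the text spans three physical lines.
rows = list(csv.reader(io.StringIO(csv_text, newline="")))
print(rows)
```

The writer adds the surrounding double quotes only where they are needed, so the header row stays unquoted.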

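for article in articles:
    ...
    # Optional: collapse embedded line breaks into spaces instead of keeping them.
    # description = re.sub(r"[\r\n]+", " ", description)
    # Escape embedded double quotes the CSV way by doubling them.
    description = description.replace('"', '""')
    # Wrap every field in double quotes; only description is escaped here.
    row = '"%s","%s","%s","%s"\n' % (title, date, description, info)
    f.write(row)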
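Applied to the scraper in the question, the same idea can be sketched with the `csv` module, which does the quoting and escaping itself. The row-per-carrier part comes from building a fresh list for each carrier and calling `writerow` once per carrier, rather than appending everything to one shared list. The helper names `carrier_row` and `write_carriers`, the shortened header list, and the sample values are all hypothetical; the sketch also assumes the table text separates cells with newlines, as `tbl.text` does in the question.

```python
import csv
import io

# Shortened header list for the sketch; the question uses a longer one.
HEADERS = ["Name", "Unsafe Driving", "Crash Indicator"]


def carrier_row(name, table_text):
    """Build one CSV row: the carrier name followed by the table cells."""
    cells = [part.strip() for part in table_text.split("\n")]
    return [name] + cells


def write_carriers(out_file, scraped):
    """Write the header, then one row per (name, table_text) pair."""
    writer = csv.writer(out_file)
    writer.writerow(HEADERS)
    for name, table_text in scraped:
        writer.writerow(carrier_row(name, table_text))


# Invented placeholder data standing in for carrier.text and tbl.text.
scraped = [
    ('CARRIER "A", INC', "10%\n20%"),
    ("CARRIER B", "30%\n40%"),
]

buffer = io.StringIO(newline="")
write_carriers(buffer, scraped)
print(buffer.getvalue())
```

In the question's loop, `carrier.text` and `tbl.text` would take the place of the placeholder pairs. When writing to a real file, the `csv` documentation recommends opening it with `newline=''`, which avoids blank rows between records on Windows.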
