Python web scraping to a CSV file

Problem description

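This is my web scraping code, which fetches a page and exports its table to a CSV file. Why is there a blank line between every row of the CSV file, and can it be fixed? Thanks!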

Python code

import requests
from bs4 import BeautifulSoup
import csv

session = requests.session()

# Login form fields (values redacted)
payload = {
    "i0023": "XXXXXX",
    "i0025": "XXXXXX",
}

# Log in, then fetch the department ID page with the same session
session.post("http://192.168.XXX.XXX/checkLogin.cgi", data=payload)
s = session.get("http://192.168.XXX.XXX/m_departmentid.html")

soup = BeautifulSoup(s.text, "html.parser")

table = soup.find('div', attrs={"class": "ItemListComponent"})

# Collect the text of the first six <td> cells of every row;
# header rows contain only <th> cells, so they yield empty lists
rows = []
for row in table.find_all('tr'):
    rows.append([val.text for val in row.find_all('td')[0:6]])

# Write the non-empty rows to test.csv
with open('test.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(row for row in rows if row)

HTML source

<div class="ItemListComponent">
<table>
<thead>
<tr><th rowspan="3" scope="col">Department ID</th><th colspan="5" scope="col">Page Total/Page Restriction</th><th rowspan="3" scope="col"></th></tr>
<tr><th colspan="3" scope="col">Total Prints</th><th colspan="1" scope="col">Color</th><th colspan="1" scope="col">Black & White</th></tr>
<tr><th colspan="1" scope="col">Total</th><th colspan="1" scope="col">Color</th><th colspan="1" scope="col">Black & White</th><th colspan="1" scope="col">Print</th><th colspan="1" scope="col">Print</th></tr>

</thead>
<tbody>
<tr><td>7654321</td><td>11</td><td>0</td><td>11</td><td>0</td><td>11</td><td></td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=100">0000100</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(100)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(100)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=101">0000101</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(101)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(101)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=102">0000102</a></td><td>18</td><td>5</td><td>13</td><td>5</td><td>13</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(102)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(102)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=103">0000103</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(103)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(103)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=104">0000104</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(104)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(104)" />
</td></tr>


Tags: python, beautifulsoup, python-requests, export-to-csv

Solution


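You opened it as "wb", i.e. write bytes. Open it with "w" instead. In Python 3 the csv module also expects the file to be opened with newline=''; without it, on Windows each '\r\n' that csv.writer emits is translated to '\r\r\n', which shows up as a blank line between rows. Since the code above already uses 'w', the change to make is open('test.csv', 'w', newline='').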
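As a minimal sketch of that fix, the snippet below writes a few rows copied from the table in the question. The helper name write_rows, the temporary output path, and the utf-8 encoding are assumptions made for illustration and are not part of the original code. The scraping step is left out so the example runs on its own.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write the non-empty rows to a CSV file (hypothetical helper)."""
    # newline='' disables newline translation on the file object, so the
    # '\r\n' line terminator that csv.writer emits is written unchanged
    # instead of becoming '\r\r\n' on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerows(row for row in rows if row)


# Sample rows copied from the question's table. The empty list stands in
# for a header <tr>, which has no <td> cells and is filtered out.
rows = [
    [],
    ["7654321", "11", "0", "11", "0", "11"],
    ["0000102", "18", "5", "13", "5", "13"],
]

path = os.path.join(tempfile.gettempdir(), "test.csv")
write_rows(path, rows)

# Read the file back as bytes to inspect the exact line endings.
with open(path, "rb") as f:
    raw = f.read()
```

In the original script, the same effect should come from changing only the open() call to open('test.csv', 'w', newline='').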

