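python - Python web scraping to a CSV file
Problem description
This is my web scraping code. It fetches the page content and exports it to a CSV file. Why is there a blank line between every row of the CSV file? Can this be fixed? Thanks!
Python code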
import requests
from bs4 import BeautifulSoup
import csv

session = requests.session()
payload = {"i0023": "XXXXXX",
           "i0025": "XXXXXX"
           }
# Log in first so the session cookie is sent with the next request
session.post("http://192.168.XXX.XXX/checkLogin.cgi", data=payload)
s = session.get("http://192.168.XXX.XXX/m_departmentid.html")
soup = BeautifulSoup(s.text, "html.parser")
table = soup.find('div', attrs={"class": "ItemListComponent"})
tbody = table.find_all('tbody')
rows = []
# Header rows contain only <th> cells, so they produce empty lists
for row in table.find_all('tr'):
    rows.append([val.text for val in row.find_all('td')[0:6]])
with open('test.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(row for row in rows if row)  # skip the empty header rows
Page source (HTML)
<div class="ItemListComponent">
<table>
<thead>
<tr><th rowspan="3" scope="col">Department ID</th><th colspan="5" scope="col">Page Total/Page Restriction</th><th rowspan="3" scope="col"></th></tr>
<tr><th colspan="3" scope="col">Total Prints</th><th colspan="1" scope="col">Color</th><th colspan="1" scope="col">Black & White</th></tr>
<tr><th colspan="1" scope="col">Total</th><th colspan="1" scope="col">Color</th><th colspan="1" scope="col">Black & White</th><th colspan="1" scope="col">Print</th><th colspan="1" scope="col">Print</th></tr>
</thead>
<tbody>
<tr><td>7654321</td><td>11</td><td>0</td><td>11</td><td>0</td><td>11</td><td></td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=100">0000100</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(100)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(100)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=101">0000101</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(101)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(101)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=102">0000102</a></td><td>18</td><td>5</td><td>13</td><td>5</td><td>13</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(102)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(102)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=103">0000103</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(103)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(103)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=104">0000104</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(104)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(104)" />
</td></tr>
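To see what the scraping loop puts into `rows` for the HTML above, here is a small offline sketch. It uses the standard library's `html.parser.HTMLParser` as a dependency-free stand-in for BeautifulSoup. This is a swapped-in technique, not the question's code. The `SAMPLE` string is an abbreviated copy of the table above, and `RowCollector` is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Abbreviated copy of the table above: one header row and two body rows.
SAMPLE = """
<div class="ItemListComponent"><table>
<thead><tr><th rowspan="3" scope="col">Department ID</th><th colspan="5" scope="col">Page Total/Page Restriction</th></tr></thead>
<tbody>
<tr><td>7654321</td><td>11</td><td>0</td><td>11</td><td>0</td><td>11</td><td></td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=100">0000100</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" />
</td></tr>
</tbody></table></div>
"""


class RowCollector(HTMLParser):
    """Hypothetical helper: collect the text of <td> cells for each <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell texts per <tr>
        self._cell = None  # text fragments of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell))
            self._cell = None

    def handle_data(self, data):
        # Text outside a <td> (e.g. whitespace between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)


parser = RowCollector()
parser.feed(SAMPLE)
# Keep the first six cells, as row.find_all('td')[0:6] does in the question
rows = [cells[:6] for cells in parser.rows]
print(rows)
```

The header row yields an empty list. That is why the question's code filters with `if row` before calling `writerows`. The slice `[0:6]` also drops the seventh cell, which holds only the buttons and a trailing newline.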
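Solution
The blank lines come from newline translation. `csv.writer` ends every row with "\r\n". A file opened in text mode on Windows also translates each "\n" it receives into "\r\n", so every row ends in "\r\r\n", and that shows up as an empty line. Pass newline='' when opening the file: with open('test.csv', 'w', newline='') as f:. Opening the file as 'wb' (write bytes) was the fix in Python 2. In Python 3, `csv.writer` needs a text-mode file, so 'wb' raises a TypeError.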
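Here is a minimal reproduction of the blank-line problem and its fix. It assumes the cause is Windows-style newline translation. Passing newline="\r\n" to `open()` forces that translation on any operating system, so both outcomes can be compared byte for byte. The row values are taken from the page source above. The helper `write_and_read_bytes` is hypothetical.

```python
import csv
import os
import tempfile

rows = [["7654321", "11", "0", "11", "0", "11"],
        ["0000100", "0", "0", "0", "0", "0"]]


def write_and_read_bytes(path, newline):
    """Hypothetical helper: write rows with csv.writer, return the raw file bytes."""
    with open(path, "w", newline=newline) as f:
        csv.writer(f).writerows(rows)
    with open(path, "rb") as f:
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "test.csv")

# newline="\r\n" translates every written "\n" into "\r\n", which mimics the
# Windows text-mode default, so csv's own "\r\n" becomes "\r\r\n".
buggy = write_and_read_bytes(path, "\r\n")
# newline="" disables translation; csv's "\r\n" is written unchanged.
fixed = write_and_read_bytes(path, "")

print(buggy)
print(fixed)
```

Only the `newline` argument differs between the two calls. In the question's script, the equivalent change is adding newline='' to the existing `open('test.csv', 'w')` call.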