python - Python: Converting scraped website data from rows to columns inside a nested loop
Question
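I am trying to convert rows that are generated inside a nested for loop into columns.
In short, it looks like this: Value1 is in a row, and the data belonging to Value1 must be in columns; Value2 is in a row, and the data belonging to Value2 must be in columns.
What happens now is that all the values are exported as rows, and then all the data for each value is exported as rows as well, which makes the output unreadable.
The problem is that to get Value1, Value2, etc., I have to go through a for loop, and to get all the data for Value1 I need to go through another for loop (a nested loop).
All the data I am fetching comes from a website (scraping). I have included an imgur link showing how it looks now and how it should look (my progress so far). The first image is how it is, the second is how it should be. I believe images explain it better than my own words. https://imgur.com/a/2LRhQrj
I am using pandas and xlsxwriter to store the data in Excel. I managed to export all the data I need to Excel, but I cannot seem to turn the data for each value into columns. The first row is the time. This is how it is supposed to work: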
# Note: DataFrame.append and ExcelWriter.save were removed in pandas 2.0,
# so this code needs an older pandas (or pd.concat / writer.close()).
# Initialize things before loop
df = pd.DataFrame()
### Time based on hour 00:00, 01:00 etc...
df_time = pd.DataFrame(columns=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23])
for listing in soup.find_all('tr'):
    listing.attrs = {}
    #assetTime = listing.find_all("td", {"class": "locked"})
    assetCell = listing.find_all("td", {"class": "assetCell"})
    assetValue = listing.find_all("td", {"class": "assetValue"})
    for data in assetCell:
        array = [data.get_text()]
        df = df.append(pd.DataFrame({
            'Fridge name': array,
        }))
        for value in assetValue:
            asset_array = [value.get_text()]
            df_time = df_time.append(pd.DataFrame({
                'Temperature': asset_array
            }))
        ### End of assetValue loop
    ### End of assetCell loop
### Now we need to save the data to excel
### Create a Pandas Excel writer using XlsxWriter as the Engine
writer = pd.ExcelWriter(filename + '.xlsx', engine='xlsxwriter')
### Convert dataframes
frames = [df, df_time]
result = pd.concat(frames)
### Convert the dataframe to an XlsxWriter Excel object and skip first row for custom header
result.to_excel(writer, sheet_name='SheetName', startrow=1, header=True)
### Get the xlsxwriter workbook and worksheet objects
workbook = writer.book
worksheet = writer.sheets['SheetName']
### Write the column headers with the defined add_format
for col_num, value in enumerate(result.columns.values):
    worksheet.write(0, col_num + 1, value)
### Close Pandas Excel writer and output the Excel file
writer.save()
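For comparison, the target layout described above (one row per fridge, one column per hour) can be sketched by collecting one record per `tr` and building the DataFrame once, instead of appending inside the nested loop. This is only a minimal sketch under assumptions: `SAMPLE_HTML` and `rows_to_wide_frame` are hypothetical names, the markup is invented to mirror the `assetCell` / `assetValue` classes used above, the `assetValue` cells are assumed to appear in hour order, and `html.parser` is used so that no extra parser is required.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample markup standing in for the scraped page.
SAMPLE_HTML = """
<table>
  <tr><td class="assetCell">Fridge A</td>
      <td class="assetValue">4.1</td><td class="assetValue">4.3</td></tr>
  <tr><td class="assetCell">Fridge B</td>
      <td class="assetValue">3.9</td><td class="assetValue">4.0</td></tr>
</table>
"""


def rows_to_wide_frame(html):
    """Return one row per fridge with one column per hour index."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for listing in soup.find_all("tr"):
        name_cell = listing.find("td", {"class": "assetCell"})
        if name_cell is None:
            continue  # skip header or spacer rows without a fridge name
        record = {"Fridge name": name_cell.get_text(strip=True)}
        # Assumption: the assetValue cells are ordered by hour (0, 1, 2, ...).
        values = listing.find_all("td", {"class": "assetValue"})
        for hour, value in enumerate(values):
            record[hour] = value.get_text(strip=True)
        records.append(record)
    # Building the frame once keeps each fridge's values on a single row.
    return pd.DataFrame(records)


wide = rows_to_wide_frame(SAMPLE_HTML)
print(wide)
```

Each dictionary becomes one row, so the hour indices end up as column headers without any later reshaping.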
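Answer
After a lot of testing I went with another approach. Instead of messing with pandas, I used tabulate to grab the whole table and then exported the entire table structure to a csv file.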
from tabulate import tabulate
import csv
import datetime  ### Import date function to make the files based on date
import requests
import pandas as pd  # needed for pd.read_html below
from bs4 import BeautifulSoup

if (DAY_INTEGER <= 31) and (DAY_INTEGER > 0):
    while True:
        try:
            ### Validate the user input
            form_data = {'UserName': USERNAME, 'Password': PASSWORD}
            with requests.Session() as sesh:
                sesh.post(login_post_url, data=form_data)
                response = sesh.get(internal_url)
                html = response.text
                break
        except requests.exceptions.ConnectionError:
            print("Whoops! This is embarrassing :( ")
            print("Unable to connect to the address. Looks like the website is down.")
    if sesh:
        # BeautifulSoup version
        soup = BeautifulSoup(html, 'lxml')
        table = soup.find_all("table")[3]  # Use the fourth table; the earlier ones contain nothing useful
        df = pd.read_html(str(table))
        df2 = tabulate(df[0], headers='keys', tablefmt='psql', showindex=False)
        # Write the rendered table and close the file afterwards
        with open(filename + '.csv', 'w') as myFile:
            myFile.write(str(df2))
    else:
        print("Oops. Something went wrong :(")
        print("It looks like authentication failed")
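One caveat: `tablefmt='psql'` makes tabulate render an ASCII-art table with borders, so the resulting `.csv` file is not comma-separated. If a real CSV is wanted, the parsed DataFrame can be written directly. The following is a minimal sketch, assuming a small hand-made frame (`table`, with invented column names and values) in place of `df[0]` from the code above.

```python
import pandas as pd

# Hypothetical stand-in for df[0], the table parsed by pd.read_html.
table = pd.DataFrame({
    "Fridge name": ["Fridge A", "Fridge B"],
    "00:00": [4.1, 3.9],
    "01:00": [4.3, 4.0],
})

# With no path argument, to_csv returns the CSV text as a string.
csv_text = table.to_csv(index=False)
print(csv_text)

# To write a file instead, pass a path:
# table.to_csv(filename + '.csv', index=False)
```

Passing `index=False` drops the row index, which plays the same role as `showindex=False` in the tabulate call.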