python - ValueError: All arrays must be of the same length when appending data to a DataFrame
Problem description
import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
productlink = []
n = []
a = []
re = []
ra = []
w = []

r = requests.get('https://www.houzz.com/professionals/general-contractor')
soup = BeautifulSoup(r.content, 'html.parser')
tra = soup.find_all('div', class_='hz-pro-search-result__info')
for pro in tra:
    name = pro.find('span', class_='mlm header-5 text-unbold').text
    n.append(name)
    address = pro.find('span', class_='hz-pro-search-result__location-info__text').text
    a.append(address)
    reviews = pro.find('span', class_='hz-star-rate__review-string').text
    re.append(reviews)
    rating = pro.find('span', class_='hz-star-rate__rating-number').text
    ra.append(rating)

for links in tra:
    for link in links.find_all('a', href=True)[2:]:
        if link['href'].startswith('https://www.houzz.com/professionals/general-contractors'):
            productlink.append(link['href'])

for link in productlink:
    r = requests.get(link, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    for web in soup.find_all('a', attrs={'class': 'sc-62xgu6-0 jxCcwv mwxddt-0 bSdLOV hui-link trackMe'}):
        w.append(web['href'])

df = pd.DataFrame({'name': n, 'address': a, 'reviews': re, 'rating': ra, 'web': w})
print(df)
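The scraping part of the code runs fine, but when I try to put the data into a DataFrame I get ValueError: All arrays must be of the same length.
How can I get this data into a DataFrame, and how do I fix this error? I would be very grateful for any help.
This is my output: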
Capital Remodeling Hanover, Maryland 21076, United States 409 Reviews 4.8
SOD Home Group 367 Santana Heights, Unit #3-3021, San Jose, California 95128, United States 238 Reviews 5.0
Innovative Construction Inc. 3040 Amwiler Rd, Suite B, Peachtree Corners, Georgia 30360, United States 100 Reviews 5.0
Baron Construction & Remodeling Co. Saratoga & Los Angeles, California 95070, United States 69 Reviews 4.8
Luxe Remodel 329 N. Wetherly Dr., Suite 205, Los Angeles, California 90211, United States 79 Reviews 4.9
California Home Builders & Remodeling Inc. STUDIO CITY, California 91604, United States 232 Reviews 5.0
Sneller Custom Homes and Remodeling, LLC 17018 Seven Pines Dr Ste 100, Spring, Texas 77379, United States 77 Reviews 4.9
123 Remodeling Inc. 5070 N. Kimberly Ave Suite C, Chicago, Illinois 60630, United States 83 Reviews 4.7
Professional builders & Remodeling, Inc 15335 Morrison St #325, Sherman Oaks, California 91403, United States 203 Reviews 5.0
Rudloff Custom Builders 896 Breezewood Lane, West Chester, Pennsylvania 19382, United States 111 Reviews 5.0
LAR Construction & Remodeling 6371 canby ave, Tarzana, California 91335, United States 191 Reviews 5.0
Erie Construction Mid West 4271 Monroe St., Toledo, Ohio 43606, United States 231 Reviews 4.8
Regal Construction & Remodeling Inc. 19537 ½ Ventura Blvd., Tarzana, California 91356, United States 96 Reviews 4.8
Mr. & Mrs. Construction & Remodeling 2570 N 1st street, ste 212, San Jose, California 95131, United States 75 Reviews 5.0
Bailey Remodeling and Construction LLC 201 Meridian Ave., Suite 201, Louisville, Kentucky 40207, United States 106 Reviews 5.0
https://www.houzz.com/trk/aHR0cDovL3d3dy5iYWlsZXlyZW1vZGVsLmNvbQ/2f005891e940e2c01021b57733580fa3/ue/NDU3NDcxNQ/a3be682e415d6c23590401e416ee1018
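The output above lists 15 contractors but only one website URL, which suggests the lists passed to pd.DataFrame do not all have the same length. A quick way to confirm this is to compare the lengths before building the DataFrame. The sketch below uses small hypothetical stand-in lists (the names and URL are made up), not the scraped data:

```python
# Hypothetical stand-ins for the lists built in the question (n, a, w, ...).
columns = {
    "name": ["A Remodeling", "B Builders", "C Construction"],
    "address": ["Address 1", "Address 2", "Address 3"],
    "web": ["https://example.com/a"],  # only one website was found
}

# Length of every column; pd.DataFrame needs these to be identical
# when it is given a dict of lists.
lengths = {key: len(values) for key, values in columns.items()}
print(lengths)

# True when at least one column differs in length from the others.
mismatched = len(set(lengths.values())) > 1
print(mismatched)
```

If the lengths differ, the column that is too short or too long points to the loop that skipped or added entries.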
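Solution
Keep it as simple as possible: instead of storing information from different loops in separate lists, collect it in a single list of dicts, one dict per contractor.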
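To illustrate why this helps, here is a minimal sketch with hypothetical sample data (the names and URLs are invented for the example). A dict of lists with unequal lengths raises the ValueError, while a list of per-record dicts always yields one row per record, with missing values stored explicitly:

```python
import pandas as pd

# Hypothetical sample data: three contractors, but only one website found.
names = ["A Remodeling", "B Builders", "C Construction"]
webs = ["https://example.com/a"]

# A dict of lists requires every list to have the same length.
try:
    pd.DataFrame({"name": names, "web": webs})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# One dict per record keeps each contractor's fields together;
# a value that was not found is stored as None.
records = [
    {"name": "A Remodeling", "web": "https://example.com/a"},
    {"name": "B Builders", "web": None},
    {"name": "C Construction", "web": None},
]
df = pd.DataFrame(records)
print(df.shape)  # (3, 2)
```

The exact wording of the error message can differ between pandas versions, so the sketch only checks that a ValueError is raised.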
Possible solution
import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}

r = requests.get('https://www.houzz.com/professionals/general-contractor')
soup = BeautifulSoup(r.content, 'html.parser')
tra = soup.find_all('div', class_='hz-pro-search-result__info')

# One dict per contractor, so all fields of a record stay together
data = []
for pro in tra:
    name = pro.find('span', class_='mlm header-5 text-unbold').text
    address = pro.find('span', class_='hz-pro-search-result__location-info__text').text
    reviews = pro.find('span', class_='hz-star-rate__review-string').text
    rating = pro.find('span', class_='hz-star-rate__rating-number').text
    # Start with the profile link; it is replaced by the website link below
    w = pro.find('a')['href']
    data.append({'name': name, 'address': address, 'reviews': reviews, 'rating': rating, 'web': w})

# Visit each profile page and overwrite 'web' with the website link, if one is found
for idx, item in enumerate(data):
    r = requests.get(item['web'], headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    for web in soup.find_all('a', attrs={'class': 'sc-62xgu6-0 jxCcwv mwxddt-0 bSdLOV hui-link trackMe'}):
        data[idx]['web'] = web['href']

df = pd.DataFrame(data)
print(df)
Output
name address reviews rating web
0 Capital Remodeling Hanover, Maryland 21076, United States 409 Reviews 4.8 https://www.houzz.com/trk/aHR0cDovL3d3dy5jYXBp...
1 SOD Home Group 367 Santana Heights, Unit #3-3021, San Jose, C... 238 Reviews 5.0 https://www.houzz.com/trk/aHR0cHM6Ly9zb2RoZy5j...
2 Innovative Construction Inc. 3040 Amwiler Rd, Suite B, Peachtree Corners, G... 100 Reviews 5.0 https://www.houzz.com/trk/aHR0cHM6Ly9pbm5vdmF0...
3 Baron Construction & Remodeling Co. Saratoga & Los Angeles, California 95070, Unit... 69 Reviews 4.8 https://www.houzz.com/trk/aHR0cDovL3d3dy5iYXJv...
4 Luxe Remodel 329 N. Wetherly Dr., Suite 205, Los Angeles, C... 79 Reviews 4.9 https://www.houzz.com/professionals/general-co...
5 California Home Builders & Remodeling Inc. STUDIO CITY, California 91604, United States 232 Reviews 5.0 https://www.houzz.com/trk/aHR0cDovL3d3dy5teWNh...
6 Sneller Custom Homes and Remodeling, LLC 17018 Seven Pines Dr Ste 100, Spring, Texas 77... 77 Reviews 4.9 https://www.houzz.com/trk/aHR0cDovL3NuZWxsZXJj...
7 123 Remodeling Inc. 5070 N. Kimberly Ave Suite C, Chicago, Illinoi... 83 Reviews 4.7 https://www.houzz.com/trk/aHR0cHM6Ly8xMjNyZW1v...
8 Professional builders & Remodeling, Inc 15335 Morrison St #325, Sherman Oaks, Californ... 203 Reviews 5.0 https://www.houzz.com/trk/aHR0cDovL3d3dy5wcm9m...
9 Rudloff Custom Builders 896 Breezewood Lane, West Chester, Pennsylvani... 111 Reviews 5.0 https://www.houzz.com/trk/aHR0cDovL1J1ZGxvZmZj...
10 LAR Construction & Remodeling 6371 canby ave, Tarzana, California 91335, Uni... 191 Reviews 5.0 https://www.houzz.com/trk/aHR0cDovL3d3dy5sYXJy...
11 Erie Construction Mid West 4271 Monroe St., Toledo, Ohio 43606, United St... 231 Reviews 4.8 https://www.houzz.com/trk/aHR0cDovL3d3dy5lcmll...
12 Regal Construction & Remodeling Inc. 19537 ½ Ventura Blvd., Tarzana, California 913... 96 Reviews 4.8 https://www.houzz.com/trk/aHR0cDovL3JlZ2FscmVu...
13 Mr. & Mrs. Construction & Remodeling 2570 N 1st street, ste 212, San Jose, Californ... 75 Reviews 5.0 https://www.houzz.com/trk/aHR0cDovL3d3dy5NcmFu...
14 Bailey Remodeling and Construction LLC 201 Meridian Ave., Suite 201, Louisville, Kent... 106 Reviews 5.0 https://www.houzz.com/trk/aHR0cDovL3d3dy5iYWls...
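In the output above, row 4 still shows a houzz.com/professionals/... link in the web column, which suggests that no website link was matched on that profile page and the profile URL was left in place. One possible refinement, sketched below under assumptions, is to keep the profile link and the website in separate keys so that a missing website shows up as None. The helper find_website, the 'profile' key, and the inline HTML strings are hypothetical and stand in for the downloaded pages; only the CSS class string comes from the code above.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Class string taken from the solution code; it may change when the site is updated.
WEBSITE_CLASS = "sc-62xgu6-0 jxCcwv mwxddt-0 bSdLOV hui-link trackMe"


def find_website(profile_html):
    """Return the href of the first matching website link, or None if there is none."""
    soup = BeautifulSoup(profile_html, "html.parser")
    link = soup.find("a", class_=WEBSITE_CLASS, href=True)
    return link["href"] if link is not None else None


# Hypothetical stand-ins for the HTML of two downloaded profile pages.
pages = {
    "https://example.com/profile/a": (
        '<a class="sc-62xgu6-0 jxCcwv mwxddt-0 bSdLOV hui-link trackMe" '
        'href="https://example.com/site-a">Website</a>'
    ),
    "https://example.com/profile/b": "<p>No website listed</p>",
}

data = [
    {"name": "A Remodeling", "profile": "https://example.com/profile/a"},
    {"name": "B Builders", "profile": "https://example.com/profile/b"},
]

# Each record keeps its profile link and gets a separate 'web' value.
for item in data:
    item["web"] = find_website(pages[item["profile"]])

df = pd.DataFrame(data)
print(df)
```

In the real scraper, the lookup in pages would be replaced by the requests.get call on the profile URL; the rest of the loop stays the same.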