python - How do I insert the following Beautiful Soup scraped data into Excel?
Problem description
from bs4 import BeautifulSoup
import pandas as pd
import requests
import time
from datetime import datetime


def extract_source(url):
    agent = {"User-Agent": "Mozilla/5.0"}
    source = requests.get(url, headers=agent).text
    return source


html_text = extract_source('https://www.mpbio.com/us/life-sciences/biochemicals/amino-acids')
soup = BeautifulSoup(html_text, 'lxml')

for a in soup.find_all('a', class_='button button--link button--fluid catalog-list-item__actions-primary-button', href=True):
    # print ("Found the URL:", a['href'])
    urlof = a['href']
    html_text = extract_source(urlof)
    soup = BeautifulSoup(html_text, 'lxml')
    table_rows = soup.find_all('tr')
    first_columns = []
    third_columns = []
    for row in table_rows:
        # for row in table_rows[1:]:
        first_columns.append(row.findAll('td')[0])
        third_columns.append(row.findAll('td')[1])
    for first, third in zip(first_columns, third_columns):
        print(first.text, third.text)
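Basically, I am trying to scrape data from tables on multiple linked pages of a website. I want to insert that data into a single Excel/CSV file. Each page gives me data like this: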
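SKU                   07DE9922
Analyte/Target        Corticosterone
Base Catalog Number   DE9922
Diagnostic Platforms  EIA/ELISA
Diagnostic Solutions  Endocrinology
Disease Screened      Corticosterone
Evaluation            Quantitative
Pack Size             96 Wells
Sample Type           Plasma, Serum
Sample Volume         10 uL
Species Reactivity    Mouse, Rat
Usage Statement       For Research Use Only. Not for use in diagnostic procedures.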
and I want it in the following format in the Excel file:
SKU     Analyte/Target   Base Catalog Number   Pack Size   Sample Type
data    data             data                  data        data
data    data             data                  data        data
data    data             data                  data        data
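I am having trouble converting the data into the correct format.
Solution
I made a few small changes to your code. Instead of printing the data, I build one dictionary per product page (row label as key, row value as value) and append it to a list. I then create a DataFrame from that list: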
import pandas as pd
import requests
import time
from datetime import datetime
from bs4 import BeautifulSoup


def extract_source(url):
    agent = {"User-Agent": "Mozilla/5.0"}
    source = requests.get(url, headers=agent).text
    return source


html_text = extract_source(
    "https://www.mpbio.com/us/life-sciences/biochemicals/amino-acids"
)
soup = BeautifulSoup(html_text, "lxml")

data = []
for a in soup.find_all(
    "a",
    class_="button button--link button--fluid catalog-list-item__actions-primary-button",
    href=True,
):
    urlof = a["href"]
    html_text = extract_source(urlof)
    soup = BeautifulSoup(html_text, "lxml")
    table_rows = soup.find_all("tr")
    first_columns = []
    third_columns = []
    for row in table_rows:
        first_columns.append(row.findAll("td")[0])
        third_columns.append(row.findAll("td")[1])

    # create dictionary with values and add to the list
    d = {}
    for first, third in zip(first_columns, third_columns):
        d[first.get_text(strip=True)] = third.get_text(strip=True)
    data.append(d)

# one row per product page; labels missing on a page become NaN
df = pd.DataFrame(data)
print(df)
df.to_csv("data.csv", index=False)
Prints:
SKU Alternate Names Base Catalog Number CAS # EC Number Format Molecular Formula Molecular Weight Personal Protective Equipment Usage Statement Application Notes Beilstein Registry Number Optical Rotation Purity UV Visible Absorbance Hazard Statements RTECS Number Safety Symbol Auto Ignition Biochemical Physiological Actions Density Melting Point pH pKa Solubility Vapor Pressure Grade Boiling Point Isoelectric Point
0 02100078-CF 2-Acetamido-5-Guanidinovaleric acid 100078 210545-23-6 205-846-6 Powder C8H16N4O3· 2H2O 216.241 g/mol Eyeshields, Gloves, respirator filter Unless specified otherwise, MP Biomedical's pr... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 02100142-CF Acetyltryptophan; DL-α-Acetylamino-3-indolepro... 100142 87-32-1 201-739-3 Powder C13H14N2O3 246.266 g/mol Eyeshields, Gloves, respirator filter Unless specified otherwise, MP Biomedical's pr... N-acetyl-DL-tryptophan, is used as stabilizer ... 89478 0° ± 2° (c=1, 1N NaOH, 24 hrs.) ~99% λ max (water)=280 ± 2 nm NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 02100421-CF L-2,5-Diaminopentanoic acid; 2,5-Diaminopentan... 100421 3184-13-2 221-678-6 Powder C5H12N2O2 • HCl 168.621 g/mol Eyeshields, Gloves, respirator filter Unless specified otherwise, MP Biomedical's pr... NaN 3625847 NaN ~99% NaN H319 RM2985000 GHS07 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 02100520-CF Phosphocreatine Disodium Salt Tetrahydrate; So... 100520 922-32-7 213-074-6 Powder C4H8N3Na2O5P·4H2O 255.077 g/mol Eyeshields, Gloves, respirator filter Unless specified otherwise, MP Biomedical's pr... NaN NaN NaN ≥98% NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 02100769-CF Vitamin C; Ascorbate; Sodium ascorbate; L-Xylo... 100769 50-81-7 200-066-2 NaN C6H8O6 176.124 g/mol Eyeshields, Gloves, respirator filter Unless specified otherwise, MP Biomedical's pr... L-Ascorbic Acid is used as an Antimicrobial an... 84272 +18° to +32° (c=1, water) ≥98% NaN NaN CI7650000 NaN 1220° F (NTP, 1992) Ascorbic Acid, also known as Vitamin C, is a s... 1.65 (NTP, 1992) 374 to 378° F (decomposes) (NTP, 1992) Between 2,4 and 2,8 (2 % aqueous solution) pK1: 4.17; pK2: 11.57 greater than or equal to 100 mg/mL at 73° F (... 9.28X10-11 mm Hg at 25 deg C (est) NaN NaN NaN
5 02101003-CF Lycine; Oxyneurine; (Carboxymethyl)trimethylam... 101003 107-43-7 203-490-6 Powder C5H11NO2 117.148 g/mol Eyeshields, Gloves, respirator filter Unless specified otherwise, MP Biomedical's pr... Betaine is a reagent that is used in soldering... 3537113 NaN NaN NaN NaN DS5900000 NaN NaN End-product of oxidative metabolism of choline... NaN Decomposes around 293 deg C NaN 1.83 (Lit.) Solubility (g/100 g solvent): <a class="pubche... 1.36X10-8 mm Hg at 25 deg C (est) Anhydrous NaN NaN
6 02101806-CF (S)-2,5-Diamino-5-oxopentanoic acid; L-Glutami... 101806 56-85-9 200-292-1 NaN C5H10N2O3 146.146 g/mol Eyeshields, Gloves, respirator filter Unless specified otherwise, MP Biomedical's pr... L-glutamine is an essential amino acid, which ... 1723797 +30 ± 5° (c = 3.5, 1N HCl) ≥99% NaN NaN MA2275100 NaN NaN L-Glutamine is an essential amino acid that is... 1.364 g/cu cm 185.5 dec °C pH = 5-6 at 14.6 g/L at 25 deg C NaN Water Solubility41300 mg/L (at 25 °C) 1.9X10-8 mm Hg at 25 deg C (est) NaN NaN NaN
7 02102158-CF L-2-Amino-4-methylpentanoic acid; Leu; L; α-am... 102158 61-90-5 200-522-0 Powder C6H13NO2 131.175 g/mol Eyeshields, Gloves, respirator filter Unless specified otherwise, MP Biomedical's pr... Leucine has been used as a molecular marker in... 1721722 +14.5 ° to +16.5 ° (Lit.) NaN NaN NaN OH2850000 NaN NaN NaN 1.293 g/cu cm at 18 deg C 293 °C NaN 2.33 (-COOH), 9.74 (-NH2)(Lit.) Water Solubility21500 mg/L (at 25 °C) 5.52X10-9 mm Hg at 25 deg C (est) NaN Sublimes at 145-148 deg C. Decomposes at 293-2... 6.04(Lit.)
8 02102576-CF 4-Hydroxycinnamic acid; 3-(4-Hydroxphenyl)-2-p... 102576 7400-08-0 231-000-0 Powder C9H8O3 164.16 g/mol Dust mask, Eyeshields, Gloves Unless specified otherwise, MP Biomedical's pr... p-Coumaric acid was used as a substrate to stu... NaN NaN ≥98% NaN NaN GD9094000 NaN NaN NaN NaN 211.5 °C NaN NaN NaN NaN NaN NaN NaN
9 02102868-CF DL-2-Amino-3-hydroxypropionic acid; (±)-2-Amin... 102868 302-84-1 206-130-6 Powder C3H7NO3 105.093 g/mol Eyeshields, Gloves, respirator filter Unless specified otherwise, MP Biomedical's pr... NaN 1721405 -1° to + 1° (c = 5, 1N HCl) ≥98% NaN NaN NaN NaN NaN NMDA agonist acting at the glycine site; precu... 1.6 g/cu cm @ 22 deg C 228 deg C (decomposes) NaN NaN SOL IN <a class="pubchem-internal-link CID-962... NaN NaN NaN NaN
...and so on.
and saves data.csv
[Screenshot from LibreOffice: data.csv]
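The question asks for only a handful of columns (SKU, Pack Size, and so on), while the DataFrame above contains every label found on any page. One possible way to narrow it down is sketched below. It is a minimal illustration, not part of the original answer: the `records` list is made-up placeholder data standing in for the scraped dictionaries, and the `wanted` column list is an assumption about which labels you care about.

```python
import io

import pandas as pd

# Hypothetical records shaped like the per-page dictionaries built above;
# the values are placeholders, not scraped data.
records = [
    {"SKU": "A-001", "Format": "Powder", "Purity": "~99%"},
    {"SKU": "A-002", "Format": "Powder"},    # this page had no "Purity" row
    {"SKU": "A-003", "Grade": "Anhydrous"},  # this page had different rows
]

# The union of all keys becomes the columns; missing values become NaN.
df = pd.DataFrame(records)

# Keep only the wanted columns, in a fixed order. reindex() also creates a
# column filled with NaN when no page supplied that label ("Pack Size" here).
wanted = ["SKU", "Format", "Purity", "Pack Size"]
subset = df.reindex(columns=wanted)

# Write to an in-memory buffer so the sketch leaves no file behind;
# pass a filename such as "data.csv" instead to write to disk.
buffer = io.StringIO()
subset.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

NaN cells are written as empty fields in the CSV. If a real .xlsx file is preferred over CSV, `DataFrame.to_excel` can be used in the same place, but it needs an Excel writer engine such as openpyxl to be installed.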