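python - How to put one long line of delimited data into a pandas DataFrame
Problem description
I am struggling to get some data into a pandas DataFrame (I am a newbie). The data (which actually comes from a URL) is one long line, with commas as column separators and semicolons as row separators. Could someone help me with the code? This is what I see in the browser: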
20210301,12:25,987,0.359,NaN,408,0.148,NaN,NaN,NaN,NaN;20210301,12:30,1022,0.372,NaN,420,0.153,NaN,NaN,NaN,NaN;20210301,12:35,1057,0.384,NaN,420,0.153,NaN,NaN,NaN,NaN;20210301,12:40,1089,0.396,NaN,384,0.140,NaN,NaN,NaN,NaN;20210301,12:45,1147,0.417,NaN,696,0.253,NaN,NaN,NaN,NaN;20210301,12:50,1200,0.436,NaN,636,0.231,NaN,NaN,NaN,NaN;20210301,12:55,1259,0.458,NaN,708,0.257,NaN,NaN,NaN,NaN;20210301,13:00,1332,0.484,NaN,876,0.319,NaN,NaN,NaN,NaN;20210301,13:05,1401,0.509,NaN,828,0.301,NaN,NaN,NaN,NaN;20210301,13:10,1449,0.527,NaN,576,0.209,NaN,NaN,NaN,NaN;20210301,13:15,1487,0.541,NaN,456,0.166,NaN,NaN,NaN,NaN;20210301,13:20,1534,0.558,NaN,564,0.205,NaN,NaN,NaN,NaN;20210301,13:25,1583,0.576,NaN,588,0.214,NaN,NaN,NaN,NaN;20210301,13:30,1643,0.597,NaN,720,0.262,NaN,NaN,NaN,NaN;20210301,13:35,1701,0.619,NaN,696,0.253,NaN,NaN,NaN,NaN;20210301,13:40,1756,0.639,NaN,660,0.240,NaN,NaN,NaN,NaN;20210301,13:45,1827,0.664,NaN,852,0.310,NaN,NaN,NaN,NaN;20210301,13:50,1967,0.715,NaN,1680,0.611,NaN,NaN,NaN,NaN;20210301,13:55,2099,0.763,NaN,1584,0.576,NaN,NaN,NaN,NaN;20210301,14:00,2248,0.817,NaN,1788,0.650,NaN,NaN,NaN,NaN;20210301,14:05,2388,0.868,NaN,1680,0.611,NaN,NaN,NaN,NaN
This is what I have so far:
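import pandas as pd
import urllib.request  # import the submodule explicitly; a bare "import urllib" does not guarantee it is loaded

link = "https://xxxxxxxxx"
f = urllib.request.urlopen(link)
mydata = f.read()  # raw response body as bytes
print(mydata)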
Output:
b'20210301,12:25,987,0.359,NaN,408,0.148,NaN,NaN,NaN,NaN;20210301,12:30,1022,0.372,NaN,420,0.153,NaN,NaN,NaN,NaN;20210301,12:35,1057,0.384,NaN,420,0.153,NaN,NaN,NaN,NaN;20210301,12:40,1089,0.396,NaN,384,0.140,NaN,NaN,NaN,NaN;20210301'
I also have to add the headers myself. I don't know how to get pandas to accept the data like this:
| Date | Time | Value1 | Value2 | Value3 | Value4
| -------- | ----- | ------ | ------ | ------ | ------
| 20210301 | 12:25 | 987 | 0.359 | NaN | ...
| 20210301 | 12:30 | 1022 | 0.372 | NaN | ...
| ... | ... | ... | ... | ... | ...
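I don't know which pandas method to use or how to use it. Any help is much appreciated!
Solution
The magic word you are after is read_csv (see its entry in the pandas documentation).
The options for your use case are not trivial, and the only foolproof way to find them is to check the documentation page, but this is what you want: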
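import io  # needed to wrap the bytes in a file-like object

# ';' ends each record, ',' separates fields; keep only the first six columns
df = pd.read_csv(io.BytesIO(mydata), lineterminator=';', sep=',',
                 header=None, names=['Date', 'Time', 'Value1', 'Value2',
                                     'Value3', 'Value4'],
                 usecols=range(6))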
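To try this without hitting the URL, here is a minimal self-contained sketch that feeds the first three records from the question to the same read_csv call. The variable names raw and columns, and the extra Timestamp column built with pd.to_datetime, are illustrative assumptions and not part of the original answer.

```python
import io

import pandas as pd

# Sample payload copied from the question (first three records only).
raw = (
    b"20210301,12:25,987,0.359,NaN,408,0.148,NaN,NaN,NaN,NaN;"
    b"20210301,12:30,1022,0.372,NaN,420,0.153,NaN,NaN,NaN,NaN;"
    b"20210301,12:35,1057,0.384,NaN,420,0.153,NaN,NaN,NaN,NaN"
)

# Column names from the question; only the first six of the eleven fields are named.
columns = ["Date", "Time", "Value1", "Value2", "Value3", "Value4"]

df = pd.read_csv(
    io.BytesIO(raw),       # wrap the bytes so read_csv can treat them as a file
    lineterminator=";",    # each record ends with a semicolon
    sep=",",               # fields are comma-separated
    header=None,           # the data carries no header row
    names=columns,
    usecols=range(6),      # ignore the trailing five columns
)

# Optional extra (an assumption, not in the original answer):
# combine Date and Time into a single datetime column.
df["Timestamp"] = pd.to_datetime(
    df["Date"].astype(str) + " " + df["Time"], format="%Y%m%d %H:%M"
)

print(df)
```

If all eleven columns are needed, one option is to drop usecols and pass eleven names instead; the names for the remaining five columns would have to come from whoever publishes the data.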