How to load one long line of delimited data into a pandas DataFrame

Problem description

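I'm new to pandas and struggling to get some data into a DataFrame. The data (which actually comes from a URL) is one long line, with commas as column separators and semicolons as row separators. Can someone help me with the code? This is what I see in the browser: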

20210301,12:25,987,0.359,NaN,408,0.148,NaN,NaN,NaN,NaN;20210301,12:30,1022,0.372,NaN,420,0.153,NaN,NaN,NaN,NaN;20210301,12:35,1057,0.384,NaN,420,0.153,NaN,NaN,NaN,NaN;20210301,12:40,1089,0.396,NaN,384,0.140,NaN,NaN,NaN,NaN;20210301,12:45,1147,0.417,NaN,696,0.253,NaN,NaN,NaN,NaN;20210301,12:50,1200,0.436,NaN,636,0.231,NaN,NaN,NaN,NaN;20210301,12:55,1259,0.458,NaN,708,0.257,NaN,NaN,NaN,NaN;20210301,13:00,1332,0.484,NaN,876,0.319,NaN,NaN,NaN,NaN;20210301,13:05,1401,0.509,NaN,828,0.301,NaN,NaN,NaN,NaN;20210301,13:10,1449,0.527,NaN,576,0.209,NaN,NaN,NaN,NaN;20210301,13:15,1487,0.541,NaN,456,0.166,NaN,NaN,NaN,NaN;20210301,13:20,1534,0.558,NaN,564,0.205,NaN,NaN,NaN,NaN;20210301,13:25,1583,0.576,NaN,588,0.214,NaN,NaN,NaN,NaN;20210301,13:30,1643,0.597,NaN,720,0.262,NaN,NaN,NaN,NaN;20210301,13:35,1701,0.619,NaN,696,0.253,NaN,NaN,NaN,NaN;20210301,13:40,1756,0.639,NaN,660,0.240,NaN,NaN,NaN,NaN;20210301,13:45,1827,0.664,NaN,852,0.310,NaN,NaN,NaN,NaN;20210301,13:50,1967,0.715,NaN,1680,0.611,NaN,NaN,NaN,NaN;20210301,13:55,2099,0.763,NaN,1584,0.576,NaN,NaN,NaN,NaN;20210301,14:00,2248,0.817,NaN,1788,0.650,NaN,NaN,NaN,NaN;20210301,14:05,2388,0.868,NaN,1680,0.611,NaN,NaN,NaN,NaN

This is what I have so far:

import pandas as pd
import urllib.request

link = "https://xxxxxxxxx"
f = urllib.request.urlopen(link)
mydata = f.read()
print(mydata)

Output:

b'20210301,12:25,987,0.359,NaN,408,0.148,NaN,NaN,NaN,NaN;20210301,12:30,1022,0.372,NaN,420,0.153,NaN,NaN,NaN,NaN;20210301,12:35,1057,0.384,NaN,420,0.153,NaN,NaN,NaN,NaN;20210301,12:40,1089,0.396,NaN,384,0.140,NaN,NaN,NaN,NaN;20210301'

I also have to add the headers myself. I don't know how to get pandas to accept the data in a form like this:

| Date     | Time  | Value1 | Value2 | Value3 | Value4
| -------- | ----- | ------ | ------ | ------ | ------
| 20210301 | 12:25 | 987    | 0.359  | NaN    | ...
| 20210301 | 12:30 | 1022   | 0.372  | NaN    | ...
| ...      | ...   | ...    | ...    | ...    | ...

I don't know which pandas method to use or how to use it. Any help is much appreciated!

Tags: python, pandas, dataframe

Solution


The magic word you are looking for is read_csv (see its entry in the pandas documentation).

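The options your use case needs are not obvious, and the only foolproof way to find them is to read the documentation page, but this is what you want: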

import io

# Semicolons end each record; keep only the first six comma-separated fields.
df = pd.read_csv(io.BytesIO(mydata), lineterminator=';', sep=',',
                 header=None, names=['Date', 'Time', 'Value1', 'Value2',
                                     'Value3', 'Value4'],
                 usecols=range(6))
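
To try the call without hitting the URL, here is a minimal self-contained sketch. It assumes the response bytes look like the first three records shown in the question, which are hard-coded in place of f.read(). The extra Timestamp column is not part of the original answer; it is only an illustration of how the Date and Time columns might be combined afterwards.

```python
import io

import pandas as pd

# Sample bytes copied from the question; in practice this would be the
# result of f.read() on the URL response.
mydata = (
    b"20210301,12:25,987,0.359,NaN,408,0.148,NaN,NaN,NaN,NaN;"
    b"20210301,12:30,1022,0.372,NaN,420,0.153,NaN,NaN,NaN,NaN;"
    b"20210301,12:35,1057,0.384,NaN,420,0.153,NaN,NaN,NaN,NaN"
)

names = ["Date", "Time", "Value1", "Value2", "Value3", "Value4"]

df = pd.read_csv(
    io.BytesIO(mydata),   # wrap the raw bytes in a file-like object
    lineterminator=";",   # records are separated by semicolons
    sep=",",              # fields are separated by commas
    header=None,          # the data carries no header row
    names=names,          # supply the column names ourselves
    usecols=range(6),     # keep only the first six of the eleven fields
)

# Optional (an assumption, not in the original answer): build a single
# datetime column from the integer Date and the string Time.
df["Timestamp"] = pd.to_datetime(
    df["Date"].astype(str) + " " + df["Time"], format="%Y%m%d %H:%M"
)

print(df)
```

If all eleven fields are needed, dropping usecols and passing eleven names should work the same way; the six names here simply follow the table sketched in the question.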
