How to convert the values in a list of strings to a Pandas DataFrame

Problem description

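I want to convert this list of strings into a Pandas DataFrame with the columns "Open", "High", "Low", "Close", "PeriodVolume", and "OpenInterest", and with "Datetime" as the index. How can I extract the values and create the DataFrame? Thanks for your help!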

['RequestId: , Datetime: 5/28/2020 12:00:00 AM, High: 323.44, Low: 315.63, Open: 316.77, Close: 318.25, PeriodVolume: 33449103, OpenInterest: 0',
 'RequestId: , Datetime: 5/27/2020 12:00:00 AM, High: 318.71, Low: 313.09, Open: 316.14, Close: 318.11, PeriodVolume: 28236274, OpenInterest: 0',
 'RequestId: , Datetime: 5/26/2020 12:00:00 AM, High: 324.24, Low: 316.5, Open: 323.5, Close: 316.73, PeriodVolume: 31380454, OpenInterest: 0',
 'RequestId: , Datetime: 5/22/2020 12:00:00 AM, High: 319.23, Low: 315.35, Open: 315.77, Close: 318.89, PeriodVolume: 20450754, OpenInterest: 0']

Tags: python, pandas, string

Solution


You can use split() and a couple of for loops to collect the data in a dictionary, then pass that dictionary to a DataFrame.

import pandas as pd

# First create the list containing your entries.
entries = [
    'RequestId: , Datetime: 5/28/2020 12:00:00 AM, High: 323.44, Low: 315.63,'
    ' Open: 316.77, Close: 318.25, PeriodVolume: 33449103, OpenInterest: 0',
    'RequestId: , Datetime: 5/27/2020 12:00:00 AM, High: 318.71, Low: 313.09,'
    ' Open: 316.14, Close: 318.11, PeriodVolume: 28236274, OpenInterest: 0',
    'RequestId: , Datetime: 5/26/2020 12:00:00 AM, High: 324.24, Low: 316.5,'
    ' Open: 323.5, Close: 316.73, PeriodVolume: 31380454, OpenInterest: 0',
    'RequestId: , Datetime: 5/22/2020 12:00:00 AM, High: 319.23, Low: 315.35,'
    ' Open: 315.77, Close: 318.89, PeriodVolume: 20450754, OpenInterest: 0'
]

# Next create a dictionary in which we will store the data after processing.
data = {
    'Datetime': [], 'Open': [], 'High': [], 'Low': [],
    'Close': [], 'PeriodVolume': [], 'OpenInterest': []
}
# Now split your entries by ','
split_entries = [entry.split(',') for entry in entries]

# Loop over the list
for entry in split_entries:
    # and loop over each of the inner lists
    for ent in entry:
        # Split once on ': ' to separate the key from the value.
        # strip() removes the space that precedes every column name
        # except the first; a fixed [1:] slice would also cut the
        # first character off a key that has no leading space.
        key, _, value = ent.partition(': ')
        key = key.strip()

        # If the key is one of the columns in the dictionary we
        # created earlier, append the value to that column's list.
        if key in data:
            data[key].append(value.strip())

# Now we can pass the data into pandas' DataFrame class.
# Note that every value is still a string at this point.
dataframe = pd.DataFrame(data)

# Then call one more method to set the index
dataframe = dataframe.set_index('Datetime')
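
The DataFrame built above holds every value as a string, including the index. If you need numeric columns and a real datetime index, a conversion step along the following lines may help. This is only a sketch: the two-row sample frame is a stand-in for the result above, and both the datetime format string and the choice of float and int64 dtypes are assumptions based on the sample rows in the question.

```python
import pandas as pd

# Stand-in sample with the same shape as the DataFrame built above:
# every value, including the index, is still a string.
dataframe = pd.DataFrame(
    {
        'Datetime': ['5/28/2020 12:00:00 AM', '5/27/2020 12:00:00 AM'],
        'Open': ['316.77', '316.14'],
        'High': ['323.44', '318.71'],
        'Low': ['315.63', '313.09'],
        'Close': ['318.25', '318.11'],
        'PeriodVolume': ['33449103', '28236274'],
        'OpenInterest': ['0', '0'],
    }
).set_index('Datetime')

# Parse the index into timestamps. The format string is an assumption
# inferred from the sample rows (month/day/year, 12-hour clock).
dataframe.index = pd.to_datetime(
    dataframe.index, format='%m/%d/%Y %I:%M:%S %p'
)

# Convert prices to floats and the count columns to integers.
# The dtype choices are assumptions; adjust them to your data.
dataframe = dataframe.astype({
    'Open': float, 'High': float, 'Low': float, 'Close': float,
    'PeriodVolume': 'int64', 'OpenInterest': 'int64',
})
```

After this step, arithmetic and date-based operations such as sorting by the index work on real numbers and timestamps instead of text.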
