Convert Text File into Pandas Dataframe

Problem description

I want to create a DataFrame from a text file. I scraped some data from a website and wrote it into a .txt file. There are 10 'columns', as shown in the first 10 lines of the text file. Can anyone help me with separating the lines into their respective columns in a pandas DataFrame? Much appreciated!

The following is an example of the text file. I would like the first 10 lines to be the column names and the subsequent lines to be under the respective columns.

NFT Collection
Volume (ETH)
Market Cap (ETH)
Max price (ETH)
Avg price (ETH)
Min price (ETH)
% Opensea+Rarible
#Transactions
#Wallets
Contract date
Axies | Axie Infinity
4,884
480,695
5.24
.0563
.0024
0
86,807
2,389,981
189d ago
Sandbox's LANDs
578
112,989
6
1.11
.108
100%
394
12,879
700d ago

Tags: python, pandas, text

Solution


Update

Filling the DataFrame directly in the loop reads the file one line at a time, so the whole text file is never loaded at once. The trade-off is speed: growing a DataFrame row by row is slow for large files.

import pandas as pd

txt_file = "path/to/your/file"

COL_COUNT = 10

with open(txt_file, "r") as f:
    # The first COL_COUNT lines are the column names.
    col = [next(f).strip() for _ in range(COL_COUNT)]
    df = pd.DataFrame(columns=col)
    row = []
    for line in f:
        row.append(line.strip())
        if len(row) == COL_COUNT:  # a full record has been collected
            # DataFrame.append was removed in pandas 2.0; add the row by label instead.
            df.loc[len(df)] = row
            row = []

df.set_index(col[0], inplace=True)  # get rid of row index
print(df)

Output:

                      Volume (ETH) Market Cap (ETH) Max price (ETH) Avg price (ETH) Min price (ETH) % Opensea+Rarible #Transactions   #Wallets Contract date
NFT Collection
Axies | Axie Infinity        4,884          480,695            5.24           .0563           .0024                 0        86,807  2,389,981      189d ago
Sandbox's LANDs                578          112,989               6            1.11            .108              100%           394     12,879      700d ago
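
To try the idea without a file on disk, here is a minimal sketch that feeds the sample from the question through `io.StringIO` and groups the lines lazily with `itertools.islice`. The helper name `iter_rows` is hypothetical, and the sketch assumes a trailing incomplete record should simply be dropped. It reads the input lazily but still collects all rows in a list before building the frame.

```python
import io
from itertools import islice

import pandas as pd

COL_COUNT = 10

# Sample data copied from the question; io.StringIO stands in for the real file.
SAMPLE = """NFT Collection
Volume (ETH)
Market Cap (ETH)
Max price (ETH)
Avg price (ETH)
Min price (ETH)
% Opensea+Rarible
#Transactions
#Wallets
Contract date
Axies | Axie Infinity
4,884
480,695
5.24
.0563
.0024
0
86,807
2,389,981
189d ago
Sandbox's LANDs
578
112,989
6
1.11
.108
100%
394
12,879
700d ago
"""


def iter_rows(lines, n):
    """Yield lists of n stripped lines; a short trailing chunk is dropped."""
    it = (line.strip() for line in lines)
    while True:
        row = list(islice(it, n))
        if len(row) < n:
            return
        yield row


with io.StringIO(SAMPLE) as f:
    rows = iter_rows(f, COL_COUNT)
    col = next(rows)  # the first chunk holds the column names
    df = pd.DataFrame(list(rows), columns=col).set_index(col[0])

print(df.shape)  # (2, 9): two collections, nine columns plus the index
```

Every value is still a string at this point, exactly as in the output above.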

Update 2

Collecting the rows in a list and building the DataFrame once at the end is faster, but may use more memory with large files:

import pandas as pd

txt_file = "path/to/your/file"

COL_COUNT = 10

table = []
with open(txt_file, "r") as f:
    # The first COL_COUNT lines are the column names.
    col = [next(f).strip() for _ in range(COL_COUNT)]
    row = []
    for line in f:
        row.append(line.strip())
        if len(row) == COL_COUNT:  # a full record has been collected
            table.append(row)
            row = []

df = pd.DataFrame(table, columns=col)
df.set_index(col[0], inplace=True)  # get rid of row index
print(df)
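
If the lines are already in memory as a flat list, the same grouping can be written with list slicing. This is only a sketch: the function name `lines_to_frame` and the three-column demo data are made up for illustration, and it assumes an incomplete last record should be ignored.

```python
import pandas as pd

COL_COUNT = 10


def lines_to_frame(lines, n=COL_COUNT):
    """Build a DataFrame from a flat list: first n lines are headers, the rest rows of n."""
    cells = [line.strip() for line in lines]
    header, body = cells[:n], cells[n:]
    usable = len(body) - len(body) % n  # ignore an incomplete last row
    table = [body[i:i + n] for i in range(0, usable, n)]
    return pd.DataFrame(table, columns=header).set_index(header[0])


# Tiny made-up input with 3 columns to keep the example short.
demo = ["name", "volume", "wallets", "A", "1", "2", "B", "3", "4", "C"]
df = lines_to_frame(demo, n=3)
print(df)
```

With a real file, `lines` could be the result of `f.readlines()`, which is the point where this variant pays the memory cost mentioned above.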
