python - Convert Text File into Pandas Dataframe
Problem description
I want to create a dataframe from a text file. I scraped some data from a website and wrote it into a .txt file. There are 10 'columns', as shown in the first 10 lines of the text file. Can anyone help me with separating the lines into the respective columns in a pandas dataframe format? Much appreciated!
The following is an example of the text file. I would like the first 10 lines to be the column names and the subsequent lines to be under the respective columns.
NFT Collection
Volume (ETH)
Market Cap (ETH)
Max price (ETH)
Avg price (ETH)
Min price (ETH)
% Opensea+Rarible
#Transactions
#Wallets
Contract date
Axies | Axie Infinity
4,884
480,695
5.24
.0563
.0024
0
86,807
2,389,981
189d ago
Sandbox's LANDs
578
112,989
6
1.11
.108
100%
394
12,879
700d ago
Solution
Update
Filling your DataFrame directly in the loop should keep memory use low, because the text file is read one line at a time instead of being loaded all at once:
import pandas as pd

txt_file = "path/to/your/file"
COL_COUNT = 10
with open(txt_file, "r") as f:
    # The first COL_COUNT lines are the column names
    col = [next(f).strip() for i in range(COL_COUNT)]
    df = pd.DataFrame(columns=col)
    i = COL_COUNT
    while line := f.readline():  # walrus operator, Python 3.8+
        if i % COL_COUNT == 0:
            row = []  # start a new record
        row.append(line.strip())
        if i % COL_COUNT == COL_COUNT - 1:
            # Record complete; DataFrame.append was removed in pandas 2.0,
            # so add the row by label instead
            df.loc[len(df)] = row
        i += 1
df.set_index(col[0], inplace=True)  # get rid of row index
print(df)
Output:
                       Volume (ETH)  Market Cap (ETH)  Max price (ETH)  Avg price (ETH)  Min price (ETH)  % Opensea+Rarible  #Transactions   #Wallets  Contract date
NFT Collection
Axies | Axie Infinity         4,884           480,695             5.24            .0563            .0024                  0         86,807  2,389,981       189d ago
Sandbox's LANDs                 578           112,989                6             1.11             .108               100%            394     12,879       700d ago
Update 2
Collecting the rows in a list and building the DataFrame once at the end is still faster, but it might claim more memory with big files:
import pandas as pd

txt_file = "path/to/your/file"
COL_COUNT = 10
table = []
with open(txt_file, "r") as f:
    # The first COL_COUNT lines are the column names
    col = [next(f).strip() for i in range(COL_COUNT)]
    i = COL_COUNT
    while line := f.readline():  # walrus operator, Python 3.8+
        if i % COL_COUNT == 0:
            row = []  # start a new record
        row.append(line.strip())
        if i % COL_COUNT == COL_COUNT - 1:
            table.append(row)  # record complete
        i += 1
# Build the DataFrame once, from the full list of rows
df = pd.DataFrame(table, columns=col)
df.set_index(col[0], inplace=True)  # get rid of row index
print(df)
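For comparison, here is a minimal sketch of the same list approach that groups the lines with itertools.islice instead of a manual counter. It is an alternative, not part of the answer above: the helper name read_records is hypothetical, and an io.StringIO object holding the sample data from the question stands in for the real file so the snippet runs on its own. All values stay strings, as in the code above.

```python
import io
from itertools import islice

import pandas as pd

COL_COUNT = 10  # number of header lines, as in the question

# Stand-in for open(txt_file): the sample data from the question.
sample = io.StringIO("""NFT Collection
Volume (ETH)
Market Cap (ETH)
Max price (ETH)
Avg price (ETH)
Min price (ETH)
% Opensea+Rarible
#Transactions
#Wallets
Contract date
Axies | Axie Infinity
4,884
480,695
5.24
.0563
.0024
0
86,807
2,389,981
189d ago
Sandbox's LANDs
578
112,989
6
1.11
.108
100%
394
12,879
700d ago
""")


def read_records(f, n):
    """Yield lists of n stripped lines; a trailing partial group is dropped."""
    while True:
        chunk = [line.strip() for line in islice(f, n)]
        if len(chunk) < n:
            return
        yield chunk


records = read_records(sample, COL_COUNT)
col = next(records)  # the first group of lines holds the column names
df = pd.DataFrame(list(records), columns=col).set_index(col[0])
print(df.shape)
```

With a real file, replace sample with the handle from open(txt_file, "r"). Dropping an incomplete final group is a design choice made for this sketch; the loops above also skip a record that never reaches its last line.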