python - Python + Pandas: Copy specific columns of the first row from multiple CSVs and store those rows in a single CSV
Problem description
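I have about 190 CSVs. Each has the same column names; a sample is printed further down.
From each CSV, I only need to pick the Item, Predicted_BelRd(D2), Predicted_Ulsoor(D2), Predicted_ChrchStrt(D2), Predicted_BlrClub(D2), Predicted_Indrangr(D1), Predicted_Krmngl(D1), Predicted_KrmnglBkry(D1) and Predicted_HSR(D1) columns of the first row, and I need to store all of these rows in a separate CSV. So the final CSV should have 190 rows.
How can I do that?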
Edit: Code so far, as suggested by DavidDR:
import glob
import pandas as pd

path = '/home/hp/products1'
all_files = glob.glob(path + "/*.csv")
#print(all_files)
columns = ['Item', 'Predicted_BelRd(D2)', 'Predicted_Ulsoor(D2)', 'Predicted_ChrchStrt(D2)', 'Predicted_BlrClub(D2)', 'Predicted_Indrangr(D1)', 'Predicted_Krmngl(D1)', 'Predicted_KrmnglBkry(D1)', 'Predicted_HSR(D1)']
rows_list = []
for filename in all_files:
    origin_data = pd.read_csv(filename)
    my_data = origin_data[columns]
    rows_list.append(my_data.head(1))
output = pd.DataFrame(rows_list)
#output.to_csv(file_name, sep='\t', encoding='utf-8')
output.to_csv('smallys_final.csv', encoding='utf-8', index=False)
Edit2: Original dataframe:
prod = pd.read_csv('/home/hp/products1/' + 'prod[' + str(0) + '].csv', engine='python')
print(prod)
Output:
      Category                         Item  UOM  BelRd(D2)  Ulsoor(D2)  \
0  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0
1  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0
2  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0
3  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0
4  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0

   ChrchStrt(D2)  BlrClub(D2)  Indrangr(D1)  Krmngl(D1)  KrmnglBkry(D1)  \
0              0            0             0           0               1
1              0            0             0           0               0
2              0            0             0           0               0
3              0            0             0           0               0
4              0            0             0           0               1

   HSR(D1)         date  Predicted_BelRd(D2)  Predicted_Ulsoor(D2)  \
0        0    10 FEB 19                  0.0                   0.0
1        0    17 FEB 19                  NaN                   NaN
2        0    24 FEB 19                  NaN                   NaN
3        0   4 MARCH 19                  NaN                   NaN
4        0  11 MARCH 19                  NaN                   NaN

   Predicted_ChrchStrt(D2)  Predicted_BlrClub(D2)  Predicted_Indrangr(D1)  \
0                      0.0                    0.0                     0.0
1                      NaN                    NaN                     NaN
2                      NaN                    NaN                     NaN
3                      NaN                    NaN                     NaN
4                      NaN                    NaN                     NaN

   Predicted_Krmngl(D1)  Predicted_KrmnglBkry(D1)  Predicted_HSR(D1)
0                   0.0                       0.0                0.0
1                   NaN                       NaN                NaN
2                   NaN                       NaN                NaN
3                   NaN                       NaN                NaN
4                   NaN                       NaN                NaN
Solution
Here you go:
import pandas as pd

def function():
    firstrows = []  # to collect 190 dataframes, each only 1 row
    for filename in csvnames:  # csvnames: the list of CSV paths, e.g. all_files from glob
        # read CSV, filter for a subset of columns, take only first row
        df = pd.read_csv(filename) \
            .filter(["Item", "Predicted_BelRd(D2)", ...]) \
            .iloc[:1]
        firstrows.append(df)
    return pd.concat(firstrows)
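For completeness, here is a minimal end-to-end sketch of the same idea, combining the loop above with the glob pattern, column list and output file name from the question. The function name collect_first_rows and the COLUMNS constant are hypothetical names introduced only for illustration. As a variation on the answer (not what it does), the sketch passes usecols and nrows=1 to pd.read_csv so that only the needed columns and the first data row of each file are parsed; this assumes every file really contains all nine columns, otherwise read_csv raises an error.

```python
import glob
import os

import pandas as pd

# Column names copied from the question.
COLUMNS = [
    "Item",
    "Predicted_BelRd(D2)",
    "Predicted_Ulsoor(D2)",
    "Predicted_ChrchStrt(D2)",
    "Predicted_BlrClub(D2)",
    "Predicted_Indrangr(D1)",
    "Predicted_Krmngl(D1)",
    "Predicted_KrmnglBkry(D1)",
    "Predicted_HSR(D1)",
]


def collect_first_rows(folder, columns=COLUMNS):
    """Return one DataFrame holding the first data row of every CSV in folder.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    first_rows = []
    # sorted() only makes the row order reproducible between runs
    for filename in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # usecols limits parsing to the wanted columns, nrows=1 to the first row
        df = pd.read_csv(filename, usecols=columns, nrows=1)
        first_rows.append(df[columns])  # keep the column order consistent
    if not first_rows:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame(columns=columns)
    # ignore_index gives the result a clean 0..n-1 index
    return pd.concat(first_rows, ignore_index=True)


# Usage with the path and file name from the question:
# output = collect_first_rows("/home/hp/products1")
# output.to_csv("smallys_final.csv", encoding="utf-8", index=False)
```

With 190 input files the returned frame should have 190 rows, one per file. Note that pd.concat, not pd.DataFrame, is what stitches a list of one-row frames together.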