How do I keep only the text before the first delimiter in a DataFrame column?

Problem description

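I have a dataset with two columns: Industry Classifications and Stock Tickers. A company can have several labels in its Industry Classifications column, separated by the ; delimiter. I only want to keep the first label.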

import pandas as pd
training = pd.read_excel('Training Data.xlsx')

Current file structure (a sample of this column):

Industry Classifications
Beauty Care Products (Primary); Consumer Staples (Primary); Hair Care Products (Primary);

Catalog Flowers, Gifts and Novelties (Primary); Catalog Hobbies, Games and Toy Retail (Primary);

Information Technology (Primary); Internet Software and Services (Primary);

Casualty (Primary); Financials (Primary); Fire and Marine Insurance (Primary); 

Commercial and Professional Services (Primary); Commercial Services and Supplies (Primary); 

Banks (Primary); Banks (Primary); Diversified Banks (Primary); Financials (Primary); 

Application Software (Primary); Information Technology (Primary); Software (Primary);

Commercial and Professional Services (Primary); Consulting Services (Primary); Industrials (Primary);

Banks (Primary); Banks (Primary); Financials (Primary); National and State Commercial Banks (Primary); 

Expected output:

Industry Classifications

Beauty Care Products (Primary)

Catalog Flowers

Information Technology (Primary)

Casualty (Primary)

Commercial and Professional Services (Primary) 

Banks (Primary); Banks (Primary)

Application Software (Primary)

Commercial and Professional Services (Primary)

Banks (Primary); Banks (Primary)

Tags: python, pandas, split

Solution


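Select the Industry Classifications column as you are already doing, then split each value on the semicolon and take the first element of the result. For a single string value this is: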

# col is one cell value (a string) from the Industry Classifications column
first_tag = col.split(';')[0]
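A minimal sketch of the same idea applied to the whole column at once, using the pandas vectorized `.str` accessor instead of calling `split` on each cell. The three-row DataFrame stands in for the Excel file and is only an illustrative assumption, as is the new column name `First Classification`.

```python
import pandas as pd

# Illustrative stand-in for pd.read_excel('Training Data.xlsx'):
# three sample rows copied from the question.
training = pd.DataFrame({
    "Industry Classifications": [
        "Beauty Care Products (Primary); Consumer Staples (Primary); Hair Care Products (Primary);",
        "Information Technology (Primary); Internet Software and Services (Primary);",
        "Banks (Primary); Banks (Primary); Diversified Banks (Primary); Financials (Primary); ",
    ]
})

# .str.split applies the split to every cell; n=1 stops after the first ';'.
# .str[0] picks the first piece of each resulting list, and
# .str.strip() removes any leading/trailing whitespace.
training["First Classification"] = (  # hypothetical column name
    training["Industry Classifications"]
    .str.split(";", n=1)
    .str[0]
    .str.strip()
)

print(training["First Classification"].tolist())
```

Empty (NaN) cells stay NaN rather than raising an error. Note that for the Banks rows this returns a single `Banks (Primary)`, so it matches the first-label rule rather than the doubled value shown in the expected output.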
