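python - How to split a DataFrame column into separate columns based on a condition
Problem description
I am trying to split the following DataFrame into separate columns. I want all the text in one column, and the numbers split on whitespace into columns of their own.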
df[0].head(10)
0 []
1 [Andaman and Nicobar, 194, 52, 142, 0]
2 [Andhra Pradesh, 40,646, 19,814, 20,298, 534]
3 [Arunachal Pradesh, 609, 431, 175, 3]
4 [Assam, 20,646, 6,490, 14,105, 51]
5 [Bihar, 23,589, 8,767, 14,621, 201]
6 [Chandigarh, 660, 169, 480, 11]
7 [Chhattisgarh, 4,964, 1,429, 3,512, 23]
8 [Dadra and Nagar Haveli and Daman, 585, 182, 4...
9 [Daman and Diu, 0, 0, 0, 0]
Name: 0, dtype: object
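If I simply split on whitespace and expand, the numbers are split correctly, but the text is also split across several columns. Because the text spans a different number of columns in different observations, I cannot join it back together afterwards. The solution is presumably to write the right regular expression and split on that. I have not been able to work out the required regex, so I am asking for input.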
df1 = df[0].str.split(' ', expand= True)
df1.head(10)
0 1 2 3 4 5 6 7 8 9
0 [] None None None None None None None None None
1 [Andaman and Nicobar, 194, 52, 142, 0] None None None
2 [Andhra Pradesh, 40,646, 19,814, 20,298, 534] None None None None
3 [Arunachal Pradesh, 609, 431, 175, 3] None None None None
4 [Assam, 20,646, 6,490, 14,105, 51] None None None None None
5 [Bihar, 23,589, 8,767, 14,621, 201] None None None None None
6 [Chandigarh, 660, 169, 480, 11] None None None None None
7 [Chhattisgarh, 4,964, 1,429, 3,512, 23] None None None None None
8 [Dadra and Nagar Haveli and Daman, 585, 182, 401, 2]
9 [Daman and Diu, 0, 0, 0, 0] None None None
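To make the problem concrete, here is a minimal sketch that counts how many pieces a single-space split produces per row. The sample series `s` is hypothetical and assumes each entry is a plain string, brackets included, as displayed above.

```python
import pandas as pd

# Hypothetical sample; assumes each entry is a plain string as displayed above.
s = pd.Series([
    "[Assam, 20,646, 6,490, 14,105, 51]",
    "[Andaman and Nicobar, 194, 52, 142, 0]",
    "[Dadra and Nagar Haveli and Daman, 585, 182, 401, 2]",
])

# Count the pieces produced by splitting each row on a single space.
pieces = s.str.split(" ").str.len()
print(pieces.tolist())
```

The counts differ from row to row because multi-word names contribute one piece per word, which is why the expanded frame ends up ragged.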
The result I expect should look like this:
0 1 2 3 4 5 6 7 8 9
0 [] None None None None None None None None None
1 [Andaman and Nicobar, 194, 52, 142, 0] None None None None None
2 [Andhra Pradesh, 40,646, 19,814, 20,298, 534] None None None None None
3 [Arunachal Pradesh, 609, 431, 175, 3] None None None None None
4 [Assam, 20,646, 6,490, 14,105, 51] None None None None None
5 [Bihar, 23,589, 8,767, 14,621, 201] None None None None None
6 [Chandigarh, 660, 169, 480, 11] None None None None None
7 [Chhattisgarh, 4,964, 1,429, 3,512, 23] None None None None None
8 [Dadra and Nagar Haveli and Daman, 585, 182, 401, 2] None None None None None
9 [Daman and Diu, 0, 0, 0, 0] None None None None None
Solution
You can use str.replace and str.extract to reshape your DataFrame.
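# Extract the leading run of non-digits as the name, then drop the bracket and comma.
names = df[0].str.extract(r'(\D+)').replace(r'\[|,', '', regex=True).rename(columns={0: 'names'})
# Remove the name prefix and the closing bracket, then split the numbers on spaces.
# regex=True is needed in pandas 2.0+, where str.replace treats the pattern literally by default.
df_new = names.join(df[0].str.replace(r'\D+,', '', regex=True).str.strip(']').str.split(' ', expand=True))
print(df_new)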
names 0 1 2 3 4
0 Andaman and Nicobar 194, 52, 142, 0
1 Andhra Pradesh 40,646, 19,814, 20,298, 534
2 Arunachal Pradesh 609, 431, 175, 3
3 Assam 20,646, 6,490, 14,105, 51
4 Bihar 23,589, 8,767, 14,621, 201
5 Chandigarh 660, 169, 480, 11
6 Chhattisgarh 4,964, 1,429, 3,512, 23
7 Dadra and Nagar Haveli and Daman 585, 182, 4... None
8 Daman and Diu 0, 0, 0, 0
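As an alternative, here is a minimal sketch that splits on a comma followed by whitespace instead. It is not the method from the answer above. It assumes each entry is a plain string like the ones shown in the question, and the sample series `s` and the column names `name`, `c1` to `c4` are hypothetical.

```python
import pandas as pd

# Hypothetical sample mirroring the strings shown in the question.
s = pd.Series([
    "[]",
    "[Andaman and Nicobar, 194, 52, 142, 0]",
    "[Andhra Pradesh, 40,646, 19,814, 20,298, 534]",
    "[Dadra and Nagar Haveli and Daman, 585, 182, 401, 2]",
])

# Drop the surrounding brackets, then split on a comma followed by whitespace.
# Thousands separators such as "40,646" have no space after the comma,
# so they stay inside their field. A multi-character pattern is treated
# as a regular expression by str.split.
parts = s.str.strip("[]").str.split(r",\s+", expand=True)
parts.columns = ["name", "c1", "c2", "c3", "c4"]

# Remove the thousands separators and convert the numeric columns.
num_cols = ["c1", "c2", "c3", "c4"]
parts[num_cols] = parts[num_cols].apply(
    lambda col: pd.to_numeric(col.str.replace(",", "", regex=False))
)
print(parts)
```

This keeps multi-word names in a single column without extracting them separately, and the numeric columns come out as numbers rather than strings with trailing commas. The empty `[]` row yields an empty name and missing values, so it may be worth dropping beforehand.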