python - How can I split a text from a parenthesis in a CSV, and create another column with it
Problem description
I'm completely new to the Python world, so I've been struggling with this issue for a couple of days now. Thanks in advance.
I have been trying to split the text of a single row and column into three different columns. To explain better, here is where I am.
This is my pandas DataFrame, loaded from a CSV:
In[2]:
import pandas as pd
df = pd.read_csv('raw_csv/consejo_judicatura_guerrero.csv', header=None)
df.columns = ["institution"]
df
Out[2]:
institution
0 1.1.2. Consejo Nacional de Ciencias (CNCOO00012)
First, I try to separate the 1.1.2. into a new column called number, which I more or less managed:
In[3]:
new_df = pd.DataFrame(df['institution'].str.split('. ',1).tolist(),columns=['number', 'institution'])
Out[3]:
number institution
0 1.1.2. Consejo Nacional de Ciencias (CNCOO00012)
Finally, when I try to split the (CNCOO00012) into a new column called unit_id, I get the following:
In[4]:
new_df['institution'] = pd.DataFrame(new_df['institution'].str.split('(').tolist(),columns=['institution', 'unit_id'])
Out[4]:
------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-24-70d13206881c> in <module>
----> 1 new_df['institution'] = pd.DataFrame(new_df['institution'].str.split('(').tolist(),columns=['institution', 'unit_id'])
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
472 if is_named_tuple(data[0]) and columns is None:
473 columns = data[0]._fields
--> 474 arrays, columns = to_arrays(data, columns, dtype=dtype)
475 columns = ensure_index(columns)
476
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in to_arrays(data, columns, coerce_float, dtype)
459 return [], [] # columns if columns is not None else []
460 if isinstance(data[0], (list, tuple)):
--> 461 return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
462 elif isinstance(data[0], abc.Mapping):
463 return _list_of_dict_to_arrays(
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
491 else:
492 # list of lists
--> 493 content = list(lib.to_object_array(data).T)
494 # gh-26429 do not raise user-facing AssertionError
495 try:
pandas/_libs/lib.pyx in pandas._libs.lib.to_object_array()
TypeError: object of type 'NoneType' has no len()
What can I do to successfully achieve this task?
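Solution
You can use assign with str.split, as shown below. Note that this only works if the text has a fixed format: the number must be the first whitespace-separated token and the unit ID the last.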
df.assign(number=df.institution.str.split().str[0],
          unit_id=df.institution.str.split().str[-1])
Output:
institution number unit_id
0 1.1.2. Consejo Nacional de Ciencias (CNCOO00012) 1.1.2. (CNCOO00012)
Or, if you want to strip the () from unit_id, use:
df.assign(number=df.institution.str.split().str[0],
          unit_id=df.institution.str.split().str[-1].str.strip('()'))
institution number unit_id
0 1.1.2. Consejo Nacional de Ciencias (CNCOO00012) 1.1.2. CNCOO00012
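The approach above leaves the institution column unchanged, so the number and the unit ID still appear in it. If three clean columns are wanted, one alternative (not part of the original answer) is a regular expression with str.extract. The sketch below is a minimal example under assumptions: the sample rows are made up to mirror the question's data, and the pattern assumes every valid row starts with a dotted number and ends with a single parenthesised ID.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's CSV.
# The second row deliberately breaks the expected layout.
df = pd.DataFrame(
    {
        "institution": [
            "1.1.2. Consejo Nacional de Ciencias (CNCOO00012)",
            "Row without the expected layout",
        ]
    }
)

# Named groups become the column names of the extracted frame:
#   number      - digits and dots at the start, e.g. "1.1.2."
#   institution - the text between the number and the trailing parentheses
#   unit_id     - the text inside the trailing parentheses
pattern = (
    r"^(?P<number>[\d.]+)\s+"
    r"(?P<institution>.+?)\s*"
    r"\((?P<unit_id>[^()]+)\)\s*$"
)

parts = df["institution"].str.extract(pattern)

# Rows that do not match the pattern come back as NaN in every column,
# so they can be flagged instead of raising an error.
unmatched = parts["unit_id"].isna()

print(parts)
print(unmatched)
```

Because non-matching rows become NaN rather than raising, this variant can also be used to find rows where the "fixed format" assumption does not hold before deciding how to handle them.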