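python - How to construct a pd.Series from a string
Question
How can I construct a pd.Series object from a string (imported as a txt file) in which each line should become one cell of data?
String: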
'Hegselmann, R. (2012). Thomas C. Schelling and the Computer: Some Notes on Schelling’s Essay „Letting a Computer Help with the Work“. Journal of Artificial Societies and Social Simulation, 15(4). http://jasss.soc.surrey.ac.uk/15/4/9.html\nDowney, A. (2012). Think Python. How to Think Like a Computer Scientist. O’Reilly Media, Incorporated. http://www.greenteapress.com/thinkpython/html/index.html\nBird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python—Analyzing Text with the Natural Language Toolkit. O’Reilly Media. https://sites.google.com/site/naturallanguagetoolkit/book'
First, I converted the file to csv:
import pandas as pd
import numpy as np
df = pd.read_fwf('E1_TM_1.txt')
df.to_csv('E1_TM_1.csv')
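If I now want to present it as a vector (is that the right term?), it should look like a simple table: the first column is an index starting at 1, and the second column contains each citation from the string.
I tried the following code, but the result does not look the way I want.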
pd.read_fwf('E1_TM_1.csv', encoding='utf8', index_col=0)
,"Hegselmann, R. (2012). Thomas C.","Schelling and the Computer: Some Notes on Schelling’s Essay „Letting a Computer Help with the Work“. Journal of Artificial Societies and Social Simulation, 15(4). http://jasss.soc.surrey.ac.uk/15/4/9.html"
0,"Downey, A. (2012). Think Python.","How to Think Like a Computer Scientist. O’Reilly Media, Incorporated. http://www.greenteapress.com/thinkpython/html/index.html"
1,"Bird, S., Klein, E., & Loper, E.",(2009). Natural Language Processing with Python—Analyzing Text with the Natural Language Toolkit. O’Reilly Media. https://sites.google.com/site/naturallanguagetoolkit/book
In addition, the utf8 encoding does not work for the complete string.
Answer
First, I suggest splitting the string on ' ':
string1 = "Hegselmann, R. (2012). Thomas C. Schelling and the Computer: Some Notes on Schelling’s Essay „Letting a Computer Help with the Work“. Journal of Artificial Societies and Social Simulation, 15(4). http://jasss.soc.surrey.ac.uk/15/4/9.html\nDowney, A. (2012). Think Python. How to Think Like a Computer Scientist. O’Reilly Media, Incorporated. http://www.greenteapress.com/thinkpython/html/index.html\nBird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python—Analyzing Text with the Natural Language Toolkit. O’Reilly Media. https://sites.google.com/site/naturallanguagetoolkit/book."
import numpy as np

# Splitting on a single space yields individual words, not whole citations
list_string = string1.split(' ')
# Wrap the list of words in a NumPy array
np.array(list_string)
To be honest, you described the task very briefly... I think you can clean up the list and select the words you need before creating the array.
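If the goal is one citation per row rather than one word per element, splitting on line breaks instead of spaces may be closer to what the question asks for. The following is only a minimal sketch under that assumption: the helper name citations_to_series, the Series name "citation", and the shortened sample string are made up for illustration and do not come from the original post.

```python
import pandas as pd


def citations_to_series(text: str) -> pd.Series:
    """Return one citation per row, with the index starting at 1."""
    # splitlines() breaks on "\n" (and "\r\n"); skip blank lines and trim whitespace
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Use an explicit index so the first citation is labelled 1 instead of 0
    return pd.Series(lines, index=range(1, len(lines) + 1), name="citation")


# Shortened stand-in for the string in the question (hypothetical sample data)
sample = (
    "Hegselmann, R. (2012). Thomas C. Schelling and the Computer.\n"
    "Downey, A. (2012). Think Python.\n"
    "Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python."
)

series = citations_to_series(sample)
print(series)
```

To apply this to the file itself, the text could be read with Python's built-in open('E1_TM_1.txt', encoding='utf-8') and the result of .read() passed to the helper. That skips pd.read_fwf, which guesses fixed-width column boundaries and is a likely reason the citations were cut into several columns. Whether utf-8 is the right encoding for the file is an assumption.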