How to construct a pd.Series from a string

Problem description

How can I construct a pd.Series object from a string (imported from a txt file) so that each cell holds one data field?

String:

 'Hegselmann, R. (2012). Thomas C. Schelling and the Computer: Some Notes on Schelling’s Essay „Letting a Computer Help with the Work“. Journal of Artificial Societies and Social Simulation, 15(4). http://jasss.soc.surrey.ac.uk/15/4/9.html\nDowney, A. (2012). Think Python. How to Think Like a Computer Scientist. O’Reilly Media, Incorporated. http://www.greenteapress.com/thinkpython/html/index.html\nBird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python—Analyzing Text with the Natural Language Toolkit. O’Reilly Media. https://sites.google.com/site/naturallanguagetoolkit/book'

First, I converted the file to csv:

import pandas as pd
import numpy as np
df = pd.read_fwf('E1_TM_1.txt')
df.to_csv('E1_TM_1.csv')

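If I now want to present it as a vector (is that the right term?), it should look like a simple table: the first column is an index starting at 1, and the second column contains each reference from the string.

I tried the following code, but the result does not look the way I want: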

pd.read_fwf('E1_TM_1.csv', encoding='utf8', index_col=0)

,"Hegselmann, R. (2012). Thomas C.","Schelling and the Computer: Some Notes on Schelling’s Essay „Letting a Computer Help with the Work“. Journal of Artificial Societies and Social Simulation, 15(4). http://jasss.soc.surrey.ac.uk/15/4/9.html"
0,"Downey, A. (2012). Think Python.","How to Think Like a Computer Scientist. O’Reilly Media, Incorporated. http://www.greenteapress.com/thinkpython/html/index.html"
1,"Bird, S., Klein, E., & Loper, E.",(2009). Natural Language Processing with Python—Analyzing Text with the Natural Language Toolkit. O’Reilly Media. https://sites.google.com/site/naturallanguagetoolkit/book

In addition, the utf8 encoding does not work for the complete string.

Tags: python, pandas, csv

Solution

First, I suggest splitting the string on ' ':

string1 = "Hegselmann, R. (2012). Thomas C. Schelling and the     Computer: Some Notes on Schelling’s Essay „Letting a Computer Help with the Work“. Journal of Artificial Societies and Social Simulation, 15(4). http://jasss.soc.surrey.ac.uk/15/4/9.html\nDowney, A. (2012). Think Python. How to Think Like a Computer Scientist. O’Reilly Media, Incorporated. http://www.greenteapress.com/thinkpython/html/index.html\nBird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python—Analyzing Text with the Natural Language Toolkit. O’Reilly Media. https://sites.google.com/site/naturallanguagetoolkit/book."
import numpy as np

# Split on single spaces to get a list of word tokens
list_string = string1.split(' ')
# Convert the list of tokens to a NumPy array
arr = np.array(list_string)

Honestly, you described the task rather briefly... I think that before creating the array you can clean up the list and select the words you need.
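
Since the references in the question's string are separated by \n, an alternative to splitting on spaces is to split on line breaks and pass the resulting list to pd.Series. The following is only a minimal sketch, assuming one reference per line. The shortened `text` stands in for the full string, and the names `references` and `series` are illustrative, not from the original post.

```python
import pandas as pd

# Shortened stand-in for the string in the question
# (assumption: each reference sits on its own line).
text = (
    "Hegselmann, R. (2012). Thomas C. Schelling and the Computer.\n"
    "Downey, A. (2012). Think Python.\n"
    "Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python.\n"
)

# splitlines() breaks the text at line endings; the filter drops blank lines.
references = [line.strip() for line in text.splitlines() if line.strip()]

# Build the Series with an index that starts at 1 instead of the default 0.
series = pd.Series(
    references,
    index=range(1, len(references) + 1),
    name="reference",
)
print(series)
```

To apply this to the file itself, the text could probably be read with Python's built-in open('E1_TM_1.txt', encoding='utf-8').read() rather than pd.read_fwf, which cuts each line into fixed-width columns and is likely the reason the references were broken apart in the output above.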

