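python - Pandas: remove line breaks and convert a text file to CSV
Question
I have a text file from which I want to remove the line breaks and to which I want to add a header, so that it becomes a CSV file.
The file looks like this: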
3G LOJISTIK VE HAVACILIK HIZMETLARI LTD., No. 3/182 Altintepe
Bagdat Cad. Istasyon Yolu Sok., Istanbul 34840, Turkey; Additional
Sanctions Information - Subject to Secondary Sanctions [SDGT]
[IFSR] (Linked To: MAHAN AIR).
7 KARNES, Avenida Ciudad de Cali No. 15A-91, Local A06-07, Bogota,
Colombia; Matricula Mercantil No 1978075 (Colombia) [SDNTK].
The code I am using:
import pandas as pd

sdnlist = pd.DataFrame(pd.read_csv('sndlist.txt',delimiter="\t"))
sdnlist.to_csv('sdnlist.csv',index=False)
colnames=["a","b", "c", "d"]
sndlist_data = pd.read_csv("sdnlist.csv",names=colnames)
sndlist_data.head()
The desired output simply separates everything with commas; (a, b, c, ...) are the header names:
a b c d c
3G LO... No. 3/18.... Ista.... Turk..... Sancti... - Subject to....
A sample of the text file is available on Pastebin.
The full text file is taken from the following link: FULL SDN TEXT
Solution
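You could use Python's itertools.groupby() function to read a whole block at a time. Each block can then be processed to put it on a single line, and split at what appear to be the commas and semicolons. A regular expression can locate a comma inside parentheses and replace it with a different character, for example -.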
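To see the grouping step on its own, here is a minimal sketch that runs groupby() over an in-memory copy of the two sample records from the question. It assumes that records are separated by blank lines; the names sample and group_blocks are made up for illustration and are not part of the answer's code.

```python
from itertools import groupby

# Hypothetical in-memory sample; a blank line is assumed to separate records.
sample = """3G LOJISTIK VE HAVACILIK HIZMETLARI LTD., No. 3/182 Altintepe
Bagdat Cad. Istasyon Yolu Sok., Istanbul 34840, Turkey; Additional
Sanctions Information - Subject to Secondary Sanctions [SDGT]
[IFSR] (Linked To: MAHAN AIR).

7 KARNES, Avenida Ciudad de Cali No. 15A-91, Local A06-07, Bogota,
Colombia; Matricula Mercantil No 1978075 (Colombia) [SDNTK].
"""


def group_blocks(lines):
    """Yield each run of consecutive non-blank lines as one joined string."""
    # groupby() starts a new group whenever the key changes, so runs of
    # non-blank lines (key True) alternate with runs of blank lines (key False).
    for has_text, block in groupby(lines, lambda line: line.strip() != ''):
        if has_text:
            yield ' '.join(line.strip() for line in block)


records = list(group_blocks(sample.splitlines()))
for record in records:
    print(record)
```

Each printed line is one complete record, ready to be split into fields.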
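from itertools import groupby
import csv
import io
import re

with open('sdnlist.txt') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(list('abcdefg'))

    # Consecutive non-blank lines form one block (key is True); blank lines separate blocks.
    for key, block in groupby(f_input, lambda x: x.strip() != ''):
        if key:
            # Join the block into a single line and treat semicolons as commas.
            single_line = ' '.join(block).replace('\n', '').replace(';', ',')
            # Replace the first comma inside each parenthesised group with '-'.
            single_line = re.sub(r'(\([^)]*?)(,)([^)]*?\))', r'\1-\3', single_line)
            # Let the csv module split the line, skipping the space after each comma.
            row = next(csv.reader(io.StringIO(single_line), skipinitialspace=True))
            csv_output.writerow(row)
            #print('\n'.join(row) + '\n')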
This should give you the following output:
a,b,c,d,e,f,g
3G LOJISTIK VE HAVACILIK HIZMETLARI LTD.,No. 3/182 Altintepe Bagdat Cad. Istasyon Yolu Sok.,Istanbul 34840,Turkey,Additional Sanctions Information - Subject to Secondary Sanctions [SDGT] [IFSR] (Linked To: MAHAN AIR).
7 KARNES,Avenida Ciudad de Cali No. 15A-91,Local A06-07,Bogota,Colombia,Matricula Mercantil No 1978075 (Colombia) [SDNTK].
7 MAKARA PHARY CO.,LTD.,Deaum Mien,Daeum Mien,Ta Khmau,Kandal 8252,Cambodia,Company Number 00037307 (Cambodia) [GLOMAG] (Linked To: SOPHARY- Kim).
7TH OF TIR (a.k.a. 7TH OF TIR COMPLEX- a.k.a. 7TH OF TIR INDUSTRIAL COMPLEX,a.k.a. 7TH OF TIR INDUSTRIES,a.k.a. 7TH OF TIR INDUSTRIES OF ISFAHAN/ESFAHAN,a.k.a. MOJTAMAE SANATE HAFTOME TIR,a.k.a. SANAYE HAFTOME TIR,a.k.a. SEVENTH OF TIR),Mobarakeh Road Km 45,Isfahan,Iran,P.O. Box 81465-478,Isfahan,Iran,Additional Sanctions Information - Subject to Secondary Sanctions [NPWMD] [IFSR].
7TH OF TIR COMPLEX (a.k.a. 7TH OF TIR- a.k.a. 7TH OF TIR INDUSTRIAL COMPLEX,a.k.a. 7TH OF TIR INDUSTRIES,a.k.a. 7TH OF TIR INDUSTRIES OF ISFAHAN/ESFAHAN,a.k.a. MOJTAMAE SANATE HAFTOME TIR,a.k.a. SANAYE HAFTOME TIR,a.k.a. SEVENTH OF TIR),Mobarakeh Road Km 45,Isfahan,Iran,P.O. Box 81465-478,Isfahan,Iran,Additional Sanctions Information - Subject to Secondary Sanctions [NPWMD] [IFSR].
7TH OF TIR INDUSTRIAL COMPLEX (a.k.a. 7TH OF TIR- a.k.a. 7TH OF TIR COMPLEX,a.k.a. 7TH OF TIR INDUSTRIES,a.k.a. 7TH OF TIR INDUSTRIES OF ISFAHAN/ESFAHAN,a.k.a. MOJTAMAE SANATE HAFTOME TIR,a.k.a. SANAYE HAFTOME TIR,a.k.a. SEVENTH OF TIR),Mobarakeh Road Km 45,Isfahan,Iran,P.O. Box 81465-478,Isfahan,Iran,Additional Sanctions Information - Subject to Secondary Sanctions [NPWMD] [IFSR].
7TH OF TIR INDUSTRIES (a.k.a. 7TH OF TIR- a.k.a. 7TH OF TIR COMPLEX,a.k.a. 7TH OF TIR INDUSTRIAL COMPLEX,a.k.a. 7TH OF TIR INDUSTRIES OF ISFAHAN/ESFAHAN,a.k.a. MOJTAMAE SANATE HAFTOME TIR,a.k.a. SANAYE HAFTOME TIR,a.k.a. SEVENTH OF TIR),Mobarakeh Road Km 45,Isfahan,Iran,P.O. Box 81465-478,Isfahan,Iran,Additional Sanctions Information - Subject to Secondary Sanctions [NPWMD] [IFSR].
7TH OF TIR INDUSTRIES OF ISFAHAN/ESFAHAN (a.k.a. 7TH OF TIR- a.k.a. 7TH OF TIR COMPLEX,a.k.a. 7TH OF TIR INDUSTRIAL COMPLEX,a.k.a. 7TH OF TIR INDUSTRIES,a.k.a. MOJTAMAE SANATE HAFTOME TIR,a.k.a. SANAYE HAFTOME TIR,a.k.a. SEVENTH OF TIR),Mobarakeh Road Km 45,Isfahan,Iran,P.O. Box 81465-478,Isfahan,Iran,Additional Sanctions Information - Subject to Secondary Sanctions [NPWMD] [IFSR].
8TH IMAM INDUSTRIES GROUP (a.k.a. CRUISE MISSILE INDUSTRY GROUP- a.k.a. CRUISE SYSTEMS INDUSTRY GROUP,a.k.a. NAVAL DEFENCE MISSILE INDUSTRY GROUP,a.k.a. SAMEN AL-A'EMMEH INDUSTRIES GROUP),Tehran,Iran,Additional Sanctions Information - Subject to Secondary Sanctions [NPWMD] [IFSR].
"14 STAR SHIPPING MANAGEMENT (a.k.a. FOURTEEN STAR SHIPPING MANAGEMENT- a.k.a. ""FOURTEEN STARS"")",United Arab Emirates,Additional Sanctions Information - Subject to Secondary Sanctions [SDGT] (Linked To: MEHDI GROUP).
You will still have a hard time picking out the addresses.
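One reason is visible in the output above: only the first comma in each parenthesised group becomes -, so long a.k.a. lists still spill into several columns. As a possible refinement (a sketch, not part of the original answer), re.sub() can be given a function that replaces every comma inside a group. The helper name protect_commas and the shortened sample line are made up for illustration, and nested parentheses are assumed not to occur.

```python
import re


def protect_commas(text, replacement='-'):
    """Replace every comma inside a parenthesised group with `replacement`.

    Assumes parentheses are not nested.
    """
    return re.sub(
        r'\([^()]*\)',                                    # one (...) group
        lambda match: match.group(0).replace(',', replacement),
        text,
    )


# Hypothetical shortened record, as it would look after ';' was replaced by ','.
line = ('8TH IMAM INDUSTRIES GROUP (a.k.a. CRUISE MISSILE INDUSTRY GROUP, '
        'a.k.a. CRUISE SYSTEMS INDUSTRY GROUP, '
        'a.k.a. NAVAL DEFENCE MISSILE INDUSTRY GROUP), Tehran, Iran')

protected = protect_commas(line)
print(protected)
```

With this in place, the only commas left in the sample line are the two field separators before Tehran and Iran, so the name and all of its aliases stay in one column.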