python - Why can't I open some .ann files with `pandas.read_csv()`?
Problem description
import pandas as pd
from pathlib import Path

Drugs = ['ARTHROTEC', 'CAMBIA', 'CATAFLAM', 'DICLOFENAC-POTASSIUM', 'DICLOFENAC-SODIUM',
         'FLECTOR', 'LIPITOR', 'PENNSAID', 'SOLARAZE', 'VOLTAREN', 'VOLTAREN-XR', 'ZIPSOR']

def extract_Tags(drug):
    # Note: drug + '*.ann' also matches longer names, e.g. 'VOLTAREN*.ann' matches VOLTAREN-XR files
    Files = Path('E:/TM/Final/CADEC/original').glob(drug + '*.ann')
    for file in Files:
        try:
            data = pd.read_csv(file, sep='\t', header=None)
        except Exception:  # unlike a bare except, this does not swallow KeyboardInterrupt
            print('Cannot open ', file)
    print(drug, '\n')
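I have many .ann files in one directory, and each file name starts with a drug name. I'm trying to read their data with pandas.read_csv(). However, some of the files open and others don't. Below is the output I get, but I don't know how to check what is wrong with the files that can't be opened. Should I use a different command to open them?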
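One way to see what actually goes wrong is to record the exception instead of printing only "Cannot open". A minimal sketch, assuming the same tab-separated, header-less read_csv call as above; the function name `collect_read_errors` is hypothetical:

```python
from pathlib import Path

import pandas as pd


def collect_read_errors(paths):
    """Try to read each file; map every failing path to its exception type and message."""
    errors = {}
    for path in paths:
        try:
            pd.read_csv(path, sep='\t', header=None)
        except Exception as err:  # keep the real error instead of hiding it
            errors[Path(path)] = f'{type(err).__name__}: {err}'
    return errors
```

Usage: `for path, msg in collect_read_errors(Path('E:/TM/Final/CADEC/original').glob('LIPITOR*.ann')).items(): print(path.name, msg)`. If every failing file reports the same exception, the files probably share one cause rather than being randomly damaged.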
for drug in Drugs:
    extract_Tags(drug)
ARTHROTEC
Cannot open E:\TM\Final\CADEC\original\CAMBIA.1.ann
CAMBIA
CATAFLAM
DICLOFENAC-POTASSIUM
DICLOFENAC-SODIUM
FLECTOR
Cannot open E:\TM\Final\CADEC\original\LIPITOR.197.ann
Cannot open E:\TM\Final\CADEC\original\LIPITOR.243.ann
Cannot open E:\TM\Final\CADEC\original\LIPITOR.28.ann
...
Cannot open E:\TM\Final\CADEC\original\LIPITOR.964.ann
LIPITOR
Cannot open E:\TM\Final\CADEC\original\PENNSAID.2.ann
PENNSAID
Cannot open E:\TM\Final\CADEC\original\SOLARAZE.1.ann
Cannot open E:\TM\Final\CADEC\original\SOLARAZE.3.ann
SOLARAZE
Cannot open E:\TM\Final\CADEC\original\VOLTAREN-XR.11.ann
Cannot open E:\TM\Final\CADEC\original\VOLTAREN-XR.13.ann
Cannot open E:\TM\Final\CADEC\original\VOLTAREN-XR.4.ann
...
VOLTAREN
Cannot open E:\TM\Final\CADEC\original\VOLTAREN-XR.11.ann
Cannot open E:\TM\Final\CADEC\original\VOLTAREN-XR.13.ann
Cannot open E:\TM\Final\CADEC\original\VOLTAREN-XR.4.ann
...
VOLTAREN-XR
ZIPSOR
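In the output above, the VOLTAREN-XR files are reported twice: once under VOLTAREN and once under VOLTAREN-XR. That is because in the pattern drug + '*.ann' the `*` can also match '-XR.11'. Assuming every file is named DRUG.N.ann, as the paths in the log suggest, a pattern with a literal dot after the drug name avoids the overlap. The sketch below compares both patterns with `fnmatch.fnmatchcase`, which uses the same shell-style wildcards as `Path.glob`; the file names and the helper `matching` are hypothetical examples:

```python
from fnmatch import fnmatchcase

# Hypothetical file names following the DRUG.N.ann scheme seen in the log
names = ['VOLTAREN.1.ann', 'VOLTAREN-XR.11.ann', 'VOLTAREN-XR.4.ann']


def matching(names, drug, strict):
    # strict=True assumes a literal dot right after the drug name
    pattern = drug + ('.*.ann' if strict else '*.ann')
    return [n for n in names if fnmatchcase(n, pattern)]


print(matching(names, 'VOLTAREN', strict=False))  # all three names
print(matching(names, 'VOLTAREN', strict=True))   # ['VOLTAREN.1.ann']
```

In the original function this would mean `glob(drug + '.*.ann')`. This only removes the duplicate lines; it does not explain why the files fail to load.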
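If I try to open one of these files directly, I get "No columns to parse from file", which I don't really understand. How can I tell whether a data file is corrupted, or whether I should be doing this differently? By the way, since this is a benchmark dataset, it seems strange to me that its files would be malformed.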
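pandas raises `EmptyDataError` with the message "No columns to parse from file" when there is nothing to parse, for example when a file has zero bytes. So before assuming corruption, it is worth checking whether the failing files are simply empty. A minimal stdlib-only sketch; the function name `find_empty_files` is hypothetical:

```python
from pathlib import Path


def find_empty_files(folder, pattern='*.ann'):
    """Return files in `folder` matching `pattern` that contain nothing but whitespace."""
    empty = []
    for path in sorted(Path(folder).glob(pattern)):
        # read_bytes() avoids decoding errors; strip() ignores blank lines and spaces
        if not path.read_bytes().strip():
            empty.append(path)
    return empty
```

Usage: `for p in find_empty_files('E:/TM/Final/CADEC/original'): print(p.name, p.stat().st_size, 'bytes')`. If the files from the "Cannot open" list appear here, a plausible (unverified) explanation is that those documents just have no annotations, not that the files are damaged. Opening one in a text editor would confirm it.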
pd.read_csv(r"E:\TM\Final\CADEC\original\LIPITOR.197.ann", sep='\t', header=None)
---------------------------------------------------------------------------
EmptyDataError Traceback (most recent call last)
<ipython-input-4-8f6a7735c992> in <module>
1 # check one of the unopenable files
----> 2 pd.read_csv("E:\TM\Final\CADEC\original\LIPITOR.197.ann", sep='\t', header=None)
d:\python\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
674 )
675
--> 676 return _read(filepath_or_buffer, kwds)
677
678 parser_f.__name__ = name
d:\python\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
446
447 # Create the parser.
--> 448 parser = TextFileReader(fp_or_buf, **kwds)
449
450 if chunksize or iterator:
d:\python\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
878 self.options["has_index_names"] = kwds["has_index_names"]
879
--> 880 self._make_engine(self.engine)
881
882 def close(self):
d:\python\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
1112 def _make_engine(self, engine="c"):
1113 if engine == "c":
-> 1114 self._engine = CParserWrapper(self.f, **self.options)
1115 else:
1116 if engine == "python":
d:\python\lib\site-packages\pandas\io\parsers.py in __init__(self, src, **kwds)
1889 kwds["usecols"] = self.usecols
1890
-> 1891 self._reader = parsers.TextReader(src, **kwds)
1892 self.unnamed_cols = self._reader.unnamed_cols
1893
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()
EmptyDataError: No columns to parse from file
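If some files are expected to be empty, one option is to treat `EmptyDataError` as "no rows" instead of as a failure. A sketch, assuming each .ann file is tab-separated as in the read_csv call above; the function name `read_ann` is hypothetical. `quoting=csv.QUOTE_NONE` is an extra assumption so that a double quote inside annotated text is read as an ordinary character:

```python
import csv

import pandas as pd


def read_ann(path):
    """Read one tab-separated .ann file; an empty file gives an empty DataFrame."""
    try:
        # QUOTE_NONE disables quote handling, so '"' in the text is kept as-is
        return pd.read_csv(path, sep='\t', header=None, quoting=csv.QUOTE_NONE)
    except pd.errors.EmptyDataError:
        # raised when there is nothing to parse, e.g. a zero-byte file
        return pd.DataFrame()
```

Inside the loop, `data = read_ann(file)` would then never print "Cannot open" for empty files, and `data.empty` tells you which documents have no rows.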