Dataframe error: EmptyDataError: No columns to parse from file

Problem description

I'm trying to extract 4 tables from an input file. Here is an excerpt of the file, containing one of the four tables:

    ...
*** USER INFORMATION MESSAGE 7570 (GPWG1S)
         RESULTS OF RIGID BODY CHECKS OF MATRIX KGG      (G-SET)  FOLLOW:
         PRINT RESULTS IN ALL SIX DIRECTIONS AGAINST THE LIMIT OF   1.000000E-03
               DIRECTION        STRAIN ENERGY        PASS/FAIL
               ---------        -------------        ---------
                # Table I am trying to extract
                 1               2.783836E-05          PASS
                 2               1.069251E-04          PASS
                 3               1.004842E-04          PASS
                 4               1.589776E-04          PASS
                 5               1.644181E-06          PASS
                 6               2.628610E-05          PASS
                 # End of table
 SOME POSSIBLE REASONS MAY LEAD TO THE FAILURE:
   1. CELASI ELEMENTS CONNECTING TO ONLY ONE GRID POINT;
      ...

My code is:

import pandas as pd

tram_f06 = open('tram.txt', 'r')


g_set_word = 'KGG'
n_set_word = 'KNN'
f_set_word = 'KFF'
a_set_word = 'KAA'

index_tram = 0
with tram_f06 as in_file:
    ## First we get the lines of the tables we're interested in 
    for line in in_file:        
        if g_set_word in line:
            start_g = index_tram + 4 # The table starts 4 lines after the line containing the keyword
            stop_g = start_g + 6 # There are 6 lines, one per degree of freedom
        elif n_set_word in line:
            start_n = index_tram + 4
            stop_n = start_n + 6
        elif f_set_word in line:
            start_f = index_tram + 4
            stop_f = start_f + 6
        elif a_set_word in line:
            start_a = index_tram + 4
            stop_a = start_a + 6
        index_tram = index_tram + 1
    in_file.seek(0)
    ## Then we extract those lines
    gset_df = pd.read_csv(in_file, header = None, delim_whitespace=True, skiprows = start_g, skipfooter = index_tram - stop_g, engine = 'python')
    nset_df = pd.read_csv(in_file, header = None, delim_whitespace=True, skiprows = start_n, skipfooter = index_tram - stop_n, engine = 'python')
    fset_df = pd.read_csv(in_file, header = None, delim_whitespace=True, skiprows = start_f, skipfooter = index_tram - stop_f, engine = 'python')
    aset_df = pd.read_csv(in_file, header = None, delim_whitespace=True, skiprows = start_a, skipfooter = index_tram - stop_a, engine = 'python')

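I get the indices of the lines I want to extract (the start_X/stop_X variables), but I can't get the DataFrames. With or without the in_file.seek(0) line, I get the error message: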

EmptyDataError: No columns to parse from file

Does anyone know how to fix this?

Thanks in advance!

Tags: python dataframe file

Solution
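As for the error itself: the for loop leaves in_file at end of file, and each read_csv call (the python engine with skipfooter reads every line) also runs the handle to the end. read_csv on an exhausted handle finds nothing and raises EmptyDataError, which is why a single in_file.seek(0) is not enough: the first call succeeds, but the second starts at end of file again. A minimal sketch reproducing this with an in-memory StringIO standing in for tram.txt (the two rows come from the excerpt above). It uses sep=r"\s+", which pandas documents as equivalent to delim_whitespace=True (deprecated since pandas 2.2).

```python
import pandas as pd
from io import StringIO
from pandas.errors import EmptyDataError

# In-memory stand-in for tram.txt (two rows from the excerpt above).
handle = StringIO(
    "1  2.783836E-05  PASS\n"
    "2  1.069251E-04  PASS\n"
)

# First read: succeeds, but leaves the handle at end of file.
first = pd.read_csv(handle, header=None, sep=r"\s+")

# Second read on the same, now exhausted, handle.
second_read_failed = False
try:
    pd.read_csv(handle, header=None, sep=r"\s+")
except EmptyDataError as exc:
    second_read_failed = True
    print("second read failed:", exc)  # No columns to parse from file

# Rewinding first makes the read start from the top again.
handle.seek(0)
second = pd.read_csv(handle, header=None, sep=r"\s+")
print(second.equals(first))  # True
```

Applied to the question's code, this means calling in_file.seek(0) immediately before each of the four read_csv calls, not just once after the loop.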


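You can take advantage of the fixed-width layout here. Read the whole file with read_fwf, specifying the column widths so that the 3 columns you need are captured correctly, then filter out the rows whose first column is not numeric.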

import pandas as pd
from io import StringIO

data = '''
*** USER INFORMATION MESSAGE 7570 (GPWG1S)
         RESULTS OF RIGID BODY CHECKS OF MATRIX KGG      (G-SET)  FOLLOW:
         PRINT RESULTS IN ALL SIX DIRECTIONS AGAINST THE LIMIT OF   1.000000E-03
               DIRECTION        STRAIN ENERGY        PASS/FAIL
               ---------        -------------        ---------
                 1               2.783836E-05          PASS
                 2               1.069251E-04          PASS
                 3               1.004842E-04          PASS
                 4               1.589776E-04          PASS
                 5               1.644181E-06          PASS
                 6               2.628610E-05          PASS
 SOME POSSIBLE REASONS MAY LEAD TO THE FAILURE:
   1. CELASI ELEMENTS CONNECTING TO ONLY ONE GRID POINT;
'''


df = pd.read_fwf(StringIO(data), widths=[32,22,10], names=['DIRECTION','STRAIN ENERGY','PASS/FAIL'])
df = df.loc[~pd.to_numeric(df['DIRECTION'], errors='coerce').isnull()]

Output

   DIRECTION STRAIN ENERGY PASS/FAIL
6          1  2.783836E-05      PASS
7          2  1.069251E-04      PASS
8          3  1.004842E-04      PASS
9          4  1.589776E-04      PASS
10         5  1.644181E-06      PASS
11         6  2.628610E-05      PASS
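The fixed-width read above collects the data rows of the whole file into one frame, so if tram.txt holds all four tables (KGG, KNN, KFF and KAA) their rows end up mixed together. If you need four separate DataFrames as in the question, here is a minimal sketch that keeps the question's line-offset idea but reads the file only once, so no handle is ever re-read. The helper name extract_tables is hypothetical, the offset of 4 and row count of 6 are carried over from the question's code, and the sample text contains only the KGG block from the excerpt.

```python
import pandas as pd
from io import StringIO

# Excerpt from the question, standing in for tram.txt. A real file would also
# contain the KNN, KFF and KAA blocks; only KGG is reproduced here.
F06_TEXT = """\
*** USER INFORMATION MESSAGE 7570 (GPWG1S)
         RESULTS OF RIGID BODY CHECKS OF MATRIX KGG      (G-SET)  FOLLOW:
         PRINT RESULTS IN ALL SIX DIRECTIONS AGAINST THE LIMIT OF   1.000000E-03
               DIRECTION        STRAIN ENERGY        PASS/FAIL
               ---------        -------------        ---------
                 1               2.783836E-05          PASS
                 2               1.069251E-04          PASS
                 3               1.004842E-04          PASS
                 4               1.589776E-04          PASS
                 5               1.644181E-06          PASS
                 6               2.628610E-05          PASS
 SOME POSSIBLE REASONS MAY LEAD TO THE FAILURE:
"""

COLUMNS = ["DIRECTION", "STRAIN ENERGY", "PASS/FAIL"]


def extract_tables(lines, keywords=("KGG", "KNN", "KFF", "KAA"), offset=4, n_rows=6):
    """Hypothetical helper: map each keyword found in `lines` to a DataFrame.

    Follows the question's assumptions: a table starts `offset` lines after the
    first line containing its keyword and has `n_rows` rows (one per DOF).
    """
    tables = {}
    for i, line in enumerate(lines):
        for key in keywords:
            if key in line and key not in tables:
                block = "".join(lines[i + offset : i + offset + n_rows])
                # Only data rows are parsed, so pandas infers int/float dtypes.
                tables[key] = pd.read_csv(
                    StringIO(block), sep=r"\s+", header=None, names=COLUMNS
                )
    return tables


# With the real file: with open("tram.txt") as f: tables = extract_tables(f.readlines())
tables = extract_tables(F06_TEXT.splitlines(keepends=True))
gset_df = tables["KGG"]
print(gset_df)
print(gset_df.dtypes)
```

Because each block handed to read_csv contains only data rows, DIRECTION comes back as integers and STRAIN ENERGY as floats without any numeric filtering. Matching on the bare string "KGG" is fragile if that keyword appears elsewhere in the file; matching the full "RIGID BODY CHECKS OF MATRIX KGG" text would be stricter.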
