How to read a csv file up to a certain row and store it in variables

Problem description

I want to read in a csv file and then store the data under each header as a specific variable.

My csv file:

multiplicity  
4.123  
lattice parameters  
1,0,0  
0,1,0  
0,0,1  
atom sites  
0,0,0  
0.5,0.5,0.5  
occupancy  
1,0  
0,1  

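I want to write code that automatically stores the rows under multiplicity as the data for one variable, and so on for the rest of the csv. I can't hard-code positions such as multiplicity is line[2], because the number of rows in each section changes. I want to create a loop that stores the data between headers as variables, but I'm not sure how.

Ideally, I want the code to search for the first and second headers and save the values in between as the multiplicity variable. Then I want it to find the second and third headers and save those values as the lattice parameters, then find the third and fourth headers and save the values between them as the atom sites. Finally, it should find the fourth header and the end of the csv and save the values in between as the occupancy.

Tags: python, csv

Solution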


You could try collecting your rows in a collections.defaultdict().

To group rows under their respective headers, you can check whether a row read by csv.reader() is a single item made up only of letters and spaces. It's difficult to be sure, since you've only shown a snapshot of your data, so I've made these assumptions in the example below. Once you have settled on how to identify headers, you can simply collect all the following rows until another header is found.
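The header test can be tried on its own before wiring it into a loop. This is only a sketch: the `is_header` helper name is hypothetical (it is not part of the csv module), and it assumes a header is a single non-empty field made up only of letters and spaces.

```python
from csv import reader
from io import StringIO


def is_header(row):
    """Return True if a parsed csv row looks like a header.

    Assumption: a header is a single non-empty field that contains
    only letters and spaces. Adjust this rule if your headers differ.
    """
    if len(row) != 1:
        return False
    field = row[0].strip()
    return bool(field) and all(ch.isalpha() or ch.isspace() for ch in field)


# Parse a few sample lines with csv.reader and classify each one
sample = "multiplicity\n4.123\nlattice parameters\n1,0,0\n"
flags = [is_header(row) for row in reader(StringIO(sample))]
print(flags)
```

A single numeric field such as 4.123 is rejected because the digits and the dot are not letters, so it is treated as data rather than as a header.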

I've also assumed that your normal rows contain integers and floats. You can convert them directly to their proper types with ast.literal_eval().
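To see what that conversion does to one row, here is a small sketch. The `convert_row` helper name is hypothetical, and it assumes every field is a valid Python literal such as '1' or '0.5'.

```python
from ast import literal_eval


def convert_row(fields):
    """Convert a list of numeric strings to ints and floats.

    Assumption: every field is a valid Python literal. literal_eval
    raises ValueError or SyntaxError for anything else.
    """
    return [literal_eval(field.strip()) for field in fields]


# Fields as csv.reader would return them: always strings
ints = convert_row(["1", "0"])
floats = convert_row(["0.5", "0.5", "0.5"])
print(ints, floats)
```

Unlike calling float() on everything, literal_eval keeps integers as int and only produces float where the text contains a decimal point.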

Demo:

from csv import reader
from collections import defaultdict
from ast import literal_eval
from pprint import pprint

# Create a dictionary of lists
data = defaultdict(list)

# Open your file
with open('data.csv') as f:

    # Get the csv reader
    csv_reader = reader(f)

    # Initialise current header
    # Rows that appear before any header are stored under None
    current_header = None

    # Go over each line in the csv file
    for line in csv_reader:

        # Skip blank lines, which csv.reader returns as empty lists
        if not line:
            continue

        # Found header: a single field containing only letters and spaces
        if len(line) == 1 and all(item.isalpha() or item.isspace() for item in line[0]):
            current_header = line[0].strip()
            continue

        # If we get here, normal line with ints and floats
        data[current_header].append(list(map(literal_eval, line)))

pprint(data)

Output:

defaultdict(<class 'list'>,
            {'atom sites': [[0, 0, 0], [0.5, 0.5, 0.5]],
             'lattice parameters': [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
             'multiplicity': [[4.123]],
             'occupancy': [[1, 0], [0, 1]]})

And now you have a dictionary that stores each header with its respective rows. This can be manipulated later, and added to if needed.
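If you do want separate variables, as the question asks, they can be pulled out of the dictionary by header name. The sketch below uses a plain dict with the same values as the output above as a stand-in for `data`; the variable names are my own choice.

```python
# Stand-in for the dictionary built above (same values as its output)
data = {
    "multiplicity": [[4.123]],
    "lattice parameters": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "atom sites": [[0, 0, 0], [0.5, 0.5, 0.5]],
    "occupancy": [[1, 0], [0, 1]],
}

# multiplicity is a single value, so unwrap the nested list
multiplicity = data["multiplicity"][0][0]

# The other sections stay as lists of rows
lattice_parameters = data["lattice parameters"]
atom_sites = data["atom sites"]
occupancy = data["occupancy"]

print(multiplicity)
print(len(atom_sites))
```

Looking sections up by header name means the code keeps working when the number of rows under each header changes.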

Here is an example of printing each header and its respective rows (a nested list):

for header, rows in data.items():
    print("Header: %s, Rows: [%s]" % (header, ",".join(map(str, rows))))

# Header: multiplicity, Rows: [[4.123]]
# Header: lattice parameters, Rows: [[1, 0, 0],[0, 1, 0],[0, 0, 1]]
# Header: atom sites, Rows: [[0, 0, 0],[0.5, 0.5, 0.5]]
# Header: occupancy, Rows: [[1, 0],[0, 1]]

You can also have a look at "How to use dictionaries in Python" to learn more about dictionaries and how to manipulate them.

