FileCreateError: [Errno 24] Too many open files | Windows 10, Python, Spyder IDE

Problem Description

I am processing a large number of files from a remote drive, and at the end I want to save a pandas dataframe with the results to an Excel spreadsheet; each step of the data retrieval and processing is shown below. The number of open files seems to exceed the allowed limit. I searched online and found win32file._setmaxstdio(2048), but some of my runs seem to exceed even this threshold.
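Applied at the top of the script, that workaround looks like this (a minimal sketch, assuming pywin32 is installed):

import win32file

# Raise the C runtime's cap on simultaneously open streams
# (the default is 512, and the CRT rejects values above 8192)
win32file._setmaxstdio(2048)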

Image processing:

I extract features from each image (for example, the area of a region in the image) together with the image's corresponding timestamp. The feature data and timestamps are then stored in a pandas dataframe. The number of rows in the dataframe is typically above 1000, while the number of columns is always 13.
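In isolation, the feature-extraction step follows this pattern (a toy sketch on a synthetic image with made-up values, not my actual data):

import numpy as np
import pandas as pd
from skimage import measure

# Synthetic stand-in for a camera frame: one bright square region
img = np.zeros((64, 64), dtype=np.uint16)
img[16:40, 16:40] = 200

# Label the connected regions above an intensity threshold
labels = measure.label(img > 100)

# Tabulate region properties and load them into a dataframe
regions = measure.regionprops_table(labels, img, properties=['area','max_intensity'])
print(pd.DataFrame(regions))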

CSV file processing:

I convert the csv files into pandas dataframes and join them on a common criterion. Next, the dataframe with the image features and the dataframe with the data selected from the csv files are joined on their index (a datetime object). Both the images and the csv files are retrieved from the remote drive.
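In isolation, the datetime-index join works like the following sketch (made-up timestamps and values, not my actual data):

import pandas as pd

# Made-up image features indexed by timestamp
features = pd.DataFrame({'area': [2500.0, 3100.0]},
                        index=pd.to_datetime(['2021-01-01 10:00:00.1',
                                              '2021-01-01 10:00:00.3']))

# Made-up machine data sampled at slightly different timestamps
machine = pd.DataFrame({'XPos': [0.0, 1.5, 3.0]},
                       index=pd.to_datetime(['2021-01-01 10:00:00.0',
                                             '2021-01-01 10:00:00.2',
                                             '2021-01-01 10:00:00.4']))

# An outer join keeps every timestamp from both frames; the gaps become NaN,
# which interpolation then fills before the remaining NaN rows are dropped
joined = features.join(machine, how='outer').interpolate().dropna()
print(joined)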

The full code is shown below:

# Path to folder with tiff images 
path = "Z:\\GroupA\\SharedFolder\\Cameras\\Camera1\\Sample_folder1\\sample1\\Images\\*.tif*"
path_to_csv = "Z:\\GroupA\\SharedFolder\\Cameras\\Camera1\\Sample_folder1\\sample1"

# Process parameters of this sample (added later as constant columns)
power = 168
velocity = 1500

# Creating an empty list for time stamps of tiff images
datetime_list1 = []

# Importing libraries
import time
from glob import glob
import datetime
import pandas as pd
from PIL.TiffTags import TAGS
from PIL import Image
import numpy as np
from matplotlib import pyplot as plt
#from scipy import ndimage
from skimage import measure
from skimage.filters import gaussian, threshold_otsu

properties = ['area','perimeter','major_axis_length','max_intensity']

dataframe = pd.DataFrame(columns=properties)                                                                
                                                                
# Loading the machine log csv files; the position channels are re-zeroed to their first sample
feedrate = pd.read_csv(path_to_csv + '\\Feedrate.csv',names=['Feedrate','Time'])
LaserPower = pd.read_csv(path_to_csv + '\\LaserPower.csv',names=['LaserPower','Time'])
LaserStatus = pd.read_csv(path_to_csv + '\\LaserStatus.csv',names=['LaserStatus','Time'])
x_pos = pd.read_csv(path_to_csv + '\\XPos.csv',names = ['XPos','TimeX'])
x_pos.XPos = x_pos.XPos-x_pos.XPos[0]
y_pos = pd.read_csv(path_to_csv + '\\YPos.csv',names = ['YPos','TimeY'])
y_pos.YPos = y_pos.YPos-y_pos.YPos[0] 
z_pos = pd.read_csv(path_to_csv + '\\ZPos.csv',names = ['ZPos','TimeZ'])
z_pos.ZPos = z_pos.ZPos-z_pos.ZPos[0]

# Converting times (stored as strings in the csv files) to datetime format
x_pos['TimeX'] = pd.to_datetime(x_pos['TimeX'])

y_pos['TimeY'] = pd.to_datetime(y_pos['TimeY'])

z_pos['TimeZ'] = pd.to_datetime(z_pos['TimeZ'])

feedrate['Time'] = pd.to_datetime(feedrate['Time'])

LaserPower['Time'] = pd.to_datetime(LaserPower['Time'])

LaserStatus['Time'] = pd.to_datetime(LaserStatus['Time'])

# LaserPower*LaserStatus indicates when the laser is on or off
Lp_Ls = pd.DataFrame(LaserPower['LaserPower']*LaserStatus['LaserStatus'],columns=['LP*LS'])

Lp_Ls['Feedrate'] = feedrate['Feedrate']
Lp_Ls['X'] = x_pos['XPos']
Lp_Ls['Y'] = y_pos['YPos']
Lp_Ls['Z'] = z_pos['ZPos']
# Time of the dataframe is set to be the same as in LaserPower dataframe
Lp_Ls['Time'] = LaserPower['Time']
# Selecting rows with non-zero values only
Lp_Ls = Lp_Ls.loc[(Lp_Ls['LP*LS']!=0)]
Lp_Ls.set_index('Time',inplace=True)
Lp_Ls.sort_index(ascending=True,inplace=True)

#%% Image Processing
# Scanning path folder for tif files 
for file in glob(path):
    # Accept only files whose last digit in the file name is even, to limit the number of files passed for processing
    if int(file[-5]) % 2 == 0:
        # Open image file for reading
        with Image.open(file) as f:
            imarray = np.asarray(f)
           
            if imarray.max() > 100:      
                # Denoising by Gaussian filter
                blurred = gaussian(imarray, sigma=.8)

                # Threshold the blurred image to a binary mask using Otsu's method
                binary = blurred > threshold_otsu(blurred)

                # Label melt pool
                labels = measure.label(binary,connectivity=imarray.ndim)

                # Retrieve properties of labelled melt pools and put them into a dataframe
                regions = measure.regionprops_table(labels,imarray,properties=properties)

                # Properties (area, intensity, length , etc.) of a single TIFF image
                data = pd.DataFrame(regions)
               
                for i in data['area']:
                    if 2000 < i < 15000:
                        selected_data = data[data['area'] == i]

                        # Extracting metadata from the TIFF image
                        meta_dict = {TAGS[key] : f.tag[key] for key in f.tag.keys()}

                        # Extracting DateTime from metadata dictionary
                        datetime_tag = datetime.datetime.strptime(meta_dict['DateTime'][0],'%H:%M:%S.%f ')
                        # Append to a list
                        datetime_list1.append(datetime_tag)

                        # Concatenate the properties to another dataframe, accumulating the properties of all images
                        dataframe = pd.concat([dataframe,selected_data])
                        
   
# Collect the image timestamps into a dataframe and parse them to datetime64
df_tiff = pd.DataFrame(datetime_list1,columns=['Time'])
df_tiff['Time'] = pd.to_datetime(df_tiff['Time'])

# Joining dataframes            
dataframe.reset_index(drop=True,inplace=True)
dataframe = dataframe.join(df_tiff)   
dataframe.set_index('Time',inplace=True)

# Adding power and velocity as columns to dataframe with meltpool areas
dataframe['Power'] = np.ones(len(dataframe)) * power
dataframe['Velocity'] = np.ones(len(dataframe)) * velocity

# Sorting images chronologically 
dataframe.sort_index(ascending=True,inplace=True)

# Find time difference between the 1st image and when LaserPower*LaserStatus != 0
difference_in_time = dataframe.index[0] - Lp_Ls.index[0]
Lp_Ls.index = Lp_Ls.index + difference_in_time

# Joining dataframes
data_frames = dataframe.join(Lp_Ls,how='outer')

# Interpolating the joined data to fill the gaps introduced by the outer join
cols = ['Power','Velocity','LP*LS','Feedrate','X','Y','Z']
data_frames[cols] = data_frames[cols].interpolate()

# Removing rows with NaN values
data_frames.dropna(inplace=True)

#%% Saving the dataframe as an excel file

data_frames.to_excel('path to a folder on my local drive',index=True)

FileCreateError: [Errno 24] occurs when data_frames has shape (6023, 13). With up to roughly 4000 rows in data_frames the error does not occur. Can anyone suggest how to change the code to make the file handling more efficient?
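For reference, the number of files the process holds open can be monitored with something like this (a sketch assuming psutil is available; it is not part of the processing code):

import psutil

proc = psutil.Process()

# Files the current process has open right now
print(len(proc.open_files()))

# On Windows, the total OS handle count is also available
print(proc.num_handles())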

Tags: windows-10, spyder, python-3.8
