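python - Reading a large text file with Python is much slower than reading the same file with equivalent code in MATLAB. Any idea why?
Problem description
I have the following MATLAB code for reading a text file. The file is in XML format, but I read it as plain text: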
function [jointAngleData, PositionData, AccelerationData, OrientationData, ...
    AngularVelocityData, AngularAccelerationData, TimeStamps] = ...
    getDatafromMVNX(file, eliminate_samples)
fid=fopen (file);
currentline=fgetl(fid);
jointAngleData =[];
PositionData = [];
AccelerationData = [];
OrientationData = [];
AngularVelocityData = [];
AngularAccelerationData = [];
while ischar(currentline)
if (contains(currentline,'<jointAngle>'))
[data,~]=strsplit(currentline,'<\D*>','DelimiterType', 'RegularExpression');
currentlinedata = str2num(data{2}); %#ok<*ST2NM>
jointAngleData = [jointAngleData ; currentlinedata]; %#ok<*AGROW>
end
if (contains(currentline,'<position>'))
[data,~]=strsplit(currentline,'<\D*>','DelimiterType', 'RegularExpression');
currentlinedata = str2num(data{2});
PositionData = [PositionData ; currentlinedata];
end
if (contains(currentline,'<acceleration>'))
[data,~]=strsplit(currentline,'<\D*>','DelimiterType', 'RegularExpression');
currentlinedata = str2num(data{2});
AccelerationData = [AccelerationData ; currentlinedata];
end
if (contains(currentline,'<orientation>'))
[data,~]=strsplit(currentline,'<\D*>','DelimiterType', 'RegularExpression');
currentlinedata = str2num(data{2});
OrientationData = [OrientationData ; currentlinedata];
end
if (contains(currentline,'<angularVelocity>'))
[data,~]=strsplit(currentline,'<\D*>','DelimiterType', 'RegularExpression');
currentlinedata = str2num(data{2});
AngularVelocityData = [AngularVelocityData ; currentlinedata];
end
if (contains(currentline,'<angularAcceleration>'))
[data,~]=strsplit(currentline,'<\D*>','DelimiterType', 'RegularExpression');
currentlinedata = str2num(data{2});
AngularAccelerationData = [AngularAccelerationData ; currentlinedata];
end
currentline=fgetl(fid);
end
fclose(fid); % release the file handle once all lines have been read
Data_ends = size(jointAngleData,1)-eliminate_samples;
jointAngleData = jointAngleData(1:Data_ends,:);
AccelerationData = AccelerationData(1:Data_ends,:);
OrientationData = OrientationData(4:Data_ends+3,:);
PositionData = PositionData(4:Data_ends+3,:);
AngularVelocityData = AngularVelocityData(1:Data_ends,:);
AngularAccelerationData = AngularAccelerationData(1:Data_ends,:);
TimeStamps = size(OrientationData,1);
end
For the same task, I wrote the following code in Python:
import timeit

import numpy as np
import pandas as pd

def _read_feature_text(line):
    # Extract the text between the opening tag and the closing tag.
    start = line.find('>') + 1
    lend = line.find('</')
    workingportion = line[start:lend]
    # Builds a one-row DataFrame for every matching line.
    return pd.DataFrame([np.fromstring(workingportion, sep=' ')])

def read_mvnx(mvnxfile):
    orientation = pd.DataFrame()
    positions = pd.DataFrame()
    velocities = pd.DataFrame()
    accelerations = pd.DataFrame()
    angularVelocities = pd.DataFrame()
    angularAccelerations = pd.DataFrame()
    jointAngles = pd.DataFrame()
    # Read the lines once; an earlier myfile.read() would leave readlines() empty.
    with open(mvnxfile, "r") as myfile:
        wholefilecontent = myfile.readlines()
    start_time = timeit.default_timer()
    # Note: DataFrame.append was removed in pandas 2.0, so this needs an older pandas.
    for line in wholefilecontent:
        if 'orientation' in line:
            orientation = orientation.append(_read_feature_text(line), ignore_index=True)
        elif 'position' in line:
            positions = positions.append(_read_feature_text(line), ignore_index=True)
        elif 'velocity' in line:
            velocities = velocities.append(_read_feature_text(line), ignore_index=True)
        elif 'acceleration' in line:
            accelerations = accelerations.append(_read_feature_text(line), ignore_index=True)
        elif 'angularVelocity' in line:
            angularVelocities = angularVelocities.append(_read_feature_text(line), ignore_index=True)
        elif 'angularAcceleration' in line:
            angularAccelerations = angularAccelerations.append(_read_feature_text(line), ignore_index=True)
        elif 'jointAngle' in line:
            jointAngles = jointAngles.append(_read_feature_text(line), ignore_index=True)
    elapsed = timeit.default_timer() - start_time
    print(elapsed)
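I even tried regular expressions and the BeautifulSoup package. Neither gave me better timing. Any suggestions as to why? Is there another way to make this faster? By faster, I mean faster than the Python code above.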
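Solution
For my code, I found that what made it so slow was that, after finding the data in each line, I converted it to a DataFrame and appended it to the end of a global DataFrame. That per-line conversion and append made it extremely slow. I fixed it by collecting the data in a NumPy array and then converting the whole array to a DataFrame once at the end.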
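The fix described above could look roughly like the sketch below. It is not the author's actual code: the sample lines, the `TAGS` tuple, and the helper names `parse_values` and `collect_frames` are assumptions made for illustration. Rows are gathered in ordinary Python lists and each DataFrame is built only once.

```python
import numpy as np
import pandas as pd

# Hypothetical sample lines standing in for an MVNX file;
# real files carry many more values per tag.
SAMPLE_LINES = [
    '<frame time="0">',
    "<orientation>1.0 0.0 0.0 0.0</orientation>",
    "<position>0.1 0.2 0.3</position>",
    "<jointAngle>10.0 20.0 30.0</jointAngle>",
    "</frame>",
    '<frame time="1">',
    "<orientation>0.9 0.1 0.0 0.0</orientation>",
    "<position>0.4 0.5 0.6</position>",
    "<jointAngle>11.0 21.0 31.0</jointAngle>",
    "</frame>",
]

# Assumed subset of tags; extend as needed.
TAGS = ("orientation", "position", "jointAngle")


def parse_values(line):
    """Return the numbers between the opening and closing tag as a float array."""
    start = line.find(">") + 1
    end = line.find("</")
    return np.array(line[start:end].split(), dtype=float)


def collect_frames(lines, tags=TAGS):
    """Accumulate rows in plain lists, then build each DataFrame exactly once."""
    rows = {tag: [] for tag in tags}
    for line in lines:
        for tag in tags:
            if f"<{tag}>" in line:
                rows[tag].append(parse_values(line))  # cheap list append
                break
    # One conversion per tag: list of rows -> 2-D array -> DataFrame.
    return {
        tag: pd.DataFrame(np.vstack(r)) if r else pd.DataFrame()
        for tag, r in rows.items()
    }


frames = collect_frames(SAMPLE_LINES)
print(frames["position"].shape)  # (2, 3)
```

Appending to a list does not copy the rows already collected, whereas appending to a DataFrame builds a new DataFrame each time, so the total work grows roughly quadratically with the number of lines.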
I also used the xmltodict package to parse the file as XML instead of parsing it line by line.
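As a rough illustration of parsing the document as XML rather than scanning lines, the sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in for xmltodict, so it is not the author's code. The sample document is a heavily simplified assumption about the MVNX layout, and `read_tag` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

import numpy as np

# Hypothetical, heavily simplified stand-in for an MVNX document.
SAMPLE_XML = """<mvnx>
  <frames>
    <frame time="0">
      <position>0.1 0.2 0.3</position>
      <angularVelocity>1.0 2.0 3.0</angularVelocity>
    </frame>
    <frame time="1">
      <position>0.4 0.5 0.6</position>
      <angularVelocity>4.0 5.0 6.0</angularVelocity>
    </frame>
  </frames>
</mvnx>"""


def read_tag(root, tag):
    """Stack the numeric text of every <tag> element into one 2-D array."""
    rows = [np.array(elem.text.split(), dtype=float) for elem in root.iter(tag)]
    return np.vstack(rows) if rows else np.empty((0, 0))


# For a file on disk, ET.parse(path).getroot() would replace ET.fromstring.
root = ET.fromstring(SAMPLE_XML)
positions = read_tag(root, "position")
angular_velocities = read_tag(root, "angularVelocity")
print(positions.shape)  # (2, 3)
```

Matching on element names avoids the substring tests used in the line-based version. If the real file declares an XML namespace, the tag names passed to `root.iter` would need to be namespace-qualified.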