python - R readBin vs. Python struct
Problem description
I am trying to read a binary file using Python. Someone else has read the data in R using the following code:
x <- readBin(webpage, numeric(), n=6e8, size = 4, endian = "little")
myPoints <- data.frame("tmax" = x[1:(length(x)/4)],
"nmax" = x[(length(x)/4 + 1):(2*(length(x)/4))],
"tmin" = x[(2*length(x)/4 + 1):(3*(length(x)/4))],
"nmin" = x[(3*length(x)/4 + 1):(length(x))])
Using Python, I am trying the following code:
import struct

with open('file', 'rb') as f:
    val = f.read(16)
    # f.read() returns bytes, so stop once a full 16-byte record is no longer available
    while len(val) == 16:
        print(struct.unpack('<4f', val))
        val = f.read(16)
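My results are slightly different. For example, the first row in R returns the 4 columns as -999.9, 0, -999.0, 0. Python returns -999.0 for all four columns (pictured below).
I know that they are slicing by the length of the file with some of the [] code, but I don't know exactly how to do this in Python, nor do I understand why they do it. Basically, I want to recreate in Python what R is doing.
I can provide more of either code base if needed. I didn't want to overwhelm with code that wasn't necessary.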
Solution
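Deducing from the R code, the binary file first contains a certain number of tmax's, then the same number of nmax's, then the tmin's, and finally the nmin's. What the code does is read the entire file and then use slicing to chop it up into 4 parts (tmax's, nmax's, etc.).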
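To see why reading 16 bytes at a time produces different rows than R, here is a minimal sketch. It uses synthetic values (not taken from the original file) and assumes the column-by-column layout deduced above; the variable names are illustrative only.

```python
import struct

# Synthetic data (an assumption for illustration): 4 points, stored column by column.
tmax = [-999.0, 1.5, 2.5, 3.5]
nmax = [0.0, 1.0, 2.0, 3.0]
tmin = [-999.0, -1.5, -2.5, -3.5]
nmin = [0.0, 4.0, 5.0, 6.0]
data = struct.pack('<16f', *tmax, *nmax, *tmin, *nmin)

# Reading the first 16 bytes yields four consecutive tmax values, not one point.
first_chunk = struct.unpack('<4f', data[:16])

# Splitting the bytes into four equal blocks and taking element 0 of each
# block yields the first (tmax, nmax, tmin, nmin) point.
count = len(data) // 16          # number of points
block = count * 4                # bytes per column block
columns = [struct.unpack(f'<{count}f', data[i * block:(i + 1) * block])
           for i in range(4)]
first_point = tuple(col[0] for col in columns)

print(first_chunk)
print(first_point)
```

With this layout, the first 16-byte chunk holds only tmax values, which would be consistent with Python printing the same kind of value in all four columns.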
To do the same in Python:
import struct

# Read the entire file into memory first. This is done so we can count
# the number of bytes before parsing them. It is not a very memory
# efficient way, but it's the easiest. The R code as posted wastes even
# more memory: it always takes 6e8 * 4 bytes (~ 2.2Gb) of memory no
# matter how small the file may be.
with open('data.bin', 'rb') as f:
    data = f.read()

# Calculate the number of points in the file. This is
# file size // 16, because there are 4 numeric()'s per
# point, and they are 4 bytes each.
num = len(data) // 16

# Now that we know how many points there are, we take all tmax numbers
# first, then all nmax's, tmin's and lastly all nmin's.
# First generate a format string, because it depends on the number of
# points in the file. For 5 points it looks like "<5f": five 4-byte
# floats, little-endian to match endian = "little" in the R call.
format_string = '<%df' % num

# Then, for cleaner code, calculate the chunk size in bytes that we
# need to slice off each time.
n = num * 4  # 4-byte floats

# Note that Python interprets slicing indices differently from R,
# so no "+1" is needed here as it is in the R code.
tmax = struct.unpack(format_string, data[:n])
nmax = struct.unpack(format_string, data[n:2*n])
tmin = struct.unpack(format_string, data[2*n:3*n])
nmin = struct.unpack(format_string, data[3*n:4*n])

print("tmax", tmax)
print("nmax", nmax)
print("tmin", tmin)
print("nmin", nmin)
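The point about slicing indices can be checked on a small list. The following is a sketch only: `r_slice` is a hypothetical helper that mimics R's 1-based, inclusive `x[first:last]`, and the twelve integers stand in for the values R would have read.

```python
x = list(range(1, 13))   # stand-in for 12 values read from a file
q = len(x) // 4          # length of one quarter, as in length(x)/4 in R


def r_slice(seq, first, last):
    """Hypothetical helper: emulate R's seq[first:last] (1-based, inclusive)."""
    return seq[first - 1:last]


# R: x[(length(x)/4 + 1):(2*(length(x)/4))]   Python: x[q:2*q]
second_block = x[q:2 * q]
same_as_r = r_slice(x, q + 1, 2 * q) == second_block

print(second_block)
print(same_as_r)
```

R counts from 1 and includes the end index, while Python counts from 0 and excludes it, so the "+1" on the start index disappears and the end index stays the same.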
If the goal is to have this data structured as a list of points(?) like (tmax, nmax, tmin, nmin), then append this to the code:
print()
print("Points:")

# Combine ("zip") all 4 sequences into (tmax, nmax, tmin, nmin) points.
# Python has a built-in function to do this at once: zip()
for i, point in enumerate(zip(tmax, nmax, tmin, nmin)):
    print(i, ":", point)
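As a possible alternative to building a long format string, the standard-library `array` module can parse the floats directly. This is a sketch, not the approach from the answer above: `read_columns` is a hypothetical helper, it assumes the same four-block little-endian layout, and the demo file is synthetic.

```python
import array
import os
import sys
import tempfile


def read_columns(path):
    """Hypothetical helper: return (tmax, nmax, tmin, nmin) as arrays of floats."""
    values = array.array('f')
    assert values.itemsize == 4, "expected 4-byte C floats on this platform"
    with open(path, 'rb') as f:
        raw = f.read()
    usable = len(raw) - len(raw) % 16    # ignore any trailing partial point
    values.frombytes(raw[:usable])
    if sys.byteorder == 'big':
        values.byteswap()                # the file is little-endian
    count = len(values) // 4             # number of points
    return tuple(values[i * count:(i + 1) * count] for i in range(4))


# Demo on a synthetic two-point file: tmax block, nmax block, tmin block, nmin block.
sample = array.array('f', [1.0, 2.0, 10.0, 20.0, -1.0, -2.0, 0.0, 5.0])
if sys.byteorder == 'big':
    sample.byteswap()                    # write the file as little-endian
with tempfile.NamedTemporaryFile(delete=False, suffix='.bin') as tmp:
    tmp.write(sample.tobytes())

tmax, nmax, tmin, nmin = read_columns(tmp.name)
os.remove(tmp.name)

points = list(zip(tmax, nmax, tmin, nmin))
print(points)
```

This still reads the whole file into memory, but it stores the values as compact 4-byte floats instead of tuples of Python float objects.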