首页 > 解决方案 > 用 python 转换一个 .dat 文件

问题描述

我们有一个.dat在文本编辑器中完全不可读的文件。该.dat文件包含有关材料的信息,我在可以显示数据的机器上使用。

现在我们要将这些文件用于其他目的,最终需要将它们转换为可读格式。当我在 NOtepad++ 中打开 .dat 文件时,我看到的是:

2ñg½x¹¾T?B½ÛÁ@½^fÓ¼ ":°êȽ¸ô»YY‚½g *½$S)¼¤“è¼F„J¼c1¼$ ¼*‡Ç»½Ú7¼F]S¼Ê(Ï<(‚¤½Y´½å½@ø;N‡;o¸¼¨*S:ΣC¼ÎÀR½žO<š_å¼T÷½p4>½8«»«=ýÆZ<¿[=²”&lt;æt¼pc»q³<×R<ï4¼}Ž‚8pýw<~ï»z†:Qš¼^Kp;XI=<Ѷ ¼ j½…é-=*Ý=;-X7½ßÓ:<ÐZ<Ás!=²LÀ;æã=võu<„4½§V9„燺ý$D<"Š|»å€,<E{=+»¥;2wN¼¸rF=h®<ç[=²=é\=Îý<...À¦¼Î,è¼ u…&lt;#_.¼¾Ã¨9æ3½Å°“&lt;ª×½°ÇD¼JÝþ»ph{=Ÿv8;Ne¼’Q; ´{»(ì¿<6Þï»éõ¼*p½©m¼ÝM–&lt;ròä¼½™™¼Õö=j|½±‰Í;2¥C¼¯ 輓?½>¼:„3» ­ù¼¦k ¼wÞ¹¼Öm‚»=T¼êy¦¼k[…»ÎÉO¼Žc¼$ï½ÖN;H¼4Ø:8¸ž¼dLý¼ø9ø»cI(;4뼈Q¼ž7½,h?¼À ɼy½Å’œ¼¶Åº¼å"±¼bžu¼ Z;½¨½øáY¼ZÖ»2 ½ð^š<Þ„§&lt;»ƒ<@±c<f<ŸPÝ;‹œlºÐöï»ö²ñ;ÜŠb=¦';f´<ò=¬3B<\mÛ¼¹©»åB<»Xô;€ºp»¸ ±¼‰Øâ¼7Ug¼€÷ø¼lËû»j}»²‘ô;wu½®ö²¼Ÿ„¼ŠÉ¼ÖV8 Š¼‹÷¯¼ål¼é°ª¼‹o4½ðî$<4Q:.A< <Ž¬ë<^·G<n œ<¶l<: è;’MÜ9êÁa<’¢T;~&¼gY®»"P¼¤µº;$H=½…o<6ëæ»ûÒ¼Ê,<‚p½¯À¼@êw»Ír¥¼¸wغA:«<TDI»Nºµ<€ŠMºwnܸ·6:CÕj<àÆ:Dr<7ëo9STÏ<G¼R?M<:)N;.3 <†L<ºZ=I,Y<ñF;iÙ.» pºå0<;:=Tʪ;—ÄË;?'й0Ž:J’J<jR¯»´/½Ô”ؼ•¥˜¼hμd™<9¼i^'<(Šd<ɇÖ#·³È‚»@O ><Úo<Ó¸ <ëî;ÒQ<õöî<#Nm¼öw4¼'O¼v <:3<

我们知道.dat文件中的数据格式如下:

MaterialBase  ThicknessBase  ThicknessIterated  Pixel  Value
Plastic        0              0                   1     -5.662651e-02
Plastic        0              0                   2     -1.501216e-01   
Plastic        0              0                   3     -4.742368e-02

通过搜索大量代码,当然也是在这里,我想出了以下代码:

import time
import binascii
import csv
import serial
import numpy as np

with open('validationData.dat.201805271617', 'rb') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line.strip('|').split()
        newLines.append(newLine)

with open('file.csv', 'w') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)

我现在得到的错误是:

File "c:\Users\joost.bazelmans\Documents\python\dat2csv.py", line 15, in 
<module>
newLine = line.strip('|').split()
TypeError: a bytes-like object is required, not 'str'

看起来脚本正在读取文件,但它不能按|字符拆分它。但我现在迷路了。关于如何继续的任何想法?

编辑 2018-07-23 13:00 根据 guidot 的回复,我们尝试使用.struct。我们现在能够从 .dat 文件中获取浮点值列表。这也是 R 脚本所做的。但在那之后,R 脚本确实将浮点数转换为可读数据。导入结构数据向量 = []

with open('validationData.dat.201805271617', "rb") as f:
    n = 1000000
    count = 0
    byte = f.read(4) # same as size = 4 in R
    while count < n and byte != b"":
        datavector.append(struct.unpack('f',byte))
        count += 1
        byte = f.read(4)

print(datavector)

the result we get is something like this: [(-0.05662650614976883,), (-0.1501215696334839,), (-0.047423675656318665,), (-0.04705987498164177,), (-0.025805648416280746,), (0.0006194132147356868,), (-0.09810388088226318,) , (-0.007468236610293388,), (-0.06364697962999344,), (-0.04153480753302574,), (-0.010334763675928116,), (-0.028390713036060333,), (-0.01236063800752163,), (-0.010809036903083324,), (-0.0195484422147274,), ( -0.006089110858738422,), (-0.011221584863960743,), (-0.012900656089186668,), (0.02528800442814827,), (-0.0803263783454895,), (-0.03630480542778969,), (-0.03244496509432793,), (0.007571130990982056,), (0.004120028577744961,), (-0.022513896226882935,), (0.0008055367507040501,), (-0.011940909549593925,), (-0.05145340412855148,), (0.008258728310465813,), (-0.02799968793988228,), (-0.035880401730537415,), (-0.04643672704696655,), (-0.005221989005804062,), (0.03542486950755119,), (0.013353106565773487,), (0.035976167768239975,), (0.008336232975125313,), (-0.01492307148873806,), (-0.003470425494015217,), (0.02190450392663479,), (0.012822589837014675,), (-0.008801682852208614, ), (6.225423567229882e-05,), (0.015136107802391052,), (-0.007297097705304623,), (0.0010259768459945917,), (-0.018891485407948494,), (0.0036666016094386578,), (0.01155313104391098,), (-0.009809211827814579,), (- 0.03696637228131294,), (0.04245902970433235,), (0.002897093538194895,), (-0.04476182535290718,), (0.011403053067624569,), (0.01330728828907013,), (0.03941703215241432,), (0.005868517793715,), (0.031955622136592865,), (0.015012135729193687,), (-0.0439620167016983,), (0.00014146660396363586,), (-0.0010368679650127888,), (0.011971709318459034,), (-0.003853448200970888,), (0.010528777725994587,), (0.06129004433751106,), (0.00505771255120635,), (-0.012601660564541817,), (0.01481446623802185,), (0.019019771367311478,), (0.004633020609617233,), (-0.021741455420851707,), (-0.033449672162532806,), (-0.021316081285476685,), (0.00593474181368947 ,), (0.0030296281911432743,), (0.023055575788021088,), (0.0256675872951746,), (0.03663543984293938,), (0.044298700988292694,), (0.01264342200011015,), (0.032493121922016144,), (-0.06546197831630707,), (0.031123168766498566,), ( 0.005013703368604183,), (-0.006611336953938007,), (-0.041526272892951965,), (0.0007577596697956324,), (0.030475322157144547,), (0.034476157277822495,), (-0.015037396922707558,), (0.07587681710720062,)](-0.021316081285476685,), (0.00593474181368947,), (0.0030296281911432743,), (0.023055575788021088,), (0.0256675872951746,), (0.03663543984293938,), (0.044298700988292694,), (0.01264342200011015,), (0.032493121922016144,), (-0.06546197831630707, ), (0.031123168766498566,), (0.005013703368604183,), (-0.006611336953938007,), (-0.041526272892951965,), (0.0007577596697956324,), (0.030475322157144547,), (0.034476157277822495,), (-0.015037396922707558,), (0.07587681710720062,)](-0.021316081285476685,), (0.00593474181368947,), (0.0030296281911432743,), (0.023055575788021088,), (0.0256675872951746,), (0.03663543984293938,), (0.044298700988292694,), (0.01264342200011015,), (0.032493121922016144,), (-0.06546197831630707, ), (0.031123168766498566,), (0.005013703368604183,), (-0.006611336953938007,), (-0.041526272892951965,), (0.0007577596697956324,), (0.030475322157144547,), (0.034476157277822495,), (-0.015037396922707558,), (0.07587681710720062,)]030475322157144547,), (0.034476157277822495,), (-0.015037396922707558,), (0.07587681710720062,)]030475322157144547,), (0.034476157277822495,), (-0.015037396922707558,), (0.07587681710720062,)]

现在的问题是如何将这些浮点数转换为可读内容

标签: pythoncsv

解决方案


由于您以二进制格式打开文件mode='rb',因此我认为您可能只需要指定一个类似字节的字符即可:

newLine = line.strip(b'|').split()

推荐阅读