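python-3.x - How to resolve a pandas length error for rows/columns
Question
I asked an SO question here and was lucky to get an answer from @Scott Boston.
However, I now read a text file whose lines all have different numbers of fields, and I ran into another problem related to the error ValueError: Columns must be same length as key.
I searched Google but found no answer, because I do not want the longer lines to be skipped.
Error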
b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'
My pandas DataFrame builder
#!/usr/bin/python3
import pandas as pd
#
cvc_file = pd.read_csv('kids_cvc',header=None,error_bad_lines=False)
cvc_file[['cols', 0]] = cvc_file[0].str.split(':', expand=True) #Split first column on ':'
df = cvc_file.set_index('cols').transpose() #set_index and transpose
print(df)
Result
$ ./read_cvc.py
b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'
cols ab ad an ed eg et en eck ell it id ig im ish ob og ock ut ub ug um un ud uck ush
0 cab bad ban bed beg bet den beck bell bit bid big dim fish cob bog dock but cub bug bum bun bud buck gush
1 dab dad can fed keg get hen deck cell fit did dig him dish gob cog lock cut hub dug gum fun cud duck hush
2 gab had fan led leg jet men neck dell hit hid fig rim wish job dog rock gut nub hug hum gun dud luck lush
3 jab lad man red peg let pen peck jell kit kid gig brim swish lob fog sock hut rub jug mum nun mud muck mush
4 lab mad pan wed NaN met ten check sell lit lid jig grim NaN mob hog tock jut sub lug sum pun spud puck rush
5 nab pad ran bled NaN net then fleck tell pit rid pig skim NaN rob jog block nut tub mug chum run stud suck blush
File contents
$ cat kids_cvc
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
an: ban, can, fan, man, pan, ran, tan, van, clan, plan, scan, than
ag: bag, gag, hag, lag, nag, rag, sag, tag, wag, brag, drag, flag, snag, stag
ap: cap, gap, lap, map, nap, rap, sap, tap, yap, zap, chap, clap, flap, slap, snap, trap
am: bam, dam, ham, jam, ram, yam, clam, cram, scam, slam, spam, swam, tram, wham
ack: back, hack, jack, lack, pack, rack, sack, tack, black, crack, shack, snack, stack, quack, track
ash: bash, cash, dash, gash, hash, lash, mash, rash, sash, clash, crash, flash, slash, smash
ed: bed, fed, led, red, wed, bled, bred, fled, pled, sled, shed
eg: beg, keg, leg, peg
et: bet, get, jet, let, met, net, pet, set, vet, wet, yet, fret
en: den, hen, men, pen, ten, then, when
eck: beck, deck, neck, peck, check, fleck, speck, wreck
ell: bell, cell, dell, jell, sell, tell, well, yell, dwell, shell, smell, spell, swell
it: bit, fit, hit, kit, lit, pit, sit, wit, knit, quit, slit, spit
id: bid, did, hid, kid, lid, rid, skid, slid
ig: big, dig, fig, gig, jig, pig, rig, wig, zig, twig
im: dim, him, rim, brim, grim, skim, slim, swim, trim, whim
ip: dip, hip, lip, nip, rip, sip, tip, zip, chip, clip, drip, flip, grip, ship, skip, slip, snip, trip, whip
ick: kick, lick, nick, pick, sick, tick, wick, brick, chick, click, flick, quick, slick, stick, thick, trick
ish: fish, dish, wish, swish
in: bin, din, fin, pin, sin, tin, win, chin, grin, shin, skin, spin, thin, twin
ot: cot, dot, got, hot, jot, lot, not, pot, rot, tot, blot, knot, plot, shot, slot, spot
ob: cob, gob, job, lob, mob, rob, sob, blob, glob, knob, slob, snob
og: bog, cog, dog, fog, hog, jog, log, blog, clog, frog
op: cop, hop, mop, pop, top, chop, crop, drop, flop, glop, plop, shop, slop, stop
ock: dock, lock, rock, sock, tock, block, clock, flock, rock, shock, smock, stock
ut: but, cut, gut, hut, jut, nut, rut, shut
ub: cub, hub, nub, rub, sub, tub, grub, snub, stub
ug: bug, dug, hug, jug, lug, mug, pug, rug, tug, drug, plug, slug, snug
um: bum, gum, hum, mum, sum, chum, drum, glum, plum, scum, slum
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush
Note:
pandas takes the first line, which has 13 values, as the reference and skips every line that has more than 13 fields.
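The behaviour described in the note can be illustrated without pandas. The following is a minimal sketch using only the standard library; it is not pandas' actual parser, and the helper name `find_long_lines` and the three-line sample are assumptions made for illustration. It takes the field count of the first line as the expected width and reports every line that has more fields, which are the lines `read_csv` skipped above.

```python
import csv
import io

# Shortened sample in the same shape as kids_cvc (illustrative data only).
sample = """ab: cab, dab, gab
at: bat, cat, fat, hat
eg: beg, keg
"""

def find_long_lines(text):
    """Return (line_number, expected, seen) for lines wider than line 1."""
    rows = list(csv.reader(io.StringIO(text)))
    expected = len(rows[0])  # the first line fixes the expected field count
    return [
        (number, expected, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) > expected
    ]

for number, expected, seen in find_long_lines(sample):
    print(f"line {number}: expected {expected} fields, saw {seen}")
```

Lines with fewer fields than the first one are not reported here. In the result above they were kept and padded with NaN, as the eg column shows.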
Solution
I could not think of a pandas-native way to extend the columns, but converting the lines into a dictionary makes things easier.
ss = '''
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
.......
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush
'''.strip()
with open('kids.cvc', 'w') as f:
    f.write(ss)  # write data file
######################################
import pandas as pd

dd = {}
maxcnt = 0
with open('kids.cvc') as f:
    lines = f.readlines()
    for line in lines:
        line = line.strip()  # remove \n
        len1 = len(line)  # words have leading space
        line = line.replace(' ', '')
        cnt = len1 - len(line)  # get word (space) count
        if cnt > maxcnt:
            maxcnt = cnt  # max word count
        rec = line.split(':')  # header : words
        dd[rec[0]] = rec[1].split(',')  # split words
for k in dd:
    dd[k] = dd[k] + [''] * (maxcnt - len(dd[k]))  # add extra values to match max column
df = pd.DataFrame(dd)  # convert dictionary to dataframe
print(df.to_string(index=False))
Output
ab at ad an ag ap am ack ash ed eg et en eck ell it id ig im ip ick ish in ot ob og op ock ut ub ug um un ud uck ush
cab bat bad ban bag cap bam back bash bed beg bet den beck bell bit bid big dim dip kick fish bin cot cob bog cop dock but cub bug bum bun bud buck gush
dab cat dad can gag gap dam hack cash fed keg get hen deck cell fit did dig him hip lick dish din dot gob cog hop lock cut hub dug gum fun cud duck hush
gab fat had fan hag lap ham jack dash led leg jet men neck dell hit hid fig rim lip nick wish fin got job dog mop rock gut nub hug hum gun dud luck lush
jab hat lad man lag map jam lack gash red peg let pen peck jell kit kid gig brim nip pick swish pin hot lob fog pop sock hut rub jug mum nun mud muck mush
lab mat mad pan nag nap ram pack hash wed met ten check sell lit lid jig grim rip sick sin jot mob hog top tock jut sub lug sum pun spud puck rush
nab pat pad ran rag rap yam rack lash bled net then fleck tell pit rid pig skim sip tick tin lot rob jog chop block nut tub mug chum run stud suck blush
tab rat sad tan sag sap clam sack mash bred pet when speck well sit skid rig slim tip wick win not sob log crop clock rut grub pug drum sun thud tuck brush
blab sat tad van tag tap cram tack rash fled set wreck yell wit slid wig swim zip brick chin pot blob blog drop flock shut snub rug glum spun yuck crush
crab vat glad clan wag yap scam black sash pled vet dwell knit zig trim chip chick grin rot glob clog flop rock stub tug plum stun chuck flush
grab brat plan brag zap slam crack clash sled wet shell quit twig whim clip click shin tot knob frog glop shock drug scum cluck slush
scab chat scan drag chap spam shack crash shed yet smell slit drip flick skin blot slob plop smock plug slum pluck
stab flat than flag clap swam snack flash fret spell spit flip quick spin knot snob shop stock slug stuck
slab gnat snag flap tram stack slash swell grip slick thin plot slop snug truck
spat stag slap wham quack smash ship stick twin shot stop
snap track skip thick slot
trap slip trick spot
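As a possible variation on the dictionary approach, the manual padding step can be left to pandas by storing each word list as a `pd.Series`: when a DataFrame is built from Series of different lengths, the shorter ones are filled with NaN. This is a sketch under assumptions, not the answerer's method; the helper name `ragged_to_frame` and the shortened sample are made up for illustration.

```python
import io

import pandas as pd

# Shortened sample in the same shape as kids_cvc (illustrative data only).
sample = """ab: cab, dab, gab
at: bat, cat, fat, hat
eg: beg, keg
"""

def ragged_to_frame(lines):
    """Build a DataFrame with one column per 'key: word, word, ...' line."""
    columns = {}
    for line in lines:
        line = line.strip()
        if ':' not in line:
            continue  # ignore blank or malformed lines
        key, _, words = line.partition(':')
        # Series of unequal length are aligned on their index,
        # so shorter columns are padded with NaN automatically.
        columns[key.strip()] = pd.Series([w.strip() for w in words.split(',')])
    return pd.DataFrame(columns)

df = ragged_to_frame(io.StringIO(sample))
print(df.fillna('').to_string(index=False))
```

For the real file, an open file handle can be passed instead of the sample, for example `with open('kids_cvc') as f: df = ragged_to_frame(f)`.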