How to resolve the pandas length error for rows/columns

Problem description

I asked an SO question here and was lucky enough to get an answer from @Scott Boston.

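However, when I read a text file whose rows do not all have the same number of fields, I ran into a related error, ValueError: Columns must be same length as key. I tried searching Google but found no answer that fits, because I do not want the longer lines to be skipped.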

Error

b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'

My pandas DataFrame builder

#!/usr/bin/python3
import pandas as pd
#
cvc_file = pd.read_csv('kids_cvc',header=None,error_bad_lines=False)
cvc_file[['cols', 0]] = cvc_file[0].str.split(':', expand=True) #Split first column on ':'
df = cvc_file.set_index('cols').transpose()  #set_index and transpose
print(df)

Result

$ ./read_cvc.py
b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'
cols     ab     ad     an     ed    eg     et     en     eck     ell     it     id     ig     im     ish     ob     og      ock     ut     ub     ug     um     un     ud     uck     ush
0       cab    bad    ban    bed   beg    bet    den    beck    bell    bit    bid    big    dim    fish    cob    bog     dock    but    cub    bug    bum    bun    bud    buck    gush
1       dab    dad    can    fed   keg    get    hen    deck    cell    fit    did    dig    him    dish    gob    cog     lock    cut    hub    dug    gum    fun    cud    duck    hush
2       gab    had    fan    led   leg    jet    men    neck    dell    hit    hid    fig    rim    wish    job    dog     rock    gut    nub    hug    hum    gun    dud    luck    lush
3       jab    lad    man    red   peg    let    pen    peck    jell    kit    kid    gig   brim   swish    lob    fog     sock    hut    rub    jug    mum    nun    mud    muck    mush
4       lab    mad    pan    wed   NaN    met    ten   check    sell    lit    lid    jig   grim     NaN    mob    hog     tock    jut    sub    lug    sum    pun   spud    puck    rush
5       nab    pad    ran   bled   NaN    net   then   fleck    tell    pit    rid    pig   skim     NaN    rob    jog    block    nut    tub    mug   chum    run   stud    suck   blush

File contents

$ cat kids_cvc
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
an: ban, can, fan, man, pan, ran, tan, van, clan, plan, scan, than
ag: bag, gag, hag, lag, nag, rag, sag, tag, wag, brag, drag, flag, snag, stag
ap: cap, gap, lap, map, nap, rap, sap, tap, yap, zap, chap, clap, flap, slap, snap, trap
am: bam, dam, ham, jam, ram, yam, clam, cram, scam, slam, spam, swam, tram, wham
ack: back, hack, jack, lack, pack, rack, sack, tack, black, crack, shack, snack, stack, quack, track
ash: bash, cash, dash, gash, hash, lash, mash, rash, sash, clash, crash, flash, slash, smash
ed: bed, fed, led, red, wed, bled, bred, fled, pled, sled, shed
eg: beg, keg, leg, peg
et: bet, get, jet, let, met, net, pet, set, vet, wet, yet, fret
en: den, hen, men, pen, ten, then, when
eck: beck, deck, neck, peck, check, fleck, speck, wreck
ell: bell, cell, dell, jell, sell, tell, well, yell, dwell, shell, smell, spell, swell
it: bit, fit, hit, kit, lit, pit, sit, wit, knit, quit, slit, spit
id: bid, did, hid, kid, lid, rid, skid, slid
ig: big, dig, fig, gig, jig, pig, rig, wig, zig, twig
im: dim, him, rim, brim, grim, skim, slim, swim, trim, whim
ip: dip, hip, lip, nip, rip, sip, tip, zip, chip, clip, drip, flip, grip, ship, skip, slip, snip, trip, whip
ick: kick, lick, nick, pick, sick, tick, wick, brick, chick, click, flick, quick, slick, stick, thick, trick
ish: fish, dish, wish, swish
in: bin, din, fin, pin, sin, tin, win, chin, grin, shin, skin, spin, thin, twin
ot: cot, dot, got, hot, jot, lot, not, pot, rot, tot, blot, knot, plot, shot, slot, spot
ob: cob, gob, job, lob, mob, rob, sob, blob, glob, knob, slob, snob
og: bog, cog, dog, fog, hog, jog, log, blog, clog, frog
op: cop, hop, mop, pop, top, chop, crop, drop, flop, glop, plop, shop, slop, stop
ock: dock, lock, rock, sock, tock, block, clock,  flock, rock, shock, smock, stock
ut: but, cut, gut, hut, jut, nut, rut, shut
ub: cub, hub, nub, rub, sub, tub, grub, snub, stub
ug: bug, dug, hug, jug, lug, mug, pug, rug, tug, drug, plug, slug, snug
um: bum, gum, hum, mum, sum, chum, drum, glum, plum, scum, slum
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush

Note:

pandas treats the first line, which has 13 values, as the reference row and skips every line that has more than 13 fields.
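
To see which lines trip the parser, a minimal sketch in plain Python (not pandas internals) can count the comma-separated fields on each line and compare them with the first line. It assumes, as the log above suggests, that the expected width is taken from the first line. The sample holds the first three lines of kids_cvc; `field_counts` and `too_long` are illustrative names.

```python
# First three lines of kids_cvc, copied from the question.
sample = """\
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
"""

# Count the comma-separated fields on each line, as a CSV reader would see them.
field_counts = [len(line.split(",")) for line in sample.splitlines()]

# Assumption: the first line sets the expected number of fields.
expected = field_counts[0]

# 1-based numbers of the lines that have more fields than expected.
too_long = [n for n, count in enumerate(field_counts, start=1) if count > expected]

print(field_counts)
print(too_long)
```

Line 2 has 14 fields, which matches the first "Skipping line 2" message. Line 3 is shorter than the first line and is not reported as skipped.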

Tags: python-3.x, pandas, dataframe

Solution


I can't think of a pandas way to expand the columns, but converting the rows to a dictionary makes things easier.

ss = '''
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
.......
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush
'''.strip()

with open ('kids.cvc','w') as f: f.write(ss)  # write data file

######################################

import pandas as pd

dd = {}
with open('kids.cvc') as f:
   for line in f:
      line = line.strip()  # remove \n
      if not line:
         continue  # skip blank lines
      key, words = line.split(':', 1)  # header : words
      dd[key.strip()] = [w.strip() for w in words.split(',')]  # split words, drop spaces

maxcnt = max(len(words) for words in dd.values())  # length of the longest word list

for k in dd:
   dd[k] = dd[k] + ['']*(maxcnt-len(dd[k])) # pad with '' to match the longest column

df = pd.DataFrame(dd) # convert dictionary to dataframe
print(df.to_string(index=False))

Output

   ab    at    ad    an    ag    ap    am    ack    ash    ed   eg    et    en    eck    ell    it    id    ig    im    ip    ick    ish    in    ot    ob    og    op    ock    ut    ub    ug    um    un    ud    uck    ush
  cab   bat   bad   ban   bag   cap   bam   back   bash   bed  beg   bet   den   beck   bell   bit   bid   big   dim   dip   kick   fish   bin   cot   cob   bog   cop   dock   but   cub   bug   bum   bun   bud   buck   gush
  dab   cat   dad   can   gag   gap   dam   hack   cash   fed  keg   get   hen   deck   cell   fit   did   dig   him   hip   lick   dish   din   dot   gob   cog   hop   lock   cut   hub   dug   gum   fun   cud   duck   hush
  gab   fat   had   fan   hag   lap   ham   jack   dash   led  leg   jet   men   neck   dell   hit   hid   fig   rim   lip   nick   wish   fin   got   job   dog   mop   rock   gut   nub   hug   hum   gun   dud   luck   lush
  jab   hat   lad   man   lag   map   jam   lack   gash   red  peg   let   pen   peck   jell   kit   kid   gig  brim   nip   pick  swish   pin   hot   lob   fog   pop   sock   hut   rub   jug   mum   nun   mud   muck   mush
  lab   mat   mad   pan   nag   nap   ram   pack   hash   wed        met   ten  check   sell   lit   lid   jig  grim   rip   sick          sin   jot   mob   hog   top   tock   jut   sub   lug   sum   pun  spud   puck   rush
  nab   pat   pad   ran   rag   rap   yam   rack   lash  bled        net  then  fleck   tell   pit   rid   pig  skim   sip   tick          tin   lot   rob   jog  chop  block   nut   tub   mug  chum   run  stud   suck  blush
  tab   rat   sad   tan   sag   sap  clam   sack   mash  bred        pet  when  speck   well   sit  skid   rig  slim   tip   wick          win   not   sob   log  crop  clock   rut  grub   pug  drum   sun  thud   tuck  brush
 blab   sat   tad   van   tag   tap  cram   tack   rash  fled        set        wreck   yell   wit  slid   wig  swim   zip  brick         chin   pot  blob  blog  drop  flock  shut  snub   rug  glum  spun         yuck  crush
 crab   vat  glad  clan   wag   yap  scam  black   sash  pled        vet               dwell  knit         zig  trim  chip  chick         grin   rot  glob  clog  flop   rock        stub   tug  plum  stun        chuck  flush
 grab  brat        plan  brag   zap  slam  crack  clash  sled        wet               shell  quit        twig  whim  clip  click         shin   tot  knob  frog  glop  shock              drug  scum              cluck  slush
 scab  chat        scan  drag  chap  spam  shack  crash  shed        yet               smell  slit                    drip  flick         skin  blot  slob        plop  smock              plug  slum              pluck
 stab  flat        than  flag  clap  swam  snack  flash             fret               spell  spit                    flip  quick         spin  knot  snob        shop  stock              slug                    stuck
 slab  gnat              snag  flap  tram  stack  slash                                swell                          grip  slick         thin  plot              slop                     snug                    truck
       spat              stag  slap  wham  quack  smash                                                               ship  stick         twin  shot              stop
                               snap        track                                                                      skip  thick               slot
                               trap                                                                                   slip  trick               spot
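
As a possible alternative to padding the lists by hand, here is a minimal sketch that wraps each word list in a `pd.Series`. When a DataFrame is built from a dict of Series of different lengths, pandas aligns them on the index and fills the gaps with NaN. This is not part of the original answer. The sample uses three lines from kids_cvc through `io.StringIO` instead of the real file, and `columns` is an illustrative name.

```python
import io

import pandas as pd

# Three lines copied from the kids_cvc file in the question.
sample = io.StringIO(
    "ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab\n"
    "eg: beg, keg, leg, peg\n"
    "en: den, hen, men, pen, ten, then, when\n"
)

columns = {}
for line in sample:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    key, _, words = line.partition(":")  # header : words
    # One Series per header; the Series may have different lengths.
    columns[key.strip()] = pd.Series([w.strip() for w in words.split(",")])

# pandas aligns the Series on their index and pads short columns with NaN.
df = pd.DataFrame(columns)

# Replace NaN with '' only for display.
print(df.fillna("").to_string(index=False))
```

The trade-off is that missing cells are NaN rather than empty strings, which can be convenient for later filtering with `dropna` or `isna`.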
