How do I count word frequencies in a dictionary?

Problem description

I have a list of dictionaries like the one below:

[{'mississippi': 1, 'worth': 1, 'reading': 1}, {'commonplace': 1, 'river': 1, 'contrary': 1, 'ways': 1, 'remarkable': 1}, {'considering': 1, 'missouri': 1, 'main': 1, 'branch': 1, 'longest': 1, 'river': 1, 'world--four': 1}, {'seems': 1, 'safe': 1, 'crookedest': 1, 'river': 1, 'part': 1, 'journey': 1, 'uses': 1, 'cover': 1, 'ground': 1, 'crow': 1, 'fly': 1, 'six': 1, 'seventy-five': 1}, {'discharges': 1, 'water': 1, 'st': 1}, {'lawrence': 1, 'twenty-five': 1, 'rhine': 1, 'thirty-eight': 1, 'thames': 1}, {'river': 1, 'vast': 1, 'drainage-basin:': 1, 'draws': 1, 'water': 1, 'supply': 1, 'twenty-eight': 1, 'states': 1, 'territories': 1, 'delaware': 1, 'atlantic': 1, 'seaboard': 1, 'country': 1, 'idaho': 1, 'pacific': 1, 'slope--a': 1, 'spread': 1, 'forty-five': 1, 'degrees': 1, 'longitude': 1}, {'mississippi': 1, 'receives': 1, 'carries': 1, 'gulf': 1, 'water': 1, 'fifty-four': 1, 'subordinate': 1, 'rivers': 1, 'navigable': 1, 'steamboats': 1, 'hundreds': 1, 'flats': 1, 'keels': 1}, {'area': 1, 'drainage-basin': 1, 'combined': 1, 'areas': 1, 'england': 1, 'wales': 1, 'scotland': 1, 'ireland': 1, 'france': 1, 'spain': 1, 'portugal': 1, 'germany': 1, 'austria': 1, 'italy': 1, 'turkey': 1, 'almost': 1, 'wide': 1, 'region': 1, 'fertile': 1, 'mississippi': 1, 'valley': 1, 'proper': 1, 'exceptionally': 1}]

I want to turn it into the output shown below, so that I can compute a similarity score between two target words:

river 4
    ground: 1
    journey: 1
    longitude: 1
    main: 1
    world--four: 1
    contrary: 1
    cover: 1
    delaware: 1
    remarkable: 1
    vast: 1
    forty-five: 1
    crookedest: 1
    territories: 1
    spread: 1
    country: 1
    longest: 1
    fly: 1
    atlantic: 1
    crow: 1
    supply: 1
    seems: 1
    idaho: 1
    seaboard: 1
    states: 1
    ways: 1
    degrees: 1
    part: 1
    twenty-eight: 1
    pacific: 1
    branch: 1
    water: 1
    considering: 1
    six: 1
    safe: 1
    commonplace: 1
    draws: 1
    drainage-basin: 1
    uses: 1
    seventy-five: 1
    slope--a: 1
    missouri: 1
mississippi 3
    area: 1
    steamboats: 1
    germany: 1
    reading: 1
    france: 1
    proper: 1
    fifty-four: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    carries: 1
    combined: 1
    flats: 1
    receives: 1
    england: 1
    italy: 1
    scotland: 1
    wales: 1
    almost: 1
    navigable: 1
    austria: 1
    region: 1
    wide: 1
    spain: 1
    subordinate: 1
    drainage-basin: 1
    hundreds: 1
    keels: 1
    portugal: 1
    water: 1
    gulf: 1
    ireland: 1
    rivers: 1
    valley: 1
    fertile: 1
    worth: 1
water 3
    steamboats: 1
    spread: 1
    country: 1
    states: 1
    longitude: 1
    fifty-four: 1
    pacific: 1
    vast: 1
    subordinate: 1
    carries: 1
    keels: 1
    flats: 1
    supply: 1
    receives: 1
    atlantic: 1
    forty-five: 1
    river: 1
    rivers: 1
    idaho: 1
    mississippi: 1
    seaboard: 1
    navigable: 1
    discharges: 1
    degrees: 1
    twenty-eight: 1
    drainage-basin: 1
    hundreds: 1
    st: 1
    gulf: 1
    draws: 1
    delaware: 1
    territories: 1
    slope--a: 1
drainage-basin 2
    area: 1
    spread: 1
    country: 1
    states: 1
    mississippi: 1
    longitude: 1
    france: 1
    proper: 1
    vast: 1
    turkey: 1
    forty-five: 1
    areas: 1
    combined: 1
    germany: 1
    exceptionally: 1
    valley: 1
    supply: 1
    fertile: 1
    atlantic: 1
    italy: 1
    river: 1
    idaho: 1
    wales: 1
    almost: 1
    seaboard: 1
    spain: 1
    austria: 1
    region: 1
    degrees: 1
    twenty-eight: 1
    wide: 1
    england: 1
    portugal: 1
    water: 1
    ireland: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    scotland: 1
    slope--a: 1
area 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
journey 1
    ground: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
seems 1
    ground: 1
    journey: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
states 1
    spread: 1
    country: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
slope--a 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
remarkable 1
    contrary: 1
    river: 1
    commonplace: 1
    ways: 1
vast 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    pacific: 1
    forty-five: 1
    water: 1
    seaboard: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
forty-five 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    pacific: 1
    water: 1
    seaboard: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
crookedest 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
carries 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
germany 1
    area: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
longest 1
    main: 1
    river: 1
    world--four: 1
    branch: 1
    missouri: 1
    considering: 1
flats 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    rivers: 1
    receives: 1
supply 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    twenty-eight: 1
    river: 1
    idaho: 1
receives 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
crow 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
scotland 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    spain: 1
    italy: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
country 1
    spread: 1
    idaho: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
thames 1
    thirty-eight: 1
    rhine: 1
    lawrence: 1
    twenty-five: 1
england 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    region: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
navigable 1
    mississippi: 1
    steamboats: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
austria 1
    area: 1
    germany: 1
    mississippi: 1
    france: 1
    proper: 1
    region: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    exceptionally: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
rhine 1
    thirty-eight: 1
    thames: 1
    lawrence: 1
    twenty-five: 1
part 1
    ground: 1
    journey: 1
    seems: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
twenty-eight 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
branch 1
    main: 1
    longest: 1
    river: 1
    world--four: 1
    missouri: 1
    considering: 1
hundreds 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
st 1
    water: 1
    discharges: 1
considering 1
    main: 1
    longest: 1
    river: 1
    world--four: 1
    branch: 1
    missouri: 1
six 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    fly: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
gulf 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    flats: 1
    rivers: 1
    receives: 1
ireland 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    valley: 1
safe 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
commonplace 1
    contrary: 1
    river: 1
    remarkable: 1
    ways: 1
draws 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    supply: 1
    delaware: 1
    territories: 1
    atlantic: 1
    twenty-eight: 1
    river: 1
    idaho: 1
delaware 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
thirty-eight 1
    thames: 1
    rhine: 1
    lawrence: 1
    twenty-five: 1
longitude 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
world--four 1
    main: 1
    longest: 1
    river: 1
    branch: 1
    missouri: 1
    considering: 1
lawrence 1
    thirty-eight: 1
    thames: 1
    rhine: 1
    twenty-five: 1
ground 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
steamboats 1
    mississippi: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
spread 1
    seaboard: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
idaho 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
reading 1
    mississippi: 1
    worth: 1
almost 1
    area: 1
    germany: 1
    austria: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    mississippi: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
contrary 1
    river: 1
    remarkable: 1
    commonplace: 1
    ways: 1
cover 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
france 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
spain 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
pacific 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
turkey 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
fifty-four 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    hundreds: 1
    keels: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
subordinate 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
territories 1
    spread: 1
    idaho: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    supply: 1
    atlantic: 1
    slope--a: 1
    river: 1
    country: 1
combined 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
exceptionally 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    region: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
region 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
twenty-five 1
    thirty-eight: 1
    thames: 1
    lawrence: 1
    rhine: 1
rivers 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    receives: 1
fly 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
atlantic 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    river: 1
    supply: 1
    twenty-eight: 1
    idaho: 1
italy 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
main 1
    world--four: 1
    longest: 1
    river: 1
    branch: 1
    missouri: 1
    considering: 1
areas 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
seaboard 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
fertile 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
ways 1
    contrary: 1
    river: 1
    remarkable: 1
    commonplace: 1
discharges 1
    water: 1
    st: 1
degrees 1
    spread: 1
    country: 1
    states: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
wide 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
proper 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
keels 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    water: 1
    fifty-four: 1
    hundreds: 1
    subordinate: 1
    carries: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
portugal 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    ireland: 1
    valley: 1
worth 1
    mississippi: 1
    reading: 1
uses 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    fly: 1
    seventy-five: 1
    river: 1
seventy-five 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    river: 1
    fly: 1
valley 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
missouri 1
    main: 1
    longest: 1
    river: 1
    branch: 1
    world--four: 1
    considering: 1
wales 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1

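The first line of each block is a target word and its frequency across the whole dictionary. The indented lines beneath it are the related words and their frequencies in the same sentences as the target word. Taking the first dictionary as an example, the profile for "mississippi" would contain references to "worth" and "reading", each with a frequency of 1 in that sentence, while the frequency of "mississippi" across the whole dictionary is 3. I want to sort the target words by frequency in descending order. Can anyone help?

Tags: python, dictionary

Solution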


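It's unclear from both your desired output and your code what exactly you want to achieve, but if it's just counting words in individual sentences, the strategy should be:

  1. Read your common.txt into a set for fast lookup.
  2. Read your sample.txt and split on . to get individual sentences.
  3. Clear all non-word characters (you'll have to define them yourself, or use the regex \b to capture word boundaries) and replace them with whitespace.
  4. Split on whitespace and count the words that are not present in the set from step 1.

So: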

import collections

with open("common.txt", "r") as f:  # open the `common.txt` for reading
    common_words = {l.strip().lower() for l in f}  # read each line and add it to a set

interpunction = ";,'\""  # define word separating characters and create a translation table
trans_table = str.maketrans(interpunction, " " * len(interpunction))

sentences_counter = []  # a list to hold a word count for each sentence
with open("sample.txt", "r") as f:  # open the `sample.txt` for reading
    # read the whole file to include linebreaks and split on `.` to get individual sentences
    sentences = [s for s in f.read().split(".") if s.strip()]  # ignore empty sentences
    for sentence in sentences:  # iterate over each sentence
        sentence = sentence.translate(trans_table)  # replace the interpunction with spaces
        word_counter = collections.defaultdict(int)  # a string:int default dict for counting
        for word in sentence.split():  # split the sentence and iterate over the words
            if word.lower() not in common_words:  # count only words not in the common.txt
                word_counter[word.lower()] += 1
        sentences_counter.append(word_counter)  # add the current sentence word count

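Note: on Python 2.x use string.maketrans() instead of str.maketrans().

This produces sentences_counter, a list holding one dictionary of counts for each sentence in sample.txt, where the keys are the actual words and the associated values are the word counts. You can print the result like this: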

for i, v in enumerate(sentences_counter):
    print("Sentence #{}:".format(i+1))
    print("\n".join("\t{}: {}".format(w, c) for w, c in v.items()))

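This yields (for your sample data):

Sentence #1:
    area: 1
    drainage-basin: 1
    great: 1
    combined: 1
    areas: 1
    england: 1
    wales: 1
    wide: 1
    region: 1
    fertile: 1
Sentence #2:
    mississippi: 1
    valley: 1
    proper: 1
    exceptionally: 1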

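Keep in mind that the (English) language is more complex than this. For example, "a cat wags its tail when it's angry, so stay away" will come out very differently depending on how you treat the apostrophe. Also, a dot doesn't necessarily mark the end of a sentence. If you want to do serious linguistic analysis, you should look into NLP.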
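
To make the apostrophe point concrete, here is a small sketch that tokenizes one sentence two ways. The first way is the translation-table approach used above. The second uses a regular expression that keeps in-word apostrophes and hyphens; that pattern (named WORD_RE here) is my own assumption for illustration, not something the question or the code above defines.

```python
import re

sentence = "A cat wags its tail when it's angry"

# Approach used above: turn the listed punctuation into spaces, then split.
interpunction = ";,'\""
trans_table = str.maketrans(interpunction, " " * len(interpunction))
split_tokens = sentence.lower().translate(trans_table).split()

# Alternative (hypothetical pattern): letters/digits, optionally joined
# by a single apostrophe or hyphen, so "it's" stays one token.
WORD_RE = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")
regex_tokens = WORD_RE.findall(sentence.lower())

print(split_tokens)  # "it's" is broken into "it" and "s"
print(regex_tokens)  # "it's" survives as a single token
```

Which behaviour is right depends on what you plan to do with the counts, so treat both as starting points.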

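UPDATE: Although I can't see the usefulness of repeating the same data for every word (the counts don't change within a sentence), if you want to print each word with all the other counts nested beneath it, you can add an inner loop when printing: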

for i, v in enumerate(sentences_counter):
    print("Sentence #{}:".format(i+1))
    for word, count in v.items():
        print("\t{} {}".format(word, count))
        print("\n".join("\t\t{}: {}".format(w, c) for w, c in v.items() if w != word))

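This will give you:

Sentence #1:
    area 1
        drainage-basin: 1
        great: 1
        combined: 1
        areas: 1
        england: 1
        wales: 1
        wide: 1
        region: 1
        fertile: 1
    drainage-basin 1
        area: 1
        great: 1
        combined: 1
        areas: 1
        england: 1
        wales: 1
        wide: 1
        region: 1
        fertile: 1
    great 1
        area: 1
        drainage-basin: 1
        combined: 1
        areas: 1
        england: 1
        wales: 1
        wide: 1
        region: 1
        fertile: 1
    combined 1
        area: 1
        drainage-basin: 1
        great: 1
        areas: 1
        england: 1
        wales: 1
        wide: 1
        region: 1
        fertile: 1
    areas 1
        area: 1
        drainage-basin: 1
        great: 1
        combined: 1
        england: 1
        wales: 1
        wide: 1
        region: 1
        fertile: 1
    england 1
        area: 1
        drainage-basin: 1
        great: 1
        combined: 1
        areas: 1
        wales: 1
        wide: 1
        region: 1
        fertile: 1
    wales 1
        area: 1
        drainage-basin: 1
        great: 1
        combined: 1
        areas: 1
        england: 1
        wide: 1
        region: 1
        fertile: 1
    wide 1
        area: 1
        drainage-basin: 1
        great: 1
        combined: 1
        areas: 1
        england: 1
        wales: 1
        region: 1
        fertile: 1
    region 1
        area: 1
        drainage-basin: 1
        great: 1
        combined: 1
        areas: 1
        england: 1
        wales: 1
        wide: 1
        fertile: 1
    fertile 1
        area: 1
        drainage-basin: 1
        great: 1
        combined: 1
        areas: 1
        england: 1
        wales: 1
        wide: 1
        region: 1
Sentence #2:
    mississippi 1
        valley: 1
        proper: 1
        exceptionally: 1
    valley 1
        mississippi: 1
        proper: 1
        exceptionally: 1
    proper 1
        mississippi: 1
        valley: 1
        exceptionally: 1
    exceptionally 1
        mississippi: 1
        valley: 1
        proper: 1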

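Feel free to remove the sentence printing and drop one level of tab indentation to get something closer to the desired output in your question. If you prefer, you can also build a tree-like dictionary instead of printing everything to STDOUT.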
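
A tree-like dictionary of that kind could be sketched as follows. This is only one reading of the desired output: I'm assuming the number next to each target word is its count summed over all sentences, and that each related word's count is summed over the sentences it shares with the target. The helper name build_profiles and the tiny sentences_counter below are made up for the illustration.

```python
import collections

def build_profiles(sentences_counter):
    """Return (totals, profiles) from a list of per-sentence word counts."""
    totals = collections.Counter()  # word -> count over all sentences
    profiles = collections.defaultdict(collections.Counter)  # word -> co-occurring word counts
    for counts in sentences_counter:
        for word, count in counts.items():
            totals[word] += count
            for other, other_count in counts.items():
                if other != word:  # a word is not its own neighbour
                    profiles[word][other] += other_count
    return totals, profiles

# Tiny stand-in for the per-sentence counts built earlier.
sentences_counter = [
    {"mississippi": 1, "worth": 1, "reading": 1},
    {"river": 1, "water": 1},
    {"mississippi": 1, "water": 1, "gulf": 1},
]

totals, profiles = build_profiles(sentences_counter)
for word, total in totals.most_common():  # highest total frequency first
    print("{} {}".format(word, total))
    for other, count in profiles[word].items():
        print("\t{}: {}".format(other, count))
```

Counter.most_common() takes care of the descending order asked for in the question; words with equal totals come out in whatever order they were first seen.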

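UPDATE 2: You don't have to use a set for common_words if you don't want to. In this case it's pretty much interchangeable with a list, so you can use a list comprehension instead of a set comprehension (i.e. replace the curly braces with square brackets). However, a lookup in a list is an O(n) operation whereas a lookup in a set is an O(1) operation on average, so a set is preferred here. Not to mention the side benefit of automatic de-duplication in case common.txt contains duplicate words.

As for collections.defaultdict(), it's there just to save us some coding/checking by automatically initializing a key in the dictionary whenever it's first requested. Without it you'd have to do that manually: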
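
As a quick illustration of the difference, the sketch below builds both containers from the same input. The lines list is an in-memory stand-in for the contents of common.txt, invented for this example.

```python
# Stand-in for the lines read from common.txt (note the duplicates).
lines = ["the\n", "The\n", "a\n", "of\n", "the\n"]

as_list = [l.strip().lower() for l in lines]  # keeps duplicates; `in` scans the list
as_set = {l.strip().lower() for l in lines}   # de-duplicated; `in` is a hash lookup

print(len(as_list), len(as_set))          # the set is smaller once duplicates collapse
print("the" in as_list, "the" in as_set)  # same answer either way; only the cost differs
```

For a short stop-word file the speed difference is unlikely to be noticeable; it starts to matter when the membership test runs once per word of a large text.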

with open("common.txt", "r") as f:  # open the `common.txt` for reading
    common_words = {l.strip().lower() for l in f}  # read each line and add it to a set

interpunction = ";,'\""  # define word separating characters and create a translation table
trans_table = str.maketrans(interpunction, " " * len(interpunction))

sentences_counter = []  # a list to hold a word count for each sentence
with open("sample.txt", "r") as f:  # open the `sample.txt` for reading
    # read the whole file to include linebreaks and split on `.` to get individual sentences
    sentences = [s for s in f.read().split(".") if s.strip()]  # ignore empty sentences
    for sentence in sentences:  # iterate over each sentence
        sentence = sentence.translate(trans_table)  # replace the interpunction with spaces
        word_counter = {}  # initialize a word counting dictionary
        for word in sentence.split():  # split the sentence and iterate over the words
            word = word.lower()  # turn the word to lowercase
            if word not in common_words:  # count only words not in the common.txt
                word_counter[word] = word_counter.get(word, 0) + 1  # increase the last count
        sentences_counter.append(word_counter)  # add the current sentence word count
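
If you'd rather not manage the counting yourself at all, collections.Counter from the standard library is a third option alongside defaultdict and a plain dict. The snippet below is a minimal sketch for a single sentence; common_words and sentence are small made-up stand-ins rather than the contents of your files.

```python
import collections

common_words = {"the", "is", "a"}  # stand-in for the contents of common.txt
sentence = "The river is a river"  # stand-in for one sentence of sample.txt

# Lowercase every word, drop the common ones, and let Counter do the tallying.
words = (w.lower() for w in sentence.split())
word_counter = collections.Counter(w for w in words if w not in common_words)

print(word_counter)
```

A Counter is a dict subclass, so the printing loops shown earlier work on it unchanged.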

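UPDATE 3: If you just want a raw word list, as your last update to the question seems to suggest, you don't even need to consider the sentences themselves. Just add a dot to the interpunction list, read the file line by line, split on whitespace and count the words as before: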

import collections

with open("common.txt", "r") as f:  # open the `common.txt` for reading
    common_words = {l.strip().lower() for l in f}  # read each line and add it to a set

interpunction = ";,'\"."  # define word separating characters and create a translation table
trans_table = str.maketrans(interpunction, " " * len(interpunction))

word_counter = collections.defaultdict(int)  # a string:int default dict for counting
with open("sample.txt", "r") as f:  # open the `sample.txt` for reading
    for line in f:  # read the file line by line
        for word in line.translate(trans_table).split():  # remove interpunction and split
            if word.lower() not in common_words:  # count only words not in the common.txt
                word_counter[word.lower()] += 1  # increase the count

print("\n".join("{}: {}".format(w, c) for w, c in word_counter.items()))  # print the counts
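
The loop above prints the words in whatever order the dictionary holds them. Since the question asks for descending frequency, one way to get that (a sketch, using an invented word_counter so it runs on its own) is to sort the items before printing:

```python
# Stand-in for the counts gathered above.
word_counter = {"river": 4, "area": 1, "mississippi": 3, "water": 3}

# Sort by count, highest first; ties are broken alphabetically so the
# order is the same on every run.
ranked = sorted(word_counter.items(), key=lambda item: (-item[1], item[0]))

print("\n".join("{}: {}".format(w, c) for w, c in ranked))
```

The alphabetical tie-break is my own choice; drop the second element of the key if you don't care about the order of words with equal counts.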
