How to create a DataFrame with a loop?

Problem description

data = {'col1':['Country', 'State', 'City', 'park' ,'avenue'],
       'col2':['County','stats','PARK','Avenue', 'cities']}



    col1     col2
0   Country   County
1   State     stats
2   City      PARK
3   park      Avenue
4   avenue    cities

I am trying to fuzzy-match every name in col1 against every name in col2 and rank the matches by score.

Expected output:

   col1     col2    score  order
0  Country  County  92     1
1  Country  stats   31     2
2  Country  PARK    18     3
3  Country  Avenue  17     4
4  Country  cities  16     5
5  State    County  80     1
6  State    stats   36     2
7  State    PARK    22     3
8  State    Avenue  18     4
9  State    cities  16     5
.....

What I have tried:

'''

from fuzzywuzzy import fuzz
import pandas as pd
import numpy as np

df = pd.DataFrame(data)

# Print the score for every (col1, col2) combination
for i in df.col1:
    for j in df.col2:
        print(i, j, fuzz.token_set_ratio(i, j))

'''

I am stuck here: the loop prints the scores, but I do not know how to collect them into a DataFrame.

Tags: python, python-3.x, pandas, python-2.7, fuzzy-logic

Solution


Let's compute a score for each row:

df['score'] = df.apply(lambda x: fuzz.ratio(x['col1'], x['col2']), axis=1)
df['score']
0    92
1    60
2     0
3     0
4    17
dtype: int64
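
The `apply` call above compares each `col1` value only with the `col2` value on the same row, which gives five scores. The output requested in the question has one row for every `col1`/`col2` combination. A minimal sketch of building that table follows. It assumes a hypothetical `similarity` helper based on the standard library's `difflib`, used here only as a stand-in for `fuzz.token_set_ratio`, so its scores will not match the numbers in the question:

```python
from difflib import SequenceMatcher
from itertools import product

import pandas as pd

data = {'col1': ['Country', 'State', 'City', 'park', 'avenue'],
        'col2': ['County', 'stats', 'PARK', 'Avenue', 'cities']}
df = pd.DataFrame(data)


def similarity(a, b):
    # Hypothetical stand-in scorer (0-100, case-sensitive), not part of the
    # original answer. With fuzzywuzzy installed, fuzz.token_set_ratio(a, b)
    # could be called here instead.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


# One row per (col1, col2) combination: 5 x 5 = 25 rows.
rows = [(a, b, similarity(a, b)) for a, b in product(df['col1'], df['col2'])]
pairs = pd.DataFrame(rows, columns=['col1', 'col2', 'score'])

print(pairs.head())
```

Collecting the rows in a list and constructing the DataFrame once is usually simpler and faster than appending to a DataFrame inside the loop.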

Then rank the scores within each col1 group:

df['order']=(-df['score']).groupby(df['col1']).rank(method='first')
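
To illustrate what this ranking does, here is a small sketch on a table that already holds several `col2` candidates per `col1` value. The scores are copied from the expected output in the question; the `pairs` name is only an assumption for this example:

```python
import pandas as pd

# Scores copied from the expected output shown in the question.
pairs = pd.DataFrame({
    'col1': ['Country'] * 5 + ['State'] * 5,
    'col2': ['County', 'stats', 'PARK', 'Avenue', 'cities'] * 2,
    'score': [92, 31, 18, 17, 16, 80, 36, 22, 18, 16],
})

# Negating the score makes the highest score rank first within each col1
# group; method='first' breaks ties by order of appearance.
pairs['order'] = (
    (-pairs['score']).groupby(pairs['col1']).rank(method='first').astype(int)
)

# Sort so each col1 group lists its best match first.
pairs = pairs.sort_values(['col1', 'order']).reset_index(drop=True)

print(pairs)
```

`rank` returns floats by default, so the sketch casts the result to `int` to match the `order` column in the question.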
