Append a matching column in a dataframe

Problem description

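I have two datasets that I match on company name (wharton['conm'] against xls['coname']); that is what the first lines of my code do. I also want to add a column named HQ_LOC to the excel dataframe that holds, for each matched_name, the corresponding value of wharton['city'].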

The wharton dataframe looks like this:

GVKEY   LPERMCO datadate    fyear   indfmt  consol  popsrc  datafmt tic conm    ... city    conml   county  ggroup  gind    gsector gsubind idbflag state   weburl
0   1004    20000   20090531    2008.0  INDL    C   D   STD AIR AAR CORP    ... Wood Dale   AAR Corp    NaN 2010.0  201010.0    20.0    20101010.0  D   IL  www.aarcorp.com
1   1004    20000   20100531    2009.0  INDL    C   D   STD AIR AAR CORP    ... Wood Dale   AAR Corp    NaN 2010.0  201010.0    20.0    20101010.0  D   IL  www.aarcorp.com
2   1004    20000   20110531    2010.0  INDL    C   D   STD AIR AAR CORP    ... Wood Dale   AAR Corp    NaN 2010.0  201010.0    20.0    20101010.0  D   IL  www.aarcorp.com
3   1004    20000   20120531    2011.0  INDL    C   D   STD AIR AAR CORP    ... Wood Dale   AAR Corp    NaN 2010.0  201010.0    20.0    20101010.0  D   IL  www.aarcorp.com
4   1004    20000   20130531    2012.0  INDL    C   D   STD AIR AAR CORP    ... Wood Dale   AAR Corp    NaN 2010.0  201010.0    20.0    20101010.0  D   IL  www.aarcorp.com
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..

wharton['conm'] looks like this:

0                           AAR CORP
1                           AMERICAN AIRLINES
2                           APPLE
3                           x
4                           y

excel looks like this:

top10   FiscalYear  coname  H1B_DEPENDENT   firm_indian EmployerCity    EmployerState   total_petitions petitiontype_new    petitiontype_cont   ... salary_med  salary_med_new  salary_med_cont salary_med_new_app  salary_med_new_den  prowess_h1b compustat_h1b   prowess_compustat_h1b   matched_name    HQ_LOC
0   1.0 2010    INFOSYS LTD.    1   0   PLANO   TX  7546    4028    3245    ... 60000.0 60000.0 60000.0 60000.0 60000.0 1   1   1   NaN NaN
1   2.0 2010    COGNIZANT TECH SOLUTIONS    1   0   TEANECK NJ  7273    4678    1310    ... 65312.0 59600.0 59600.0 59779.0 56500.0 1   1   1   NaN NaN
2   3.0 2010    MICROSOFT CORP  0   0   NAPERVILLE  IL  3696    1947    1283    ... 91000.0 88000.0 88000.0 88000.0 102035.0    1   1   1   NaN NaN
3   4.0 2010    WIPRO LTD.  1   0   EAST BRUNSWICK  NJ  3398    2044    320 ... 63586.0 63856.0 63856.0 63856.0 60000.0 1   1   1   NaN NaN
4   5.0 2010    DELOITTE CONSULTING (I) PVT. LTD.   0   1   PHILADELPHIA    PA  1522    546 363 ... 85000.0 68000.0 68000.0 68000.0 68000.0 1   0   0   NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
19656   NaN 2019    ZSCALER INC 0   0   SAN JOSE    CA  7   4   1   ... 145000.0    110864.0    110864.0    110864.0    NaN 0   1   0   NaN NaN
19657   NaN 2019    ZULILY INC  0   0   SEATTLE WA  1   1   0   ... 107000.0    107000.0    107000.0    107000.0    NaN 0   1   0   NaN NaN
19658   NaN 2019    ZUORA INC   0   0   SAN MATEO   CA  6   3   0   ... 152500.0    170000.0    170000.0    170000.0    NaN 0   1   0   NaN NaN
19659   NaN 2019    ZYLOG SYSTEMS LTD.  1   1   EDISON  NJ  16  0   12  ... 77022.0 NaN NaN NaN NaN 1   0   0   NaN NaN
19660   NaN 2019    ZYNGA INC   0   0   SAN FRANCISCO   CA  2   1   1   ... 110500.0    83000.0 83000.0 83000.0 NaN 0   1   0   NaN NaN

I want excel['HQ_LOC'] to look like this:

0                        New York City
1                        Atlanta
2                        Cupertino
3                        location x
4                        location y

My current code:

import pandas as pd

# keep only the excel rows flagged as Compustat matches
xls = excel[(excel['prowess_compustat_h1b'] == 1) | (excel['compustat_h1b'] == 1)]
excel['matched_name'] = wharton['conm'][wharton['conm'].isin(xls['coname'])] #adds matched_name of wharton to excel sheet
#append city and state of the corresponding coname of matched_name
excel['HQ_LOC'] = pd.Series([])
for i in wharton['matched_name']:
    for j in wharton['city']:
        if i in excel['matched_name']:
            excel['HQ_LOC'].append(j)
print(excel['HQ_LOC'])

But when I run the code above, I get the following output (and it runs very slowly):

0       NaN
1       NaN
2       NaN
3       NaN
4       NaN
         ..
19656   NaN
19657   NaN
19658   NaN
19659   NaN
19660   NaN
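The all-NaN result can be reproduced on a tiny frame. The sketch below uses made-up values (`df`, `names` and `wharton_conm` are hypothetical stand-ins for the question's frames) to show three pandas behaviours at play: assigning `pd.Series([])` to a column aligns it to the frame's index and fills every row with NaN; `x in series` checks index labels rather than values, so with excel's default integer index the `if` in the loop is never true for a company name; and assigning a filtered Series aligns on index labels, so `matched_name` lands in the rows that had those labels in `wharton`, not next to the matching `coname`.

```python
import pandas as pd

# Hypothetical stand-in for `excel`
df = pd.DataFrame({"coname": ["AAR CORP", "APPLE", "ZYNGA INC"]})

# 1. An empty Series is aligned to df.index on assignment, so every row is NaN.
df["HQ_LOC"] = pd.Series([], dtype=object)

# 2. `x in series` tests the index labels, not the values.
names = pd.Series(["AAR CORP", "APPLE"])
print("AAR CORP" in names)          # False: a value, not an index label
print(0 in names)                   # True: 0 is an index label
print("AAR CORP" in names.values)   # True: membership test on the values

# 3. A filtered Series keeps its original labels (here 0 and 2), and the
#    assignment aligns on those labels, not on matching company names.
wharton_conm = pd.Series(["AAR CORP", "FIRM X", "APPLE"])  # hypothetical
matched = wharton_conm[wharton_conm.isin(df["coname"])]
df["matched_name"] = matched
print(df)  # row 2 pairs coname "ZYNGA INC" with matched_name "APPLE"
```

In addition, `Series.append` returns a new Series instead of modifying `excel['HQ_LOC']` in place (and it was removed in pandas 2.0), and the nested loops compare every name with every city, which is why the loop is slow.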

Tags: python, pandas

Solution
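One way to get HQ_LOC without loops, sketched under the assumptions that names match exactly (as in the `isin` step) and that the first city listed for a company in `wharton` is its headquarters: build a lookup Series indexed by company name and `map` it onto `excel`. The small frames below are hypothetical stand-ins containing only the columns the sketch needs ("FIRM B", "City B" and so on are made up).

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames
wharton = pd.DataFrame({
    "conm": ["AAR CORP", "AAR CORP", "FIRM B", "FIRM C"],
    "city": ["Wood Dale", "Wood Dale", "City B", "City C"],
})
excel = pd.DataFrame({
    "coname": ["AAR CORP", "FIRM B", "FIRM C", "FIRM D"],
    "compustat_h1b": [1, 0, 0, 1],
    "prowess_compustat_h1b": [0, 1, 0, 0],
})

# wharton has one row per firm-year; keep one row per company name so the
# lookup Series has a unique index (Series.map requires that).
hq = wharton.drop_duplicates(subset="conm").set_index("conm")

# Same eligibility filter as the question's `xls`
in_compustat = (excel["prowess_compustat_h1b"] == 1) | (excel["compustat_h1b"] == 1)

# matched_name: coname where it is eligible and found in wharton, else NaN
excel["matched_name"] = excel["coname"].where(
    in_compustat & excel["coname"].isin(hq.index)
)

# HQ_LOC: look up each matched name; unmatched (NaN) rows stay NaN
excel["HQ_LOC"] = excel["matched_name"].map(hq["city"])
print(excel[["coname", "matched_name", "HQ_LOC"]])
```

To get city and state, as the comment in the question's code suggests, map `hq['city'] + ', ' + hq['state']` instead. A left `merge` of `excel` with `hq` on the matched name is an equivalent alternative; either way the work is vectorized rather than a nested Python loop.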

