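python - Add a matched column to a dataframe
Problem description
I have 2 datasets that I matched by name (wharton['conm'] matched against xls['coname']), which is what the first block of code below does. I now want to add a column named HQ_LOC to the excel dataframe, holding the wharton['city'] value that corresponds to each appended matched_name entry.
The wharton dataframe looks like this: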
GVKEY LPERMCO datadate fyear indfmt consol popsrc datafmt tic conm ... city conml county ggroup gind gsector gsubind idbflag state weburl
0 1004 20000 20090531 2008.0 INDL C D STD AIR AAR CORP ... Wood Dale AAR Corp NaN 2010.0 201010.0 20.0 20101010.0 D IL www.aarcorp.com
1 1004 20000 20100531 2009.0 INDL C D STD AIR AAR CORP ... Wood Dale AAR Corp NaN 2010.0 201010.0 20.0 20101010.0 D IL www.aarcorp.com
2 1004 20000 20110531 2010.0 INDL C D STD AIR AAR CORP ... Wood Dale AAR Corp NaN 2010.0 201010.0 20.0 20101010.0 D IL www.aarcorp.com
3 1004 20000 20120531 2011.0 INDL C D STD AIR AAR CORP ... Wood Dale AAR Corp NaN 2010.0 201010.0 20.0 20101010.0 D IL www.aarcorp.com
4 1004 20000 20130531 2012.0 INDL C D STD AIR AAR CORP ... Wood Dale AAR Corp NaN 2010.0 201010.0 20.0 20101010.0 D IL www.aarcorp.com
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..
wharton['conm'] has the following form:
0 AAR CORP
1 AMERICAN AIRLINES
2 APPLE
3 x
4 y
The excel dataframe looks like this:
top10 FiscalYear coname H1B_DEPENDENT firm_indian EmployerCity EmployerState total_petitions petitiontype_new petitiontype_cont ... salary_med salary_med_new salary_med_cont salary_med_new_app salary_med_new_den prowess_h1b compustat_h1b prowess_compustat_h1b matched_name HQ_LOC
0 1.0 2010 INFOSYS LTD. 1 0 PLANO TX 7546 4028 3245 ... 60000.0 60000.0 60000.0 60000.0 60000.0 1 1 1 NaN NaN
1 2.0 2010 COGNIZANT TECH SOLUTIONS 1 0 TEANECK NJ 7273 4678 1310 ... 65312.0 59600.0 59600.0 59779.0 56500.0 1 1 1 NaN NaN
2 3.0 2010 MICROSOFT CORP 0 0 NAPERVILLE IL 3696 1947 1283 ... 91000.0 88000.0 88000.0 88000.0 102035.0 1 1 1 NaN NaN
3 4.0 2010 WIPRO LTD. 1 0 EAST BRUNSWICK NJ 3398 2044 320 ... 63586.0 63856.0 63856.0 63856.0 60000.0 1 1 1 NaN NaN
4 5.0 2010 DELOITTE CONSULTING (I) PVT. LTD. 0 1 PHILADELPHIA PA 1522 546 363 ... 85000.0 68000.0 68000.0 68000.0 68000.0 1 0 0 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
19656 NaN 2019 ZSCALER INC 0 0 SAN JOSE CA 7 4 1 ... 145000.0 110864.0 110864.0 110864.0 NaN 0 1 0 NaN NaN
19657 NaN 2019 ZULILY INC 0 0 SEATTLE WA 1 1 0 ... 107000.0 107000.0 107000.0 107000.0 NaN 0 1 0 NaN NaN
19658 NaN 2019 ZUORA INC 0 0 SAN MATEO CA 6 3 0 ... 152500.0 170000.0 170000.0 170000.0 NaN 0 1 0 NaN NaN
19659 NaN 2019 ZYLOG SYSTEMS LTD. 1 1 EDISON NJ 16 0 12 ... 77022.0 NaN NaN NaN NaN 1 0 0 NaN NaN
19660 NaN 2019 ZYNGA INC 0 0 SAN FRANCISCO CA 2 1 1 ... 110500.0 83000.0 83000.0 83000.0 NaN 0 1 0 NaN NaN
I want excel['HQ_LOC'] to look like this:
0 New York City
1 Atlanta
2 Cupertino
3 location x
4 location y
xls = excel[(excel['prowess_compustat_h1b'] == 1) | (excel['compustat_h1b'] == 1)]
excel['matched_name'] = wharton['conm'][wharton['conm'].isin(xls['coname'])] #adds matched_name of wharton to excel sheet
#append city and state of the corresponding coname of matched_name
excel['HQ_LOC'] = pd.Series([])
for i in wharton['matched_name']:
    for j in wharton['city']:
        if i in excel['matched_name']:
            excel['HQ_LOC'].append(j)
print(excel['HQ_LOC'])
But when I run the code above, I get the following output (and it runs very slowly):
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
..
19656 NaN
19657 NaN
19658 NaN
19659 NaN
19660 NaN
Solution
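Three pandas behaviours probably explain the all-NaN column in the question. Assigning one Series to a column aligns on index labels, not on company names. The expression i in excel['matched_name'] tests index labels rather than values. Series.append returned a new Series instead of modifying the column in place (and was removed in pandas 2.0). The following is a minimal sketch of a lookup-based alternative, not a definitive fix. It assumes that names in excel['coname'] and wharton['conm'] match exactly, and that each company has a single city in wharton. The two small frames and the helper names city_by_name, eligible and matched are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the two frames (assumed data; the real frames have many more columns).
wharton = pd.DataFrame({
    "conm": ["AAR CORP", "AAR CORP", "MICROSOFT CORP", "ZYNGA INC"],
    "city": ["Wood Dale", "Wood Dale", "Redmond", "San Francisco"],
})
excel = pd.DataFrame({
    "coname": ["INFOSYS LTD.", "MICROSOFT CORP", "ZYNGA INC"],
    "compustat_h1b": [1, 1, 1],
    "prowess_compustat_h1b": [1, 1, 0],
})

# wharton has one row per firm-year, so reduce it to one city per company name.
city_by_name = wharton.drop_duplicates("conm").set_index("conm")["city"]

# Same row filter as in the question, kept as a boolean mask on excel.
eligible = (excel["prowess_compustat_h1b"] == 1) | (excel["compustat_h1b"] == 1)
matched = eligible & excel["coname"].isin(city_by_name.index)

# Keep the company name where it matched, NaN elsewhere.
excel["matched_name"] = excel["coname"].where(matched)

# Look up the city for each matched name; unmatched rows stay NaN.
excel["HQ_LOC"] = excel["matched_name"].map(city_by_name)

print(excel[["coname", "matched_name", "HQ_LOC"]])
```

Because map performs one vectorized lookup per row, this avoids the nested Python loops, which is the likely cause of the slow run time. If the names differ in case or punctuation between the two sources, they would need to be normalized before the lookup.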