python - Pandas.apply returning random substrings
Problem description
pandas.apply function returning random substrings instead of full string
I've already tried:
def extract_ticker(title):
    for word in title:
        word_str = word.encode('utf-8')
        if word_str in constituents['Symbol'].values:
            return word_str

sp500news3['tickers'] = sp500news3['title'].apply(extract_ticker)
which returns
sp500news3['tickers']
79944 M
181781 M
213175 C
93554 C
257327 T
instead of expected output
79944 MSFT
181781 WMB
213175 CSX
93554 C
257327 TWX
Sample data to reproduce the problem:
import pandas as pd
constituents = pd.DataFrame({"Symbol":["TWX","C","MSFT","WMB"]})
sp500news3 = pd.DataFrame({"title":["MSFT Vista corporate sales go very well","WMB No Anglican consensus on Episcopal Church","CSX quarterly profit rises",'C says 30 bln capital helps exceed target','TWX plans cable spinoff']})
Solution
Why not use a regular expression to extract the ticker?
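tickers = ('TWX', 'C', 'MSFT', 'WMB')
# Build one alternation pattern with a single capture group: (TWX|C|MSFT|WMB)
regex = '({})'.format('|'.join(tickers))
# expand=False returns a Series, which can be assigned to a single column
sp500news3['tickers'] = sp500news3['title'].str.extract(regex, expand=False)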
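Two notes on the above, with a sketch. The original function returns single characters because `for word in title` iterates over the characters of the string, not its words, so the first character that happens to be a symbol (such as `C`) is returned. The plain alternation regex has a related weakness: without word boundaries, `C` also matches the first letter of `CSX`. The sketch below is one possible way to handle both, assuming Python 3 (so no `.encode` call) and whitespace-separated titles; the names `symbols`, `by_apply`, `by_regex` and `pattern` are illustrative, not from the question.

```python
import re
import pandas as pd

# Sample data from the question
constituents = pd.DataFrame({"Symbol": ["TWX", "C", "MSFT", "WMB"]})
sp500news3 = pd.DataFrame({"title": [
    "MSFT Vista corporate sales go very well",
    "WMB No Anglican consensus on Episcopal Church",
    "CSX quarterly profit rises",
    "C says 30 bln capital helps exceed target",
    "TWX plans cable spinoff",
]})

# A set gives fast membership tests (illustrative name)
symbols = set(constituents["Symbol"])

def extract_ticker(title):
    # Split into words first; iterating over the string itself
    # would yield one character at a time.
    for word in title.split():
        if word in symbols:
            return word
    return None  # no known symbol found in this title

by_apply = sp500news3["title"].apply(extract_ticker)

# Regex variant: \b on both sides keeps 'C' from matching inside
# 'CSX' or 'Church'. Longer symbols are listed first, and re.escape
# guards against symbols containing regex metacharacters.
ordered = sorted(symbols, key=len, reverse=True)
pattern = r"\b(" + "|".join(re.escape(s) for s in ordered) + r")\b"
by_regex = sp500news3["title"].str.extract(pattern, expand=False)

sp500news3["tickers"] = by_regex
print(sp500news3)
```

With the sample `constituents` shown in the question, the `CSX` title gets no ticker in either variant, because `CSX` is not in the sample symbol list; with the full list of constituents it should be picked up. The split-based version does not strip punctuation, so a title containing `MSFT,` would need the regex variant or extra cleaning.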