
Problem description

pandas apply function returns random substrings instead of the full string

I've already tried:

def extract_ticker(title):
    for word in title:
        word_str = word.encode('utf-8')
        if word_str in constituents['Symbol'].values:
            return word_str
sp500news3['tickers'] = sp500news3['title'].apply(extract_ticker)

which returns

sp500news3['tickers'] 

79944        M
181781       M
213175       C
93554        C
257327       T

instead of the expected output

79944        MSFT
181781       WMB
213175       CSX
93554        C
257327       TWX

Sample data to reproduce the problem:

constituents =  pd.DataFrame({"Symbol":["TWX","C","MSFT","WMB"]})

sp500news3 = pd.DataFrame({"title":["MSFT Vista corporate sales go very well","WMB No Anglican consensus on Episcopal Church","CSX quarterly profit rises",'C says 30 bln capital helps exceed target','TWX plans cable spinoff']})

Tags: python, pandas

Solution


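Why not use a regular expression to extract the ticker? The original function returns single characters because `for word in title` iterates over the characters of the string, not over its words, so only one-letter symbols can ever match.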

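tickers = ('TWX', 'C', 'MSFT', 'WMB')
# One capture group containing every ticker as an alternative
regex = '({})'.format('|'.join(tickers))

# expand=False returns a Series (first match per title) rather than a one-column DataFrame
sp500news3['tickers'] = sp500news3['title'].str.extract(regex, expand=False)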
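
One caveat with a plain alternation is that a short symbol such as C can match inside a longer word (for example the start of "CSX"). The following is a minimal sketch, not the original answerer's code, that shows two ways around this using the sample data from the question: splitting each title into words before the lookup, and wrapping the pattern in `\b` word boundaries. The names `symbols`, `by_split`, `by_regex` and `pattern` are introduced here for illustration only. Because CSX is not in the sample `constituents`, that row comes back as a missing value in both variants.

```python
import re
import pandas as pd

constituents = pd.DataFrame({"Symbol": ["TWX", "C", "MSFT", "WMB"]})
sp500news3 = pd.DataFrame({"title": [
    "MSFT Vista corporate sales go very well",
    "WMB No Anglican consensus on Episcopal Church",
    "CSX quarterly profit rises",
    "C says 30 bln capital helps exceed target",
    "TWX plans cable spinoff",
]})

# A set gives fast membership tests for whole symbols
symbols = set(constituents["Symbol"])

def extract_ticker(title):
    # title.split() yields whole words; iterating the string itself yields characters
    for word in title.split():
        if word in symbols:
            return word
    return None  # no known symbol found in this title

by_split = sp500news3["title"].apply(extract_ticker)

# Regex variant: escape each symbol and require word boundaries on both sides,
# so "C" cannot match the first letter of "CSX" or "Church"
pattern = r"\b(" + "|".join(re.escape(s) for s in sorted(symbols)) + r")\b"
by_regex = sp500news3["title"].str.extract(pattern, expand=False)

print(by_split.tolist())
print(by_regex.tolist())
```

The split-based version only recognises symbols separated by whitespace, so a title like "MSFT, Vista" would need punctuation stripped first; the regex version handles that case through the word boundaries.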
