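python-3.x - Unexpected behavior of append(): why are earlier list elements overwritten when a dictionary is appended to a list?
Problem description
The following code checks whether the title of a news item (from newsList) contains a company name (from tickersList) or a fragment of that name. When a company is found, print shows the expected ticker, but after the news item is appended to the list, something nonsensical happens :( It looks as though appending the dictionary (news) to the list (tickersNews) overwrites the earlier elements of the list. Why?
Note that when the news item is converted to a string before being appended, everything works as expected.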
import re
tickersList = [('ATI', 'Allegheny rporated', 'Allegheny Technologies Incorporated'), ('ATIS', 'Attis', 'Attis Industries, Inc.'), ('ATKR', 'Atkore International Group', 'Atkore International Group Inc.'), ('ATMP', 'Barclays + Select M', 'Barclays ETN+ Select MLP'), ('ATNM', 'Actinium', 'Actinium Pharmaceuticals, Inc.'), ('ATNX', 'Athenex', 'Athenex, Inc.'), ('ATOS', 'Atossa Genetics', 'Atossa Genetics Inc.'), ('ATRA', 'Atara Biotherapeutics', 'Atara Biotherapeutics, Inc.'), ('ATRC', 'AtriCure', 'AtriCure, Inc.'), ('ATRO', 'Astronics', 'Astronics Corporation'), ('ATRS', 'Antares Pharma', 'Antares Pharma, Inc.'), ('ATSG', 'Air Transport Services Group', 'Air Transport Services Group, Inc.'), ('CJ', 'C&J Energy', 'C&J Energy Services, Inc.'), ('CJJD', 'China Jo-Jo Drugstores', 'China Jo-Jo Drugstores, Inc.'), ('CLAR', 'Clarus', 'Clarus Corporation'), ('CLD', 'Cloud Peak Energy', 'Cloud Peak Energy Inc.'), ('CLDC', 'China Lending', 'China Lending Corporation'), ('CLDR', 'Cloudera', 'Cloudera, Inc.')]
newsList = [
{'title':'Atara Biotherapeutics Announces Planned Chief Executive Officer Transition'},
{'title':'Chongqing Jingdong Pharmaceutical and Athenex Announce a Strategic Partnership and Licensing Agreement to Develop and Commercialize KX2-391 in China'}
]
tickersNews = []
for news in newsList:
    # pass through the list of companies looking for their mention in the news
    for ticker, company, company_full in tickersList:
        # clear the full name of the company from brackets, spaces, articles,
        # points and commas and save fragments of the full name to the list
        companyFullFragments = company_full.replace(',', '')\
            .replace('.', '').replace('The ', ' ')\
            .replace('(', ' ').replace(')', ' ')\
            .replace(' ', ' ').strip().split()
        # looking for a company in the news every time cutting off
        # the last fragment from the full company name
        for i in range(len(companyFullFragments), 0, -1):
            companyFullFragmentsString = ' '.join(companyFullFragments[:i]).strip()
            lookFor_company = r'(^|\s){0}(\s|$)'.format(companyFullFragmentsString)
            results_company = re.findall(lookFor_company, news['title'])
            # if the title of the news contains the name of the company,
            # then we add the ticker, the found fragment and the full name
            # of the company to the news, print the news and add it to the list
            if results_company:
                news['ticker'] = ticker#, companyFullFragmentsString, company_full
                print(news['ticker'], 'found')
                #tickersNews.append(str(news))
                #-----------------------------Here is the problem!(?)
                tickersNews.append(news)
                # move on to the next company
                break

print(20*'-', 'appended:')
for news in tickersNews:
    print(news['ticker'])
Output (list of dictionaries):
ATRA found
ATNX found
CJJD found
CLDC found
-------------------- appended:
ATRA
CLDC
CLDC
CLDC
Output (list of strings):
ATRA found
ATNX found
CJJD found
CLDC found
-------------------- appended as a strings:
["{'title': 'Atara Biotherapeutics Announces Planned Chief Executive Officer Transition', 'ticker': 'ATRA'}", "{'title': 'Chongqing Jingdong Pharmaceutical and Athenex Announce a Strategic Partnership and Licensing Agreement to Develop and Commercialize KX2-391 in China', 'ticker': 'ATNX'}", "{'title': 'Chongqing Jingdong Pharmaceutical and Athenex Announce a Strategic Partnership and Licensing Agreement to Develop and Commercialize KX2-391 in China', 'ticker': 'CJJD'}", "{'title': 'Chongqing Jingdong Pharmaceutical and Athenex Announce a Strategic Partnership and Licensing Agreement to Develop and Commercialize KX2-391 in China', 'ticker': 'CLDC'}"]
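Solution
The problem comes from 2 lines inside the for loop: news['ticker'] = ticker and tickersNews.append(news). append() stores a reference to the news dictionary, not a copy, so each later assignment to news['ticker'] also changes every entry already appended for that same news item.
A simpler version of your problem is: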
a = 10
a = 20
a = 30
print(a, a, a)
The output will be 30 30 30. I think that is obvious.
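The integer example rebinds a name, whereas the code in the question mutates one dictionary and appends it repeatedly. A minimal sketch of that situation might look like the following; `record` and `collected` are hypothetical stand-ins for `news` and `tickersNews`, not names from the original code.

```python
# Minimal sketch: list.append() stores a reference to the object, not a copy.
record = {'title': 'some news'}   # hypothetical stand-in for one `news` dict
collected = []                    # hypothetical stand-in for `tickersNews`

for ticker in ('ATNX', 'CJJD', 'CLDC'):
    record['ticker'] = ticker     # mutates the one and only dict
    collected.append(record)      # appends another reference to that same dict

tickers = [item['ticker'] for item in collected]
print(tickers)                    # ['CLDC', 'CLDC', 'CLDC']

# Every list entry is the very same object, so all show the last value.
same_object = all(item is record for item in collected)
print(same_object)                # True
```

This mirrors the second news item in the question: it matched three tickers, so the list holds three references to one dictionary whose 'ticker' key was last set to CLDC.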
There are several ways to fix this.
First option (the simplest). Replace tickersNews.append(news) with tickersNews.append(news.copy()).
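As a rough illustration of why copying helps, the sketch below appends a copy on each iteration, so each list entry keeps the ticker it had at the time of the append. The names `record` and `snapshots` are hypothetical and used only for this example.

```python
# Minimal sketch: appending a copy freezes the dict's state at append time.
record = {'title': 'some news'}   # hypothetical stand-in for one `news` dict
snapshots = []                    # hypothetical stand-in for `tickersNews`

for ticker in ('ATNX', 'CJJD', 'CLDC'):
    record['ticker'] = ticker
    snapshots.append(record.copy())   # new dict object on every iteration

snapshot_tickers = [item['ticker'] for item in snapshots]
print(snapshot_tickers)           # ['ATNX', 'CJJD', 'CLDC']

# The copies are distinct objects, independent of `record`.
distinct = len({id(item) for item in snapshots}) == len(snapshots)
print(distinct)                   # True
```

Appending str(news) worked in the question for the same reason: str() builds a new, independent string at the moment of the call. Note that dict.copy() is a shallow copy, so nested mutable values would still be shared; copy.deepcopy() from the standard library is the usual choice if the dictionaries contain nested lists or dicts.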
Second option (preferred). Do not use tickersNews at all. For each news item, create an empty list: news['ticker_list'] = list(). For each ticker found, append it to news['ticker_list']:
import re
tickersList = [('ATI', 'Allegheny rporated', 'Allegheny Technologies Incorporated'), ('ATIS', 'Attis', 'Attis Industries, Inc.'), ('ATKR', 'Atkore International Group', 'Atkore International Group Inc.'), ('ATMP', 'Barclays + Select M', 'Barclays ETN+ Select MLP'), ('ATNM', 'Actinium', 'Actinium Pharmaceuticals, Inc.'), ('ATNX', 'Athenex', 'Athenex, Inc.'), ('ATOS', 'Atossa Genetics', 'Atossa Genetics Inc.'), ('ATRA', 'Atara Biotherapeutics', 'Atara Biotherapeutics, Inc.'), ('ATRC', 'AtriCure', 'AtriCure, Inc.'), ('ATRO', 'Astronics', 'Astronics Corporation'), ('ATRS', 'Antares Pharma', 'Antares Pharma, Inc.'), ('ATSG', 'Air Transport Services Group', 'Air Transport Services Group, Inc.'), ('CJ', 'C&J Energy', 'C&J Energy Services, Inc.'), ('CJJD', 'China Jo-Jo Drugstores', 'China Jo-Jo Drugstores, Inc.'), ('CLAR', 'Clarus', 'Clarus Corporation'), ('CLD', 'Cloud Peak Energy', 'Cloud Peak Energy Inc.'), ('CLDC', 'China Lending', 'China Lending Corporation'), ('CLDR', 'Cloudera', 'Cloudera, Inc.')]
newsList = [
{'title':'Atara Biotherapeutics Announces Planned Chief Executive Officer Transition'},
{'title':'Chongqing Jingdong Pharmaceutical and Athenex Announce a Strategic Partnership and Licensing Agreement to Develop and Commercialize KX2-391 in China'}
]
for news in newsList:
    news['ticker_list'] = list()
    for ticker, company, company_full in tickersList:
        companyFullFragments = company_full.replace(',', '')\
            .replace('.', '').replace('The ', ' ')\
            .replace('(', ' ').replace(')', ' ')\
            .replace(' ', ' ').strip().split()
        for i in range(len(companyFullFragments), 0, -1):
            companyFullFragmentsString = ' '.join(companyFullFragments[:i]).strip()
            lookFor_company = r'(^|\s){0}(\s|$)'.format(companyFullFragmentsString)
            results_company = re.findall(lookFor_company, news['title'])
            if results_company:
                news['ticker_list'].append(ticker)
                # print(ticker, 'found')
                break

print('tickers for news:')
for news in newsList:
    print(news['ticker_list'])
The output will be:
tickers for news:
['ATRA']
['ATNX', 'CJJD', 'CLDC']
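If a flat list with one entry per (news, ticker) pair is still needed, as tickersNews was meant to be, it can be derived from the ticker_list structure by building a new dictionary for each pair. This is only a sketch: `news_with_tickers` is a hypothetical, shortened stand-in for newsList after the loop above has run, and `flat_news` is a hypothetical name.

```python
# Hypothetical, shortened stand-in for newsList after 'ticker_list' is filled.
news_with_tickers = [
    {'title': 'first news title', 'ticker_list': ['ATRA']},
    {'title': 'second news title', 'ticker_list': ['ATNX', 'CJJD', 'CLDC']},
]

# Build a fresh dict per (news, ticker) pair; no object is appended twice.
flat_news = [
    {'title': news['title'], 'ticker': ticker}
    for news in news_with_tickers
    for ticker in news['ticker_list']
]

flat_tickers = [item['ticker'] for item in flat_news]
print(flat_tickers)               # ['ATRA', 'ATNX', 'CJJD', 'CLDC']
```

Because each entry is created inside the comprehension, later changes to one entry cannot affect the others.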