Has anyone run into duplicated text when using Trafilatura?


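I'm getting duplicated text when scraping the following webpage with trafilatura, even though I set `deduplicate=True`. Does anyone know whether this is a limitation of the package, or whether there is a parameter I can toggle to get rid of this behavior?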


```python
import trafilatura

url = "..."  # placeholder: the page URL was not included in the question
downloaded = trafilatura.fetch_url(url)
text = trafilatura.extract(downloaded, target_language='en', include_tables=True, deduplicate=True)
```

Part of the output, with each section repeated:

**Notation Vote**
By notation vote completed on May 19, 2020, the Committee unanimously approved the minutes of the Committee meeting held on April 28–29, 2020.
Notation Vote
By notation vote completed on May 19, 2020, the Committee unanimously approved the minutes of the Committee meeting held on April 28–29, 2020. 
**Staff Economic Outlook**
The projection for the U.S. economy prepared by the staff for the June FOMC meeting was downgraded, on balance, as compared with the April meeting forecast in response to information on the spread of the coronavirus and changes in the measures undertaken to contain it both at home and abroad, along with incoming economic data. U.S. real GDP was forecast to show a historically large decline in the second quarter of this year, and the unemployment rate was expected to be sharply higher than in the first quarter. The substantial fiscal policy measures and appreciable support from monetary policy, along with the Federal Reserve's liquidity and lending facilities, were expected to help mitigate the deterioration in current economic conditions and to help boost the recovery.
Staff Economic Outlook
The projection for the U.S. economy prepared by the staff for the June FOMC meeting was downgraded, on balance, as compared with the April meeting forecast in response to information on the spread of the coronavirus and changes in the measures undertaken to contain it both at home and abroad, along with incoming economic data. U.S. real GDP was forecast to show a historically large decline in the second quarter of this year, and the unemployment rate was expected to be sharply higher than in the first quarter. The substantial fiscal policy measures and appreciable support from monetary policy, along with the Federal Reserve's liquidity and lending facilities, were expected to help mitigate the deterioration in current economic conditions and to help boost the recovery.
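One possible workaround, sketched under assumptions: post-process the string returned by `extract` and drop any line whose content has already appeared. The helper name `drop_repeated_blocks` is hypothetical and not part of trafilatura's API. It treats `**Notation Vote**` and `Notation Vote` as the same block by stripping markdown bold markers before comparing, since the repeated copies in the output above differ only in that formatting.

```python
import re


def drop_repeated_blocks(text: str) -> str:
    """Hypothetical helper (not part of trafilatura): drop repeated lines.

    Each line is normalized by removing markdown bold markers (**) and
    surrounding whitespace; a line whose normalized form was already seen
    is skipped. Blank lines are always kept.
    """
    seen = set()
    kept = []
    for line in text.splitlines():
        key = re.sub(r"\*\*", "", line).strip()
        if key and key in seen:
            continue  # this block was already emitted once
        if key:
            seen.add(key)
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    sample = (
        "**Notation Vote**\n"
        "By notation vote completed on May 19, 2020.\n"
        "Notation Vote\n"
        "By notation vote completed on May 19, 2020. "
    )
    print(drop_repeated_blocks(sample))
```

In the question's code this would run right after extraction, e.g. `if text is not None: text = drop_repeated_blocks(text)`, because `extract` can return `None`. Note the trade-off: any legitimately repeated line (for example, the same heading used in two sections) would also be removed.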

Tags: python, html, parsing, xml-parsing, scrape

