Beautiful Soup output to a DataFrame

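Problem description

I am scraping Google Scholar with Beautiful Soup. With the code below, I only get the first row of the DataFrame. I need the three results as separate rows in the DataFrame. I'm not sure how to do this, as I'm new to it. Thanks.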

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

queries = ['10.1371/journal.pone.0213627', '10.1186/s13223-019-0377-7', '10.1371/journal.pmed.1002751']

publications = []

with requests.Session() as s:
    for query in queries:
        url = 'https://scholar.google.com/scholar?q=' + query + '&ie=UTF-8&oe=UTF-8&hl=en&btnG=Search'
        r = s.get(url)
        soup = bs(r.content, 'lxml') # or 'html.parser'
        title = soup.select_one('.gs_rt a')
        if title is None:
            title = 'No title'
            link = 'No link'
        else:
            link = title['href']
            title = title.text
        citations = soup.select_one('[title=Cite] + a')
        if citations is None:
            citations = 'No citation count'
        else:
            citations = citations.text

publications.append((title, link, citations))

df = pd.DataFrame(publications)

df


Tags: python, pandas, dataframe, beautifulsoup

Solution


You can try the following. Instead of:

df = pd.DataFrame(publications)

use:

df = pd.DataFrame({"content": publications})
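Note that in the question's code as shown here, the publications.append call sits outside the for loop, so it would run only once, after the last query. If that is what the asker actually ran, moving the append inside the loop is probably what is needed to get one row per query. The following is a minimal sketch under that assumption. The inline HTML strings and the parse_page helper are hypothetical stand-ins for the real Google Scholar responses, so that the example runs without network access; the selectors are the ones from the question.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the three downloaded result pages.
pages = [
    '<h3 class="gs_rt"><a href="https://example.org/a">Paper A</a></h3>'
    '<div><a title="Cite">Cite</a><a>Cited by 12</a></div>',
    '<div>no results</div>',
    '<h3 class="gs_rt"><a href="https://example.org/c">Paper C</a></h3>',
]


def parse_page(html):
    """Return (title, link, citations) for one page (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.select_one(".gs_rt a")
    if title_tag is None:
        title, link = "No title", "No link"
    else:
        title, link = title_tag.text, title_tag["href"]
    cite_tag = soup.select_one("[title=Cite] + a")
    citations = "No citation count" if cite_tag is None else cite_tag.text
    return title, link, citations


publications = []
for html in pages:
    # The append is inside the loop, so every page contributes one row.
    publications.append(parse_page(html))

# Each tuple becomes a row; columns= names the three fields.
df = pd.DataFrame(publications, columns=["title", "link", "citations"])
print(df)
```

Passing the list of tuples with columns= gives three separate columns. By contrast, pd.DataFrame({"content": publications}) puts each whole tuple into a single column named content.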
