How do I scrape the "content" and "link" of every question on Stack Overflow?

Problem description

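I want to scrape the "question content" and "link" of every question asked on Stack Overflow, but the result I get is None, and printing its type gives NoneType, which is odd.

I can scrape each question's title, vote count, and tags successfully, but scraping the question content and link fails.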

import requests
from bs4 import BeautifulSoup
import re
import json
from selenium import webdriver

browser = webdriver.Chrome('./chromedriver')

browser.get('https://stackoverflow.com/questions/tagged/android?sort=votes&page=1&pagesize=50')

soup = BeautifulSoup(browser.page_source, 'html.parser')
divs = soup.find_all('div', class_ = 'question-summary')

for div in divs:

    #title ok
    y= div.h3.a.text

    #tag ok
    tags = div.find('div', class_ = 'summary').find_all('div')[1].find_all('a')
    h= [tag.text for tag in tags]

    #vote ok
    v = div.div.div.div.div.span.strong.text

    #link??
    l = div.h3.a.href
    #print(type(l))

    #contents??
    b = div.class_excerpt
    #print(b)

I would like to extract each question's link and content as well, but I don't know how to change the code to do so.

Tags: python-3.x, selenium, beautifulsoup

Solution


Try the code below. It gave me both the links and the contents.

browser.get('https://stackoverflow.com/questions/tagged/android?sort=votes&page=1&pagesize=50')

soup = BeautifulSoup(browser.page_source, 'html.parser')

links=soup.find_all('a', class_='question-hyperlink')
contents=soup.find_all('div',class_='excerpt')

for link,content in zip(links,contents):
    print('link :' + link['href'] , 'content :' + content.text.replace('\n',' ').strip())

Output:

(link :/questions/2025282/what-is-the-difference-between-px-dip-dp-and-sp', content :What is the difference between Android units of measure? px dip dp sp')
(link :/questions/1109022/close-hide-the-android-soft-keyboard', content :I have an EditText and a Button in my layout.  After writing in the edit field and clicking on the Button, I want to hide the virtual keyboard. I assume that this is a simple piece of code, but where ...')
(link :/questions/13375357/proper-use-cases-for-android-usermanager-isuseragoat', content :I was looking at the new APIs introduced in Android 4.2. While looking at the UserManager class I came across the following method: public boolean isUserAGoat()       Used to determine whether the ...')
(link :/questions/1554099/why-is-the-android-emulator-so-slow-how-can-we-speed-up-the-android-emulator', content :I have got a 2.67\xa0 GHz Celeron processor, and 1.21\xa0 GB of RAM on a x86 Windows XP Professional machine.   My understanding is that the Android Emulator should start fairly quickly on such a ...')
(link :/questions/1555109/stop-edittext-from-gaining-focus-at-activity-startup', u"content :I have an Activity in Android, with two elements: EditText  ListView When my Activity starts, the EditText immediately has input focus (flashing cursor). I don't want any control to have input focus ...")
(link :/questions/2785485/is-there-a-unique-android-device-id', content :Do Android devices have a unique ID, and if so, what is a simple way to access it using Java?')
(link :/questions/151777/how-do-save-an-android-activity-state-using-save-instance-state', u"content :I've been working on the Android SDK platform, and it is a little unclear how to save an application's state. So given this minor re-tooling of the 'Hello, Android' example:  package com.android.hello;...")
(link :/questions/6343166/how-do-i-fix-android-os-networkonmainthreadexception', content :I got an error while running my Android project for RssReader.   Code:  URL url = new URL(urlToRssFeed); SAXParserFactory factory = SAXParserFactory.newInstance(); SAXParser parser = factory....')
(link :/questions/101754/is-there-a-way-to-run-python-on-android', content :We are working on an S60 version and this platform has a nice Python API..  However, there is nothing official about Python on Android, but since Jython exists, is there a way to let the snake and the ...')
(link :/questions/541966/lazy-load-of-images-in-listview', content :I am using a ListView to display some images and captions associated with those images. I am getting the images from the Internet. Is there a way to lazy load the images so while the text displays, ...')
(link :/questions/432037/how-do-i-center-text-horizontally-and-vertically-in-a-textview', content :How do I center the text horizontally and vertically in a TextView, so that it appears exactly in the middle of the TextView in Android?')
(link :/questions/2194808/debug-certificate-expired-error-in-eclipse-android-plugins', content :I am using Eclipse Android plugins to build a project, but I am getting this error in the console window:  [2010-02-03 10:31:14 - androidVNC]Error generating final archive: Debug certificate expired ...')
(link :/questions/1016896/get-screen-dimensions-in-pixels', content :I created some custom elements, and I want to programmatically place them to the upper right corner (n pixels from the top edge and m pixels from the right edge). Therefore I need to get the screen ...')
(link :/questions/3572463/what-is-context-on-android', content :In Android programming, what exactly is a Context class and what is it used for?  I read about it on the developer site, but I am unable to understand it clearly.')
(link :/questions/5761960/what-is-the-difference-between-match-parent-and-fill-parent', u"content :I'm a little confused about two XML properties: match_parent and fill_parent. It seems that both are the same. Is there any difference between them?")
(link :/questions/456211/activity-restart-on-rotation-android', u"content :In my Android application, when I rotate the device (slide out the keyboard) then my Activity is restarted (onCreate is called). Now, this is probably how it's supposed to be, but I do a lot of ...")
(link :/questions/3482742/what-is-the-difference-between-gravity-and-layout-gravity-in-android', content :I know we can set the following values to the android:gravity and  android:layout_gravity properties: center center_vertical center_horizontal, etc. But I am confused regarding both of these.  What ...')
(link :/questions/2201917/how-can-i-open-a-url-in-androids-web-browser-from-my-application', content :How to open an URL from code in the built-in web browser rather than within my application?  I tried this:   try {     Intent myIntent = new Intent(Intent.ACTION_VIEW, Uri.parse(download_link));     ...')
(link :/questions/2091465/how-do-i-pass-data-between-activities-in-android-application', content :I have a scenario where, after logging in through a login page, there will be a sign-out button on each activity.  On clicking sign-out, I will be passing the session id of the signed in user to sign-...')
(link :/questions/477572/strange-out-of-memory-issue-while-loading-an-image-to-a-bitmap-object', content :I have a list view with a couple of image buttons on each row. When you click the list row, it launches a new activity. I have had to build my own tabs because of an issue with the camera layout. The ...')
(link :/questions/1678122/must-override-a-superclass-method-errors-after-importing-a-project-into-eclips', content :Anytime I have to re-import my projects into Eclipse (if I reinstalled Eclipse, or changed the location of the projects), almost all of my overridden methods are not formatted correctly, causing the ...')
(link :/questions/4382178/android-sdk-installation-doesnt-find-jdk', u"content :I'm trying to install the Android SDK on my Windows 7 x64 System. jdk-6u23-windows-x64.exe is installed, but the Android SDK setup refuses to proceed because it doesn't find the JDK installation.  Is ...")
(link :/questions/3593420/is-there-a-way-to-get-the-source-code-from-an-apk-file', content :The hard drive on my laptop just crashed and I lost all the source code for an app that I have been working on for the past two months. All I have is the APK file that is stored in my email from when ...')
(link :/questions/15852122/hex-transparency-in-colors', u"content :I'm working on implementing a widget transparency option for my app widget although I'm having some trouble getting the hex color values right. Being completely new to hex color transparency I ...")
(link :/questions/4616095/how-to-get-the-build-version-number-of-your-android-application', content :I need to figure out how to get or make a build number for my Android application. I need the build number to display in the UI.   Do I have to do something with AndroidManifest.xml?')
(link :/questions/2033914/is-quitting-an-application-frowned-upon', content :Moving on in my attempt to learn Android, I just read the following:   Question: Does the user have a choice to kill the application    unless we put a menu option in to kill it? If no such option ...')
(link :/questions/937313/fling-gesture-detection-on-grid-layout', u"content :I want to get fling gesture detection working in my Android application.  What I have is a GridLayout that contains 9 ImageViews. The source can be found here: Romain Guys's Grid Layout.  That file I ...")
(link :/questions/2850573/activity-has-leaked-window-that-was-originally-added', content :What is this error, and why does it happen?  05-17 18:24:57.069: ERROR/WindowManager(18850): Activity com.mypkg.myP has leaked window com.android.internal.policy.impl.PhoneWindow$DecorView@44c46ff0 ...')
(link :/questions/885009/r-cannot-be-resolved-android-error', content :I just downloaded and installed the new Android SDK. I wanted to create a simple application to test drive it.  The wizard created this code:  package eu.mauriziopz.gps;  import android.app.Activity; ...')
(link :/questions/4535298/how-do-i-rotate-the-android-emulator-display', content :How can I rotate the Android emulator display to see it in landscape mode?')
(link :/questions/3028306/download-a-file-with-android-and-showing-the-progress-in-a-progressdialog', content :I am trying to write a simple application that gets updated. For this I need a simple function that can download a file and show the current progress in a ProgressDialog. I know how to do the ...')
(link :/questions/5369682/get-current-time-and-date-on-android', content :How can I get the current time and date in an Android app?')
(link :/questions/16608135/android-studio-add-jar-as-library', u"content :I'm trying to use the new Android Studio but I can't seem to get it working correctly.  I'm using the Gson library to serialize/deserialize JSON-objects. But the library somehow isn't included in the ...")
(link :/questions/2115758/how-do-i-display-an-alert-dialog-on-android', content :I want to display a dialog/popup window with a message to the user that shows "Are you sure you want to delete this entry?" with one button that says \'Delete\'. When Delete is touched, it should delete ...')
(link :/questions/513084/ship-an-application-with-a-database', content :If your application requires a database and it comes with built in data, what is the best way to ship that application? Should I: Precreate the SQLite database and include it in the .apk? Include the ...')
(link :/questions/6495898/findviewbyid-in-fragment', content :I am trying to create an ImageView in a Fragment which will refer to the ImageView element which I have created in the XML for the Fragment. However, the findViewById method only works if I extend an ...')
(link :/questions/2734270/how-do-i-make-links-in-a-textview-clickable', content :I have the following TextView defined:   <TextView android:layout_width="wrap_content"     android:layout_height="wrap_content" android:text="@string/txtCredits"     android:autoLink="web" android:...')
(link :/questions/24885223/why-doesnt-recyclerview-have-onitemclicklistener', content :I was exploring RecyclerView and I was surprised to see that RecyclerView does not have onItemClickListener(). Because RecyclerView extends        android.view.ViewGroup and ListView extends    ...')
(link :/questions/2680827/conversion-to-dalvik-format-failed-with-error-1-on-external-jar', content :In my Android application in Eclipse I get the following error.   UNEXPECTED TOP-LEVEL EXCEPTION:   java.lang.IllegalArgumentException: already added: Lorg/xmlpull/v1/XmlPullParser;   ....   ...')
(link :/questions/11461607/cant-start-eclipse-java-was-started-but-returned-exit-code-13', content :I am trying to get my first taste of Android development using Eclipse. I ran into this problem when trying to run Eclipse, having installed version 4.2 only minutes ago.  After first trying to start ...')
(link :/questions/2002288/static-way-to-get-context-in-android', u"content :Is there a way to get the current Context instance inside a static method?   I'm looking for that way because I hate saving the 'Context' instance each time it changes.")
(link :/questions/11078487/whats-toolscontext-in-android-layout-files', content :Starting with a recent new version of ADT, I\'ve noticed this new attribute on the layout XML files, for example:  <LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"     xmlns:...')
(link :/questions/10407159/how-to-manage-startactivityforresult-on-android', u"content :In my activity, I'm calling a second activity from the main activity by startActivityForResult. In my second activity there are some methods that finish this activity (maybe without result), however, ...")
(link :/questions/21814825/you-need-to-use-a-theme-appcompat-theme-or-descendant-with-this-activity', content :Android Studio 0.4.5  Android documentation for creating custom dialog boxes: http://developer.android.com/guide/topics/ui/dialogs.html  If you want a custom dialog, you can instead display an ...')
(link :/questions/3035692/how-to-convert-a-drawable-to-a-bitmap', u"content :I would like to set a certain Drawable as the device's wallpaper, but all wallpaper functions accept Bitmaps only. I cannot use WallpaperManager because I'm pre 2.1.  Also, my drawables are downloaded ...")
(link :/questions/600207/how-to-check-if-a-service-is-running-on-android', content :How do I check if a background service (on Android) is running?  I want an Android activity that toggles the state of the service -- it lets me turn it on if it is off and off if it is on.')
(link :/questions/582185/disable-landscape-mode-in-android', content :How can I disable landscape mode for some of the views in my Android app?')
(link :/questions/3875184/cant-create-handler-inside-thread-that-has-not-called-looper-prepare', content :What does the following exception mean; how can I fix it?  This is the code:  Toast toast = Toast.makeText(mContext, "Something", Toast.LENGTH_SHORT); This is the exception:  java.lang....')
(link :/questions/4893953/run-install-debug-android-applications-over-wi-fi', u"content :I thought there was a way to test your applications in development over Wi-Fi. Is this possible?  I'd love to be able to untether my phone and develop wirelessly.")
(link :/questions/2176922/how-do-i-create-a-transparent-activity-on-android', content :I want to create a transparent Activity on top of another activity.  How can I achieve this?')
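
Why the original attempt printed None: in BeautifulSoup, dotted access such as `div.h3.a.href` is treated as a search for a child tag named `href`, not as an attribute lookup, so it returns None. The same applies to `div.class_excerpt`. Attributes are read with `tag['href']` or `tag.get('href')`. The sketch below illustrates this on a small, hypothetical HTML sample (only the class names `question-summary`, `question-hyperlink`, and `excerpt` are taken from the code above; the rest of the markup is an assumption). It also looks up the link and excerpt inside each question block, which may be safer than zipping two separate lists, since `zip` would silently misalign the pairs if one question had no excerpt.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup. Only the class names come from the
# code in this thread; the real page structure may differ.
SAMPLE_HTML = """
<div class="question-summary">
  <h3><a class="question-hyperlink" href="/questions/1/first">First</a></h3>
  <div class="excerpt">
    First   excerpt
  </div>
</div>
<div class="question-summary">
  <h3><a class="question-hyperlink" href="/questions/2/second">Second</a></h3>
</div>
"""


def extract_questions(html):
    """Return one (href, excerpt) pair per question block."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in soup.find_all("div", class_="question-summary"):
        anchor = div.find("a", class_="question-hyperlink")
        excerpt = div.find("div", class_="excerpt")
        # .get() returns None instead of raising if the attribute is missing
        href = anchor.get("href") if anchor else None
        # split() + join collapses newlines and runs of spaces
        text = " ".join(excerpt.get_text().split()) if excerpt else None
        results.append((href, text))
    return results


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first = soup.find("div", class_="question-summary")

print(first.h3.a.href)     # None: searches for a child <href> tag
print(first.h3.a["href"])  # the value of the href attribute
print(extract_questions(SAMPLE_HTML))
```

With the sample above, the second question has no excerpt, so its pair carries None for the content instead of shifting every later excerpt onto the wrong link.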

Edit: the version below opens each question link and prints the full question text.

from selenium import webdriver
from bs4 import BeautifulSoup
browser = webdriver.Chrome('./chromedriver')

browser.get('https://stackoverflow.com/questions/tagged/android?sort=votes&page=1&pagesize=50')

soup = BeautifulSoup(browser.page_source, 'html.parser')

links=soup.find_all('a', class_='question-hyperlink')

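for link in links:
    print('link :' + link['href'] )
    # The href is relative, so prepend the site root before opening the question page
    browser.get("https://stackoverflow.com" + link['href'])
    soup = BeautifulSoup(browser.page_source, 'html.parser')
    # 'post-text' is the div that holds the full question body on the question page
    print('link :' + link['href'] , 'content :' + soup.find('div',class_='post-text').text.replace('\n',' ').strip())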
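
The links collected from the listing page are relative paths, which is why the loop prepends the site root by hand. As an alternative sketch, the standard library's `urllib.parse.urljoin` can build the absolute URL and also leaves an already absolute link unchanged. The names `BASE_URL` and `absolute_url` are hypothetical helpers introduced here for illustration only.

```python
from urllib.parse import urljoin

# Site root used to resolve the relative hrefs (hypothetical constant name)
BASE_URL = "https://stackoverflow.com"


def absolute_url(href, base=BASE_URL):
    """Resolve a relative href against the site root."""
    return urljoin(base, href)


print(absolute_url("/questions/2025282/what-is-the-difference-between-px-dip-dp-and-sp"))
```

In the loop above, `browser.get(absolute_url(link['href']))` could then replace the manual string concatenation.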
