RSS to SQL database

Problem description

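I want to fetch the RSS feed from http://www.reddit.com/new/.rss?sort=new and put it into a SQL table.

I can already read the RSS feed into Python (code below).

I just don't know how to get it from there into a SQL database.

I'm working in a Jupyter notebook and just need some help getting this project started. I also want to make sure every row is DISTINCT, with no duplicates.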


    import feedparser

    a_reddit_rss_url = 'http://www.reddit.com/new/.rss?sort=new'

    feed = feedparser.parse( a_reddit_rss_url )

    if (feed['bozo'] == 1):
        print("Error Reading/Parsing Feed XML Data")    
    else:
        for item in feed[ "items" ]:
            print(item)

This version extracts the date, title, summary, and link from each item:

    import feedparser
    from bs4 import BeautifulSoup
    from bs4.element import Comment


    def tag_visible(element):
        if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
            return False
        if isinstance(element, Comment):
            return False
        return True

    def text_from_html(body):
        soup = BeautifulSoup(body, 'html.parser')
        texts = soup.find_all(string=True)
        visible_texts = filter(tag_visible, texts)  
        return u" ".join(t.strip() for t in visible_texts)

    # Define URL of the RSS Feed I want
    a_reddit_rss_url = 'http://www.reddit.com/new/.rss?sort=new'

    feed = feedparser.parse( a_reddit_rss_url )

    if (feed['bozo'] == 1):
        print("Error Reading/Parsing Feed XML Data")    
    else:
        for item in feed[ "items" ]:
            dttm = item[ "date" ]
            title = item[ "title" ]
            summary_text = text_from_html(item[ "summary" ])
            link = item[ "link" ]


            print("====================")
            print("Title: {} ({})\nTimestamp: {}".format(title,link,dttm))
            print("--------------------\nSummary:\n{}".format(summary_text))

The SQL table/database should have a separate column for each of date, title, summary, and link.
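A minimal sketch of what such a table could look like, with the "no duplicates" requirement enforced by the database itself. This is only an illustration: it uses SQLite from the Python standard library as a stand-in for whatever SQL database is actually used, and the table name `reddit_posts`, the column names, and the helper `save_posts` are all made up for the example. The idea is to put a UNIQUE constraint on the link and let `INSERT OR IGNORE` skip rows that are already stored.

```python
import sqlite3

# Hypothetical table and column names; SQLite stands in for the real database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reddit_posts (
    published TEXT,
    title     TEXT,
    summary   TEXT,
    link      TEXT UNIQUE   -- at most one row per post URL
)
"""


def save_posts(conn, rows):
    """Insert (published, title, summary, link) tuples, skipping known links."""
    conn.execute(SCHEMA)
    with conn:  # commits once on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO reddit_posts (published, title, summary, link) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
sample = [
    ("2020-01-01T00:00:00+00:00", "First post", "Hello", "https://example.com/a"),
    ("2020-01-01T00:05:00+00:00", "Second post", "World", "https://example.com/b"),
]
save_posts(conn, sample)
save_posts(conn, sample)  # running the same batch again adds nothing

row_count = conn.execute("SELECT COUNT(*) FROM reddit_posts").fetchone()[0]
print(row_count)  # 2
```

In the notebook, the sample tuples would be replaced by the `dttm`, `title`, `summary_text`, and `link` values collected in the loop above. `INSERT OR IGNORE` is SQLite syntax; other databases express the same idea differently.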

Tags: python, sql, rss

Solution


Since no one has replied to your post, I'll take a crack at it. To insert from a Pandas DataFrame into SQL Server, try the following:

    import time

    import pandas as pd
    import pyodbc

    # Start a timer so the total insert time can be reported at the end
    start_time = time.time()

    # df = this_is_your_pandas_dataframe
    # This sample code assumes 4 fields: [Name], [Address], [Age], [Work];
    # change the table and column names to suit your specific needs.

    conn_str = (
        r'DRIVER={SQL Server Native Client 11.0};'
        r'SERVER=Name_Of_Server;'
        r'DATABASE=Name_Of_Database;'
        r'Trusted_Connection=yes;'
    )
    cnxn = pyodbc.connect(conn_str)
    cursor = cnxn.cursor()

    # Insert one row at a time using a parameterized query
    for index, row in df.iterrows():
        cursor.execute(
            'INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
            row['Name'],
            row['Address'],
            row['Age'],
            row['Work'],
        )

    # Commit once after all rows are inserted, then release the connection
    cnxn.commit()
    cursor.close()
    cnxn.close()

    # Report the total time taken by the insert
    print("%s seconds ---" % (time.time() - start_time))
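The code above assumes the data is already in a DataFrame and does not cover the "no duplicates" part of the question. As a rough sketch of one way to bridge that gap, the helper below (hypothetical name `entries_to_rows`) turns feed items into one tuple per distinct link, and `INSERT_IF_NEW` is an example statement that only inserts a row when its link is not already in the table. The column names `[Date]`, `[Title]`, `[Summary]`, `[Link]` are assumptions for illustration, and plain dictionaries stand in for `feed["items"]` so the example runs without a network connection or a database.

```python
def entries_to_rows(entries):
    """Return one (date, title, summary, link) tuple per distinct link."""
    seen = {}
    for entry in entries:
        link = entry.get("link")
        if link is None or link in seen:
            continue  # skip entries without a link and repeated links
        seen[link] = (
            entry.get("date", ""),
            entry.get("title", ""),
            entry.get("summary", ""),
            link,
        )
    return list(seen.values())  # dicts keep insertion order


# Stand-in for feed["items"]; the second entry repeats the first link.
sample_entries = [
    {"date": "2020-01-01T00:00:00+00:00", "title": "A",
     "summary": "first", "link": "https://example.com/a"},
    {"date": "2020-01-01T00:01:00+00:00", "title": "A again",
     "summary": "repeat", "link": "https://example.com/a"},
    {"date": "2020-01-01T00:02:00+00:00", "title": "B",
     "summary": "second", "link": "https://example.com/b"},
]
rows = entries_to_rows(sample_entries)

# Assumed column names; inserts a row only if its link is not stored yet.
INSERT_IF_NEW = (
    "INSERT INTO dbo.Table_1 ([Date], [Title], [Summary], [Link]) "
    "SELECT ?, ?, ?, ? "
    "WHERE NOT EXISTS (SELECT 1 FROM dbo.Table_1 WHERE [Link] = ?)"
)

# The link appears twice per row: once as a value, once for the existence check.
params = [row + (row[3],) for row in rows]

# With an open pyodbc cursor this would be run as:
# cursor.executemany(INSERT_IF_NEW, params)
print(len(rows))  # 2
```

If a DataFrame is preferred, the same rows can be passed to `pd.DataFrame(rows, columns=[...])`, and `df.drop_duplicates(subset="Link")` gives the in-memory de-duplication. The `WHERE NOT EXISTS` check is still useful there, because it also protects against rows inserted by an earlier run of the notebook.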
