Scraping posts with Scrapy (XML output) and importing them into WordPress (Tools > Import)

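Problem description

I scrape the posts and export them to an XML file, but I cannot import that file with the WordPress import tool. The error is: "Sorry, there has been an error. This does not appear to be a WXR file, missing/invalid WXR version number"

[enter image description here][1] How can I fix this? Thanks. [1]: https://i.stack.imgur.com/QhEsd.png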

    # -*- coding: utf-8 -*-
    import scrapy


    class TenSpider(scrapy.Spider):
        name = 'ten'
        allowed_domains = ['tenhay.net']
        start_urls = ['https://tenhay.net/ten-hay-cho-con-gai']

        def parse(self, response):
            # Each post on the listing page is linked from an <h2> heading.
            # The original also passed an unused, post-specific selector via meta; it is dropped here.
            for post in response.xpath("//h2"):
                link = post.xpath(".//@href").get()
                if link:  # skip headings that carry no link
                    yield response.follow(url=link, callback=self.post_content)

        def post_content(self, response):
            # One item per post page. The key is 'image_url' rather than 'image url'
            # because the XML exporter uses keys as element names, and a space
            # would make the output malformed.
            yield {
                'title': response.xpath("//div[@class='section-content width-auto']/h1/text()").get(),
                'image_url': response.xpath("//figure/img/@src").getall(),
                'content': response.xpath("//div[@id='ftwp-postcontent']//text()").getall(),
            }

Tags: python, wordpress, scrapy

Solution
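
No answer was recorded for this question, so the following is only a hedged sketch of one likely cause. Scrapy's XML feed export writes a generic `<items><item>...</item></items>` document, whereas the error message says the importer is looking for a WXR file with a version number. WXR is an RSS 2.0 document that carries a `<wp:wxr_version>` element inside `<channel>`. Assuming the scraped items have the `title` and `content` keys from the spider above, a conversion step could look roughly like this. The function name `build_wxr`, the default site title and URL, and the choice of per-post fields are all illustrative assumptions; a real WordPress export contains many more fields (dates, authors, categories), and the importer may need some of them.

```python
import xml.etree.ElementTree as ET

# Namespaces used by WXR 1.2 exports.
WP = "http://wordpress.org/export/1.2/"
CONTENT = "http://purl.org/rss/1.0/modules/content/"
ET.register_namespace("wp", WP)
ET.register_namespace("content", CONTENT)


def build_wxr(items, site_title="Imported posts", site_url="https://example.com"):
    """Wrap scraped items (dicts with 'title' and 'content') in a minimal WXR document.

    Hypothetical helper: site_title and site_url are placeholders.
    """
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    # The element the error message complains about when it is missing.
    ET.SubElement(channel, f"{{{WP}}}wxr_version").text = "1.2"

    for post_id, item in enumerate(items, start=1):
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item.get("title") or ""
        # The spider yields 'content' as a list of text fragments; join them.
        body = "".join(item.get("content") or [])
        ET.SubElement(node, f"{{{CONTENT}}}encoded").text = body
        ET.SubElement(node, f"{{{WP}}}post_id").text = str(post_id)
        ET.SubElement(node, f"{{{WP}}}status").text = "publish"
        ET.SubElement(node, f"{{{WP}}}post_type").text = "post"

    return ET.tostring(rss, encoding="unicode")
```

The items could be loaded from a JSON feed produced by the spider and the returned string written to a `.xml` file in UTF-8 before uploading it under Tools > Import. Whether the importer accepts such a minimal file has not been verified here, so comparing it against a file exported from a real WordPress site is a sensible next check.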

