PHP cannot read the media:content attribute

Problem description

I use the following PHP code to parse an RSS feed into HTML:

function get_rss_feed_as_html($feed_url, $max_item_cnt = 10, $show_date = true, $show_description = true, $max_words = 0, $cache_timeout = 7200, $cache_prefix = "/tmp/rss2html-")
{
    $result = "";
    $rss = new DOMDocument();
    $cache_file = $cache_prefix . md5($feed_url);

    if ($cache_timeout > 0 &&
        is_file($cache_file) &&
        (filemtime($cache_file) + $cache_timeout > time())) {
            $rss->load($cache_file);
    } else {
        $rss->load($feed_url);
        if ($cache_timeout > 0) {
            $rss->save($cache_file);
        }
    }

    $feed = array();
    foreach ($rss->getElementsByTagName('entry') as $node) {
        
        $item = array (
            'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
            'desc' => $node->getElementsByTagName('content')->item(0)->nodeValue,
            'content' => $node->getElementsByTagName('content')->item(0)->nodeValue,
            'link' => $node->getElementsByTagName('link')->item(0)->getAttribute('href'),
            'date' => $node->getElementsByTagName('updated')->item(0)->nodeValue,
            'media' => $node->getElementsByTagName('media:content')->item(0)->getAttribute('url'),
        );
        $content = $node->getElementsByTagName('encoded');
        if ($content->length > 0) {
            $item['content'] = $content->item(0)->nodeValue;
        }
        array_push($feed, $item);
    }

    if ($max_item_cnt > count($feed)) {
        $max_item_cnt = count($feed);
    }
    $result .= '<div class="bw-feedly-list">';
    for ($x=0;$x<$max_item_cnt;$x++) {
        $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
        $link = $feed[$x]['link'];
        $result .= '<div class="bw-feedly-item-col">';
        $result .= '<a class="bw-feedly-item" href="'.$link.'" title="'.$title.'" target="_blank">';
        if ($show_date) {
            $date = date('F d, Y', strtotime($feed[$x]['date']));
            $result .= '<div class="bw-feedly-date">'.$date.'</div>';
        }
        
        $result .= '<strong class="bw-feedly-title">'.$title.'</strong>';
        
        if ($show_description) {
            $result .= '<div class="bw-feedly-row">';
            $result .= '<div class="bw-feedly-summary-col">';
            
            $description = $feed[$x]['content'];

            // strip HTML tags, dropping the contents of <script>/<style> elements first
            $description = strip_tags(preg_replace('/(<(script|style)\b[^>]*>).*?(<\/\2>)/s', '$1$3', $description));
            // whether cut by number of words
            if ($max_words > 0) {
                $arr = explode(' ', $description);
                if ($max_words < count($arr)) {
                    $description = '';
                    $w_cnt = 0;
                    foreach($arr as $w) {
                        $description .= $w . ' ';
                        $w_cnt = $w_cnt + 1;
                        if ($w_cnt == $max_words) {
                            break;
                        }
                    }
                    $description .= " ...";
                }
            }
            
            $result .= '<div class="feed-description">' . $description . '</div>';
            
            $media = $feed[$x]['media'];
            
            // add img if it exists
            //if ($media !== '') {
                $result .= '<div class="bw-feedly-image-col">';
                $result .= '<div class="bw-feedly-image-wrap" style="background-image: url('. $media .');">';
                $result .= '<img class="bw-feedly-image" src="'. $media .'">';
                $result .= '</div></div>';
            //}
            
            $result .= '</div></div>';
        }
        $result .= '</a></div>';
    }
    $result .= '</div>';
    return $result;
}

It works fine, except for retrieving the media URL attribute:

'media' => $node->getElementsByTagName('media:content')->item(0)->getAttribute('url'),

This fails with the following error: Fatal error: Uncaught Error: Call to a member function getAttribute() on null

By contrast, here I can access the attribute without any problem:

'link' => $node->getElementsByTagName('link')->item(0)->getAttribute('href')

Not every entry in the XML feed has a media element, but adding a null check doesn't change anything.

I also tried the following code. I think I'm close, but it still doesn't work: it prints 'content is null' for every entry.

if ($node->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->length > 0) {
    $image = $node->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url');
} else {
    echo '<p>content is null</p>';
}

An XPath expression didn't help either:

$xpath = new DOMXpath($rss);
$xpath->registerNamespace('m', 'http://search.yahoo.com/mrss/');

foreach ($xpath->evaluate('//entry') as $item) 
{
    $media = $xpath->evaluate('string(m:content/@url)', $item);
    echo '<p> MEDIA ITEM: '.$media.'</p>';
}
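One possible reason this XPath attempt prints no media URLs: in XPath 1.0 an unprefixed name such as `entry` only matches elements that are in no namespace. If the feed's root element declares the Atom default namespace (`xmlns="http://www.w3.org/2005/Atom"`, which is an assumption, since the question does not show the root element), `//entry` selects nothing. The sketch below registers a prefix for both namespaces; the prefix `a` and the sample feed are made up for illustration:

```php
<?php
// Hypothetical sample feed modeled on the excerpt in the question. The xmlns
// declarations on <feed> are an assumption: the real root element is not shown.
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <title type="html">Lorem Ipsum</title>
    <content type="html">Lorem Ipsum is simply dummy text.</content>
    <media:content medium="image" url="https://picsum.photos/200/300"/>
  </entry>
  <entry>
    <title type="html">No image</title>
    <content type="html">An entry without a media element.</content>
  </entry>
</feed>
XML;

$rss = new DOMDocument();
$rss->loadXML($xml);

$xpath = new DOMXPath($rss);
// Every namespace used in a query needs a registered prefix, including the
// default (unprefixed) Atom namespace of the feed.
$xpath->registerNamespace('a', 'http://www.w3.org/2005/Atom');
$xpath->registerNamespace('m', 'http://search.yahoo.com/mrss/');

// An unprefixed name only matches elements in no namespace, so this finds nothing here.
$unprefixed = $xpath->query('//entry')->length;

$media = [];
foreach ($xpath->query('//a:entry') as $entry) {
    // string() of an empty node-set is '', so entries without media need no null check.
    $media[] = $xpath->evaluate('string(m:content/@url)', $entry);
}

foreach ($media as $url) {
    echo '<p> MEDIA ITEM: ' . $url . '</p>' . PHP_EOL;
}
```

The relative path `m:content/@url` only follows the `media:content` child, so the Atom `<content>` element in the same entry cannot be picked up by mistake.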

Here is part of the XML:

    <entry>
     <id>tag:04ac51c7-b707-43cc-8a73-c482da986a27</id>
     <title type="html">Lorem Ipsum</title>
     <published>2020-09-28T19:36:26Z</published>
     <updated>2020-09-28T06:01:22Z</updated>
     <link rel="alternate" href="https://www.lipsum.com/" type="text/html"/>
     <content type="html">Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. ...</content>
     <author>
     <name/>
     </author>
     <media:content medium="image" url="https://picsum.photos/200/300"/>
     <source>
     <id>tag:04ac51c7-b707-43cc-8a73-c482da986a27</id>
     <title type="html">Lorum ipsum</title>
     <link rel="alternate" type="text/html" href="https://www.lipsum.com/"/>
     <updated>2020-09-28T06:01:22Z</updated>
     </source>
    </entry>
    <entry>

What's the trick here?

Tags: php, rss

Solution


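It should work with the getElementsByTagNameNS() method, as long as the namespace URI you pass is exactly the one the feed declares for the media: prefix.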
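A minimal sketch of that approach, assuming PHP 7 or later and a feed shaped like the excerpt in the question (the sample XML, including its xmlns declarations, is invented for illustration, and `media_url()` is a hypothetical helper). Two details matter. First, `item(0)` returns null when an entry has no media element, so the null check has to happen before `getAttribute()` is called. Second, the namespace URI must match the feed's `xmlns:media` declaration character for character, so the sketch asks the document for it instead of relying only on a hard-coded string:

```php
<?php
// Hypothetical sample feed; the real feed's root element is not shown in the question.
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <title type="html">Lorem Ipsum</title>
    <content type="html">Lorem Ipsum is simply dummy text.</content>
    <media:content medium="image" url="https://picsum.photos/200/300"/>
  </entry>
  <entry>
    <title type="html">No image</title>
    <content type="html">An entry without a media element.</content>
  </entry>
</feed>
XML;

/**
 * Hypothetical helper: url attribute of the entry's first media:content
 * element, or '' if the entry has none.
 */
function media_url(DOMElement $entry, string $mediaNs): string
{
    $first = $entry->getElementsByTagNameNS($mediaNs, 'content')->item(0);
    // item(0) is null for an empty list; test it before calling getAttribute().
    return $first instanceof DOMElement ? $first->getAttribute('url') : '';
}

$rss = new DOMDocument();
$rss->loadXML($xml);

// Ask the document which URI it binds to the "media" prefix; fall back to the
// Media RSS URI used in the question if the prefix is not declared on the root.
$mediaNs = $rss->documentElement->lookupNamespaceURI('media') ?? 'http://search.yahoo.com/mrss/';

$urls = [];
foreach ($rss->getElementsByTagName('entry') as $node) {
    $urls[] = media_url($node, $mediaNs);
}
print_r($urls);
```

In the question's function, the `'media' => ...` line could then call such a helper, so entries without an image get an empty string instead of a fatal error.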

You should also be able to use getElementsByTagName() without the namespace prefix, i.e. leave out "media:":

$node->getElementsByTagName('content')->item(0)->getAttribute('url')

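However, this will clash if elements named content exist in more than one namespace. That is the case in this feed: each entry has an Atom <content type="html"> element before <media:content>, so item(0) returns the Atom element, and getAttribute('url') on it returns an empty string.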
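The clash can be reproduced with a feed shaped like the question's excerpt (the sample XML is hypothetical). One way around it, sketched here with a hypothetical helper `find_child()` and assuming PHP 8 for the `?->` operator, is to walk the entry's child elements and compare `namespaceURI` and `localName` explicitly, so the position of `<media:content>` inside the entry no longer matters:

```php
<?php
// Hypothetical sample feed modeled on the excerpt in the question.
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <title type="html">Lorem Ipsum</title>
    <content type="html">Lorem Ipsum is simply dummy text.</content>
    <media:content medium="image" url="https://picsum.photos/200/300"/>
  </entry>
</feed>
XML;

/**
 * Hypothetical helper: first child element of $parent with the given
 * namespace URI and local name, or null if there is none.
 */
function find_child(DOMElement $parent, string $nsUri, string $localName): ?DOMElement
{
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMElement
            && $child->namespaceURI === $nsUri
            && $child->localName === $localName) {
            return $child;
        }
    }
    return null;
}

$rss = new DOMDocument();
$rss->loadXML($xml);
$entry = $rss->getElementsByTagName('entry')->item(0);

// Unprefixed lookup: the first "content" in document order is the Atom element,
// which has no url attribute, so getAttribute() returns an empty string.
$ambiguous = $entry->getElementsByTagName('content')->item(0)->getAttribute('url');

// Namespace-aware lookup: matches <media:content> only.
$media = find_child($entry, 'http://search.yahoo.com/mrss/', 'content');
$url = $media?->getAttribute('url') ?? '';

echo var_export($ambiguous, true), PHP_EOL;
echo $url, PHP_EOL;
```

Only direct children are inspected here, which fits the excerpt, where `<media:content>` sits directly inside `<entry>`; a feed that nests it deeper (for example inside a `media:group`) would need a different lookup.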

