首页 > 解决方案 > URL 编码显示为 404 页面

问题描述

新来的。我管理着几个 Wordpress 站点,但 Screaming Frog SEO Spider 显示 UTF-8 编码的 URL,仅针对该站点给出 404 错误。网站本身按预期工作,浏览器中的 URL 正常工作,但几乎网站上的每个链接都在爬虫中“加倍”,正确的链接显示 200 状态,然后是重复的 404:

地址:
https://xxxxx.com/locations/<%=%20_.escape(url)%20%>

URL编码地址:
https://xxxxx.com/locations/%3C%25=%20_.escape(url)%20%25%3E

截图:https ://i.imgur.com/p7vPVTe.jpg

该站点运行的是 PHP 5.6,所以我认为它可能是一个过时的 PHP 问题,所以我们升级到 PHP 7.3,但问题仍然存在。

来自 FTP/文件管理器的错误日志没有显示我可以看到的任何问题。在 WordPress 5.2.2、mysqli 5.7.26/mysqlnd 5.0.12 中使用自定义结构永久链接。

.htaceess:(隐藏域/管理员信息)

text/x-generic .htaccess ( UTF-8 Unicode text, with very long lines )
# BEGIN WpFastestCache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTPS} =on
RewriteCond %{HTTP_HOST} ^xxxxxxx.com
# Start WPFC Exclude
# End WPFC Exclude
# Start_WPFC_Exclude_Admin_Cookie
RewriteCond %{HTTP:Cookie} !wordpress_logged_in_[^\=]+\=admin|xxxxxxxxxxxxxx
RewriteCond %{HTTP:Cookie} !wordpress_logged_in_[^\=]+\=xxxxxxxxxxxxx
# End_WPFC_Exclude_Admin_Cookie
RewriteCond %{HTTP_HOST} ^xxxxxxxxxxxx.com
RewriteCond %{HTTP_USER_AGENT} !(facebookexternalhit|Twitterbot|LinkedInBot|WhatsApp|Mediatoolkitbot)
RewriteCond %{HTTP_USER_AGENT} !(WP\sFastest\sCache\sPreload(\siPhone\sMobile)?\s*Bot)
RewriteCond %{REQUEST_METHOD} !POST
RewriteCond %{REQUEST_URI} !(\/){2}$
RewriteCond %{REQUEST_URI} \/$
RewriteCond %{QUERY_STRING} !.+
RewriteCond %{HTTP:Cookie} !wordpress_logged_in
RewriteCond %{HTTP:Cookie} !comment_author_
RewriteCond %{HTTP:Cookie} !safirmobilswitcher=mobil
RewriteCond %{HTTP:Profile} !^[a-z0-9\"]+ [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*(\bCrMo\b|CriOS|Android.*Chrome\/[.0-9]*\s(Mobile)?|\bDolfin\b|Opera.*Mini|Opera.*Mobi|Android.*Opera|Mobile.*OPR\/[0-9.]+|Coast\/[0-9.]+|Skyfire|Mobile\sSafari\/[.0-9]*\sEdge|IEMobile|MSIEMobile|fennec|firefox.*maemo|(Mobile|Tablet).*Firefox|Firefox.*Mobile|FxiOS|bolt|teashark|Blazer|Version.*Mobile.*Safari|Safari.*Mobile|MobileSafari|Tizen|UC.*Browser|UCWEB|baiduboxapp|baidubrowser|DiigoBrowser|Puffin|\bMercury\b|Obigo|NF-Browser|NokiaBrowser|OviBrowser|OneBrowser|TwonkyBeamBrowser|SEMC.*Browser|FlyFlow|Minimo|NetFront|Novarra-Vision|MQQBrowser|MicroMessenger|Android.*PaleMoon|Mobile.*PaleMoon|Android|blackberry|\bBB10\b|rim\stablet\sos|PalmOS|avantgo|blazer|elaine|hiptop|palm|plucker|xiino|Symbian|SymbOS|Series60|Series40|SYB-[0-9]+|\bS60\b|Windows\sCE.*(PPC|Smartphone|Mobile|[0-9]{3}x[0-9]{3})|Window\sMobile|Windows\sPhone\s[0-9.]+|WCE;|Windows\sPhone\s10.0|Windows\sPhone\s8.1|Windows\sPhone\s8.0|Windows\sPhone\sOS|XBLWP7|ZuneWP7|Windows\sNT\s6\.[23]\;\sARM\;|\biPhone.*Mobile|\biPod|\biPad|Apple-iPhone7C2|MeeGo|Maemo|J2ME\/|\bMIDP\b|\bCLDC\b|webOS|hpwOS|\bBada\b|BREW).*$ [NC]
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/all/$1/index.html -f [or]
RewriteCond /home/xxxxxxxx/public_html/wp-content/cache/all/$1/index.html -f
RewriteRule ^(.*) "/wp-content/cache/all/$1/index.html" [L]
</IfModule>
<FilesMatch "index\.(html|htm)$">
AddDefaultCharset UTF-8
<ifModule mod_headers.c>
FileETag None
Header unset ETag
Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
Header set Expires "Mon, 29 Oct 1923 20:30:00 GMT"
</ifModule>
</FilesMatch>
# END WpFastestCache
# BEGIN GzipWpFastestCache
<IfModule mod_deflate.c>
AddType x-font/woff .woff
AddType x-font/ttf .ttf
AddOutputFilterByType DEFLATE image/svg+xml
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE text/javascript
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
AddOutputFilterByType DEFLATE application/x-font-ttf
AddOutputFilterByType DEFLATE x-font/ttf
AddOutputFilterByType DEFLATE application/vnd.ms-fontobject
AddOutputFilterByType DEFLATE font/opentype font/ttf font/eot font/otf
</IfModule>
# END GzipWpFastestCache
# BEGIN LBCWpFastestCache
<FilesMatch "\.(webm|ogg|mp4|ico|pdf|flv|jpg|jpeg|png|gif|webp|js|css|swf|x-html|css|xml|js|woff|woff2|otf|ttf|svg|eot)(\.gz)?$">
<IfModule mod_expires.c>
AddType application/font-woff2 .woff2
AddType application/x-font-opentype .otf
ExpiresActive On
ExpiresDefault A0
ExpiresByType video/webm A10368000
ExpiresByType video/ogg A10368000
ExpiresByType video/mp4 A10368000
ExpiresByType image/webp A10368000
ExpiresByType image/gif A10368000
ExpiresByType image/png A10368000
ExpiresByType image/jpg A10368000
ExpiresByType image/jpeg A10368000
ExpiresByType image/ico A10368000
ExpiresByType image/svg+xml A10368000
ExpiresByType text/css A10368000
ExpiresByType text/javascript A10368000
ExpiresByType application/javascript A10368000
ExpiresByType application/x-javascript A10368000
ExpiresByType application/font-woff2 A10368000
ExpiresByType application/x-font-opentype A10368000
ExpiresByType application/x-font-truetype A10368000
</IfModule>
<IfModule mod_headers.c>
Header set Expires "max-age=A10368000, public"
Header unset ETag
Header set Connection keep-alive
FileETag None
</IfModule>
</FilesMatch>
# END LBCWpFastestCache

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress
# php -- BEGIN cPanel-generated handler, do not edit
# Set the “ea-php70” package as the default “PHP” programming language.
<IfModule mime_module>
  AddType application/x-httpd-ea-php70 .php .php7 .phtml
</IfModule>
# php -- END cPanel-generated handler, do not edit

# BEGIN cPanel-generated php ini directives, do not edit
# Manual editing of this file may result in unexpected behavior.
# To make changes to this file, use the cPanel MultiPHP INI Editor (Home >> Software >> MultiPHP INI Editor)
# For more information, read our documentation (https://go.cpanel.net/EA4ModifyINI)
<IfModule php7_module>
   php_flag display_errors Off
   php_value max_execution_time 300
   php_value max_input_time 60
   php_value max_input_vars 1500
   php_value memory_limit 512M
   php_value post_max_size 32M
   php_value session.gc_maxlifetime 1440
   php_value session.save_path "/var/cpanel/php/sessions/ea-php56"
   php_value upload_max_filesize 32M
   php_flag zlib.output_compression Off
   php_flag log_errors On
   php_value error_log "/home/xxxxxxxxx/public_html/phpupgradelog.log"
</IfModule>
<IfModule lsapi_module>
   php_flag display_errors Off
   php_value max_execution_time 300
   php_value max_input_time 60
   php_value max_input_vars 1500
   php_value memory_limit 512M
   php_value post_max_size 32M
   php_value session.gc_maxlifetime 1440
   php_value session.save_path "/var/cpanel/php/sessions/ea-php56"
   php_value upload_max_filesize 32M
   php_flag zlib.output_compression Off
   php_flag log_errors On
   php_value error_log "/home/xxxxxxxx/public_html/phpupgradelog.log"
</IfModule>
# END cPanel-generated php ini directives, do not edit

预期输出:保留 Screaming Frog SEO 中显示的 Status 200 链接,并删除显示的 404 URL 编码链接。

标签: urlurlencode

解决方案


原来这是爬虫工具盲目地拉入 href 脚本标签并将其计为 URL 的问题。他们正在最终解决问题。

解决了!

<script id='wpvpm-menu-item' type='text/html'><li>
    <% if(children.length > 0) { %>
        <a href="#" class="has-children <%= _.escape(classes.join(' ')) %>" title="<%= _.escape(attr_title) %>"><%= title %></a>
        <div class="mp-level">
            <div class="mp-level-header">
                <h2><%= title %></h2>
                <a class="mp-back" href="#"><%= WpvPushMenu.back %></a>
            </div>
            <ul>
                <% if(! (/^\s*$/.test(url)) ) { %>
                    <li><a href="<%= _.escape(url) %>" class="<%= _.escape(classes.join(' ')) %>" title="<%= _.escape(attr_title) %>"><%= title %></a></li>
                <% } %>
                <%= content %>
            </ul>
        </div>
    <% } else { %>
        <a href="<%= _.escape(url) %>" class="<%= _.escape(classes.join(' ')) %>" title="<%= _.escape(attr_title) %>"><%= title %></a>
    <% } %>
</li></script>

推荐阅读