.htaccess - Override an existing "noindex, nofollow" X-Robots-Tag header using .htaccess?

Question

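I'm trying to set the X-Robots-Tag header so that Googlebot is allowed to index my website. I don't have a robots.txt file, and none of my HTML files contain any meta tags related to X-Robots-Tag. The Apache server is returning an X-Robots-Tag header set to "noindex, nofollow". How can I unset this header by editing the .htaccess file?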

This is what I get when using the Chrome extension "Robots Exclusion Checker":

X-Robots BLOCKED noindex,nofollow.

Date: Thu, 23 Jul 2020 20:27:46 GMT
Content-Type: text/html
Content-Length: 1272
Connection: keep-alive
Keep-Alive: timeout=30
Server: Apache/2
X-Robots-Tag: noindex, nofollow
Last-Modified: Fri, 09 Mar 2018 19:26:43 GMT
ETag: "ae0-xxxxxxxxxx-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Cache-Control: max-age=3600
Expires: Thu, 23 Jul 2020 21:27:46 GMT

The contents of my .htaccess file:

# compress text, html, javascript, css, xml:
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript

# Or, compress certain file types by extension:
<files *.html>
SetOutputFilter DEFLATE
</files>

Header onsuccess unset X-Robots-Tag
Header always set X-Robots-Tag "index,follow"

I tried adding this to the bottom of the .htaccess file:

<files *.html>
Header set X-Robots-Tag "index,follow"
</files>

I then get this response from the Chrome extension:

X-Robots BLOCKED noindex,nofollow,index,follow.

(Note that the X-Robots-Tag header appears twice in the list below.)

Date: Thu, 23 Jul 2020 20:39:42 GMT
Content-Type: text/html
Content-Length: 1272
Connection: keep-alive
Keep-Alive: timeout=30
Server: Apache/2
X-Robots-Tag: noindex, nofollow
Last-Modified: Fri, 09 Mar 2018 19:26:43 GMT
ETag: "ae0-xxxxxxxxxxxxx-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Cache-Control: max-age=3600
Expires: Thu, 23 Jul 2020 21:39:42 GMT
X-Robots-Tag: index,follow

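Is there a way to remove the original X-Robots-Tag header and replace it with a new one? I tried Header unset X-Robots-Tag, but that did not work (the checker still shows "BLOCKED noindex,nofollow").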

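SOLUTION: What worked for me was adding a robots.txt file and making sure that all hyperlinks end with a trailing slash. Without the trailing slash, I was apparently getting a 301 redirect that carried the offending noindex,nofollow header.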

Tags: .htaccess, x-robots-tag

Answer

My index.html page is very, very simple: just hyperlinks inside the body that point to other parts of the site.
The site is hosted on...

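As mentioned in the comments, you should first identify what is setting this header, rather than trying to override (or unset) it. This is not something Apache does by default; the header must be set explicitly somewhere.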

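If you are not setting this header yourself (in a server-side script, or in any .htaccess file along the file-system path, including above the document root), then it must be set in the vHost/server config. If you do not have access to the server config, you should contact your web host to find out what is going on.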

<files *.html>
Header set X-Robots-Tag "index,follow"
</files>

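This would ordinarily "work", unless the header was previously set in the always table of response headers. In that case, you need to use always as well. For example: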

Header always set X-Robots-Tag "index,follow"

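You don't need the <Files> wrapper, unless you specifically want to target only requests that map to *.html files. I would expect the "noindex, nofollow" header to be set on every request (images and other static resources included).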

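However, you should not need to set "index,follow" explicitly, because that is the default behaviour of search engines whether or not the header is present. So in this case you only need to unset the header (as you also suggest), but again you need to use the always table (if that is the table the header was set in to begin with). For example: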

Header always unset X-Robots-Tag

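The naming of the "always" table is perhaps a little misleading: to the casual reader, the directive above looks as if the header is always unset (as opposed to sometimes), but that is not what it means. There are two separate groups/tables of response headers: "always" and "onsuccess" (the default). The two are independent of each other, so a directive that operates on one table does not touch a header held in the other. The difference is that the "always" group is always applied, even on errors and internal rewrites/subrequests. The default group is not.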
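
To make the two tables concrete, here is a minimal sketch of an .htaccess fragment that unsets the header in both of them. It assumes that mod_headers is loaded and that the header is being added by Apache itself rather than by something in front of it (such as a proxy or CDN), which .htaccess cannot influence. The <IfModule> wrapper is an addition for safety and is not part of the original question.

```apache
# Sketch only: remove X-Robots-Tag from both mod_headers tables.
<IfModule mod_headers.c>
    # "onsuccess" is the default table, so the keyword could be omitted here.
    Header onsuccess unset X-Robots-Tag

    # The "always" table is also used for error responses and redirects,
    # such as the 301 mentioned in the question.
    Header always unset X-Robots-Tag
</IfModule>
```

If the header only ever lives in one of the tables, the other line simply has nothing to remove. If X-Robots-Tag still shows up after this, it is probably being added somewhere .htaccess does not reach, which is one more reason to track down where it is set in the first place.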

Reference:
https://httpd.apache.org/docs/2.4/mod/mod_headers.html#header
