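php - Redirecting crawlers to an internal microservice in NGINX
Question
I'm running a client-side rendered React application built with create-react-app, and I need it to handle OpenGraph meta tags. I wrote some PHP (based on https://rck.ms/angular-handlebars-open-graph-facebook-share/) that serves OpenGraph meta tags for specific pages based on the contents of JSON files. What I need to do is pass requests from crawler user agents to this PHP page from within NGINX.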
server {
    server_name example.com www.example.com;
    root /var/www/example;
    index index.html;
    listen 80;
    location @crawler {
        fastcgi_pass unix:/run/php/php7.0-fpm.sock;
        fastcgi_index crawler.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
    }
    location / {
        if ($http_user_agent ~* "linkedinbot|googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|twitterbot|developers\.google\.com") {
            proxy_pass @crawler;
        }
        try_files $uri /index.html;
    }
}
This causes NGINX to fail with the following error:
May 10 00:01:59 ip-172-31-14-46 nginx[10400]: nginx: [emerg] invalid URL prefix in /etc/nginx/sites-enabled/example.com:23
May 10 00:01:59 ip-172-31-14-46 systemd[1]: nginx.service: Control process exited, code=exited status=1
May 10 00:01:59 ip-172-31-14-46 systemd[1]: Reload failed for A high performance web server and a reverse proxy server.
For reference, here are the contents of the PHP file:
<?php
// 1. build the lookup key: an MD5 hash of the request URI
$uri = $_SERVER['REQUEST_URI'];
$hash = hash('md5', $uri);
// 2. get the content from a flat file (or API, or Database, or ...)
$contents = file_get_contents("./meta/" . $hash . ".json");
$data = array();
if ($contents) {
    // decode to an associative array so array_merge() can combine it with the defaults
    $data = json_decode($contents, true) ?: array();
}
$data = array_merge(json_decode(file_get_contents("./meta/default.json"), true), $data);
// 3. return the page
return makePage($data, $uri);
function makePage($data, $uri) {
    // 1. get the page ($uri is passed in because globals are not visible inside a function)
    $pageUrl = htmlspecialchars("https://example.com" . $uri, ENT_QUOTES);
    // escape the values before placing them inside HTML attributes
    $title = htmlspecialchars($data['title'], ENT_QUOTES);
    $description = htmlspecialchars($data['description'], ENT_QUOTES);
    $poster = htmlspecialchars($data['poster'], ENT_QUOTES);
    // 2. generate the HTML with open graph tags
    $html = '<!doctype html>'.PHP_EOL;
    $html .= '<html>'.PHP_EOL;
    $html .= '<head>'.PHP_EOL;
    $html .= '<title>'.$title.'</title>'.PHP_EOL;
    $html .= '<meta property="og:title" content="'.$title.'"/>'.PHP_EOL;
    $html .= '<meta property="og:description" content="'.$description.'"/>'.PHP_EOL;
    $html .= '<meta property="og:image" content="'.$poster.'"/>'.PHP_EOL;
    $html .= '<meta http-equiv="refresh" content="0;url='.$pageUrl.'">'.PHP_EOL;
    $html .= '</head>'.PHP_EOL;
    $html .= '<body></body>'.PHP_EOL;
    $html .= '</html>';
    // 3. return the page
    echo $html;
}
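Solution
Judging by the error, the address passed to proxy_pass is missing a URL prefix. proxy_pass expects a URL that begins with http:// or https://, so the named location in proxy_pass @crawler; is rejected as an invalid URL prefix.
Note that fastcgi_pass does not take such a prefix: its address is written as unix:/run/php/php7.0-fpm.sock (or host:port), as in the question's @crawler block.
For the same issue, see this Q&A: Nginx invalid URL prefix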
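Because proxy_pass cannot target a named location, one commonly used workaround is to have the if block return a status code and let error_page route that code to the named location. The following is a minimal sketch of that idiom, not a tested drop-in configuration. It assumes crawler.php sits directly in the document root (/var/www/example), and the choice of 418 as the signalling status is arbitrary. SCRIPT_FILENAME is set explicitly because fastcgi_index only applies to URIs that end in a slash.

```nginx
server {
    listen 80;
    server_name example.com www.example.com;
    root /var/www/example;
    index index.html;

    location / {
        # 418 is used only as an internal signal. error_page hands it to
        # @crawler, and the "=" lets the PHP response set the final status.
        error_page 418 = @crawler;

        if ($http_user_agent ~* "linkedinbot|googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|twitterbot|developers\.google\.com") {
            return 418;
        }

        # Everyone else gets the static React build.
        try_files $uri /index.html;
    }

    location @crawler {
        include fastcgi_params;
        # Assumption: crawler.php lives directly in the document root.
        fastcgi_param SCRIPT_FILENAME $document_root/crawler.php;
        fastcgi_pass unix:/run/php/php7.0-fpm.sock;
    }
}
```

Since fastcgi_params passes the original request URI as REQUEST_URI, the PHP script should still see the path the crawler asked for. Requesting a page with a crawler user agent (for example with curl -A "facebookexternalhit/1.1") is one way to check that the PHP output is served instead of index.html.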