Wget converting links differently on different environments

Problem description

I set up a Ruby script that uses Wget to download and archive some pages of a website. We have a Dockerized Rails application and use Elastic Beanstalk to manage our testing environments. The problem is that one test environment produces a different result from all the other environments, even though the Wget version is the same everywhere (1.18).

This is the command:

docker exec -it xxxxxxxx sh -c 'wget -H -E -p -k -N --no-cookies --header "Cookie: myCookie=123" --timeout=2000 --restrict-file-names=windows --no-check-certificate -e robots=off "http://path.to/resource"'

where the short flags are:

-H (--span-hosts), -E (--adjust-extension), -p (--page-requisites), -k (--convert-links), -N (--timestamping)

This is how links to stylesheets are correctly converted to relative paths after the CSS file has been downloaded:

<link rel="stylesheet" media="all" href="../../assets/ss-standard-931774a45f6c2e79b3fb8ac6ce1eca4e4a9208b3c80a1c289f36b317b830db6b.css" />

But in that one test environment, .html is appended to the link:

<link rel="stylesheet" media="all" href="../../assets/ss-standard-931774a45f6c2e79b3fb8ac6ce1eca4e4a9208b3c80a1c289f36b317b830db6b.css.html" />

I couldn't find anything helpful so far. Any ideas?

Tags: wget

Solution
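
One thing worth checking, offered as a guess since the question does not show any response headers: with -E (--adjust-extension), Wget appends .html to the local filename when a response is served as text/html or application/xhtml+xml and the URL does not already end in .htm/.html, and -k then rewrites links to point at that adjusted name. If the affected test environment returns the stylesheet with an HTML content type (an error page or a misconfigured proxy would both do it), that alone would explain the .css.html link. The sketch below is illustrative only: content_type_of and adjusted_name are hypothetical helper names, and adjusted_name is a simplified model of the documented rule, not Wget's actual code.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Wget.

# Read `wget --server-response` output on stdin and print the last
# Content-Type seen, lowercased and without parameters such as charset.
# Assumes the usual "Content-Type: value" form with a space after the colon.
content_type_of() {
  awk 'tolower($1) == "content-type:" { ct = $2 }
       END { gsub(/\r/, "", ct); sub(/;.*$/, "", ct); print tolower(ct) }'
}

# Simplified model of -E for HTML responses: append .html when the type is
# HTML and the name does not already end in .htm or .html.
adjusted_name() {
  name=$1
  ctype=$2
  case $ctype in
    text/html|application/xhtml+xml)
      case $name in
        *.[Hh][Tt][Mm]|*.[Hh][Tt][Mm][Ll]) printf '%s\n' "$name" ;;
        *) printf '%s\n' "$name.html" ;;
      esac ;;
    *) printf '%s\n' "$name" ;;
  esac
}

# Example use inside each container (ASSET_URL is a placeholder for the
# stylesheet URL; send the same Cookie header as the real command):
#   wget --spider --server-response --no-check-certificate \
#        --header "Cookie: myCookie=123" "$ASSET_URL" 2>&1 | content_type_of
```

If the test environment prints text/html where the others print text/css, the difference is in what the server returns rather than in Wget. Note that --spider checks the URL without downloading it, and a server may answer that check differently from a full download, so it is worth confirming with a normal download if the results look the same.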

