How can I delete logs or data after running a crawler on AWS EC2?

Problem description

[What I did]

  1. I did a git clone of this repo from the EC2 Ubuntu terminal: https://github.com/crawlab-team/crawlab

(When I run it on EC2, it shows the crawling dashboard.)

  2. I set up Docker and a Python environment on EC2 (roughly as sketched after this list).

  3. I uploaded my spider code through the Crawlab dashboard. When I run the spider from the dashboard, it collects data from other websites and saves it into MongoDB.
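For context, this is roughly how the stack gets started, assuming the docker-compose.yml bundled in the cloned repo is used (the compose file and its service names may differ between Crawlab versions):

cd crawlab
docker-compose up -d   # starts whatever services the compose file defines (typically the Crawlab app and MongoDB)
docker ps              # confirm the containers are up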

[Problem]

After the spider runs, something like logs(?) or data(?) keeps eating up the EC2 disk space. Eventually, once usage reaches 100%, the spider stops working and I have to grow the EBS volume to get more space.

But I would like to know how to delete these "logs(?) or data(?)" automatically, without having to do it by hand.

When I check the collected data in MongoDB, the total data size is under 1 MB, but on EC2 the disk usage seems to grow by more than 100 MB per run. I recorded the EC2 storage status after each batch of runs (after 12 runs and after 19 runs), shown below.
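One way to double-check how much MongoDB itself is actually storing is to query it from inside its container. This is only a sketch: the container name crawlab_mongo_1 below is an assumption, so look it up with docker ps first.

docker ps --format '{{.Names}}'   # list container names; find the MongoDB one
docker exec -it crawlab_mongo_1 mongo --quiet --eval 'db.adminCommand("listDatabases")'   # prints each database with its sizeOnDisk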

After running the spider 12 times, I ran the command df -Th:

Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs  3.9G     0  3.9G   0% /dev
tmpfs          tmpfs     796M  1.1M  795M   1% /run
/dev/xvda1     ext4       25G  6.7G   18G  28% /
tmpfs          tmpfs     3.9G     0  3.9G   0% /dev/shm
tmpfs          tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs          tmpfs     3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/loop1     squashfs   33M   33M     0 100% /snap/snapd/11107
/dev/loop0     squashfs   56M   56M     0 100% /snap/core18/1988
/dev/loop3     squashfs   34M   34M     0 100% /snap/amazon-ssm-agent/3552
overlay        overlay    25G  6.7G   18G  28% /var/lib/docker/overlay2/e7965025d305ee6d51b55dee17fe547b70b51ebea9750d22e1ff55337d54b3ea/merged
overlay        overlay    25G  6.7G   18G  28% /var/lib/docker/overlay2/414d298f3057de6461ca2d5b9ceeb290e35215500cbfdaa3d000ad7afda46359/merged
/dev/loop4     squashfs   33M   33M     0 100% /snap/snapd/11402
tmpfs          tmpfs     796M     0  796M   0% /run/user/1000
overlay        overlay    25G  6.7G   18G  28% /var/lib/docker/overlay2/b03949ee74691b6ef3f1471c37ebf58d5cd6947b8e139430b4f1c0776c06016c/merged
overlay        overlay    25G  6.7G   18G  28% /var/lib/docker/overlay2/02159a834b6c0c51d81686f024aab575da916d001d91a7027d2c3868f7b2696e/merged

After running the spider 19 times, I ran the command df -Th:

Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs  3.9G     0  3.9G   0% /dev
tmpfs          tmpfs     796M  1.1M  795M   1% /run
/dev/xvda1     ext4       25G  8.0G   17G  33% /
tmpfs          tmpfs     3.9G     0  3.9G   0% /dev/shm
tmpfs          tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs          tmpfs     3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/loop1     squashfs   33M   33M     0 100% /snap/snapd/11107
/dev/loop0     squashfs   56M   56M     0 100% /snap/core18/1988
/dev/loop3     squashfs   34M   34M     0 100% /snap/amazon-ssm-agent/3552
overlay        overlay    25G  8.0G   17G  33% /var/lib/docker/overlay2/e7965025d305ee6d51b55dee17fe547b70b51ebea9750d22e1ff55337d54b3ea/merged
overlay        overlay    25G  8.0G   17G  33% /var/lib/docker/overlay2/414d298f3057de6461ca2d5b9ceeb290e35215500cbfdaa3d000ad7afda46359/merged
/dev/loop4     squashfs   33M   33M     0 100% /snap/snapd/11402
tmpfs          tmpfs     796M     0  796M   0% /run/user/1000
overlay        overlay    25G  8.0G   17G  33% /var/lib/docker/overlay2/b03949ee74691b6ef3f1471c37ebf58d5cd6947b8e139430b4f1c0776c06016c/merged
overlay        overlay    25G  8.0G   17G  33% /var/lib/docker/overlay2/02159a834b6c0c51d81686f024aab575da916d001d91a7027d2c3868f7b2696e/merged
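(For scale: between the two snapshots, usage on / grew from 6.7G to 8.0G, i.e. about 1.3G across 7 runs, or roughly 190 MB per run, which fits the "more than 100 MB per run" estimate above.)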

After running the spider 12 times, I ran the command du -h --max-depth=10 /var/lib | sort -h:

...
310M    /var/lib/docker/volumes/b8a9c7ac497041f99b629fdb14e9dfc283f4108d73512d3f87a79e9a5af3d0fd
310M    /var/lib/docker/volumes/b8a9c7ac497041f99b629fdb14e9dfc283f4108d73512d3f87a79e9a5af3d0fd/_data


578M    /var/lib/docker/overlay2/b03949ee74691b6ef3f1471c37ebf58d5cd6947b8e139430b4f1c0776c06016c/diff/tmp
578M    /var/lib/docker/overlay2/b03949ee74691b6ef3f1471c37ebf58d5cd6947b8e139430b4f1c0776c06016c/merged/tmp
740M    /var/lib/docker/overlay2/02159a834b6c0c51d81686f024aab575da916d001d91a7027d2c3868f7b2696e/diff/tmp
740M    /var/lib/docker/overlay2/02159a834b6c0c51d81686f024aab575da916d001d91a7027d2c3868f7b2696e/merged/tmp
1.1G    /var/lib/docker/overlay2/b03949ee74691b6ef3f1471c37ebf58d5cd6947b8e139430b4f1c0776c06016c/diff
1.3G    /var/lib/docker/overlay2/02159a834b6c0c51d81686f024aab575da916d001d91a7027d2c3868f7b2696e/diff
1.8G    /var/lib/docker/overlay2/b03949ee74691b6ef3f1471c37ebf58d5cd6947b8e139430b4f1c0776c06016c/merged
1.9G    /var/lib/docker/overlay2/02159a834b6c0c51d81686f024aab575da916d001d91a7027d2c3868f7b2696e/merged
2.9G    /var/lib/docker/overlay2/b03949ee74691b6ef3f1471c37ebf58d5cd6947b8e139430b4f1c0776c06016c
3.2G    /var/lib/docker/overlay2/02159a834b6c0c51d81686f024aab575da916d001d91a7027d2c3868f7b2696e
7.7G    /var/lib/docker/overlay2
8.1G    /var/lib/docker
8.4G    /var/lib

After running the spider 19 times, I ran the command du -h --max-depth=10 /var/lib | sort -h:

...
318M    /var/lib/docker/volumes/b8a9c7ac497041f99b629fdb14e9dfc283f4108d73512d3f87a79e9a5af3d0fd
318M    /var/lib/docker/volumes/b8a9c7ac497041f99b629fdb14e9dfc283f4108d73512d3f87a79e9a5af3d0fd/_data

789M    /var/lib/docker/overlay2/02159a834b6c0c51d81686f024aab575da916d001d91a7027d2c3868f7b2696e/merged/usr
789M    /var/lib/docker/overlay2/b03949ee74691b6ef3f1471c37ebf58d5cd6947b8e139430b4f1c0776c06016c/merged/usr
1.2G    /var/lib/docker/overlay2/b03949ee74691b6ef3f1471c37ebf58d5cd6947b8e139430b4f1c0776c06016c/diff/tmp
1.2G    /var/lib/docker/overlay2/b03949ee74691b6ef3f1471c37ebf58d5cd6947b8e139430b4f1c0776c06016c/merged/tmp
1.5G    /var/lib/docker/overlay2/02159a834b6c0c51d81686f024aab575da916d001d91a7027d2c3868f7b2696e/diff/tmp
1.5G    /var/lib/docker/overlay2/02159a834b6c0c51d81686f024aab575da916d001d91a7027d2c3868f7b2696e/merged/tmp
1.7G    /var/lib/docker/overlay2/b03949ee74691b6ef3f1471c37ebf58d5cd6947b8e139430b4f1c0776c06016c/diff
2.0G    /var/lib/docker/overlay2/02159a834b6c0c51d81686f024aab575da916d001d91a7027d2c3868f7b2696e/diff
2.4G    /var/lib/docker/overlay2/b03949ee74691b6ef3f1471c37ebf58d5cd6947b8e139430b4f1c0776c06016c/merged
2.7G    /var/lib/docker/overlay2/02159a834b6c0c51d81686f024aab575da916d001d91a7027d2c3868f7b2696e/merged
4.1G    /var/lib/docker/overlay2/b03949ee74691b6ef3f1471c37ebf58d5cd6947b8e139430b4f1c0776c06016c
4.6G    /var/lib/docker/overlay2/02159a834b6c0c51d81686f024aab575da916d001d91a7027d2c3868f7b2696e
11G     /var/lib/docker
11G     /var/lib/docker/overlay2
12G     /var/lib
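From the du output, the growth seems to be mostly under /tmp inside the two large overlay2 layers, while the Docker volume (presumably MongoDB's data) only went from 310M to 318M. A sketch of how those overlay2 directories can be mapped back to containers and their /tmp sizes checked, using plain docker commands:

docker ps -q | xargs docker inspect --format '{{.Name}} {{.GraphDriver.Data.MergedDir}}'   # map each running container to its overlay2 "merged" directory
for c in $(docker ps -q); do docker exec "$c" du -sh /tmp; done   # size of /tmp inside each container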


Tags: docker, amazon-ec2
