ruby-on-rails - Is there a best practice for fetching link previews?
Question
Essentially, given any URL, I could fetch the webpage in Ruby using
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open(my_url))  # Kernel#open no longer opens URLs on Ruby 3.0+
title = doc.at('meta[property="og:title"]')['content']
...
and extract the elements I need.
Is there a best practice before fetching any links? It seems like a potential security risk as well.
I'm assuming large companies like Facebook might run an image through some model to determine whether it should be censored?
Answer
> Essentially, given any URL, I could fetch the webpage in Ruby using
I use the metainspector gem to get Open Graph (OG) data from various media URLs. It works very well and might save you some headaches.
> Is there a best practice before fetching any links? It seems like a potential security risk as well.
It depends on your application, what information you scrape, and what you show to the user. If you are concerned about obscene words, you can filter them out (there may be gems for that), though in my experience they rarely appear in OG meta tags. You could also blacklist adult website domains, or allow only a fixed set of domains.
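As a rough illustration of the "allow just some domains" idea, here is a minimal sketch using only Ruby's standard library. The `ALLOWED_HOSTS` list and the `safe_preview_url?` helper are hypothetical names made up for this example, and the domains in the list are placeholders. It only checks the scheme and host of the URL you are given; it does not follow redirects or inspect DNS results, so treat it as a first filter rather than a complete defence.

```ruby
require 'uri'

# Hypothetical allowlist; replace with the domains your application trusts.
ALLOWED_HOSTS = %w[example.com youtube.com].freeze

# Returns true only for http(s) URLs whose host is an allowlisted domain
# or a subdomain of one. Anything unparsable or non-HTTP is rejected.
def safe_preview_url?(url, allowed_hosts = ALLOWED_HOSTS)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes
  # and rejects file:, ftp:, javascript:, and so on.
  return false unless uri.is_a?(URI::HTTP)

  host = uri.host.to_s.downcase
  return false if host.empty?

  allowed_hosts.any? do |allowed|
    host == allowed || host.end_with?(".#{allowed}")
  end
rescue URI::InvalidURIError
  false
end
```

You would call `safe_preview_url?(my_url)` before fetching and skip the preview when it returns false. Matching on `".#{allowed}"` rather than a bare suffix is deliberate, so that a host like `example.com.evil.org` or `notexample.com` does not slip through.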
> I'm assuming large companies like Facebook might run an image through some model to determine whether it should be censored?
Image recognition is one way to do it, but it requires a lot of work. A lot.