encoding - Pasting text copied from a PDF?
Problem description
I have a PDF file that contains only text.
When I copy text like the following:
payload
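and try to paste it into Notepad++, I get boxes like these:
I tried changing the encoding in Notepad++, but I still can't see the actual text.
This only happens with one particular PDF file.
I have verified that the PDF's security settings allow copying:
I also noticed something else: if I search the PDF for some text, the find command can't locate it, even though the word is in the document.
How can I paste text copied from a file like this?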
Solution
Though you don't include the PDF file in your question, it's fair to assume the problem is the usual one with such files: the PDF does not provide a correct translation table from its character codes to Unicode (for example, a ToUnicode CMap for the font).
In such files, searching fails and copy / paste fails too, because Acrobat doesn't know which characters the glyphs on the page are supposed to represent.
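To make the failure concrete, here is a simplified sketch of what a viewer does when you copy text. It assumes a font that uses 2-byte glyph codes and a ToUnicode CMap containing only `bfchar` entries (real CMaps can also use `bfrange` and other operators). The names `parse_bfchar`, `decode_codes` and `SAMPLE_CMAP` are hypothetical, chosen for illustration. Codes without a mapping come out as U+FFFD, a stand-in for the boxes you see in Notepad++.

```python
import re

# Hypothetical, minimal ToUnicode CMap fragment: glyph code -> UTF-16BE text.
SAMPLE_CMAP = """
2 beginbfchar
<0003> <0048>
<0004> <0069>
endbfchar
"""

def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Collect <code> <unicode-hex> pairs from every bfchar block."""
    mapping: dict[int, str] = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

def decode_codes(hex_string: str, mapping: dict[int, str], code_bytes: int = 2) -> str:
    """Translate hex glyph codes (as stored in the page content) into text.

    Any code missing from the mapping becomes U+FFFD: without the table,
    the viewer has no way to know which character the glyph represents.
    """
    width = code_bytes * 2
    codes = [int(hex_string[i:i + width], 16) for i in range(0, len(hex_string), width)]
    return "".join(mapping.get(code, "\ufffd") for code in codes)

if __name__ == "__main__":
    table = parse_bfchar(SAMPLE_CMAP)
    print(decode_codes("00030004", table))  # with the table: readable text
    print(decode_codes("00030004", {}))     # without it: replacement characters
```

The glyph codes themselves are arbitrary numbers chosen by whoever built the font subset, which is why changing the encoding in Notepad++ cannot recover the text: there is no encoding to switch to, only a missing table.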
The problem is usually unfixable: there is generally nothing you can do to this particular document to make searching and copying start working.
A possible work-around - and this will sound stupid - is to convert the PDF to images and perform OCR on it. The OCR algorithm, if it handles the file correctly, will normally insert proper text that can be searched or copied.
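A rough sketch of that work-around, assuming the Poppler `pdftoppm` and Tesseract `tesseract` command-line tools are installed and on the PATH. The helper names (`rasterize_cmd`, `ocr_cmd`, `ocr_pdf`) are hypothetical, and the default language `eng` is an assumption; a document in another language needs the matching Tesseract language data (for example `chi_sim` for Simplified Chinese).

```python
import glob
import os
import subprocess
import sys
import tempfile

def rasterize_cmd(pdf_path: str, out_prefix: str, dpi: int = 300) -> list[str]:
    # pdftoppm renders each page to <out_prefix>-<page>.png at the given resolution.
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]

def ocr_cmd(image_path: str, lang: str = "eng") -> list[str]:
    # Using "stdout" as the output base makes tesseract print the recognized text.
    return ["tesseract", image_path, "stdout", "-l", lang]

def ocr_pdf(pdf_path: str, lang: str = "eng") -> str:
    """Render every page to an image, OCR each one, and join the results."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        subprocess.run(rasterize_cmd(pdf_path, prefix), check=True)
        # pdftoppm zero-pads page numbers within a run, so a plain sort keeps page order.
        pages = sorted(glob.glob(prefix + "-*.png"))
        texts = []
        for page in pages:
            result = subprocess.run(
                ocr_cmd(page, lang), check=True, capture_output=True, text=True
            )
            texts.append(result.stdout)
        return "\n".join(texts)

if __name__ == "__main__" and len(sys.argv) > 1:
    print(ocr_pdf(sys.argv[1]))
```

Usage: `python ocr_pdf.py broken.pdf > text.txt`. Higher DPI generally helps recognition at the cost of speed; the OCR output still needs proofreading, since recognition errors are likely.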