Pasting text copied from a PDF?

Question

I have a PDF file that contains only text.

When I copy text such as the following:

payload

and try to paste it into Notepad++, I get boxes instead of the text.


I tried changing the encoding in Notepad++, but I still cannot see the actual text.

This happens with one specific PDF file.

I have verified that the PDF's security settings allow copying.

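I also noticed something else: if I search the PDF for some text, the Find command cannot locate it, even though the word I am searching for is in the document.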

How can I paste text copied from a file like this?

Tags: encoding, adobe, acrobat

Answer


You didn't include the PDF file in your question, but it's fair to assume the problem is the usual one with such files: the PDF does not provide a correct table for translating its character codes to Unicode.

In such files, searching doesn't work and copy/paste doesn't work either, because Acrobat cannot tell which characters the glyphs on the page are supposed to represent.
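One rough way to confirm that pasted text is affected is to check how much of it consists of code points that normally never appear in readable text. The sketch below is only a heuristic built on assumptions: the function name `looks_unmapped`, the 0.3 threshold, and the choice of suspicious Unicode categories are illustrative, not anything Acrobat or Notepad++ provides.

```python
import unicodedata


def looks_unmapped(text: str, threshold: float = 0.3) -> bool:
    """Guess whether copied text came from a PDF without a usable Unicode mapping.

    Hypothetical helper: it only counts characters that usually render as
    boxes (private-use, unassigned, control, or U+FFFD) and compares their
    share against an arbitrary threshold.
    """
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return False

    suspicious = 0
    for ch in visible:
        category = unicodedata.category(ch)
        # Co = private use, Cn = unassigned, Cc = control character
        if category in ("Co", "Cn", "Cc") or ch == "\ufffd":
            suspicious += 1

    return suspicious / len(visible) >= threshold
```

Passing the clipboard contents to `looks_unmapped` gives a quick yes/no hint; a clean result does not prove the mapping is correct, since a wrong table can also produce ordinary-looking but incorrect letters.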

The problem is usually unfixable: there is generally nothing you can do to this particular document to make searching and copying start working.

A possible workaround, and this will sound stupid, is to convert the PDF pages to images and run OCR on them. If the OCR engine handles the file correctly, it will normally produce proper text that can be searched and copied.
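The OCR route can be sketched in Python. This is a minimal example under stated assumptions, not the only way to do it: it assumes the third-party packages `pdf2image` (which needs Poppler installed) and `pytesseract` (which needs the Tesseract binary) are available, and the function names `ocr_pages` and `ocr_pdf` are made up for illustration.

```python
from typing import Any, Callable, Iterable, Optional


def ocr_pages(images: Iterable[Any],
              recognize: Optional[Callable[[Any], str]] = None) -> str:
    """Run OCR on each page image and join the results with blank lines.

    `recognize` is any callable that turns one image into text. When it is
    omitted, pytesseract.image_to_string is used (assumes Tesseract is installed).
    """
    if recognize is None:
        import pytesseract  # imported lazily so the helper loads without it
        recognize = pytesseract.image_to_string

    return "\n\n".join(recognize(image).strip() for image in images)


def ocr_pdf(pdf_path: str, dpi: int = 300) -> str:
    """Render every page of the PDF to an image, then OCR the images."""
    from pdf2image import convert_from_path  # assumes Poppler is installed

    pages = convert_from_path(pdf_path, dpi=dpi)
    return ocr_pages(pages)
```

Calling `ocr_pdf("document.pdf")` (a placeholder file name) would return the recognized text, which can then be pasted into Notepad++. Accuracy depends on the rendering resolution and on the fonts in the document, so the output is worth proofreading.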

