Spring MongoDB: searching for a string inside binary data

Question

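I am using Spring REST to store documents (text, PDF, CSV, DOC, DOCX, etc.) in MongoDB. The documents are stored as binary data. Now I want to search the documents by their content. For example, if a user searches for the string "office", they should see a list of the documents that contain "office". How can I query MongoDB for text contained inside binary data?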

Tags: mongodb, spring-boot

Answer


You could try to define a text index over your binary files, but this will not help: MongoDB text indexes only cover fields whose values are strings or arrays of strings, so binary data is not indexed. Even if it were, such an index would also match words that belong to the file format rather than to the user's content, which is generally undesirable.

If I were implementing your requirements, I would first convert every binary document to plain text with a converter (e.g. pandoc, which reads formats such as .docx but not PDF; PDFs need a separate extractor such as Apache Tika) to obtain the user content of each document. I would then store that content in a field that has a text index over it, and run the search query against that field.
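As an illustration of the extraction step only, here is a minimal sketch for a single format, .docx, using nothing but the JDK. It relies on the fact that a .docx file is a ZIP archive whose body text sits in `<w:t>` elements of `word/document.xml`. The class name `DocxText` and its methods are hypothetical; the sketch ignores headers, footers, footnotes and text boxes, and other formats (PDF, legacy .doc) would still need a dedicated extractor.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical helper (a sketch, not a complete extractor): pulls the body
 * text out of a .docx held as bytes, e.g. the binary field already stored in
 * MongoDB, so it can be saved into a separate, text-indexed string field.
 */
final class DocxText {
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private DocxText() {}

    static String extract(byte[] docx) throws Exception {
        byte[] xml = readEntry(docx, "word/document.xml");
        if (xml == null) {
            throw new IOException("not a .docx: word/document.xml is missing");
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject DOCTYPE declarations so uploaded files cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder out = new StringBuilder();
        // Each <w:p> is a paragraph; its text is in <w:t> elements inside its runs.
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                out.append(texts.item(j).getTextContent());
            }
            out.append('\n'); // one line per paragraph
        }
        return out.toString().trim();
    }

    // Returns the bytes of the named ZIP entry, or null if it is absent.
    private static byte[] readEntry(byte[] zip, String name) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    return in.readAllBytes(); // reads only the current entry
                }
            }
        }
        return null;
    }
}
```

Assuming Spring Data MongoDB, the returned string could then be stored in an entity field annotated with `@TextIndexed` and searched with `TextQuery.queryText(TextCriteria.forDefaultLanguage().matching("office"))` passed to `MongoTemplate.find`. Recent Spring Data MongoDB versions do not create indexes automatically by default, so the text index may need to be created explicitly or auto index creation enabled.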

