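java - Extract Base64 strings from rich text and collect them into an array
Problem description
I have rich text in a request that can contain multiple images as Base64 strings. I need to collect all of the images together with their corresponding file names. So far I have tried the code below, and it works for a single image. How can I improve it, or is there a more efficient way to do this? I don't know much about regular expressions, so I haven't tried them. Any help is appreciated.
In the example below, I assume that only one image is present in the rich text.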
public static void main(String[] args) {
String richText = "<div class=\"se-component se-image-container __se__float-center\" contenteditable=\"false\"><figure style=\"margin: auto; width: 300px;\"><img src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABaAAAANNCAYAAABhqtmuAAAAAXNSR0IArs4c6QAAQABJREFUeAHs3QmcXtP5B/AniyAkEWuIkA0Va621BUG1CLV1sasqitr3/m21Vamllmq1VdqqorbaRYh9X0utScQWJbEFSUX+99yY6WRMZiYyZ7x53+/1ycz73vfe557zvWdmPn5z5rwdIuLy4p+NAAECBAgQIECAAAECBAgQIECAAAECBAi0qUDHNq2mGAECBAgQIECAAAECBAgQIECAAAECBAgQ+FxAAG0oECBAgAABAgQIECBAgAABAgQIECBAgEAWAQF0FlZFCRAgQIAAAQIECBAgQIAAAQIECBAgQEAAbQwQIECAAAECBAgQIECAAAECBAgQIECAQBYBAXQWVkUJECBAgAABAgQIECBAgAABAgQIECBAQABtDBAgQIAAAQIECBAgQIAAAQIECBAgQIBAFgEBdBZWRQkQIECAAAECBAgQIECAAAECBAgQIEBAAG0MECBAgAABAgQIECBAgAABAgQIECBAgEAWAQF0FlZFCRAgQIAAAQIECBAgQIAAAQIECBAgQEAAbQwQIECAAAECBAgQIECAAAECBAgQIECAQBYBAXQWVkUJECBAgAABAgQIECBAgAABAgQIECBAQABtDBAgQIAAAQIECBAgQIAAAQIECBAgQIBAFgEBdBZWRQkQIECAAAECBAgQIECAAAECBAgQIEBAAG0MECBAgAABAgQIECBAgAABAgQIECBAgEAWAQF0FlZFCRAgQIAAAQIECBAgQIAAAQIECBAgQEAAbQwQIECAAAECBAgQIECAAAECBAgQIECAQBYBAXQWVkUJECBAgAABAgQIECBAgAABAgQIECBAQABtDBAgQIAAAQIECBAgQIAAAQIECBAgQIBAFgEBdBZWRQkQIECAAAECBAgQIECAAAECBAgQIEBAAG0MECBAgAABAgQIEngAAAAASUVORK5CYII=\" alt=\"\" data-rotate=\"0\" data-proportion=\"true\" data-align=\"center\" data-size=\"300px,300px\" data-index=\"0\" data-file-name=\"Upload Question.png\" data-file-size=\"38747\" data-origin=\"300px,300px\" style=\"width: 300px; height: 300px;\"></figure>";
String[] texts = StringUtils.substringsBetween(richText, "<img", "</figure>");
for (String td : texts) {
String fileName = StringUtils.substringBetween(td, "data-file-name=\"", "\"");
System.out.println("fileName:" + fileName); //prints fileName:Upload Question.png
String base64 = StringUtils.substringBetween(td, ",", "\"");
System.out.println(base64); // prints the Base64 payload of the src attribute above (iVBORw0KGgo... through ...AAAAASUVORK5CYII=)
}
}
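Solution
In JavaScript, the following regular expression can solve your problem: /(?<=base64,).+?(?=\")/gi
The lookbehind (?<=base64,) starts each match right after "base64,", and the lazy .+? stops just before the next double quote, so every match is one image payload.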
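Since the question is about Java, the same idea can be sketched with java.util.regex. This is a minimal sketch, not a definitive implementation: the class name Base64ImageExtractor, the Image holder, and the extract method are hypothetical names introduced here for illustration. It assumes each image is an <img> tag whose src is a Base64 data URI, that attribute values are double-quoted, and that no attribute value contains a ">" character. For arbitrary HTML, a real HTML parser is likely more robust than regular expressions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative and not from the original post.
class Base64ImageExtractor {

    // Simple holder for one extracted image.
    static final class Image {
        final String fileName; // value of data-file-name, or null if the attribute is absent
        final String base64;   // payload that follows "base64," in the src attribute

        Image(String fileName, String base64) {
            this.fileName = fileName;
            this.base64 = base64;
        }
    }

    // Matches one whole <img ...> tag (assumes no '>' inside attribute values).
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Captures the Base64 payload of a data URI in a double-quoted src attribute.
    private static final Pattern DATA_URI =
            Pattern.compile("src=\"data:[^;\"]+;base64,([^\"]+)\"");
    // Captures the value of the data-file-name attribute.
    private static final Pattern FILE_NAME =
            Pattern.compile("data-file-name=\"([^\"]*)\"");

    // Returns every inline Base64 image in document order.
    static List<Image> extract(String richText) {
        List<Image> images = new ArrayList<>();
        Matcher tag = IMG_TAG.matcher(richText);
        while (tag.find()) {
            String attrs = tag.group();
            Matcher src = DATA_URI.matcher(attrs);
            if (!src.find()) {
                continue; // skip images that are not inline Base64 data
            }
            Matcher name = FILE_NAME.matcher(attrs);
            String fileName = name.find() ? name.group(1) : null;
            images.add(new Image(fileName, src.group(1)));
        }
        return images;
    }

    public static void main(String[] args) {
        // Shortened sample input with two images.
        String richText =
                "<figure><img src=\"data:image/png;base64,AAAA\" data-file-name=\"one.png\"></figure>"
              + "<figure><img src=\"data:image/jpeg;base64,QkJC\" data-file-name=\"two.jpg\"></figure>";
        for (Image img : extract(richText)) {
            System.out.println(img.fileName + " -> " + img.base64);
        }
    }
}
```

Matching the whole tag first and then looking up each attribute inside it keeps the file name and the payload paired even when the attributes appear in a different order, which a single combined pattern would not handle.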