Emgu CV, Tessdata - can't load pol language

Problem description

I've downloaded Emgu CV v4.2.0 and a tessdata folder containing language files, and pasted that folder into the bin folder. The tessdata folder contains many languages, including eng and pol.

In C# I have code like this:

 using (ImageParser ip = new ImageParser(@"C:\Emgu\emgucv-windesktop 4.2.0.3662\bin\tessdata", "eng"))
 {
     if (ip.OcrImage("C:\\Users\\v-user1\\Pictures\\Saved Pictures\\bied.PNG") != string.Empty)
     {
         w.AddRange(ip?.Words.ToList<string>());
     }
 }

When I set "eng", the ImageParser is created correctly, but when I change the language to "pol" I get this error:

System.AccessViolationException: 'Attempted to read or write protected memory. This is often an indication that other memory is corrupt.'

What is the reason for this error?

Tags: c#, tesseract, emgucv

Solution


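From what I gather, you are trying to load the tessdata files so that Tesseract can reference them when it attempts to detect text. The method below is the one I have used in the past, and it has worked well for me. In earlier versions of Emgu CV, SetVariable and the whitelist feature did not work, and I am not sure whether that was fixed in later versions.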

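  // Field holding the engine; declared here so the method compiles.
  private static Tesseract _ocr;
  public static void LoadOCREngine(String dataPath)
  {
     // Create the OCR engine; pass "pol" instead of "eng" to load Polish.
     // TesseractCubeCombined is the enum name from older Emgu CV releases;
     // in 4.x builds it may be named TesseractLstmCombined instead.
     _ocr = new Tesseract(dataPath, "eng", OcrEngineMode.TesseractCubeCombined);
     // Restrict recognition to uppercase letters, digits, and the hyphen.
     _ocr.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ-1234567890");
  }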
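
Before constructing the engine, it may be worth ruling out the simplest failure: that the requested `.traineddata` file is not in the folder passed as the data path. The sketch below uses only `System.IO` and does not touch Emgu CV at all. `TessdataCheck` and `MissingLanguages` are hypothetical names introduced here for illustration, and the check assumes the usual Tesseract convention that a language code such as `pol` maps to a file named `pol.traineddata` directly inside that folder.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of Emgu CV: inspects the tessdata folder
// before any native Tesseract code is invoked.
public static class TessdataCheck
{
    // Returns the language codes from a spec such as "pol" or "eng+pol"
    // that have no matching <code>.traineddata file in dataPath.
    public static List<string> MissingLanguages(string dataPath, string languages)
    {
        var missing = new List<string>();
        foreach (string code in languages.Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string file = Path.Combine(dataPath, code + ".traineddata");
            if (!File.Exists(file))
            {
                missing.Add(code);
            }
        }
        return missing;
    }
}
```

Calling `TessdataCheck.MissingLanguages(dataPath, "pol")` before `new Tesseract(...)` and throwing a `FileNotFoundException` when the list is non-empty turns a missing file into a readable managed error. It only checks presence: a file that exists but was built for a different Tesseract version would still pass this check, so a mismatch between the tessdata set and the bundled Tesseract version is a separate thing to verify.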
