Cross-referencing multiple values in a list that fall within a certain deviation (Vision OCR, C#)

Problem description

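I am using Vision OCR in C# with Xamarin. I have found that the API returns text that belongs in a single region split across different regions. This causes unexpected behaviour and incorrect processing of the data.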

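To solve this, I need to extract the Y coordinate from the bounding box of each line, keep the accompanying data, and add both to a list. Each list entry then has to be cross-referenced against all other entries. When two Y coordinates are within a deviation of 10 of each other, their entries need to be merged.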

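I have added an outline of the code structure below. I tried tuples, dictionaries, structs and so on, but could not find a way to cross-reference the values in any of them.

Does anyone have a pointer in the right direction?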

Update: I am making good progress using a recursive binary search combined with a tuple comparer. If and when it works, I will post the code.

    using System.Collections.Generic;

    class Playground
    {
        public void meh()
        {
            // These are the bounding boxes of the different regions, obtained by parsing the value returned by the OCR API.
            // Layout: int[] arr = new int[] { Left X, Top Y, Width, Height };
            int[] arrA =new int[] { 178, 1141, 393, 91 };//item 3 xenos
            int[] arrB =new int[] { 171, 1296, 216, 53 };//totaal 3items
            int[] arrC =new int[] { 1183, 1134, 105, 51};//item 3 prijs
            int[] arrD =new int[] { 1192, 1287, 107, 52 };//totaal prijs

            // The strings as they will be made available within the lines, as words
            string strA = "item 3";
            string strB = "totaal:";
            string strC = "2,99";
            string strD = "8,97";

            //make list to hold our fake linedata
            List<int[]> ourLines = new List<int[]>();
            ourLines.Add(arrA); ourLines.Add(arrB); ourLines.Add(arrC); ourLines.Add(arrD);


            //following structure is observed
            for(int region = 0; region < 3; region++){
                // 3 regions; for each region, process its lines
                foreach(int[] lineData in ourLines)
                {
                    // Get the Y coordinate from the bounding box, which is lineData[1].
                    // Keep the int in memory and link* the words in the corresponding line to it.
                    // Put the int and its words in an array outside the region loop.
                    // Repeat this a couple of hundred times.

                    for (int words = 0; words < 180; words++)
                    {
                        //do stuff with words
                    }
                }
            }

            // Here I need a list with the Y coordinates (for example 1141, 1296, 1134, 1287).
            // Cross-reference all Y coordinates with each other.
            // When one falls within a deviation of 10 of another,
            // build a string with the combined text.
            // Search the text for words resembling 'total' and check whether there is an appropriate monetary value.

            //the above would link the values of arrays A + C and B + D. 
            //which causes the corresponding results to be; 'item 3 2.99' and 'totaal 8.97'
            //currently the arrays A+B are returned in region one and C+D in region two. This also varies from image to image.



            // This is necessary because the Vision OCR API sometimes decides that lists of product name + price belong in 2 different regions.
            // The same happens for totals, and this causes incorrect results when reading the data (like thinking the number of products == the price).
            // So if you bought 3 items, the OCR will think the total price is 3$ instead of 8.97 (when each item is 2.99).

            // Note: values are monetary and culture independent, so they can appear as xx.xx or xx,xx.
            // * By "link" I mean a tuple/list/struct/KeyValuePair/dictionary, or preferably something more appropriate.
            // This code will be executed on Android and iOS devices, so something lightweight is preferred.
        }
    }
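
The comments above mention two follow-up checks on the merged text: spotting a word resembling 'total', and reading an amount that may be written as xx.xx or xx,xx. A minimal sketch of both is given here. The names `ReceiptText`, `FindAmount`, `LooksLikeTotal` and the keyword list are hypothetical, not part of any OCR API, and the pattern assumes exactly two decimals with no thousands separators.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ReceiptText
{
    // Matches amounts such as "8,97" or "8.97": digits, one separator, two decimals.
    // Assumption: no thousands separators (e.g. "1.234,56" is not handled).
    private static readonly Regex Amount = new Regex(@"\b([0-9]+)[.,]([0-9]{2})\b");

    // Assumed keyword list; extend it for other languages or common OCR misreads.
    private static readonly string[] TotalWords = { "total", "totaal" };

    // Returns the first amount found in the text, or null if there is none.
    public static decimal? FindAmount(string text)
    {
        var match = Amount.Match(text);
        if (!match.Success) return null;

        // Normalise to a '.' separator, then parse independently of the device culture.
        var normalised = match.Groups[1].Value + "." + match.Groups[2].Value;
        return decimal.Parse(normalised, CultureInfo.InvariantCulture);
    }

    // True when the text contains one of the keywords, ignoring case.
    public static bool LooksLikeTotal(string text)
    {
        foreach (var word in TotalWords)
        {
            if (text.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0) return true;
        }
        return false;
    }
}
```

Parsing with `CultureInfo.InvariantCulture` after normalising the separator keeps the result the same regardless of the regional settings of the Android or iOS device.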

Tags: c#, xamarin.forms, computer-vision, vision

Solution
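
One possible approach to the merge described in the question, offered as a sketch rather than a definitive implementation: collect every line from all regions into a single list, sort by the top Y coordinate, and start a new row whenever a line lies more than the tolerance below the first line of the current row. This replaces the all-pairs cross-reference with a sort and a single pass. The `OcrLine` and `LineMerger` types are hypothetical and only stand in for whatever the parsed OCR response provides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container for one OCR line: left X, top Y, and its text.
public sealed class OcrLine
{
    public int Left { get; }
    public int Top { get; }
    public string Text { get; }

    public OcrLine(int left, int top, string text)
    {
        Left = left;
        Top = top;
        Text = text;
    }
}

public static class LineMerger
{
    // Groups lines whose Top lies within `tolerance` of the first (topmost) line
    // of the current row, then joins each row's text from left to right.
    public static List<string> MergeByY(IEnumerable<OcrLine> lines, int tolerance = 10)
    {
        var rows = new List<List<OcrLine>>();
        foreach (var line in lines.OrderBy(l => l.Top))
        {
            var current = rows.Count > 0 ? rows[rows.Count - 1] : null;
            if (current != null && line.Top - current[0].Top <= tolerance)
            {
                current.Add(line); // same visual row
            }
            else
            {
                rows.Add(new List<OcrLine> { line }); // start a new row
            }
        }

        return rows
            .Select(r => string.Join(" ", r.OrderBy(l => l.Left).Select(l => l.Text)))
            .ToList();
    }
}
```

With the four sample bounding boxes from the question (Y values 1141, 1296, 1134 and 1287), this yields the two rows `item 3 2,99` and `totaal: 8,97`. Comparing against the first line of each row, rather than the previous line, keeps a long run of slightly drifting lines from chaining into one row; whether that is the desired behaviour depends on how skewed the photographed receipts are.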

