EMNIST - Splitting a handwritten word into letters

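Problem description

I have trained a model on the EMNIST Balanced dataset that reaches close to 89% accuracy with a loss of about 0.36, and most labels appear to be predicted correctly. I am now trying to upload an image of a handwritten word and split it into its X letters, each of which would be resized to 28x28 and predicted separately. What is the best way to do this?

Part of my code is: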

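import cv2
import numpy as np

def predict(image):
  # resize_image is defined elsewhere (not shown)
  img = resize_image(image)
  img = img[:, :, 0]              # keep a single channel
  img = img.reshape((1, 28, 28))  # add the batch dimension

  prediction = model.predict(img)
  return class_names[np.argmax(prediction)]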

def printPrediction(image):
  img = cv2.imread(image)
  # Invert so that strokes are bright on a dark background; the image stays
  # 3-channel because check() and predict() both index the channel axis
  img = cv2.bitwise_not(img)
  img = img.astype('float32')
  img /= 255

  # Get the size of the image
  height = img.shape[0]
  width = img.shape[1]

  prediction = ''
  foundStartingPoint = False
  foundEndingPoint = False
  threshold = 0.8

  for column in range(width):
    blackColorPixels = 0

    for row in range(height):
      # First stroke pixel seen: the letter starts two columns to the left
      if check(img[row, column], threshold) and not foundStartingPoint:
        foundStartingPoint = True
        startingPoint = [0, max(column - 2, 0)]  # clamp at the left edge

      # Count background pixels; a fully empty column ends the letter
      if foundStartingPoint and not check(img[row, column], threshold):
        blackColorPixels += 1
        if blackColorPixels == height:
          foundEndingPoint = True
          endingPoint = [height, min(column + 2, width)]

      if foundStartingPoint and foundEndingPoint:
        crop_img = img[startingPoint[0]:endingPoint[0], startingPoint[1]:endingPoint[1]]
        prediction = prediction + predict(crop_img)
        foundStartingPoint = False
        foundEndingPoint = False

  # A letter that touches the right edge never meets an empty column
  if foundStartingPoint:
    prediction = prediction + predict(img[0:height, startingPoint[1]:width])

  print("\nPrediction of the OCR system is: ")
  print(prediction)
  print("\nPossible word from the dictionary is: ")
  printPossibleWord(prediction.lower())


def check(pixel, threshold):
  # True when all three colour channels are at or above the threshold
  counter = 0
  for x in pixel:
    if x >= threshold:
      counter += 1
  return counter == 3

I then use a dictionary to replace the predicted word with one that actually exists:

import json
from difflib import get_close_matches

def printPossibleWord(prediction):
    # Load the dictionary (JSON file) of known words
    with open('words_dictionary.json', 'r') as f:
        words_dict = json.load(f)

    # Find the closest matching words for our input
    matches = get_close_matches(prediction, words_dict, n=3, cutoff=0.6)
    if not matches:
        # Nothing is close enough; max() below would fail on an empty list
        return

    # For each match of the same length, count the characters that agree
    # with the input position by position
    similar_character_counter = zerolistmaker(len(matches))

    for i in range(len(matches)):
        if len(matches[i]) != len(prediction):
            continue

        for j in range(len(prediction)):
            if matches[i][j] == prediction[j]:
                similar_character_counter[i] += 1

    max_value = max(similar_character_counter)
    max_value_list = [i for i, j in enumerate(similar_character_counter) if j == max_value]

    # Print the possible word(s) from the dictionary
    for i in max_value_list:
        print(matches[i].upper())


def zerolistmaker(n):
    listofzeros = [0] * n
    return listofzeros
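
As a small illustration of what the `cutoff` argument does in the dictionary step, here is a sketch with a made-up word list and a made-up misread input (both are hypothetical and stand in for the JSON dictionary and the OCR output):

```python
from difflib import get_close_matches

# Hypothetical stand-in for the JSON dictionary used above
words = ["hello", "help", "world"]

# Hypothetical OCR output in which "l" was misread as "1"
ocr_output = "he1lo"

# Candidates are scored with difflib's similarity ratio (0 to 1) and
# returned best first; anything scoring below `cutoff` is dropped
matches = get_close_matches(ocr_output, words, n=3, cutoff=0.6)
print(matches)
```

In this example "hello" scores 0.8 against "he1lo", so it clears the 0.6 cutoff, while "world" does not. A higher cutoff makes the matching stricter.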
          

The main problem is that the whole word image gets resized to 28x28, so each letter is shrunk into a blurry shape. What is the best way to handle this?

Tags: python, tensorflow, machine-learning, conv-neural-network, ocr

Solution
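
One possible approach, given as a minimal sketch rather than a definitive answer: instead of resizing the whole word, find the column ranges that contain ink (a vertical projection), crop each range tightly, and centre it on a square canvas so that the later 28x28 resize keeps the letter's aspect ratio. The function names `split_columns` and `extract_letters`, the `threshold`, `min_gap` and `margin` parameters, and the assumption of a 2D grayscale array scaled to [0, 1] with bright strokes on a dark background are all illustrative choices, not part of the question's code.

```python
import numpy as np

def split_columns(ink, min_gap=1):
    """Return (start, end) column ranges that contain ink (end is exclusive).

    `ink` is a 2D boolean array, True where a pixel belongs to a stroke.
    A letter ends once `min_gap` consecutive empty columns are seen.
    """
    has_ink = ink.any(axis=0)  # vertical projection: one flag per column
    spans, start, gap = [], None, 0
    for col, filled in enumerate(has_ink):
        if filled:
            if start is None:
                start = col
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, col - gap + 1))
                start, gap = None, 0
    if start is not None:  # letter reaching the right edge
        spans.append((start, len(has_ink) - gap))
    return spans

def extract_letters(img, threshold=0.5, min_gap=1, margin=2):
    """Crop every letter and centre it on a zero-filled square canvas."""
    ink = img >= threshold
    letters = []
    for start, end in split_columns(ink, min_gap):
        # Trim empty rows above and below the letter
        rows = np.flatnonzero(ink[:, start:end].any(axis=1))
        top, bottom = rows[0], rows[-1] + 1
        crop = img[top:bottom, start:end]

        # Pad to a square so a later resize does not distort the shape
        h, w = crop.shape
        side = max(h, w) + 2 * margin
        canvas = np.zeros((side, side), dtype=img.dtype)
        y, x = (side - h) // 2, (side - w) // 2
        canvas[y:y + h, x:x + w] = crop
        letters.append(canvas)
    return letters
```

Each returned square could then be resized to 28x28 (for example with `cv2.resize` and `interpolation=cv2.INTER_AREA`) and passed to the model one at a time. Since these crops are already single-channel, the channel indexing in `predict` would not be needed. Column projection only separates letters that have at least one blank column between them. Touching or cursive letters would need a different technique, such as connected-component or contour analysis.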

