python - Highly inconsistent OCR results from Tesseract
This is the original screenshot. I cropped the image into 4 parts and cleared the background of the image as much as I possibly could, but Tesseract detects only the last column here and ignores the rest.
The output from Tesseract is shown below. There are blank spaces, which I remove while processing the result:
femme—fatale. darklordeia achinesen1gg4 noob_diablo_
The output for the next crop, again with blank spaces removed while processing the result:
kicked. nosnoel chikizd death_eag|e_42 chai—.
3579 10 1 7 148 2962 3 o 7 101 2214 2 2 7 99 2205 1 3 6 78
8212 7198 6307 5640 4884 15 40 40 6o 80 80
I am dumping the output of:
`result = pytesseract.image_to_string(Image.open("d:/newapproach/b&w" + str(i) + ".jpg"), lang="new_language")`
but I don't know how to proceed from here to get a consistent result. Is there any way I can force Tesseract to recognize the text? In the Tesseract trainer, the default recognition scan does not detect it, but once I select the area to be scanned, it is recognized correctly.
My suggestion is to perform OCR on the full image.
I preprocessed the image into a grayscale image:
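
    import cv2
    # Load the screenshot (OpenCV reads it in BGR channel order)
    image_obj = cv2.imread('1d4bb.jpg')
    # Convert to a single-channel grayscale image and save it
    gray = cv2.cvtColor(image_obj, cv2.COLOR_BGR2GRAY)
    cv2.imwrite("gray.png", gray)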
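For reference, `cv2.COLOR_BGR2GRAY` computes a weighted luminance per pixel, Y = 0.299 R + 0.587 G + 0.114 B, from pixels stored in B, G, R order. The sketch below shows that formula in plain Python on nested lists; it is only an illustration (the function names are hypothetical), and OpenCV's own fixed-point implementation may round slightly differently.

```python
def bgr_to_gray(pixel):
    """Convert one (B, G, R) pixel to an 8-bit gray value using BT.601 weights."""
    b, g, r = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def image_to_gray(rows):
    """Convert an image given as rows of (B, G, R) tuples to rows of gray values."""
    return [[bgr_to_gray(p) for p in row] for row in rows]


if __name__ == "__main__":
    # One row: white, black, pure red (BGR order: red is the last channel)
    print(image_to_gray([[(255, 255, 255), (0, 0, 0), (0, 0, 255)]]))
```

Green contributes most to perceived brightness, which is why it carries the largest weight.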
I ran Tesseract on the image from the terminal, and the accuracy seems to be about 90% in this case:

    tesseract gray.png out

Output (`out.txt`):

    3579 10 1 7 148
    3142 9 o 5 10
    2962 3 o 7 101
    2214 2 2 7 99
    2205 1 3 6 78
    score kills assists deaths connection
    8212 15 1 4 4o
    7198 7 3 6 40
    6307 6 1 5 60
    5640 2 3 6 80
    4884 1 1 5
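Most of the remaining errors in that output are letters read in place of digits (`o` for `0`, `4o` for `40`). A minimal post-processing sketch, assuming every field of the scoreboard is numeric and each row has the five fields `score kills assists deaths connection`: it maps `o`/`O` to `0`, drops non-numeric tokens such as the header words, and groups the rest into rows. The helper names are hypothetical, and an incomplete trailing row is discarded.

```python
import re

FIELDS = ("score", "kills", "assists", "deaths", "connection")

# Letters Tesseract often returns in place of the digit 0 in this output
DIGIT_FIXES = str.maketrans({"o": "0", "O": "0"})


def clean_numeric_token(token):
    """Return the token as an int if it is numeric after fixes, else None."""
    fixed = token.translate(DIGIT_FIXES)
    return int(fixed) if re.fullmatch(r"[0-9]+", fixed) else None


def parse_scoreboard(text, width=len(FIELDS)):
    """Split OCR text into tuples of `width` numbers, skipping non-numeric tokens."""
    numbers = [v for v in map(clean_numeric_token, text.split()) if v is not None]
    # Only complete rows are kept; leftover numbers at the end are dropped
    return [tuple(numbers[i:i + width])
            for i in range(0, len(numbers) - width + 1, width)]


if __name__ == "__main__":
    sample = "3142 9 o 5 10 score kills assists deaths connection 8212 15 1 4 4o"
    for row in parse_scoreboard(sample):
        print(dict(zip(FIELDS, row)))
```

Because grouping is purely positional, a single missed or extra token shifts every following row, so checking `len(numbers) % width` is a cheap sanity test.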
Below are a few suggestions:
- Do not use the `image_to_string` method directly; it converts the image to BMP and saves it at 72 dpi.
- If you want to use `image_to_string`, override it so that it saves the image at 300 dpi.
- You can use the `run_tesseract` method and read the output file.
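As an alternative to `run_tesseract` (whose signature has changed between pytesseract versions), the sketch below calls the `tesseract` command-line program directly with `subprocess`, the same way as `tesseract gray.png out` above, and reads `out.txt` back. It assumes Tesseract 4 or later is on the `PATH`, since that is where the `--dpi` and `--psm` options are available; passing `--dpi 300` tells Tesseract the resolution instead of relying on a temporary 72 dpi image. The function names and the optional `psm` argument are illustrative additions.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, dpi=300, psm=None, lang=None):
    """Build a Tesseract CLI call; Tesseract writes its text to output_base + '.txt'."""
    cmd = ["tesseract", str(image_path), str(output_base), "--dpi", str(dpi)]
    if psm is not None:
        # Page segmentation mode, e.g. 6 = assume a single uniform block of text
        cmd += ["--psm", str(psm)]
    if lang is not None:
        cmd += ["-l", lang]
    return cmd


def ocr_file(image_path, output_base="out", **options):
    """Run Tesseract on a saved image file and return the recognized text."""
    subprocess.run(build_tesseract_command(image_path, output_base, **options), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Usage, with the trained language from the question: `text = ocr_file("gray.png", lang="new_language", psm=6)`.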
Another approach to this problem is to crop out the individual digits and use a deep neural network to predict them.
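One common way to do the cropping step is a vertical projection profile: after thresholding, any column that contains ink belongs to a character, and runs of empty columns separate characters. A minimal sketch, assuming the image is already binarized into rows of 0/1 values (1 = ink) and that neighbouring digits do not touch; the helper names are hypothetical and the neural-network classifier itself is not shown.

```python
def segment_columns(binary, min_width=1):
    """Return (start, end) column ranges that contain ink, end exclusive."""
    if not binary:
        return []
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x  # a new run of inked columns begins
        elif not ink and start is not None:
            if x - start >= min_width:
                spans.append((start, x))
            start = None
    if start is not None and width - start >= min_width:
        spans.append((start, width))  # run touching the right edge
    return spans


def crop_columns(binary, span):
    """Cut one character out of the binary image given its column span."""
    start, end = span
    return [row[start:end] for row in binary]
```

Each crop returned by `crop_columns` would then be resized to the network's input size before prediction; `min_width` helps discard specks of noise.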