Optical Character Recognition (OCR) is a technology that converts handwritten or printed text into machine-readable form. However, it has struggled with text that is curved or whose characters are not parallel to the horizontal plane. That is the problem Amazon's TextTubes sets out to solve.
TextTubes is a detector that lets an AI model represent the text in an image as a tube around its middle axis. The researchers describe it as state of the art.
The researchers explained that the whole process can be broken down into two tasks: text detection and text recognition.
In text detection, contextual clues are used to localize characters, words and lines, whereas text recognition transcribes their content.
Neither task is simple: deformations, arbitrary fonts or viewpoint changes can make it a lot more difficult.
Traditional approaches capture text with quadrilaterals or rectangles, which are prone to overlap and noise. Amazon instead uses a “tube” shape that captures the variability in the content and handles text of similar size easily. Unlike other methods, it is formulated as a mathematical function that is used to train machine learning scene text detectors.
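To make the idea of a tube around a middle axis more concrete, here is a minimal sketch in Python. It is not Amazon's formulation: it assumes a tube can be described by a centerline (a list of points) plus one constant radius, and the helper names `tube_polygon`, `polygon_area` and `bounding_box_area` are made up for illustration. It compares how much area a tube needs to cover a hypothetical line of text bent along a half circle with the area of an axis-aligned rectangle around the same text.

```python
import math


def tube_polygon(centerline, radius):
    """Offset a polyline by +/- radius along its normals to get a tube outline."""
    side_a, side_b = [], []
    n = len(centerline)
    for i, (x, y) in enumerate(centerline):
        # Estimate the tangent from the neighbouring points (one-sided at the ends).
        x0, y0 = centerline[max(i - 1, 0)]
        x1, y1 = centerline[min(i + 1, n - 1)]
        tx, ty = x1 - x0, y1 - y0
        norm = math.hypot(tx, ty)
        nx, ny = -ty / norm, tx / norm  # unit normal to the centerline
        side_a.append((x + radius * nx, y + radius * ny))
        side_b.append((x - radius * nx, y - radius * ny))
    # Walk along one side and back along the other to close the outline.
    return side_a + side_b[::-1]


def polygon_area(points):
    """Area of a simple polygon (shoelace formula)."""
    total = 0.0
    for (xa, ya), (xb, yb) in zip(points, points[1:] + points[:1]):
        total += xa * yb - xb * ya
    return abs(total) / 2.0


def bounding_box_area(points):
    """Area of the axis-aligned rectangle enclosing the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Hypothetical example: text written along a half circle of radius 100,
# with characters extending 10 units on either side of the middle axis.
angles = [math.pi * k / 20 for k in range(21)]
centerline = [(100 * math.cos(t), 100 * math.sin(t)) for t in angles]

tube = tube_polygon(centerline, radius=10)
tube_area = polygon_area(tube)
box_area = bounding_box_area(tube)

print(f"tube area: {tube_area:.0f}")
print(f"bounding rectangle area: {box_area:.0f}")
print(f"share of the rectangle that is actually text region: {tube_area / box_area:.0%}")
```

In this made-up case the rectangle is several times larger than the tube, which is the kind of wasted, noise-prone area the article attributes to rectangle-based detectors on curved text.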
The researchers tested TextTubes on curved-text benchmarks: CTW-1500, a data set of over 1,500 images of natural scenes containing curved text, and Total-Text, which has 1,255 training images and 300 test images. On CTW-1500, TextTubes achieved about 83.65% accuracy, ahead of the other methods it was compared with.
The results suggest that TextTubes could be a significant benefit to businesses that rely on OCR. The OCR market is expected to be worth a total of $13.38 billion in the future, and OCR solutions are likely to be among the first tools businesses want to adopt.