jTessBoxEditor

A Java-based box editor for Tesseract OCR training data that reads common image formats and supports Tesseract 2.0x and 3.0x.

  • jTessBoxEditor
  • Version: 1.7.3 / 2.0 Beta
  • License: Apache License 2.0
  • OS: Windows (all versions)
  • Publisher: Quan Nguyen

jTessBoxEditor Description

jTessBoxEditor is a companion application for the Tesseract OCR software package. It lets users edit the box data produced by Tesseract 2.0x and 3.0x, and it can fully automate text recognition training for Tesseract.

jTessBoxEditor requires Java and reads most common image formats, including TIFF (single- and multi-page), JPEG, GIF, PNG and BMP. It also supports PDF files.

No installation is needed: users run the application by opening the executable JAR provided in the distribution archive. The editor works with TIFF images and box files, and images used for Tesseract OCR training should have a resolution of at least 300 DPI.

The Box View window provides a basic set of hotkeys that speed up editing. With them, users can move a box up, down, left or right, increase its width or height, switch to the previous or next box, or edit the box's character.
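To make these box operations concrete, here is a minimal Java sketch of what moving, resizing and relabelling a box means in terms of the box file itself. It is not jTessBoxEditor's own code: the class TessBox and its methods are hypothetical names chosen for illustration. It assumes the plain-text Tesseract box layout of one line per character, "symbol left bottom right top page", with coordinates measured from the bottom-left corner of the image, and it also accepts five-field lines without a page number on the assumption that older 2.0x files omit it.

```java
// Hypothetical helper for illustration only; not part of jTessBoxEditor.
// Models one line of a Tesseract box file: "<symbol> <left> <bottom> <right> <top> [<page>]".
final class TessBox {
    final String symbol;
    final int left, bottom, right, top, page;

    TessBox(String symbol, int left, int bottom, int right, int top, int page) {
        this.symbol = symbol;
        this.left = left;
        this.bottom = bottom;
        this.right = right;
        this.top = top;
        this.page = page;
    }

    // Parses a box line; a missing page field is assumed to mean page 0.
    static TessBox parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 5 && f.length != 6) {
            throw new IllegalArgumentException("Expected 5 or 6 fields: " + line);
        }
        int page = (f.length == 6) ? Integer.parseInt(f[5]) : 0;
        return new TessBox(f[0], Integer.parseInt(f[1]), Integer.parseInt(f[2]),
                Integer.parseInt(f[3]), Integer.parseInt(f[4]), page);
    }

    // Shifts the whole box; positive dy moves it up because the origin is bottom-left.
    TessBox move(int dx, int dy) {
        return new TessBox(symbol, left + dx, bottom + dy, right + dx, top + dy, page);
    }

    // Grows (or shrinks, for negative values) the box from its right and top edges.
    TessBox resize(int dWidth, int dHeight) {
        return new TessBox(symbol, left, bottom, right + dWidth, top + dHeight, page);
    }

    // Replaces the character assigned to the box.
    TessBox withSymbol(String newSymbol) {
        return new TessBox(newSymbol, left, bottom, right, top, page);
    }

    // Serializes back to the six-field box line.
    String toLine() {
        return symbol + " " + left + " " + bottom + " " + right + " " + top + " " + page;
    }

    public static void main(String[] args) {
        TessBox box = TessBox.parse("a 10 20 30 45 0");
        System.out.println(box.move(2, -3).toLine());      // moved right 2, down 3
        System.out.println(box.resize(5, 1).toLine());     // wider by 5, taller by 1
        System.out.println(box.withSymbol("e").toLine());  // corrected character
    }
}
```

Each method returns a new object rather than changing the box in place, which keeps the sketch easy to test. An editor would more likely update the selected box directly and redraw it.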

When text generation completes, the editor produces a TIFF / box file pair for each input UTF-8 text file, which users can then supply when training Tesseract OCR for text recognition. To avoid overlapping bounding boxes, users can adjust the letter tracking, or character spacing, in the image generator.
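As a rough way to see whether the generated boxes still overlap after adjusting the spacing, the check below compares each box with the next one in the file. It is only a sketch: the class BoxOverlap and its methods are hypothetical and not part of jTessBoxEditor, and each box is assumed to be given as {left, bottom, right, top} in box-file coordinates. Boxes that merely share an edge are not counted as overlapping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of jTessBoxEditor.
final class BoxOverlap {

    // Each box is {left, bottom, right, top}. Returns true only when the two
    // rectangles share interior area (touching edges do not count).
    static boolean overlaps(int[] a, int[] b) {
        return a[0] < b[2] && b[0] < a[2]   // horizontal ranges intersect
            && a[1] < b[3] && b[1] < a[3];  // vertical ranges intersect
    }

    // Returns the indices i for which box i overlaps box i + 1.
    static List<Integer> overlappingNeighbours(List<int[]> boxes) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int i = 0; i + 1 < boxes.size(); i++) {
            if (overlaps(boxes.get(i), boxes.get(i + 1))) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<int[]> boxes = new ArrayList<int[]>();
        boxes.add(new int[] {0, 0, 10, 10});
        boxes.add(new int[] {8, 0, 18, 10});   // starts before the previous box ends
        boxes.add(new int[] {20, 0, 30, 10});  // clear gap from the previous box
        System.out.println("Overlapping neighbour indices: " + overlappingNeighbours(boxes));
    }
}
```

If the returned list is not empty, increasing the spacing in the image generator and regenerating the files is the remedy described above.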

System requirements

  • Java 7.0 or later
