# tessdata-ocr

**Repository Path**: linux_23/tessdata-ocr

## Basic Information

- **Project Name**: tessdata-ocr
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-14
- **Last Updated**: 2025-04-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

tessdata
========

These language data files only work with Tesseract 4.0.0 and newer versions.
They are based on the sources in
[tesseract-ocr/langdata](https://github.com/tesseract-ocr/langdata) on GitHub.
(still to be updated for 4.0.0 - 20180322)

These have models for legacy tesseract engine (--oem 0) as well as the new LSTM neural net based engine (--oem 1).

The LSTM models (--oem 1) in these files 
have been updated to the integerized versions of 
[tessdata_best](https://github.com/tesseract-ocr/tessdata_best) on GitHub.
So, they should be faster but probably a little less accurate than tessdata_best.

[tessdata_fast](https://github.com/tesseract-ocr/tessdata_fast) on GitHub
provides an alternate set of integerized LSTM models which have been built with a smaller network.
tessdata_fast files are the ones packaged for Debian and Ubuntu.

The legacy tesseract models (--oem 0) have been removed for Indic and
Arabic script language files.

tessdata for 3.04 or 3.05
-------------------------

Get language data files for Tesseract 3.04 or 3.05 from the
[3.04 tree](https://github.com/tesseract-ocr/tessdata/tree/3.04.00).

More information and a complete list of all languages is available in the
[Tesseract wiki](https://github.com/tesseract-ocr/tesseract/wiki/Data-Files).

All data in the repository are licensed under the
Apache-2.0 License, see file [LICENSE](LICENSE).