ocr-service documentation

OCR Service is a production-ready REST API for optical character recognition (OCR) that supports both Tesseract and PaddleOCR engines. Extract text from images with high accuracy using a simple HTTP API. Deploy easily with Docker for document processing, invoice scanning, multi-language text recognition, and automated data extraction workflows.

Key Features

  • Dual OCR Engines: Choose between Tesseract (fast document OCR) and PaddleOCR (multi-directional text)

  • REST API: Simple multipart/form-data POST requests with JSON responses

  • 140+ Languages: Comprehensive language support via Tesseract language packs

  • Docker Ready: Pre-built images on Docker Hub with variant options

  • Production Grade: Health checks, error handling, and security best practices included

Quick Start

Get started with the OCR API in seconds using Docker:

# Pull and run the default image (both engines)
docker run -p 5000:5000 gunthercox/ocr-service:latest

# Extract text from an image
curl -X POST -F "image=@document.png" http://localhost:5000/

Documentation Contents

Frequently Asked Questions

Which OCR engine should I use?

Use Tesseract for well-aligned document OCR (invoices, forms, scanned pages). Use PaddleOCR for rotated text, multi-directional layouts, or challenging image orientations.

How many languages are supported?

Over 140 languages are supported through Tesseract. PaddleOCR supports 80+ languages including Chinese, Japanese, Korean, and many Latin-script languages.

Can I use this for commercial projects?

Yes, the service is open source. Check the LICENSE for details on both the service and underlying OCR engines.

What image formats are supported?

PNG, JPEG, WebP, TIFF, BMP, and most common image formats are supported by both OCR engines.