API

Complete REST API reference for the OCR service. Learn how to extract text from images using Tesseract or PaddleOCR engines through simple HTTP POST requests. This API supports multipart/form-data uploads, multiple languages, and returns structured JSON with text and bounding box coordinates.

Overview

The OCR REST API provides two primary endpoints:

  • POST / - Main OCR endpoint for text extraction from images

  • GET /health - Health check endpoint for monitoring and load balancers

All responses are JSON formatted with detailed text extraction results including confidence scores and bounding box coordinates for each detected text region.

Endpoint: /

Method: POST

Description: Performs OCR on an uploaded image using the specified OCR engine. Expects a file field named image in the request, and accepts optional parameters to control the OCR engine and language.

Request:

  • Content-Type: multipart/form-data

  • Maximum file size: 10MB

  • Form fields:

    • image (file, required): The image file to process

    • engine (string, optional): OCR engine to use - ‘tesseract’ or ‘paddleocr’ (default: ‘paddleocr’)

    • lang (string, optional): Language code (format depends on engine, see below)

Supported Engines:

  • tesseract: Fast and efficient for standard document OCR with well-aligned text

  • paddleocr: Better for multi-directional and rotated text (default)

Example (cURL):

# Using default engine (PaddleOCR)
curl -X POST \
     -F "image=@example.png" \
     http://localhost:5000/

# Using Tesseract engine with Japanese language
curl -X POST \
     -F "image=@example.png" \
     -F "engine=tesseract" \
     -F "lang=jpn" \
     http://localhost:5000/

# Using PaddleOCR engine with Chinese language
curl -X POST \
     -F "image=@example.png" \
     -F "engine=paddleocr" \
     -F "lang=ch" \
     http://localhost:5000/

Response:

  • Content-Type: application/json

Success Example (PaddleOCR):

{
  "text": "Concatenated text from all detected regions",
  "regions": [
    {
      "bbox": [[10, 20], [100, 20], [100, 40], [10, 40]],
      "text": "Detected text",
      "confidence": 0.95
    },
    {
      "bbox": [[10, 50], [120, 50], [120, 70], [10, 70]],
      "text": "Another text region",
      "confidence": 0.92
    }
  ]
}

Success Example (Tesseract):

{
  "text": "Extracted OCR text from the image.",
  "regions": [
    {
      "bbox": [[15, 25], [95, 25], [95, 42], [15, 42]],
      "text": "Extracted",
      "confidence": 0.96
    },
    {
      "bbox": [[100, 25], [140, 25], [140, 42], [100, 42]],
      "text": "OCR",
      "confidence": 0.94
    }
  ]
}

Response Fields:

  • text (string): All detected text (concatenated with spaces for PaddleOCR, full text for Tesseract)

  • regions (array): List of detected text regions (both engines provide this data)

    Each region contains:

    • bbox (array): Bounding box as 4 corner points [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]

    • text (string): The text content from this region

    • confidence (float): Confidence score from 0.0 to 1.0

Error Example (missing image field):

{
  "error": {
    "image": "This field is required."
  }
}

Error Example (invalid engine):

{
  "error": {
    "engine": "Unsupported engine: invalid. Must be one of: tesseract, paddleocr"
  }
}

Error Example (file too large):

{
  "error": {
    "image": "File size exceeds 10MB limit."
  }
}

Error Example (invalid content type):

{
  "error": {
    "content_type": "Request must be multipart/form-data."
  }
}

Status Codes:

  • 200: Success, returns extracted text and regions.

  • 400: Bad request, missing or invalid parameters.

  • 413: Payload too large, file exceeds 10MB limit.

Endpoint: /health

Method: GET

Description: Health check endpoint to verify the service is running and responsive.

Request:

No parameters required.

Example (cURL):

curl http://localhost:5000/health

Response:

  • Status Code: 200

  • Empty response body

Status Codes:

  • 200: Service is healthy and operational.

Supported Languages

The language codes and available languages depend on which OCR engine you use:

Tesseract Languages

All Tesseract language and script packs installed in the container are supported. For the full list, see the Dockerfile in this repository.

Common language codes:

Code

Language

eng

English

fra

French

jpn

Japanese

deu

German

spa

Spanish

chi_sim

Chinese Simplified

chi_tra

Chinese Traditional

rus

Russian

ara

Arabic

hin

Hindi

por

Portuguese

eng+fra

English + French (multi-language)

Multi-language OCR is supported using the + operator (e.g., eng+fra for English and French).

PaddleOCR Languages

PaddleOCR supports 80+ languages. Common language codes:

Code

Language

en

English

ch

Chinese Simplified

japan

Japanese

korean

Korean

fr

French

german

German

es

Spanish

pt

Portuguese

ru

Russian

ar

Arabic

hi

Hindi

For a complete list of supported PaddleOCR languages, see the PaddleOCR documentation.

Important: Language codes differ between engines. Ensure you use the correct format for your selected engine. For example, Japanese is jpn in Tesseract but japan in PaddleOCR.

Supported Image Formats

All image formats supported by the Python Pillow library are accepted, including:

  • PNG

  • JPEG/JPG

  • BMP

  • TIFF

  • GIF

  • WebP

Engine Comparison

When to use Tesseract:

  • Standard document OCR with well-aligned text

  • Simple, clean layouts

  • When speed is important

  • When you need multi-language detection in a single pass (e.g., eng+fra)

When to use PaddleOCR:

  • Multi-directional text (text at various angles in the same image)

  • Rotated or angled text

  • Text in natural scenes and photographs

  • Complex document layouts

  • When you need bounding box coordinates and confidence scores

Performance Notes:

PaddleOCR is more computationally intensive than Tesseract and may take longer to process images, especially on the first request (model initialization). However, it provides significantly better accuracy for complex or rotated text.