API

Complete REST API reference for the OCR service. It describes how to extract text from images with the Tesseract or PaddleOCR engine through HTTP POST requests. Images are uploaded as multipart/form-data, multiple languages are supported, and results are returned as structured JSON containing the text, bounding box coordinates, and confidence scores.

Overview

The OCR REST API provides two primary endpoints:

POST / - Main OCR endpoint for text extraction from images
GET /health - Health check endpoint for monitoring and load balancers

All OCR responses are JSON formatted and include detailed text extraction results, with a confidence score and bounding box coordinates for each detected text region. The health check endpoint returns an empty body.

Endpoint: /

Method: POST

Description: Performs OCR on an uploaded image using the specified OCR engine. Expects a file field named image in the request, and accepts optional parameters to control the OCR engine and language.

Request:

Content-Type: multipart/form-data
Maximum file size: 10MB
Form fields:
- image (file, required): The image file to process
- engine (string, optional): OCR engine to use - ‘tesseract’ or ‘paddleocr’ (default: ‘paddleocr’)
- lang (string, optional): Language code (format depends on engine, see below)
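Because oversized uploads are rejected with a 413 status, a client can check the file size locally before sending it. The following is a minimal sketch; the function name check_image_size is hypothetical, and it assumes "10MB" means 10 MiB (10 * 1024 * 1024 bytes), which may not match exactly how the server counts.

```shell
# Hypothetical helper: fail fast before uploading an image the service would reject.
# ASSUMPTION: "10MB" is interpreted as 10 MiB (10 * 1024 * 1024 bytes);
# adjust MAX_BYTES if the server uses a different definition.
MAX_BYTES=$((10 * 1024 * 1024))

check_image_size() {
    img=$1
    if [ ! -f "$img" ]; then
        echo "error: $img not found" >&2
        return 2
    fi
    # wc -c is portable (GNU and BSD stat use different flags);
    # tr strips the padding that BSD wc adds.
    size=$(wc -c < "$img" | tr -d ' ')
    if [ "$size" -gt "$MAX_BYTES" ]; then
        echo "error: $img is $size bytes; limit is $MAX_BYTES" >&2
        return 1
    fi
    return 0
}
```

For example: check_image_size example.png && curl -X POST -F "image=@example.png" http://localhost:5000/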

Supported Engines:

tesseract: Fast and efficient for standard document OCR with well-aligned text
paddleocr: Better for multi-directional and rotated text (default)

Example (cURL):

# Using default engine (PaddleOCR)
curl -X POST \
     -F "image=@example.png" \
     http://localhost:5000/

# Using Tesseract engine with Japanese language
curl -X POST \
     -F "image=@example.png" \
     -F "engine=tesseract" \
     -F "lang=jpn" \
     http://localhost:5000/

# Using PaddleOCR engine with Chinese language
curl -X POST \
     -F "image=@example.png" \
     -F "engine=paddleocr" \
     -F "lang=ch" \
     http://localhost:5000/
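The three commands above differ only in which optional fields they send. A small wrapper can add engine and lang only when a value is given, so the server defaults apply otherwise. This is a sketch, not part of the service: ocr_request and the OCR_URL variable are hypothetical names, and the default URL is the local address used in the examples.

```shell
# Hypothetical wrapper around the cURL calls above.
# OCR_URL is an assumption; it defaults to the address used on this page.
OCR_URL=${OCR_URL:-http://localhost:5000/}

# Usage: ocr_request IMAGE [ENGINE] [LANG]
# Omitting ENGINE or LANG leaves that field out, so the server default applies.
ocr_request() {
    image=$1
    engine=${2:-}
    lang=${3:-}
    # Rebuild the positional parameters as the curl argument list,
    # appending optional form fields only when a value was given.
    set -- -sS -X POST -F "image=@$image"
    if [ -n "$engine" ]; then
        set -- "$@" -F "engine=$engine"
    fi
    if [ -n "$lang" ]; then
        set -- "$@" -F "lang=$lang"
    fi
    curl "$@" "$OCR_URL"
}
```

With this helper, ocr_request example.png tesseract jpn sends the same form fields as the second cURL command above.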

Response:

Content-Type: application/json

Success Example (PaddleOCR):

{
  "text": "Concatenated text from all detected regions",
  "regions": [
    {
      "bbox": [[10, 20], [100, 20], [100, 40], [10, 40]],
      "text": "Detected text",
      "confidence": 0.95
    },
    {
      "bbox": [[10, 50], [120, 50], [120, 70], [10, 70]],
      "text": "Another text region",
      "confidence": 0.92
    }
  ]
}

Success Example (Tesseract):

{
  "text": "Extracted OCR text from the image.",
  "regions": [
    {
      "bbox": [[15, 25], [95, 25], [95, 42], [15, 42]],
      "text": "Extracted",
      "confidence": 0.96
    },
    {
      "bbox": [[100, 25], [140, 25], [140, 42], [100, 42]],
      "text": "OCR",
      "confidence": 0.94
    }
  ]
}

Response Fields:

text (string): All detected text (concatenated with spaces for PaddleOCR, full text for Tesseract)
regions (array): List of detected text regions (both engines provide this data)

Each region contains:
- bbox (array): Bounding box as 4 corner points [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]; in the examples above, the points run clockwise from the top-left corner
- text (string): The text content from this region
- confidence (float): Confidence score from 0.0 to 1.0
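Because bbox is a four-point polygon rather than an (x, y, width, height) rectangle, rotated text from PaddleOCR can produce tilted boxes. To crop or draw an axis-aligned rectangle, take the minimum and maximum of the corner coordinates. The sketch below assumes the eight numbers have already been extracted from the JSON (for example with a JSON tool such as jq); bbox_extent is a hypothetical helper.

```shell
# Hypothetical helper: reduce a four-corner bbox to the axis-aligned
# rectangle that encloses it.
# Arguments: x1 y1 x2 y2 x3 y3 x4 y4
# Prints:    xmin ymin xmax ymax
bbox_extent() {
    if [ "$#" -ne 8 ]; then
        echo "usage: bbox_extent x1 y1 x2 y2 x3 y3 x4 y4" >&2
        return 2
    fi
    printf '%s\n' "$*" | awk '{
        # Start from the first corner, then widen the range with the other three.
        xmin = xmax = $1 + 0
        ymin = ymax = $2 + 0
        for (i = 3; i <= 7; i += 2) {
            x = $i + 0
            y = $(i + 1) + 0
            if (x < xmin) xmin = x
            if (x > xmax) xmax = x
            if (y < ymin) ymin = y
            if (y > ymax) ymax = y
        }
        print xmin, ymin, xmax, ymax
    }'
}
```

For the first PaddleOCR region above, bbox_extent 10 20 100 20 100 40 10 40 prints 10 20 100 40.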

Error Example (missing image field):

{
  "error": {
    "image": "This field is required."
  }
}

Error Example (invalid engine):

{
  "error": {
    "engine": "Unsupported engine: invalid. Must be one of: tesseract, paddleocr"
  }
}

Error Example (file too large):

{
  "error": {
    "image": "File size exceeds 10MB limit."
  }
}

Error Example (invalid content type):

{
  "error": {
    "content_type": "Request must be multipart/form-data."
  }
}

Status Codes:

200: Success, returns extracted text and regions.
400: Bad request, missing or invalid parameters.
413: Payload too large, file exceeds 10MB limit.
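cURL prints only the response body by default, so the status code has to be requested explicitly with -w '%{http_code}'. The sketch below maps the codes listed above to short messages; describe_status is a hypothetical helper, and any code not documented on this page is reported as unexpected.

```shell
# Hypothetical helper: translate the status codes documented above
# into a short message.
describe_status() {
    case "$1" in
        200) echo "success" ;;
        400) echo "bad request: missing or invalid parameters" ;;
        413) echo "payload too large: file exceeds 10MB limit" ;;
        *)   echo "unexpected status: $1" ;;
    esac
}

# Example: keep the JSON body in a file and the status code in a variable.
#   status=$(curl -s -o response.json -w '%{http_code}' \
#                 -F "image=@example.png" http://localhost:5000/)
#   describe_status "$status"
```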

Endpoint: /health

Method: GET

Description: Health check endpoint to verify the service is running and responsive.

Request:

No parameters required.

Example (cURL):

curl http://localhost:5000/health

Response:

Status Code: 200
Empty response body

Status Codes:

200: Service is healthy and operational.
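Deployment scripts often wait for the service to become healthy before sending OCR requests. Below is a minimal polling loop, assuming curl is available; wait_for_health and its retry and delay defaults are hypothetical choices, not part of the service.

```shell
# Hypothetical helper: poll the health endpoint until it answers
# without an HTTP error, or until the retries run out.
# Usage: wait_for_health URL [RETRIES] [DELAY_SECONDS]
wait_for_health() {
    url=$1
    retries=${2:-30}
    delay=${3:-2}
    attempt=1
    while :; do
        # -f: exit non-zero when the server answers with an HTTP error (>= 400).
        # -o /dev/null: discard the body (documented as empty anyway).
        if curl -sf -o /dev/null "$url"; then
            return 0
        fi
        if [ "$attempt" -ge "$retries" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For example: wait_for_health http://localhost:5000/health 30 2 && echo "service is ready"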

Supported Languages

The language codes and available languages depend on which OCR engine you use:

Tesseract Languages

All Tesseract language and script packs installed in the container are supported. For the full list, see the Dockerfile in this repository.

Common language codes:

Code	Language
eng	English
fra	French
jpn	Japanese
deu	German
spa	Spanish
chi_sim	Chinese Simplified
chi_tra	Chinese Traditional
rus	Russian
ara	Arabic
hin	Hindi
por	Portuguese
eng+fra	English + French (multi-language)

Multi-language OCR is supported using the + operator (e.g., eng+fra for English and French).
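When the language list comes from a script variable or user input, the +-joined value can be built with the shell's "$*" expansion. A minimal sketch; tess_langs is a hypothetical helper, and every code in the result must still correspond to a language pack installed in the container.

```shell
# Hypothetical helper: join Tesseract language codes with "+".
tess_langs() {
    # "$*" joins the arguments with the first character of IFS;
    # the subshell keeps the IFS change from leaking to the caller.
    (IFS=+; printf '%s\n' "$*")
}
```

For example: curl -X POST -F "image=@example.png" -F "engine=tesseract" -F "lang=$(tess_langs eng fra)" http://localhost:5000/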

PaddleOCR Languages

PaddleOCR supports 80+ languages. Common language codes:

Code	Language
en	English
ch	Chinese Simplified
japan	Japanese
korean	Korean
fr	French
german	German
es	Spanish
pt	Portuguese
ru	Russian
ar	Arabic
hi	Hindi

For a complete list of supported PaddleOCR languages, see the PaddleOCR documentation.

Important: Language codes differ between engines. Ensure you use the correct format for your selected engine. For example, Japanese is jpn in Tesseract but japan in PaddleOCR.
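A client that lets users switch engines has to translate the language code too. The sketch below covers only the languages that appear in both tables above; tess_to_paddle is a hypothetical helper and is not exhaustive (for example, it has no entry for Chinese Traditional or Korean, each of which appears in only one of the tables).

```shell
# Hypothetical helper: translate a Tesseract language code into the
# PaddleOCR code for the same language, using only the pairs listed
# in the two tables above. Unknown codes return status 1.
tess_to_paddle() {
    case "$1" in
        eng)     echo en ;;
        fra)     echo fr ;;
        jpn)     echo japan ;;
        deu)     echo german ;;
        spa)     echo es ;;
        chi_sim) echo ch ;;
        rus)     echo ru ;;
        ara)     echo ar ;;
        hin)     echo hi ;;
        por)     echo pt ;;
        *)
            echo "no mapping for Tesseract code: $1" >&2
            return 1
            ;;
    esac
}
```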

Supported Image Formats

All image formats supported by the Python Pillow library are accepted, including:

PNG
JPEG/JPG
BMP
TIFF
GIF
WebP

Engine Comparison

When to use Tesseract:

Standard document OCR with well-aligned text
Simple, clean layouts
When speed is important
When you need to recognize several languages in a single pass (e.g., eng+fra)

When to use PaddleOCR:

Multi-directional text (text at various angles in the same image)
Rotated or angled text
Text in natural scenes and photographs
Complex document layouts
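When you want line-level regions: both engines return bounding boxes and confidence scores, but in the examples above PaddleOCR regions cover whole lines of text while Tesseract regions are single words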

Performance Notes:

PaddleOCR is more computationally intensive than Tesseract and can take longer per image, especially on the first request, which includes model initialization. In exchange, it generally handles complex or rotated text more accurately.
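One way to keep model initialization out of the first real request is to send a throwaway PaddleOCR request right after the service starts. This is a sketch under assumptions: warm_up_paddleocr is a hypothetical helper, it assumes the service keeps the loaded model for later requests (as the note above implies), and a request with a different lang value, or one handled by a different worker process, may still trigger its own initialization.

```shell
# Hypothetical warm-up step: send one throwaway PaddleOCR request so that
# model initialization happens before real clients arrive.
# Usage: warm_up_paddleocr SAMPLE_IMAGE [URL]
warm_up_paddleocr() {
    sample=$1
    url=${2:-http://localhost:5000/}
    # Discard the JSON body; -f makes curl fail on HTTP error statuses.
    curl -sf -o /dev/null -X POST \
        -F "image=@$sample" \
        -F "engine=paddleocr" \
        "$url"
}
```

For example: warm_up_paddleocr sample.png || echo "warm-up request failed" >&2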