Deployment
==========

Deploy the OCR REST API service using Docker for production environments. This guide covers Docker deployment, docker-compose orchestration, image variant selection, and security best practices for running Tesseract and PaddleOCR text recognition services.

The OCR service is available as pre-built Docker images on Docker Hub, optimized for different use cases and deployment scenarios.

**Docker Hub Repository**: https://hub.docker.com/r/gunthercox/ocr-service

Image Variants
--------------

The service provides three Docker image variants optimized for different use cases:

**Combined (default)** - Both engines available
  Use when you need flexibility to switch between OCR engines per request.

  - Tags: ``latest``, ``1.0.0``, ``1.0``
  - Python version: 3.12
  - Includes: Tesseract + PaddleOCR
  - Size: Largest (includes 140+ Tesseract language packs)
  - Available engines: Both ``tesseract`` and ``paddleocr``

**Tesseract only** - Document OCR
  Optimized for standard document OCR workloads.

  - Tags: ``latest-tesseract``, ``1.0.0-tesseract``, ``1.0-tesseract``
  - Python version: 3.12
  - Includes: Tesseract with all language packs
  - Size: Medium
  - Available engines: Only ``tesseract``

**PaddleOCR only** - Modern OCR with latest Python
  Best for rotated/multi-directional text or size-constrained environments.

  - Tags: ``latest-paddleocr``, ``1.0.0-paddleocr``, ``1.0-paddleocr``
  - Python version: 3.13
  - Includes: PaddleOCR only
  - Size: Smallest
  - Available engines: Only ``paddleocr``

**Important**: Engine-specific images will return a 400 error if you request an unavailable engine. For example:

.. code-block:: bash

   # This will fail with Tesseract-only image
   curl -X POST -F "image=@photo.png" -F "engine=paddleocr" http://localhost:5000/

The error response will indicate which engines are available in the deployed image variant.

Deployment using docker-compose
-------------------------------

For production environments, it is recommended to use the official Docker image from Docker Hub. Below is an example `docker-compose.yml` for deploying the OCR service in production:

**Combined variant** (default, both engines):

.. code-block:: yaml
  :caption: docker-compose.yml

   services:
     ocr-service:
       image: gunthercox/ocr-service:latest
       ports:
         - "5000:5000"
       restart: unless-stopped

**Tesseract-only variant** (smaller, document OCR):

.. code-block:: yaml
  :caption: docker-compose.yml

   services:
     ocr-service:
       image: gunthercox/ocr-service:latest-tesseract
       ports:
         - "5000:5000"
       restart: unless-stopped

**PaddleOCR-only variant** (smallest, Python 3.13):

.. code-block:: yaml
  :caption: docker-compose.yml

   services:
     ocr-service:
       image: gunthercox/ocr-service:latest-paddleocr
       ports:
         - "5000:5000"
       restart: unless-stopped

You can also pin to a specific version for more predictable deployments:

.. code-block:: yaml
  :caption: docker-compose.yml

   services:
     ocr-service:
       image: gunthercox/ocr-service:1.1.2-paddleocr
       ports:
         - "5000:5000"
       restart: unless-stopped

To run the service using docker-compose, run the following command in the directory containing your `docker-compose.yml` file:

  .. code-block:: bash

     docker compose up -d

- The service will be available at `http://localhost:5000/` by default.

Security Considerations
-----------------------

- For production, consider using a reverse proxy (e.g., Nginx) in front of the service for SSL termination and additional security.
- Monitor and update the image regularly to receive security and feature updates.
- Validate and sanitize all uploaded files to prevent malicious input.
- Limit accepted file types to images only.
- Consider rate limiting and authentication for production deployments.

**Additional Notes:**

- Review the `Docker Hub page <https://hub.docker.com/r/gunthercox/ocr-service>`_ for available tags.