API
Complete REST API reference for the OCR service. Learn how to extract text from images using Tesseract or PaddleOCR engines through simple HTTP POST requests. This API supports multipart/form-data uploads, multiple languages, and returns structured JSON with text and bounding box coordinates.
Overview
The OCR REST API provides two primary endpoints:
POST / - Main OCR endpoint for text extraction from images
GET /health - Health check endpoint for monitoring and load balancers
Responses from POST / are JSON and include the extracted text plus a confidence score and bounding box coordinates for each detected text region. GET /health returns an empty body.
Endpoint: /
Method: POST
Description: Performs OCR on an uploaded image using the specified OCR engine. Expects a file field named image in the request, and accepts optional parameters to control the OCR engine and language.
Request:
Content-Type: multipart/form-data
Maximum file size: 10MB
Form fields:
image (file, required): The image file to process
engine (string, optional): OCR engine to use, either 'tesseract' or 'paddleocr' (default: 'paddleocr')
lang (string, optional): Language code (format depends on engine, see Supported Languages below)
Supported Engines:
tesseract: Fast and efficient for standard document OCR with well-aligned text
paddleocr: Better for multi-directional and rotated text (default)
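The request constraints above can be checked on the client before uploading. The following is a minimal sketch, not part of the service itself: `validate_request` is a hypothetical helper, and it assumes that "10MB" means 10 * 1024 * 1024 bytes and that engine names are matched exactly as written.

```python
SUPPORTED_ENGINES = ("tesseract", "paddleocr")  # engines listed above
DEFAULT_ENGINE = "paddleocr"                    # documented default
MAX_FILE_BYTES = 10 * 1024 * 1024               # assumption: "10MB" read as 10 MiB


def validate_request(image_size, engine=None):
    """Return (engine, errors) for a planned upload.

    `errors` is a dict keyed by form field name, mirroring the shape of the
    documented error responses. An empty dict means the request looks valid.
    """
    errors = {}
    chosen = engine if engine is not None else DEFAULT_ENGINE
    if chosen not in SUPPORTED_ENGINES:
        errors["engine"] = f"Unsupported engine: {chosen}"
    if image_size > MAX_FILE_BYTES:
        errors["image"] = "File size exceeds 10MB limit."
    return chosen, errors
```

A caller could pass `os.path.getsize(path)` as `image_size` and skip the upload when `errors` is non-empty. The server remains the authority on what it accepts.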
Example (cURL):
# Using default engine (PaddleOCR)
curl -X POST \
-F "image=@example.png" \
http://localhost:5000/
# Using Tesseract engine with Japanese language
curl -X POST \
-F "image=@example.png" \
-F "engine=tesseract" \
-F "lang=jpn" \
http://localhost:5000/
# Using PaddleOCR engine with Chinese language
curl -X POST \
-F "image=@example.png" \
-F "engine=paddleocr" \
-F "lang=ch" \
http://localhost:5000/
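The same requests can be sent from Python. The sketch below uses only the standard library and builds the multipart/form-data body by hand. `build_multipart` and `ocr` are hypothetical names, the URL is the one used in the cURL examples, and the file part is labelled `application/octet-stream` as an assumption about what the server tolerates.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(image_bytes, filename, fields=None):
    """Encode optional text fields plus one file part named "image".

    Returns (body, content_type). The filename is inserted as-is, so it
    should not contain double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in (fields or {}).items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + image_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def ocr(image_path, url="http://localhost:5000/", engine=None, lang=None):
    """POST an image to the OCR endpoint and return the decoded JSON."""
    fields = {k: v for k, v in (("engine", engine), ("lang", lang)) if v}
    with open(image_path, "rb") as f:
        body, content_type = build_multipart(
            f.read(), os.path.basename(image_path), fields
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `ocr("example.png", engine="tesseract", lang="jpn")` corresponds to the second cURL command. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx responses, so error bodies need separate handling.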
Response:
Content-Type: application/json
Success Example (PaddleOCR):
{
"text": "Concatenated text from all detected regions",
"regions": [
{
"bbox": [[10, 20], [100, 20], [100, 40], [10, 40]],
"text": "Detected text",
"confidence": 0.95
},
{
"bbox": [[10, 50], [120, 50], [120, 70], [10, 70]],
"text": "Another text region",
"confidence": 0.92
}
]
}
Success Example (Tesseract):
{
"text": "Extracted OCR text from the image.",
"regions": [
{
"bbox": [[15, 25], [95, 25], [95, 42], [15, 42]],
"text": "Extracted",
"confidence": 0.96
},
{
"bbox": [[100, 25], [140, 25], [140, 42], [100, 42]],
"text": "OCR",
"confidence": 0.94
}
]
}
Response Fields:
text (string): All detected text (concatenated with spaces for PaddleOCR, full text for Tesseract)
regions (array): List of detected text regions (both engines provide this data)
Each region contains:
bbox (array): Bounding box as 4 corner points [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
text (string): The text content from this region
confidence (float): Confidence score from 0.0 to 1.0
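To illustrate how the response fields fit together, here is a small sketch that filters regions by confidence and collapses each four-point `bbox` into an axis-aligned rectangle. The helper names and the 0.9 default threshold are illustrative assumptions; the sample payload is copied from the PaddleOCR success example.

```python
def bbox_to_rect(bbox):
    """Collapse four [x, y] corner points into (left, top, right, bottom)."""
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    return min(xs), min(ys), max(xs), max(ys)


def confident_regions(result, min_confidence=0.9):
    """Return (text, rect) pairs for regions at or above the threshold."""
    return [
        (region["text"], bbox_to_rect(region["bbox"]))
        for region in result.get("regions", [])
        if region["confidence"] >= min_confidence
    ]


# Sample payload copied from the PaddleOCR success example.
sample = {
    "text": "Concatenated text from all detected regions",
    "regions": [
        {
            "bbox": [[10, 20], [100, 20], [100, 40], [10, 40]],
            "text": "Detected text",
            "confidence": 0.95,
        },
        {
            "bbox": [[10, 50], [120, 50], [120, 70], [10, 70]],
            "text": "Another text region",
            "confidence": 0.92,
        },
    ],
}

for text, rect in confident_regions(sample, min_confidence=0.93):
    print(text, rect)  # prints: Detected text (10, 20, 100, 40)
```

Taking the minimum and maximum of the corner coordinates keeps the rectangle correct for rotated regions, where the four points are not already axis-aligned.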
Error Example (missing image field):
{
"error": {
"image": "This field is required."
}
}
Error Example (invalid engine):
{
"error": {
"engine": "Unsupported engine: invalid. Must be one of: tesseract, paddleocr"
}
}
Error Example (file too large):
{
"error": {
"image": "File size exceeds 10MB limit."
}
}
Error Example (invalid content type):
{
"error": {
"content_type": "Request must be multipart/form-data."
}
}
Status Codes:
200: Success, returns extracted text and regions.
400: Bad request, missing or invalid parameters.
413: Payload too large, file exceeds 10MB limit.
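A client can turn these status codes and the `error` object into a single exception. The sketch below is one possible approach: `OcrError` and `parse_response` are hypothetical names, and the fallback for non-JSON bodies is an assumption (for instance, a proxy in front of the service might answer a 413 itself).

```python
import json


class OcrError(Exception):
    """Raised when the OCR endpoint answers with a non-200 status."""

    def __init__(self, status, errors):
        self.status = status
        self.errors = errors
        details = "; ".join(f"{field}: {message}" for field, message in errors.items())
        super().__init__(f"HTTP {status}: {details}")


def parse_response(status, body):
    """Return the decoded result for 200, raise OcrError otherwise."""
    try:
        payload = json.loads(body) if body else {}
    except ValueError:  # covers invalid JSON and undecodable bytes
        payload = {}
    if status == 200:
        return payload
    errors = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(errors, dict):
        # Assumption: fall back to a generic entry when the body is not
        # in the documented {"error": {...}} shape.
        errors = {"response": "unexpected error body"}
    raise OcrError(status, errors)
```

With this in place, `exc.errors` holds the same field-to-message mapping shown in the error examples, so callers can check for keys such as `image` or `engine`.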
Endpoint: /health
Method: GET
Description: Health check endpoint to verify the service is running and responsive.
Request:
No parameters required.
Example (cURL):
curl http://localhost:5000/health
Response:
Status Code: 200
Empty response body
Status Codes:
200: Service is healthy and operational.
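The health endpoint lends itself to a startup wait loop, for example in a deployment script. The following sketch polls until a 200 is returned; the function names, attempt count, delay and timeout are illustrative assumptions, not values defined by the service.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url="http://localhost:5000/health", timeout=2.0):
    """Return True if GET /health answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status.
        return False


def wait_until_healthy(probe=is_healthy, attempts=10, delay=1.0, sleep=time.sleep):
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Passing `probe` and `sleep` as parameters keeps the loop testable without a running server.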
Supported Languages
The language codes and available languages depend on which OCR engine you use:
Tesseract Languages
All Tesseract language and script packs installed in the container are supported. For the full list, see the Dockerfile in this repository.
Common language codes:
| Code | Language |
|---|---|
| eng | English |
| fra | French |
| jpn | Japanese |
| deu | German |
| spa | Spanish |
| chi_sim | Chinese Simplified |
| chi_tra | Chinese Traditional |
| rus | Russian |
| ara | Arabic |
| hin | Hindi |
| por | Portuguese |
| eng+fra | English + French (multi-language) |
Multi-language OCR is supported using the + operator (e.g., eng+fra for English and French).
PaddleOCR Languages
PaddleOCR supports 80+ languages. Common language codes:
| Code | Language |
|---|---|
| en | English |
| ch | Chinese Simplified |
| japan | Japanese |
| korean | Korean |
| fr | French |
| german | German |
| es | Spanish |
| pt | Portuguese |
| ru | Russian |
| ar | Arabic |
| hi | Hindi |
For a complete list of supported PaddleOCR languages, see the PaddleOCR documentation.
Important: Language codes differ between engines. Ensure you use the correct
format for your selected engine. For example, Japanese is jpn in Tesseract but
japan in PaddleOCR.
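One way to avoid mixing up the two code formats is a small lookup built from the tables above. This is a sketch under stated assumptions: `lang_for_engine` is a hypothetical helper, the mapping covers only languages that appear in both tables, unknown codes are passed through unchanged, and the `+` operator is rejected for PaddleOCR because it is documented for Tesseract only.

```python
# Tesseract code -> PaddleOCR code, limited to languages listed in both tables.
TESSERACT_TO_PADDLE = {
    "eng": "en",
    "fra": "fr",
    "jpn": "japan",
    "deu": "german",
    "spa": "es",
    "chi_sim": "ch",
    "rus": "ru",
    "ara": "ar",
    "hin": "hi",
    "por": "pt",
}
PADDLE_TO_TESSERACT = {paddle: tess for tess, paddle in TESSERACT_TO_PADDLE.items()}


def lang_for_engine(code, engine):
    """Translate a language code from either table into the form `engine` expects."""
    if engine == "tesseract":
        # Tesseract accepts several languages joined with "+".
        return "+".join(PADDLE_TO_TESSERACT.get(part, part) for part in code.split("+"))
    if engine == "paddleocr":
        if "+" in code:
            raise ValueError("the + operator is documented for Tesseract only")
        return TESSERACT_TO_PADDLE.get(code, code)
    raise ValueError(f"Unsupported engine: {engine}")
```

For instance, `lang_for_engine("jpn", "paddleocr")` gives `"japan"`, and `lang_for_engine("en+fr", "tesseract")` gives `"eng+fra"`.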
Supported Image Formats
All image formats supported by the Python Pillow library are accepted, including:
PNG
JPEG/JPG
BMP
TIFF
GIF
WebP
Engine Comparison
When to use Tesseract:
Standard document OCR with well-aligned text
Simple, clean layouts
When speed is important
When you need multi-language detection in a single pass (e.g., eng+fra)
When to use PaddleOCR:
Multi-directional text (text at various angles in the same image)
Rotated or angled text
Text in natural scenes and photographs
Complex document layouts
When you need bounding box coordinates and confidence scores for rotated or irregularly placed text (both engines return regions)
Performance Notes:
PaddleOCR is more computationally intensive than Tesseract and may take longer to process images, especially on the first request (model initialization). However, it provides significantly better accuracy for complex or rotated text.