Clarifai Guide
v6.9

Visual Text Recognition

Last updated 4 years ago

Visual text recognition (VTR) converts printed text in images and videos into machine-encoded text. The input can be a scanned document, a photo of a document, a scene photo (for example, one showing signs or billboards), or an image with superimposed text (such as a meme); the output is the words and individual characters present in the image. VTR lets you "digitize" text so that it can be edited, searched, stored, displayed, and analyzed.

Please note: The current version of our VTR model is not designed for handwritten text or for documents with tightly packed text (such as a page of a novel).

How VTR works

VTR works in three steps: it detects the location of text in your photos or video frames, crops the region where the text is present, and then runs a specialized classification model that extracts the text from the cropped image. To chain these steps, you need to configure a workflow that contains these three models, in this order:

  • Visual Text Detection

  • 1.0 Cropper

  • Visual Text Recognition
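
The three models form a simple chain in which each node consumes the output of the node before it. As a rough illustration only, the wiring can be sketched and sanity-checked in plain Python with no Clarifai client involved. The node ids are the ones used in the workflow examples on this page; the mapping of ids to models in the comments is assumed from the order of the list above, and `NODES` and `check_wiring` are hypothetical names, not part of the Clarifai API.

```python
# Hypothetical sketch: NODES and check_wiring are illustrative names only.
# The id-to-model mapping in the comments is an assumption based on list order.
NODES = [
    {"id": "detect-concept", "inputs": []},                # Visual Text Detection
    {"id": "image-crop", "inputs": ["detect-concept"]},    # 1.0 Cropper
    {"id": "image-to-text", "inputs": ["image-crop"]},     # Visual Text Recognition
]


def check_wiring(nodes):
    """Return a list of problems: inputs that do not name an earlier node."""
    problems = []
    seen = set()
    for node in nodes:
        for source in node["inputs"]:
            if source not in seen:
                problems.append(f"{node['id']} reads from unknown node {source!r}")
        seen.add(node["id"])
    return problems


if __name__ == "__main__":
    # An empty list means every input refers to a node defined before it.
    print(check_wiring(NODES))
```

A check like this can be run on a workflow definition before it is submitted, since a mistyped `node_id` is easy to miss by eye.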

Building a VTR workflow

Python:

from clarifai_grpc.grpc.api import resources_pb2, service_pb2
from clarifai_grpc.grpc.api.status import status_code_pb2

# Insert here the initialization code as outlined on this page:
# https://docs.clarifai.com/api-guide/api-overview/api-clients#client-installation-instructions
# The code below uses the `stub` and `metadata` objects from that initialization.

post_workflows_response = stub.PostWorkflows(
    service_pb2.PostWorkflowsRequest(
        user_app_id=resources_pb2.UserAppIDSet(
            app_id="{YOUR_APP_ID}"
        ),
        workflows=[
            resources_pb2.Workflow(
                id="visual-text-recognition-id",
                nodes=[
                    # Step 1: detect where text appears in the input.
                    resources_pb2.WorkflowNode(
                        id="detect-concept",
                        model=resources_pb2.Model(
                            id="2419e2eae04d04f820e5cf3aba42d0c7",
                            model_version=resources_pb2.ModelVersion(
                                id="75a5b92a0dec436a891b5ad224ac9170"
                            )
                        )
                    ),
                    # Step 2: crop the detected regions.
                    resources_pb2.WorkflowNode(
                        id="image-crop",
                        model=resources_pb2.Model(
                            id="ce3f5832af7a4e56ae310d696cbbefd8",
                            model_version=resources_pb2.ModelVersion(
                                id="a78efb13f7774433aa2fd4864f41f0e6"
                            )
                        ),
                        node_inputs=[
                            resources_pb2.NodeInput(node_id="detect-concept")
                        ]
                    ),
                    # Step 3: extract text from each cropped region.
                    resources_pb2.WorkflowNode(
                        id="image-to-text",
                        model=resources_pb2.Model(
                            id="9fe78b4150a52794f86f237770141b33",
                            model_version=resources_pb2.ModelVersion(
                                id="d94413e582f341f68884cac72dbd2c7b"
                            )
                        ),
                        node_inputs=[
                            resources_pb2.NodeInput(node_id="image-crop")
                        ]
                    ),
                ]
            )
        ]
    ),
    metadata=metadata
)

if post_workflows_response.status.code != status_code_pb2.SUCCESS:
    raise Exception("Post workflows failed, status: " + post_workflows_response.status.description)

Java:

import com.clarifai.grpc.api.*;
import com.clarifai.grpc.api.status.*;

// Insert here the initialization code as outlined on this page:
// https://docs.clarifai.com/api-guide/api-overview/api-clients#client-installation-instructions

MultiWorkflowResponse postWorkflowsResponse = stub.postWorkflows(
  PostWorkflowsRequest.newBuilder()
      .setUserAppId(UserAppIDSet.newBuilder().setAppId("{YOUR_APP_ID}"))
      .addWorkflows(
          Workflow.newBuilder()
              .setId("visual-text-recognition-id")
              .addNodes(
                  WorkflowNode.newBuilder()
                      .setId("detect-concept")
                      .setModel(
                          Model.newBuilder()
                              .setId("2419e2eae04d04f820e5cf3aba42d0c7")
                              .setModelVersion(
                                  ModelVersion.newBuilder()
                                      .setId("75a5b92a0dec436a891b5ad224ac9170")
                              )
                      )
              )
              .addNodes(
                  WorkflowNode.newBuilder()
                      .setId("image-crop")
                      .setModel(
                          Model.newBuilder()
                              .setId("ce3f5832af7a4e56ae310d696cbbefd8")
                              .setModelVersion(
                                  ModelVersion.newBuilder()
                                      .setId("a78efb13f7774433aa2fd4864f41f0e6")
                              )
                      )
                      .addNodeInputs(NodeInput.newBuilder().setNodeId("detect-concept"))
              )
              .addNodes(
                  WorkflowNode.newBuilder()
                      .setId("image-to-text")
                      .setModel(
                          Model.newBuilder()
                              .setId("9fe78b4150a52794f86f237770141b33")
                              .setModelVersion(
                                  ModelVersion.newBuilder()
                                      .setId("d94413e582f341f68884cac72dbd2c7b")
                              )
                      )
                      .addNodeInputs(NodeInput.newBuilder().setNodeId("image-crop"))
              )
      )
      .build()
);

if (postWorkflowsResponse.getStatus().getCode() != StatusCode.SUCCESS) {
    throw new RuntimeException("Post workflows failed, status: " + postWorkflowsResponse.getStatus());
}

NodeJS:

// Insert here the initialization code as outlined on this page:
// https://docs.clarifai.com/api-guide/api-overview/api-clients#client-installation-instructions

stub.PostWorkflows(
    {
        user_app_id: {
            app_id: "{YOUR_APP_ID}"
        },
        workflows: [
            {
                id: "visual-text-recognition-id",
                nodes: [
                    {
                        id: "detect-concept",
                        model: {
                            id: "2419e2eae04d04f820e5cf3aba42d0c7",
                            model_version: {
                                id: "75a5b92a0dec436a891b5ad224ac9170"
                            }
                        }
                    },
                    {
                        id: "image-crop",
                        model: {
                            id: "ce3f5832af7a4e56ae310d696cbbefd8",
                            model_version: {
                                id: "a78efb13f7774433aa2fd4864f41f0e6"
                            }
                        },
                        node_inputs: [
                            {node_id: "detect-concept"}
                        ]
                    },
                    {
                        id: "image-to-text",
                        model: {
                            id: "9fe78b4150a52794f86f237770141b33",
                            model_version: {
                                id: "d94413e582f341f68884cac72dbd2c7b"
                            }
                        },
                        node_inputs: [
                            {node_id: "image-crop"}
                        ]
                    },
                ]
            }
        ]
    },
    metadata,
    (err, response) => {
        if (err) {
            throw new Error(err);
        }

        if (response.status.code !== 10000) {
            console.log(response.status);
            throw new Error("Post workflows failed, status: " + response.status.description);
        }
    }
);

cURL:

curl -X POST 'https://api.clarifai.com/v2/users/me/apps/{{app}}/workflows' \
    -H 'Authorization: Key {{PAT}}' \
    -H 'Content-Type: application/json' \
    --data-raw '{
        "workflows": [
            {
                "id": "visual-text-recognition-id",
                "nodes": [
                    {
                        "id": "detect-concept",
                        "model": {
                            "id": "2419e2eae04d04f820e5cf3aba42d0c7",
                            "model_version": {
                                "id": "75a5b92a0dec436a891b5ad224ac9170"
                            }
                        }
                    },
                    {
                        "id": "image-crop",
                        "model": {
                            "id": "ce3f5832af7a4e56ae310d696cbbefd8",
                            "model_version": {
                                "id": "a78efb13f7774433aa2fd4864f41f0e6"
                            }
                        },
                        "node_inputs": [
                            {
                                "node_id": "detect-concept"
                            }
                        ]
                    },
                    {
                        "id": "image-to-text",
                        "model": {
                            "id": "9fe78b4150a52794f86f237770141b33",
                            "model_version": {
                                "id": "d94413e582f341f68884cac72dbd2c7b"
                            }
                        },
                        "node_inputs": [
                            {
                                "node_id": "image-crop"
                            }
                        ]
                    }
                ]
            }
        ]
    }'
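
Hand-written JSON is easy to get subtly wrong: a stray trailing comma makes the body invalid JSON, and a mistyped node id leaves a node pointing at an input that does not exist. One possible way to avoid this, sketched here under assumptions, is to build the request body from a small table and let a JSON library serialize it. The model and version ids are copied from the examples on this page; `NODE_SPECS` and `build_body` are hypothetical helper names, not part of any Clarifai client.

```python
import json

# Each entry: (node id, model id, model version id, id of the input node or None).
# Ids are copied from the examples on this page; the helper names are hypothetical.
NODE_SPECS = [
    ("detect-concept", "2419e2eae04d04f820e5cf3aba42d0c7",
     "75a5b92a0dec436a891b5ad224ac9170", None),
    ("image-crop", "ce3f5832af7a4e56ae310d696cbbefd8",
     "a78efb13f7774433aa2fd4864f41f0e6", "detect-concept"),
    ("image-to-text", "9fe78b4150a52794f86f237770141b33",
     "d94413e582f341f68884cac72dbd2c7b", "image-crop"),
]


def build_body(workflow_id="visual-text-recognition-id"):
    """Return the workflow-creation request body as a JSON string."""
    nodes = []
    for node_id, model_id, version_id, input_id in NODE_SPECS:
        node = {
            "id": node_id,
            "model": {"id": model_id, "model_version": {"id": version_id}},
        }
        # The first node reads the original input, so it has no node_inputs.
        if input_id is not None:
            node["node_inputs"] = [{"node_id": input_id}]
        nodes.append(node)
    return json.dumps({"workflows": [{"id": workflow_id, "nodes": nodes}]}, indent=4)


if __name__ == "__main__":
    print(build_body())
```

The printed string can be saved to a file and passed to cURL with `--data-raw "$(cat body.json)"` or `-d @body.json`, where `body.json` is a file name chosen for this example.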