Clarifai Guide
v7.11
Visual Text Recognition

Work with text in images, just like you work with encoded text.


Last updated 3 years ago


Visual text recognition (VTR) helps you convert printed text in images and videos into machine-encoded text. You can input a scanned document, a photo of a document, a scene photo (such as the text on signs and billboards), or text superimposed on an image (such as in a meme), and output the words and individual characters present in those images. VTR lets you "digitize" text so that it can be edited, searched, stored, displayed, and analyzed.

Please note: The current version of our VTR model is not designed for handwritten text or for documents with tightly packed text (such as the pages of a novel).

How VTR works

VTR works in three stages: it first detects the location of text in your photos or video frames, then crops each region where text is present, and finally runs a specialized classification model that extracts the text from each cropped image. To accomplish these tasks, you will configure a workflow that chains together these three models:

  • Visual Text Detection

  • 1.0 Cropper

  • Visual Text Recognition
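Conceptually, each stage hands its output to the next: the detector emits regions with normalized bounding boxes, and the cropper scales each box to pixel coordinates before passing the crop to the recognition model. The following is an illustrative sketch of that coordinate arithmetic only (not the models themselves); the `top_row`/`left_col`/`bottom_row`/`right_col` field names follow Clarifai's `region_info.bounding_box` convention, where all values are normalized to [0, 1]:

```python
# Illustrative sketch of the detect -> crop hand-off inside the workflow.
# Detectors return bounding boxes normalized to [0, 1]; the cropper scales
# them to pixel coordinates before cutting out each text region.

def box_to_pixels(box, image_width, image_height):
    """Convert a normalized bounding box to integer pixel coordinates."""
    left = int(box["left_col"] * image_width)
    top = int(box["top_row"] * image_height)
    right = int(box["right_col"] * image_width)
    bottom = int(box["bottom_row"] * image_height)
    return left, top, right, bottom

# A detected region on a 1000x500 image, as a normalized box:
region = {"top_row": 0.1, "left_col": 0.2, "bottom_row": 0.3, "right_col": 0.6}
print(box_to_pixels(region, 1000, 500))  # -> (200, 50, 600, 150)
```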

Building a VTR workflow

Python

from clarifai_grpc.grpc.api import resources_pb2, service_pb2
from clarifai_grpc.grpc.api.status import status_code_pb2

# Insert here the initialization code as outlined on this page:
# https://docs.clarifai.com/api-guide/api-overview/api-clients#client-installation-instructions

post_workflows_response = stub.PostWorkflows(
    service_pb2.PostWorkflowsRequest(
        user_app_id=userDataObject,  # The userDataObject is created in the overview and is required when using a PAT
        workflows=[
            resources_pb2.Workflow(
                id="visual-text-recognition-id",
                nodes=[
                    resources_pb2.WorkflowNode(
                        id="detect-concept",
                        model=resources_pb2.Model(
                            id="2419e2eae04d04f820e5cf3aba42d0c7",
                            model_version=resources_pb2.ModelVersion(
                                id="75a5b92a0dec436a891b5ad224ac9170"
                            )
                        )
                    ),
                    resources_pb2.WorkflowNode(
                        id="image-crop",
                        model=resources_pb2.Model(
                            id="ce3f5832af7a4e56ae310d696cbbefd8",
                            model_version=resources_pb2.ModelVersion(
                                id="a78efb13f7774433aa2fd4864f41f0e6"
                            )
                        ),
                        node_inputs=[
                            resources_pb2.NodeInput(node_id="detect-concept")
                        ]
                    ),
                    resources_pb2.WorkflowNode(
                        id="image-to-text",
                        model=resources_pb2.Model(
                            id="9fe78b4150a52794f86f237770141b33",
                            model_version=resources_pb2.ModelVersion(
                                id="d94413e582f341f68884cac72dbd2c7b"
                            )
                        ),
                        node_inputs=[
                            resources_pb2.NodeInput(node_id="image-crop")
                        ]
                    ),
                ]
            )
        ]
    ),
    metadata=metadata
)

if post_workflows_response.status.code != status_code_pb2.SUCCESS:
    raise Exception("Post workflows failed, status: " + post_workflows_response.status.description)

Java

import com.clarifai.grpc.api.*;
import com.clarifai.grpc.api.status.*;

// Insert here the initialization code as outlined on this page:
// https://docs.clarifai.com/api-guide/api-overview/api-clients#client-installation-instructions

MultiWorkflowResponse postWorkflowsResponse = stub.postWorkflows(
  PostWorkflowsRequest.newBuilder()
      .setUserAppId(UserAppIDSet.newBuilder().setAppId("{YOUR_APP_ID}"))
      .addWorkflows(
          Workflow.newBuilder()
              .setId("visual-text-recognition-id")
              .addNodes(
                  WorkflowNode.newBuilder()
                      .setId("detect-concept")
                      .setModel(
                          Model.newBuilder()
                              .setId("2419e2eae04d04f820e5cf3aba42d0c7")
                              .setModelVersion(
                                  ModelVersion.newBuilder()
                                      .setId("75a5b92a0dec436a891b5ad224ac9170")
                              )
                      )
              )
              .addNodes(
                  WorkflowNode.newBuilder()
                      .setId("image-crop")
                      .setModel(
                          Model.newBuilder()
                              .setId("ce3f5832af7a4e56ae310d696cbbefd8")
                              .setModelVersion(
                                  ModelVersion.newBuilder()
                                      .setId("a78efb13f7774433aa2fd4864f41f0e6")
                              )
                      )
                      .addNodeInputs(NodeInput.newBuilder().setNodeId("detect-concept"))
              )
              .addNodes(
                  WorkflowNode.newBuilder()
                      .setId("image-to-text")
                      .setModel(
                          Model.newBuilder()
                              .setId("9fe78b4150a52794f86f237770141b33")
                              .setModelVersion(
                                  ModelVersion.newBuilder()
                                      .setId("d94413e582f341f68884cac72dbd2c7b")
                              )
                      )
                      .addNodeInputs(NodeInput.newBuilder().setNodeId("image-crop"))
              )
      )
      .build()
);

if (postWorkflowsResponse.getStatus().getCode() != StatusCode.SUCCESS) {
    throw new RuntimeException("Post workflows failed, status: " + postWorkflowsResponse.getStatus());
}

Node.js (gRPC)

// Insert here the initialization code as outlined on this page:
// https://docs.clarifai.com/api-guide/api-overview/api-clients#client-installation-instructions

stub.PostWorkflows(
    {
        user_app_id: {
            app_id: "{YOUR_APP_ID}"
        },
        workflows: [
            {
                id: "visual-text-recognition-id",
                nodes: [
                    {
                        id: "detect-concept",
                        model: {
                            id: "2419e2eae04d04f820e5cf3aba42d0c7",
                            model_version: {
                                id: "75a5b92a0dec436a891b5ad224ac9170"
                            }
                        }
                    },
                    {
                        id: "image-crop",
                        model: {
                            id: "ce3f5832af7a4e56ae310d696cbbefd8",
                            model_version: {
                                id: "a78efb13f7774433aa2fd4864f41f0e6"
                            }
                        },
                        node_inputs: [
                            {node_id: "detect-concept"}
                        ]
                    },
                    {
                        id: "image-to-text",
                        model: {
                            id: "9fe78b4150a52794f86f237770141b33",
                            model_version: {
                                id: "d94413e582f341f68884cac72dbd2c7b"
                            }
                        },
                        node_inputs: [
                            {node_id: "image-crop"}
                        ]
                    },
                ]
            }
        ]
    },
    metadata,
    (err, response) => {
        if (err) {
            throw new Error(err);
        }

        if (response.status.code !== 10000) {
            console.log(response.status);
            throw new Error("Post workflows failed, status: " + response.status.description);
        }
    }
);

cURL

curl -X POST 'https://api.clarifai.com/v2/users/me/apps/{{app}}/workflows' \
    -H 'Authorization: Key {{PAT}}' \
    -H 'Content-Type: application/json' \
    --data-raw '{
        "workflows": [
            {
                "id": "visual-text-recognition-id",
                "nodes": [
                    {
                        "id": "detect-concept",
                        "model": {
                            "id": "2419e2eae04d04f820e5cf3aba42d0c7",
                            "model_version": {
                                "id": "75a5b92a0dec436a891b5ad224ac9170"
                            }
                        }
                    },
                    {
                        "id": "image-crop",
                        "model": {
                            "id": "ce3f5832af7a4e56ae310d696cbbefd8",
                            "model_version": {
                                "id": "a78efb13f7774433aa2fd4864f41f0e6"
                            }
                        },
                        "node_inputs": [
                            {
                                "node_id": "detect-concept"
                            }
                        ]
                    },
                    {
                        "id": "image-to-text",
                        "model": {
                            "id": "9fe78b4150a52794f86f237770141b33",
                            "model_version": {
                                "id": "d94413e582f341f68884cac72dbd2c7b"
                            }
                        },
                        "node_inputs": [
                            {
                                "node_id": "image-crop"
                            }
                        ]
                    }
                ]
            }
        ]
    }'
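
Once the workflow is created, you can run inputs through it with a workflow-predict call, and each recognized text fragment comes back attached to the bounding box it was cropped from. The snippet below is a hypothetical post-processing sketch, not the raw API response shape: each dict stands in for one recognized region, reduced to its text and normalized box position, and the function reassembles reading order by sorting top-to-bottom, then left-to-right:

```python
# Hypothetical post-processing sketch: each dict stands in for one recognized
# region from the workflow output, reduced to its text and normalized box.
regions = [
    {"text": "WORLD", "top_row": 0.10, "left_col": 0.55},
    {"text": "HELLO", "top_row": 0.12, "left_col": 0.10},
    {"text": "WELCOME", "top_row": 0.60, "left_col": 0.20},
]

def reading_order(regions, row_tolerance=0.05):
    """Sort regions top-to-bottom, grouping boxes on roughly the same line.

    Regions whose top edges fall within row_tolerance of each other are
    bucketed into the same coarse row, then ordered by their left edge.
    """
    return sorted(regions, key=lambda r: (round(r["top_row"] / row_tolerance), r["left_col"]))

print(" ".join(r["text"] for r in reading_order(regions)))  # -> HELLO WORLD WELCOME
```

The row tolerance matters because two words on the same printed line rarely share an identical `top_row`; bucketing by a coarse row index keeps them together before the left-to-right sort.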