Clarifai Guide
v7.11
Batch Predict CSV on Custom Text Model

Enjoy the convenience of working with CSV files and text.

Last updated 3 years ago

Below is a script that runs batch predictions on text stored in a CSV file, using your custom text model.

To start, you'll need to create your own Custom Text Model, either via our Portal or using the API.

Make sure to record the model ID, the model version ID you want to use (each model gets one after it is successfully trained), and the API key of the application in which the model exists.

This script assumes that you have a CSV file with a column named "text" that contains the text you want to run predictions on. It outputs another CSV file containing the predicted concepts for each text, together with their confidence values.
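As a rough illustration of the expected input shape, a file with a single "text" column could be produced with Python's csv module. The file name my_data.csv, the helper write_input_csv, and the sample sentences are placeholders chosen for this sketch; they are not part of the script below.

```python
import csv

# Placeholder sentences; replace with your own text.
sentences = [
    "I am never at home on Sundays.",
    "In that instant, everything changed.",
]


def write_input_csv(path, texts):
    """Write texts to a CSV file with the "text" column the script expects."""
    # newline="" lets the csv module handle line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text"])
        writer.writeheader()
        writer.writerows({"text": t} for t in texts)


write_input_csv("my_data.csv", sentences)
```

Sentences that contain commas are quoted automatically by the csv writer, so they still load as a single "text" value.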

"""
A script designed for running bulk NLP model predictions on a .csv file of text entries.
It requires the library clarifai_grpc (to install it: `pip install clarifai_grpc`).

Mandatory arguments:
- a CSV file with a "text" column; any additional columns are ignored and are not copied to the output file
- a Clarifai API KEY
- the model ID of the NLP model that you wish to predict with
- the specific model version ID for the above NLP model

Optional/Default arguments:
- the "top n" number of results to be returned from the model predictions. default 3. [1-200]
- the batch size or number of inputs to send in per predict call. default 32. max 128.

Example usage:
python nlp_model_predicts.py --csv_file CSVFILE --api_key API_KEY --model_id MODEL_ID --model_version MODEL_VERSION

Example input CSV file:
text,random_column_1
"The quick brown fox something something.",perhaps_some_data
"The lazy dog is...",some_other_data

Example output CSV file:
text,predict_1_concept,predict_1_value
"The quick brown fox something something.",predicted_concept,0.873
"The lazy dog is...",predicted_concept,0.982
"""

import argparse
import csv
import os

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2


def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))


def get_predict(texts, stub, model_id, model_version, auth_metadata, top_n):
    """
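    inputs:
    • texts: a list of texts to run predictions on
    • stub: the Clarifai V2Stub used to make the request
    • model_id, model_version: the model ID and model version ID to predict with
    • auth_metadata: (('authorization', 'Key YOUR_API_KEY'),)
    • top_n: integer for the desired max number of returned concepts [limit 200]

    returns a list of dicts, one per input text, each containing:
    • text              : the original text
    • predict_n_concept : predicted concept ID
    • predict_n_value   : predicted concept value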
    """

    if len(texts) > 128:
        raise Exception('Input length over maximum batch size. Please send batches of at most 128 inputs.')

    inputs = [
        resources_pb2.Input(data=resources_pb2.Data(text=resources_pb2.Text(raw=x)))
        for x in texts
    ]

    # make the model predict request
    request = service_pb2.PostModelOutputsRequest(
        model_id=model_id,
        version_id=model_version,
        inputs=inputs,
    )

    response = stub.PostModelOutputs(request, metadata=auth_metadata)

    if response.status.code != status_code_pb2.SUCCESS:
        raise Exception("A failed response: " + str(response.status) + "\n\nFull response:\n" + str(response))

    # parse results
    list_of_dicts = []
    for resp in response.outputs:
        temp_dict = {
            'text': resp.input.data.text.raw
        }

        # a model may return fewer than top_n concepts, so slice instead of indexing
        for n, concept in enumerate(resp.data.concepts[:top_n], start=1):
            temp_dict['predict_{}_concept'.format(n)] = concept.id
            temp_dict['predict_{}_value'.format(n)] = "%.3f" % concept.value

        list_of_dicts.append(temp_dict)

    return list_of_dicts


def main():
    parser = argparse.ArgumentParser(
        description=
        'Given a CSV file with a "text" column, provide NLP model predictions.'
    )
    parser.add_argument('--api_key', required=True, help='the app\'s API key', type=str)
    parser.add_argument('--csv_file', required=True, help='the CSV file with texts', type=str)
    parser.add_argument('--model_id', required=True, help='the model ID', type=str)
    parser.add_argument(
        '--model_version', required=True, help='the specific model version ID', type=str)
    parser.add_argument(
        '--top_n', default=3, type=int, help='num results returned. default 3. max 200.')
    parser.add_argument(
        '--batch_size', default=32, type=int, help='prediction batch size. default 32. max 128')

    args = parser.parse_args()

    # set up the API channel and stub
    channel = ClarifaiChannel.get_json_channel()
    stub = service_pb2_grpc.V2Stub(channel)
    metadata = (('authorization', 'Key {}'.format(args.api_key)),)

    texts = []
    with open(args.csv_file, newline='') as f:
        csv_reader = csv.DictReader(f)
        for row in csv_reader:
            if 'text' not in row:
                raise Exception('The CSV file must contain a column with the header "text"')

            texts.append(row['text'])

    predicted_data = []
    # run model predictions in batches
    for i, texts_chunk in enumerate(chunker(texts, args.batch_size)):
        print("Predicting chunk #" + str(i + 1))
        predicted_data.extend(get_predict(texts_chunk, stub, args.model_id, args.model_version, metadata, args.top_n))

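    if not predicted_data:
        print('No rows found in {}; nothing to write.'.format(args.csv_file))
        return

    output_name = os.path.splitext(args.csv_file)[0] + '_results.csv'

    # newline='' lets the csv module control line endings on every platform
    with open(output_name, 'w', newline='') as f:
        csv_writer = csv.DictWriter(f, fieldnames=predicted_data[0].keys())
        csv_writer.writeheader()
        csv_writer.writerows(predicted_data)

    print('Results saved to {}'.format(output_name))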


if __name__ == '__main__':
    main()
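
To make the batching step concrete, here is a small standalone sketch of how the chunker helper splits a list before each predict call. The helper is copied from the script above; the sample list and the batch size of 2 are made up for illustration.

```python
def chunker(seq, size):
    """Return a generator of consecutive slices of seq, each at most `size` long."""
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))


texts = ["a", "b", "c", "d", "e"]
batches = list(chunker(texts, 2))
print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
```

The last batch is simply shorter when the number of rows is not a multiple of the batch size. With the default batch size of 32, a CSV file with 100 rows would result in four predict calls (32, 32, 32, and 4 texts).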

Example Usage

Let's say you have the following CSV file and want to predict, for each row, whether the sentence is grammatically positive or negative. First, build a custom text model that maps text to two concepts: "positive" and "negative". See our Custom Text Model walkthrough on how to do that via our API.

number,text
1,"We have never been to Asia, nor have we visited Africa."
2,"I am never at home on Sundays."
3,"One small action would change her life, but whether it would be for better or for worse was yet to be determined."
4,"The waitress was not amused when he ordered green eggs and ham."
5,"In that instant, everything changed."

With that, you can run the script on the CSV file in the following manner:

python nlp_model_predicts.py --api_key YOUR_API_KEY --model_id YOUR_MODEL_ID --model_version YOUR_MODEL_VERSION_ID --csv_file my_data.csv --top_n 2

This produces a new CSV file, my_data_results.csv, with the following contents:

text,predict_1_concept,predict_1_value,predict_2_concept,predict_2_value
"We have never been to Asia, nor have we visited Africa.",negative,1.000,positive,0.000
I am never at home on Sundays.,negative,1.000,positive,0.000
"One small action would change her life, but whether it would be for better or for worse was yet to be determined.",positive,1.000,negative,0.000
The waitress was not amused when he ordered green eggs and ham.,negative,1.000,positive,0.000
"In that instant, everything changed.",positive,0.998,negative,0.002
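
Once the results file exists, it can be post-processed like any other CSV. The sketch below is one possible way to keep only rows whose top prediction meets a confidence threshold. The helper top_predictions and its min_value parameter are hypothetical names introduced here, and the sample rows are copied from the example output above.

```python
import csv
import io

# Two rows copied from the example output; in practice, open the
# *_results.csv file written by the script instead of this string.
sample = '''text,predict_1_concept,predict_1_value,predict_2_concept,predict_2_value
I am never at home on Sundays.,negative,1.000,positive,0.000
"In that instant, everything changed.",positive,0.998,negative,0.002
'''


def top_predictions(csv_file, min_value=0.0):
    """Return (text, concept, value) for rows whose top prediction is >= min_value."""
    results = []
    for row in csv.DictReader(csv_file):
        value = float(row["predict_1_value"])
        if value >= min_value:
            results.append((row["text"], row["predict_1_concept"], value))
    return results


for text, concept, value in top_predictions(io.StringIO(sample), min_value=0.999):
    print(f"{concept} ({value:.3f}): {text}")
```

The threshold of 0.999 is arbitrary; pick a value that suits how your model's confidence values are distributed.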