OpenAI released the pricing plan for accessing GPT3 services a week ago. To help understand what the pricing tiers mean, what an application would cost per month, and what the implications are for business decisions, I will share some of my findings.
The pricing plan has four tiers: Explore, Create, Build, and Scale. On the “Create” plan, you pay $100 per month for 2M tokens and 8 cents per additional 1K tokens. Fine-tuning is priced under the “Scale” plan and is currently available only to selected users and applications.
What are tokens? How do they differ from words?
Tokens are pieces of words, also called subwords. It is common in NLP to use tokens to handle unknown words and to improve efficiency (covering more words by merging subwords).
For example, the word “lower” gets broken up into the tokens “low”, “er”, the word “Descartes” gets broken up into the tokens “Desc”, “art” and “es”, while a short and common word like “pear” is a single token. On average, in English text, one token is roughly 4 characters. If you are interested in different tokenization algorithms, I recommend checking out Byte Pair Encoding and the lectures from Daniel Jurafsky.
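To make the splitting concrete, here is a toy sketch of subword segmentation. It uses greedy longest-match over a tiny hand-picked vocabulary, which is a simplification and not Byte Pair Encoding or GPT3's actual tokenizer. The vocabulary `TOY_VOCAB` and the function `segment` are hypothetical names made up for this illustration.

```python
# Toy greedy longest-match segmentation over a hand-picked vocabulary.
# This is NOT GPT3's tokenizer; TOY_VOCAB is a hypothetical example vocabulary.
TOY_VOCAB = {"low", "er", "Desc", "art", "es", "pear"}


def segment(word, vocab=TOY_VOCAB):
    """Split a word into the longest known pieces, scanning left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first and shrink until one matches.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


for w in ["lower", "Descartes", "pear"]:
    print(w, segment(w))
```

An unknown word never fails here; it just degrades into single characters, which is the same idea that lets subword tokenizers handle words they have never seen.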
In general, a text produces more tokens than it has words. As a point of reference, Shakespeare’s entire collection is ~900,000 words or 1.2M tokens. Based on the official OpenAI API doc, the current limit on prompt length is 2048 tokens, which translates to about 1,500 words.
From the above, we get a general token-to-word ratio of about 1.4. That is, each word in your prompt counts as roughly 1.4 tokens.
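The two rules of thumb above (about 1.4 tokens per word, about 4 characters per token) can be turned into a quick estimator. This is only a back-of-the-envelope sketch; the function names are my own, and real counts depend on the actual tokenizer.

```python
# Rough token estimates from the rules of thumb quoted in this article.
TOKENS_PER_WORD = 1.4   # approximate token-to-word ratio
CHARS_PER_TOKEN = 4     # approximate average for English text


def tokens_from_words(n_words):
    """Estimate the token count from a word count."""
    return round(n_words * TOKENS_PER_WORD)


def tokens_from_chars(n_chars):
    """Estimate the token count from a character count."""
    return round(n_chars / CHARS_PER_TOKEN)


def words_that_fit(token_limit):
    """Estimate how many words fit within a token limit."""
    return int(token_limit / TOKENS_PER_WORD)


print(tokens_from_words(900_000))  # Shakespeare-sized corpus
print(words_that_fit(2048))        # prompt length limit
print(tokens_from_chars(300))      # one 300-character message
```

With a ratio of 1.4, the estimates land close to the reference points above: roughly 1.26M tokens for 900,000 words, and a bit under 1,500 words for a 2048-token prompt.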
To get more accurate token counts, you can either use the tokenizer from Hugging Face’s transformers library or use the prebuilt token estimator.
For example, with the following prompt and max_tokens set to 64, I got 287 tokens for this prediction.
prompt = "I am a highly intelligent question answering bot. If you ask me a question that is rooted in truth, I will give you the answer. If you ask me a question that is nonsense, trickery, or has no clear answer, I will respond with \"Unknown\".\n\nQ: What is human life expectancy in the United States?\nA: Human life expectancy in the United States is 78 years.\n\nQ: Who was president of the United States in 1955?\nA: Dwight D. Eisenhower was president of the United States in 1955.\n\nQ: Which party did he belong to?\nA: He belonged to the Republican Party.\n\nQ: What is the square root of banana?\nA: Unknown\n\nQ: How does a telescope work?\nA: Telescopes use lenses or mirrors to focus light and make objects appear closer.\n\nQ: Where were the 1992 Olympics held?\nA: The 1992 Olympics were held in Barcelona, Spain.\n\nQ: How many squigs are in a bonk?\nA: Unknown\n\nQ:"
engines_generate_tokens(prompt, 64, 1)
287
Deconstruct the Pricing:
In the above 4 tiers, the number of tokens per tier includes both the prompt (input) and completion (prediction) tokens. Now let’s figure out how many queries (predictions) you can get in each tier and the cost per query.
Assume each query consumes 1000 tokens, which includes both the prompt tokens in the request and the max_tokens in the prediction response. You can adjust the number of tokens per completion API call to your specific application (for example, some QA prompts might be less than 1000 tokens).
Under this assumption, you can send 100 queries in the “Explore” (free) tier. In the “Create” tier, $100 per month covers about 2,000 API calls at 5 cents each, and every additional call costs 8 cents.
This gives you a sense of how many queries each tier covers and the unit price per API call. To estimate the total cost, work out the total number of tokens your application uses in a month. If that exceeds the tokens included in the tier, add the cost of the additional tokens; otherwise, you pay only the monthly subscription fee.
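The per-tier arithmetic can be sketched in a few lines. The `Tier` class and helper names are hypothetical, and only the “Create” and “Build” tiers are included because those are the ones whose figures are quoted in this article.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    monthly_fee: float      # USD per month
    included_tokens: int    # tokens covered by the monthly fee
    overage_per_1k: float   # USD per additional 1K tokens


# Figures as quoted in this article.
CREATE = Tier("Create", 100.0, 2_000_000, 0.08)
BUILD = Tier("Build", 400.0, 10_000_000, 0.06)


def included_queries(tier, tokens_per_query=1000):
    """Number of queries covered by the monthly fee."""
    return tier.included_tokens // tokens_per_query


def price_per_included_query(tier, tokens_per_query=1000):
    """Effective price of a query while still inside the included tokens."""
    return tier.monthly_fee / included_queries(tier, tokens_per_query)


def monthly_cost(tier, tokens_used):
    """Subscription fee plus overage for tokens beyond the included amount."""
    extra = max(0, tokens_used - tier.included_tokens)
    return tier.monthly_fee + extra / 1000 * tier.overage_per_1k


for tier in (CREATE, BUILD):
    print(tier.name, included_queries(tier), price_per_included_query(tier))
```

Changing `tokens_per_query` is the quickest way to see how sensitive the per-query price is to prompt length.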
Takeaway on tokens and prices:
- A typical token to word ratio is 1.4
- On average, 1 token is about 4 characters in English text
- Tokens are counted for both input prompt and predicted text
- We assume about 1000 tokens are consumed per query
- Price per query ranges from 4 cents to 8 cents, depending on the tier and on usage beyond the included tokens
The Case Study:
Let’s crunch the numbers to get a further estimate of the total cost of integrating GPT3 into your application with 2 examples.
Let’s say we are going to build a text classification app to analyze customer feedback in real time. Let’s compare the cost of a major platform’s ML service, AWS Comprehend, with the cost of GPT3.
- Assume we have 10 customers enter feedback per minute
- Each feedback is 300 characters
- Throughput: 50 characters per second (300 characters per user feedback * 10 feedbacks per minute / 60 seconds)
- We need to provision an endpoint with at least 1 inference unit (IU), which can handle 100 characters per second
- The price of 1 inference unit is $0.0005 per second
- We are running the inference endpoint 12 hours per day, so the inference cost will be $21.6 per day ($0.0005 per second * 3600 seconds per hour * 12 hours)
- Total inference cost per month will be $648 ($21.6 per day * 30 days)
- $3 per hour for model training
- Assume 20 hours of training time per month
- Total training cost per month will be $60
Model management cost:
- $0.5 per month for model storage
Total cost per month: $708.5 ($648 inference + $60 training + $0.5 storage). The cost does not include data collection and labeling, which is another major cost of this project.
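The AWS estimate above can be reproduced with a small calculation. The function and parameter names are my own; the default values are the assumptions listed in the bullets, not official AWS pricing lookups.

```python
import math


def aws_monthly_cost(
    feedbacks_per_minute=10,
    chars_per_feedback=300,
    chars_per_second_per_iu=100,   # capacity of one inference unit
    price_per_iu_second=0.0005,    # USD
    endpoint_hours_per_day=12,
    days_per_month=30,
    training_hours=20,
    training_price_per_hour=3.0,   # USD
    storage_per_month=0.5,         # USD
):
    """Break the monthly cost into inference, training, and storage."""
    throughput = chars_per_feedback * feedbacks_per_minute / 60  # chars/second
    # At least one inference unit must be provisioned.
    inference_units = max(1, math.ceil(throughput / chars_per_second_per_iu))
    seconds_per_month = 3600 * endpoint_hours_per_day * days_per_month
    inference = inference_units * price_per_iu_second * seconds_per_month
    training = training_hours * training_price_per_hour
    total = inference + training + storage_per_month
    return {
        "inference_units": inference_units,
        "inference": round(inference, 2),
        "training": round(training, 2),
        "storage": storage_per_month,
        "total": round(total, 2),
    }


print(aws_monthly_cost())
```

Because the number of inference units is rounded up, the cost stays flat until throughput passes 100 characters per second, then jumps by a whole unit.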
Let’s say we are using the “Build” tier: $400 per month covers 10M tokens, with $0.06 per additional 1K tokens.
- Users will generate 4,320,000 characters per day (300 characters per feedback * 10 customers per minute * 1,440 minutes per day)
- Translated to tokens will be 1,080,000 tokens per day (on average 1 token is about 4 characters in English text, 4,320,000 characters / 4)
- This gives us 32,400,000 tokens per month (1,080,000 * 30 days)
- After subtracting the 10M tokens covered by the tier price, the remaining 22,400,000 tokens are charged at $0.06 per 1K tokens, which yields $1,344 (22,400,000 / 1000 * $0.06)
- So the total cost from GPT3 will be $1,744 ($400 monthly subscription + $1,344 for additional tokens)
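Here is the same GPT3 estimate as a small function, so the assumptions are easy to vary. The names are hypothetical and the defaults are the numbers from the bullets above. Like the bullets, this sketch counts only the feedback text as tokens and leaves out any prompt instructions or completion tokens.

```python
def gpt3_monthly_cost(
    feedbacks_per_minute=10,
    chars_per_feedback=300,
    hours_per_day=24,              # the bullets use 1,440 minutes per day
    days_per_month=30,
    chars_per_token=4,
    monthly_fee=400.0,             # "Build" tier, USD
    included_tokens=10_000_000,
    overage_per_1k=0.06,           # USD per additional 1K tokens
):
    """Estimate the monthly GPT3 bill from feedback volume."""
    chars_per_day = chars_per_feedback * feedbacks_per_minute * 60 * hours_per_day
    tokens_per_month = chars_per_day / chars_per_token * days_per_month
    extra_tokens = max(0.0, tokens_per_month - included_tokens)
    return round(monthly_fee + extra_tokens / 1000 * overage_per_1k, 2)


print(gpt3_monthly_cost())                  # around-the-clock traffic
print(gpt3_monthly_cost(hours_per_day=12))  # same 12-hour window as the AWS endpoint
```

The default call reproduces the $1,744 above. Note that the bullets count feedback for all 24 hours, while the AWS estimate runs its endpoint 12 hours per day; with `hours_per_day=12` the same formula gives $772, so the operating window is worth pinning down before comparing the two.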
To wrap up, here is the monthly cost for our customer feedback classification app from AWS and GPT3:
AWS: $708.5 (not including data preparation cost)
GPT3: $1,744
So, should you use GPT3 for your application? 😕
At first glance, the cost of integrating GPT3 into your application is not cheap, not to mention the cost of fine-tuning. So should you subscribe to the commercial license of GPT3 for your product? Here are some thoughts:
- Pick the high value/impact ML task to work on
- Track quality with the evaluation metrics
- Error analysis to understand the risks
- A basic cost and benefits analysis is needed
Pick the high value/impact ML task
Like any ML project, the first step is to understand the problem, evaluate resources and risks, then pick the most valuable or fruitful ML task to work on. This also depends heavily on application types, business needs, and values.
In general, if a typical person can do a mental task with less than one second of thought, we can probably automate it with AI. But we found GPT3 is also pretty good at some tasks that even humans cannot finish within seconds or minutes of thought, such as email summarization and auto-generation, reconstructing and extending the relationships between entities, and parsing complicated semantic queries into end results.
For tasks where training data is hard to get or that take more human mental effort to finish, it would be great to leverage GPT3.
Track quality with evaluation metrics
Whether we use GPT3 or the current ML pipeline, we need ways to quantitatively measure prediction quality, for example with accuracy or F1. We will use the same metrics to evaluate the quality of the system.
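As a minimal sketch of tracking such metrics, the snippet below computes accuracy and F1 for a batch of predicted labels against human labels. The label values and sample data are made up for illustration; in practice the predictions would come from GPT3 or your existing pipeline.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five feedback messages.
labels = ["pos", "neg", "pos", "neg", "pos"]
predictions = ["pos", "neg", "neg", "neg", "pos"]

print(accuracy(labels, predictions))
print(f1_score(labels, predictions, positive="pos"))
```

Keeping the evaluation code independent of where the predictions come from makes it easy to compare a GPT3 prompt against the current pipeline on the same test set.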
The difference for the ML team is that, instead of developing a system or model architecture that does well on the dev/test set, the team will need to understand and tweak the prompt (also called prompt engineering) so that GPT3 predictions meet your product quality metrics.
For details about how the roles and responsibilities of an ML team change in the era of GPT3, check out my previous article, when GPT3 as a service.
Error analysis to understand the risks
GPT3 predictions are not always correct or consistent, and sometimes a prediction might be too risky (sensitive or unsafe) to deliver directly to end users. So it is important to leverage the content filtering API from OpenAI, or set up your own post-filtering service, to classify predictions as safe, sensitive, or unsafe and take action from there.
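One way to picture the post-filtering step is a small router that maps each label to an action. Everything here is an assumption for illustration: the action names, the keyword lists, and the `keyword_classifier` placeholder, which stands in for whichever content filter you actually use and does not reproduce OpenAI's API.

```python
# Hypothetical mapping from filter label to what the app does with a prediction.
ACTIONS = {"safe": "deliver", "sensitive": "review", "unsafe": "block"}

# Placeholder word lists, only to make the sketch runnable.
UNSAFE_WORDS = {"attack"}
SENSITIVE_WORDS = {"religion", "politics"}


def keyword_classifier(text):
    """Stand-in classifier; swap in a real content filter here."""
    words = set(text.lower().split())
    if words & UNSAFE_WORDS:
        return "unsafe"
    if words & SENSITIVE_WORDS:
        return "sensitive"
    return "safe"


def route_prediction(text, classify=keyword_classifier):
    """Return the action for a prediction; unknown labels are blocked."""
    label = classify(text)
    return ACTIONS.get(label, "block")


print(route_prediction("The 1992 Olympics were held in Barcelona, Spain."))
print(route_prediction("Let us talk about politics"))
```

Defaulting unknown labels to "block" is a conservative choice: if the filter ever returns something unexpected, nothing reaches the end user unchecked.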
As in a traditional ML pipeline, error analysis is meant to identify the most promising directions and iteratively improve system quality from there. With GPT3, most of this work will go into improving the prompt (prompt engineering) rather than the model architecture or algorithms.
On top of error analysis and prompt engineering, a hot-fix pipeline might help reduce the potential risks as well.
Cost and benefits analysis
Cost-benefit analysis is one of the most important factors in deciding whether to use GPT3 for your product, so we need to outline a comprehensive list of the overall costs and benefits associated with the project.
On the cost side, besides the above estimates of model training, inference, and management cost, we also need to factor in the data collection/labeling cost, human capital cost (engineer, scientist, and PM hours), ML infra maintenance cost, etc.
On the benefits side, direct revenue contributed by using GPT3, cost-saving from replacing traditional ML pipeline, competitive advantage or market share gained as a result of using GPT3, and other intangible benefits also need to be factored in.
With the above information, we will be able to get a systematic view of the cost and benefits of using GPT3 to make strategic choices.
GPT3 amazed us in many tasks, but to bring it from prototype to production, we still need to follow a framework to guide us through the whole project life cycle from both technology and business perspectives.
I hope this article helps you better understand the announced pricing plan of the GPT3 API, and I believe the price and benefits in each tier will evolve with feedback from the developer community. As for whether or not to use GPT3 in your project, I hope the above examples and framework provide some reference and guidance.
I am planning to share more experience and thoughts on building with GPT3. Here are a couple of topics I have in mind; let me know if there is a specific topic you want to explore or collaborate on:
- Best practices to optimize token utilization in applications
- Pros and cons of breaking down tasks into multiple GPT3 API calls
To learn more about why GPT3 is disruptive, an end-to-end project built with GPT3, and the implications for jobs and society, check out my previous blogs:
GPT3 The Dream Machine in Real World