Analytics for LLM Apps: The Metrics Developers Need to Track with the ChatGPT API

Learn the key metrics LLM app developers need to track for the ChatGPT API to optimize user experience, latency, and token costs. Discover how powerful analytics tools can help you make informed decisions.

Photo: Propel

As an LLM app developer, you’re likely aware of the importance of tracking and monitoring the performance of your apps. But do you know which metrics are crucial to optimizing user experience, latency, and token costs? In this blog post focusing on OpenAI’s ChatGPT API, we’ll dive into the key metrics you need to track, how to derive valuable insights from these metrics, and the importance of powerful analytics tools to make informed decisions for your ChatGPT-powered LLM apps.

What's different about LLM apps?

When it comes to LLM apps and ChatGPT, there are two crucial differences that set them apart from other software applications. These differences have a significant impact on their costs and user experience.

LLMs consume tokens

A defining characteristic of large language models (LLMs) is that they consume tokens, and token consumption directly affects the cost of LLM apps. Token usage needs to be tracked to ensure that costs are managed and allocated appropriately.

Tradeoffs between token usage, latency, and user experience

In LLM apps, there are tradeoffs between token usage, latency, and user experience. By understanding these tradeoffs and monitoring key metrics, you can optimize your LLM app's performance, providing a better experience for your users.

Key metrics for LLM apps

To optimize your LLM or ChatGPT-powered app, focus on these five key metrics: token usage, latency, user feedback, conversation length, and “finished reason.” These metrics will be important, no matter if you build with OpenAI or another LLM provider.

Token Usage

Token usage is crucial because both input and output tokens are consumed during app usage. Different models have varying token costs, so it's essential to monitor and optimize token usage to manage expenses effectively. Token usage is a great proxy for the length of the prompt and the response.

Latency

More complex inputs can lead to increased response times, which can negatively affect user experience. Additionally, different models exhibit different latencies. By tracking latency, you can ensure a smooth and responsive user experience.

Feedback

Capturing user feedback is vital to gauging whether your app is providing value to its users. Feedback will vary depending on the app and its use case, so it's essential to tailor your feedback capture methods accordingly.

Conversation length

You’ll want to track the number of turns in a conversation. This allows you to monitor and display to the user any trends in conversation length over time as you make changes to the experience, switch models, or tune parameters.

Finished reason

Finally, you'll want a metric that counts the occurrences of each possible value for the <span class="code-exp">finished_reason</span> property. This will help you identify how often users experience incomplete responses. Tuning the prompt and <span class="code-exp">max_tokens</span> parameters can help reduce the frequency of incomplete responses for your users.

From Metrics to Insights

To gain valuable insights from the metrics discussed above, you'll need a powerful analytics backend that allows you to slice and dice the data efficiently.

Key Drill-Down Dimensions

To better understand the metrics above, ensure that you capture the following dimensions from OpenAI’s APIs as well as your app’s metadata:

Dimension	Description	Description
prompt_tokens	The tokens consumed by the prompt.	OpenAI API response
completion_tokens	The tokens consumed by the completion.	OpenAI API response
total_tokens	The total tokens consumed.	OpenAI API response
index	The index of the conversation.	OpenAI API response
model	The name of the model used, e.g., gpt-3.5-turbo.	OpenAI API request
prompt	The prompt that was used.	OpenAI API request
temperature	The temperature value used.	OpenAI API request
top_p	The top_p value that was used.	OpenAI API request
max_tokens	The max_tokens value that was used.	OpenAI API request
presence_penalty	The presence_penalty value that was used.	OpenAI API request
frequency_penalty	The frequency_penalty value that was used.	OpenAI API request
logit_bias	The logit_bias value that was used.	OpenAI API request
app_metadata	All your application metadata helps contextualize the ChatGPT request/response.	Your application
feedback	The user’s feedback on the model’s response.	Your application

By leveraging these dimensions, LLM app developers leveraging the ChatGPT API can fine-tune requests to improve user experience, latency, and token costs for their apps.

Leveraging Propel for ChatGPT-powered LLM apps

ChatGPT developers can greatly benefit from utilizing Propel, a powerful analytics backend with a GraphQL API and UI component library to collect and query their metrics data. Propel enables engineering teams to deliver high-performance customer-facing analytics without the need to scale or manage infrastructure.

Benefits

Metrics definition: Developers can define the token usage, latency, and feedback metrics with all the dimensions described above.
Aggregations: Propel will automatically aggregate data to provide developers with the insights they need.
Filtering and grouping: Developers can filter and group metrics by any of the dimensions defined, giving them powerful analytical capabilities.
GraphQL API: Propel's GraphQL API enables internal monitoring as well as customer-facing analytics features.
UI Component Library: Propel's UI component library provides a set of pre-built data visualization and React components, equipping developers to create visually appealing and informative analytics dashboards for their ChatGPT-powered apps.
No Infrastructure Scaling or Management: With Propel, developers can focus on building and optimizing their ChatGPT apps without worrying about the complexities of scaling and managing their analytics infrastructure.

By leveraging Propel, ChatGPT developers can access powerful analytics tools that help them track essential metrics, optimize app performance, and provide an improved user experience without the burden of infrastructure management. To learn more, you can read the docs or get started with a free Propel account.

Conclusion

By focusing on key metrics like token usage, latency, and feedback, ChatGPT developers can optimize their apps, ensuring a better user experience while managing costs effectively. Powerful analytics tools like Propel, with the ability to slice and dice data across dimensions, can further aid in making informed decisions both in internal tools and customer-facing apps, that lead to shorter time to market and better user experiences for ChatGPT apps.

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5

Heading 6

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.