Overview

Langfuse is an open-source LLM observability and evaluation platform for monitoring, debugging, and evaluating AI applications in production. It gives engineering teams visibility into LLM application behavior that traditional application monitoring tools don't cover. Every LLM call, retrieval operation, and tool use in your application is recorded in a Langfuse trace with its inputs, outputs, model parameters, latency, token usage, and cost. Traces are organized into hierarchical session views that show how a user's request flowed through your system. With this data, debugging shifts from "reproduce the problem manually" to "filter traces by the condition that produced the issue and inspect exactly what happened." Prompt management lets you store, version, and deploy prompts from the Langfuse interface rather than hardcoding them, so non-engineers can iterate on prompts without code deployments.
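To make the "filter traces by condition" idea concrete, here is a minimal standard-library sketch of a hierarchical trace and a filter over it. It does not use the Langfuse SDK; the `Observation` and `Trace` classes, their field names, and the sample values are all hypothetical and only mirror the kinds of data described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Observation:
    """Hypothetical record for one step: an LLM call, retrieval, or tool use."""
    name: str
    kind: str                 # e.g. "span", "generation", "retrieval", "tool"
    latency_ms: float
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    children: list[Observation] = field(default_factory=list)

    def walk(self):
        """Yield this observation and every nested one, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Trace:
    """Hypothetical trace: one user request, grouped under a session."""
    trace_id: str
    session_id: str
    root: Observation

    def total_cost(self) -> float:
        return sum(o.cost_usd for o in self.root.walk())


# Illustrative data only, not real measurements.
t1 = Trace("t1", "session-a", Observation(
    "answer_question", "span", 1200.0, children=[
        Observation("search_docs", "retrieval", 150.0),
        Observation("draft_answer", "generation", 1000.0,
                    input_tokens=500, output_tokens=200, cost_usd=0.004),
    ]))
t2 = Trace("t2", "session-a", Observation(
    "answer_question", "span", 300.0, children=[
        Observation("draft_answer", "generation", 250.0,
                    input_tokens=80, output_tokens=40, cost_usd=0.001),
    ]))

# "Filter by the condition that produced the issue": slow LLM generations.
slow = [
    t for t in (t1, t2)
    if any(o.kind == "generation" and o.latency_ms > 800 for o in t.root.walk())
]
```

In this sketch `slow` contains only the first trace, whose nested steps can then be inspected one by one instead of reproducing the request by hand.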

Evaluation runs compare model outputs against quality criteria, either LLM-judged or human-reviewed, across datasets, enabling systematic measurement of whether prompt or model changes improve or degrade quality. The analytics dashboard aggregates trace data into trend views of quality scores, latency distributions, cost per feature, and user session patterns. SDK integrations cover Python, TypeScript, LangChain, LlamaIndex, and direct API clients.
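The idea of an evaluation run can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Langfuse's API: `run_evaluation`, the exact-match scorer (standing in for an LLM judge or human review), the two-item dataset, and the `baseline` and `candidate` functions (standing in for two prompt or model versions) are all hypothetical.

```python
from statistics import mean


def exact_match(output: str, expected: str) -> float:
    """Stand-in scorer; a real run might use an LLM judge or a human review."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_evaluation(dataset, generate, score=exact_match) -> float:
    """Average the score of `generate` over every item in the dataset."""
    return mean(score(generate(item["input"]), item["expected"]) for item in dataset)


# Hypothetical dataset of inputs with expected outputs.
dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]


def baseline(question: str) -> str:
    """Stand-in for the current prompt or model version."""
    return {"2+2": "4"}.get(question, "unknown")


def candidate(question: str) -> str:
    """Stand-in for the changed prompt or model version."""
    return {"2+2": "4", "capital of France": "paris"}.get(question, "unknown")


baseline_score = run_evaluation(dataset, baseline)
candidate_score = run_evaluation(dataset, candidate)
improved = candidate_score > baseline_score
```

Running both versions against the same dataset and scorer is what makes the comparison systematic: any difference in score comes from the change under test rather than from different inputs.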

Langfuse can be self-hosted under an MIT license or used as a cloud service with a free tier.
