Latency (AI)

Latency is the time between sending a request to an AI model and either the start of its response (often measured as time to first token) or its completion (total response time). It depends on factors such as model size, context length, and server load, and it is a particularly important quality factor for interactive applications.
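
The two measurement points in the definition, the start of the response and its completion, can be timed separately. The following is a minimal Python sketch, assuming the response arrives as an iterable of text chunks. The names `measure_latency` and `fake_stream` are hypothetical, and `fake_stream` only simulates a model with `time.sleep`; it does not call any real service.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_latency(stream: Iterable[str]) -> Tuple[float, float, str]:
    """Return (time_to_first_chunk, total_time, full_text), times in seconds."""
    start = time.perf_counter()
    first = None
    parts = []
    for chunk in stream:
        if first is None:
            # Latency to the start of the response
            first = time.perf_counter() - start
        parts.append(chunk)
    # Latency to the completion of the response
    total = time.perf_counter() - start
    if first is None:
        first = total  # empty response: both measurements coincide
    return first, total, "".join(parts)


def fake_stream(n_chunks: int = 3, delay: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    for i in range(n_chunks):
        time.sleep(delay)  # simulated per-chunk generation time
        yield f"tok{i} "


if __name__ == "__main__":
    first, total, text = measure_latency(fake_stream())
    print(f"time to first chunk: {first:.3f} s, total: {total:.3f} s")
```

In a real application the simulated generator would be replaced by whatever streaming iterator the client library returns; the timing logic stays the same.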

Source: NVIDIA — AI inference explained
