Latency (AI)
Latenz (KI)
Latency is the time between sending a request to an AI model and either the start of its response (often measured as time to first token) or its completion (total response time). It depends on factors such as model size, context length, and server load, and is a particularly important quality factor for interactive applications.
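The two measurement points in the definition (start versus completion of the response) can be illustrated with a minimal sketch. It assumes the model's response arrives as an iterable of text chunks; `measure_latency` and `fake_model_stream` are hypothetical names used only for illustration, and the simulated delays stand in for a real model call rather than reflecting any particular system.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_latency(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (time_to_first_chunk, total_time, chunk_count), times in seconds.

    `stream` is assumed to be lazy, so the request work starts when
    iteration begins, not when the stream object is created.
    """
    start = clock()
    first = None
    count = 0
    for _ in stream:
        if first is None:
            # Latency to the START of the response.
            first = clock() - start
        count += 1
    # Latency to the COMPLETION of the response.
    total = clock() - start
    if first is None:
        # Empty response: no first chunk ever arrived.
        first = total
    return first, total, count


def fake_model_stream(
    n_chunks: int = 5,
    startup_delay: float = 0.05,
    per_chunk_delay: float = 0.01,
) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(startup_delay)  # simulated wait before the first chunk
    for i in range(n_chunks):
        if i > 0:
            time.sleep(per_chunk_delay)  # simulated gap between chunks
        yield f"chunk{i}"


if __name__ == "__main__":
    ttfc, total, n = measure_latency(fake_model_stream())
    print(f"time to first chunk: {ttfc * 1000:.1f} ms")
    print(f"total time: {total * 1000:.1f} ms over {n} chunks")
```

For an interactive application, the first value is usually what the user perceives as responsiveness, while the second determines when the full answer is available.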
Source: NVIDIA — AI inference explained