Multimodality
Multimodality refers to an AI model's ability to process, and in some cases generate, more than one type of data: not only text but also images, audio, or video. A multimodal model can, for example, describe the contents of an image or answer a question about a chart.
Source: Google DeepMind — Multimodal AI