Microsoft believes a multimodal approach paves the way for human-level AI.
According to a recent Ars Technica article, Microsoft has unveiled Kosmos-1, a multimodal AI language model that can perform visual perception tasks in addition to processing text. This is a notable advance, as earlier language models focused primarily on understanding and processing text.
Kosmos-1 was developed by Microsoft's own researchers as a language model with added visual perception capabilities; GPT-3, by contrast, is an OpenAI model. Kosmos-1 is trained on large amounts of data that include images alongside text, and it can generate descriptions of images, answer questions about them, and even perform visual reasoning tasks.
In a blog post, Microsoft stated that “Kosmos-1 is a step towards a more general AI that can understand the world as humans do, with the ability to see, hear, and reason about what it sees and hears.”
Continue reading… “Microsoft unveils AI model that understands image content, solves visual puzzles”