The LLM wave set off by OpenAI continues to grow rapidly as Apple and Google push deeper into the AI market. AI researchers at Apple have published a paper on model architectures and performance improvements titled “MM1: Methods, Analysis and Insights from Multimodal LLM Pre-Training”.
Apple publishes paper on performance improvements to the MM1 model
The paper, published this week on arxiv.org by Apple employees, provides insights into how carefully combining different types of training data and model architectures can lead to improved model performance. In the paper, the team emphasized the importance of scaling the visual components, highlighting that the choice of image encoder and the resolution of the input images have a major impact on model performance.
“For large-scale multimodal pre-training, we show that using a careful mix of image-caption, interleaved image-text and text-only data is crucial for achieving state-of-the-art few-shot results across multiple benchmarks,” the published paper said.
The team found that the image encoder has a significant impact, along with image resolution and the number of image tokens, while the design of the vision-language connector is of comparatively negligible importance. The largest (30 billion parameter) MM1 model exhibited strong in-context learning capabilities, enabling it to perform multi-step reasoning over multiple input images with few-shot “chain of thought” prompting.
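As a rough illustration of what few-shot chain-of-thought prompting over multiple images looks like, the sketch below assembles a text prompt that interleaves image placeholders with a worked example before posing the query. The message format, placeholder syntax and helper function are illustrative assumptions, not Apple's actual MM1 interface.

```python
# Hypothetical sketch of few-shot "chain-of-thought" prompting over
# multiple images, in the spirit of MM1's in-context learning.
# The <image:...> placeholder convention is an assumption for
# illustration only.

def build_multimodal_prompt(examples, query_images, question):
    """Interleave image placeholders with worked reasoning steps,
    then append the query images and the final question."""
    parts = []
    for ex in examples:
        for img in ex["images"]:
            parts.append(f"<image:{img}>")
        parts.append(f"Q: {ex['question']}")
        # The worked example spells out its reasoning before the
        # answer, which is what elicits step-by-step behaviour.
        parts.append(f"A: {ex['reasoning']} So the answer is {ex['answer']}.")
    for img in query_images:
        parts.append(f"<image:{img}>")
    parts.append(f"Q: {question}")
    parts.append("A: Let's think step by step.")
    return "\n".join(parts)

prompt = build_multimodal_prompt(
    examples=[{
        "images": ["menu.jpg"],
        "question": "How much do two beers cost?",
        "reasoning": "The menu lists one beer at $6, so two cost 2 x $6 = $12.",
        "answer": "$12",
    }],
    query_images=["receipt.jpg"],
    question="What is the total including a 10% tip?",
)
```

In a real multimodal pipeline the placeholders would be replaced by encoded image tokens from the image encoder; the point here is only the interleaved few-shot structure.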
The research output points to the potential of large multimodal models to tackle complex, open-ended problems that require deep language understanding and generation. The MM1 research comes at a time when Apple is accelerating its investments in AI to catch up with rivals such as Google, Microsoft and Amazon, which are making progress in integrating generative AI capabilities into their products.
According to a recent Bloomberg report, the company is on track to spend $1 billion a year on AI development. Sources say Apple is working on a broad framework of language models called “Ajax” as well as a chatbot known internally as “Apple GPT.”
The company plans to integrate these technologies into Siri, Messages, Apple Music and other apps and services. For example, AI could be used to automatically create personalized playlists, help developers write code, or participate in open-ended conversation and task completion.