In recent months, Apple has been quite active in sharing research papers on its advancements in generative AI, sparking speculation and anticipation about the company’s plans within this rapidly evolving field. Although the specifics of Apple’s projects remain under wraps, there has been significant buzz about a potential collaboration between Apple and Google to bring Google’s Gemini AI technology to iPhones.
Among the intriguing tidbits shared by Apple are details about an open-source model known as MLLM-Guided Image Editing (MGIE), designed for image editing via natural language instructions. Another project that has caught the AI community's attention is Ferret UI, a multimodal model that could change how users interact with mobile user interfaces by understanding what is on screen and offering actionable guidance.
The push towards running generative AI directly on devices rather than relying on cloud-based services aims to make AI more accessible and more private for users. That approach is already visible in Google's Gemini Nano, which runs on devices such as the Google Pixel and Samsung Galaxy S24 series, offering features like summarization and translation without an internet connection.
Ferret UI aims to pair a multimodal AI model with iOS to assist with tasks like icon recognition, text finding, and widget listing. By combining optical character recognition (OCR) with multimodal reasoning, Ferret UI could answer contextual questions, such as whether an App Store app is appropriate for children, or explain on-screen elements.
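Apple has not published a developer API for Ferret UI, but the tasks it targets can be pictured as structured queries over screen elements a model has recognized. The sketch below is purely illustrative: the `UIElement` type, the sample screen, and the helper functions are invented for this example and are not Apple's implementation.

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    kind: str    # "icon", "text", or "widget"
    label: str   # recognized content, e.g. OCR text or an icon's name
    bbox: tuple  # (x, y, width, height) in screen points

# A toy screen description, standing in for what a multimodal model
# might extract from a screenshot.
screen = [
    UIElement("icon", "Settings", (20, 40, 60, 60)),
    UIElement("text", "App Store", (100, 40, 200, 30)),
    UIElement("widget", "Weather", (20, 120, 340, 160)),
]

def find_text(elements, query):
    """Text finding: return text elements whose label contains the query."""
    return [e for e in elements if e.kind == "text" and query.lower() in e.label.lower()]

def list_widgets(elements):
    """Widget listing: return the labels of all widgets on screen."""
    return [e.label for e in elements if e.kind == "widget"]

print([e.label for e in find_text(screen, "app")])  # ['App Store']
print(list_widgets(screen))                          # ['Weather']
```

The point of the sketch is the shape of the problem: once a model can ground on-screen regions into typed, labeled elements, downstream questions ("where is the text X?", "what widgets are visible?") reduce to simple queries over that structure.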
Despite the promising features of Ferret UI, whose development reportedly drew in part on OpenAI's GPT-4, it remains unclear how Apple plans to integrate the technology into iOS. Some observers have questioned Apple's pace in the AI race, suggesting the company might license technology from Google or OpenAI to bolster its offerings.
Nevertheless, a working integration of tools like Ferret UI into iOS could deliver significant advances, such as letting users create calendar entries from emails or trigger calls and in-app actions through voice commands. Such capabilities could make interactions with mobile devices noticeably more intuitive and seamless.
As the tech world eagerly anticipates further developments, the question remains whether iOS 18 will unveil any of these AI-enhanced features. With the Worldwide Developers Conference 2024 on the horizon, Apple may soon reveal how its AI research will transform the iOS user experience.