Google’s new AI tool uses image prompts instead of text



CNN
 — 

Google’s latest artificial intelligence tool, “Whisk,” lets people upload photos to get back a combined, AI-generated image, even without users inputting any text to explain what they want.

Users can input images depicting subject, setting and style before Whisk combines everything into one image.

Whisk is a “creative tool” for quick inspiration, Google said in a blog post, as opposed to a “traditional image editor.” In essence, Whisk is intended as a fun AI feature, rather than as a tool for polished professional work.

Big Tech companies like Google and OpenAI are racing to launch consumer products that can showcase uses for the snazzy new technology, even as naysayers warn that the lack of guardrails around the development of AI poses dangers for humanity.

Since OpenAI initially launched its text-to-image creation tool, Dall-E, in 2021, the concept of AI-generated artwork has swamped social media and become a focus of consumer products. Google’s Whisk is an image-to-image generator, building upon the popular concept of text-to-image generators.

People using Whisk can “remix” the final image by editing their inputs and mixing the categories to produce different images like a plushie toy, enamel pin or sticker. Users can add text if they want to direct certain details, but it is not required to create an image.

“Whisk is designed to allow users to remix a subject, scene and style in new and creative ways, offering rapid visual exploration instead of pixel-perfect edits,” Thomas Iljic, a director of product management at Google Labs, said in a statement.

Google’s Whisk is built upon the generative AI developed by DeepMind, the AI lab that Google acquired in 2014.

A general view of the Google DeepMind offices after the announcement that Founder and CEO Demis Hassabis and senior research scientist, John M. Jumper, received the 2024 Nobel Prize for Chemistry on October 9, 2024 in London, England. Two Google DeepMind employees shared the 2024 Nobel Prize for Chemistry with David Baker, of the University of Washington, for discoveries related to the structure of proteins.

Whisk works by using Google’s core AI offering, Gemini, which debuted in December 2023, and pairing it with Imagen 3, the latest text-to-image generator released by DeepMind in December.

When users upload their images, Gemini generates a caption which is fed into Imagen 3. The process captures the “essence” of the subject as opposed to an exact replica, which allows for remixing the final image but also means the end product might stray from the prompt.
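
The caption-then-generate flow described above can be sketched roughly as follows. This is an illustration only, not Google's implementation: `caption_image` and `generate_image` are hypothetical stand-ins for the Gemini captioning step and the Imagen 3 generation step, and the `WhiskInputs` fields are assumed names based on the subject, scene and style inputs the article describes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WhiskInputs:
    """Hypothetical container for the three image slots plus optional text."""
    subject: str                      # reference to the subject image
    scene: str                        # reference to the scene image
    style: str                        # reference to the style image
    extra_text: Optional[str] = None  # optional user-supplied details


def build_prompt(inputs: WhiskInputs, caption_image: Callable[[str], str]) -> str:
    """Caption each input image and merge the captions into one text prompt."""
    parts = [
        f"Subject: {caption_image(inputs.subject)}",
        f"Scene: {caption_image(inputs.scene)}",
        f"Style: {caption_image(inputs.style)}",
    ]
    if inputs.extra_text:
        parts.append(f"Details: {inputs.extra_text}")
    return "\n".join(parts)


def whisk(
    inputs: WhiskInputs,
    caption_image: Callable[[str], str],
    generate_image: Callable[[str], bytes],
) -> bytes:
    """Only the captions reach the generator, never the original pixels."""
    return generate_image(build_prompt(inputs, caption_image))
```

Because the generator in this sketch sees only text captions, any detail the caption omits (exact height, hairstyle, skin tone) cannot be reproduced faithfully, which is consistent with the drift from the input images that the article describes.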

For example, the generated image might have a different height, hairstyle or skin tone than the prompt images, Google said in a blog post.

When Google first rolled out Gemini’s text-to-image creator in February, the company faced initial backlash because the tool produced historically inaccurate images.

Whisk is first available as a website on Google Labs for users in the US and is in its early stages of development, the company said.

OpenAI also recently released a text-to-video generator called Sora, highlighting the competition for consumer products.

Dan Ives, managing director and senior equity analyst at Wedbush Securities, told CNN that Whisk is another “flex the muscles moment” for Google in the AI and tech race.

“DeepMind is a key asset for Google,” Ives said, noting that AI products are a part of Google’s “treasure chest” of new products for 2025, which also include a new Android operating system built in collaboration with Samsung and Qualcomm.
