Meta Releases AI Models That Generate Both Text and Images

Meta has released five new artificial intelligence (AI) research models, including ones that can generate both text and images and that can detect AI-generated speech within larger audio snippets.

The models were publicly released Tuesday (June 18) by Meta’s Fundamental AI Research (FAIR) team, the company said in a Tuesday press release.

“By publicly sharing this research, we hope to inspire iterations and ultimately help advance AI in a responsible way,” Meta said in the release.

One of the new models, Chameleon, is a family of mixed-modal models that can understand and generate both images and text, according to the release. These models can take input that includes both text and images and output a combination of text and images. Meta suggested in the release that this capability could be used to generate captions for images or to use both text prompts and images to create a new scene.

Also released Tuesday were pretrained models for code completion. These models were trained using Meta’s new multitoken prediction approach, in which large language models (LLMs) are trained to predict several future words at once, instead of the earlier approach of predicting one word at a time, the release said.
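
To illustrate the idea, here is a minimal, hypothetical PyTorch sketch of multi-token prediction: a shared hidden state feeds several output heads, each predicting a different future position, and the training loss sums over those positions. The class and function names are illustrative assumptions, not Meta's actual implementation.

```python
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """Predicts the next `n_future` tokens from a shared hidden state (illustrative)."""
    def __init__(self, hidden_dim: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        # One linear head per future offset: t+1, t+2, ..., t+n_future
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, vocab_size) for _ in range(n_future)
        )

    def forward(self, hidden: torch.Tensor) -> list[torch.Tensor]:
        # hidden: (batch, seq_len, hidden_dim) -> list of logits per future offset
        return [head(hidden) for head in self.heads]

def multi_token_loss(logits_per_offset: list[torch.Tensor], targets: torch.Tensor) -> torch.Tensor:
    """Sum cross-entropy over each future offset instead of only the next token."""
    loss = torch.zeros((), dtype=torch.float32)
    for k, logits in enumerate(logits_per_offset, start=1):
        shifted = targets[:, k:]  # position t is trained to predict token t+k
        loss = loss + nn.functional.cross_entropy(
            logits[:, : shifted.size(1)].transpose(1, 2), shifted
        )
    return loss
```

In a standard one-word-at-a-time setup only the first head and the k=1 term would exist; the extra heads are what make the prediction "multitoken."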

A third new model, JASCO, offers more control over AI music generation. Rather than relying mainly on text inputs for music generation, this new model can accept various inputs, including chords or beat, per the release. This capability allows the incorporation of both symbols and audio in a single text-to-music generation model.

Another new model, AudioSeal, features an audio watermarking technique that enables the localized detection of AI-generated speech, meaning it can pinpoint AI-generated segments within a larger audio snippet, according to the release. This model also detects AI-generated speech up to 485 times faster than previous methods.
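
As a rough, hypothetical sketch of what "localized" detection means in practice, the snippet below turns per-frame detection scores into time spans flagged as AI-generated, rather than producing a single score for the whole clip. The function name, frame rate and scores are illustrative assumptions, not AudioSeal's actual API.

```python
import numpy as np

def detect_ai_segments(frame_scores: np.ndarray, frame_rate: int, threshold: float = 0.5):
    """Convert per-frame watermark-detection scores into (start_sec, end_sec) spans."""
    flags = frame_scores > threshold
    segments, start = [], None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i  # a flagged run begins
        elif not flagged and start is not None:
            segments.append((start / frame_rate, i / frame_rate))  # run ends
            start = None
    if start is not None:
        segments.append((start / frame_rate, len(flags) / frame_rate))
    return segments

# Example: 10 frames at 2 frames/second; frames 4-7 score as watermarked.
scores = np.array([0.1, 0.2, 0.1, 0.1, 0.9, 0.95, 0.9, 0.8, 0.2, 0.1])
print(detect_ai_segments(scores, frame_rate=2))  # [(2.0, 4.0)]
```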

The fifth new AI research model released Tuesday by Meta’s FAIR team is designed to increase geographic and cultural diversity in text-to-image generation systems, the release said. To support this work, the company has released geographic disparities evaluation code and annotations to improve evaluations of text-to-image models.

Meta said in an April earnings report that capital expenditures on AI and the metaverse-development division Reality Labs will range between $35 billion and $40 billion by the end of 2024, expenditures that were $5 billion higher than it initially forecast.

“We’re building a number of different AI services, from our AI assistant to augmented reality apps and glasses, to APIs [application programming interfaces] that help creators engage their communities and that fans can interact with, to business AIs that we think every business eventually on our platform will use,” Meta CEO Mark Zuckerberg said April 24 during the company’s quarterly earnings call.

For all PYMNTS AI coverage, subscribe to the daily AI Newsletter.
