Yang said she and her colleagues propose a “model-over-models” approach to LLM development. It relies on a decentralised paradigm in which developers train smaller models across thousands of specific domains, including code generation, advanced data analysis and specialised AI agents.
These smaller models would then evolve into a large and comprehensive LLM, also known as a foundation model. Yang pointed out that this approach could reduce the computational demands at each stage of LLM development.
That paradigm could make LLM development more accessible to university labs and small companies, according to Yang. An evolutionary algorithm then evolves over these domain-specific models to eventually build a comprehensive foundation model, she said.
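The article describes the approach only at a high level. Purely as an illustration of the general idea of an evolutionary search over domain-specific models, the toy sketch below stands in each small model with a random parameter vector and evolves merge coefficients toward a single “generalist”; the merge rule, fitness function and all names are our own assumptions, not Yang’s published method.

```python
# Toy illustration only: NOT Yang's actual method. Domain "models" are random
# parameter vectors, and an evolutionary loop searches for coefficients that
# merge them into one combined model. All names and choices are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for small models fine-tuned on specific domains.
domain_models = {
    "code": rng.normal(size=64),
    "data_analysis": rng.normal(size=64),
    "agents": rng.normal(size=64),
}

# A hidden "ideal" generalist, used only to define a toy fitness function.
target = rng.normal(size=64)

def merge(weights):
    """Weighted average of the domain models' parameters (a simple merge rule)."""
    w = np.clip(weights, 0, None)
    w = w / (w.sum() + 1e-9)
    return sum(wi * m for wi, m in zip(w, domain_models.values()))

def fitness(weights):
    """Higher is better: negative distance from the merged model to the target."""
    return -np.linalg.norm(merge(weights) - target)

# Simple (mu + lambda) evolutionary loop over the merge coefficients.
population = [rng.random(len(domain_models)) for _ in range(20)]
for generation in range(50):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:5]  # keep the fittest merge recipes
    children = [p + rng.normal(scale=0.1, size=p.shape)
                for p in parents for _ in range(3)]
    population = parents + children

best = np.clip(max(population, key=fitness), 0, None)
print("best merge coefficients:", np.round(best / best.sum(), 3))
```

In a real system the search space would be far richer than three mixing weights, but the sketch captures the division of labour the article describes: small domain experts are trained cheaply in parallel, and a search procedure, rather than another round of full-scale pretraining, assembles them into a broader model.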
Successfully initiating such LLM development in Hong Kong would count as a big win for the city, as it looks to turn itself into an innovation and technology hub.
According to Yang, her team has already verified that small AI models, once put together, can outperform the most advanced LLMs in specific domains.
“There’s also a growing consensus in the industry that with high-quality, domain-specific data and continuous pretraining, surpassing GPT-4/4V is highly achievable,” she said. The multimodal GPT-4/4V analyses image inputs provided by a user, and is the latest capability OpenAI has made widely available.
Yang said the next step is to build a more inclusive infrastructure platform to attract more talent into the AI community, so that some releases can be made by the end of this year or early next year.
“In the future, while a few cloud-based large models will dominate, small models across various domains will also flourish,” she said.
Yang, who received her PhD from Duke University in North Carolina, has published more than 100 papers in top-tier conferences and journals, and holds more than 50 patents in the US and mainland China. She played a key role in developing Alibaba’s 10-trillion-parameter M6 multimodal AI model.