Exclusive | PolyU’s top AI scientist Yang Hongxia seeks to revolutionise LLM development in Hong Kong

At present, Yang said, LLM development has largely relied on deploying advanced and costly graphics processing units (GPUs), from the likes of Nvidia and Advanced Micro Devices, in data centres for tasks involving huge quantities of raw data, which has put deep-pocketed Big Tech companies and well-funded start-ups at a major advantage.
The entrance to the Hung Hom campus of Hong Kong Polytechnic University, where artificial intelligence scientist Yang Hongxia serves as a professor in the Department of Computing. Photo: Sun Yeung

Yang said she and her colleagues propose a “model-over-models” approach to LLM development. That calls for a decentralised paradigm in which developers train smaller models across hundreds of specific domains, including code generation, advanced data analysis and specialised AI agents.

These smaller models would then evolve into a large and comprehensive LLM, also known as a foundation model. Yang pointed out that this approach could reduce the computational demands at every stage of LLM development.

Domain-specific models that are typically capped at 13 billion parameters – a machine-learning term for the variables present in an AI system during training, which help establish how data prompts yield the desired output – can deliver performance that is on par with or exceeds OpenAI’s latest GPT-4 models, while using far fewer GPUs, from around 64 to 128 cards.
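For a rough sense of why that matters (the figures below are illustrative assumptions, not from the article), a 13-billion-parameter model stored in 16-bit precision occupies roughly 26GB of memory, so even its full training state spreads comfortably across 64 to 128 modern accelerator cards:

```python
# Back-of-the-envelope footprint of a 13-billion-parameter model.
# Assumptions (not from the article): bf16 weights, Adam-style optimiser state.
PARAMS = 13e9
WEIGHT_BYTES = 2        # bf16 weights
OPTIMISER_BYTES = 12    # assumed fp32 master weights plus two Adam moments

serving_gb = PARAMS * WEIGHT_BYTES / 1e9                       # ~26 GB to serve
training_gb = PARAMS * (WEIGHT_BYTES + OPTIMISER_BYTES) / 1e9  # ~182 GB before activations

for gpus in (64, 128):
    print(f"{gpus} cards: ~{training_gb / gpus:.1f} GB of training state per card")
```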

That paradigm could make LLM development more accessible to university labs and small companies, according to Yang. An evolutionary algorithm then evolves over these domain-specific models to eventually build a comprehensive foundation model, she said.
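The article does not spell out that algorithm. As a minimal sketch of the general search-and-combine idea only (the weighted-averaging merge, function names and parameters below are assumptions, not Yang’s published method), an evolutionary loop can mutate the coefficients used to blend the domain models and keep whichever blend scores best on a held-out evaluation set:

```python
import random

def merge(state_dicts, weights):
    """Weighted average of corresponding parameters from several domain models.

    state_dicts: list of dicts mapping parameter names to tensors/arrays
    weights: one merge coefficient per domain model, summing to 1
    """
    return {
        name: sum(w * sd[name] for w, sd in zip(weights, state_dicts))
        for name in state_dicts[0]
    }

def evolve_merge(state_dicts, fitness_fn, population=16, generations=20, sigma=0.05):
    """Simple evolutionary search over merge coefficients.

    fitness_fn loads a candidate merged model and returns its score on a
    held-out, domain-relevant evaluation set (higher is better).
    """
    n = len(state_dicts)
    best = [1.0 / n] * n                              # start from a uniform average
    best_score = fitness_fn(merge(state_dicts, best))
    for _ in range(generations):
        for _ in range(population):
            # Mutate the current best coefficients and renormalise them.
            cand = [max(0.0, w + random.gauss(0.0, sigma)) for w in best]
            total = sum(cand) or 1.0
            cand = [w / total for w in cand]
            score = fitness_fn(merge(state_dicts, cand))
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```

A production recipe would likely use per-layer coefficients, larger populations and continued pretraining of the merged model; the sketch only conveys how an evolutionary search can combine domain experts into one foundation model.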

Successfully initiating such LLM development in Hong Kong would count as a huge win for the city, as it looks to turn into an innovation and technology hub.

Yang Hongxia, a leading artificial intelligence scientist, previously worked on AI models at TikTok owner ByteDance in the US and Alibaba Group Holding’s research arm Damo Academy. Photo: PolyU
Hong Kong’s dynamic atmosphere, as well as its access to AI talent and resources, make the city an ideal place to conduct research into this new development paradigm, Yang said. She added that PolyU president Teng Jin-guang shares this vision.

According to Yang, her team has already verified that small AI models, once put together, can outperform the most advanced LLMs in specific domains.

“There is also a growing consensus in the industry that with high-quality, domain-specific data and continuous pretraining, surpassing GPT-4/4V is highly achievable,” she said. The multimodal GPT-4/4V analyses image inputs provided by a user, and is the latest capability OpenAI has made broadly available.

Yang said the next step is to build a more inclusive infrastructure platform to attract more talent into the AI community, so that some releases can be made by the end of this year or early next year.

“In the future, while several cloud-based large models will dominate, small models across various domains will also flourish,” she said.

Yang, who received her PhD from Duke University in North Carolina, has published more than 100 papers in top-tier conferences and journals, and holds more than 50 patents in the US and mainland China. She played a key role in developing Alibaba’s 10-trillion-parameter M6 multimodal AI model.
