AI-Powered Virus Discovery: Alibaba Cloud and Sun Yat-sen University Collaborate to Discover 180 Novel RNA Virus Clades
A research collaboration between Sun Yat-sen University and Alibaba Cloud, published recently in the prestigious journal Cell, has leveraged cloud computing and AI to uncover 180 novel clades and over 160,000 new RNA viruses, a nearly 30-fold increase in known virus species. This discovery significantly advances our understanding of RNA virus diversity and evolutionary history.
Cell is a leading journal in the field of life sciences, representing the highest level of research in this domain. Only a limited number of papers are accepted into Cell, with just dozens published annually from China. The published paper presents a novel deep learning-based RNA virus discovery framework, marking a milestone in the application of deep learning for virus discovery and establishing a new paradigm for virology research.
Viruses are intimately linked to human health, yet only around 5,000 confirmed virus species are known, just the tip of the iceberg of the viral world. Traditional RNA virus identification relies heavily on sequence homology comparison, that is, identifying a virus by measuring how similar an unknown sequence is to known ones. However, because RNA viruses are vastly diverse and highly divergent, these methods struggle to capture "dark matter" viruses that share little or no homology with known viruses, which makes the discovery of new viruses inefficient.
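To make the limitation of homology-based identification concrete, here is a minimal toy sketch, not the method used in the paper. It scores similarity as the overlap of 3-letter fragments (k-mers) between a query and a small reference set, and reports a match only above a threshold. The sequences, the reference names, the function names, and the 0.3 threshold are all made up for illustration. Real pipelines rely on dedicated alignment tools rather than this simple score.

```python
def kmers(sequence: str, k: int = 3) -> set[str]:
    """Return the set of overlapping length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    set_a, set_b = kmers(a, k), kmers(b, k)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def best_match(query: str, references: dict[str, str], threshold: float = 0.3):
    """Return (name, score) of the closest reference, or None if below threshold."""
    name, score = max(
        ((n, kmer_similarity(query, s)) for n, s in references.items()),
        key=lambda pair: pair[1],
    )
    return (name, score) if score >= threshold else None


# Hypothetical toy protein fragments, invented for this illustration only.
references = {
    "known_virus_A": "MKTLLDGSGDDAVNHR",
    "known_virus_B": "MSPEELQKAGDDTRWL",
}
close_relative = "MKTLLDGSGDDAINHR"  # one substitution away from known_virus_A
divergent = "QWYFHCPNVIERTMSA"       # shares no 3-mers with either reference

print(best_match(close_relative, references))  # matched to known_virus_A
print(best_match(divergent, references))       # no match: a "dark matter" case
```

The close relative is recognized because most of its fragments still overlap with a known sequence. The divergent sequence shares none, so a similarity-only search returns nothing even if the protein performs the same function.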
The integration of AI and virology research is breaking through this bottleneck. The paper introduces a new deep learning model, "LucaProt," built on the Transformer architecture and large-model representation techniques. By combining protein sequence features with intrinsic structural characteristics, LucaProt performs strongly on an independent test dataset, achieving high specificity (a false positive rate of only 0.014%) and high sensitivity (a false negative rate of 1.72%).
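For readers who want to relate the two reported error rates to the usual classifier metrics, the short sketch below applies the standard definitions: specificity is 1 minus the false positive rate, and sensitivity (recall) is 1 minus the false negative rate. The function name is an illustrative choice and the only inputs are the two rates quoted above. Nothing here reproduces the paper's evaluation code.

```python
def rates_to_metrics(false_positive_rate: float, false_negative_rate: float) -> dict[str, float]:
    """Convert error rates (fractions, not percentages) to specificity and sensitivity."""
    for rate in (false_positive_rate, false_negative_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must be fractions between 0 and 1")
    return {
        "specificity": 1.0 - false_positive_rate,  # share of non-viral sequences correctly rejected
        "sensitivity": 1.0 - false_negative_rate,  # share of viral sequences correctly detected
    }


# Rates quoted in the article, converted from percentages to fractions.
metrics = rates_to_metrics(0.014 / 100, 1.72 / 100)
print(f"specificity: {metrics['specificity']:.3%}")
print(f"sensitivity: {metrics['sensitivity']:.2%}")
```

On these figures the model would wrongly flag about 14 of every 100,000 non-viral sequences and miss about 172 of every 10,000 viral ones.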
The research team analyzed 10,487 biological samples collected from environments across the globe and identified 513,134 viral genomes, which represent 161,979 potential virus species and 180 RNA virus clades. This expands the number of known RNA virus clades roughly ninefold and the number of known virus species roughly thirtyfold. Notably, 23 of these clades could not be identified using sequence homology methods and represent the "dark matter" of the virosphere.
The paper also reveals several novel discoveries in the field of virology:
- Discovery of the longest RNA virus genome to date, measuring 47,250 nucleotides.
- Identification of genome structures beyond previous understanding, showcasing the flexibility of RNA virus genome evolution.
- Evidence of RNA virus diversity even in extreme environments such as high-temperature deep-sea hydrothermal vents.
Professor Shi Mang, from Sun Yat-sen University's School of Medicine, stated, "AI applications in scientific research are unstoppable. Exploring scientific questions through AI methods has led to significant breakthroughs. This research paradigm will become the norm in future science and potentially a crucial tool for understanding the world."
He Yong, co-first author of the paper and Algorithm Expert at Alibaba Cloud's Feitian Laboratory, commented, "The new AI+virology research framework has revolutionized our understanding of the virosphere. As we continually refine our knowledge, it will help us better predict potential future pandemics and further advance the development of RNA virus vaccines."
Over the past few years, Alibaba Cloud has actively collaborated with domestic universities and research institutions, achieving notable research outcomes in the life sciences. These include LucaOne, a unified nucleic acid and protein foundation model; LucaProt, for RNA virus discovery; and LucaPCycle, for identifying the phosphate cycle protein family.