An international team of researchers has expanded our understanding of genetic variations associated with human diseases by leveraging a novel AI algorithm that analyzes primate DNA data.
The scientists first sequenced DNA from more than 800 individual animals belonging to 233 non-human primate species, spanning all 16 primate families, from lemurs to gorillas. To interpret the data, they developed a new algorithm called PrimateAI-3D.
PrimateAI-3D is built on deep-learning architectures similar to the language models behind ChatGPT, but adapted to model genomic sequences rather than text. The researchers trained it on mutations observed in other primates; because these variants have persisted under natural selection, they are presumed to be benign. Having learned what tolerated variants look like, the algorithm can then flag, by a process of elimination, mutations that are likely to contribute to disease.
Next, the team applied PrimateAI-3D to search for potentially harmful mutations in humans, using health records and gene variant data from more than 400,000 people who donated samples to the UK Biobank project. The researchers report that the algorithm delivered "remarkable advancements" in predicting individuals' increased genetic risk for common diseases.
Another significant advantage is that the method is less affected by a well-known bias in human genetic datasets, which are dominated by people of white European ancestry. This allows the algorithm to identify pathogenic mutations with greater precision than existing techniques.
According to Kyle Farh, VP of Artificial Intelligence at Illumina, although the world's population has reached 8 billion, our genetic diversity still reflects that of the roughly 10,000 common ancestors from whom we all descend. As a result, human genome sequencing data alone is insufficient for a comprehensive understanding of the human genome.
To overcome this limitation, the researchers combined human and non-human primate data. Living primates share more than 90% of their DNA with humans, which makes their genomes a valuable source of information. Illumina's research found that if a genetic variant is tolerated by natural selection in another primate species, the probability that it causes disease in humans is only about 1%.
The study's findings have significant implications for health research: they can help scientists prioritize the genetic variants that pose the greatest risk to humans. The research could also support conservation efforts aimed at protecting other primate populations.
Kyle Farh believes we are merely scratching the surface of what can be learned in this field. The idea that insights into our own species can be gleaned from studying other species is both captivating and profound.
The complete study has been published in the journal Science.