Universidad Politécnica de Madrid

A study unveils the importance of tens of thousands of previously unexplored microbial genes

The work drives the discovery of new molecular functions and improves understanding of the interactions between the microbiome and its environment.

19.12.23

For more than two decades, scientists around the world have contributed to unveiling the immense diversity of microorganisms that inhabit the different ecosystems of our planet, from oceans and soils to our own bodies. Metagenomics, the field that studies genetic material (DNA) obtained directly from environmental samples, has made it possible, for example, to identify thousands of new species and shed light on a large part of their genetic content. However, because most of these microorganisms have not yet been isolated in the laboratory, fundamental aspects of their genomes remain unknown.

A study published in the journal Nature provides new clues for understanding this large amount of unknown DNA, revealing its functional and evolutionary significance. The work was led by Dr. Jaime Huerta Cepas of the Comparative Genomics and Metagenomics group at the Center for Plant Biotechnology and Genomics (CBGP, UPM-INIA/CSIC), a joint center between the INIA National Center of the Spanish National Research Council (CSIC) and the Technical University of Madrid (UPM). It provides an extensive catalog of new gene families and highlights the need to incorporate this information into future metagenomic studies.

Dr. Huerta Cepas points out that, although current metagenomic databases contain millions of unknown DNA sequences coding for new genes, the lack of information on their origin and biological relevance has greatly limited their integration into microbiome studies. According to Dr. Huerta Cepas, the characterization of this enormous number of new genes will allow us not only to discover new molecular functions, but also to better understand the interactions between microorganisms and their environment.

The work examined more than 149,000 microbial genomes obtained from diverse environments and established a catalog of approximately 400,000 new gene families. Dr. Álvaro Rodríguez del Río, lead author of the research, explains that comparative genomics and phylogenetic techniques allowed the team to identify many new genes. Although these genes are absent from known microorganisms, they are highly prevalent in different ecosystems and are subject to strong selective pressure. According to Dr. Rodríguez del Río, the identification of these novel gene families relied on very strict filters that discarded more than 90% of all the available genetic material. Even so, the published catalog triples the number of microbial gene families known to date.

This comprehensive analysis of the microbiome also provides functional information on more than 130,000 new gene families. For this purpose, various genomic conservation analyses and artificial intelligence-based protein structure prediction techniques were used. The work highlights the practical application of these predictions in the discovery and characterization of new molecular functions, experimentally validating several new genes involved in the motility and defense mechanisms of microorganisms.

Finally, the article demonstrates how this large amount of unknown genetic material, hitherto ignored, can improve association studies between the microbiome and its environment. In particular, the study reveals that the abundance of some of these new genes varies significantly in the gut microbiome of colon cancer patients. This discovery not only promises to enhance diagnostic techniques, but also opens new avenues for a better understanding of the mechanisms that govern the relationship between the microbiome and health.

The work involved an interdisciplinary team of CBGP researchers specializing in computational biology (Dr. Jaime Huerta Cepas, Dr. Álvaro Rodríguez del Río, Dr. Joaquín Giner Lamia, Dr. Carlos Pérez Cantalapiedra, Ana Hernández Plaza, Ziqi Deng and Jorge Botas), as well as several members of the group led by Dr. Emilia López Solanilla (Martí Munar-Palmer, Dr. Saray Santamaría-Hernando and Dr. José J. Rodríguez Herva), who were in charge of the experimental validation of some of the new microbial genes identified. In addition, the study was carried out in collaboration with the laboratories led by Dr. Luis Pedro Coelho (Fudan University, China), Dr. Shinichi Sunagawa (ETH, Switzerland) and Dr. Peer Bork (EMBL, Germany), which were involved in the mining of metagenomic data and the prediction of protein structures.

This research has been funded by the CBGP Severo Ochoa Center of Excellence project (UPM-INIA/CSIC), the INPhINIT program of the La Caixa Foundation, the State Research Agency (AEI) and the Spanish Ministry of Science and Innovation (MCIN).