A new era is dawning in protein science with the advent of artificial intelligence (AI). An innovative model named ESM3 is capable of generating entirely novel proteins. Just as ChatGPT predicts the next word in a sentence, ESM3 can create protein sequences not found in nature. This feat is as astonishing as it is ethically challenging.
The esmGFP proteins, generated by the ESM3 model, are unique. According to scientists, 500 million years of evolution would have been needed to create such a protein. Credit: EvolutionaryScale
Researchers have used ESM3 to develop a fluorescent protein that shares only 58% of its sequence with fluorescent proteins found in nature. The work was posted on July 2 as a preprint on bioRxiv. EvolutionaryScale, a company founded by former Meta researchers, also detailed the discovery in a press release on June 25.
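To give a sense of what a figure like "58% of its sequence" measures, here is a minimal sketch of a percent-identity calculation. It assumes the two sequences have already been aligned to the same length, with "-" marking gaps. The function name and the toy sequences are invented for illustration, and the article does not say which alignment method or identity definition the researchers used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, non-gap positions where both sequences carry the same residue.

    Assumes the inputs are already aligned (equal length). Other definitions
    of identity use a different denominator; this is only one common choice.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy example: two made-up ten-residue fragments that differ at three positions.
print(percent_identity("MSKGEELFTG", "MSRGEALFSG"))
```

Real comparisons first align the sequences with a dedicated tool, since insertions and deletions shift positions.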
The ESM3 model, a language model in the same family as OpenAI's GPT-4, was trained on 2.78 billion proteins. Researchers extracted information on the sequence, structure, and function of each protein, hid parts of it, and then asked the model to predict the missing information. This method allows for the generation of new proteins, but the results must be validated through experimental tests.
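The "predict the missing information" idea can be illustrated with a minimal sketch of how a training example might be built from a protein sequence. This is not EvolutionaryScale's code: the mask symbol, the masking fraction, and the function name are assumptions made for illustration, and the sketch covers only the sequence, not structure or function.

```python
import random

MASK = "_"  # placeholder symbol chosen for this sketch


def mask_sequence(seq, mask_fraction, rng):
    """Hide a random subset of residues and return (masked sequence, hidden residues).

    The second value maps each hidden position to the residue a model
    would be asked to recover during training.
    """
    n_mask = max(1, round(len(seq) * mask_fraction))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    masked = list(seq)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return "".join(masked), targets


rng = random.Random(0)  # fixed seed so the example is repeatable
masked, targets = mask_sequence("MSKGEELFTGVVPILVELDG", 0.3, rng)
print(masked)   # sequence with 6 of its 20 residues hidden
print(targets)  # the hidden residues, keyed by position
```

A model trained to fill in such gaps can then be given a mostly or entirely masked input and asked to complete it, which is how a new sequence is generated.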
EvolutionaryScale has made a reduced version of the ESM3 model available under a non-commercial license, while the full version will be accessible to industrial researchers. This technology could revolutionize various fields, from drug discovery to plastic degradation.
The research team had already gained attention in 2022 with ESMFold, a precursor to the ESM3 model, which predicted the structures of previously unknown microbial proteins. Around the same time, Google's DeepMind team announced structure predictions for 200 million proteins. These efforts highlighted the limits and challenges of such approaches, particularly the need to verify predictions using traditional experimental methods.
The true innovation of the ESM3 model lies in its ability to generate entirely new proteins. Using billions of data points on protein structure, function, and sequence, the model has produced a new fluorescent protein called "esmGFP". Although the first version was less bright than its natural counterparts, further iterations have improved its brightness, a result that, by the researchers' estimate, would have taken natural evolution some 500 million years to reach.