Professor Hashem Koohy - an Alan Turing Fellow in Health and Medicine from the MRC Human Immunology Unit - responds to Intel’s recent announcement of Aurora genAI, a new state-of-the-art generative AI model for science with 1 trillion parameters.

Currently, very little is known about Intel’s upcoming AI model for science. However, its announcement raises questions about its potential impact and challenges in science. To address these, it is worth taking a step back and looking at the impacts and challenges of existing models, which are expected to be amplified by the greater power of Aurora.

In recent years, the field of generative artificial intelligence (genAI) models has witnessed tremendous advancements, revolutionizing various domains. These models, which are built on powerful neural networks, have demonstrated the ability to generate realistic and creative outputs, from images and text to music and even entire virtual environments.

Generative Models: development and impact

Very generally speaking, “Generative Models” (GMs) refer to a family of models that can identify patterns and structures within existing data to generate new and original content. Examples include, but are not limited to, Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). These models have unlocked new possibilities in data synthesis, image generation, and content creation. By learning patterns and features from large datasets, they can generate novel samples that resemble real data, pushing the boundaries of creativity and imagination. In science, GMs have the potential to revolutionize multiple areas.

Some potential applications include: 

  • Impact on data augmentation and simulation. GMs enable researchers to augment and synthesize data, which is particularly valuable when working with limited datasets. By generating additional samples, researchers can improve the robustness of their models and gain insight into complex phenomena. This application of GMs offers an opportunity to tackle some highly complex problems, such as understanding biological systems where relevant data are sparse, an issue rooted in the difficulties of generating experimental data (high cost, lack of samples from different diseases and healthy tissues, etc.).
  • Revolutionizing drug discovery and enabling de novo design of drugs or antibody targets. These approaches can accelerate the process of identifying potential candidates for further experiments and optimization. Models such as AlphaFold and ESMFold, together with a plethora of follow-up breakthroughs in the field of protein processing, have proven highly successful at predicting protein structures and at generating molecules with desired properties.
  • Image analysis and diagnosis. The life and medical sciences are rich in a variety of imaging data, for which genAI can aid analysis, interpretation, and diagnosis at speed and scale, as well as at a considerably lower cost. Furthermore, by generating realistic images, these models can assist in training algorithms to identify abnormalities and improve diagnostic accuracy.
  • Inspire scientific creativity by generating new hypotheses, simulations, and experimental designs. These models can be leveraged to explore uncharted territories and discover patterns that might otherwise have been overlooked.

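The data-augmentation idea in the first bullet can be illustrated with a minimal sketch, in which a multivariate Gaussian fitted to a small dataset stands in for a learned generative model such as a GAN or VAE. All data here are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small "real" dataset: 20 samples with 3 features
# (e.g., gene-expression-like measurements), simulated here.
real = rng.normal(loc=[1.0, 2.0, 3.0], scale=0.5, size=(20, 3))

# Fit the simplest possible generative model: a multivariate
# Gaussian estimated from the real data (mean + covariance).
mu = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# Sample synthetic points that resemble the real data.
synthetic = rng.multivariate_normal(mu, cov, size=100)

# The augmented dataset combines real and synthetic samples,
# giving a downstream model more examples to train on.
augmented = np.vstack([real, synthetic])
print(augmented.shape)  # (120, 3)
```

A real pipeline would replace the Gaussian with a trained deep generative model and validate that the synthetic samples actually improve downstream performance, but the augmentation step itself looks the same.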
The list of AI applications, even within the life and medical sciences, could be very long, and I hope that these few give a flavour of how GMs will be integrated into the way we do science in years to come. However, it is equally important to note that the development of these transformative approaches poses certain challenges with varying degrees of concern. Here I list a few of the data- and science-focused challenges that I envisage we will face in the health and medical sciences and that must be addressed.

Challenges

  • Biased data. As with all models in machine learning and data science, GMs learn from the data used during their training. Any bias in the data is not only inherited and reflected in model performance, but can also be perpetuated and amplified, leading to unfair and discriminatory outcomes. This is likely to be a challenge for the life sciences because data from European and white ethnicities are overrepresented compared to other ethnic backgrounds. Ensuring fairness and mitigating bias in GMs is, therefore, crucial to prevent unintended negative consequences.
  • Data privacy and security. Models such as Aurora genAI with 1 trillion parameters, or even ChatGPT with its billions of parameters, require vast amounts of data for training. Protecting the privacy and security of sensitive information (often patients’ data in the case of the medical sciences) during data collection, storage, and model deployment is of utmost importance.
  • Interpretability and accountability. Machine learning models in general, and deep neural networks in particular, are often considered black boxes due to their complex inner workings. This will obviously be an even bigger issue for Aurora genAI because of its significantly larger number of parameters. Understanding how these models generate outputs and being able to interpret their decisions is crucial in scientific, and in particular medical, domains, where transparency and accountability are essential.
  • Reproducibility and open access. As models become more sophisticated, reproducing and validating their results becomes more challenging due to a variety of factors, such as lack of access to sufficiently large datasets or the necessary computing infrastructure. The scientific community must address the need for open-source frameworks, code sharing, and standardized evaluation metrics to ensure reproducibility and promote collaboration.

All in all, the rapid development of GMs holds great promise for scientific research and innovation in all disciplines, especially the life and medical sciences. From data augmentation and simulation to drug discovery and creative inspiration, these models are transforming various scientific domains. Aurora genAI is likely to speed up and broaden these applications. However, it also highlights the need to address challenges (such as bias, privacy, interpretability, and reproducibility) to ensure the ethical and responsible use of GMs in science. By overcoming these and other challenges, the scientific community can harness the full potential of GMs to unlock new discoveries, advance knowledge, and address complex challenges in human life and health, while ensuring that the risks and concerns are mitigated and kept under control.

You can listen to Hashem speaking about this post here: