Gen-AI at scale: From experimentation to industrialisation

This final article of the Generative-AI (Gen-AI) at scale series looks at the importance of fostering a culture of experimentation within organisations. Carl Prest, a data and AI specialist at Microsoft UK, explains how this accelerates the pace of innovation and embeds agility and adaptability into the organisational fabric.

As explored in the first article, harnessing the transformative power of Gen-AI in healthcare and life sciences starts with generating innovative ideas. That is then followed by the prototype phase, discussed in the second article.

As organisations go further along their Gen-AI journey, they often discover the power and efficiency of fostering a culture of experimentation. This shift signifies a profound realisation of how swiftly and effectively ideas can be translated into action through rapid prototyping. In fact, some individuals have remarked that conducting experiments within their organisations can be more efficient than engaging in lengthy discussions about potential use cases.

The methods for fostering this experimentation culture vary, tailored to suit each organisation’s unique needs and preferences.

For some, a quarterly business-led hackathon serves as an ideal platform to surface innovative ideas across diverse domains, followed by dedicated time to transform these ideas into tangible experiments. Others opt for a use case committee model, continually defining and prioritising a backlog of ideas that subsequently evolve into practical solutions.

At Microsoft, we would typically suggest organisations follow a process known as the Enterprise LLM Lifecycle. As depicted below, the initial phase involves a developer searching a model catalogue for large language models that align with specific business needs. The developer then uses a subset of data and prompts to explore the capabilities and limitations of each model through prototyping and evaluation, experimenting with different prompts, chunking sizes, vector indexing methods, and basic interactions to validate or refute business hypotheses.

After identifying and assessing the core capabilities of their chosen LLM, the developer moves to the next phase, which focuses on refining and optimising the LLM to better suit their requirements. The final phase involves transitioning the LLM from development to production. This includes deployment, monitoring, incorporating content safety systems, and integrating with continuous integration and continuous deployment (CI/CD) processes. This final output can then be put through the feedback cycle, which, as needed, would prompt additional building and augmentation and, where appropriate, further ideation.

This perpetual cycle of exploration forms the cornerstone of the Enterprise LLM Lifecycle, ensuring that the most promising experiments are nurtured and eventually deployed within the organisation.

LLM Ops

Borrowing heavily from and building on the concepts that are at the heart of ML Ops (machine learning operations), LLM Ops (large language model operations) forms the backbone of enterprise-wide LLM adoption and has emerged as a key consideration as organisations look to operationalise LLMs at scale.

ML Ops is a set of practices that aims to reliably and efficiently deploy and maintain machine learning models in production. It encompasses various tasks, including data pipeline management, model deployment, monitoring, and scaling. These practices ensure that machine learning models can be developed, tested, and delivered to production in a streamlined and automated manner.

Similarly, LLM Ops extends these principles to large language models, addressing the unique challenges they present, such as handling vast amounts of data, ensuring model performance, and managing computational resources. By implementing LLM Ops, organisations can more effectively integrate LLMs into their operations, ensuring robust, scalable, and efficient use of these powerful models, which is crucial for leveraging their full potential across various applications.

With this in mind, there are some essential components of robust ML and LLM Ops delivery:

Data prep: the criticality of data preparation remains paramount in both ML Ops and LLM Ops. Organisations must cleanse, transform, and ensure data accessibility before leveraging it with an LLM. This step underscores the importance of data quality and integrity in machine learning models’ interactions with data.
Training and testing vs. discovering and tuning: while LLMs come pre-built, offering organisations the advantage of readily available models, attention shifts towards prompt engineering and fine-tuning to ensure optimal performance.
Deploying: engineers proficient in deploying applications into production typically handle packaging and deploying LLM models in scalable containers. This deployment enables convenient access to LLMs and prompts via endpoints for inference, streamlining user interactions without necessitating complex infrastructure setup.
Model management and monitoring: monitoring LLM applications focuses on ensuring safe and high-standard model performance in production. Content safety systems play a pivotal role in detecting and mitigating misuse and unwanted content, safeguarding the integrity of LLM applications.

Organisations traverse various stages along the LLM Ops maturity journey, with the most mature enterprises embedding operational excellence and continuous improvement into the management of their LLM applications across the company.

Gen-AI Centre of Excellence

While LLM Ops extends beyond the technical to encompass people and processes, the Centre of Excellence (CoE) model provides a comprehensive framework for scaling Gen-AI successfully.

Direction and governance: a robust CoE should possess a keen understanding of business priorities, leveraging this insight to prioritise use cases effectively. This involves establishing a steering committee to set direction and a governance framework to identify executive sponsors for all use cases and define a sign-off approach.
Skills, knowledge and best practice: the Gen-AI CoE must select and nurture top talent, equipping individuals with the necessary training and skills to excel in their roles. Diverse skill sets, including functional expertise, solution design, program management, and responsible AI advocacy, are essential for CoE effectiveness. The CoE should curate both external and internal best practices, fostering knowledge sharing and adoption across the organisation.
Implementation approach with change management: a tailored implementation approach is pivotal for successful Gen-AI integration within an organisation. Agile product development supported by phased rollouts replaces traditional Big Bang deliveries, facilitating seamless solution adoption.
Responsible AI framework: a robust Responsible AI Framework provides ethical guardrails and principles for AI’s responsible use, ensuring adherence to ethical standards across Gen-AI solutions.
Measuring success: the Gen-AI CoE should establish success metrics aligned with organisational KPIs, whether focused on cost reduction, revenue generation, or other key performance indicators.

Achieving Gen-AI scalability across the organisation poses significant challenges, even for pioneers. Common pitfalls include overlooking the business impact of use cases and struggling to integrate Gen-AI solutions with existing business processes effectively.

The rapid pace of change in the Gen-AI landscape underscores organisations’ need to proactively embrace this transformative technology shift. As Gen-AI continues to evolve, staying abreast of emerging trends and opportunities is paramount to capitalise on its vast potential.

Carl Prest is a data and AI specialist at Healthcare and Life Sciences, Microsoft UK

Gen-AI at scale: From experimentation to industrialisation

Want news like this straight to your inbox?

Related articles

most popular