AI Plumbing: mapping the landscape

Author: Libby Kinsey Company: Digital Catapult Blog

Figure 1: AI Plumbing v1 (logos illustrative, not exhaustive)

“It may be surprising to the academic community to know that only a tiny fraction of the code in many machine learning systems is actually doing “machine learning”. When we recognize that a mature system might end up being (at most) 5% machine learning code and (at least) 95% glue code, reimplementation rather than reuse of a clumsy API looks like a much better strategy.”

 — “Machine Learning: The High-Interest Credit Card of Technical Debt” (Google)

Bringing a machine intelligence product or service to market is hard work. It involves a wide range of technologies, tooling and processes in addition to the ‘core’ machine intelligence code to build and deploy AI at scale. But what are they? Here we propose a model to think about these enabling technologies and principles which we collectively call ‘AI Plumbing’ (see note 1).

The diagram below illustrates the categories in our proposed AI Plumbing model and how they relate to one another. Broadly speaking, categories progress from left to right as the machine intelligence progresses from potential to utility: starting with planning and data sourcing/processing, and moving through R&D to product or service deployment. On the other axis, the stack is built up from the physical processors at the bottom, through various layers of infrastructure and software abstraction, to the initiatives and processes that are required to build ‘Responsible AI’ at the top.

Figure 2: AI Plumbing schematic diagram

Naturally, there is considerable overlap between categories in practice, and in particular, those offering ‘Machine Learning Platforms’, ‘DataOps’, and ‘MLOps’ services may cover a range of different infrastructure, technologies and techniques in multiple categories.

Responsible AI

“Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.”

 — Ian Malcolm in Jurassic Park (1993)

Responsible AI techniques, frameworks and initiatives aim to enable developers and companies to build ethical considerations into the heart of their products and services. This approach is sometimes called ‘ethical by design’, and we believe that there will be a competitive advantage for companies that consider ethical aspects in the design and monitoring of, and communication about, their AI-enabled products and services. Having good visibility of what your system is doing, and whether it is performing correctly, is part of being responsible. The ‘move fast and break things’ credo popularised by Silicon Valley technology firms is coming under increasing scrutiny.

However, putting the theory into practice is difficult as this is an evolving space and practical tools lag the theory. That’s why we think it’s important that Responsible AI is included as a layer of the AI Plumbing landscape, and why Digital Catapult has convened an ethics committee to test and develop frameworks and tools for evaluating and addressing the ethical implications of machine intelligence applications.


Planning

“If all you have is a hammer, everything looks like a nail”

 — English saying

Whilst AI and machine learning have been applied in many domains and verticals with great success, clearly they are not appropriate or feasible for every problem. Machine learning often has demanding prerequisites, in particular for the availability of data, and sometimes other approaches can yield similar results with less effort. In the planning stages, we should ask ourselves whether using a particular approach will add value, what resources will be required, how much they will cost, and how long it will take. These questions can be very tricky to answer and require expertise from many different domains. This category seeks to highlight tools and initiatives that can inform the process.

  • Information. Where to go for information about what is state of the art, what is feasible, and who’s doing what. Initiatives like AI Index and AI Progress Measurement track progress against specific tasks and domains, whilst directories such as CognitionX’s market intelligence platform tell you who’s doing what. That’s not to mention the wealth of freely available information from arXiv, blogs and meetups.
  • Tools, methodologies and know-how. These resources help you to understand what to think about when implementing an AI project, and offer tricks to help ensure success. Examples are the Machine Learning Canvas and Andrew Ng’s Machine Learning Yearning book.
  • Benchmarks and comparators. These help to choose the right tools and infrastructure for the job; benchmarks compare platforms, frameworks, specific hardware or cloud options for utility, performance, energy efficiency or cost. One recent initiative in this space is MLPerf.

Data technologies

“More data beats clever algorithms, but better data beats more data.”

 — Peter Norvig

It’s a truism that machine learning solutions require a lot of data. Getting hold of appropriate data, then annotating, processing, exploring and organising it, requires many different skills and tools, which are embodied in a thriving ecosystem of data technologies.

  • Source. Acquisition of data from data marketplaces, through scraping, open data initiatives or simulation (e.g. Ocean Protocol, Import.io, Kaggle Datasets, Gazebo).
  • Annotate. Whether through crowdsourcing, active learning or other approaches, various technologies and platforms exist to label training data (e.g. Figure Eight, Mighty AI).
  • Process. These technologies assist in cleaning, transforming, augmenting and collating data (e.g. Enigma, Trifacta, Pandas).
  • Explore. Tools for visualising and exploring data can help to identify issues, understand biases, inspire and extract features, and debug model training (e.g. Tableau, Zegami).
  • Organise. The database or data ‘engine’ technologies for organising and querying data or knowledge efficiently and scalably (e.g. Grakn, Geospock, Cloudera).
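
To make the ‘Process’ step a little more concrete, here is a minimal sketch of the kind of cleaning these tools automate: normalising text fields, dropping duplicate records and imputing missing values. It uses only the Python standard library rather than any of the products named above, and the column names, the sample rows and the choice of median imputation are all assumptions made for illustration.

```python
import csv
import io
from statistics import median

# Hypothetical raw export: stray whitespace, inconsistent case,
# a missing reading and a duplicated record.
RAW = """id,city,temperature_c
1, London ,14.5
2,london,
3,Bristol,16.0
3,Bristol,16.0
"""

def clean(raw_csv: str) -> list[dict]:
    """Normalise text fields, drop duplicate ids and impute missing temperatures."""
    rows, seen = [], set()
    for row in csv.DictReader(io.StringIO(raw_csv)):
        row_id = int(row["id"])
        if row_id in seen:  # keep only the first record for each id
            continue
        seen.add(row_id)
        temp = row["temperature_c"].strip()
        rows.append({
            "id": row_id,
            "city": row["city"].strip().title(),
            "temperature_c": float(temp) if temp else None,
        })
    # Impute missing readings with the median of the observed ones.
    observed = [r["temperature_c"] for r in rows if r["temperature_c"] is not None]
    fill = median(observed)
    for r in rows:
        if r["temperature_c"] is None:
            r["temperature_c"] = fill
    return rows

cleaned = clean(RAW)
```

In a real pipeline each of these choices (which records count as duplicates, how to fill gaps) would need to be justified against the data, since they can introduce biases of their own.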

Model training and evaluation

“Great predictive modelling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing.”

 — Jeremy Howard

This is the ‘ML code’ category, which covers what happens once appropriate data has been acquired and/or there is a hypothesis to test. For data-driven approaches, this will involve routines for partitioning the data and possibly for feature engineering. Also included are tasks such as model selection and hyperparameter tuning (e.g. Hyperopt, SigOpt), techniques for combining models, and for evaluating performance and utility (e.g. TensorBoard).
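
As a rough illustration of partitioning and hyperparameter tuning, the sketch below splits a synthetic dataset into training and validation sets and runs a plain grid search over one regularisation parameter. It is deliberately simple and is not how the tools named above work internally (several of them use more sophisticated search strategies); the data, the single-feature ridge model and the candidate values are assumptions chosen so that the example runs with the standard library alone.

```python
import random

def split(data, val_fraction=0.25, seed=0):
    """Shuffle a copy of the data and partition it into training and validation sets."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def fit_ridge(train, lam):
    """Closed-form ridge fit of y = w * x (single feature, no intercept)."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)

def mse(w, rows):
    """Mean squared error of the model y = w * x on the given rows."""
    return sum((y - w * x) ** 2 for x, y in rows) / len(rows)

# Synthetic data: y = 3x plus Gaussian noise (purely illustrative).
rng = random.Random(42)
data = [(x, 3.0 * x + rng.gauss(0, 0.5))
        for x in (rng.uniform(-1, 1) for _ in range(200))]

train, val = split(data)

# Grid search over the regularisation strength, scored on held-out data only.
candidates = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]
scores = {lam: mse(fit_ridge(train, lam), val) for lam in candidates}
best_lam = min(scores, key=scores.get)
best_w = fit_ridge(train, best_lam)
```

The important design point is that the candidates are compared on data the model was not fitted to; a third, untouched test set would normally be kept back for the final performance estimate.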

Research and experiment management tools are used to automate experiment pipelines, track model performance, and store model/data telemetry (e.g. Comet, ModelDB, Weights & Biases, Quilt). The plethora of tools available ranges from frameworks for self-service (e.g. TensorFlow, PyTorch), to machine learning platforms (e.g. Dataiku, Bonsai), through to full automation (known as ‘AutoML’).

Inference and deployment

“Pickles are for Delis, not Software”

 — Alex Gaynor

This category covers the software and standards involved in taking a trained model, integrating inference into a deployment pipeline, and then testing, monitoring and maintaining it. Models need not be wholly proprietary: they may utilise third-party IP or APIs such as IBM Watson, Agorai or Clarifai.

Models may need to be compressed or the code optimised for efficient and economic deployment. They may need to be converted into formats (e.g. ONNX) that promise interoperability across hardware platforms, or into forms suitable for use with specific application platforms (e.g. Core ML, Algorithmia).
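
One reason interchange formats matter is that language-specific serialisation can be unsafe as well as unportable: Python’s own documentation warns that unpickling untrusted data can execute arbitrary code. The sketch below shows the general idea of a data-only model format, using JSON for a toy linear model. It is not ONNX or any real standard; the field names, the `SCHEMA_VERSION` tag and both helper functions are hypothetical.

```python
import json

SCHEMA_VERSION = 1  # hypothetical version tag for this illustrative format

def export_model(weights, bias):
    """Serialise a linear model to a JSON string containing only data, no code."""
    return json.dumps({
        "schema_version": SCHEMA_VERSION,
        "model_type": "linear",
        "weights": list(weights),
        "bias": bias,
    })

def load_model(payload):
    """Validate the payload and return a prediction function."""
    spec = json.loads(payload)
    if spec.get("schema_version") != SCHEMA_VERSION or spec.get("model_type") != "linear":
        raise ValueError("unsupported model format")
    weights, bias = spec["weights"], spec["bias"]

    def predict(features):
        if len(features) != len(weights):
            raise ValueError("feature count does not match the model")
        return sum(w * x for w, x in zip(weights, features)) + bias

    return predict

payload = export_model([0.5, -1.0], bias=2.0)
predict = load_model(payload)
prediction = predict([4.0, 1.0])  # 0.5 * 4.0 - 1.0 * 1.0 + 2.0
```

Because the payload describes the model rather than containing executable objects, any runtime that understands the schema can load it, and a malformed or unexpected file is rejected instead of being executed.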

Once in production, models need to be monitored to ensure that they continue to do what they were designed for (and to avoid unintended biases or other consequences). The ability to update a model must be envisaged in the deployment design, whether through whole-model replacement or active learning etc.
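
One simple way to check that a model’s inputs or scores still resemble what it saw during development is to compare their distributions over time. The sketch below computes a Population Stability Index (PSI) between a reference sample and a live sample; it is one monitoring technique among many, and the bin count, the sample data and the alerting threshold of 0.2 are assumptions that would need tuning for any real application.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range live values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

reference = [i / 100 for i in range(100)]      # scores seen during development
unchanged = [i / 100 for i in range(100)]      # live scores with the same distribution
shifted = [0.5 + i / 200 for i in range(100)]  # live scores concentrated in the upper half

DRIFT_THRESHOLD = 0.2  # assumed alerting threshold, not a universal constant
stable = psi(reference, unchanged) <= DRIFT_THRESHOLD
drifted = psi(reference, shifted) > DRIFT_THRESHOLD
```

A check like this only flags that something has changed; deciding whether the change matters, and whether to retrain or replace the model, remains a judgement for the team operating it.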


DevOps

“Research solutions that provide a tiny accuracy benefit at the cost of massive increases in system complexity are rarely wise practice. Even the addition of one or two seemingly innocuous data dependencies can slow further progress. Paying down technical debt is not always as exciting as proving a new theorem, but it is a critical part of consistently strong innovation.”

 — “The ML test score: A rubric for ML production readiness and technical debt reduction” (Google)

Deploying an AI or machine learning solution presents several additional hurdles above and beyond those presented by traditional software engineering. Well-resourced “AI first” companies like Twitter, Uber, and Facebook have developed sophisticated proprietary DevOps environments (DeepBird 2, Michelangelo and FBLearner Flow respectively) for automating, scaling, and optimising the building and deployment of machine intelligence products and services. These environments might include tools to handle collaboration, portability, and reliability, and to manage data flows and server requests. Infrastructure configuration, management and monitoring tools (e.g. Terraform) belong in this category, along with containers (Docker, Shifter, Singularity), distributed training (Horovod), scheduling (Dask, Polyaxon) and security. Commercial platforms include Peltarion, Seldon, Valohai and ClusterOne, and there are various mature open-source components such as Apache Spark and Kubernetes.


Infrastructure

“Facebook’s products and services are powered by machine learning. Powerful GPUs have been one of the key enablers, but it takes a lot more hardware and software to serve billions of users”

— John Morris, ZDNet, “How Facebook scales AI” 

This category is for servers, storage and networking technologies and services. These can be bought, or rented from cloud providers (who often provide services in other AI Plumbing categories too). Specialist providers will build workstations, servers and storage solutions specifically for deep learning (Lambda Labs, Pure Storage Blade) or focus on renting out machines that are dedicated to these workloads (FloydHub, Crestle).

Infrastructure also includes High Performance Computing, interconnect technologies like NVLink and InfiniBand, and ‘hyper-converged’ compute and storage architectures.


Hardware

“Developing deep learning models is a bit like being a software developer 40 years ago. You have to worry about the hardware and the hardware is changing quite quickly… ”

 — Phil Blunsom, Oxford University and DeepMind

This layer is for the physical processors used for the data, machine learning or inference workloads. They may be for very specific (ASIC, FPGA) or for more general (CPU, GPU, TPU, IPU) workloads; for training only, inference only, or both; for use in the data centre or at the edge. Machine intelligence has caused a renaissance in silicon chip innovation and may eventually provide commercial opportunity for more experimental quantum, neuromorphic, memristor and optical processing paradigms.

For more in this layer see Digital Catapult’s Machines for Machine Intelligence research report.

What’s next?

We’d like our model to be useful for machine intelligence practitioners who want to take innovation into production, and we therefore invite feedback on the ‘work in progress’ presented here. In particular, does the AI Plumbing landscape make sense? Are we missing something? Where would your company fit in? What would you like to know more about?

Thanks to Anat Elhalal (@anat_elhalal), Marko Balabanovic (@balabanovic), Daniel Staff (@DanielStaffUK), Daniel Justus (@Daniels_Data), Michaela Muruianu (@Michaela_MM_), and Peter Bloomfield (@Pete_Bloomfield) for their comments and suggestions on this post.

Note 1: We called our landscape ‘AI’ Plumbing, but we acknowledge that it is skewed towards machine learning because that is where the ‘plumbing’ has diverged from traditional software engineering. ‘Responsible AI’ of course applies regardless of technique.