Two Faces of Model Evaluation and Validation in Machine Learning


In the field of data science, and more specifically Machine Learning (ML), you might expect that certain foundational concepts and terms are well established and commonly understood. After working in this domain for years, I can point to many examples where this is not the case.

When I moved to ML from another technical domain, I was surprised by the number of new concepts, and especially by the new meanings given to common terms like “training” a model. The typical ML workflow involves collecting data, training and evaluating a model on that data, deploying the model to produce sound predictions, and monitoring the model’s performance in production. One topic that confused me as I was starting out was figuring out what people meant by model evaluation and model validation. Even within the field of practice, these terms can have different meanings.

It means one thing from the point of view of the model developer.

Model developers care about the performance of the model.

Evaluating a model during the training phase is one way that a model developer can build trust in their model. This involves analyzing how the trained model responds to data it has not seen before. In doing this, the model developer is trying to answer the question: what makes a robust machine learning model for this use case?

This perspective focuses on the model performance

Evaluation is enabled by segmenting data to build the model and run experiments

Generally speaking, there are two categories of approaches for evaluating an ML model, both of which involve segmenting the available data into sets for running training experiments. The most common method segments the data into training, validation and test sets. The training set is used to build the initial ML model, and the validation set is used to tune the model’s hyperparameters (think knobs and switches) and select the best model. Finally, the test set is used to estimate how the final model will perform when new or unseen data is fed to it for predictions. A model developer then uses metrics to quantify the performance of the ML model based on the type of problem they want to solve. For example, is it a supervised or unsupervised ML model? If the model is supervised, what techniques were applied (e.g., classification or regression)? A quick search on model evaluation and model validation surfaces informative articles and courses from stellar practitioners who provide overviews of performance measures; a few that come to mind are listed in the references below.

From these publications and courses you can learn about common performance metrics used for classification (e.g., accuracy, confusion matrix, log loss, AUC, F-measure) and regression (e.g., root mean squared error, mean absolute error).
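To make the developer-side workflow concrete, here is a minimal sketch of a three-way data split plus a few of the classification metrics just mentioned. This is my illustration rather than anything prescribed in the sources above: the synthetic dataset, the logistic regression model and the 80/20 and 75/25 split ratios are all assumptions chosen for demonstration, using scikit-learn.

```python
# Illustrative sketch: train / validation / test split and common
# classification metrics with scikit-learn (synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, log_loss, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# Hold out 20% as a final test set, touched only once at the end.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Split the remainder 75/25 into training and validation sets,
# i.e. 60% / 20% / 20% of the original data overall.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

# Fit on training data; the validation set is where hyperparameters
# (the "knobs and switches") would be tuned and models compared.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

val_pred = model.predict(X_val)
val_proba = model.predict_proba(X_val)[:, 1]
print("accuracy:", accuracy_score(y_val, val_pred))
print("f1:", f1_score(y_val, val_pred))
print("log loss:", log_loss(y_val, val_proba))
print("AUC:", roc_auc_score(y_val, val_proba))
print("confusion matrix:\n", confusion_matrix(y_val, val_pred))
```

Only after model selection is complete would the same metrics be computed once against the held-out test set to estimate performance on unseen data.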

The model developer perspective is useful but it focuses mostly on the model, not the data or the application.

Machine learning models are just one component in a software application.

When I started out in data science, the primary users worked in research and development. AI was not yet in the mainstream. This first description of model evaluation and validation was my foundation of understanding these terms, until one day I started hearing them used in other, much more expansive contexts.

It means something broader from the point of view of the model validator.

Model validation is concerned with the integrity of the data, model and the application, especially in regulated sectors like banking.

As I branched out to learn about AI/ML applications in specific industries, model evaluation and validation began to take on different meanings. The banking or financial services sector topped the list for practice areas with an expanded view of model evaluation and validation. Financial Institutions have established Model Risk Management (MRM) practices that need to be supplemented with new knowledge and insights to meet the demands of AI/ML.

Compliance plays a key role for defining model evaluation and validation in banking.

Model validation takes on a broader meaning when you consider the context of use.

Applying Machine Learning in the banking sector opens up opportunities to use models with more accurate predictive power and insights. Compared to traditional models used in the banking industry, however, ML models are less transparent and more complex. This makes it harder to meet the sector’s stringent regulatory requirements, since various regulations spell out what companies must do to ensure their models are in compliance. In this context, the definition of model validation extends beyond model performance to the upstream data and even to downstream model documentation and usage scenarios.

The person who is responsible for model validation is not the developer

Model validators need to be independent of the model development

Another distinction lies in who is responsible for model validation. In the most basic ML workflows, data scientists or model developers engage in model evaluation and validation steps to improve the performance of the model. In the banking sector, by contrast, model validation is performed by internal (or external) reviewers working to understand how fit the model is for its intended use against specified criteria, like regulations.

These perspectives are different but complementary.

Regulations that are concerned with the context of use for AI/ML models are the key differentiator in these perspectives.

I don’t want to paint a picture that these two perspectives are at odds; they are complementary. In fact, you can’t reach the more stringent description of model evaluation and validation until you satisfy the model developer’s view. Both the model-developer and the model-validator views of evaluation and validation are grounded in a well-performing model. The more expansive view gets you an application that is fit for use in its designated context according to one or more regulatory or business requirements. These distinctions are useful to me now because I understand which perspective to anchor my practice in.


References

Biswas, P. (2021, Jun. 7). AI/ML Model Validation Framework: It’s More Than a Simple MRM Issue. Towards Data Science.

Chauhan, N. (2020, May 28). Model Evaluation Metrics in Machine Learning. KDnuggets.

Intro to Machine Learning, Lesson 4: Model Validation. Kaggle.

Khalusova, M. (2019, May 8). Machine Learning Model Evaluation Metrics. AnacondaCON.

Mutuvi, S. (2019, April 16). Introduction to Machine Learning Model Evaluation. Heartbeat.

Blais, O. (2020, Mar. 20). The Comprehensive Guide to Model Validation Framework: What Is a Robust Machine Learning Model? ODSC Blog.

Srinivasa, C., & Castiglione, G. (2020, Dec. 2). Model Validation: A Vital Tool for Building Trust in AI. Borealis AI Blog.

Validation of Machine Learning Models: Challenges and Alternatives. Protiviti Blog.

ML Engineers: Partners for Scaling AI in Enterprises

credits: author copyright

Enterprises across many industries are adopting AI and ML at a rapid pace. There are many reasons for this accelerated adoption, including the need to realize value from the massive amounts of data generated by multi-channel customer interactions and from the increasing stores of data across all facets of an enterprise’s operations. This growth prompts a question: what knowledge and skill sets are needed to help organizations scale AI and ML in the enterprise?

To answer this question, it’s important to understand what types of transformations enterprises are going through as they aim to make better use of their data.

Growing AI/ML Maturity 

Many large organizations have moved beyond pilot or sample AI/ML use cases within a single team, to figuring out how to solidify their data science projects and scale them to other areas of the business. As data changes or gets updated, organizations need ways to continually optimize the outcomes from their ML models. 

Mainstreaming Data Science 

Data Science has moved into the mainstream of many organizations. People working in various line-of-business teams like product, marketing and supply chain are eager to apply predictive analytics. With this growth, decentralized data science teams are popping up all over a single enterprise. Many of the people looking to apply predictive techniques have limited training in data science or knowledge of the infrastructure fundamentals for production-scale AI/ML. Additionally, enterprises are faced with a proliferation of ad-hoc technologies, tools and processes.

Increasing Complexity of Data 

Having achieved some early wins, often with structured or tabular data use cases, organizations are eager to derive value from their massive amounts of unstructured data, including language, vision and other modalities. One role that organizations are increasingly turning to for help in meeting these challenges is the Machine Learning Engineer.

What is a Machine Learning Engineer?

I have observed that as organizations mature in their AI/ML practices, they move beyond Data Scientists toward hiring people with ML Engineering skills. A review of hundreds of Machine Learning Engineer job postings sheds light on why this role is one way to meet the transformative needs of the enterprise. For example, examining the frequency of certain terms in the free text of the job postings surfaces several themes:


ML Engineers are closely affiliated with the software engineering function. Organizations hiring ML Engineers have achieved some wins in their initial AI/ML pilots and they are moving up the ML adoption curve from implementing ML use cases to scaling, operationalizing and optimizing ML in their organizations. Many job postings emphasize the software engineering aspects of ML over the pure data science skills. ML Engineers need to apply software engineering practices and write performant production-quality code. 


Enterprises are looking for people with the ability to create pipelines or reusable processes for various aspects of the ML workflow. This involves both collaborating with Data Engineers (another in-demand role) and creating the infrastructure for robust data practices throughout the end-to-end ML process. In other words, ML Engineers create processes and partnerships to help with cleaning, labeling and working with large scale data from across the enterprise. 


Many employers look for ML Engineers who have experience with the end-to-end ML process, especially taking ML models to production. ML Engineers work with Data Scientists to productionize their work: building pipelines for continuous training, automated validation and version control of models.
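As a rough illustration of the "continuous training, automated validation and version control" idea, here is a hypothetical pipeline step an ML Engineer might wire up. Everything here is an assumption for demonstration, not a standard: the 0.9 accuracy gate, the timestamped file naming scheme, and the use of scikit-learn and pickle for the model artifact.

```python
# Hypothetical sketch: a pipeline step that retrains a model, gates it
# on a validation metric, and writes a versioned artifact plus metadata.
import json
import pickle
import time

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_validate_version(X, y, accuracy_gate=0.9, out_dir="."):
    """Train, validate against a gate, and save a versioned model."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    accuracy = accuracy_score(y_val, model.predict(X_val))

    # Automated validation: refuse to publish a model below the gate,
    # so a bad retrain never silently reaches production.
    if accuracy < accuracy_gate:
        raise ValueError(
            f"validation failed: accuracy {accuracy:.3f} < {accuracy_gate}")

    # Simple version control: timestamped artifact plus a metadata
    # record describing what was published and how it scored.
    version = time.strftime("%Y%m%d-%H%M%S")
    with open(f"{out_dir}/model-{version}.pkl", "wb") as f:
        pickle.dump(model, f)
    with open(f"{out_dir}/model-{version}.json", "w") as f:
        json.dump({"version": version, "val_accuracy": accuracy}, f)
    return version, accuracy
```

In a real deployment this step would typically run inside an orchestrator on a schedule or data trigger, with the artifact pushed to a model registry rather than the local filesystem.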


Many ML Engineers are hired to help organizations put the architecture, systems and best-practices in place to take AI/ML models to production. ML Engineers deploy ML models to production either on cloud environments, or on-premise infrastructure. The emphasis on systems and best practices helps to drive consistency as people with limited Data Science or infrastructure fundamentals learn to derive value from predictive analytics. This focus on systematizing AI/ML is also a critical prerequisite for developing an AI/ML governance strategy. 

This qualitative analysis of ML Engineering jobs is not based on a single job posting, nor on one specific to the enterprise I work in; it reflects a qualitative evaluation of general themes across publicly available job postings for ML Engineers – a critical role for enterprises scaling AI/ML.

What do ML Engineers work on?

  • Big data 
  • Complex problems
  • Driving insights
  • Realizing business value
  • Large scale – projects impacting millions of users
  • Establishing best practices

credits: author copyright

How do ML Engineers work? 

  • Collaboration with many roles
  • Cross-business and cross-function
  • Software development practices
  • Agile
  • Leveraging best practices

In what teams do ML Engineers work?

Within enterprises, ML Engineers reside in a variety of teams including Data Science, Software Engineering, Research & Development, Product Groups, Process/Operations and other business units.

What industries seek talent to help productionize ML?

While demand for ML Engineers is at an all-time high, several industries are at the forefront of hiring for these roles. The industries with the highest demand for ML Engineers include computers and software, finance and banking, and professional services.

As AI and Machine Learning continue to grow and mature as a practice in enterprises, Machine Learning Engineers play a pivotal role in helping to scale AI/ML usage and outcomes. ML Engineers enable Data Scientists to focus on what they do best by establishing the infrastructure, processes and best practices to realize business value from AI/ML models in production. This is especially the case as data volumes and complexity grow.

Challenges and Opportunities for AI & ML in Scientific Discovery

credits: Adobe Stock

Looking around, there are many examples of scientific communities working to understand how to draw benefits from AI and ML. Today, AI drives cross-disciplinary scientific discovery across many domains like climate change, healthcare, space science and material science. There are increasing examples of AI applied to drug discovery.

Renewed global interest in understanding how AI and ML can improve the process of scientific discovery is taking shape. One of the more visible examples is DeepMind’s AlphaFold project, where AI techniques used by a multidisciplinary team accelerated new scientific discoveries. DeepMind predicted 3D structures of proteins at such a rapid pace and scale that it had a profound impact on one of the hardest problems in biology.

What specific challenges are driving the scientific community to explore AI now? Several of these are:

Running scientific experiments can be time consuming. It takes hundreds of hours to prepare, conduct and evaluate results from experiments especially in cases where there is a broad space of variables to explore.

Difficulties handling large quantities of data. The volume of data produced for scientific discovery is vast and many teams have not fully developed the capacity to analyze this amount of data. Some examples of this include astronomy where telescoping images can capture millions of stars and biology-based discovery where microscopes can capture molecular-scale processes and details.

Complex data. In addition to the volumes of data, scientific communities are often dealing with complex types of data. Research teams may need to capture many parameters like color, shape, size, relationships and other details from advanced sensing devices and instruments.

Lack of metadata. Once all this data is captured, it is often unusable because the metadata was not captured along with it. Missing or incomplete metadata about experimental conditions (e.g. temperature, pressure, sample composition and orientation) makes neural network training difficult.

Cost and time for data acquisition. Production of data with many laboratory or advanced instruments is costly and time consuming. Many data acquisition instruments are specialized and require significant training, setup and maintenance.

Compute power. Many teams lack the computational power to do complex analysis.

Collaborative science. Another challenge for scientific communities is the need to support research collaborations.

For almost all of these challenges, there are tooling and technology developments underway, both within the open source community and from enterprise data science platform providers. For example:

Enhancing instruments. AI can enhance how powerful scientific tools work, regardless of the scale of the subject – solar system or molecule.

Data processing. AI can automate data processing steps like ingestion, cleaning and joining of large data sets.

Experiment design. AI can transform the process of experimentation, helping scientists improve measurement strategies – essentially pinpointing which samples to explore and which details to capture. Several research projects apply generative AI/ML techniques to help reduce the problem or solution space.

Collaboration. Improvements in these areas can help distributed teams of scientists collaborate on data and experiments.

Automation. One final opportunity where AI can help the scientific process is in automation using robotics. One university lab created a robotic “lab assistant” that automated many mundane laboratory tasks. This assistant even operated during the Covid-19 pandemic, when social distancing prevented in-person lab work.

In Conclusion 

These are just a few examples of how the scientific community can benefit from AI/ML. There is a lot the data science community can do to help facilitate adoption in this relatively nascent but potentially impactful area.