Two Faces of Model Evaluation and Validation in Machine Learning


In the field of data science, or more specifically Machine Learning (ML), you might expect that certain foundational concepts and terms are well established and commonly understood. After working in this domain for years, I can point to many examples where this is not the case.

When I moved to ML from another technical domain, I was surprised by the number of new concepts and especially the new meanings for common terms, like “training” a model. The typical ML workflow involves collecting data, training and evaluating a model on that data, deploying the model to produce sound predictions and monitoring the model’s performance in production. One particular topic that confused me as I was starting out was what people meant by model evaluation and model validation. Even within the field of practice, these terms can have different meanings.

It means one thing from the point of view of the model developer.

Model developers care about the performance of the model.

Evaluating a model during the training phase is one way that a model developer can build trust in their model. This involves analyzing how the trained model responds to data that it has not seen before. In doing this, the model developer is trying to answer the question: What is a robust machine learning model for this case?

This perspective focuses on the model performance

Evaluation is enabled by segmenting data to build the model and run experiments

Generally speaking, there are two categories of approaches for evaluating an ML model; both involve segmenting the available data into sets to run training experiments. The method that appears to be the most common segments the data into training, validation and test sets. The training set is used to build the initial ML model, and the validation set is used to tune the model’s hyperparameters (think knobs and switches) and select the best model. Finally, the test set is used to learn whether the final model will perform similarly when new or unseen data is fed to it for predictions. A model developer will then use metrics to quantify the performance of an ML model based on the type of problem they want to solve. For example, is it a supervised or unsupervised ML model? If the model is supervised, what techniques were applied (e.g., classification or regression)? A quick search on model evaluation and model validation surfaces informative articles and courses from stellar practitioners who provide overviews of performance measures. A few that come to mind are listed in the references at the end of this article.

From these publications and courses you can learn about common performance metrics used for classification (e.g., accuracy, confusion matrix, log loss, AUC, F-measure) and regression (e.g., root mean squared error, mean absolute error).
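To make the developer's view a bit more concrete, here is a minimal sketch in plain Python of the two ideas above: segmenting data into training, validation and test sets, and then scoring predictions with a few of the metrics mentioned. The function names, the 60/20/20 split and the fixed random seed are my own illustrative assumptions, not something prescribed by the articles referenced; in practice you would more likely lean on a library such as scikit-learn.

```python
import math
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows, then cut them into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever is left over
    return train, validation, test


def confusion_counts(y_true, y_pred):
    """Confusion-matrix counts for a binary problem with labels 0 and 1."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["tn" if truth == 0 else "fn"] += 1
    return counts


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    c = confusion_counts(y_true, y_pred)
    return (c["tp"] + c["tn"]) / len(y_true)


def f_measure(y_true, y_pred):
    """F1 score: the harmonic mean of precision and recall."""
    c = confusion_counts(y_true, y_pred)
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rmse(y_true, y_pred):
    """Root mean squared error for a regression model."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error for a regression model."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


if __name__ == "__main__":
    train, validation, test = split_dataset(range(10))
    print(len(train), len(validation), len(test))  # 6 2 2

    # Made-up labels and predictions, only to exercise the metrics.
    print(f_measure([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))  # 0.75
    print(mae([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))  # 1.0
```

The point of keeping the test set separate is that it is only touched once, after the hyperparameters have been tuned against the validation set, so the final score is not flattered by the tuning process.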

The model developer perspective is useful but it focuses mostly on the model, not the data or the application.

Machine learning models are just one component in a software application.

When I started out in data science, the primary users worked in research and development. AI was not yet in the mainstream. This first description of model evaluation and validation was my foundation of understanding these terms, until one day I started hearing them used in other, much more expansive contexts.

It means a bigger thing from the point of view of the model validator.

Model validation is concerned with the integrity of the data, model and the application, especially in regulated sectors like banking.

As I branched out to learn about AI/ML applications in specific industries, model evaluation and validation began to take on different meanings. The banking or financial services sector topped the list of practice areas with an expanded view of model evaluation and validation. Financial institutions have established Model Risk Management (MRM) practices that need to be supplemented with new knowledge and insights to meet the demands of AI/ML.

Compliance plays a key role for defining model evaluation and validation in banking.

Model validation takes on a broader meaning when you consider the context of use.

Applying Machine Learning in the banking sector opens up opportunities to use models with greater predictive power and insights. Compared to traditional models used in the banking industry, ML models are less transparent and more complex. This makes it very difficult to meet the stringent regulatory requirements of the banking sector, where various regulations spell out what companies need to do to ensure their models are in compliance. In this case, the definition of model validation extends beyond the model performance to the upstream data and even to the downstream model documentation and usage scenarios.

The person who is responsible for model validation is not the developer

Model validators need to be independent of the model development

Another distinction lies in who is responsible for model validation. In the most basic ML workflows, data scientists or model developers would engage in model evaluation and validation steps in an effort to improve the performance of the model. In the case of model validation in the banking sector, it would be internal (or external) people working to understand how fit the model is for use against specified criteria like the regulations.

These perspectives are different but complementary.

Regulations that are concerned with the context of use for AI/ML models are the key differentiator in these perspectives.

I don’t want to paint a picture that these two perspectives are at odds; they are very complementary. In fact, you can’t get to the more stringent description of model evaluation and validation until you satisfy the view of the model developer. Both the model developer and the model validator views of model evaluation and validation are grounded in a well-performing model. The more expansive view gets you an application that is fit for use in the designated context according to one or more regulatory or business requirements. These distinctions are useful to me now because I can understand which perspectives to anchor my practice in.


Biswas, P. (2021, Jun. 7). AI/ML Model Validation Framework. It’s more than a simple MRM issue. Towards Data Science.

Chauhan, N. (2020, May 28). Model Evaluation Metrics in Machine Learning. KDnuggets, 11(9), 907-923.

Intro to Machine Learning Lesson 4: Model Validation. Kaggle.

Khalusova, M. (2019, May 8). Machine Learning Model Evaluation Metrics. Anacondacon.

Mutuvi, S. (2019, April 16). Introduction to Machine Learning Model Evaluation. Heartbeat.

Blais, O. (2020, Mar. 20). The Comprehensive Guide to Model Validation Framework: What is a Robust Machine Learning Model? ODSC Blog.

Srinivasa C., Castiglione, G. (2020, Dec. 2). Model Validation: a vital tool for building trust in AI. Borealis AI Blog.

Validation of Machine Learning Models: Challenges and Alternatives. Protiviti Blog.

Transformation and Adaptation in Data Science: Emergence of the Citizen Data Scientist

Role Related Research

One trend that I had been cautiously curious about is the emerging Citizen Data Scientist role. Citizen Data Scientist refers to an advanced data analysis professional or data professional who wants to do more than basic data analysis: they want to create or generate models that leverage predictive or prescriptive analytics or machine learning. This individual typically has a primary job function that is outside of the field of statistics and analytics. I conducted a literature review about Citizen Data Scientists and learned that while Gartner created the term several years ago, there are conflicting perspectives about the definition, the scope and, truthfully, the value of this role (Wilson, 2020). After examining multiple definitions of what a Citizen Data Scientist is, I learned that this role is significant mostly for what it helped me to understand about organizational change and adaptation.

A Closer Look: What Is a Citizen Data Scientist? 

One reference described Citizen Data Scientists as software power users who can do moderate data analysis tasks. They don’t replace expert data scientists. Instead, they use software features like drag-and-drop tools, prebuilt models, and data pipelines to create models without code (GetApp, 2019).

Several references describe Citizen Data Scientists as a new type of business analyst with diverse business responsibilities. They apply sophisticated analytical tools and complex methods of analysis (mostly around big data) to improve business results, all without the training or assistance of Data Scientists or IT team members (Blais, 2020).

While there are differences in the definitions of the role, some commonalities include:

  • Citizen Data Scientists come from Lines of Business
  • Role resides somewhere between a Business Analyst and Data Scientist
  • Performs more complex analysis than a Business Analyst
  • Rely on specialized tools, software and building blocks from more advanced roles

Where do Citizen Data Scientists reside in an organization?

Figure 1: Relationship of Citizen Data Scientists to Other Roles

What are the skills of a Citizen Data Scientist?

Citizen Data Scientist is not a role that organizations are looking to fill from external sources. Most of the references about the role noted that you can’t find job postings for it. Spoiler Alert: I actually did find one job posting that accurately reflected some of the common definitions of Citizen Data Scientist. But that single job posting aside, the statement is generally true. Very simply stated, Citizen Data Scientists “do cool things with data” (Ghosh, 2021). I learned that there are basically four core skills of a Citizen Data Scientist, although the references do not agree on how involved this role is in model development and coding. Citizen Data Scientists’ primary and secondary skills include:


  • Work with large data sets (Big Data)
  • Data preparation, algorithms and queries
  • Document the data extraction process
  • Ability to visualize data


  • Knowledge of data models and statistics
  • Coding / proficiency in at least one programming language
  • Modify, create and deploy predictive models
  • Create data science models using advanced and predictive analytics


  • Strong analytical skills
  • Ability to perform fairly complex data analysis
  • Power users of business applications (e.g., familiarity with spreadsheets)


  • Document their findings and communicate them to business staff, Business Analysts, Business Intelligence (BI) and IT leaders

Why all the hype around Citizen Data Scientists? 

Some people may be quick to dismiss this role and the hype around it, mostly because they are a bit confused about the title. I initially associated Citizen Data Scientist with the term “Citizen Scientist”, which suggested more of a hobbyist. For these reasons, it did not seem relevant to the primarily large enterprise contexts that I have historically worked in. However, digging under the hood to learn what factors led to its emergence uncovered some interesting findings.

The primary driver for the emergence of this role is the high demand for, and shortage of, trained Data Scientists (Maffeo, 2019). Gartner predicted that “the number of Citizen Data Scientists will grow five times faster than the number of highly skilled data scientists” (Patel, 2020). The projected growth and supply of people who may be called Citizen Data Scientists are larger than the pool of Data Scientists. Companies are exploring different adaptations in people, technology, process and other areas to meet those needs (Arora 2021, Tibco, VentureBeat).

Digital transformation initiatives have impacted every aspect of how organizations do business today. These data-driven changes have led to more and more business leaders turning to Citizen Data Scientists to fill the gap between the demand for data and analytics and the limited supply of skilled data scientists in the market today (Tibco).

Big Data. Behind most of these transformation initiatives is the need to generate insights from large quantities of data. There are cost and time impacts associated with working with bigger and more complicated datasets, and a new skill set is needed to meet these challenges. As the data gets larger and more complex, it takes longer to get data for reports and analysis. In many industries, there may also be changes in reporting requirements that add a layer of complexity to the task of working with Big Data (Chmiel, 2021).

Business Context. Data Scientists don’t always have the business context for their work to have maximum impact. Knowledge of the business context is where a Citizen Data Scientist adds the greatest value.

Experimentation. With all this data, enterprise business users will want to experiment and try different hypotheses.

Career Growth. For some people, the idea of progressing towards a Citizen Data Scientist role seems like a more attainable path to becoming a Data Scientist.

What does this role need to succeed in an organization?

In order to meet these complex needs, there are a number of conditions that need to be in place. 

Accessible Tooling. Various sources describe Citizen Data Scientists as somewhere between a business user using self-service analytics and a data scientist who is well versed in advanced analytics. Given the skill limitations, Citizen Data Scientists need augmented or new self-service tools to do big data analytics or augmented analytics. Several resources refer to the need for self-service “point-and-click” or “drag-and-drop” tools. New tools need to be easy to use, taking into account the lack of skills such as coding, statistics and automation (pipelines). Additionally, these tools need to make accessing data easier. With these new tools, developers need to be more conscious of the human-computer interaction requirements.

Collaboration. Because Citizen Data Scientists are primarily in the lines-of-business, they can benefit by working closely with more formal Data Scientists in the organizations. Data Scientists will continue to work on advanced analysis and statistics. Additionally, they can create processes, tools and infrastructure to support Citizen Data Science (Sakpal 2021, Tibco).

Invest in Training. Organizations need to invest in reskilling or upskilling people who take on Citizen Data Scientist roles. Given the lack of agreement on the primary and secondary skills, organizations will need to do a thorough skills assessment to understand the full scope of the skills. Additionally, the Data Scientist role will need to include process development for Citizen Data Scientists and an approach to validate the quality (QA) of the Citizen Data Scientist’s outputs. Because of the organization-specific differences, Citizen Data Scientist training cannot take a one-size-fits-all approach. In exploring several of the companies that surfaced during this literature review, I learned that a number were investing in training initiatives.

Organizational Policies. Because of the changes in people, technology and process, organizations need to be deliberate and establish policies to make this new model of data science work (Blais). For example, there need to be appropriate levels of transparency and sharing as the number of people doing aspects of data science increases (Tibco, Chmiel 2021).

Business Leadership. For business leadership, it’s important to be aware of these changes and to deliberately create the conditions that help Citizen Data Scientists work (SAS).

What are the impacts and benefits of cultivating Citizen Data Scientist Roles?

Benefits to Individuals

At least one author believes that this is a viable career choice for someone who wants to get into Data Science without the “time and expense” of an advanced degree (Arora, 2021).

Benefits to Organizations

Gartner predicted that in 2019, this role would have a larger impact on businesses than traditional Data Scientists (Datarobot). Experienced or skilled data scientists will be able to focus on more complex problems. Smarten talks about data democratization initiatives which can “optimize the time and resources of Data Scientists and improve productivity, empowerment and accountability for business users” (Smarten Blog, 2021). The emergence of the Citizen Data Scientist role enables the functional areas to draw the most benefit from data given the limited Data Scientist resources, and to do so in a way that lets IT maintain the most appropriate level of process and security over the systems.

I mentioned earlier that company policies and relationships will need to change. This will also impact the IT organization, which has to make the bulk of the policies and procedures. Security and governance policies will be more critical.

Benefits to Industry Overall

As ML becomes simpler to use, more of the companies that have not yet started their AI/ML journey will begin applying data science and machine learning.

What does this all mean?

I explored some of the implications for individuals, organizations and even industries but what does this mean in the context of UX? Taking the time out to understand a user role that is outside of my normal scope was a valuable exercise. It helped me to bust some biases I had that prevented me from seeing past the title. It also highlighted the critical role that HCI and UX can play in the enterprise context. In this exploration of the Citizen Data Scientist role, I learned that perhaps the confusion about the title / role was a signal of important transformations and corresponding adaptations that organizations need to make.

How AI Transforms the User Experience – Personalization

I have worked at the intersection of UX and AI for several years and my focus has been on understanding the enterprise side of the equation. Even with this focus, there is a great need to understand and plan for the total experience of those ultimately impacted by this sweeping technology, not just the enterprise developers. Several designers who have written about the UX of AI spend their time focused on interaction patterns and the impacts of AI on the design profession, with less insight on broader aspects of the human element. This is what I would like to address more. But first some context.

What powers AI?

AI already impacts daily life in many ways, often unknown to most people. AI is a driving force for some of the largest companies across the globe. Whether you are receiving a price on an online purchase, viewing a list of results from a search engine, ordering a shared ride, getting approved for an online transaction or finding a new song through a recommendation — AI is there influencing or driving many parts of those daily interactions.

AI algorithms are fed by massive amounts of data to find patterns that enable your interactions with various companies. Through this complex network of data and algorithms, AI removes limits in learning. People then apply these data and algorithms to interesting goals. For example, some data science researchers are exploring ways to use AI to automate user research. They experiment with techniques like understanding sentiment and classifying feedback, so far with limited success in deriving truly meaningful and actionable impacts.

What is AI-enabled personalization?

At the highest level, two of the leading drivers for AI adoption are delivering a better customer experience and helping employees to get better at their jobs. Some AI futurists believe that “consumers will demand even more customized experiences – giving rise to hyper-personalization and greater customer experience within the e-commerce sector” (Weissgraeber, 2021). AI makes it possible to realize something that has eluded marketers and product owners for years: large-scale personalization, with true one-to-one experiences for all. Note that this is very different from the mass-customization concept of the early 2000s, which focused on giving users the agency to choose the design or product configuration that they desired.

What are some examples of AI-enabled personalization?

Here are some current and future examples of personalization and the reasons they need a stronger focus on the user experience:

Retail and eCommerce – AI will continue to empower “hyper-personalized” eCommerce experiences. Companies like Salesforce realize this level of personalization by applying personalized recommendations and by providing tools that enable web developers to customize the layout of content that users see. AI in design enables user-centered design at “an extreme level of granularity”: the concept of designing for every single person (personalization at scale).

Entertainment – One example in media and entertainment is how Netflix combines user behavior modeling with recommendations to offer each viewer a personalized viewing experience. AI provides for more efficient classification, tagging and recommendations of existing and newly generated content like online videos (Rao, 2017). 

Dialog interfaces are an interesting case of an AI-enabled solution that is supposed to drive better user experiences in customer service. While this is a complex area dominated by data scientists, people recognize the need to focus on basic UX principles and people-centered design, such as giving the user a way to bypass an automated system that is not working out and reducing the user’s cognitive load. While this is a noble goal, most deployments of dialog interfaces are centered on driving cost benefits for the service provider rather than driving truly beneficial user experiences.

Healthcare, Personalized Medicine and Precision Medicine. In the comparatively AI-nascent field of healthcare, the future prospects are that diseases are more quickly and accurately diagnosed, drug discovery is sped up and streamlined, virtual nursing assistants monitor patients and big data analysis helps to create a more personalized patient experience (Thomas, 2019).

These are just a few examples of how AI-enabled personalization is impacting life today and will in the future. While the proposed benefits are many, there are tremendous risks that make an emphasis on the human element of this powerful technology even more critical.

What are the risks of increased AI-enabled personalization?

Privacy sacrificed for convenience. The convenience of more personalized experiences requires even greater access to personal data. This raises the tremendous risk of encroaching on personal privacy through the mixing of data and techniques.

Security. With increased access to personal data there is always the looming risk of security breaches related to the massive amounts of data that need to be accessed and stored.

Loss of Personal Agency or Control. Some designers focus on hiding the AI from users as a way to minimize exposure to complexity. Lack of visibility makes it hard to understand or challenge the outcomes. This raises the question: Does too much personalization inhibit discovery? Will a loss of that skill for discovery inhibit people’s opportunity to experience serendipity?

Bias and Discrimination. We already know that companies leverage third-party cookies and complex algorithms to track users’ online activities and that this information is used to serve different prices to different customers. With little commercial guidance on fairness or transparency in pricing and other automated decisioning processes, how can we ensure people are not penalized based on personal characteristics like gender, race or economic status?

Less Human Interaction, More Automation. While AI will enable retail and entertainment experiences that are more personalized, this increased personalization raises the risk of decreasing interpersonal interactions.

What are some future research directions?

I have not yet found much research or many good examples of how end users are engaged in identifying the benefits of this new technology. Most user research focuses on improving the interaction with technology, not on finding the real benefit to the end user, who generally does not have agency in whether or not to use the technology. What is the prospect for personalization driven by AI if the benefit to humans is not clear?


Álvarez Sánchez, G. (Date Unspecified) AI + UX: The real value of UX for artificial intelligence. Grupodot Agencia Blog.

Anderson, J., Lee, R. (2018, December 10) Artificial Intelligence and the Future of Humans. Pew Research.

CBC Marketplace. (2017, November 24) Exposing price discrimination in online shopping. CBC News.

Clark, J. (2019, Nov 5) AI is Your New Design Material. Presentation from Amuse UX Conference.

Corby, S. (2021, May 13) How to be competitive in the age of AI. CEO Magazine.

Goswami, D. (2018, Nov 7) How Will Artificial Intelligence Impact the Future? It’s Up To Us. Triple Pundit.

Guszcza, J. (2018, January 22) AI Needs Human-Centered Design. Wired Magazine.

Havasi, C. (2019, May 10) Beyond ‘citizen data science’: the need for user-centric AI design. Information Age.

Kolstø, E., Raedler, R. (Date Unspecified) Exploring a UX-Centered AI Design Process for Creating Successful Human and Machine Dialog Interactions. UT Austin.

Rao, A. (2017) Sizing the prize: What’s the real value of AI for your business and how can you capitalise? PwC.

Salesforce (2020, October 20) The future of user experience design starts with AI. FastCompany.

Taulli, T. (2019, April 27) Artificial Intelligence (AI): What About The User Experience?. Forbes Magazine.

Thomas, M. (2019, June 8). The Future of AI: How AI will Change the World. Built In.

Verganti, R., Vendraminelli, L., Iansiti, M. Working Paper: Design in the Age of AI.

Weissgraeber, R. (2021, February 18). Four Ways AI And Machine Learning Will Drive Future Innovation and Change. Forbes Magazine.

Zaghdoudi, S., Glomann, L. (Jan 2021) AI-enabled user research. Advances in Artificial Intelligence, Software and Systems Engineering (pp. 187-193). DOI:10.1007/978-3-030-51328-3_27