Measuring Learning: Asking The Right Questions

Less Content, More Impact!

For workplace learning professionals, the Learning-Transfer Evaluation Model (LTEM) provides a framework to align their analytics approach with the depth of learning evaluation they wish to achieve.

Learning-Transfer Evaluation Model For Measuring Learning

The Learning-Transfer Evaluation Model, developed by Will Thalheimer, presents a nuanced approach for measuring and evaluating the effectiveness of workplace learning. In this eight-tier approach, each tier can answer certain questions with confidence, while other answers would be speculative at best (answered with low confidence). LTEM can be instrumental in shaping analytics strategies, enabling analysts to pose relevant questions, measure the right outcomes, and acknowledge the limitations of their metrics.

Why Not Use Kirkpatrick Level 4 Evaluation?

If you’re currently using the Kirkpatrick model and it’s working for your organization, keep doing it. This article may be irrelevant for you. Often, the challenge I’ve seen is not the model itself but the implementation. Namely, relying on level 1 data and assuming that a knowledge assessment at the end of a course counts as a level 2 evaluation.

Learning without transfer (that is, application on the job) is an investment with low returns. Therefore, from early on, we (learning business partners, learning designers, developers, SMEs, etc.) need to focus on what happens after any learning experience. Each role must understand what they can do within their own scope to get to more effective outcomes! That’s the reason I’ve been using LTEM for measuring learning in the workplace: it helps learning designers understand the impact of every single design choice they make and the message it sends, while it also serves as a baseline for stakeholders to understand what we can and cannot answer confidently at each tier (and what data we need to collect to do that).

What Questions Can We Answer Confidently At Each Tier?

Here’s how each tier can guide analytics for measuring learning, and the kind of questions we can confidently answer, as well as those we cannot.

Tier 1: Attendance

At the base of the LTEM is attendance, where analytics can only confirm registration, enrollment, participation, access, or completion. Showing up does not mean paying attention, learning anything, having any intent to apply it, or making any difference on the job.

  • Confidence
    We can determine the reach of our training program, for example. We can compare enrollment through an A/B test of options such as on-demand and live versions. We can segment enrollment and completion based on regions or leaders. We often use this tier combined with data collected from other tiers to find insights.
  • Limitation
    We cannot infer if the learning event had any cognitive impact or behavioral change, let alone make any predictions on performance.
  • Data
    Our data strategy document defines whether we track users anonymously (counts only, without identifiable information) or by their ID. If the decision is to anonymize data, we still need to determine whether we care about individual tracking or just aggregated data. Be clear with your stakeholders that, for example, we cannot report unique user counts with confidence if we’re not tracking individual users.
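
To make those choices concrete, here is a minimal sketch in Python (pandas, with hypothetical column names and values) of the kind of tier 1 reporting described above: enrollment and completion segmented by region and delivery format, aggregated so that no identifiable information appears in the output.

# Minimal sketch of tier 1 attendance analytics; the column names and
# values are hypothetical placeholders for an LMS/LRS enrollment export.
import pandas as pd

records = pd.DataFrame(
    {
        "user_id": ["u1", "u2", "u3", "u4", "u5"],
        "region": ["EMEA", "EMEA", "APAC", "APAC", "AMER"],
        "delivery": ["on-demand", "live", "on-demand", "on-demand", "live"],
        "completed": [True, False, True, True, False],
    }
)

# Report only aggregates, so no identifiable information leaves this step.
summary = (
    records.groupby(["region", "delivery"])
    .agg(enrolled=("user_id", "count"), completed=("completed", "sum"))
    .assign(completion_rate=lambda d: d["completed"] / d["enrolled"])
)
print(summary)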

Tier 2: Activity

Moving beyond mere attendance, the activity tier measures engagement. Engagement, however, has to be clearly defined in the data strategy document because it is one of the most misunderstood labels. Assumptions can be costly later in the project! A learner engages in the following activities related to learning, which can be measured in some way:

  • Measures of attention
    An inadequate metric for validating learning success, because learners may pay attention but not learn.
  • Measures of interest
    An inadequate metric for validating learning success, because learners may show interest but not learn.
  • Measures of participation
    An inadequate metric for validating learning success, because learners may participate but not learn.

In practice, we tend to focus on measuring three types of activities related to learning:

  1. Physical
    What users do (all user interface interactions fall in here)
  2. Affective
    How users feel about the activity, what emotions they experience.
  3. Cognitive
    How much they learn, reflect, and apply.

Ideally, you design a balance between these three domains. Otherwise, you may end up with “highly-interactive” clicky-clicky-drag-next activities that keep users awake, but they report frustration and no relevant learning. Or, you may end up with the most entertaining video script everyone talks about, yet they may not even remember what they were supposed to learn.

  • Confidence
    We can tell people are awake. We can gauge engagement, and identify which activities capture attention more than others. We can tell what options most users go with (will they download the PDF, or watch the animation?). We can tell if anyone ever visits those “important” resource links your team worked on.
  • Limitation
    We cannot conclude that engagement equates to learning or retention. If users are absolutely bored, it is unlikely they will learn, let alone perform on the job. However, just because they are engaged (especially if your stakeholder prefers more “fun” activities to score good feedback points), it does not mean they’re learning or planning to apply any of it.
  • Data
    Think Google Analytics! Anything that is an action can be captured in this tier. We use this tier to make data-informed decisions on how to improve learning or performance support content based on actual use. We can determine how time spent on activities matches the design. We can tell where users drop off. This tier can tell you about your “content performance.”

We also use data from this tier in more sophisticated adaptive paths to determine next steps or personalize the user’s journey. The smallest unit of learning is not completing a course! Any captured data point can be part of the logic that determines a user’s path. We use xAPI for this level of advanced decision-making and predictive analytics.
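
As an illustration, here is a minimal sketch of what capturing one activity data point as an xAPI statement could look like in Python. The LRS endpoint, credentials, and activity ID are placeholders, not a recommendation for any specific platform.

# Minimal sketch: send one tier 2 activity event to an LRS as an xAPI statement.
# The endpoint, credentials, and activity ID below are placeholders.
import requests

statement = {
    "actor": {"mbox": "mailto:learner@example.com", "name": "Sample Learner"},
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/interacted",
        "display": {"en-US": "interacted"},
    },
    "object": {
        "id": "https://example.com/activities/product-pitch-scenario",
        "definition": {"name": {"en-US": "Product pitch scenario"}},
    },
    "result": {"duration": "PT2M30S"},  # ISO 8601 duration: 2 minutes 30 seconds
}

response = requests.post(
    "https://lrs.example.com/xapi/statements",      # placeholder LRS endpoint
    json=statement,
    headers={"X-Experience-API-Version": "1.0.3"},  # required by the xAPI spec
    auth=("lrs_key", "lrs_secret"),                 # placeholder credentials
)
response.raise_for_status()

Once statements like this accumulate in the LRS, the same data can feed the adaptive logic and drop-off analysis described above.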

Tier 3: Learner Perceptions

The learner perceptions tier focuses on subjective feedback related to the learning experience. Surveys and interviews can provide insights into learner comprehension, motivation, and perceived support. There are two ways to collect data at this tier. The traditional Kirkpatrick level 1 approach is very common (which we found interesting, but not helpful in finding actionable insights early on), and there is also the more performance-focused approach (which gives you specific data points on confidence, intent to use, self-efficacy, and anticipated support/barriers).

  • Confidence
    We can ascertain the learners’ confidence, motivation, and perceived support.
  • Limitation
    We cannot ensure that positive perceptions correlate with effective learning or application of skills.
  • Data
    The “traditional” approach often uses a Likert scale where users select their perceived satisfaction, for example, between 1 and 5. This data then gets aggregated and averaged, and dashboards can show a single number as an indicator of success. This approach rarely leads to any decision-making unless some disaster happens.
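
The sketch below, with hypothetical responses, shows why that single averaged number hides signal: two cohorts can share the same mean while their response distributions tell completely different stories.

# Minimal sketch: the same Likert mean can come from very different distributions.
from collections import Counter
from statistics import mean

# Hypothetical 1-5 satisfaction responses from two cohorts.
cohort_a = [3, 3, 3, 3, 3, 3, 3, 3]          # everyone neutral
cohort_b = [1, 1, 5, 5, 1, 5, 5, 1]          # strongly polarized

for name, responses in [("Cohort A", cohort_a), ("Cohort B", cohort_b)]:
    distribution = dict(sorted(Counter(responses).items()))
    print(name, "mean:", round(mean(responses), 2), "distribution:", distribution)

Both cohorts average 3.0, but only the distribution reveals that cohort B is split between delighted and frustrated learners, which is exactly the kind of insight a single dashboard number obscures.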

The performance-focused approach provides more practical insights that we can take directly to stakeholders about what to expect in terms of transfer. I strongly suggest experimenting with your own version based on Will Thalheimer’s book.

Tier 4: Knowledge

The knowledge tier metrics look at the learner’s ability to recite information. This is one of the most misunderstood concepts in workplace learning practice. Let me illustrate. When you tell the stakeholders that you will design a knowledge check at the end of the module, here’s what they understand by that: “A knowledge check is great because otherwise, learners may not pay attention. It is also good to know that they will know what to do on the job after the training.”

Now, a lot of “knowledge checks” I’ve encountered in my 15+ years working across many corporate learning teams are more about remembering what was said previously on a slide than about checking whether participants would be able to apply the knowledge using the proper skills later on the job. This is one of the reasons we decided to use LTEM.

Simple fact recall during, or at the end of, a learning event is not adequate for measuring learning. Using Cathy Moore’s action mapping along with LTEM, we can explain to SMEs and stakeholders the dangers of the “illusion of learning.” If the fact is crucial, we need to use it in an authentic scenario or task for practice. If it is important but there’s support for recall on the job (like a checklist), then use the support.

  • Confidence
    We can assess immediate recall of information and terminology.
  • Limitation
    We cannot confirm if a user will remember the same fact on the job later, let alone apply this piece of knowledge in practical contexts.
  • Data
    This tier is often associated with true-or-false questions or multiple-choice assessment items. Item analysis can tell you more about the design of the assessment and reveal weak spots in the content. For example, if there’s a clear pattern of incorrect recall across users (assuming the fact is crucial to memorize), designers can adjust the course.
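
Here is a minimal sketch of that kind of item analysis with hypothetical response data: item difficulty (the share of correct answers) flags suspiciously hard or trivially easy items, and a simple discrimination estimate shows whether an item tracks overall performance.

# Minimal sketch of basic item analysis on hypothetical assessment results.
# 1 = correct, 0 = incorrect; one row per learner, one column per item.
import pandas as pd

responses = pd.DataFrame(
    {
        "item_1": [1, 1, 1, 0, 1, 1],
        "item_2": [0, 0, 1, 0, 0, 0],   # suspiciously hard: review wording or content
        "item_3": [1, 1, 1, 1, 1, 0],   # nearly everyone correct: may not discriminate
    }
)

difficulty = responses.mean()            # proportion of correct answers per item
total_score = responses.sum(axis=1)      # each learner's overall score

# Rough discrimination estimate: does the item correlate with overall performance?
discrimination = responses.apply(lambda item: item.corr(total_score))

print(pd.DataFrame({"difficulty": difficulty, "discrimination": discrimination}))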

Tiers 5 and 6 are where all learning design should aim its assessments! Sometimes, as a learning designer, you feel you don’t have the opportunity to make huge changes to the already-decided solution agreed with stakeholders. One thing you can always do: move from fact recall to decision-making competence. This is where using a consistent measurement and evaluation framework can influence design before any learning happens. What gets measured, gets done. What gets measured, gets designed for.

    • What’s the difference between decision-making and task competence?
      Breaking out decision-making and task competence is important for both learning design and performance support. While these two actions often seem to happen at the same time, there is a huge difference between them.
      I was once working with salespeople across the country on selling a new product. They all had the skills to sell, as they were chosen from the top performers. However, the new product required some adjustment in the execution of their sales pitch. Everyone passed the scenario-based digital assessment that was required before the live event. They all knew what to do and how to say things. And yet, the first round of pitches in real practice showed the difference between knowing exactly what to do (decision-making) and the practice of executing (task).

Tier 5: Decision-Making Competence

This tier examines the learner’s capability to make decisions based on scenarios reflecting real-life situations. Analytics can measure the accuracy and quality of these decisions.

  • Confidence
    We can evaluate decision-making competence in controlled scenarios. Users will know what to do and how to do it (in theory).
  • Limitation
    We cannot guarantee that this competency translates to the execution of the decision.
  • Data
    Authentic, job-relevant scenarios can provide meaningful contexts where users make decisions based on the theoretical knowledge they gained and the perceived skills they acquired. The data can be used for tracking individuals throughout their journey, including automatic path suggestions or complexity adjustments. The data can reveal insights about larger patterns (content validation, ambiguity, critical stop, etc.) that can be used for digital coaches to provide individualized sessions or pull a whole cohort together for a critical course correction.

Decision-making competence does not have to stop at the end of a learning event. You can run campaigns over time, with role-based, skill-specific challenges to detect knowledge and skill gaps before they cause any performance issues.
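
As an illustration of how scenario data can drive those next steps, here is a minimal sketch of rules-based routing. The field names, thresholds, and step labels are assumptions made for the example, not part of LTEM itself.

# Minimal sketch: route learners based on decision-scenario results.
# Field names, thresholds, and step labels are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ScenarioResult:
    learner_id: str
    critical_decisions_missed: int   # e.g., missed compliance or safety steps
    score: float                     # 0.0-1.0 across all decision points

def next_step(result: ScenarioResult) -> str:
    """Return the next recommended step in the learning path."""
    if result.critical_decisions_missed > 0:
        return "schedule-coaching-session"
    if result.score < 0.8:           # assumed mastery threshold
        return "assign-targeted-practice-scenarios"
    return "unlock-task-simulation"

print(next_step(ScenarioResult("u42", critical_decisions_missed=0, score=0.72)))
# -> assign-targeted-practice-scenarios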

Tier 6: Task Competence

Task competence goes beyond decision-making to evaluate task execution, either immediately or after a time lag to measure retention. You can think of it as a scale of authenticity, from role-play simulations to monitored task execution on the job.

  • Confidence
    We can verify if users can perform tasks and make decisions in a simulated environment, or remember how to do so after a period on the job. Checking can happen through self-evaluation with a worked example, peer reviews, AI-assisted measurement, or expert feedback.
  • Limitation
    We cannot ascertain the persistence of these competencies in the long term, or their adaptability to changing work situations.
  • Data
    Ultimately, the goal is to get as close as possible to realistic performance, with authentic tools, processes, and circumstances, while providing scaffolded support. The data insights gained from the process, as well as from the output (ideally, measured against authentic evaluation criteria), are the closest we can get to predictive analysis and personalized interim support for transfer.
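
As a rough sketch of scoring against authentic evaluation criteria, here is a hypothetical weighted rubric that also records whether the task was executed independently or with scaffolded support. The criteria, weights, and threshold are illustrative assumptions.

# Minimal sketch: weighted rubric scoring for an observed task.
# Criteria, weights, and the independence threshold are illustrative assumptions.
RUBRIC_WEIGHTS = {"followed_process": 0.4, "output_quality": 0.4, "independence": 0.2}

def score_task(observation: dict) -> dict:
    """Weighted rubric score plus a flag for assisted vs. independent execution."""
    score = sum(RUBRIC_WEIGHTS[criterion] * observation[criterion] for criterion in RUBRIC_WEIGHTS)
    return {
        "score": round(score, 2),
        "independent": observation["independence"] >= 1.0,  # 1.0 = no scaffolding needed
    }

# Hypothetical observation: each criterion rated 0.0-1.0 by a reviewer, peer, or AI assist.
print(score_task({"followed_process": 0.9, "output_quality": 0.8, "independence": 0.5}))
# -> {'score': 0.78, 'independent': False}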

Note that tiers 4 through 6 would be part of Kirkpatrick level 2: learning. The granular breakdown ensures learning designers and SMEs can measure and evaluate learning, adjust, and iterate if needed. However, high-level stakeholders don’t need that level of granularity, so reporting on “learning, behavior, and performance” would be more appropriate.

Tier 7: Transfer

Transfer assesses the application of learned skills to work tasks. It distinguishes between assisted transfer (with support) and full transfer (independent application).

  • Confidence
    We can track the application of skills in the workplace and differentiate between supported and independent applications. This is really important from a managerial or coaching support perspective, because users may need more hand-holding in the beginning to ensure success in the long term. This is especially true for new hires.
  • Limitation
    We cannot measure the holistic impact of these skills on broader work outcomes without further data.

Realistic transfer must be part of the learning design strategy. Assuming that successful completion of training equals behavior change and appropriate, long-term skills growth in performance is simply wishful thinking.

Tier 8: Effects Of Transfer

The effects-of-transfer tier measures the broader impact of learning, including consequences for the organization, the community, and beyond. Analytics must employ rigorous methods to assess causal relationships and both positive and negative effects.

  • Confidence
    We can identify and verify the wider impacts of learning transfer, using robust and often complex analytical methods.
  • Limitation
    It is challenging to isolate the effects of training from other variables influencing these wide-ranging outcomes. The most effective measurement projects start backward with a clear business problem, followed by supporting performance KPIs and their driving behaviors, followed thereafter by an analysis of barriers to these behaviors, before we ever get to talk about learning, let alone training and content.

Conclusion: Actionable Insights For Measuring Learning

Here are some actionable insights across LTEM tiers:

  1. Adopt multitiered analytics
    Implement analytics at multiple tiers to capture a comprehensive picture of learning effectiveness.
  2. Balance leading and lagging indicators
    Use early tiers for leading indicators of engagement and later tiers for lagging indicators of learning transfer and impact.
  3. Integrate qualitative and quantitative data
    Combine subjective learner perceptions with objective measures of knowledge and task competence.
  4. Measure transfer over time
    Track the application of skills beyond the immediate post-training period to assess long-term transfer. Make sure you also track whether the transfer is supported (assisted transfer) or unsupported (full transfer).
  5. Assess impact holistically
    Use sophisticated analytics to evaluate the wider effects of learning, considering the organizational and societal impact. For example, look at team dynamics, transfer rate, or other signs of team-level performance and engagement level changes, rather than just individual skills.
  6. Leverage technology for advanced analytics
    Employ learning analytics platforms that can measure and analyze data across all tiers of LTEM. Use real-time data analytics for adaptive learning or predictive needs.
  7. Communicate analytics clearly
    Present data to stakeholders in a way that reflects both the potential and the limitations of the learning interventions and their impact on performance.
  8. Continuously improve analytics practices
    As analytics tools and methods evolve, so should the strategies for measuring learning effectiveness. If you’re comfortable only with learning data that you have control over, start there but make sure you’re clear about the limitations. Once your data literacy and analytical skills grow, you can expand and iterate to include pre- and post-assessment comparisons, effect size analysis, and learning versus performance change correlation/causation data stories.

In the workplace, the ultimate goal is rarely learning, and not even learning transfer. It is doing, and doing it well under specific limitations. That is why showing the value of L&D starts with measuring the right things.
