The Consumer Neuroscience Company
05. February 2020

NeuroMethod 3: Stationary testing of ads

Ad testing is one of the first uses of neuromarketing. But how is it done, and how can you use neuro data to boost your campaigns?

One of the hallmarks of neuromarketing is the use of eye-tracking and brain scanning to test ad responses. To benefit from this, one needs a solid understanding of how to interpret the results and how to use them to build better ads and marketing strategies. Here, we provide a walkthrough of how a neuromarketing TV ad test actually works, and how the results should be interpreted and acted upon.

A brief recap of the advantages

It has been around 20 years since neuromarketing was first defined and used, and although many things have changed since then, many aspects remain the same as in the early days. Crucially, ad testing in stationary environments has been the hallmark method that neuromarketing vendors have offered, and even in academic consumer neuroscience studies, stationary test protocols have been the standard approach.

By “stationary testing” we mean testing that occurs in an environment where test participants are sitting still and facing a monitor. All testing is conducted in this environment, and the participant is often even instructed to not move too much, to avoid loss of data from the eye-tracker or other measures.

A stationary neuromarketing ad test is often performed with an eye-tracking bar under the screen, and with EEG equipment to measure cognitive and emotional responses as they unfold.

The added value of this testing was soon obvious. Neuromarketing offered a toolkit that provided a new and unparalleled level of detail about consumers’ responses to ads. While more traditional methods such as surveys, interviews and focus groups could say something about liking, associations, and memory, these methods could not measure responses as they unfolded.

Other methods have been tried, such as the use of a “dial” or a joystick where participants report their level of ad liking second by second. However, this method has failed for several reasons, the main one being that introspecting on how we feel second by second interferes with our true emotional responses. It basically changes how we experience the ad, and the test becomes highly artificial and unrealistic.

So how exactly is a test performed when we want to test an ad, such as a TV commercial, in a stationary environment? Below is a walkthrough of the steps taken, how the results look, and how we can use these results. Here, we will use the 4-power model of ad testing that Neurons has developed with Stanford University. You can read more about this model here.

How a stationary neuromarketing ad test study is run

As shown above, the stationary test consists of the participant facing a screen and seeing ads and other materials. Importantly, as Neurons focuses on having as high ecological validity as possible, we embed the ads as natural breaks when the participant is watching a documentary. We also usually tell the participant a cover story that we are interested in their responses to the documentary/movie, which has been recorded from a live broadcast (hence, ads were included).

The study usually takes around one hour and goes through the following steps:

  1. Recruitment: we focus on recruiting the right sample based on the customer segment. A minimum sample size of 30 participants is used for a mixed-gender sample with a relatively narrow range of age and other variables (e.g., education, affluence). To compare across further variables such as affluence level and age group, the sample size must be multiplied by the number of groups. A typical study has around 120 participants.
  2. Greeting and initial test: upon arrival, the participant is greeted and their recruitment criteria are double-checked. They then receive full information about the study and sign an informed consent form.
  3. Setup and benchmark: to ensure that the data is valid and reliable, the participant is set up with the eye-tracking and EEG equipment. A set of calibration tasks is also undertaken, such as eye-tracking accuracy calibration and EEG signal test and benchmark tasks.
  4. Study execution: ad exposure typically takes place while the participant is watching other TV content, such as a documentary or an episode. Ads are inserted as part of natural breaks, and the ad order is pseudo-randomized across participants to minimize order effects.
  5. Distraction task: to “cleanse” participants’ minds of recent ads and other content, a few cognitively demanding tests are undertaken.
  6. Post-study tests: at the end of the study, participants go through a series of stepwise tests, including:
    • NeuroEquity: this test is used to assess direct brand emotions after ad exposure. It is often also run as part of the calibration at the beginning of the session, which allows a pre-post comparison to understand whether the ad has had an impact on brand emotions.
    • free recall (“do you remember seeing any ads today?”), for ad and brand.
    • category cued recall (“do you remember seeing any ads for food today?”), for ad and brand.
    • recognition (“do you recognize this ad from today?”, showing one ad at a time), for ad and brand.
    • liking score (for recognized ads: “what do you think about this ad?”).
    • associations (for all ads: “what words come to mind when you see this ad?”), both for spontaneous associations, and ranking of predefined associations.

Together, these steps are used to gather ad responses in a highly standardized manner. This allows us to measure what people see, how they respond, what they remember, and what they feel afterward.
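To make the recruitment and ad-order steps more concrete, here is a minimal Python sketch. The minimum of 30 participants per group comes from the recruitment step above. Everything else is an assumption for illustration: the function names, the example of four groups, and the rotation scheme, which is just one simple way to pseudo-randomize ad order and not necessarily the procedure used in our studies.

```python
import random

MIN_PER_GROUP = 30  # minimum sample per group, as stated in the recruitment step


def required_sample_size(n_groups):
    """Scale the minimum sample by the number of groups to be compared."""
    return MIN_PER_GROUP * n_groups


def rotated_orders(ads, n_participants, seed=0):
    """Return one ad order per participant (hypothetical scheme).

    A single shuffled base order is rotated by one position per participant,
    so over every len(ads) consecutive participants each ad appears exactly
    once in each position.
    """
    rng = random.Random(seed)
    base = list(ads)
    rng.shuffle(base)
    orders = []
    for i in range(n_participants):
        shift = i % len(base)
        orders.append(base[shift:] + base[:shift])
    return orders


# Hypothetical design: 2 affluence levels x 2 age groups = 4 groups.
print(required_sample_size(4))  # 120

# Placeholder ad names; each of the three orders starts with a different ad.
for order in rotated_orders(["Ad A", "Ad B", "Ad C"], 3):
    print(order)
```

With four groups, the sketch lands on the typical sample of around 120 participants mentioned above. The rotation keeps any single ad from always appearing first or last in a break.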

The ad to be tested: Oral-B

In this test, we will focus on an ad for Oral-B which was used as a filler ad during one of our ad tests in the US. The sample size was 30 participants from a mixed-gender sample.

The ad can be seen below:

While we will focus on neuromarketing scores, we also include data from post-test survey responses. Here, we can see that the ad performs slightly better than the benchmark data on both self-reported liking and brand emotions:

Ad liking and brand impact data from post-test survey responses.

These data suggest that the ad might perform decently on conscious responses. It remains to be seen whether it also performs well on subconscious responses of attention, emotion, and cognition.

The Stanford-Neurons 4-power model

To better understand and structure ad responses, we will rely on the 4-power model that we have developed with Professor Baba Shiv at Stanford University, and that is being prepared for scientific publication. The original work was to test longitudinal ad responses for Bonnier News over three rounds, and it has been used in multiple large-scale studies across the globe since. The 4-power model is now one of the cornerstones of Neurons’ neuromarketing ad testing work.

The figure below recaps the four main steps in the model:

The steps in the 4-power Stanford-Neurons model.

In the following, we will go through the results of the ad following each of the steps.

Stopping power: What did people see?

The first thing we do is to see how the ad performs overall:

The Stopping power of the ad is lower than optimal, with a z-score of -1.1 relative to the benchmark. This suggests that audience attention is not optimal and can be boosted through several means.
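For readers unfamiliar with z-scores: a z-score expresses how many standard deviations a score lies from the benchmark mean. The sketch below shows the arithmetic in Python. The benchmark values and the ad score are made-up numbers chosen only so that the result matches the -1.1 reported here. They are not actual data from the study, and the function name is hypothetical.

```python
from statistics import mean, stdev


def z_score(ad_score, benchmark_scores):
    """Distance of ad_score from the benchmark mean, in standard deviations."""
    mu = mean(benchmark_scores)
    sigma = stdev(benchmark_scores)  # sample standard deviation
    return (ad_score - mu) / sigma


# Made-up benchmark scores (mean 50, standard deviation 10) and ad score.
benchmark = [40, 50, 60]
print(round(z_score(39, benchmark), 2))  # -1.1
```

A negative value means the ad scores below the benchmark mean. A value of -1.1 places it a little more than one standard deviation below.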

Attention to the ad can then be broken down in two ways to dig a bit deeper into the analysis. First, we can run an Areas of Interest (AOI) analysis, in which we sample the amount of time that people spend looking at critical areas of the ad, such as the brand, product, and slogan. This can be shown as follows:

Visual attention to different regions of the ad, such as the slogan, brand, and product. The graph shows the percentage of participants that see the AOI (orange) compared to the Neurons benchmark (grey) for comparable ads. This shows that each of the logos is seen by only a few percent of the participants, which is lower than the benchmark. The only thing that receives a decent amount of attention is the text/slogan.
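As a rough illustration of the "percentage of participants that see the AOI" metric, the Python sketch below counts a participant as having seen an AOI if at least one gaze sample falls inside its rectangle. This is a simplifying assumption: production eye-tracking pipelines typically work on fixations with a minimum duration. The data layout, the coordinates, and the function name are all hypothetical.

```python
def aoi_seen_share(gaze_by_participant, aoi):
    """Percentage of participants with at least one gaze sample inside the AOI.

    gaze_by_participant: dict mapping participant id -> list of (x, y) pixels
    aoi: rectangle given as (left, top, right, bottom) in the same pixel space
    """
    left, top, right, bottom = aoi
    seen = sum(
        any(left <= x <= right and top <= y <= bottom for x, y in samples)
        for samples in gaze_by_participant.values()
    )
    return 100.0 * seen / len(gaze_by_participant)


# Made-up gaze samples for four participants and a made-up logo rectangle.
gaze = {
    "p1": [(100, 100), (900, 50)],
    "p2": [(300, 300)],
    "p3": [(10, 10)],
    "p4": [(950, 60)],
}
logo_aoi = (880, 20, 1000, 100)
print(aoi_seen_share(gaze, logo_aoi))  # 50.0
```

The same function can be run once per AOI (slogan, brand, product) and the results set side by side with benchmark percentages.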

While AOI attention performance is sub-par for this ad, it should be noted that the product/brand name (“Oral B”) is being mentioned many times during the ad, which increases the likelihood that viewers will connect the ad, storyline, and brand.

Finally, for stopping power analysis, we can plot how coherently people in a sample look at the same part of the screen. This Focused Attention (FA) score has been explained in a separate post. Basically, the score is calculated so that when people have a high agreement on where they look, the score is high. When the score is low, it means that people are spread around the screen in a less coherent manner.

As a rule of thumb, a high FA score is necessary when you want people to look at critical information such as a slogan, product, and brand. A low FA score is mostly acceptable if the intention is to create an impression of visual “noise” and ambiguity, such as images from a party, or a busy store environment. In general, the optimal result is a high FA score when the heat map focus is on the intended AOIs.
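The exact FA calculation is described in the separate post and is not reproduced here. Purely to illustrate the idea that agreement in gaze position gives a high score and scatter gives a low one, here is a hypothetical Python stand-in: it takes one gaze point per participant for a single frame and scores how tightly the points cluster around their centroid. The formula, the normalization by half the screen diagonal, and the function name are our assumptions for this sketch, not the Neurons FA score.

```python
import math


def gaze_coherence(points, screen_w, screen_h):
    """Illustrative coherence score in [0, 1] for one video frame.

    points: one (x, y) gaze position per participant
    Returns 1.0 when everyone looks at the same spot; values near 0 mean
    the gaze points are spread across the screen.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)
    half_diagonal = math.hypot(screen_w, screen_h) / 2
    return max(0.0, 1.0 - mean_dist / half_diagonal)


# Made-up frames on a 1920x1080 screen.
focused = [(960, 540)] * 4                               # everyone on one spot
scattered = [(0, 0), (1920, 0), (0, 1080), (1920, 1080)]  # one in each corner
print(gaze_coherence(focused, 1920, 1080))  # 1.0
print(gaze_coherence(scattered, 1920, 1080) < 0.05)  # True
```

Computing such a score frame by frame gives a curve over the duration of the ad, which is the kind of over-time view discussed next.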

Let’s see how the ad fares on the FA score:

We can make the following observations:

  • The FA score drops dramatically at the very beginning of the ad. Here, we can see that having a person on the same screen as the brand and unique selling points makes viewers’ gaze scatter across the screen. Attention is much less coordinated than it could be.
  • Visual attention is very good during the phases where there are only the person and the brand. Most of the attention is on the person, with the occasional brand attention. This is good.
  • When the person points to the product, attention also remains high on the product.
  • In the break session with more technical information, focused attention drops, suggesting that the story is less coherent in driving people’s attention around to the same areas. It is possible that the information provided during the product presentation is too much to process.
  • In the end, the FA score is shifting dynamically, suggesting that certain phases are good at driving a coherent attention response, while other phases are more challenging to viewers.

How can the ad be boosted on attention? Here are a few solutions:

  • reduce the number of items that the audience should pay attention to
  • prioritize the message: put things front and center when they matter
  • work on visual saliency, both by increasing the visibility of the items of interest, and reducing things that are less important

With this in mind, we now turn to the next step: how people respond cognitively when they are seeing the ad.

Transmission power: Did they get the message?

We now turn to whether participants show signs of good cognitive processing of the ad. Here, we look at the cognitive load score, which is an indication of the amount of cognitive demand that the ad places on viewers. The score can be summarized as follows:

Transmission power shows a good performance, as viewers are neither overloaded with information nor bored by it. This suggests that when the ad is attended to, it is likely to be understood and remembered. If there is a good link between the message and the product/brand, then this connection will be made more easily.

Oral B Power performs well overall on transmission power, showing that people will process the narrative and are more likely to remember it. The cognitive load is well within the mean range, and motivation has four positive peaks, leading to interest and brand impact. The lows can be driven by specific scenes, as seen with the teeth-brushing scene.


In general, cognitive load shows a good dynamic response in the “optimal area” between 55 and 75. This suggests that viewers understand the message. There are no critical times in which the response crosses the “overload” line of 75. Early on there is a peak in cognitive load when the main character speaks about her hygienist, and there is an accompanying reduction in motivation. This suggests that this element is not needed in the ad, and that it can potentially be taken out.

Cognitive load also increases during the presentation of the selling points: effectiveness and safety. This means that these associations are likely to be remembered and connected to the product and brand.
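The thresholds mentioned above (an “optimal area” between 55 and 75, and an “overload” line at 75) can be turned into a simple check over a second-by-second cognitive load series. The Python sketch below is only an illustration: the label for scores under 55, the choice to treat exactly 75 as still optimal, the function names, and the example series are our assumptions.

```python
OPTIMAL_LOW = 55  # lower edge of the "optimal area" quoted in the text
OVERLOAD = 75     # the "overload" line quoted in the text


def classify_load(score):
    """Label a single cognitive load score."""
    if score > OVERLOAD:
        return "overload"
    if score >= OPTIMAL_LOW:
        return "optimal"
    return "below optimal"  # assumed label; the text does not name this band


def overload_seconds(series):
    """Indices (seconds) at which the series crosses the overload line."""
    return [t for t, score in enumerate(series) if score > OVERLOAD]


# Made-up series with one score per second.
load = [58, 62, 71, 74, 66, 60]
print([classify_load(s) for s in load])
print(overload_seconds(load))  # []
```

An empty list from `overload_seconds` corresponds to the finding above that the response never crosses the overload line.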

Persuasion power: Did the ad engage?

Overall, for Persuasion power, we get these general results:

Overall, persuasion power is positive and over the benchmark average. This suggests that the ad produces positive emotions and engagement. Viewers are likely to build positive associations with the brand and product.

When looking at the over-time plot video, we can see the following results:

Oral B Power has above-average persuasion power, showing a high likelihood of impact and interest. Arousal has some drops, which can lead to lower brand recall. The recommendation would be to cut the ad at 23.50 seconds, as this would end it at a point of high motivation impact for the brand. After this point, we see a significant drop before the final scenes.

Another interesting finding is that the ad produces a significant drop in motivation around 14 seconds, when the brushing is shown. What is shown here is dirt on teeth, and a slight negative response when a problem is presented can actually be interpreted as a good result. After all, we want the audience to resonate with the problem. That said, viewers produce only a neutral response to the solution, suggesting that this part can be optimized. The ad could have taken more time to distinguish between the problem and the solution, giving viewers a better opportunity to respond positively to the solution.

Locking power: What do they remember?

For testing ad and brand memory, we can summarize the findings as follows:

Locking power shows a good performance, suggesting that the audience on average remembers the ad and possibly the brand. This score is around the Neurons benchmark score, which still suggests that a higher performance is possible.

A break-down into the different sub-scores can be seen below:

Memory subscores show how the Oral-B ad (purple) performs lower than the benchmark (grey) on free recall: only 21% of participants remember the brand spontaneously. However, ad recognition is good (87%) and brand recognition is also good (89%). This suggests that the visual materials of the ad are easy to recognize and relate to the Oral-B brand.

Oral B Power performs like an average TV ad when it comes to locking power, showing that people have a good chance of remembering the ad. Notably, free recall is lower than average. This means that the brand did not have enough impact during the ad and failed to be recalled. We also saw from the EEG scores that the “brand logo” shows negative motivation and that the overall ad shows a lower arousal score.

The recommendation here would be to make the brand more salient on the product and to cut the ad at 23.50 seconds, as this would end it on a high positive note. This also fits with the recommendations from the other metrics.


As we have seen, the Oral B Power ad drives high transmission power, suggesting that people are indeed processing and understanding the ad narrative. The ad also shows above-average persuasion power, driving emotional interest. This leads to higher liking and brand impact than average. Overall, the ad generates an average locking power, showing that it makes an impression and is remembered, but this can still be optimized.

A few summarizing points:

  • Stopping power: Oral B Power scores below average on stopping power due to a split-screen approach with the main character on the left and text information on the right. This leads people to follow different information through the ad. The approach still appears to work, as the emotional and cognitive scores are above average. The stopping power never drops into “low”, which shows that the attention spread is never too big.
  • Transmission power: Oral B Power performs well on transmission power, ensuring that people will process the information. The ad does have a few points where the cognitive load is close to “high”, but it never goes into overload, which is positive. The ad has a high likelihood of generating ad recognition due to the higher transmission power.
  • Persuasion power: The ad has above-average persuasion power, which is likely to drive impact and interest. This is also seen in the stated brand impact measures. The ad has a neutral ending, so we recommend cutting it earlier, e.g. at 23.50 seconds, as this is where the peak in motivation and arousal is found.
  • Locking power: Finally, the ad performs close to average on locking power. The only factor decreasing the score is free recall, which could be boosted both by adding more saliency to the brand on the product and by ending the ad on a high note of motivation, arousal, and cognitive load.

Do you want to read the full report? You can download it from here as a PDF (3 MB) and as a PPTX (13 MB) file.

Interested in testing your own ad or campaign materials? Reach out to us!