www.ijcrsee.com
37
Babenko et al. (2022). Recognition of facial expressions based on information from the areas of highest increase in luminance
contrast, International Journal of Cognitive Research in Science, Engineering and Education (IJCRSEE), 10(3), 37-51.
Recognition of Facial Expressions Based on Information From the Areas of Highest Increase in Luminance Contrast
Vitali Babenko1*, Daria Alekseeva1, Denis Yavna1, Ekaterina Denisova2, Ekaterina Kovsh1, Pavel Ermakov1
1Southern Federal University, Rostov-on-Don, Russian Federation,
e-mail: babenko@sfedu.ru, alexeeva_ds@mail.ru, yavna@fortran.su, katya-kovsh@yandex.ru, paver@sfedu.ru
2Don State Technical University, Rostov-on-Don, Russian Federation, e-mail: denisovakeith@gmail.com
Abstract: It is generally accepted that using the most informative areas of the input image significantly optimizes visual processing. Several authors agree that areas of spatial heterogeneity are the most interesting for the visual system and that the degree of difference between these areas and their surroundings determines saliency. The purpose of our study was to test the hypothesis that the most informative areas of an image are those with the largest increase in total luminance contrast, and that information from these areas is used in categorizing facial expressions. Using our own program, developed to imitate the work of second-order visual mechanisms, we created stimuli from initial photographic images of faces with 6 basic emotions and a neutral expression. These images consisted only of areas of highest increase in total luminance contrast. Initially, we determined the spatial frequency ranges in which the selected areas contain the most useful information for recognizing each of the expressions. We then compared expression recognition accuracy in images of real faces and in those synthesized from the areas of highest contrast increase. The results indicate that recognition of expressions in synthesized images is somewhat worse than in real ones (73% versus 83%). At the same time, the partial loss of information that occurs when real images are replaced with synthesized ones does not disrupt the overall logic of recognition. Possible ways to make up for the missing information in the synthesized images are suggested.
Keywords: expression recognition, saliency, total luminance contrast, second-order visual filters.
Original scientific paper
Received: November, 15.2022.
Revised: November, 30.2022.
Accepted: December, 04.2022.
UDK:
159.937.5.072
159.931.072
10.23947/2334-8496-2022-10-3-37-51
© 2022 by the authors. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
*Corresponding author: babenko@sfedu.ru
Introduction
It is evident that different image areas contain different amounts of information. The classical experiments of A. Yarbus (Yarbus, 2013) showed that the eyes ignore homogeneous areas of the image and, on the contrary, the gaze is directed to the most heterogeneous areas.
Starting from the early levels of visual processing, neurons respond precisely to heterogeneities. Thus, striate neurons are activated by luminance heterogeneity in their receptive fields (Marat et al., 2013). However, single luminance gradients are only local heterogeneities. When it comes to the perception of scenes or objects, salient regions have significant spatial extent. In this case, the heterogeneity is a spatial modulation of luminance gradients (changes in their contrast, orientation, or spatial frequency).
Optimizing visual perception implies finding and processing the most informative parts of the input image. A number of authors have posited that the areas that differ most from their surroundings are of the greatest interest to the visual system and attract the observer's attention (Bruce and Tsotsos, 2009; Marat et al., 2013; Perazzi et al., 2012; Xia et al., 2015). Perhaps mental representations of complex visual stimuli are formed from the information in these areas. The importance of finding the areas of interest has motivated a large number of studies aimed at developing algorithms for identifying them and constructing saliency maps. However, many of the proposed saliency detection algorithms are neither based on nor take into account the real brain mechanisms of visual perception (Cheng et al., 2015; Perazzi et al., 2012; Wu, Shi and Lu, 2012).
The human visual system has tools for detecting spatial modulations of luminance gradients in the input image. These are the so-called second-order visual filters (Graham, 2011), which act preattentively. They combine, over a certain spatial interval, the outputs of striate neurons (first-order filters) with the same frequency tuning. First-order filters encode information about the carrier (localization, spatial frequency and orientation of luminance gradients). Second-order filters are activated when a spatial modulation of the contrast, orientation or spatial frequency of these gradients (the envelope) falls within their receptive fields.
Moreover, the higher the modulation amplitude, the stronger their response. It has also been shown that different second-order filters respond to different types of modulation (Yavna, 2012). Since orientation modulations are primarily important for detecting texture boundaries (Solomon and Morgan, 2017), and spatial frequency modulations are important for detecting surface curvature (Sakai and Finkel, 1995), it is reasonable to consider the filters selective to contrast modulations as the first candidate for the role of a segmentation mechanism for real scenes and objects (Açık et al., 2009; Frey, König and Einhäuser, 2007; 't Hart et al., 2013).
The aim of our study was to determine the role of image areas with the largest increase in total (non-local) luminance contrast in visual processing, using facial expression recognition tasks. We hypothesized that information from these areas of the image is used in categorization.
We chose faces as visual stimuli because of both their high social significance and their multidimensionality, which implies separate processing of variable and invariant facial characteristics. At the same time, face detection and identification are characterized by exceptional speed (Cauchoix et al., 2014; Willis and Todorov, 2006). The same applies to emotion recognition (Willis and Todorov, 2006; Liu and Ioannides, 2010; Vuilleumier and Pourtois, 2007).
To test our hypothesis, we created the gradient operator of total contrast (GOTC), a computer program that simulates second-order filters and calculates a map of instantaneous values of the non-local contrast modulation function over the entire image (Babenko et al., 2021). These maps make it possible to create stimuli from areas of the raster image with specified modulation values.
To a certain extent, this approach resembles the Bubbles method (Gosselin and Schyns, 2001; Smith et al., 2005). In both approaches, the accuracy of expression recognition is studied when fragments of the face image are shown to the subjects. The difference is that in the Bubbles method the fragments are selected randomly, whereas in our study they are selected according to the contrast gain. In addition, the Bubbles technique involves preliminary learning of the initial set of faces, so observers work with familiar faces, and this changes the range of effective spatial frequencies (Butler et al., 2010; Lobmaier and Mast, 2007; Smith, Volna and Ewing, 2016). Our approach allows us to use unfamiliar faces, which does not limit the number of stimuli and brings the experimental procedure closer to the real conditions of face perception. Moreover, the Bubbles technique cannot answer the question of which mechanisms highlight certain facial features.
Prior to creating stimuli, we had to determine several parameters of the model that simulates how second-order filters work. First of all, we had to choose the spatial frequency ranges in which contrast modulation should be calculated. Since second-order filters were previously found to form five spatial frequency pathways tuned in 1-octave steps (Ellemberg et al., 2006), we decided to follow this scheme.
Secondly, we had to select the parameters of the apertures through which the whole image and its fragments are passed when forming facial stimuli. To keep a constant ratio between the carrier and envelope frequencies, the aperture diameter was halved for each 1-octave increase in the filtering frequency in cycles per image (CPI), while the filtering frequency inside apertures of different diameters remained constant at 4 cycles per aperture diameter. This filtering frequency was chosen on the basis of data on the optimal ratio of carrier and envelope frequencies in human perception of contrast modulations (Babenko, Ermakov and Bozhinskaya, 2010; Sun and Schofield, 2011). Results similar to these psychophysical findings were also obtained in the analysis of neuronal responses in area V2 in primates (Willis and Todorov, 2006). Another aperture parameter is the transfer function. Based on the profile of the central subfield of the second-order filter, the transfer function was set as Gaussian.
Thirdly, we had to determine the number of apertures at each filtering frequency. The entire face image is described by a single aperture with the lowest filtering frequency (in CPI). Since the filtering frequency doubles at each successive step, we decided that the number of selected areas should also double. In this case, the total diameter of the apertures remains constant, while the filtering frequency in cycles per image doubles at each frequency step.
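The scaling rules in the last two paragraphs can be tabulated with a short Python sketch. It assumes five one-octave steps starting at 4 CPI, as stated above, and expresses each aperture diameter as a fraction of the image size; the names `Step` and `aperture_schedule` are hypothetical and are not taken from the authors' software.

```python
from dataclasses import dataclass

CYCLES_PER_APERTURE = 4.0  # constant filtering frequency inside every aperture


@dataclass
class Step:
    cpi: float                  # filtering frequency, cycles per image
    diameter: float             # aperture diameter as a fraction of the image size
    n_apertures: int            # number of selected areas at this frequency
    cycles_per_aperture: float  # should stay at 4
    total_diameter: float       # summed diameter of all apertures (fraction of image)


def aperture_schedule(n_steps: int = 5, base_cpi: float = 4.0) -> list[Step]:
    """Hypothetical helper: apply the 'halve the diameter, double the count' rule."""
    steps = []
    for k in range(n_steps):
        cpi = base_cpi * 2 ** k                  # +1 octave per step
        diameter = CYCLES_PER_APERTURE / cpi     # keeps 4 cycles per aperture
        n_apertures = 2 ** k                     # number of areas doubles
        steps.append(Step(cpi, diameter, n_apertures,
                          cpi * diameter, n_apertures * diameter))
    return steps


for s in aperture_schedule():
    print(f"{s.cpi:>4.0f} CPI  diameter={s.diameter:.4f}  n={s.n_apertures:>2}  "
          f"cycles/aperture={s.cycles_per_aperture:.0f}  total={s.total_diameter:.0f}")
```

The table makes the invariants explicit: the product of filtering frequency and aperture diameter stays at 4 cycles, and the summed diameter of all apertures at any step equals the image size.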
Materials and Methods
Participants
The experiments involved a total of 179 subjects of both sexes, Europeans, aged 18 to 30 years. All participants had normal or corrected-to-normal vision and no history of neurological or psychiatric disease. The subjects were informed about the upcoming procedure and gave written consent to participate voluntarily in the experiment. The study was approved by the local ethics committee and was performed in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki).
Equipment
The experimental setup included an x86-64 compatible Ubuntu Linux PC with NVIDIA GeForce GT 730 graphics and an Acer VG271U Pbmiipx monitor. Screen resolution was 2560x1440 and the frame rate was 60 Hz. The monitor was calibrated with a digital luminance meter in greyscale mode. The ACM (Adaptive Contrast Management) and HDR (High Dynamic Range) functions were disabled. Luminance varied from 1 to 225 cd/m², and the gamma non-linearity was standard, with an exponent of 2.2.
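The calibration figures above imply a mapping between grey level and luminance. The sketch below assumes a simple offset-plus-power-law model with 8-bit grey levels; neither the model form nor the bit depth is stated in the text, and the function names are hypothetical.

```python
L_MIN, L_MAX = 1.0, 225.0   # measured luminance range, cd/m^2 (from the text)
GAMMA = 2.2                  # display exponent (from the text)
V_MAX = 255                  # assumption: 8-bit grey levels


def gray_to_luminance(v: float) -> float:
    """Luminance (cd/m^2) produced by grey level v under a simple power-law model."""
    return L_MIN + (L_MAX - L_MIN) * (v / V_MAX) ** GAMMA


def luminance_to_gray(lum: float) -> float:
    """Inverse of gray_to_luminance: grey level needed for a target luminance."""
    return V_MAX * ((lum - L_MIN) / (L_MAX - L_MIN)) ** (1.0 / GAMMA)


# Grey level that would yield the 50 cd/m^2 mean luminance used for the stimuli
print(round(luminance_to_gray(50.0), 1))
```

Under this model, the 50 cd/m² mean luminance used for the stimuli falls near the middle of the grey-level range, at roughly level 128.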
Stimuli
The set of face images with different emotional expressions was compiled from open-access databases: MMI (Pantic et al., 2005), KDEF (Lundqvist, Flykt and Öhman, 1998), RaFD (Langner et al., 2010) and WSEFEP (Olszanowski et al., 2015). For further processing and preparation of the stimulus material, we selected 70 initial frontal photographs of male and female Caucasian faces expressing the 6 basic emotions according to P. Ekman (Ekman, 1992) (fear, anger, sadness, disgust, surprise, happiness) and a neutral expression. Each emotion was represented by 10 faces (5 male and 5 female). Different faces were used for different expressions.
First, faces from different databases were equalized in average luminance (50 cd/m²) and RMS contrast, and resized to fit a circle of 880 pixels. Then, each initial image was processed using GOTC, which simulates the functioning of a set of second-order filters with the same localization and filtering frequency across the full range of orientation tunings. The operator is a concentric area with a Difference of Gaussians profile. The diameter of its center («window») is equal to the width of the surrounding ring. The filtering frequency in the window was constant and equal to 4 cycles per window. When the size of the operator was halved, the filtering frequency in cycles per image (CPI) doubled. Thus, for an image filtered at 4 CPI, the window diameter equals the size of the entire image. For an image filtered at 8 CPI, the window is halved to half the image size; for a filtering frequency of 16 CPI, it is reduced by a factor of 4; for 32 CPI, by a factor of 8; and for 64 CPI, by a factor of 16. The bandwidth of all filters was the same and equal to 1 octave.
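The first preprocessing step in the paragraph above, equalizing mean luminance and RMS contrast, can be sketched as follows. RMS contrast is taken here as standard deviation divided by mean, which is one common definition. The target RMS value of 0.2 and the function name `equalize` are placeholders, not values or code from the study, and the resizing to an 880-pixel circle is not shown.

```python
import numpy as np


def equalize(img: np.ndarray, target_mean: float = 50.0,
             target_rms: float = 0.2) -> np.ndarray:
    """Rescale an image to a given mean luminance and RMS contrast.

    RMS contrast is taken here as std / mean; target_rms = 0.2 is a
    placeholder, not a value reported in the paper. No clipping is applied.
    """
    img = np.asarray(img, dtype=float)
    sd = img.std()
    if sd == 0:
        return np.full_like(img, target_mean)   # flat image: only the mean can be set
    z = (img - img.mean()) / sd                 # zero mean, unit standard deviation
    return target_mean + z * (target_rms * target_mean)
```

Applying the same target mean and RMS contrast to every face removes database-specific differences in overall brightness and contrast before any contrast-gain map is computed.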
The operator window calculates the spectral power of the image filtered at a given frequency in CPI. In the surrounding ring, the spectral power of all spatial frequencies perceived by humans was calculated and rescaled to the average power per octave. The non-local contrast increase at each position was calculated as the difference between the total energy in the center of GOTC and that in its periphery. The operator scans the entire image and builds a two-dimensional map of the contrast gain.
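The center-minus-surround computation described above can be approximated at a single position with the simplified Python sketch below. It is not the authors' GOTC implementation. It swaps in ideal (brick-wall) FFT filters for the Difference of Gaussians profile and uses a hard-edged disc and ring. It also assumes that "all perceived frequencies" means the five octaves starting at 4/√2 CPI. The names `band_energy` and `contrast_gain` are hypothetical.

```python
import numpy as np


def radial_cpi(n: int) -> np.ndarray:
    """Radial spatial frequency, in cycles per image, of each bin of an n x n FFT."""
    f = np.fft.fftfreq(n) * n
    fx, fy = np.meshgrid(f, f)
    return np.hypot(fx, fy)


def band_energy(img: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Pixelwise energy of img band-passed to [lo, hi) CPI with an ideal filter."""
    keep = (radial_cpi(img.shape[0]) >= lo) & (radial_cpi(img.shape[0]) < hi)
    band = np.real(np.fft.ifft2(np.fft.fft2(img) * keep))
    return band ** 2


def contrast_gain(img, cx, cy, center_cpi, n_octaves=5, base_cpi=4.0):
    """Centre-minus-surround energy at (cx, cy): a hypothetical GOTC stand-in."""
    n = img.shape[0]
    d = n * 4.0 / center_cpi                       # window holds 4 cycles
    yy, xx = np.mgrid[0:n, 0:n]
    rr = np.hypot(xx - cx, yy - cy)
    centre = rr <= d / 2
    ring = (rr > d / 2) & (rr <= 1.5 * d)          # ring width = window diameter
    if not centre.any() or not ring.any():
        return float("nan")
    # Centre: energy in the one-octave band around the filtering frequency
    e_band = band_energy(img, center_cpi / np.sqrt(2), center_cpi * np.sqrt(2))
    # Surround: broadband energy rescaled to the average per octave
    lo = base_cpi / np.sqrt(2)
    e_broad = band_energy(img, lo, lo * 2 ** n_octaves) / n_octaves
    return float(e_band[centre].mean() - e_broad[ring].mean())
```

Scanning `contrast_gain` over a grid of positions gives a coarse contrast-gain map for one filtering frequency. A practical version would compute each filtered image once and replace the masks with convolutions.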
As a result, 5 saliency maps were generated for each initial image (one for each of the 5 filtering frequencies). Then, on each map, the local maxima of the contrast increase were ranked in descending order of amplitude. Local maxima were selected, starting from the highest, according to the following rule: 1 position at a filtering frequency of 4 CPI, 2 positions at 8 CPI, 4 positions at 16 CPI, 8 positions at 32 CPI, and 16 positions at 64 CPI.
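The selection rule above can be sketched in Python with NumPy. The sketch assumes that a local maximum is a pixel greater than or equal to its 8 neighbours, so on flat plateaus every plateau pixel qualifies. It also assumes that positions are simply taken in descending order with no minimum spacing between them, a detail the text does not specify. The function names are hypothetical.

```python
import numpy as np


def top_local_maxima(gain_map, n_select):
    """Positions (row, col) of the n_select largest 3x3 local maxima, highest first."""
    m = np.asarray(gain_map, dtype=float)
    h, w = m.shape
    padded = np.pad(m, 1, mode="constant", constant_values=-np.inf)
    # A pixel is a local maximum if it is >= all 8 neighbours.
    is_max = np.ones_like(m, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            is_max &= m >= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    rows, cols = np.nonzero(is_max)
    order = np.argsort(m[rows, cols], kind="stable")[::-1]  # descending amplitude
    return [(int(rows[i]), int(cols[i])) for i in order[:n_select]]


def n_positions(cpi: int, base_cpi: int = 4) -> int:
    """Selection rule from the text: 1, 2, 4, 8, 16 positions for 4 ... 64 CPI."""
    return cpi // base_cpi
```

For a saliency map computed at frequency `cpi`, the call `top_local_maxima(gain_map, n_positions(cpi))` returns the aperture positions used at that frequency.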
After that, we moved on to creating stimuli. First, each initial image was filtered (with a 10th-order Butterworth filter) in five one-octave-wide frequency bands with center frequencies of 4, 8, 16, 32, and 64 CPI. Then, a circular aperture with a Gaussian transfer function was placed at the positions previously selected on the saliency maps, and the image filtered at the corresponding spatial frequency was passed through it. The aperture diameter was equal to the diameter of the central region of the gradient operator (at the lowest frequency, the entire image is transmitted; at higher frequencies, progressively smaller fragments of the image are transmitted).
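The filtering and aperture steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. The one-octave band is built as the product of a 10th-order radial Butterworth low-pass and high-pass with cut-offs at center/√2 and center·√2. The Gaussian width is assumed to be sigma = diameter / 4, which the text does not specify. Overlapping apertures are merged with a pixelwise maximum, also an assumption. All function names are hypothetical.

```python
import numpy as np


def butterworth_bandpass(n: int, center_cpi: float, order: int = 10) -> np.ndarray:
    """Radial one-octave Butterworth band-pass (low-pass x high-pass) on an n x n FFT grid."""
    f = np.fft.fftfreq(n) * n
    r = np.hypot(*np.meshgrid(f, f))                            # radial frequency, CPI
    lo, hi = center_cpi / np.sqrt(2), center_cpi * np.sqrt(2)  # one-octave edges
    lowpass = 1.0 / (1.0 + (r / hi) ** (2 * order))
    q = (r / lo) ** (2 * order)
    highpass = q / (1.0 + q)                # equals 0 at DC, avoids dividing by r
    return lowpass * highpass


def gaussian_aperture(n: int, row: float, col: float, diameter: float) -> np.ndarray:
    """Gaussian window; sigma = diameter / 4 is an assumption, not from the paper."""
    yy, xx = np.mgrid[0:n, 0:n]
    sigma = diameter / 4.0
    return np.exp(-((yy - row) ** 2 + (xx - col) ** 2) / (2.0 * sigma ** 2))


def band_fragment(img: np.ndarray, center_cpi: float, positions, diameter: float):
    """Filter img in one band and pass it through apertures at the given positions."""
    n = img.shape[0]
    spectrum = np.fft.fft2(img) * butterworth_bandpass(n, center_cpi)
    filtered = np.real(np.fft.ifft2(spectrum))
    mask = np.zeros_like(filtered)
    for row, col in positions:
        mask = np.maximum(mask, gaussian_aperture(n, row, col, diameter))
    return filtered * mask
```

A stimulus built from several bands would then be the sum of the `band_fragment` outputs for those bands, each computed with its own positions and aperture diameter.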
Facial stimuli were created by combining images transmitted through the apertures from different spatial frequency ranges (15 different combinations of frequency ranges were used). As a result, 15 stimuli consisting of areas of highest increase in non-local contrast were created for each initial face image. For Experiment 1, stimuli consisting of areas of the initial image with the smallest increase in contrast were created in a similar way.
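The figure of 15 combinations matches the number of runs of adjacent bands that can be formed from five one-octave bands (5 + 4 + 3 + 2 + 1). The Python sketch below enumerates them and gives the frequency edges of each run, assuming band edges at center/√2 and center·√2. It also reproduces the stimulus count of Experiment 2. The names are hypothetical.

```python
import math

BANDS_CPI = [4, 8, 16, 32, 64]   # centre frequencies of the one-octave bands


def adjacent_combinations(bands=BANDS_CPI):
    """All runs of adjacent bands: 5 singles, 4 pairs, 3 triples, 2 quads, 1 full set."""
    return [bands[i:j] for i in range(len(bands)) for j in range(i + 1, len(bands) + 1)]


def band_edges(combo):
    """Lower and upper edges (CPI) of a run of one-octave bands."""
    return combo[0] / math.sqrt(2), combo[-1] * math.sqrt(2)


combos = adjacent_combinations()
n_stimuli = 10 * 7 * len(combos)   # faces x expressions x combinations (Experiment 2)
print(len(combos), n_stimuli)
```

For the single band centered at 16 CPI, `band_edges([16])` gives about 11.3 to 22.6 CPI, the range referred to in the Results.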
After all calculations were performed, the created stimuli were scaled down to 8.5 angular degrees. As a result, the lowest filtering frequency (about 0.5 cycles per degree, CPD) approximately corresponded to the frequency tuning of the lowest-frequency channel of the human visual system.
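The conversion behind this statement is simple: a frequency in cycles per image divided by the image size in degrees gives cycles per degree. The Python sketch below applies it to all five bands. It also computes the physical stimulus width at the 70 cm viewing distance given in the Procedure. The function names are hypothetical.

```python
import math

IMAGE_DEG = 8.5       # stimulus size, angular degrees (from the text)
DISTANCE_CM = 70.0    # viewing distance, cm (from the Procedure section)


def cpi_to_cpd(cpi: float, image_deg: float = IMAGE_DEG) -> float:
    """Convert cycles per image to cycles per degree of visual angle."""
    return cpi / image_deg


def image_size_cm(image_deg: float = IMAGE_DEG, distance_cm: float = DISTANCE_CM) -> float:
    """Physical stimulus width that subtends image_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(image_deg / 2.0))


for cpi in (4, 8, 16, 32, 64):
    print(f"{cpi:>2} CPI = {cpi_to_cpd(cpi):.2f} CPD")
print(f"stimulus width = {image_size_cm():.1f} cm")
```

With these values, 4 CPI corresponds to about 0.47 CPD and the stimulus is about 10.4 cm wide on the screen.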
Procedure
Prior to the experiment, observers received instructions and looked through examples of faces expressing the basic emotions. In the experiment, stimuli were presented in random order, and their duration was not limited. The viewing distance was 70 cm. The observers' task was to recognize facial expressions by choosing 1 of 7 possible responses characterizing the emotional expression. Responses were given verbally. Recognition accuracy for each type of stimulus was calculated as the percentage of correct responses.
Statistical data analysis
ANOVA was used for statistical analysis of the results. Pairwise comparisons of the percentages of correct responses with Student's t-test were carried out within the ANOVA procedure as post-hoc tests with Holm's correction for multiple comparisons.
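Holm's correction is a step-down procedure. Sort the m raw p-values in ascending order, multiply the p-value of rank i (0-based) by (m - i), and keep the adjusted values monotonically non-decreasing. The Python sketch below shows the standard procedure. The statistical package the authors used is not specified, and `holm_adjust` is a hypothetical name.

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # ascending raw p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[i])    # multiply by (m - rank)
        running_max = max(running_max, candidate)      # keep adjusted p monotone
        adjusted[i] = running_max
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03]))
```

A pairwise difference is then declared significant when its adjusted p-value is below the chosen alpha, for example 0.05.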
Results
Experiment 1. Influence of the magnitude of the increase in contrast of the regions forming the stimulus on the recognition of facial expressions
In one of our previous works, it was shown that the greater the increase in total contrast in the areas from which the facial stimulus is formed, the more accurately happy (joyful) and neutral faces are distinguished (Babenko et al., 2021). However, since the present study was to use a significantly larger number of facial expressions (6 basic emotions according to Ekman and a neutral expression), we considered it necessary to conduct a repeated experiment comparing the recognition accuracy of 7 expressions. This time we limited ourselves to two sets of stimuli, created from areas with the largest and the smallest increase in total contrast.
Procedure
Experiment 1 involved 52 observers.
Stimuli were created by combining selected fragments of the initial image in 4 spatial frequency ranges with peak frequencies of 8, 16, 32, and 64 CPI (Fig. 1). Each of the 7 facial expressions was represented by 20 stimuli formed from 5 female and 5 male faces (10 images were created from areas with the lowest non-local contrast modulation and 10 from areas with the highest). A total of 140 stimuli ((10 + 10) × 7) were generated for this experiment.
Twenty-six subjects were asked to categorize facial expressions when viewing stimuli created from regions with the lowest non-local contrast gain. The other 26 observers performed the same task with stimuli generated from regions with the highest increase in non-local contrast. Each subject was presented with 70 stimuli. One of the possible responses was "I don't know".
Data analysis was performed using one-way ANOVA (intersubject, repeated measures). The independent variable was the amplitude of contrast modulation in the areas used to synthesize the stimuli. The dependent variable was the proportion of correct responses in the expression recognition task.
Results
Experiment 1 revealed a statistically significant effect of the contrast of the areas used to create the stimuli on the accuracy of expression recognition (F(1, 50) = 699.28, p < 0.001, ω² = 0.931). Performance was significantly higher for stimuli created from areas of the initial image with the highest increase in non-local contrast (max) than for stimuli created from areas with the lowest increase in contrast (min) (Fig. 2).
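The reported effect size can be recovered from F. For a one-way between-subjects design with N observers in total, ω² = df_effect (F - 1) / (df_effect (F - 1) + N). The Python sketch below assumes the two groups of 26 observers are independent (N = 52). Under that assumption it reproduces the reported ω² = 0.931. The function name is hypothetical.

```python
def omega_squared_oneway(f_value: float, df_effect: int, n_total: int) -> float:
    """Omega squared recovered from F for a one-way between-subjects ANOVA.

    omega^2 = df_effect * (F - 1) / (df_effect * (F - 1) + N)
    """
    num = df_effect * (f_value - 1.0)
    return num / (num + n_total)


# Experiment 1: F(1, 50) = 699.28 with 26 + 26 observers
print(round(omega_squared_oneway(699.28, 1, 52), 3))
```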
Figure 1. Examples of stimuli used in experiment 1.
An example of a stimulus created from areas
of the initial face image with the highest contrast gain
(above). An example of a stimulus created from areas
with the lowest gain in non-local contrast (bottom).
The regions used to create stimuli were selected in
the range of spatial frequencies from 5.6 to 90.2 CPI.
Figure 2. Accuracy of expression recognition
depending on the contrast gain in the areas that were used for
creating stimuli. “Min” is for stimuli created from areas of the
initial image with the lowest non-local contrast increase, “Max”
is for stimuli created from areas of highest contrast increase.
The y-axis shows the proportion of correct responses.
The results indicate that the information contained in the areas of the face image with the highest contrast increase, within a range of 4 octaves, is useful for recognizing expressions and provides relatively high recognition accuracy. In stimuli created from regions with the lowest contrast gain, emotions are recognized only at chance level.
Experiment 2. Accuracy of expression recognition in facial stimuli created from the areas of highest increase in contrast with different combinations of spatial frequency ranges
Once it was established that the information contained in the areas of the facial image with the highest contrast gain is useful for expression recognition, it was necessary to determine in which frequency range this information gives the best result for recognizing a particular facial expression.
Most researchers agree that middle spatial frequencies are the most important for face recognition. However, reported "effective" ranges vary: 8-16 cycles per face (CPF) (Costen, Parker and Craw, 1996; Gold, Bennett and Sekuler, 1999), 8-13 CPF (Nasanen, 1999), 11-16 CPF (Tanskanen et al., 2005). Collin et al. (2006) extended this range to 25 CPF. At the same time, many studies have emphasized the role of the general configuration in face recognition (e.g., Cheung et al., 2008; Leder and Bruce, 2000; Maurer et al., 2002; McKone, 2008). A holistic perception of the face implies a low-frequency description of it, below 8 CPF (Awasthi et al., 2011; Goffaux and Rossion, 2006).
As for facial expression recognition, many authors also favor configural information, and hence low spatial frequencies, for this task (e.g., Bombari et al., 2013; Calder et al., 2000; Calvo and Beltrán, 2014; Tanaka et al., 2012; White, 2000). Others, on the contrary, emphasize the role of internal facial features and, consequently, of higher spatial frequencies (Blais et al., 2012; Royer et al., 2018; Smith and Schyns, 2009). fMRI data also contradict the notion that low-frequency information plays a critical role in the processing of facial expressions (Morawetz et al., 2011). Moreover, C. Deruelle and J. Fagot provide evidence in favor of the priority of high-frequency information in expression categorization (Deruelle and Fagot, 2005). This contradiction in experimental findings could arise because different emotional expressions are encoded by different spatial frequencies (Kumar and Srinivasan, 2011; Pourtois et al., 2005; Stein et al., 2014; Vlamings, Goffaux and Kemner, 2009; Vuilleumier et al., 2003).
Thus, the objective of the second experiment was to determine the frequency ranges that provide the best recognition accuracy for each of the basic emotions, as well as for the neutral expression, in stimuli created from areas of highest increase in non-local contrast.
Procedure
Experiment 2 involved 78 subjects.
The stimuli were created from the areas of the initial images with the highest increase in total non-local contrast. Face fragments were isolated in five spatial frequency ranges with peak frequencies of 4, 8, 16, 32, and 64 CPI. All possible combinations of adjacent frequency ranges were used. A total of 1050 facial stimuli were created (10 initial faces (5 male + 5 female) × 7 facial expressions × 15 combinations of spatial frequency ranges).
The stimuli were presented in a random sequence. Observers chose one of 7 possible responses
after each stimulus was presented.
Results
In experiment 2, we calculated the accuracy of recognition of all basic emotions and neutral facial
expressions in stimuli created from areas of highest increase in total nonlocal contrast with different
combinations of spatial frequency bands in the stimulus (Table 1).
Table 1
Expression recognition accuracy with different frequency contents of facial stimuli created from
areas of highest increase in nonlocal contrast
* Here and in the following tables, the combination of spatial frequency ranges in the stimuli is shown (the central frequency of each range, in cycles per image).
We began the analysis of the results by assessing expression recognition accuracy based on a low-frequency holistic description of the face. To do this, we analyzed the percentage of correct responses in trials in which the stimulus was an image of the entire face filtered in the range of 2.8-5.6 CPI (central frequency 4 CPI). These stimuli were created by filtering the initial images at the specified frequency and passing them through an aperture with a Gaussian transfer function whose diameter corresponded to the largest extent of the analyzed image (facial image height). Table 1 shows that in this case the accuracy of expression recognition was 17.8% (chance level was 14.2%, and the 95% confidence interval ranged from 10.76% to 27.86%). At the same time, our previous findings indicate that if such facial stimuli are presented among other objects created in a similar way, face detection accuracy reaches 75%. This suggests that low-frequency information may be sufficient to detect a face but not to differentiate the emotions expressed on it. It also confirms the idea that low-frequency information alone is not enough for facial expression recognition (e.g., Jennings, Yu and Kingdom, 2017).
Taking into account the data confirming the global precedence effect (Goffaux et al., 2011; Peyrin et al., 2010), we studied how recognition accuracy changes as the bandwidth is gradually expanded, starting from the lowest frequency range (2.8-5.6 CPI) and adding progressively higher-frequency ranges (function 1 in Figure 3). As expected, expanding the range of spatial frequencies improves the results. The most noticeable increase in performance was observed when the range was expanded from 1 to 3 octaves. Adding the 5th octave no longer affected accuracy in this task.
Figure 3. Accuracy of expression recognition as the range of spatial frequencies used in the facial stimuli is expanded. For function 1, the expansion of the frequency range starts from 4 CPI, for function 2 from 8 CPI, and for function 3 from 16 CPI. The x-axis shows the stimulus bandwidth in octaves. The y-axis shows the percentage of correct responses.
Functions 1 and 2 in Figure 3 coincide when the bandwidth reaches 3 octaves. However, the initial increase was steeper for function 2. The difference is especially noticeable at a bandwidth of 2 octaves. If the expansion starts from a higher frequency range (11.3-22.6 CPI, center frequency 16 CPI), a significant difference between this curve and the previous ones arises already at a bandwidth of 1 octave (function 3 in Figure 3).
The results show that any spatial frequency range three octaves wide is sufficient for relatively efficient differentiation of expressions (about 70% correct responses) in facial stimuli created from areas of highest contrast gain. The functions were compared using a two-way repeated measures ANOVA with Greenhouse-Geisser correction (factors: Band Width (1, 2 and 3 octaves) and Start Frequency (4, 8 and 16 CPI), as well as their interaction). It revealed a significant increase in performance as the frequency band of the stimuli was expanded towards higher spatial frequencies (F(1.699, 130.852) = 1804.298, p < 0.001, ω² = 0.824) and a significant effect of the frequency from which the band expansion begins (F(1.661, 127.934) = 519.873, p < 0.001, ω² = 0.584). Significantly more information about facial expression is contained precisely in the 1-octave-wide range with a central frequency of 16 CPI than in other frequency ranges of the same width (Table 2). Moreover, performance increases faster when the range is expanded starting from this frequency (F(3.479, 130.852) = 246.979, p < 0.001, ω² = 0.472).
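The fractional degrees of freedom above come from the Greenhouse-Geisser correction, which multiplies both df of a within-subjects effect by the sphericity estimate ε. The Python sketch below illustrates this for the Band Width factor with 3 levels and 78 observers. ε is not reported in the text, so the value used here is back-calculated from df1 = 1.699 and is an assumption. The function name is hypothetical.

```python
def gg_corrected_df(epsilon: float, k_levels: int, n_subjects: int):
    """Greenhouse-Geisser corrected (df1, df2) for a within-subjects factor."""
    df1 = epsilon * (k_levels - 1)
    df2 = epsilon * (k_levels - 1) * (n_subjects - 1)
    return df1, df2


# Band Width: 3 levels, 78 observers; epsilon back-calculated from df1 = 1.699
df1, df2 = gg_corrected_df(1.699 / 2, 3, 78)
print(round(df1, 3), round(df2, 2))
```

With ε ≈ 0.85, the uncorrected df (2, 154) shrink to about (1.70, 130.8), which matches the reported Band Width df up to the rounding of ε.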
Table 2
Comparison of expression recognition accuracy for stimuli with a bandwidth of 1 octave
However, if we track how the accuracy of expression recognition changes when the frequency range is expanded not only towards higher but also towards lower spatial frequencies, we get a somewhat unexpected result: the optimal direction of expansion evidently differs between emotions (Table 3).
Table 3
Comparison of recognition accuracy of different expressions for stimuli with a bandwidth of 2
octaves
Higher accuracy values in comparison pairs are shown in bold.
The table shows that for happiness and the neutral expression it is indeed more effective to add information from higher spatial frequencies to the 11.3-22.6 CPI range. For recognizing emotions of negative valence (fear, anger, sadness), it turned out to be more effective to expand the frequency range towards lower spatial frequencies, although this tendency is less pronounced for anger than for the other negative emotions. For disgust and surprise, expansion in either direction turned out to be almost equivalent.
Considering that the range with the central frequency of 16 CPI turned out to be the most informative (see Figure 3), we can assume that information from this range is processed first. This information may be sufficient to form a hypothesis about the probable facial expression, and the results of this preliminary analysis determine the direction in which the frequency range is further expanded.
This assumption does not contradict the thesis of sequential processing of spatial frequencies from lower to higher ones, and at the same time it is consistent with data on the flexible use of early perceptual representations under top-down control. Such control allows the visual system to selectively use different spatial frequencies depending on how useful they are for a particular task (Flevaris and Robertson, 2016; Oliva and Schyns, 1997).
We then moved on to the main question in experiment 2: what combination of frequency ranges
is most effective for recognizing each of the expressions? The result of this analysis is shown in Table 4.
Table 4
Combinations of spatial-frequency ranges in facial stimuli formed from areas of highest contrast
gain, providing the best result of expression recognition
Higher accuracy values in comparison pairs are shown in bold.
The results show that the optimal combinations of spatial frequencies in the stimulus differ between facial expressions. Thus, for better recognition of the neutral expression and happiness, the full frequency range, that is, all 5 octaves, is preferable. For the other emotions, a band of 4 octaves is enough. However, for stimuli expressing sadness the effective range is shifted towards lower spatial frequencies, while for the other emotions it is shifted towards higher frequencies. It should also be noted that for the negative emotions (fear, anger, sadness) the optimum is quite clear (significant differences were obtained with Student's t-test), whereas for the other expressions it is less obvious.
Finding the optimal combination of spatial-frequency ranges for each facial expression allowed us
to move on to experiment 3.
Experiment 3. Testing the possibility of effective expression recognition in facial stimuli created
from the areas of highest contrast gain.
The results obtained indicate that information from the areas of the face with the highest contrast
gain is indeed useful for expression recognition. However, the question remains how far successful
recognition depends on whether the observer uses all the information about the face or only the
information from areas of highest increase in non-local contrast. To answer it, we compared, under the
same experimental conditions, the accuracy of expression recognition for photographic images of real
faces (unfiltered) and for faces formed from fragments selected in the optimal spatial-frequency ranges
for each emotion.
Procedure
Experiment 3 involved 49 subjects.
Synthesized facial stimuli expressing fear, anger, disgust, and surprise were built from the ranges with
central frequencies of 8, 16, 32, and 64 CPI. Stimuli expressing sadness were created from the ranges
with central frequencies of 4, 8, 16, and 32 CPI. Stimuli with a neutral expression and happiness were
created from fragments identified in the range of five octaves: 4, 8, 16, 32 and 64 CPI. The set of real
face images used as stimuli did not overlap with the set of initial images used to create the synthesized
stimuli. A total of 70 synthesized and unfiltered facial images were used (10 faces × 7 expressions).
The stimuli were presented in random order, and the exposure time was not limited. After training,
the subjects were asked to decide on each presented stimulus as quickly as possible and press a key.
The key press removed the image, which allowed us to measure the decision time. The subjects then
gave a verbal response, which the experimenter recorded. As before, the set of possible responses was
limited to the 7 expressions.
Results
The results of experiment 3 are shown in Figure 4. Overall, as expected, the average accuracy of
expression recognition was somewhat higher for natural facial images (83% correct responses) than for
synthesized stimuli (73%). Decision times were also shorter for real images (by 290 ms on average).
Figure 4. Accuracy of expression recognition in real (continuous line) and synthesized (dotted line)
faces.
For statistical analysis we used a two-way repeated-measures ANOVA with the factors Expression
(7 expressions) and Stimulus Type (real vs. synthesized) and their interaction. The analysis confirmed that
recognition accuracy differs between expressions for both real and synthesized facial stimuli
(F(3.284, 157.609) = 68.276, p < 0.001, ω² = 0.530, Greenhouse-Geisser corrected). Recognition accuracy
also differs significantly between the two stimulus types (F(1, 48) = 110.154, p < 0.001, ω² = 0.351).
Finally, the Expression × Stimulus Type interaction is significant, that is, the two curves in Figure 4 differ
in shape (F(4.755, 228.233) = 8.911, p < 0.001, ω² = 0.101, Greenhouse-Geisser corrected). This interaction
arises because recognition accuracy for disgust, surprise, happiness and the neutral expression is higher
for real face images, whereas fear, anger and sadness are recognized in real faces with approximately the
same accuracy as in
synthesized images (Table 5).
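The fractional degrees of freedom above result from the Greenhouse-Geisser correction, which multiplies both the numerator and the error degrees of freedom by an estimate ε of sphericity. With 49 subjects and 7 expressions the uncorrected values are 6 and 288, so the reported F(3.284, 157.609) corresponds to ε ≈ 0.547 for the Expression effect, and F(4.755, 228.233) to ε ≈ 0.79 for the interaction. The NumPy sketch below computes Box's ε from a covariance matrix of conditions through orthonormal contrasts. It only illustrates the correction and does not reproduce the authors' analysis; the scores in the usage section are random toy numbers, not experimental data.

```python
import numpy as np


def helmert_contrasts(k: int) -> np.ndarray:
    """Return a (k-1) x k matrix of orthonormal contrasts whose rows sum to zero."""
    c = np.zeros((k - 1, k))
    for i in range(1, k):
        c[i - 1, :i] = 1.0
        c[i - 1, i] = -float(i)
        c[i - 1] /= np.sqrt(i * (i + 1))
    return c


def greenhouse_geisser_epsilon(cov: np.ndarray) -> float:
    """Box / Greenhouse-Geisser epsilon for a k x k covariance matrix of conditions.

    epsilon = tr(M)**2 / ((k - 1) * tr(M @ M)), where M = C S C^T is the covariance
    of the orthonormal contrasts. It equals 1 under sphericity; its lower bound is 1 / (k - 1).
    """
    k = cov.shape[0]
    c = helmert_contrasts(k)
    m = c @ cov @ c.T
    return float(np.trace(m) ** 2 / ((k - 1) * np.trace(m @ m)))


def corrected_df(epsilon: float, k: int, n_subjects: int) -> tuple[float, float]:
    """Corrected (effect, error) df for an effect with k - 1 numerator df."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n_subjects - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random toy scores (49 subjects x 7 conditions) with unequal variances,
    # which violates sphericity. These are NOT the study's data.
    scores = rng.normal(size=(49, 7)) * np.linspace(0.5, 2.0, 7)
    eps = greenhouse_geisser_epsilon(np.cov(scores, rowvar=False))
    print(f"toy epsilon = {eps:.3f}, corrected df = {corrected_df(eps, 7, 49)}")
    # Epsilon implied by the reported Expression effect, F(3.284, 157.609):
    print(f"implied epsilon = {3.284 / 6:.3f}")
```

Any orthonormal contrast basis gives the same ε; the Helmert basis is used here only because it is easy to build without extra libraries.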
Table 5
Comparison of recognition accuracy of different expressions for real and synthesized facial stimuli
Higher accuracy values in comparison pairs are shown in bold.
Recognition accuracy thus differs somewhat between real and synthesized facial stimuli. At the
same time, real and synthesized faces produced the same ordering of expressions by recognition
accuracy (see Figure 4). Rank correlation analysis showed that the two functions are similar (Kendall's
τb (47) = 1, p < 0.001). This may indicate that the natural course of information processing is not disrupted
when a real face is replaced with a synthesized image created from fragments with the highest contrast
gain. However, the synthesized stimuli contain enough information to recognize emotions of negative
valence but not enough to recognize the other expressions. This suggests that some important information
is missing from the synthesized facial stimuli.
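Kendall's τb compares only the ordering of paired values, which is why it can reach 1 even when every synthesized-stimulus accuracy is lower than the corresponding real-face accuracy. A minimal pure-Python sketch with the usual correction for ties follows; the accuracy lists in the usage example are invented placeholders, not the values plotted in Figure 4.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b: (concordant - discordant) / sqrt((n0 - ties_x) * (n0 - ties_y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            ties_x += 1  # pair tied in x (including pairs tied in both)
        if dy == 0:
            ties_y += 1  # pair tied in y (including pairs tied in both)
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    denominator = sqrt((n0 - ties_x) * (n0 - ties_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when either variable is constant")
    return (concordant - discordant) / denominator


if __name__ == "__main__":
    # Invented accuracies for 7 expressions: the second profile is uniformly
    # lower but keeps the same order, so tau-b is exactly 1.
    real = [0.60, 0.70, 0.78, 0.85, 0.90, 0.94, 0.98]
    synthesized = [a - 0.10 for a in real]
    print(kendall_tau_b(real, synthesized))  # 1.0
```

Swapping the order of just one adjacent pair of expressions would lower τb to 19/21 ≈ 0.90, so a value of exactly 1 means that the ordering is fully preserved.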
Discussion
The ability of the human visual system to process a huge amount of information in a very short time
depends on its ability to find “useful” areas in the input image. This step may rely on a search for spatial
heterogeneities in the image by second-order visual mechanisms. To simulate the operation of these
mechanisms and to test how useful the information they extract is for expression recognition, we created
the gradient operator of total non-local contrast (GOTC). Two variables determine the total contrast: the
contrast of the single luminance gradients and the number of gradients in a given area of the image. The
second variable makes the greater contribution to the total signal energy. Therefore, the regions of
interest are first of all the areas with the largest accumulation of luminance gradients.
The design of the operator reflects the main properties of second-order visual filters: the
multichannel nature of the second-order mechanism (a set of operators of different sizes); bandpass
filtering of the carrier and a fixed relationship between carrier and envelope frequencies (operator size
is inversely related to the filtering frequency in CPI); the opponent organization of the filter, which makes
it possible to encode the amplitude of the contrast modulation (the concentric organization of the GOTC);
and the weighting function of the filter's receptive field (an aperture with a Gaussian weighting function).
The stimuli we used were created with this gradient operator.
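Operators with these properties are commonly modeled as a filter-rectify-filter (FRF) cascade: a first-stage bandpass filter tuned to the carrier, a rectifying nonlinearity, and a second-stage center-surround filter that responds to spatial modulation of the rectified contrast. The NumPy sketch below is a generic FRF model, not the GOTC itself. The log-Gaussian carrier filter, squaring as the rectifier, the Gaussian center and surround windows, the 4 carrier cycles per center window and the surround-to-center ratio of 3 are all assumptions chosen for illustration, and the image is assumed to be square.

```python
import numpy as np


def bandpass(image: np.ndarray, center_cpi: float, bandwidth_octaves: float = 1.0) -> np.ndarray:
    """First stage: isotropic log-Gaussian bandpass filter applied in the Fourier domain."""
    if image.ndim != 2 or image.shape[0] != image.shape[1]:
        raise ValueError("this sketch assumes a square grayscale image")
    n = image.shape[0]
    f = np.fft.fftfreq(n) * n                      # frequency axis in cycles per image
    radius = np.hypot(f[:, None], f[None, :])
    radius[0, 0] = 1e-9                            # avoid log(0); DC gain becomes ~0
    sigma = bandwidth_octaves / 2.355              # bandwidth read as FWHM in octaves
    gain = np.exp(-np.log2(radius / center_cpi) ** 2 / (2 * sigma ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * gain))


def gaussian_blur(image: np.ndarray, sigma_px: float) -> np.ndarray:
    """Gaussian blur via the Fourier domain (circular, so edges wrap around)."""
    fy = np.fft.fftfreq(image.shape[0])            # cycles per pixel
    fx = np.fft.fftfreq(image.shape[1])
    transfer = np.exp(-2 * (np.pi * sigma_px) ** 2 * (fy[:, None] ** 2 + fx[None, :] ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))


def contrast_modulation_response(image: np.ndarray, center_cpi: float,
                                 cycles_per_center: float = 4.0,
                                 surround_ratio: float = 3.0) -> np.ndarray:
    """Second stage: center-minus-surround pooling of rectified carrier energy."""
    energy = bandpass(image, center_cpi) ** 2      # rectification by squaring
    period_px = image.shape[0] / center_cpi        # carrier period in pixels
    sigma_center = cycles_per_center * period_px / 4.0  # ~4 sigma spans the center window
    center = gaussian_blur(energy, sigma_center)
    surround = gaussian_blur(energy, surround_ratio * sigma_center)
    return center - surround


if __name__ == "__main__":
    n = 128
    yy, xx = np.mgrid[0:n, 0:n]
    image = np.full((n, n), 0.5)
    # Toy stimulus: a patch of 16 CPI grating centred at row 40, column 88.
    patch = (np.abs(yy - 40) < 12) & (np.abs(xx - 88) < 12)
    image[patch] = (0.5 + 0.4 * np.sin(2 * np.pi * 16 * xx / n))[patch]
    response = contrast_modulation_response(image, center_cpi=16)
    print(np.unravel_index(np.argmax(response), response.shape))
```

Because the first stage removes mean luminance, a uniform field produces essentially zero response, while a patch of texture at the carrier frequency produces a positive peak inside the patch; in this model the pooled response grows with both the contrast of the texture and the area it covers.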
In experiment 1, we showed that the 7 basic emotions in facial expressions are recognized with
relatively high accuracy (about 75% correct responses) when recognition is based on information of
different spatial frequencies from the areas of highest increase in non-local contrast. At the same time,
facial stimuli created from the areas with the lowest contrast gain turned out to be completely ineffective
for this task (recognition accuracy was at chance level). Together with previously published results
(Babenko et al., 2021), this indicates that the informativeness of an image area is determined by how much
its total contrast differs from that of its surroundings.
We then analyzed whether a low-frequency representation of the entire face can be used to
differentiate expressions. In previous studies, we have shown that stimuli generated by the operator with
a central area matching the full image size were recognized as faces among other stimuli with high
accuracy (about 75%). When in experiment 2 the task was changed and subjects were required not only
to detect a face but to differentiate the emotions in facial expressions, performance decreased
significantly, to about 18% correct responses (chance level being 14%). This result is
consistent with the widely accepted view that face processing consists of the consecutive stages of face
detection and individuation (Comfort and Zana, 2015). However, at the second stage of processing the
low-frequency description is no longer sufficient: higher spatial frequencies provide additional information
about the internal features of the face, which are very important for its configural description (Goffaux,
2009; Piepers and Robbins, 2012).
In experiment 2, we studied the accuracy of expression recognition for facial stimuli combining
fragments isolated in different spatial-frequency ranges. We found that the most effective range is the
11.3–22.6 CPI band, with a center frequency of 16 CPI. Although this result is not consistent with the idea
that low spatial frequencies are the most important for face perception, it is consistent with data indicating
that middle-range frequencies are the most important for face identification. Notably, in this frequency range
(11.3–22.6 CPI) the GOTC more often singled out the eyes and mouth in the initial images as areas of
interest, and these features are known to be very important for conveying emotionally significant information.
However, unlike experiments using the Bubbles technique, we did not aim to determine the
independent contribution of each frequency range to expression recognition, since the perception of a
face is not a simple sum of its components (Jack et al., 2012; but see Gold, Mundy and Tjan, 2012). What
mattered to us was to determine, for each expression, the range of spatial frequencies that provides the
best recognition accuracy.
Our results certainly do not provide an unambiguous answer to the question of how information from
different spatial-frequency pathways is combined. Previously published results in this area are also
somewhat contradictory. There is evidence that the visual system processes spatial frequencies in a fixed
sequence, from low to high (Gao and Bentin, 2011). At the same time, flexible top-down selection of
spatial-frequency channels can significantly optimize visual processing (Flevaris and Robertson, 2016).
Simultaneous processing of all frequencies cannot be excluded either. With this in mind, our results clearly
indicate the frequency range that contains the most useful information about facial expressions and with
which it would be most reasonable to start processing (11.3–22.6 CPI). The
conclusion that this information can determine the strategy for further integration of spatial frequencies is
also supported by the fact that for sadness it is more effective to add information from lower frequency
ranges, whereas for fear, anger, disgust and surprise it is more effective to add it from higher frequency ranges.
Different frequency ranges turned out to be effective for different expressions. For the best
recognition of neutral and happy facial expressions, all 5 octaves were required. This result is consistent
with data showing that a neutral facial expression contains a complete set of basic expressions (Lee and
Kim, 2008) and that the expression of happiness is encoded by both low and high spatial frequencies
(Becker et al., 2012). Our data showed that for sadness recognition 4 octaves were enough (without the
highest-frequency range). To recognize fear, anger, disgust and surprise, 4 octaves were also enough, but
without the lowest-frequency range.
Thus, experiment 2 determined in which spatial-frequency ranges the areas of greatest contrast gain
should be extracted to provide the best recognition accuracy for a particular expression. It was then
necessary to verify that this is indeed the information the visual system uses when recognizing the
expressions of real faces. To do this, in experiment 3 we compared the recognition accuracy of each
expression for images of real faces and for stimuli formed from the optimal combination of selected
fragments. As expected, the synthesized images were recognized somewhat worse than the real ones
(73% versus 83%).
Interestingly, recognition accuracy for the synthesized stimuli did not decrease for the expressions of
negative valence (fear, anger, sadness): in these cases the fragmentary images were perceived with
approximately the same accuracy as the real ones. This peculiarity of negative-expression recognition is
consistent with data showing that the perception of such emotions involves the activation of special
mechanisms (Shaw et al., 2011; Stein et al., 2014; Vuilleumier et al., 2003). However, this does not remove
the question of why the information contained in the selected areas is insufficient for recognizing the other
emotions. Clearly, some useful information is missing from the synthesized stimuli; the longer decision
times for these stimuli probably point to the same conclusion. In fact, this was expected.
Although we tried to base the choice of operator parameters on the literature, in a number of cases
the choice had to be arbitrary. This concerns, for example, the number of areas selected in each frequency
range; increasing this number, especially at high spatial frequencies, can be expected to improve
recognition. Another parameter that may affect the accuracy of expression recognition is the filtering
frequency in cycles per aperture. Previous research suggests that the optimal carrier-envelope ratio in
second-order filters is 1 to 8 (Babenko, Ermakov and Bozhinskaya, 2010; Sun
and Schoeld, 2011). However, this result was obtained in the tasks with modulated textures and not
faces. Obviously, even a slight increase in the ltering frequency (for example, from 4 to 4.5 cycles per
aperture) can improve the accuracy of expression recognition.
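The inverse relation between operator size and filtering frequency can be made concrete with simple arithmetic. The sketch below assumes that "cycles per aperture" counts carrier cycles across the full aperture width and that image size is measured along the same axis; under that assumption the aperture width, as a fraction of image width, equals cycles per aperture divided by the carrier frequency in CPI. The function name is hypothetical.

```python
def aperture_fraction(carrier_cpi: float, cycles_per_aperture: float) -> float:
    """Aperture width as a fraction of image width (assumed definition).

    carrier_cpi: carrier (filtering) frequency in cycles per image.
    cycles_per_aperture: number of carrier cycles spanning the aperture.
    """
    if carrier_cpi <= 0 or cycles_per_aperture <= 0:
        raise ValueError("frequencies must be positive")
    return cycles_per_aperture / carrier_cpi


if __name__ == "__main__":
    for cpi in (4, 8, 16, 32, 64):
        a4 = aperture_fraction(cpi, 4.0)
        a45 = aperture_fraction(cpi, 4.5)
        print(f"{cpi:2d} CPI: aperture {a4:.4f} -> {a45:.4f} of image width")
```

Under this assumption, raising the value from 4 to 4.5 cycles per aperture at a fixed carrier frequency enlarges the aperture by 12.5%; equivalently, at a fixed aperture size it raises the carrier frequency by the same factor.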
The most interesting nding that we would like to emphasize is that numerous studies have shown
that people recognize different expressions with different efciency, and the recognition accuracy for
different expressions form a certain sequence. Fear is recognized with the worst accuracy, and happiness
with the best. In experiment 3, as in previous studies, we found a certain sequence of the increase in
accuracy of expression recognition for images of real faces. And it was repeated with synthesized images
created from the areas of the greatest contrast gain. This may be evidence that the replacement of a real
image by a fragmented one, although accompanied by some general decrease in recognition accuracy,
does not violate the general logic of the processing.
Conclusions
The results obtained indicate that the informative content of image areas can be determined by how
much these areas differ from their surroundings in a physical parameter, namely total non-local contrast.
Moreover, the greater this difference, the higher the informational significance of these fragments. This
seemingly unexpected result can be explained by the fact that the greatest contribution to the total contrast
comes not so much from the contrast of each single luminance gradient as from the total number of
gradients in the analyzed image area. Since each gradient is a kind of unit of visual information, the more
gradients an area contains, the more informative it is.
We established that information from the areas of highest increase in contrast is necessary for
facial expression recognition. Moreover, this information is sufficient to recognize basic expressions
with very high accuracy.
These areas are characterized by spatial modulation of luminance gradients and can be extracted
from the input image by second-order visual filters. These filters are therefore good candidates for the
mechanism that selects areas of interest.
Since the signal at the lter output is proportional to the amplitude of the modulation, those that are
more activated than their neighbors gain an advantage, due to the lateral interaction between the lters.
The locations of these lters form a saliency map, in which priorities for selective attention are distributed
in accordance with the amplitude of the modulation.
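A saliency map of this kind can be read out by a greedy winner-take-all with local inhibition: the strongest filter output is selected first, its neighbors are suppressed, and the procedure repeats. The NumPy sketch below is an assumed, simplified stand-in for lateral interaction, not a model proposed in this study; the response map, the suppression radius and the number of selected areas (which, as noted in the Discussion, had to be chosen somewhat arbitrarily) are all free parameters.

```python
import numpy as np


def select_salient_locations(response: np.ndarray, n_areas: int,
                             inhibition_radius: float) -> list[tuple[int, int]]:
    """Pick up to n_areas peaks of a 2-D response map in order of decreasing response.

    After each pick, all responses within inhibition_radius pixels of it are
    suppressed, a crude stand-in for lateral interaction between neighboring filters.
    """
    remaining = response.astype(float).copy()
    rows, cols = np.indices(response.shape)
    picks: list[tuple[int, int]] = []
    for _ in range(n_areas):
        r, c = np.unravel_index(np.argmax(remaining), remaining.shape)
        if not np.isfinite(remaining[r, c]):
            break  # the whole map has already been suppressed
        picks.append((int(r), int(c)))
        remaining[(rows - r) ** 2 + (cols - c) ** 2 <= inhibition_radius ** 2] = -np.inf
    return picks


if __name__ == "__main__":
    yy, xx = np.mgrid[0:64, 0:64]

    def bump(r: int, c: int, amplitude: float) -> np.ndarray:
        return amplitude * np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * 3.0 ** 2))

    # Toy map: three modulation "hot spots" of different strength.
    toy_map = bump(10, 10, 1.0) + bump(40, 50, 3.0) + bump(50, 15, 2.0)
    print(select_salient_locations(toy_map, n_areas=3, inhibition_radius=8))
    # [(40, 50), (50, 15), (10, 10)]
```

The order of the returned locations is the attentional priority in this toy model: it follows the amplitude of the response, as the paragraph above describes for modulation amplitude.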
At the same time, the lters themselves, drawing attention to certain areas of the image, can actually
play the role of windows through which information from these areas of the visual eld is transmitted to
post-attentive levels of processing.
Thus, the results obtained allow us to draw the following conclusions:
- Information from the image areas of highest increase in luminance contrast is necessary and sufficient
for recognition of basic facial expressions.
- Second-order visual filters extract the salient regions of the image, and the signal value at a filter's
output determines its priority for attention.
- The receptive fields of second-order filters act as windows through which attention extracts
information, which is then transferred to post-attentive levels of processing.
Acknowledgements
The study was carried out with the financial support of the Russian Science Foundation (project
20-64-47057).
Conflict of interests
The authors declare no conflict of interest.
References
Açık, A., Onat, S., Schumann, F., Einhäuser, W., & König, P. (2009). Effects of luminance contrast and its modifications on
fixation behavior during free viewing of images from different categories. Vision Research, 49(12), 1541-1553.
https://doi.org/10.1016/j.visres.2009.03.011
Awasthi, B., Friedman, J., & Williams, M. A. (2011). Faster, stronger, lateralized: Low spatial frequency information supports
face processing. Neuropsychologia, 49(13), 3583-3590. https://doi.org/10.1016/j.neuropsychologia.2011.08.027
Babenko, V. V., Ermakov, P. N., & Bozhinskaya, M. A. (2010). Relationship between the Spatial-Frequency Tunings of the First-
and the Second-Order Visual Filters. Psikhologicheskii Zhurnal, 31(2), 48-57. Retrieved from
https://www.elibrary.ru/item.asp?id=14280688 (in Russ.)
Babenko, V. V., Yavna, D. V., Ermakov, P. N., & Anokhina, P. V. (2021). Nonlocal contrast calculated by the second order visual
mechanisms and its significance in identifying facial emotions. F1000 Research, 10, 274.
https://doi.org/10.12688/f1000research.28396.1
Babenko, V., Yavna, D., Vorobeva, E., Denisova, E., Ermakov, P., & Kovsh, E. (2021). Relationship Between Facial Areas With
the Greatest Increase in Non-local Contrast and Gaze Fixations in Recognizing Emotional Expressions. International
Journal of Cognitive Research in Science, Engineering and Education (IJCRSEE), 9(3), 359–368.
https://doi.org/10.23947/2334-8496-2021-9-3-359-368
Barabanshchikov, V. A. (2012). Ekspressii lits i ikh vospriyatiye [Facial expressions and their perception]. Moscow: Izd-vo
«IPRAN» [IPRAS Publishing House]. (in Russ.)
Barabanshchikov, V. A., & Hoze, E. G. (2013). Vospriyatiye ekspressiy spokoynogo litsa [Perception of expressions of a neutral
face]. Mir psikhologii [World of Psychology], 1, 203-223. Retrieved from https://www.elibrary.ru/item.asp?id=18907610
(in Russ.)
Becker, D. V., Neel, R., Srinivasan, N., Neufeld, S., Kumar, D., & Fouse, S. (2012). The vividness of happiness in dynamic
facial displays of emotion. PLoS One, 7(1), e26551.
https://doi.org/10.1371/annotation/f0519e8c-f347-4950-b7e8-3e9cbc3ec2a9
Blais, C., Roy, C., Fiset, D., Arguin, M., & Gosselin, F. (2012). The eyes are not the window to basic emotions. Neuropsychologia,
50(12), 2830-2838. https://doi.org/10.1016/j.neuropsychologia.2012.08.010
Bombari, D., Schmid, P. C., Schmid Mast, M., Birri, S., Mast, F. W., & Lobmaier, J. S. (2013). Emotion recognition: The role of
featural and configural face information. Quarterly Journal of Experimental Psychology, 66(12), 2426-2442.
https://doi.org/10.1080/17470218.2013.789065
Bruce, N. D., & Tsotsos, J. K. (2009). Saliency, attention, and visual search: An information theoretic approach. Journal of
vision, 9(3), 5-5. https://doi.org/10.1167/9.3.5
Butler, S., Blais, C., Gosselin, F., Bub, D., & Fiset, D. (2010). Recognizing famous people. Attention, Perception, &
Psychophysics, 72(6), 1444-1449. https://doi.org/10.3758/APP.72.6.1444
Calder, A. J., Young, A. W., Keane, J., & Dean, M. (2000). Configural information in facial expression perception. Journal of
Experimental Psychology: Human perception and performance, 26(2), 527. https://doi.org/10.1037/0096-1523.26.2.527
Calvo, M. G., & Beltrán, D. (2014). Brain lateralization of holistic versus analytic processing of emotional facial expressions.
Neuroimage, 92, 237-247. https://doi.org/10.1016/j.neuroimage.2014.01.048
Cauchoix, M., Barragan-Jason, G., Serre, T., & Barbeau, E. J. (2014). The neural dynamics of face detection in the wild
revealed by MVPA. Journal of Neuroscience, 34(3), 846-854. https://doi.org/10.1523/JNEUROSCI.3030-13.2014
Cheng, M. M., Mitra, N. J., Huang, X., Torr, P. H., & Hu, S. M. (2014). Global contrast based salient region detection. IEEE
transactions on pattern analysis and machine intelligence, 37(3), 569-582. https://doi.org/10.1109/TPAMI.2014.2345401
Cheung, O. S., Richler, J. J., Palmeri, T. J., & Gauthier, I. (2008). Revisiting the role of spatial frequencies in the holistic
processing of faces. Journal of Experimental Psychology: Human Perception and Performance, 34(6), 1327-1336.
https://doi.org/10.1037/a0011752
Collin, C. A., Therrien, M., Martin, C., & Rainville, S. (2006). Spatial frequency thresholds for face recognition when comparison
faces are filtered and unfiltered. Perception & psychophysics, 68(6), 879-889. https://doi.org/10.3758/BF03193351
Comfort, W. E., & Zana, Y. (2015). Face detection and individuation: Interactive and complementary stages of face processing.
Psychology & Neuroscience, 8(4), 442. https://doi.org/10.1037/h0101278
Costen, N. P., Parker, D. M., & Craw, I. (1996). Effects of high-pass and low-pass spatial filtering on face identification.
Perception & psychophysics, 58(4), 602-612. https://doi.org/10.3758/BF03213093
Deruelle, C., & Fagot, J. (2005). Categorizing facial identities, emotions, and genders: Attention to high- and low-spatial
frequencies by children and adults. Journal of experimental child psychology, 90(2), 172-184.
https://doi.org/10.1016/j.jecp.2004.09.001
Ekman, P. (1992). An argument for basic emotions. Cognition & emotion, 6(3-4), 169-200.
https://doi.org/10.1080/02699939208411068
Ellemberg, D., Allen, H. A., & Hess, R. F. (2006). Second-order spatial frequency and orientation channels in human vision.
Vision Research, 46(17), 2798-2803. https://doi.org/10.1016/j.visres.2006.01.028
Flevaris, A. V., & Robertson, L. C. (2016). Spatial frequency selection and integration of global and local information in visual
processing: A selective review and tribute to Shlomo Bentin. Neuropsychologia, 83, 192-200.
https://doi.org/10.1016/j.neuropsychologia.2015.10.024
Frey, H. P., König, P., & Einhäuser, W. (2007). The role of first- and second-order stimulus features for human overt attention.
Perception & Psychophysics, 69(2), 153-161. https://doi.org/10.3758/BF03193738
Frischen, A., Eastwood, J. D., & Smilek, D. (2008). Visual search for faces with emotional expressions. Psychological bulletin,
134(5), 662-676. https://doi.org/10.1037/0033-2909.134.5.662
Gao, Z., & Bentin, S. (2011). Coarse-to-fine encoding of spatial frequency information into visual short-term memory for faces
but impartial decay. Journal of Experimental Psychology: Human Perception and Performance, 37(4), 1051-1064.
https://doi.org/10.1037/a0023091
Goffaux, V. (2009). Spatial interactions in upright and inverted faces: Re-exploration of spatial scale influence. Vision research,
49(7), 774-781. https://doi.org/10.1016/j.visres.2009.02.009
Goffaux, V., & Rossion, B. (2006). Faces are “spatial”: Holistic face perception is supported by low spatial frequencies. Journal
of Experimental Psychology: Human perception and performance, 32(4), 1023-1039.
https://doi.org/10.1037/0096-1523.32.4.1023
Goffaux, V., Peters, J., Haubrechts, J., Schiltz, C., Jansma, B., & Goebel, R. (2011). From coarse to fine? Spatial and temporal
dynamics of cortical face processing. Cerebral Cortex, 21(2), 467-476. https://doi.org/10.1093/cercor/bhq112
Gold, J. M., Mundy, P. J., & Tjan, B. S. (2012). The perception of a face is no more than the sum of its parts. Psychological
science, 23(4), 427-434. https://doi.org/10.1177/0956797611427407
Gold, J., Bennett, P. J., & Sekuler, A. B. (1999). Identification of band-pass filtered letters and faces by human and ideal
observers. Vision research, 39(21), 3537-3560. https://doi.org/10.1016/S0042-6989(99)00080-2
Gosselin, F., & Schyns, P. G. (2001). Bubbles: a technique to reveal the use of information in recognition tasks. Vision research,
41(17), 2261-2271. https://doi.org/10.1016/S0042-6989(01)00097-9
Graham, N. V. (2011). Beyond multiple pattern analyzers modeled as linear filters (as classical V1 simple cells): Useful additions
of the last 25 years. Vision research, 51(13), 1397-1430. https://doi.org/10.1016/j.visres.2011.02.007
Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat’s striate cortex. The Journal of physiology,
148(3), 574-591. https://doi.org/10.1113/jphysiol.1959.sp006308
Jack, R. E., Garrod, O. G., Yu, H., Caldara, R., & Schyns, P. G. (2012). Facial expressions of emotion are not culturally universal.
Proceedings of the National Academy of Sciences, 109(19), 7241-7244. https://doi.org/10.1073/pnas.1200155109
Jennings, B. J., & Yu, Y. (2017). The role of spatial frequency in emotional face classification. Attention, Perception, &
Psychophysics, 79(6), 1573-1577. https://doi.org/10.3758/s13414-017-1377-7
Kumar, D., & Srinivasan, N. (2011). Emotion perception is mediated by spatial frequency content. Emotion, 11(5), 1144-1151.
https://doi.org/10.1037/a0025453
Langner, O., Dotsch, R., Bijlstra, G., Wigboldus, D. H., Hawk, S. T., & Van Knippenberg, A. D. (2010). Presentation and validation of
the Radboud Faces Database. Cognition and emotion, 24(8), 1377-1388. https://doi.org/10.1080/02699930903485076
Leder, H., & Bruce, V. (2000). When inverted faces are recognized: The role of configural information in face recognition. The
quarterly journal of experimental psychology Section A, 53(2), 513-536. https://doi.org/10.1080/713755889
Lee, H. S., & Kim, D. (2008). Expression-invariant face recognition by facial expression transformations. Pattern recognition
letters, 29(13), 1797-1805. https://doi.org/10.1016/j.patrec.2008.05.012
Li, G., Yao, Z., Wang, Z., Yuan, N., Talebi, V., Tan, J., ... & Baker, C. L. (2014). Form-cue invariant second-order neuronal
responses to contrast modulation in primate area V2. Journal of Neuroscience, 34(36), 12081-12092.
https://doi.org/10.1523/JNEUROSCI.0211-14.2014
Liu, L., & Ioannides, A. A. (2010). Emotion separation is completed early and it depends on visual eld presentation. PloS one,
5(3), e9790. https://doi.org/10.1371/journal.pone.0009790
Lobmaier, J. S., & Mast, F. W. (2007). Perception of novel faces: The parts have it!. Perception, 36(11), 1660-1673.
https://doi.org/10.1068/p5642
Lundqvist, D., Flykt, A., & Öhman, A. (1998). The Karolinska directed emotional faces (KDEF). CD ROM from Department
of Clinical Neuroscience, Psychology section, Karolinska Institutet, 91(630), 2-2. https://doi.org/10.1037/t27732-000
Marat, S., Rahman, A., Pellerin, D., Guyader, N., & Houzet, D. (2013). Improving visual saliency by adding ‘face feature
map’ and ‘center bias’. Cognitive Computation, 5(1), 63-75. https://doi.org/10.1007/s12559-012-9146-3
Maurer, D., Le Grand, R., & Mondloch, C. J. (2002). The many faces of configural processing. Trends in cognitive sciences,
6(6), 255-260. https://doi.org/10.1016/S1364-6613(02)01903-4
McKone, E. (2008). Configural processing and face viewpoint. Journal of Experimental Psychology: Human Perception and
Performance, 34(2), 310-327. https://doi.org/10.1037/0096-1523.34.2.310
Morawetz, C., Baudewig, J., Treue, S., & Dechent, P. (2011). Effects of spatial frequency and location of fearful faces on human
amygdala activity. Brain research, 1371, 87-99. https://doi.org/10.1016/j.brainres.2010.10.110
Näsänen, R. (1999). Spatial frequency bandwidth used in the recognition of facial images. Vision research, 39(23), 3824-3833.
https://doi.org/10.1016/S0042-6989(99)00096-6
Oliva, A., & Schyns, P. G. (1997). Coarse blobs or fine edges? Evidence that information diagnosticity changes the perception
of complex visual stimuli. Cognitive psychology, 34(1), 72-107. https://doi.org/10.1006/cogp.1997.0667
Olszanowski, M., Pochwatko, G., Kuklinski, K., Scibor-Rylski, M., Lewinski, P., & Ohme, R. K. (2015). Warsaw set of emotional
facial expression pictures: a validation study of facial display photographs. Frontiers in psychology, 5, 1516.
https://doi.org/10.3389/fpsyg.2014.01516
Pantic, M., Valstar, M., Rademaker, R., & Maat, L. (2005, July). Web-based database for facial expression analysis. In 2005
IEEE international conference on multimedia and Expo (pp. 5-pp). IEEE. https://doi.org/10.1109/ICME.2005.1521424
Perazzi, F., Krähenbühl, P., Pritch, Y., & Hornung, A. (2012, June). Saliency filters: Contrast based filtering for salient region
detection. In 2012 IEEE conference on computer vision and pattern recognition (pp. 733-740). IEEE.
https://doi.org/10.1109/CVPR.2012.6247743
Peyrin, C., Michel, C. M., Schwartz, S., Thut, G., Seghier, M., Landis, T., ... & Vuilleumier, P. (2010). The neural substrates and
timing of top–down processes during coarse-to-fine categorization of visual scenes: A combined fMRI and ERP study.
Journal of cognitive neuroscience, 22(12), 2768-2780. https://doi.org/10.1162/jocn.2010.21424
Piepers, D. W., & Robbins, R. A. (2012). A review and clarification of the terms “holistic,” “configural,” and “relational” in the face
perception literature. Frontiers in psychology, 3, 559. https://doi.org/10.3389/fpsyg.2012.00559
Pourtois, G., Dan, E. S., Grandjean, D., Sander, D., & Vuilleumier, P. (2005). Enhanced extrastriate visual response to
bandpass spatial frequency filtered fearful faces: Time course and topographic evoked-potentials mapping. Human
brain mapping, 26(1), 65-79. https://doi.org/10.1002/hbm.20130
Royer, J., Blais, C., Charbonneau, I., Déry, K., Tardif, J., Duchaine, B., ... & Fiset, D. (2018). Greater reliance on the eye region
predicts better face recognition ability. Cognition, 181, 12-20. https://doi.org/10.1016/j.cognition.2018.08.004
Sakai, K., & Finkel, L. H. (1995). Characterization of the spatial-frequency spectrum in the perception of shape from texture.
JOSA A, 12(6), 1208-1224. https://doi.org/10.1364/JOSAA.12.001208
Shaw, K., Lien, M. C., Ruthruff, E., & Allen, P. A. (2011). Electrophysiological evidence of emotion perception without central
attention. Journal of Cognitive Psychology, 23(6), 695-708. https://doi.org/10.1080/20445911.2011.586624
Smith, F. W., & Schyns, P. G. (2009). Smile through your fear and sadness: Transmitting and identifying facial expression
signals over a range of viewing distances. Psychological Science, 20(10), 1202-1208.
https://doi.org/10.1111/j.1467-9280.2009.02427.x
Smith, M. L., Cottrell, G. W., Gosselin, F., & Schyns, P. G. (2005). Transmitting and decoding facial expressions. Psychological
Science, 16(3), 184-189. https://doi.org/10.1111/j.0956-7976.2005.00801.x
Smith, M. L., Volna, B., & Ewing, L. (2016). Distinct information critically distinguishes judgments of face familiarity and identity.
Journal of Experimental Psychology: Human Perception and Performance, 42(11), 1770-1779. https://doi.org/10.1037/xhp0000243
Solomon, J. A., & Morgan, M. J. (2017). Orientation-defined boundaries are detected with low efficiency. Vision Research, 138,
66-70. https://doi.org/10.1016/j.visres.2017.06.009
Stein, T., Seymour, K., Hebart, M. N., & Sterzer, P. (2014). Rapid fear detection relies on high spatial frequencies. Psychological
Science, 25(2), 566-574. https://doi.org/10.1177/0956797613512509
Sun, P., & Schoeld, A. J. (2011). The efcacy of local luminance amplitude in disambiguating the origin of luminance signals
depends on carrier frequency: Further evidence for the active role of second-order vision in layer decomposition. Vision
research, 51(5), 496-507. https://doi.org/10.1016/j.visres.2011.01.008
’t Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating
importance from saliency. Frontiers in Psychology, 4, 455. https://doi.org/10.3389/fpsyg.2013.00455
Tanaka, J. W., Kaiser, M. D., Butler, S., & Le Grand, R. (2012). Mixed emotions: Holistic and analytic perception of facial
expressions. Cognition & Emotion, 26(6), 961-977. https://doi.org/10.1080/02699931.2011.630933
Tanskanen, T., Näsänen, R., Montez, T., Päällysaho, J., & Hari, R. (2005). Face recognition and cortical responses show similar
sensitivity to noise spatial frequency. Cerebral Cortex, 15(5), 526-534. https://doi.org/10.1093/cercor/bhh152
Vlamings, P. H., Goffaux, V., & Kemner, C. (2009). Is the early modulation of brain activity by fearful facial expressions primarily
mediated by coarse low spatial frequency information? Journal of Vision, 9(5), 1-13. https://doi.org/10.1167/9.5.12
Vuilleumier, P., & Pourtois, G. (2007). Distributed and interactive brain mechanisms during emotion face perception: evidence from
functional neuroimaging. Neuropsychologia, 45(1), 174-194. https://doi.org/10.1016/j.neuropsychologia.2006.06.003
Vuilleumier, P., Armony, J. L., Driver, J., & Dolan, R. J. (2003). Distinct spatial frequency sensitivities for processing faces and
emotional expressions. Nature Neuroscience, 6(6), 624-631. https://doi.org/10.1038/nn1057
White, M. (2000). Parts and wholes in expression recognition. Cognition & Emotion, 14(1), 39-60. https://doi.org/10.1080/026999300378987
Willis, J., & Todorov, A. (2006). First impressions: Making up your mind after a 100-ms exposure to a face. Psychological
Science, 17(7), 592-598. https://doi.org/10.1111/j.1467-9280.2006.01750.x
Wu, J., Qi, F., Shi, G., & Lu, Y. (2012). Non-local spatial redundancy reduction for bottom-up saliency estimation. Journal of
Visual Communication and Image Representation, 23(7), 1158-1166. https://doi.org/10.1016/j.jvcir.2012.07.010
Xia, C., Qi, F., Shi, G., & Wang, P. (2015). Nonlocal center–surround reconstruction-based bottom-up saliency estimation.
Pattern Recognition, 48(4), 1337-1348. https://doi.org/10.1016/j.patcog.2014.10.007
Yarbus, A. L. (2013). Eye movements and vision. Springer. https://doi.org/10.1007/978-1-4899-5379-7
Yavna, D. V. (2012). Psikhoziologicheskiye osobennosti zritel’nogo vospriyatiya prostranstvenno modulirovannykh priznako
[Psychophysiological features of visual perception of spatially modulated features]. PhD Thesis. Rostov-on-Don (in
Russ.)