Quantitative assessment of the influence of sound in affective audio-visual elicitations

Polo, Em; Mollura, M; Paglialonga, A; Barbieri, R

Background: Many studies address affective elicitation by means of pictures or video clips, while investigations of reactions to affective sounds are relatively few. Although the soundtrack of a video clip is part of the stimulus itself, it is not easy to separate the joint effect of visual and auditory stimulation. We therefore investigated the role of affective sounds compared to affective pictures by monitoring the galvanic skin response (GSR). The aim of this study was to determine whether affective sounds elicit emotions to a lesser, equal or greater extent than affective pictures with the same valence and arousal levels, and whether the joint effect of the two senses, obtained by matching pictures with pertinent sounds, amplifies emotional responses in terms of sympathetic activation or deactivation. Methods: Ten subjects were tested. All participants underwent pure-tone audiometry in both ears to confirm normal hearing (pure-tone average thresholds < 20 dB HL). All subjects with visual impairments were able to use glasses. The protocol was divided into three phases in which subjects saw only pictures (P), heard only sounds (S), or saw pictures with background sounds (P+S) pertinent to the content of the pictures. The order of the three phases was randomized for each subject. For the whole duration of the test, the GSR signal was monitored by means of the ProComp Infiniti device. Pictures from the International Affective Picture System (IAPS) and sounds from the International Affective Digitized Sounds (IADS) were used. After 5 minutes of gray screen visualization, each phase consisted of four sequences (S1, S2, S3, S4) of pictures, sounds or pictures with sounds at increasing arousal levels (ranging from 3 to 8) and with a median valence of 5. After each phase, a two-minute gray screen visualization allowed the GSR signal to return to baseline.
In all phases, corresponding sequences (e.g. S1 sounds vs. S1 pictures) had the same median arousal. For pictures with sounds, the arousal and valence levels of the matched stimuli were approximately equal. Each sequence lasted 90 seconds (6 stimuli of 15 seconds each). The GSR signal was low-pass filtered at 2 Hz with a zero-phase 4th-order Butterworth filter to remove noise and then downsampled from 256 to 5 Hz. A median filter was applied by computing the median GSR of the surrounding samples in an interval of +/- 4 seconds centered on the current sample. The phasic component, linked with sympathetic neuronal activity, was then obtained by subtracting the median GSR from the raw signal. After finding peak onsets (amplitude > 0.01 µS) and offsets (0 µS < amplitude) on the phasic signal, GSR peaks were located on the raw signal between each onset and the following offset. Four basic but relevant features were extracted from the signal in each sequence: the average amplitude of the GSR peaks, the number of peaks, the average envelope of the phasic component and the average value of the raw signal. The same features were computed for the last 90 seconds of the three baselines, and the final features were computed as the differences between each feature in the sequence and the same feature in the immediately preceding baseline. Results: The Friedman test (non-parametric and pairwise) with Bonferroni correction indicated statistically significant differences only in S4: for the average peak amplitude between P and P+S; for the GSR average between P and S and between P and P+S; and for the average phasic envelope between P and P+S and between S and P+S. In particular, for all features the highest values were found for P+S and the lowest for P. Conclusions: The statistical analysis shows that differences among the three phases can be detected only under elicitation of high levels of arousal.
In particular, pictures seem to provide the least exciting stimuli, whereas sounds alone and sounds and pictures together seem to elicit similar responses. Overall, sympathetic activation is higher under sound-only elicitation and under pictures with sounds than under pictures alone. Therefore, sounds seem to have greater emotional power than pictures, but the joint effect of the two sensory systems seems to amplify the single effects. Further research is needed to fully validate the analysis.
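
The phasic extraction and feature computation described in the Methods can be illustrated with a minimal sketch. It is not the authors' code. It assumes the signal has already been low-pass filtered and downsampled to 5 Hz, and all function names are hypothetical. Several details are assumptions not stated in the abstract: the median window is truncated at the signal edges, an offset is taken as the first sample after an onset where the phasic signal is no longer above 0 µS, and peak amplitude is measured from the raw value at onset to the raw maximum. The phasic envelope feature is omitted because its definition is not given.

```python
from statistics import mean, median

FS = 5           # Hz, sampling rate after downsampling (from the text)
HALF_WIN_S = 4   # median window of +/- 4 s (from the text)
ONSET_US = 0.01  # onset threshold in microsiemens (from the text)


def phasic_component(gsr, fs=FS, half_win_s=HALF_WIN_S):
    """Subtract a running median (+/- half_win_s seconds) from the signal."""
    k = int(half_win_s * fs)
    phasic = []
    for i, x in enumerate(gsr):
        # Assumption: the window is simply truncated at the signal edges.
        window = gsr[max(0, i - k): i + k + 1]
        phasic.append(x - median(window))
    return phasic


def find_peaks(gsr, phasic, onset_thr=ONSET_US):
    """Return (onset, offset, peak) index triples.

    Onset: phasic sample above onset_thr.
    Offset (assumption): first later sample where phasic is not above 0.
    Peak: maximum of the input signal between onset and offset.
    """
    peaks = []
    i, n = 0, len(phasic)
    while i < n:
        if phasic[i] > onset_thr:
            onset = i
            j = i + 1
            while j < n and phasic[j] > 0:
                j += 1
            if j < n:  # keep the event only if an offset was found
                segment = gsr[onset: j + 1]
                peaks.append((onset, j, onset + segment.index(max(segment))))
            i = j + 1
        else:
            i += 1
    return peaks


def sequence_features(gsr, fs=FS):
    """Three of the four features named in the text, for one segment."""
    phasic = phasic_component(gsr, fs)
    peaks = find_peaks(gsr, phasic)
    # Assumption: amplitude = signal at the peak minus signal at the onset.
    amps = [gsr[p] - gsr[on] for on, _off, p in peaks]
    return {
        "mean_peak_amplitude": mean(amps) if amps else 0.0,
        "n_peaks": len(peaks),
        "mean_raw": mean(gsr),
    }


def baseline_corrected(seq_feats, base_feats):
    """Final features: sequence value minus preceding-baseline value."""
    return {name: seq_feats[name] - base_feats[name] for name in seq_feats}


# Synthetic illustration: a flat baseline and a sequence with one response.
baseline = [1.0] * 125
sequence = [1.0] * 60 + [1.0, 1.2, 1.5, 1.2, 1.0] + [1.0] * 60
print(baseline_corrected(sequence_features(sequence),
                         sequence_features(baseline)))
```

In a real recording, each 90-second sequence and the last 90 seconds of the preceding gray-screen period would take the place of the synthetic lists, and the resulting differences would feed the statistical comparison among P, S and P+S.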

CNR Institutional Research Information System