A Computational Approach for the Understanding of Stochastic Resonance Phenomena in the Human Auditory System

Stochastic resonance (SR) is a nonlinear phenomenon in which the introduction of noise into a system causes a counterintuitive increase in the detection performance for a signal. SR has been extensively studied in different physical and biological systems, including the human auditory system (HAS), where a positive role for noise has been recognized at the level of both the peripheral auditory system (PAS) and the central nervous system (CNS). This dualism regarding the mechanistic underpinnings of SR in the HAS is confirmed by discrepancies among experimental studies, and is reflected in a disagreement about how the phenomenon can be exploited to improve prostheses and aids for people with hearing loss. The HAS is one of the most complex sensory systems of the human body, and SR involves system nonlinearities; the characterization of SR in the HAS is therefore very challenging, and many efforts are being made to characterize this mechanism as a whole. Current computational modelling tools make it possible to investigate the phenomenon separately in the CNS and in the PAS, thus simplifying the analysis of the mechanisms involved. In this work we present a computational model of the PAS that supports SR, showing improved detection of sounds when input noise is added. As a preparatory step, we provided the system with a test signal at the edge of the hearing threshold. As a next step, we repeated the experiment adding background noise at different intensities. We found an increase in the relative spike count in the frequency bands of the test signal when input noise is added, confirming that the maximum value is obtained within a specific range of added noise, whereas a further increase in noise intensity only degrades signal detection or information content.


I. INTRODUCTION
While initially studied as a possible mechanism to explain long-term climatic variations [1], stochastic resonance (SR) has been extended, over the last 30 years, to many contexts involving nonlinear systems. SR is today applied in many fields, from physics to electronics, e.g., for the improvement of analog/digital conversion equipment [2] and the realization of innovative nanodevices able to transform electrical noise into useful energy [3]. There is also experimental evidence for a role of SR in functions performed by the nervous system, including the detection of weak signals and the synchronization and coherence between groups of neurons, and SR plays an important role in some sense organs, including the ear [4], [5].
With regard to the HAS (Fig. 1), it is not entirely clear at present whether SR occurs predominantly in the PAS (at the level of the inner hair cells) [7] or in the CNS [8]. Additionally, the widespread multidisciplinary interest in this topic has given rise to a number of debates, misunderstandings, and controversies [9]. Likewise, authors disagree on the applicability of SR for people with hearing loss. While some consider that this phenomenon can be exploited to improve the performance of hearing devices and prostheses (e.g., [10]), others claim that there are serious limitations to the applicability of SR in this field. Nevertheless, all these considerations underline a positive role for noise in such systems, and indicate that SR phenomena can be detected already at the neuron level, e.g., during the encoding of sensory information [9,11]. Current computational modeling tools make it possible to investigate the sound-detection mechanisms mediated by SR in the PAS and the CNS separately, thus simplifying the analysis of the mechanisms underlying this phenomenon. In this work we find evidence of SR in a model of the PAS, which shows improved detection of sounds when an optimal quantity of input noise is added. The model is composed of a block that emulates the PAS, based on the Brian Hears library (Brian simulator, Python®) [12], and a subsequent detector of occurrences.
First, by varying the noise strength, we confirmed the presence of a noise level for which the test signal is better represented in terms of relative spike count in the frequency bands of the test signal. The presented model shows that SR is able to improve the representation of the signal in the PAS, allowing deeper stages (located in the CNS) to increase the recognition performance for the original sound.
The model provides a framework for understanding the SR mechanism in the PAS, and suggests applications in which noise with proper frequency features would be able to maximize the intelligibility of the input signal.

A. Stochastic resonance in the auditory system
For SR to occur, a nonlinear system needs three ingredients: a threshold, a subthreshold stimulus, and noise. These three ingredients account for the observation of SR in fields ranging from physics and engineering to biology and medicine [13]. Focusing on the auditory system, adding a small quantity of noise to a periodic signal slightly below the threshold allows the signal to reach the level sufficient for the system to detect it. Consequently, in some contexts the optimal signal-to-noise ratio (SNR) does not occur when the noise is reduced to the minimum [5,14,15].
Studies in this field are still not able to explain whether there is an interaction between the effect of SR occurring in the PAS and that occurring in the CNS, nor whether there is an optimal SNR that varies with frequency. However, there is experimental evidence of SR in the auditory system of vertebrates [6], and many models have been developed to uncover the underpinnings of SR, based on the hair cells (which are key elements in auditory transduction) and the activity of the cochlear fibers afferent to them [16]. The subsequent stages of the CNS, which have the task of interpreting the information coming from the hair cells, are in turn at least influenced by the effects of SR in the PAS.
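The three ingredients can be illustrated with a toy threshold detector. The following is a minimal sketch, not the model used in this work: the threshold, tone amplitude, tone period, and the function name `count_detections` are all illustrative assumptions. It counts upward threshold crossings of a subthreshold sinusoid plus Gaussian noise, and how many of those crossings occur near a peak of the tone.

```python
import math
import random


def count_detections(noise_std, n_samples=20000, seed=0):
    """Count upward threshold crossings, and how many fall near a tone peak."""
    rng = random.Random(seed)
    threshold = 1.0   # detection threshold (arbitrary units, assumed)
    amplitude = 0.8   # tone peak amplitude, deliberately below the threshold
    period = 100      # tone period in samples (assumed)
    crossings = 0
    near_peak = 0
    above = False
    for n in range(n_samples):
        tone = amplitude * math.sin(2.0 * math.pi * n / period)
        x = tone + rng.gauss(0.0, noise_std)
        if x > threshold and not above:
            crossings += 1
            # A crossing "follows the tone" if it occurs in the upper part
            # of the positive half-cycle.
            if tone > 0.5 * amplitude:
                near_peak += 1
        above = x > threshold
    return crossings, near_peak
```

Without noise the tone never reaches the threshold, so nothing is detected. With a moderate amount of noise, crossings appear and cluster around the tone peaks, so their timing carries information about the signal. With much stronger noise, crossings occur at almost any phase and that information is diluted.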

B. The transduction of sounds in the basilar membrane of the cochlea
The vibration of the basilar membrane of the cochlea (Fig. 2) causes the bending of the stereocilia of the hair cells. The movement of the stereocilia causes cellular depolarization or hyperpolarization, depending on its direction. Depolarization is enabled by a chain of events that can be summarized as the inflow of potassium ions into the hair cell. The resulting variation of the membrane potential leads to the release of a neurotransmitter (i.e., glutamate) from the base of the hair cell [17]. When this neurotransmitter reaches the synaptic junction, it promotes an action potential in one or more nerve endings [18]. The basilar membrane encodes frequency tonotopically, i.e., tones that are close to each other in frequency are represented in topologically neighbouring regions. Other aspects of the stimulus are encoded in the discharge of the cochlear nerve fibers, such as the duration (signalled by the length of the activity) and the intensity of the stimulus (related to the density of the nervous activity and to the number of hair cells involved in the discharge) [18]. Dedicated hair cells with a higher activation threshold are responsible for signalling intense stimuli with their depolarization. The tonotopic organization is maintained in the subsequent projections to the primary auditory cortex via the auditory radiation pathway. Although tonotopy is an important source of information for the discrimination of sound frequencies, the identification of sounds below 4 kHz also relies in part on the coherence in the discharge of excited nerve fibers [17,18], through a phenomenon called phase locking, by which the discharge of the neuron always corresponds to a particular phase of the sound wave.

C. Absolute threshold of hearing
The first studies to quantify the perception of sound were those carried out by Fletcher and Munson [19,20], who conducted experiments relating loudness (i.e., the subjective perception of sound pressure) to frequency, showing the astonishing sensitivity and dynamic range of the human ear. An equal-loudness contour (i.e., isophonic curve) is a measure of sound pressure level (in decibels, ref. 20 μPa) over the frequency spectrum for which a listener perceives a constant loudness (measured in phon) when presented with pure steady tones (Fig. 3); its shape is determined by structural features of the auditory canal and the middle ear. Their studies show that the frequency range 1-5 kHz is perceived with a greater intensity, for the same sound pressure level, than higher and lower frequencies. The lowest of the curves represents the absolute threshold of hearing, or minimum audibility curve, and indicates the minimum level at which a tone is perceived. The original isophonic curves, which constitute a standardized graph for an average human, have more recently been redefined by a new standard, ISO 226:2003 (current standard, confirmed in 2014).

D. Critical bands
A critical band is defined as a range of frequencies within which two pure tones are not properly discriminated by our hearing system. This effect is caused by the proximity of the areas in which the two tones are represented on the basilar membrane [21].
The size of the critical band is not constant for all frequencies, but it grows with increasing frequency. This trend is explained by the fact that each critical band is represented on the basilar membrane by a ~1.2 mm long band, and that with increasing frequency, more tones coexist within this space [17].
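Zwicker and Terhardt [22] characterized the approximate behaviour of the critical bandwidth BW_{f_0} (in Hz) with respect to the center frequency f_0 (in Hz) with the following expression:

$$BW_{f_0} = 25 + 75\left[1 + 1.4\left(\frac{f_0}{1000}\right)^{2}\right]^{0.69} \qquad (1)$$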
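As a worked example, the critical bandwidth can be evaluated numerically. The sketch below assumes that the expression referred to as Eq. 1 is the commonly cited Zwicker and Terhardt approximation, BW = 25 + 75 [1 + 1.4 (f_0 / 1 kHz)^2]^0.69 Hz. The function name is illustrative, and the frequencies are the six test frequencies used later in this work.

```python
def critical_bandwidth_hz(f0_hz):
    """Zwicker-Terhardt approximation of the critical bandwidth (Hz) at f0 (Hz)."""
    f_khz = f0_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


# Test frequencies used in this work for the critical-band mapping.
test_frequencies_hz = [100.0, 290.0, 823.0, 1990.0, 4500.0, 9400.0]
for f0 in test_frequencies_hz:
    print(f"f0 = {f0:7.1f} Hz -> BW = {critical_bandwidth_hz(f0):7.1f} Hz")
```

Under this approximation the bandwidth is nearly constant (about 100 Hz) at low frequencies and grows steadily above roughly 500 Hz.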

E. Computational model of the PAS
Our ear model is realized with Brian Hears [23], an auditory library that includes sound generation and manipulation tools, filter banks (e.g., gammatone, gammachirp), detailed cochlear models (e.g., dynamic compressive gammachirp, DRNL), HRTF filtering, and easy integration with the spiking neural network (SNN) simulation package Brian [12], which is written in the Python programming language [24]. In order to obtain a computational model that allows realistic simulations, we reproduced the scheme shown in Fig. 1, using a band-pass filtering stage based on the work of Tan & Carney [25], which is inspired by the frequency response of the middle ear [26]. A gammatone filterbank separates the sound spectrum into 3000 bands [27]. A rectification block simulates the transduction performed by the inner hair cells, and finally a block of 3000 leaky integrate-and-fire neurons with noise and refractoriness (Fig. 5) approximates the cochlear fibers afferent to the inner hair cells. The circuit scheme in Fig. 5 indicates the leak resistance (Rm), the membrane capacitance (Cm), the resting potential (Vr), and the stimulus (input current with noise). The functional scheme represents the qualitative behaviour of this neuron model. The neuron continuously integrates the input contributions (leaky summation). The added noise can bring forward the spiking of the neuron. Once the neuron fires, its membrane potential (Vm) is reset to Vr and held there for a period (refractory period, blue arrow) during which the neuron is unable to integrate new inputs. The model accepts sounds as input, including pure tones, complex sounds, and different types of noise. Moreover, it is possible to vary the level and duration of the sound stimuli.
In order to evaluate the model activity, a raster plot (i.e., a diagram with the neuron index on the ordinate and the time elapsed in the simulation on the abscissa) is displayed on the screen at the end of each simulation, showing the spikes (i.e., the action potentials) produced over time.
The model reflects the tonotopy of the basilar membrane and of the nerve fibers afferent to the inner ear: low frequencies evoke spiking activity in low-cardinality (i.e., low-index) neurons (closer to the apex); vice versa, high frequencies evoke spiking activity in high-cardinality neurons (closer to the base) (Fig. 6). In addition, when the basilar membrane is set in vibration by a tone, the number of inner hair cells involved increases with the amplitude of the stimulus: a higher sound pressure level involves more neurons than a lower one, in addition to generating more spikes.
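The behaviour of a single leaky integrate-and-fire neuron with noise and refractoriness can be sketched in a few lines. This is a standalone illustration in plain Python, not the Brian implementation used for the model: the normalized units, the time constant, the refractory period, the noise scaling, and the function name `simulate_lif` are assumptions chosen for readability.

```python
import random


def simulate_lif(input_current, noise_std=0.0, duration_ms=200.0, seed=0):
    """Euler integration of a noisy LIF neuron; returns spike times in ms."""
    rng = random.Random(seed)
    dt = 0.1          # integration step (ms)
    tau_m = 10.0      # membrane time constant Rm*Cm (ms), assumed
    v_rest = 0.0      # resting potential Vr (normalized units)
    v_thresh = 1.0    # firing threshold (normalized units)
    refractory = 2.0  # refractory period (ms), assumed
    v = v_rest
    last_spike = -refractory
    spikes = []
    for step in range(int(round(duration_ms / dt))):
        t = step * dt
        if t - last_spike < refractory:
            v = v_rest  # Vm is held at Vr and inputs are ignored
            continue
        # Leaky summation of the input (expressed as Rm*I) plus white noise.
        noise = noise_std * rng.gauss(0.0, 1.0) * (dt / tau_m) ** 0.5
        v += (v_rest - v + input_current) * dt / tau_m + noise
        if v >= v_thresh:
            spikes.append(t)
            last_spike = t
            v = v_rest  # reset to Vr after the spike
    return spikes
```

With a constant input below the threshold and no noise, the membrane potential settles below the threshold and the neuron stays silent. Adding noise to the same input lets the potential cross the threshold occasionally, which is the single-neuron counterpart of the effect studied in this work.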
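The tonotopic ordering can be made concrete by assigning a center frequency to each neuron index. The sketch below is an assumption-laden illustration: it supposes that the 3000 channels are spaced uniformly on the ERB-number scale of Glasberg and Moore between 20 Hz and 20 kHz, which is a common choice for gammatone filterbanks but is not stated in the text. All function names are hypothetical.

```python
import math


def hz_to_erb_number(f_hz):
    """Glasberg-Moore ERB-number for a frequency in Hz."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_number_to_hz(e):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def center_frequencies(n_channels=3000, f_low=20.0, f_high=20000.0):
    """Channel center frequencies, uniformly spaced on the ERB-number scale."""
    e_low, e_high = hz_to_erb_number(f_low), hz_to_erb_number(f_high)
    step = (e_high - e_low) / (n_channels - 1)
    return [erb_number_to_hz(e_low + i * step) for i in range(n_channels)]


def neurons_in_band(cfs, f_a, f_b):
    """Indices of the neurons whose center frequency lies in [f_a, f_b]."""
    return [i for i, cf in enumerate(cfs) if f_a <= cf <= f_b]


cfs = center_frequencies()
# Illustrative band limits around 4.5 kHz (assumed values, in Hz).
band = neurons_in_band(cfs, 4100.0, 4900.0)
```

Because the center frequencies increase with the index, any frequency band maps to a contiguous block of neuron indices, with low indices toward the apex and high indices toward the base.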

F. Method
In order to validate the SR phenomenon, we performed simulations and extracted quantitative measures of neural activity. Since the noise causes variability in such measures, we considered sets of 5 simulations of the same type (trials) and then extracted the mean value. In this work, sound pressure levels (SPL) are referred to the conventional reference value of 20 μPa.
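For reference, the relation between a level in dB SPL and the corresponding sound pressure can be written out explicitly. This is a minimal sketch using the standard definition L = 20 log10(p / p_ref) with p_ref = 20 μPa; the function names are illustrative, and treating the pressure as an RMS value is an assumption.

```python
import math

P_REF_PA = 20e-6  # reference pressure for dB SPL: 20 micropascal


def spl_to_pascal(level_db):
    """RMS sound pressure (Pa) for a level in dB SPL."""
    return P_REF_PA * 10.0 ** (level_db / 20.0)


def pascal_to_spl(p_rms_pa):
    """Level in dB SPL for an RMS sound pressure in Pa."""
    return 20.0 * math.log10(p_rms_pa / P_REF_PA)
```

A level of 0 dB corresponds to the reference pressure itself, and negative levels, such as those used for near-threshold tones, correspond to pressures below 20 μPa.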

G. Mapping critical bands in the PAS model
Since the size of the critical band (see Fig. 7) increases with frequency (Eq. 1), as a first step we characterized the correspondence between the frequency of stimulation (central frequency) and the band of neurons that is activated in the model. We tested the model with 6 different frequencies, i.e., the test frequencies: 100 Hz, 290 Hz, 823 Hz, 1.99 kHz, 4.5 kHz, and 9.4 kHz. The critical bands related to the 6 test frequencies (BW_{f_0}) were obtained through Eq. 1. Lower frequencies were not checked because it was not necessary (they have similar bandwidth values [18]). In order to have an intuitive evaluation of the extreme limits of the critical band for the considered frequency f_0 (i.e., f_A and f_B), we considered the band to be symmetrical with respect to f_0, and computed them with the set of equations (2):

$$f_A = f_0 - \frac{BW_{f_0}}{2}, \qquad f_B = f_0 + \frac{BW_{f_0}}{2} \qquad (2)$$

The tonotopic organization of the model is summarized in the following table, where each value has been obtained by averaging the results of a set of 5 trials of 900 ms for each of the chosen frequencies.

H. Identification of the minimum audibility curve of the model
In order to verify whether the model adheres to the real auditory threshold described in the literature, we checked the model at 7 different frequencies (20 Hz, 100 Hz, 290 Hz, 823 Hz, 1.99 kHz, 4.5 kHz, and 9.4 kHz), considering simulations of the same input duration. Note that here we also tested frequencies lower than 100 Hz, in order to have a complete characterization of the human hearing range. For each of the 7 frequencies, we ran different sets of 5 simulations of 10 seconds each, with the pure tone as input and each set at a different input level, and calculated for each combination (frequency, input level) the mean number of spikes evoked in the neurons pertaining to the critical band (N_{BW_C}). Finally, we compared the obtained value with the N_{BW_C} obtained from 5 simulations of 10 seconds each in the absence of input.
In order to have a measure of the spiking activity evoked by the test tone, we calculated the average number of spikes in the critical band of the test tone (N_{BW_C}), i.e., the average spike count (ASC), with the following formula:
We traced the absolute threshold for each frequency considering the SPL value that produced an ASC of almost 50% of the ASC obtained in the absence of input.
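The comparison between evoked and spontaneous activity can be sketched as follows. This sketch assumes that the ASC is the spike count in the critical band averaged over the trials of a set; the exact normalization used by the authors may differ. The function names are hypothetical, and the spike counts are placeholder numbers for illustration only, not results from this work.

```python
def average_spike_count(trial_counts):
    """ASC (assumed form): mean over trials of the spike count in the critical band."""
    return sum(trial_counts) / len(trial_counts)


def asc_ratio_by_level(counts_by_level, spontaneous_counts):
    """Ratio ASC(level) / ASC(no input) for each tested input level (dB)."""
    baseline = average_spike_count(spontaneous_counts)
    return {level: average_spike_count(counts) / baseline
            for level, counts in sorted(counts_by_level.items())}


# Placeholder spike counts for 5 trials per condition (NOT data from the paper).
spontaneous = [40, 44, 38, 42, 36]
by_level = {
    -10: [41, 39, 43, 40, 42],
    -5: [50, 48, 52, 49, 51],
    0: [80, 78, 82, 79, 81],
}
ratios = asc_ratio_by_level(by_level, spontaneous)
```

The resulting dictionary gives, for each input level, the evoked ASC relative to the spontaneous ASC, which is the quantity on which a threshold criterion such as the one described above can then be applied.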

I. Stochastic resonance validation tests
The protocol followed in this work is aimed at detecting whether the addition of white noise to a subthreshold tone is able to increase neuronal activity along the basilar membrane of the cochlea, preferentially in the neurons corresponding to the frequency of the test tone. In order to understand how the injection of noise modifies the spiking activity, we performed the following simulations:
• a set of trials with the spontaneous activity of the model (i.e., without stimulus);
• a set of trials with a pure tone in the neighbourhood of the audibility threshold as input;
• a set of trials containing the same signal with added noise.
The duration of each simulation was 10 seconds; each simulation was repeated 5 times for each check. We first computed and compared the ASC from the 3 sets of simulations.
In addition, a coefficient called relative spike count (RSC) was conceived to understand whether the introduced noise produces a relative increase of N_{BW_C} with respect to the neurons of adjacent areas. The RSC is defined as the ratio between the number of spikes N_{BW_C} and the half-sum of the spikes produced in the adjacent critical bands (i.e., the mean number of spikes evoked in the neurons pertaining to the upper and lower critical bands, N_{BW_U} and N_{BW_L} respectively):

$$\mathrm{RSC} = \frac{N_{BW_C}}{\left(N_{BW_U} + N_{BW_L}\right)/2} \qquad (5)$$

Unlike the ASC, the averaged RSC (ARSC) gives us not just a measure of the activity in the band of interest, but a measure of the relative activity with respect to adjacent areas. This metric therefore gives us a measure of the detection performance of the PAS model. In the next section, we will focus on the results obtained at a single frequency, i.e., 4.5 kHz, where significant results have been obtained.
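The three sets of trials can be sketched as stimulus-generation code. This is an illustration in plain Python rather than the Brian Hears calls used for the model: the sampling rate, the shortened duration, the tone and noise levels, the interpretation of levels as RMS dB SPL, and the function name `make_stimulus` are all assumptions.

```python
import math
import random

P_REF_PA = 20e-6  # dB SPL reference pressure (Pa)


def rms(samples):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def make_stimulus(tone_db=None, noise_db=None, f0_hz=4500.0,
                  duration_s=0.1, fs_hz=44100.0, seed=0):
    """Samples (Pa) for silence, a pure tone, or a tone plus white noise."""
    rng = random.Random(seed)
    n = int(round(duration_s * fs_hz))
    out = [0.0] * n
    if tone_db is not None:
        # Peak amplitude of a sinusoid whose RMS level is tone_db.
        peak = math.sqrt(2.0) * P_REF_PA * 10.0 ** (tone_db / 20.0)
        for i in range(n):
            out[i] += peak * math.sin(2.0 * math.pi * f0_hz * i / fs_hz)
    if noise_db is not None:
        # Gaussian white noise whose RMS level is noise_db.
        sigma = P_REF_PA * 10.0 ** (noise_db / 20.0)
        for i in range(n):
            out[i] += rng.gauss(0.0, sigma)
    return out


# Illustrative levels; each condition is repeated for 5 trials.
conditions = {
    "spontaneous": dict(tone_db=None, noise_db=None),
    "tone": dict(tone_db=-5.0, noise_db=None),
    "tone+noise": dict(tone_db=-5.0, noise_db=-8.0),
}
n_trials = 5
stimuli = {name: [make_stimulus(seed=trial, **kw) for trial in range(n_trials)]
           for name, kw in conditions.items()}
```

Using a different seed per trial gives independent noise realizations within a set, which is what makes averaging over the 5 trials meaningful.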
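The RSC definition translates directly into code. In the sketch below, the RSC follows the verbal definition given above, whereas the ARSC is assumed to be the mean of the per-trial RSC values, which is one plausible reading of the acronym. The function names and the example numbers are illustrative, not data from this work.

```python
def relative_spike_count(n_center, n_upper, n_lower):
    """RSC: spikes in the critical band over the half-sum of the adjacent bands."""
    return n_center / ((n_upper + n_lower) / 2.0)


def average_rsc(trials):
    """ARSC (assumed form): mean RSC over trials given as (N_C, N_U, N_L) tuples."""
    values = [relative_spike_count(c, u, l) for c, u, l in trials]
    return sum(values) / len(values)


# Illustrative counts: the central band fires twice as much as its neighbours.
example_rsc = relative_spike_count(30, 10, 20)
```

An RSC close to 1 means that the band of the test tone is no more active than its neighbours, while values above 1 indicate that the activity is concentrated in the band of interest.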

III. RESULTS AND DISCUSSION
In this section we report and discuss the results obtained with a battery of simulations related to a test tone of 4.5 kHz whose level is in the range [-5, -4] dB and noise in the range [-12, -4] dB, checked in steps of 1 dB for both tone and noise. Table II shows the analysis of the absolute and relative activity with regard to the critical band of interest. The response of the system in the absence of stimulus is highlighted in grey, whereas the response to pure tones is highlighted in green. In yellow we highlighted the cases where the relative activity reported in the critical band increases in the presence of noise. In Fig. 10 we show the trends of ASC and ARSC for the two tone levels.
The simulations carried out in this work show that varying the intensity of the noise added to a subthreshold signal is able to activate the effect of SR, resulting in an increase or a decrease in the detection performance. Taken together, the results show that the detection performance is not related to the noise level in a simple, monotonic way. In Figs. 10 and 11 we report, as an emblematic case, the graphical representation of the data from Table II, concerning the neural activity evoked in the model by a pure tone of 4.5 kHz at the levels of -5 and -4 dB, with varying noise level.
The computational model of the PAS is able to support the SR phenomenon, showing an improvement in detection performance near the threshold for a particular range of SNR. In addition to serving as a tool for understanding, models of this kind can help in the design of prostheses and acoustic aids able to compensate for the typical problems encountered by people with hearing loss, in whom the absolute threshold of audibility is altered. In fact, a decrease of the minimum threshold of audibility has been found for some combinations of SNR, corresponding to an improvement in the detection of weak signals. A further improvement of the model could be the use of neuron models able to exhibit the neurocomputational feature of spike latency [28,29], and the modelling of the backward connections shown in Fig. 1. The delayed feedback introduced by these two elements could give rise to further resonance phenomena, facilitating neural synchronization.

IV. CONCLUSIONS
Finally, given the complexity of the system, it will be necessary to investigate the impact of the choice of model parameters on the system behaviour, in order to avoid artefacts and misinterpretations. An extension of this work could be the application of the present system to recognition tasks. While here we have used a simple measure of the relative activity with respect to adjacent areas (ARSC) to evaluate the detection performance, it is known that in the real HAS dedicated neuronal microcircuits exploit different features of the evoked spike patterns for the detection and recognition of stimuli. In this regard, the use of machine learning techniques, which are increasingly being developed in a wide variety of areas [30][31][32][33][34], including the field of hearing [35], would allow us to extract from the spike patterns additional information beyond the mere count of occurrences, leading to an improvement of the detection performance. Moreover, since today's machine learning systems are frequently based on neural networks [36][37][38][39][40] (among which SNNs [41]), one direction could be the use of nature-inspired recognition systems, which would allow us to better mimic the stages that follow the PAS (i.e., the CNS part of Fig. 1) and to expand the system to model the complete HAS. The tendency to adopt neuro-inspired solutions in the field of machine learning [36], [42], [43] is emblematic of the continuously growing similarities between biological systems and computer science that have characterized the last few years, in computing techniques (e.g., [44]), communication networks (e.g., [45], [46]), and so on.