Temporal Voice area dataset
April 2015

Purpose of the study and hypotheses

The goal in creating this dataset was to investigate which brain regions are involved in automatic voice information processing using a large number of subjects (large N), emphasizing both commonalities and differences between subjects. The main hypothesis was that, despite an overall left and right superior temporal involvement, the locations of local maxima vary across subjects.

People/institutions/funders involved in the data collection

This dataset is a collection of voice localizers obtained from multiple studies carried out at the Centre for Cognitive Neuroimaging and Institute of Neuroscience and Psychology, University of Glasgow, United Kingdom. Data were collected by Phil McAleer, Marianne Latinus, Ian Charest, Patricia E.G. Bestelmeyer, Rebecca H. Watson, and David Fleming, under the supervision of Pascal Belin. Data collection was supported by BBSRC grants BB/E003958/1, BBJ003654/1 and BB/I022287/1, and ESRC-MRC large grant RES-060-25-0010.

People involved in the data analysis

Cyril R. Pernet, Phil McAleer, Krzysztof J. Gorgolewski, Mitchell Valdes-Sosa and Pascal Belin.

Experimental protocol

The voice localizer consists of a 10 min 20 s block design with forty 8-s blocks of either vocal (20 blocks) or non-vocal (20 blocks) sounds, drawn from stimuli already used in Belin et al. (2000). About 60% of the stimuli were recorded specifically for the localizer; the rest were taken from public databases available in 2000, as well as from recordings of American English vowels (Hillenbrand et al., 1995). For the current experiments, stimuli were presented using Media Control Functions (DigiVox, Montreal, Canada) via electrostatic headphones (NordicNeuroLab, Norway; or Sensimetrics, USA) at a comfortable level (80-85 dB SPL).

The 40 blocks are intermixed with 20 periods of silence, allowing the haemodynamic response to relax back towards baseline (see the timing sketch at the end of this section). The relative power spectra of the voice and non-voice blocks after convolution are similar, avoiding bias due to data sampling (also illustrated at the end of this section). Blocks are made of a mixture of either vocal sounds or non-vocal sounds, with at most a 400 ms delay between consecutive stimuli. The order of stimuli was determined randomly but is fixed for all subjects.

Vocal blocks contain only sounds of human vocal origin (excluding sounds without vocal fold vibration, such as whistling or whispering) obtained from 47 speakers (7 babies, 12 adults, 23 children and 5 elderly people). They consist of speech sounds (words, syllables or sentence extracts: 1 in English, 1 in French, 3 in Finnish, 2 in Arabic) and non-speech sounds (emotionally positive or negative sounds like laughs, sighs, or cries, and neutral sounds like coughs and onomatopoeias). The median speech to non-speech ratio is 22.5% (min 0%, max 75%). Non-vocal blocks consist of natural sounds (from natural sources like falls, sea waves, and wind, and from various animals like cats, dogs, lions, and elephants) and of man-made sources (from objects like cars, glass, alarms, and clocks, and from (classical) musical pieces). All source categories are easily recognizable, although specific exemplars might not have been known to all participants (e.g. we can recognize an animal sound without identifying the species).

Stimuli (16 bit, mono, 22050 Hz sampling rate) are normalized for RMS (the same normalization is applied to all stimuli); a 1-kHz tone of similar energy is provided for calibration (a normalization sketch is given at the end of this section). The protocol is available to download with this dataset as voice_localizer.zip. The archive contains all the stimuli, organized in blocks.
It also includes a text file giving the presentation order of the sound blocks.
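The timing figures above imply the length of the silence periods. As a consistency check, here is a minimal Python sketch of the arithmetic; it assumes all 20 silence periods are of equal length, which the protocol description does not state explicitly.

```python
# Consistency check on the protocol timing described above.
# Assumption (not stated in the text): all 20 silence periods are equal.
n_blocks, block_s = 40, 8          # forty 8-s sound blocks
n_silence = 20                     # twenty silence periods
total_s = 10 * 60 + 20             # total run length: 10 min 20 s = 620 s

stim_s = n_blocks * block_s        # 320 s of sound
silence_s = total_s - stim_s       # 300 s of silence in total
print(silence_s / n_silence)       # -> 15.0 s per silence period
```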
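The statement that the voice and non-voice blocks have similar relative power spectra after convolution can be illustrated with a short sketch. Note the assumptions: the block order below is a placeholder alternation, not the actual pseudorandom order shipped in voice_localizer.zip; the silence placement, the canonical double-gamma HRF, and the 1-s sampling grid are illustrative choices, not details of the original analysis.

```python
import numpy as np
from scipy.stats import gamma

# Illustrative design check: convolve vocal and non-vocal block
# regressors with a canonical HRF and compare their power spectra.
# Placeholder assumptions: blocks alternate vocal/non-vocal, a 15-s
# silence follows every second block, and time is sampled at 1 s.
t = np.arange(0.0, 620.0, 1.0)

def hrf(tt):
    # Canonical double-gamma haemodynamic response function.
    return gamma.pdf(tt, 6) - gamma.pdf(tt, 16) / 6.0

vocal = np.zeros_like(t)
nonvocal = np.zeros_like(t)
onset = 0.0
for i in range(40):
    target = vocal if i % 2 == 0 else nonvocal
    target[(t >= onset) & (t < onset + 8)] = 1.0
    onset += 8 + (15 if i % 2 == 1 else 0)  # silence after every pair

kernel = hrf(np.arange(0.0, 32.0, 1.0))
for name, reg in (("vocal", vocal), ("non-vocal", nonvocal)):
    conv = np.convolve(reg, kernel)[: len(t)]
    power = np.abs(np.fft.rfft(conv)) ** 2
    print(name, power[1:6] / power[1:6].sum())  # low-frequency shape
```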
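Finally, the RMS normalization of the stimuli can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' actual procedure: the target RMS value, the file names, and the use of scipy.io.wavfile are all hypothetical.

```python
import numpy as np
from scipy.io import wavfile

def rms_normalize(path_in, path_out, target_rms=0.05):
    """Scale one 16-bit mono WAV so all stimuli share a common RMS.
    The target value (0.05) is a hypothetical choice, not the one
    used for this dataset."""
    rate, data = wavfile.read(path_in)        # e.g. 22050 Hz, int16
    x = data.astype(np.float64) / 32768.0     # map int16 to [-1, 1)
    scale = target_rms / np.sqrt(np.mean(x ** 2))
    y = np.clip(x * scale, -1.0, 1.0)         # guard against clipping
    wavfile.write(path_out, rate, np.round(y * 32767).astype(np.int16))

# Example use (hypothetical file names):
# rms_normalize("stimulus_01.wav", "stimulus_01_norm.wav")
```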