Audition Module Guide

A guide to understanding and using ACT-R/PM's Audition Module.


The Audition Module (AM) gives ACT-R rudimentary audio perception abilities. Unlike the Vision Module, though, the Audition Module does not give ACT-R the ability to deal with real sounds, but it does allow the simulated perception of audio.

The AM is designed to work similarly to the Vision Module. There is a store of "features" called the audicon, and these can be transformed into ACT chunks by way of an attention operator. "Features" in the audicon are, of course, not things that have spatial extent like visual features, but instead have temporal extent--they are sound events.

Each sound event has several attributes:

Support will later be added for spatial location of the sound, for use in modeling things like dichotic listening.

Basic kinds of sounds currently supported are tones, digits, and speech. The content delay and recode time for tones and digits are uniform (settable via system parameters), as is digit duration. Since speech strings can differ in length, and presumably in content delay and recode time, these values must be supplied when the sound events are created. Finally, the audicon has a decay parameter (default is three seconds). After a sound event ends and the decay time elapses, the sound event is deleted from the audicon.
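The decay rule can be illustrated with a small toy model. The sketch below is plain Python, not ACT-R code, and every name in it (SoundEvent, Audicon, purge) is hypothetical, as are the durations used in the example. It assumes only that an event is removed once the current time reaches its end time plus the decay time (three seconds by default).

```python
from dataclasses import dataclass, field


@dataclass
class SoundEvent:
    """Toy stand-in for an audicon feature (hypothetical, not the ACT-R structure)."""
    kind: str        # e.g. "tone", "digit", "speech"
    content: str     # what the sound carries, e.g. "nine"
    onset: float     # time the sound starts, in seconds
    duration: float  # how long the sound lasts, in seconds

    @property
    def offset(self) -> float:
        """Time at which the sound ends."""
        return self.onset + self.duration


@dataclass
class Audicon:
    """Toy store of sound events with a decay time (default three seconds)."""
    decay: float = 3.0
    events: list = field(default_factory=list)

    def add(self, event: SoundEvent) -> None:
        self.events.append(event)

    def purge(self, now: float) -> None:
        """Drop every event whose end time plus the decay time has passed.

        Assumption: an event is removed as soon as now >= offset + decay.
        """
        self.events = [e for e in self.events if now < e.offset + self.decay]


audicon = Audicon()
audicon.add(SoundEvent("digit", "nine", onset=0.0, duration=0.3))
audicon.add(SoundEvent("tone", "1000", onset=2.0, duration=0.5))

# The digit ended at 0.3 s and decays at 3.3 s, so it is still present at 3.0 s.
audicon.purge(now=3.0)
survivors_at_3 = [e.content for e in audicon.events]

# At 4.0 s the digit is gone; the tone (ends at 2.5 s) stays until 5.5 s.
audicon.purge(now=4.0)
survivors_at_4 = [e.content for e in audicon.events]
```

In this model the decay clock starts when the sound ends, not at its onset, so a long sound stays in the store longer than a short one with the same onset.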

Using the Audition Module

There are two ways to use the Audition Module. One of them more or less parallels vision: it uses tests built into the production syntax to deal with audio. For example, if you wanted the earliest unattended sound in the audicon, this test would find it:

   ISA    audio-event
   onset  lowest

Note that the detect-time of the sound (after onset) has to have passed for this to match. To shift auditory attention to it, send a +aural command to the Audition Module. Once a sound event has been attended to, a sound chunk is created after some time (determined by the recode time for that sound). The base-level activation of new sound chunks is controlled by the parameter :sound-base-level. Sound chunks have three slots: kind for the type of sound, content for the content, and pitch for the pitch range (high, middle, low). For example, the spoken string "nine" would have a kind of digit, a content of "nine", and a pitch of middle (unless the speaker is someone like Kerri Strug, in which case high might be more appropriate). If this was the last attended sound, it could be matched with:

   isa      sound
   kind     digit
   content  "nine"
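
The sequence described in this section (find the earliest unattended sound whose detect time has passed, shift attention to it, then obtain a sound chunk once the recode time has elapsed) can be sketched as a toy model. This is plain Python, not ACT-R code: the names SoundEvent, earliest_unattended, and attend are hypothetical, the timing values are made up for illustration, and detect time is assumed to be a fixed delay measured from onset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SoundEvent:
    """Toy sound event (hypothetical fields, not the ACT-R structure)."""
    kind: str
    content: str
    onset: float          # seconds
    detect_delay: float   # assumed: delay after onset before the event can match
    recode_time: float    # time from attending to the chunk being available
    pitch: str = "middle"
    attended: bool = False


def earliest_unattended(events, now: float) -> Optional[SoundEvent]:
    """Return the unattended, already detectable event with the lowest onset."""
    candidates = [
        e for e in events
        if not e.attended and now >= e.onset + e.detect_delay
    ]
    return min(candidates, key=lambda e: e.onset, default=None)


def attend(event: SoundEvent, now: float):
    """Mark the event attended; return the chunk and the time it becomes available."""
    event.attended = True
    chunk = {
        "isa": "sound",
        "kind": event.kind,
        "content": event.content,
        "pitch": event.pitch,
    }
    return chunk, now + event.recode_time


events = [
    SoundEvent("digit", "nine", onset=0.5, detect_delay=0.25, recode_time=0.5),
    SoundEvent("tone", "1000", onset=0.0, detect_delay=0.25, recode_time=0.5),
]

# At 0.1 s no detect time has passed yet, so nothing matches.
nothing_yet = earliest_unattended(events, now=0.1)

# At 0.5 s only the tone is detectable; attend to it.
first = earliest_unattended(events, now=0.5)
attend(first, now=0.5)

# At 1.0 s the tone is already attended, so the digit is the earliest match.
second = earliest_unattended(events, now=1.0)
chunk, ready_at = attend(second, now=1.0)
```

Here chunk plays the role of the sound chunk that the final test above would match, and ready_at is the simulated time at which it would become available.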

When sounds occur, they can be simulated by creating sound events through the Lisp functions new-digit-sound, new-tone-sound, and new-other-sound. Parameters for these commands are documented in the Command Reference.


Last modified 2004.03.02