Computational Audio Intelligence for Perception & Representation: from denoising and spatial hearing to cross-modal understanding

Important dates

  • Paper submission: 31st of January, 2026

  • Notification of Paper Acceptance: 15th of March, 2026

  • Camera-Ready Paper Due: 15th of April, 2026

  • Conference: 22nd - 26th of June, 2026


Click here to submit your paper!


Scope and Topics

Sound is a fundamental carrier of information for both physical events and human activities. Beyond speech, the auditory domain comprises rich mixtures of environmental, musical, and spatial cues that allow humans and machines to perceive, interpret, and (inter)act with and within their surroundings. Audio plays an essential role in perceptual intelligence, where the goal is not only to process signals but also to learn and infer internal representations that support reasoning and interaction.

Recent advances in computational intelligence and machine/deep learning have significantly improved the ability of computational methods to extract and manipulate meaningful information from audio signals. Methods for denoising, dereverberation, and source separation are increasingly coupled with representation learning, enabling the capture of semantic and spatial aspects of sound. At the same time, data-driven models for spatial hearing, cross-modal learning and correspondence, and generative modeling are reshaping how auditory scenes are represented and synthesized. These developments bridge the traditional boundary between low-level signal enhancement and high-level understanding, pointing the way towards a unified perspective on computational audio perception.

This special session aims to bring together researchers working across these complementary domains. The goal is to explore how computational intelligence can support robust and adaptive computational audio processing systems that generalize across tasks and modalities.

The topics of the special session include (but are not limited to):

  • Self-supervised learning for audio representation
  • Intelligent denoising, dereverberation, and source separation
  • Spatial audio understanding and neural rendering
  • Cross-modal and multi-sensor fusion for sound perception
  • Generative and diffusion-based models for audio transformation
  • Adaptive, bio-inspired, and neuro-computational models of hearing
  • Applications in machine hearing, AR/VR audio, and auditory scene analysis
  • Context-aware immersive speech processing and applications
  • Audio augmented and mixed reality
  • Machine hearing for 3D audio scene reconstruction
  • Computational hearing aids and assistive listening technologies
  • Audio-based environmental monitoring and smart cities
  • Audio intelligence for autonomous systems and robotics
  • Ethical and social implications of synthetic audio and deepfakes
  • Audio-based localization
  • Multimodal audio-visual embeddings for cross-domain perception

Organizers

  • Emmanouil Benetos - Queen Mary University of London, London, U.K.

  • Konstantinos Drossos - Nokia Technologies, Espoo, Finland

  • Michele Scarpiniti - Sapienza University of Rome, Rome, Italy

You can contact us for anything regarding the special session by