Computational Audio Intelligence for Perception & Representation: from denoising and spatial hearing to cross-modal understanding

Important dates

  • Paper submission: 31st of January, 2026

  • Notification of Paper Acceptance: 15th of March, 2026

  • Camera-Ready Paper Due: 15th of April, 2026

  • Conference: 22nd - 26th of June, 2026


Click here to submit your paper!


Scope and Topics

Sound is a fundamental carrier of information for both physical events and human activities. Beyond speech, the auditory domain comprises rich mixtures of environmental, musical, and spatial cues that allow humans and machines to perceive, interpret, and (inter)act with and within their surroundings. Audio plays an essential role in perceptual intelligence, where the goal is not only to process signals but also to learn and infer internal representations that support reasoning and interaction.

Recent advances in computational intelligence and machine/deep learning have significantly improved the ability of computational methods to extract and manipulate meaningful information from audio signals. Methods for denoising, dereverberation, and source separation are increasingly coupled with representation learning, enabling the capture of semantic and spatial aspects of sound. At the same time, data-driven models for spatial hearing, cross-modal learning and correspondence, and generative modeling are reshaping how auditory scenes are represented and synthesized. These developments bridge the traditional boundary between low-level signal enhancement and high-level understanding, pointing the way towards a unified perspective on computational audio perception.

This special session aims to bring together researchers working across these complementary domains. The goal is to explore how computational intelligence can support robust and adaptive computational audio processing systems that generalize across tasks and modalities.

The topics of the special session include (but are not limited to):

  • Self-supervised learning for audio representation
  • Intelligent denoising, dereverberation, and source separation
  • Spatial audio understanding and neural rendering
  • Cross-modal and multi-sensor fusion for sound perception
  • Generative and diffusion-based models for audio transformation
  • Adaptive, bio-inspired, and neuro-computational models of hearing
  • Applications in machine hearing, AR/VR audio, and auditory scene analysis
  • Context-aware immersive speech processing and applications
  • Audio augmented and mixed reality
  • Machine hearing for 3D audio scene reconstruction
  • Computational hearing aids and assistive listening technologies
  • Audio-based environmental monitoring and smart cities
  • Audio intelligence for autonomous systems and robotics
  • Ethical and social implications of synthetic audio and deepfakes
  • Audio-based localization
  • Multimodal audio-visual embeddings for cross-domain perception

Organizers

  • Emmanouil Benetos - Queen Mary University of London, London, U.K.

  • Konstantinos Drossos - Nokia Technologies, Espoo, Finland

  • Michele Scarpiniti - Sapienza University of Rome, Rome, Italy

You can contact us for anything regarding the special session by