
Projects

These are personal and group projects that showcase my interest areas and expertise.

Ongoing

Room Impulse Response Completion

Master's Thesis Research

In everyday life, sounds from the environment are crucial to understanding, navigating, and interacting with the world around us. The room impulse response (RIR) is the underlying acoustic signature of a space, present in all the real-world sounds we hear: played back by itself, it is a time-domain signal that sounds like an echo. A dry RIR indicates that the room is small and/or surfaced with acoustically absorbent materials, whereas a long, ringing RIR indicates that the environment is large and/or has hard, reflective surfaces. RIRs add spatiality to dry recorded sound through a convolution operation, and are therefore used in many applications: adding reverb to music, acoustic rendering of architectural spaces, spatializing audio sources in virtual environments, and more. However, when obtaining an RIR through real-world measurement is too resource-intensive or impossible, we face the non-trivial task of generating RIRs.
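The convolution operation mentioned above can be sketched in a few lines of numpy. The signal and RIR here are synthetic stand-ins (random noise and an exponentially decaying noise burst), not measured data:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000  # sample rate in Hz (assumed for this sketch)

# Stand-ins: a 1 s "dry" signal and a 0.5 s synthetic RIR made of
# exponentially decaying noise (a crude proxy for a measured RIR).
dry = rng.standard_normal(fs)
t = np.arange(fs // 2) / fs
rir = rng.standard_normal(fs // 2) * np.exp(-6.9 * t / 0.3)  # ~0.3 s decay

# Convolving the dry signal with the RIR imprints the room's acoustics.
wet = np.convolve(dry, rir)
print(wet.shape)  # full convolution: len(dry) + len(rir) - 1 samples
```

In practice the convolution is done with FFT-based methods (e.g. `scipy.signal.fftconvolve`) for speed, but the operation is the same.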


The task of the thesis, RIR completion, is to synthesize the remainder of an RIR given only its early part, about 50 ms, using machine learning (ML). This approach reduces time and compute by requiring traditional simulation methods to produce only a fraction of the RIR's time samples, a short RIR "head", which a neural network (NN) takes as input to generate the remaining samples, the RIR "tail". A network trained on this task generates RIRs at a fixed time and compute cost, and because the approach is data-driven, it can potentially be more accurate than other algorithmic reverbs.
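The head/tail interface can be illustrated with a minimal numpy sketch. The completion function below is a hypothetical stand-in (it just appends exponentially decaying noise), not the thesis network; the 50 ms head length follows the text:

```python
import numpy as np

fs = 16000                       # assumed sample rate
head_len = fs * 50 // 1000       # 50 ms head = 800 samples at 16 kHz

# Placeholder "ground-truth" RIR; in the thesis this would come from a
# dataset or a traditional simulation.
rir = np.random.default_rng(1).standard_normal(fs)
head, tail = rir[:head_len], rir[head_len:]

def complete_rir(head, tail_len):
    """Stand-in for the trained network: extends the head with an
    exponentially decaying noise tail of fixed length/cost."""
    rng = np.random.default_rng(2)
    decay = np.exp(-np.arange(tail_len) / (0.1 * tail_len))
    return np.concatenate([head, rng.standard_normal(tail_len) * decay])

completed = complete_rir(head, len(tail))
print(completed.shape)  # same length as the original RIR
```

The point of the sketch is the contract: the simulator pays only for the head, and the generator fills in the tail at fixed cost regardless of room complexity.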


Datasets used: ARNI, MOTUS, R3VIVAL, MIT IR Survey, BIRD

Supervisor: Professor Sebastian Schlecht

Advisors: Georg Gotz, Ricardo Falcon Perez, Kyung Yun Lee

Completed

Neural Acoustic Fields

Audio Technology Seminar Research Paper

Impulse response modelling is a well-studied topic in the acoustics and audio signal processing fields. Advancements in deep learning have led to such methods being applied to room impulse response (RIR) generation, yielding interesting results comparable to traditional methods.


Recently, the development of Neural Radiance Fields (NeRF) in computer vision has been a breakthrough in neural representation learning, interpolating unseen views within a complex 3D scene from only sparse input views. NeRF has been adapted to acoustic scene representation and shown to produce RIRs at unseen source-listener positions in Neural Acoustic Fields (NAFs). This seminar research paper reviews the contributions, affordances, and performance of several NeRF-based and non-NeRF-based deep learning methods for RIR generation.


Read the research report.

Course: ELEC-E5632 - Audio Technology Seminar (May 2023)

Professor: Vesa Välimäki

Dark Velvet Noise Reverb

Audio Signal Processing Demo

Velvet noise, a sparse pseudo-random noise, makes the synthesis of impulse response tails simple and computationally efficient. This project is an implementation and audio demo of late-reverberation synthesis using filtered velvet noise.
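The basic velvet-noise generator is only a few lines: time is divided into equal grid cells, and each cell gets exactly one impulse of random sign at a random position. This sketch shows the plain (unfiltered) noise; the density and sample-rate values are illustrative defaults, not the demo's parameters:

```python
import numpy as np

def velvet_noise(length, fs=44100, density=2000, seed=0):
    """Generate a velvet-noise sequence: one +/-1 impulse per grid
    cell at a random offset within the cell, zeros elsewhere."""
    rng = np.random.default_rng(seed)
    td = fs / density                 # average impulse spacing in samples
    n_pulses = int(length / td)
    v = np.zeros(length)
    for m in range(n_pulses):
        # Random position inside cell m; cells never overlap.
        k = int(m * td + rng.uniform(0, td - 1))
        if k < length:
            v[k] = 1.0 if rng.random() < 0.5 else -1.0
    return v

v = velvet_noise(4410)  # 100 ms at 44.1 kHz, ~200 impulses
```

Because the sequence is mostly zeros, convolving with it needs only additions and subtractions at the impulse positions, which is what makes velvet-noise reverbs so cheap.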


Read the demo report.

See the demo presentation.

Collaborators: Verneri Hirvonen

Course: ELEC-E5620 - Audio Signal Processing (April 2023)

Professor: Vesa Välimäki

"Where do we go when we fall asleep?"

VR Experience

"Where do we go when we fall asleep" is a virtual reality experience that explores the psychological liminal space between wakefulness and dreaming. The dreamer (user) lies physically on a bed and experiences multiple dreamlike worlds; when the dreamer gets up from the bed, the dream vanishes and they remain in the virtual bedroom.

Collaborators: Hiski Huovila, Karolina Nowak, Sasha Usoskin

Course: AXM-E0404 - Designing and Creating Virtual Worlds

Professor: Lily Diaz-Kommonen

Predict Azimuth Angles of HRIR with Machine Learning

Machine Learning Research Project

Binaural signals can be digitally constructed with head-related impulse responses (HRIRs), a set of laboratory-measured pairwise signals that encode the sound source position relative to the listener. An impulsive sound is played from a loudspeaker at some distance, horizontal angle, and vertical angle in an anechoic chamber, and HRIRs are recorded by two in-ear microphones in a dummy head or human test subject. A personalized set of HRIRs with radius, azimuth-angle, and elevation-angle properties is thus captured for each subject.
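Constructing a binaural signal from an HRIR pair is a per-ear convolution. In this sketch the HRIRs are random placeholders standing in for measured responses (real ones come from HRIR datasets), so only the mechanics are shown:

```python
import numpy as np

rng = np.random.default_rng(3)
fs = 44100

# Placeholder HRIR pair; measured HRIRs for a given azimuth/elevation
# would be loaded from a dataset instead.
hrir_left = rng.standard_normal(256)
hrir_right = rng.standard_normal(256)

mono = rng.standard_normal(fs)  # 1 s mono source signal

# Convolving the mono source with each ear's HRIR yields a two-channel
# binaural signal that encodes the measured source direction.
binaural = np.stack([np.convolve(mono, hrir_left),
                     np.convolve(mono, hrir_right)])
print(binaural.shape)  # (2, len(mono) + len(hrir) - 1)
```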

This project explores different supervised machine learning methods for predicting the azimuth angle associated with an unknown subject's HRIR. The results of linear regression, polynomial regression, and multilayer perceptron methods are compared and shown to be promising for this application. The motivation for the project was to deepen my understanding of spatial sound, binaural techniques, auditory modelling, and machine learning applications for audio.
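The regression setup can be illustrated with scikit-learn on toy data. The feature here is a synthetic interaural-time-difference-like quantity (azimuth roughly follows the sine of the angle), standing in for features extracted from real HRIRs; the models mirror two of the methods compared in the report:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)

# Toy stand-in for an HRIR-derived feature: the interaural time
# difference (ITD) varies roughly with the sine of the azimuth.
azimuth = rng.uniform(-90, 90, size=500)                 # degrees
itd = np.sin(np.deg2rad(azimuth)) + rng.normal(0, 0.05, 500)
X = itd.reshape(-1, 1)

# Train on the first 400 samples, hold out the last 100.
lin = LinearRegression().fit(X[:400], azimuth[:400])
mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                   random_state=0).fit(X[:400], azimuth[:400])

print(lin.score(X[400:], azimuth[400:]))  # R^2 on held-out samples
print(mlp.score(X[400:], azimuth[400:]))
```

With real HRIRs the feature vector would be richer (e.g. raw samples or spectral features per ear), but the train/predict/score structure is the same.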

 

Read the full report.

Course: CS-C3240 - Machine Learning (December 2022)


Architecture and Other

 Four Symbols Sound Zoo

Spatial audio architectural VR experience

Sound plays a complex and important role in how humans perceive the world. This project explores the landscapes that can be created with spatialized audio in the Unity game engine, and the degree to which sound can create the presence of unseen beings and psychologically immerse inhabitants in a virtual environment. Realistic sound for virtual environments is important and can be manipulated to create surprising and striking experiences; it is this edge of human experience that I am excited to expand using sound. This project used Unity's native audio spatializer SDK and the Resonance Audio SDK developed by Google.


The Four Symbols Sound Zoo is a soundscape created in Unity, in which the inhabitant moves around a perforated circular room, guided by sound emitted from outside the room. Four mythological creatures from Chinese lore, the Azure Dragon of the East, the Vermillion Bird of the South, the White Tiger of the West, and the Black Tortoise of the North, are placed into different spatial environments: ocean, sky, forest, and cave. Neither the mythological creatures nor their habitats appear visually, but through the direction of the sound, reverb effects, and movement of the sound through space, the inhabitant feels the presence of four large animals.


Collaborators: Kedi Hu

Course: 4.502 Architecture in Motion Graphics (MIT 2019)

Professor: Takehiko Nagakura

Welcome home, Computer

Spatial VR immersive experience

"Welcome home, Computer" imagines how digital entities, such as the computer-generated character Lil Miquela, with over 2.7 million Instagram followers, and Apple's virtual assistant Siri, might inhabit a space made just for them, and how a human intruder would experience their world. This project is an immersive reality experience built in Unreal Engine. Distance, movement, and communication within a home environment exclusively for virtual entities are altered to imagine the machine subjectivity of spatial inhabitation and experience.


See the presentation.

Course: Reality Design Workshop (MIT 2020)

Instructor: Cagri Hakan Zaman

more coming soon...
