Building an AI-Powered Headphone Detector and Automated Audio Switcher: Step-by-Step Guide

In this post, we’ll show how to solve everyday niche problems with the help of AI and machine learning tools, even if you’re new to the field. In a few simple steps, we will build an AI-powered, vision-based audio switcher.

Background

As an avid gamer with a triple-monitor setup, my PC handles everything from multitasking at work to entertainment and learning outside of work hours. I often switch between watching movies, reading, and gaming (love-hate relationship with DOTA2!), but constantly changing my audio output from headphones to speakers is a hassle.

Wouldn’t it be great if it could switch automatically? In the age of AI, this seemed possible!

With limited AI knowledge, I explored GitHub for solutions and found inspiration in projects using neural networks like VGG16 and YOLO. Rather than build on those, I decided to create my own system from the ground up—well, almost from scratch.

If you’re unfamiliar with neural networks, I recommend watching this excellent explainer by 3Blue1Brown, which covers the basics clearly and visually.

We chose the YOLO v11 model for its speed and compact size, making it ideal for edge devices like Raspberry Pi and low-powered microcontrollers. To accomplish our goal, we need to fine-tune this neural network to recognize headphones. This involves retraining the final layers, which requires:

  • Collecting image data
  • Cleaning the data
  • Creating positive, negative, and validation sets
  • Training the network

Think of it like gaining new skills in a familiar field—similar to a teacher receiving extra training in a new subject.

Now, let’s go through that list one by one.

Data Collection

Think of training our model like teaching a smart dog new tricks. While it already knows basic things (like spotting objects and people), we need to teach it exactly what we want it to find. For example, if we want it to spot people wearing headphones, we need to show it lots of examples of this.

I did some searching online and found a helpful resource: https://universe.roboflow.com/ginger-ojdx6/earphone-detection-75qzd

This website has two useful things:

  1. A collection of images we can use for training (called a dataset)
  2. A ready-made model that can detect earphones

Though there’s a pre-made model available, we’re going to build our own from scratch – it’s like cooking a meal yourself instead of ordering takeout. You learn more that way!

We’ll download the dataset in something called ‘YOLO format.’ We chose this format because it comes with special instruction files that tell our training program exactly what to look for in each image. It’s like having a detailed recipe that our computer can understand.
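Concretely, a YOLO-format label file holds one line per object: a class index followed by the box center, width, and height, all normalized to the 0-1 range. A small parser sketch (the sample label line is made up for illustration):

```python
def parse_yolo_label(line):
    """Parse one line of a YOLO-format label file.

    Format: class_id x_center y_center width height
    (all coordinates normalized to the 0..1 range).
    """
    parts = line.split()
    return {
        "class_id": int(parts[0]),
        "x_center": float(parts[1]),
        "y_center": float(parts[2]),
        "width": float(parts[3]),
        "height": float(parts[4]),
    }

# Hypothetical label line: class 0, box centered at (0.5, 0.4)
label = parse_yolo_label("0 0.5 0.4 0.3 0.25")
```

Each image in the dataset has a matching `.txt` file like this, which is what the trainer reads as its “recipe.”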

Environment Setup

To start coding, we’ll use Docker, as it provides a pre-configured environment with the base model and necessary dependencies. After a quick search, we found this Ultralytics Docker repo that includes CUDA, Python, and other essential libraries.

Since my system has an NVIDIA 3090 GPU, we’ll leverage that, but this setup can also work with other hardware—more on that in future posts.

Our project will be split into two parts: a server and a client. The server will accept an image, process it with the model, and return results (whether headphones are detected or not, along with confidence scores and bounding boxes). The client will run on the user’s machine (which doesn’t need a GPU) and will use a webcam to capture frames at 3 FPS. These frames will be sent to the server for processing. If headphones are detected, the client will trigger an action, like switching the audio output.

Most of the code has been generated with the help of AI tools like Claude.ai and ChatGPT. I view AI as a great tool to speed up development, and I’ll share the prompts used in the GitHub repository.

For the server, I based it on the Ultralytics Docker image and added Nginx for reverse proxy functionality. This allows the server to serve static content (such as images with bounding boxes) and provides added security. For more on reverse proxies and Nginx setup, check out this excellent tutorial on DigitalOcean.

Note: We’re using Ubuntu for the server, as I already have a working Docker setup on an Ubuntu machine with a GPU. This machine also serves as my media server.

Training the model

Now that we have the data, let’s take a look at it.

Our data is already split into three directories—train, test, and valid—so no manual partitioning is necessary:

  • Train: Used for the actual model training, containing about 70% of the data.
  • Valid: Evaluates model performance during training with around 20% of the data.
  • Test: Used to assess final accuracy, containing roughly 10% of the data.

For more on label formats and model requirements, see this Ultralytics YOLO guide.

Run Docker interactively to train the model

Since we’re using the Ultralytics Docker image, the following command initiates training:

sudo docker run -it --rm --ipc=host -v ./:/app -v ./models:/models --gpus all ultralytics/ultralytics

Run this command from the directory where the repository is cloned. This creates an interactive shell with two mounted volumes:

  • The current directory as /app
  • The models directory as /models

Think of mounts as file sharing between Docker and the host. Once inside the shell, proceed with the following steps.
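The actual training invocation is run from inside that shell. A sketch using the Ultralytics CLI; the `data.yaml` path, model size, and epoch count here are my assumptions, so adjust them to the downloaded dataset:

```shell
# Fine-tune a pretrained YOLO11 nano model on the downloaded dataset.
# /app/data.yaml is the dataset description file shipped with the
# Roboflow export; epochs/imgsz are starting values, not tuned settings.
yolo detect train data=/app/data.yaml model=yolo11n.pt epochs=50 imgsz=640

# Trained weights are written under runs/detect/train/weights/best.pt
```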

The Server Code

The code for the server is fairly simple. First, we load the model.

Step 1: Loading the model

Then we open up an endpoint that receives images and run Ultralytics inference on each image (the model has already been initialized with our custom weights).

After that, the server sends the results back to the client as JSON.

Note the format here: it returns class, confidence, and bbox (bounding box). Class is what was detected, confidence is how certain the model is about that class, and bbox is the box drawn around the detected item.
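A minimal sketch of how the server might assemble that JSON reply, assuming each raw detection is a (class name, confidence, bounding box) triple; the function and field names are illustrative, not the repo’s actual identifiers:

```python
import json

def build_response(detections):
    """Shape raw detections into the JSON structure the client expects.

    Each detection is (class_name, confidence, bbox), where bbox is
    [x1, y1, x2, y2] in pixels.
    """
    return json.dumps({
        "detections": [
            {"class": name, "confidence": round(conf, 3), "bbox": box}
            for name, conf, box in detections
        ]
    })

# Example with one fabricated detection
reply = build_response([("headphone", 0.91234, [120, 80, 340, 260])])
```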

Now let’s jump to the client side.

The Client Code

The client is a straightforward Python script I mostly built with ChatGPT’s assistance (why reinvent the wheel?). The script reads the video stream from the camera at 3 FPS, which is just enough to check for headphones without overloading the server; even 1 FPS would suffice.

Read from camera at 3 fps
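Capturing at a low fixed frame rate boils down to skipping frames until enough time has passed. A sketch of that throttling logic with an injectable clock, so it can be exercised without a camera (the class and names are mine, not the repo’s):

```python
class FrameThrottle:
    """Allow one frame every 1/fps seconds; the clock is injectable for testing."""

    def __init__(self, fps, clock):
        self.interval = 1.0 / fps
        self.clock = clock
        self.last = float("-inf")

    def ready(self):
        """Return True when enough time has passed to take another frame."""
        now = self.clock()
        if now - self.last >= self.interval:
            self.last = now
            return True
        return False

# Simulated clock ticking in 0.1 s steps: at 3 FPS (one frame per ~0.33 s),
# roughly every 4th tick is accepted.
ticks = iter(t * 0.1 for t in range(10))
throttle = FrameThrottle(fps=3, clock=lambda: next(ticks))
accepted = [throttle.ready() for _ in range(10)]
```

In the real client the clock would just be `time.monotonic`, and a frame would be read and sent only when `ready()` returns True.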

The second part of the code is a function that sends each captured frame (image) to the server created in the previous step and reads the response. If the server response includes labels like “headphone” or “earphone,” the function triggers audio output switching.

Using OpenCV to read from camera
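The decision itself can be a simple scan of the returned labels above a confidence cutoff. A sketch, where the 0.5 threshold and the exact label set are my assumptions:

```python
TARGET_LABELS = {"headphone", "earphone"}  # labels that should trigger a switch
CONFIDENCE_THRESHOLD = 0.5  # assumed cutoff; tune this against your model

def should_switch(response):
    """Return True if any detection is a headphone/earphone we trust."""
    return any(
        det["class"] in TARGET_LABELS and det["confidence"] >= CONFIDENCE_THRESHOLD
        for det in response.get("detections", [])
    )

# Fabricated server reply with one confident headphone detection
resp = {"detections": [{"class": "headphone", "confidence": 0.91,
                        "bbox": [120, 80, 340, 260]}]}
```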

The final part is the audio switcher code, which adjusts based on the operating system:

  • Windows: Uses pycaw
  • macOS: Uses applescript
  • Linux: Uses pactl with PulseAudio

Each method is implemented in a separate function, tailored to its respective OS. Check the code for specific implementation details.

Switch to headphones based on operating system
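One way to structure the per-OS logic is a small dispatcher that maps the platform to the command (or API) it needs. This sketch only builds the commands; the device names are placeholders, on macOS it assumes the third-party SwitchAudioSource tool (installable via Homebrew) rather than raw AppleScript, and on Windows pycaw is driven through Python calls rather than a shell command:

```python
import platform

def switch_command(os_name, device):
    """Return the shell command that sets the default audio output,
    or None when a library call is needed instead (Windows/pycaw)."""
    if os_name == "Linux":
        # PulseAudio: make `device` the default sink
        return ["pactl", "set-default-sink", device]
    if os_name == "Darwin":
        # Assumes the third-party SwitchAudioSource tool
        # (brew install switchaudio-osx)
        return ["SwitchAudioSource", "-s", device]
    return None  # Windows: drive pycaw's COM API from Python instead

# Device name is a placeholder; list real sinks with `pactl list short sinks`
cmd = switch_command(platform.system(), "my-headphones-sink")
```

The command list can then be handed to `subprocess.run` when a switch is triggered.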

Wrapping up

And there you have it! We’ve built a fully functional local model and service that can handle a real-world task with minimal effort. While this approach is hands-on, there are even easier alternatives, like using AutoML services to automate much of this process. Let me know if you’re interested—we can explore AutoML in upcoming articles!

Please feel free to go through, fork, and modify this GitHub repo hosting the above code:

https://github.com/shreyasubale/headphone-audio-switcher

Happy Hacking !

Genetics is not that hard (sort of)

A few months ago, while aimlessly browsing YouTube videos, I came across a great channel, The Thought Emporium. I was initially looking for SDR / radio telescopy content, and he covers some of that.

But while watching his other videos, I came across some biology content (he does a lot of that, along with laser / quantum physics and everything in between).

What really got me hooked was one specific video. Justin (the channel’s creator) is lactose intolerant, so he set about creating a “cure” for his condition. He successfully performed gene therapy on himself by delivering plasmids (basically DNA) to his digestive tract, using viruses as the delivery mechanism. These plasmids make his cells produce the enzyme lactase, which in turn breaks down the primary sugar in milk and milk-based foods (lactose).

I was really surprised that it was even possible for an individual to accomplish this, so I decided to investigate further.

Easy way to learn molecular biology basics

I am an engineer by profession and a hacker / scientist by passion. Having no interest in learning anatomy, I never considered picking up biology as a hobby, and I had no idea biology (molecular biology, that is) would be so much fun. I decided to go through some online material, but faced with the sheer overwhelming overload of information, I realized I needed a short, to-the-point resource that is easy to understand and is not a dry textbook that puts me to sleep whenever I try to read it.

So I came across this book:

The Manga Guide to Molecular Biology

Yep, you read that right: manga. It’s a very basic guide to the building blocks of molecular biology, woven into a story which, while mediocre, keeps the flow and pace of information very natural.

DNA & RNA

From this book, I learned the basics of molecular biology. There are 4 nucleotide bases in human DNA: adenine, thymine, guanine, and cytosine (A, T, G, C). Due to the molecular structure of these bases, A always pairs with T and G always pairs with C.

Almost every cell in the body contains a nucleus (depending on the type of organism and cell, a nucleus might be present or absent). This nucleus contains the DNA: long chains of A-T and G-C pairs woven together in a double-helix pattern.

RNA is just a single strand, as opposed to the double-stranded DNA.

Proteins

Proteins are the building blocks of the body; almost everything in our body is made of proteins. Skin, nails, tissues, many hormones, and components of blood are each either a single protein or a combination of multiple types of proteins.

Proteins are made up of amino acids: there are 20 standard amino acids, which combine in long chains to make up a single protein. In DNA, each sequence of three bases (called a codon) codes for one amino acid, a single building block of a protein.

In short: three DNA bases code for one amino acid (out of the 20 possible), and sequences of these amino acids form long chains, which are proteins.
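The codon-to-amino-acid mapping can be pictured as a lookup table. A toy translator with just a few real codons (the full table has 64 entries):

```python
# A few real entries from the standard codon table (DNA codons).
CODON_TABLE = {
    "ATG": "Met",  # methionine, also the "start" signal
    "TTT": "Phe",  # phenylalanine
    "GGC": "Gly",  # glycine
    "TAA": None,   # stop codon: end of the protein chain
}

def translate(dna):
    """Translate a DNA string into amino acids, three bases at a time."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[i:i + 3], "???")
        if amino is None:  # stop codon reached
            break
        protein.append(amino)
    return protein

chain = translate("ATGTTTGGCTAA")  # a made-up 4-codon gene
```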

Our bodies, in turn, are made up of water, proteins, carbs, and lipids (basically oily, fat-like substances).

Average Human Body Composition

To me, it all looks very much like computing. The building blocks are binary (an A-T pair might be thought of as 1, a G-C pair as 0). These 1s and 0s together form instructions that encode proteins, which can be thought of as macros or functions. These functions combine to form larger routines (think of them as objects, analogous to cells), which in turn make the entire program (the body) work.

What really fascinates me is that many of these DNA sequences are shared between different organisms.

Image courtesy 23andme

So essentially, the building blocks of all organisms are the same, and what we can do is pick up/isolate traits (genetic material) from one organism and plant them into another organism (okay, it’s wayyy more complex than it sounds). And that, my friends, is bioengineering.

Over the next few articles, I’ll go into various concepts of bioengineering, equipment builds, and a “Hello World” of bioengineering. Yes, it’s a costly hobby, but it’s magical to see your experiments manifest in a living organism. I will mainly work with plants and microbes initially, for ethical reasons, at least until I’ve learned enough not to screw things up royally.

Here is a teaser of what’s to come:

  1. The “Hello World” of biotech: bacterial transformation of E. coli using a GFP plasmid (essentially, modifying E. coli with DNA from a jellyfish so that it produces green fluorescence)
  2. DIY Biolab: making your own DIY biolab (India-specific version)

Happy (Bio) Hacking 🙂

A Re-found love for electronics

We all know that the IoT phenomenon is in full swing nowadays. The rapid development of new, low-cost devices has fuelled it.

A year (or two) ago, I came across a new board, the Raspberry Pi. This is a cheap, full-blown computer with USB ports, an Ethernet port, HDMI, and GPIO pins. The best part is the GPIO pins, which let you interact directly with hardware. It also lets you install many flavours of Linux, and it has a 1 GHz CPU and a dedicated GPU, which makes it much more powerful than regular Arduino boards.

Raspberry Pi 2 Model B+

I did a lot of fun projects with the Raspberry Pi, some software-only and some hardware-based. The ones I remember are:

  1. An auto-downloading, web-based torrent client using Transmission, accessible from anywhere
  2. XBMC (now called Kodi), as a media library for my newly purchased TV
  3. An Ambilight clone with a WS2801 LED strip and the Raspberry Pi
  4. A location-based AC switch with an IR LED and the Raspberry Pi

But as I thought more about automating my home, the cost of the Raspberry Pi became a big factor, and my interest slowly ebbed.

Until recently (a month ago), when a friend told me about a marvellous new board: the ESP8266, a tiny board that contains a powerful wireless radio with a full TCP stack, an integrated microcontroller, and almost 19 GPIO pins, for a mere Rs 150-250.

This sparked my interest. I have spent the last 3 weeks ordering, playing, and doing awesome stuff with the ESP8266. I have also started re-learning electronics from the ground up, and I have rebuilt my electronics lab (so to speak).

The coming posts on this blog will cover my experiments with the ESP8266, Arduino, general electronics, and software (which is my bread and butter). I am writing a blog for the first time, so please bear with me and feel free to point out any mistakes or offer suggestions.

Time to Rock 🙂