Building a remote for Microsoft Teams, Skype and Zoom calls

Introduction

TLDR: I wanted to make something useful at home with some IoT and electronics 😉

I am Italian and I live in the Lombardy region. My family and I have been in lockdown for several weeks because of the COVID-19 emergency. Even though the restrictions are now looser and many activities are slowly returning to sort-of-normality, schools are still closed.

Our daughter, Federica, is 5 and she has daily video calls for school and for other activities she likes. After thinking a bit about it, I decided to give her one of my old laptops and connect it to the TV in our living room. This way she can enjoy a bigger screen, and there’s room for her desk and for playing or dancing in front of the camera.

I was looking for a way to let her mute/unmute the mic during calls (the teacher often requires this, and it is good practice for reducing overall background noise). One of us parents is always in the room with her, but we’d like her to have as much independence as possible.

Let’s build a simple (one or two buttons) physical remote she can use!

Compared to using a touch-enabled device (smartphone, tablet or even touch-enabled desktop solutions) or having to deal with the mouse, this is much more immediate for her (she can focus on the activity she’s doing) and much less error-prone (no chance of inadvertently ending the call or activating some other feature like desktop sharing).

Federica making use of the remote. This initial version had an external battery as power supply, hence the black wire.

Addressing the problem

Each video-call platform has its own peculiarities and I wanted to provide a single remote to my user (Federica). I briefly looked for APIs to interact with Skype, Zoom and Teams (the three platforms I needed to support) but with little success. As far as I know, for an attendee rather than the organizer, it is not straightforward to interface with the client application from the outside (say, from a companion app running on the same machine).

Dealing with the microphone settings at OS level seemed like a dead end from the start: even if I could globally mute/unmute the mic, that would not change the corresponding mic state shown in the specific app (Skype, for example).

Then I tried a simpler approach. Most of these video-call platforms ship applications with keyboard shortcuts (for the pleasure of keyboard-master users as well as for accessibility). One of the shortcuts provided is for the “Toggle mute during a call” function (Ctrl+Shift+M on Microsoft Teams, Ctrl+M on Skype, Ctrl+Shift+A on Zoom).

The idea is simple: build a remote (a wireless remote she can keep on her desk or nearby) and trigger the “Toggle mute” keyboard shortcut (replicating the right key combination depending on the current application) when she pushes a button on the remote.
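As a rough illustration of the “right key combination depending on the current application” part, the mapping can be pictured as a small lookup table. This is only a sketch in C++: the names App, Shortcut, toggleMuteShortcut and describe are made up for this example, and the shortcuts are simply the ones listed above.

```cpp
#include <string>

// Which video-call application is in use.
enum class App { Teams, Skype, Zoom, Unknown };

// A keyboard shortcut: modifier flags plus one main key.
struct Shortcut {
    bool ctrl;
    bool shift;
    char key;  // '\0' means "no shortcut known"
};

// "Toggle mute" shortcut per application, as listed in the text.
Shortcut toggleMuteShortcut(App app) {
    switch (app) {
        case App::Teams: return {true, true, 'M'};    // Ctrl+Shift+M
        case App::Skype: return {true, false, 'M'};   // Ctrl+M
        case App::Zoom:  return {true, true, 'A'};    // Ctrl+Shift+A
        default:         return {false, false, '\0'}; // nothing to send
    }
}

// Human-readable form, e.g. "Ctrl+Shift+M".
std::string describe(const Shortcut& s) {
    if (s.key == '\0') return "(none)";
    std::string out;
    if (s.ctrl) out += "Ctrl+";
    if (s.shift) out += "Shift+";
    out += s.key;
    return out;
}
```

With this table, describe(toggleMuteShortcut(App::Skype)) gives "Ctrl+M", and an unrecognized application maps to "(none)" so that no keystroke is sent to an unrelated window.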

Hardware

Even though I am first and foremost a software engineer, I already had some experience with IoT and specifically with ESP32 and ESP8266 devices. My confidence with electronics is limited, but I am capable of dealing with buttons, switches, LEDs, sensors and other amenities you can get with any (cheap) IoT kit available on Amazon.

You can program the ESP32 and ESP8266 in C/C++, and I used the Arduino IDE to program the chip. For simplicity, and because I had a couple of spares, I used the development board (you can easily find it on Amazon too), which is cheap (€ 7-8) though obviously not as cheap as the bare module.

ESP32 Dev Board

Given this is a home application, I’ve gone for WiFi connectivity (my laptop also connects to the Internet through an existing local WiFi network).

I spent a night building the physical remote (reusing a plastic box I had, mounting physical buttons on it and placing an ESP8266 development board and a LiPo battery inside). I have to admit my manual skills need some improvement, and I had to change my design a couple of times before making everything fit.

Software

The ESP8266 is easy to configure for a WiFi connection, and it is trivial to read the status of the buttons and issue an HTTP GET call to an HTTP server.

Since I did not need any configuration mechanism (my WiFi network and my laptop’s IP are hard-coded), there is not much to deal with on the remote side.

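#include <ESP8266WiFi.h>

// Placeholders (example values): replace with your own network and server settings.
const char* ssid     = "your-ssid";
const char* password = "your-password";
const char* host     = "192.168.1.10"; // laptop IP
const int   httpPort = 8080;           // REST server port
const String baseURL = "/command/";    // path prefix of the REST resource

// Block until the board has joined the WiFi network.
void connectToWiFi(){
    WiFi.begin(ssid, password);
    delay(100);
    while (WiFi.status() != WL_CONNECTED) {
        delay(500);
    }
}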

void sendCommand(int b1, int b2){
  if (WiFi.status() != WL_CONNECTED) {
    connectToWiFi();
  }
  
  WiFiClient client;
  if (!client.connect(host, httpPort)){
    return; // connection failed
  }

  // The request line must end with "HTTP/1.1" followed directly by CRLF.
  String url = baseURL + String(b1) +"/" + String(b2);
  client.print(
    String("GET ") + url + " HTTP/1.1\r\n"
    + "Host: " + host + "\r\n"
    + "Connection: close\r\n\r\n"
   );

   unsigned long timeout = millis();
   while (client.available() == 0) {
    if (millis() - timeout > 2000) {
      client.stop();
      return;
    }
   }
}

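void setup()
{
  // Assumption: each button connects its pin to ground, so it reads LOW when pressed.
  pinMode(D5, INPUT_PULLUP);
  pinMode(D6, INPUT_PULLUP);
  pinMode(LED_BUILTIN, OUTPUT);
  digitalWrite(LED_BUILTIN, HIGH); // LED off (active low)
  connectToWiFi();
}

void loop()
{
  int tempD5 = digitalRead(D5);
  int tempD6 = digitalRead(D6);
  if (tempD5 + tempD6 < 2) { // at least one down
    digitalWrite(LED_BUILTIN, LOW); // LED on
    sendCommand(tempD6, tempD5);
    delay(250);
    digitalWrite(LED_BUILTIN, HIGH);// LED off
  }

  delay(250); // poll the buttons roughly twice per second
}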
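To make the wire format concrete, here is a minimal host-side sketch (plain C++, no Arduino libraries) of the request text that sendCommand() is meant to send. buildRequest is a hypothetical helper written for this example, and the host and base path in the usage note are example values, not the real ones.

```cpp
#include <string>

// Builds the HTTP request text for one button report.
// b1 and b2 are raw digitalRead() values: 1 = released, 0 = pressed
// (the listing treats a sum below 2 as "at least one down").
std::string buildRequest(const std::string& host,
                         const std::string& baseURL,
                         int b1, int b2) {
    const std::string url =
        baseURL + std::to_string(b1) + "/" + std::to_string(b2);
    // Request line, Host header, then an empty line to end the headers.
    return "GET " + url + " HTTP/1.1\r\n"
         + "Host: " + host + "\r\n"
         + "Connection: close\r\n\r\n";
}
```

For example, buildRequest("192.168.1.10", "/command/", 1, 0) produces a request for /command/1/0. Because loop() passes D6 first and D5 second, that path would mean D6 is released and D5 is held down.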

On the laptop side, I set up a MARS-Curiosity REST server, built with Embarcadero Delphi of course. In less than 5 minutes I had a server application capable of responding to requests from the remote.

Actually, there’s not much of a response to produce for the client. We need to take action, and the specific action is to simulate keystrokes that trigger the “Toggle mute during a call” shortcut of the current app (Skype, Teams or Zoom). I used the keybd_event Windows API for this purpose, together with some simple FindWindow/GetForegroundWindow/GetWindowText API calls to retrieve the handle of the topmost window in the z-order and determine the corresponding application.
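The server in this project is written in Delphi, so the following is not its actual code. It is a small C++ sketch (the same language as the firmware listing) of the two decisions described above: guessing the application from the foreground window title, and expanding a shortcut into ordered key-down/key-up events. The title substrings and the names detectApp, KeyEvent and keySequence are assumptions made for this example.

```cpp
#include <string>
#include <vector>

enum class App { Teams, Skype, Zoom, Unknown };

// Guess the application from the foreground window title.
// The substrings are assumptions about what each client shows in its title.
App detectApp(const std::string& title) {
    if (title.find("Microsoft Teams") != std::string::npos) return App::Teams;
    if (title.find("Skype") != std::string::npos) return App::Skype;
    if (title.find("Zoom") != std::string::npos) return App::Zoom;
    return App::Unknown;
}

// One simulated keyboard event: a key name and whether it goes down or up.
struct KeyEvent {
    std::string key;
    bool down;
};

// Expand a shortcut into press/release events: modifiers go down first,
// then the main key is pressed and released, then the modifiers are
// released in reverse order.
std::vector<KeyEvent> keySequence(const std::vector<std::string>& modifiers,
                                  const std::string& key) {
    std::vector<KeyEvent> events;
    for (const auto& m : modifiers) events.push_back({m, true});
    events.push_back({key, true});
    events.push_back({key, false});
    for (auto it = modifiers.rbegin(); it != modifiers.rend(); ++it) {
        events.push_back({*it, false});
    }
    return events;
}
```

On Windows, each KeyEvent would turn into one keybd_event call, with the KEYEVENTF_KEYUP flag set for the releases. Returning App::Unknown for unrecognized titles lets the server ignore a button press when no call window is in the foreground.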

Considerations

It actually works (she has been using it for a few days) but there’s room for many improvements, of course. I like to play with IoT and electronics even though I recognize I am more comfortable with software than hardware. I learned that many little things (spacers, wires, connectors, solder, crimper…) are needed if you want to do things properly (well, “home-made properly”, because the best option would probably be to design a PCB and have it produced by somebody like MD srl).

Building the REST server was very easy (you can find the RemoteMic project among the MARS demos; specifically, have a look at the “command” resource implementation).

I really like to play with IoT devices, and after months of small (but real) projects I am becoming more and more confident with them as well as with some basic electronics. I will probably do some more stuff in this area in the near future.

Stay tuned 😉

What’s next: a tentative roadmap

  • support “Raise hand” functionality in Microsoft Teams (please vote for this UserVoice entry!);
  • implement better power management of the remote (deep sleep mode);
  • find some way to integrate with Skype/Teams/Zoom in order to have feedback on the call or other settings;
  • make WiFi and other remote settings configurable (hard-coded right now), maybe switching to ESP32 and a BLE interface;
  • make the application interface (which application, which shortcut, …) configurable;
  • use gestures or other creative ways to trigger commands apart from buttons;
  • provide a Mac OS X version;
  • provide a better external case for the remote.

3 thoughts on “Building a remote for Microsoft Teams, Skype and Zoom calls”

  1. Ronaldo says:

    Great story

  2. Magni Angelo says:

    Brilliant. A strategic union of need and a simple solution to a real requirement.
    I like the plan for growing it to address further related problems.

  3. Ed says:

    Making your own Zoom 😀
