Researchers have found new ways for smart speakers to track a user’s movement around a room, including one that updates Soviet-era spy technology.
As smart speakers gradually fill millions of homes across the world, the technology and hardware inside them are advancing at a rapid pace. Yet one thing they can’t do right now is track a user’s location within a room.
Whatever you might think of the concept from a privacy perspective, the idea would be very attractive to Amazon and its rivals, as a device’s artificial intelligence (AI) could then know whether you were at your fridge, for example.
Now, researchers at Carnegie Mellon University’s Future Interfaces Group have identified two ways user tracking can be achieved by analysing sound and vibrations.
Chris Harrison and his fellow researchers are set to publish two papers on these methods, one of which describes a modern-day version of eavesdropping technology deployed by the Soviet spy agency, the KGB, in the 1950s.
Called Vibrosight, the technology can detect vibrations at specific locations in a room using laser vibrometry, similar to the light-based devices the Soviets used to detect vibrations on reflective surfaces such as windows, allowing them to listen in on the conversations that generated those vibrations.
PhD student Yang Zhang said: “The cool thing about vibration is that it is a by-product of most human activity. The other cool thing is that vibrations are localised to a surface.” This means activity can be tracked at a distance and, because vibrations stay on their own surface, one activity’s signal doesn’t interfere with another’s, unlike the overlapping sounds a microphone picks up. The researchers also argue that this improves user privacy as, unlike microphones and cameras, vibration monitoring is more discreet.
All that is needed to add this capability is a low-power laser combined with a motorised, steerable mirror, which the team built for $80. Reflective markers – similar to those worn by cyclists – are then applied to the objects to be monitored, and the system identified their activity with 92pc accuracy.
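The papers aren’t accompanied by code here, but the pipeline lends itself to a simple sketch: extract a frequency profile from each vibration snippet and match it against profiles recorded for each tagged object. The sampling rate, the stand-in signals and the nearest-neighbour matching below are all illustrative assumptions, not Vibrosight’s actual implementation.

```python
import numpy as np

def spectral_features(signal, n_bands=32):
    """Reduce a vibration snippet to a coarse frequency-band energy profile."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    return np.array([band.mean() for band in np.array_split(spectrum, n_bands)])

def classify(snippet, profiles):
    """Nearest-neighbour match of a live snippet against reference profiles."""
    features = spectral_features(snippet)
    return min(profiles, key=lambda label: np.linalg.norm(features - profiles[label]))

# Illustrative stand-in signals for two tagged objects; a real deployment
# would record reference profiles from the laser vibrometer during setup.
rate = 2000  # assumed vibrometer sampling rate, in Hz
t = np.linspace(0, 1.0, rate, endpoint=False)
rng = np.random.default_rng(0)
profiles = {
    "blender_running": spectral_features(np.sin(2 * np.pi * 120 * t)),
    "tap_running": spectral_features(rng.normal(size=t.size)),
}

live = np.sin(2 * np.pi * 120 * t) + 0.1 * rng.normal(size=t.size)
print(classify(live, profiles))  # prints "blender_running"
```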
Finding familiar sounds
The second proposed method, dubbed Ubicoustics, is a sound-based activity-recognition system that uses the microphones already built into most household devices, such as smart speakers, phones and smartwatches. When a device is placed in a room, it could recognise the sounds associated with that space, whether a bedroom, kitchen, workshop, entrance or office.
“The main idea here is to leverage the professional sound-effect libraries typically used in the entertainment industry,” said PhD student Gierad Laput.
“They are clean, properly labelled, well segmented and diverse. Plus, we can transform and project them into hundreds of different variations, creating volumes of data perfect for training deep-learning models.”
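In practice, that kind of transformation might look like the sketch below, which uses the librosa audio library to spin pitch-shifted, time-stretched and gain-scaled variants out of a single clip. The specific transforms and the “blender.wav” filename are illustrative assumptions, not the team’s published pipeline.

```python
import librosa

def augment(clip, sr):
    """Yield transformed variants of one labelled sound-effect clip."""
    for n_steps in (-2, -1, 1, 2):              # pitch shift, in semitones
        yield librosa.effects.pitch_shift(clip, sr=sr, n_steps=n_steps)
    for rate in (0.8, 0.9, 1.1, 1.25):          # time stretch: slower/faster
        yield librosa.effects.time_stretch(clip, rate=rate)
    for gain in (0.5, 0.7, 1.3):                # amplitude: quieter/louder
        yield clip * gain

# "blender.wav" is a hypothetical stand-in for a library sound effect.
clip, sr = librosa.load("blender.wav", sr=16000)
print(f"1 labelled clip -> {len(list(augment(clip, sr)))} training variants")
```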
Unlike the vibration detector, Ubicoustics could be deployed to existing smart speakers as part of a software update rather than requiring new hardware. Once activated, it could, for example, alert you to move to the next stage of a recipe after hearing the sound of a blender.
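As a rough sketch of how such a recipe alert could work on-device, the loop below slides over one-second audio windows, converts each to a log-mel spectrogram (a common input representation for sound-event models) and fires a callback when the blender’s sound stops. The `stream`, the trained `model` and its `predict` method are hypothetical placeholders, not part of the published system.

```python
import numpy as np
import librosa

SR = 16000  # assumed microphone sampling rate, in Hz

def log_mel(window):
    """Log-mel spectrogram of a one-second window of audio samples."""
    mel = librosa.feature.melspectrogram(y=window, sr=SR, n_mels=64)
    return librosa.power_to_db(mel)

def monitor(stream, model, on_blender_stop):
    """Watch consecutive one-second windows and flag when the blender goes quiet."""
    was_running = False
    for window in stream:                       # each: np.ndarray of SR samples
        label = model.predict(log_mel(window))  # hypothetical trained classifier
        running = (label == "blender")
        if was_running and not running:
            on_blender_stop()                   # e.g. "move to the next recipe step"
        was_running = running
```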
Getting the AI to recognise sounds and place them in the correct context is challenging, Laput admitted: the system had an accuracy of about 80pc, putting it only on a par with human accuracy.
However, better microphones, higher sampling rates and different model architectures could all improve that accuracy with further research.