Here’s an interesting experiment Google is kicking off on its smart displays: voice input without a hotword. A video detailing the feature is on YouTube from Jan Boromeusz, a Nest Home hacker who has a proven track record of scoring early smart display features before they’re announced.
Boromeusz’s Nest Hub Max is somehow in “Dogfood” mode, meaning it’s receiving early, non-public builds of the smart display software intended for internal use at Google only. A special menu called “Dogfood Features” lists a “Blue Steel” feature that allows the device to respond to commands without first having to hear the hotword “Hey Google” – you just say a command and it will respond. Boromeusz says the device will listen for commands after “detection of presence,” so if someone is in front of the screen, it will just start answering questions.
Today, Google’s voice command hardware listens all the time, but only for the “Hey Google” hotword. Once that’s detected, it starts processing additional commands. The more modern implementations also use the hotword as the dividing line for connecting to the Internet: “Hey Google” detection is handled locally, and everything after that is uploaded, processed, and stored on Google’s servers. The hotword also acts as a form of permission, not only for uploading the words that follow to the Internet, but also because it would be annoying to have the device listening all the time and responding to anything that could possibly be interpreted as a command.
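To make the gating idea concrete, here is a minimal sketch in Python. It is a deliberate simplification and not Google’s implementation: real devices match the hotword in audio, while this toy works on already-transcribed text. The function name `gate_commands` and the list standing in for an upload are assumptions made for illustration.

```python
from typing import Iterable, List

HOTWORD = "hey google"  # the wake phrase named in the article


def gate_commands(utterances: Iterable[str], hotword: str = HOTWORD) -> List[str]:
    """Return only the commands that would leave the device.

    Hypothetical helper: every utterance is checked locally, and only the
    text that follows the hotword is collected (the returned list stands
    in for whatever would be uploaded).
    """
    uploaded: List[str] = []
    for utterance in utterances:
        text = utterance.strip()
        if text.lower().startswith(hotword):
            # Keep what comes after the hotword, minus stray punctuation.
            command = text[len(hotword):].strip(" ,.!?")
            if command:
                uploaded.append(command)
        # Utterances without the hotword are dropped on the device.
    return uploaded


heard = [
    "Hey Google, what's the weather?",
    "turn off the lights",
    "hey google set a timer",
]
print(gate_commands(heard))
```

In this sketch, “turn off the lights” never leaves the device because nothing unlocked the gate. A hotword-free mode would have to replace that check with some other signal, such as presence.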
It’s not clear how “presence” is detected for the Blue Steel feature. There’s a camera on the front of the Nest Hub Max, which is used for a “Face Match” feature that can identify a user. If you really want to read too much into the Zoolander-inspired codename, “Blue Steel” is about making a face for the camera. However, the smaller Nest Hub doesn’t have a front-facing camera, so this wouldn’t be very scalable across the Nest Hub/Home Hub line. Not to mention that if Google wanted this kind of interaction to become the standard, it probably would want it to work on smart speakers too.
Google also has a more scalable presence detection feature at its disposal: Ultrasonic Sensing. This is sonar: the speakers pump out inaudible sound and record any bounce from an object that has moved in front of the device. The sonar-based person detection would likely work on anything with a microphone and speakers, which would be scalable across the entire smart display and smart speaker line.
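The arithmetic behind sonar is simple enough to sketch. The Python below is a rough illustration of echo ranging in general, not a description of how Ultrasonic Sensing works internally: the function names and the 0.2 m threshold are hypothetical, and the speed of sound is an approximate figure for room-temperature air.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value for dry air at about 20 °C


def echo_distance_m(round_trip_s: float) -> float:
    """Distance to a reflecting object from the echo's round-trip time.

    The sound travels out and back, so the one-way distance is half
    of speed multiplied by time.
    """
    return SPEED_OF_SOUND_M_S * round_trip_s / 2.0


def presence_detected(baseline_m: float, current_m: float,
                      threshold_m: float = 0.2) -> bool:
    """Hypothetical check: flag presence when the nearest echo has moved
    by more than threshold_m compared with the empty-room baseline."""
    return abs(baseline_m - current_m) > threshold_m


empty_room = echo_distance_m(0.0175)  # echo from a wall about 3 m away
someone_near = echo_distance_m(0.007)  # echo from something about 1.2 m away
print(presence_detected(empty_room, someone_near))
```

An echo that returns after 10 milliseconds corresponds to an object roughly 1.7 meters away, which is about the range at which a smart display would need to notice someone.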
We must emphasize that Google is only testing this feature for now, and our information comes from a leaked internal build that Google never wanted to show to the public. There is no indication that Blue Steel will roll out to consumer devices any time soon. Getting the balance right on something like this would be absolutely crucial and would be the difference between “quick and useful” and “annoying and invasive.”
Frame image by Jan Boromeusz