There may be an engineering reason (currently at least) that they are doing that: the quantity of data that each device will have to send up to the main AI data centers is gargantuan. It includes “constant” video and audio so the device/system can learn everything about the surroundings at all times, and that will take quite a lot of bandwidth. More bandwidth than is available on a typical cellular connection, if these things ever become popular. So they will need to connect to a WiFi network that has higher bandwidth available. Furthermore, if they were to constantly use cellular bandwidth, the power consumption would be quite high and they would require a larger battery for the mobile scenario. If they connect via a cellphone to use its data connection, then BOTH batteries will drain relatively rapidly under constant use.
All the choices have downsides. Glasses have downsides because not everyone wants to wear glasses, and power requires a battery, which adds weight. I’ve worn glasses for 50+ years and lighter is better. Always. A phone has downsides: you need to “take it out”, and again there are power constraints, size and weight, etc. But everyone already has a phone, so maybe no big deal. A puck has downsides too: it’ll need to be big enough for a decent-sized battery; if it does AI processing on board, and it will, it’ll need to dissipate heat (like a phone does under heavy use); and there is the field of view, etc. Of course, “everyone” is used to the puck from existing Amazon Alexa and Google/Apple Home devices.
I don’t know if any of the form factors will “work” because I’m not sure the AIs are up to conversational answers yet. If the only interface is voice, then I don’t know if it’ll work well [yet]. I think the screen on the phone is still necessary for a while yet. For example: “AI, I want to book a flight to Aruba a week from Thursday”, reply “There are 19 flights available that day, do you want direct or connection?”, “Direct”, “Do you want morning or evening?”, “Morning”, “There are 4 flights, one at 6am, one at 7am, one at 9am, and one at 10am, do you want me to book one for you?”, “What are the prices?”, “The 6am one is the lowest at $450, the 10am one is the highest at $940”, “What is the 9am price?”, “It is $650”, “Okay, book it for me and my wife and both kids”, “I can only book one at the $650 price, the other 3 passengers will have a higher price”, “Can you check afternoon prices?”, etc. Now, you can go through this whole conversation, and it could take 10 minutes, or you can simply display the price matrix on a screen and get all the info you need in 15 seconds. Which would you choose?