If someone had told me, a couple of weeks earlier, that I would be taking pictures of everything that crossed my path, I would have laughed in their face. But there I was, sitting on the sidewalk, looking to capture the perfect shot that would allow me to learn a little more about the world I am a part of: the expression of the guide dog who is always by my side; the bustle of a busy street full of buildings, cars, and signs; the box of desserts I just bought, wondering whether it looked appetizing enough to bring to a family dinner. I can’t see these things, which are so easy to take for granted, with my own eyes. But A.I. has now brought me as close to being able to do so as I’ll probably ever be.
I was born totally blind, and my visual world has always been determined by what well-meaning people can tell me about my surroundings. To appreciate all the details of a room or to read a menu in a restaurant, I was dependent on someone else. When I took photos, I often recorded short voice notes describing where I was and how I felt, hoping I could someday pair the two and bridge the gap. Most of my camera roll was filled with photos taken for others to appreciate, since no one could sit for hours with me to describe the way the sea crashed against the rocks or the details of a busy, lively street in Italy. The more concrete details, in the end, were always left to my imagination—which, though vivid, always needed more.
When I first heard about Be My AI—a new collaboration between OpenAI and Be My Eyes, an app that connects sighted volunteers with blind people who need help via video call—I didn’t let myself get too excited. Be My AI promised to allow blind people to receive an A.I.–generated description of any photo we uploaded. This was a tantalizing prospect, but it wasn’t the first time a tech company had promised to revolutionize the way people with disabilities access visual content. Microsoft had already given us Seeing AI, which in a very rudimentary way provided a rough idea of what was going on in the images we shared, and which allowed us—again, in a fairly basic way—to interact with information contained in written texts. But the details were missing, and in most cases we could know only that there was a person in the picture and what they were doing, nothing more. Be My AI was different.
Image description generated by Be My AI:
The picture shows a young woman sitting on a wooden chair in what appears to be a cozy café with wooden interiors. She is wearing a black jacket and blue jeans, and holding a white paper cup, possibly with a hot beverage. She has a somewhat serious expression on her face. To her right, there is a black Labrador dog standing on the wooden floor, looking directly at the camera. The dog is wearing a guide dog harness and looks serious. On the table next to the woman, there are condiments like salt and pepper, and a red bag of chips. The atmosphere seems relaxed and casual.
Suddenly, I was in a world where nothing was off limits. By simply waving my cellphone, I could hear, with great detail, what my friends were wearing, read street signs and shop prices, analyze a room without having entered it, and indulge in detailed descriptions of the food—one of my great passions—that I was about to eat.
Even colors, which are as important to many of us as they are to a sighted person, were not left out. Suddenly I could use my phone to match my favorite clothes in ways I wouldn’t have thought of on my own. One of my favorite hobbies became discovering the colors of objects in my daily life—from brand logos to furniture in my environment—that I’d never thought to ask about.
Image description generated by Be My AI:
The picture shows a box of assorted pastries and sweets. The box is being held by someone with their left hand, and their right arm is extended towards the box. The person is wearing a bright pink sweater. The pastries in the box are a mix of small cakes, tarts, and chocolates. There are some with a swirl of chocolate or pink frosting, some with nuts on top, and others with a dusting of powdered sugar. The box is white and the pastries are arranged neatly in rows. The background shows a wooden surface, possibly a table.
I felt as if I had a little personal assistant in my pocket, ready to describe anything that crossed our path. Unlike other apps, this assistant could also answer my questions—the integration with GPT-4 meant that I could have conversations with the artificial intelligence about the photos I sent. I started taking pictures of restaurant menus and asking the A.I. for prices or for specific dishes that met the criteria I was looking for. It was all strangely liberating, even though I knew I was still relying on something that was imperfect, could always fail, and wasn’t guaranteed to last.
As this independence came, so did the memories. All the photos that were gathering dust in my camera roll came to life before me, transformed into vivid descriptions of moments and situations, of people who are no longer with me and whom I never thought I could visualize on my own without having to ask a loved one. The seemingly mundane details in the A.I. descriptions allowed me to understand decades-old photos in new ways.
Image description generated by Be My AI:
The picture shows a family of four standing in a room with wooden floors and a light green wall. There is a man on the left wearing a blue checkered shirt, and next to him is a young woman in a white sweater. To her right is an older woman, also in a white sweater. In front of them is a young boy in a black shirt and grey pants. The man and the young woman have their arms around the boy. Behind them, there is a wooden door on the left and a framed artwork on the right. They all seem happy and content.
This application is still in beta, and for now it’s available only on iOS. But Be My Eyes has said the feature will soon roll out across Android devices, and as new data is added and user suggestions are incorporated, the descriptions will likely continue to improve. Similar technology will also continue to pop up elsewhere—ChatGPT subscribers, for example, can now request image descriptions. But these advances come with their own questions and fears. Artificial intelligence can be wrong, and a blind person like me would have little way of noticing unless I knew in advance what was in an image. Preserving the privacy of shared data and images, especially given how opaque the company’s terms of use are, is also a concern that haunts me.
But my greatest fear is that just as suddenly as this technology arrived to change my life, it could disappear. This has happened to other apps for visually impaired people, and each time, it’s a deeply disorienting betrayal. I can’t imagine having the doors slammed on this visual world I’ve been dropped into but have yet to truly discover. So that’s what I’ll do—keep discovering, for as long as it lasts.
Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.