Facebook is investing heavily in augmented reality, including a collaboration with Ray-Ban on its own smart glasses. For now, those devices can only record and share photos and video, but what does the company think they will be used for in the future?

A new research project led by the company’s AI division reveals the scale of Facebook’s ambitions. It envisions AI systems that constantly analyze people’s lives through first-person video, capturing what they see, do, and hear in order to assist them with everyday tasks.

Facebook’s researchers want these systems to develop skills such as “episodic memory” (answering questions like “Where did I leave my keys?”) and “audio-visual diarization” (remembering who said what, and when).

Right now, no AI system can reliably perform the tasks described above, and Facebook stresses that this is a research project, not a commercial product. Still, it’s clear the company sees this kind of functionality as the future of augmented reality computing.

“Definitely, thinking about augmented reality and what we’d like to be able to accomplish with it,” says Facebook AI research scientist Kristen Grauman, “there are possibilities down the line where we’d be using this kind of research.”

Such objectives have far-reaching ramifications for privacy. Experts are already concerned about how Facebook’s augmented reality glasses enable users to secretly record members of the public. Such fears will only grow if future iterations of the device not only record but also analyze and transcribe footage, thereby turning wearers into walking surveillance machines.

The first commercial AR glasses from Facebook can only record and share videos and photos, not analyze them.

Facebook’s research project is named Ego4D, a reference to the analysis of first-person, or “egocentric,” video. It has two primary components: an open dataset of egocentric video and a series of benchmarks that Facebook thinks AI systems should be able to tackle in the future.

The dataset is the largest of its kind ever created, compiled in collaboration with 13 universities around the world. In total, 855 participants from nine countries recorded some 3,205 hours of footage. The universities, not Facebook, were responsible for collecting the data.

Participants, some of whom were paid, wore GoPro cameras and AR glasses to capture unscripted activity on video. This includes everything from construction work to baking to pet care and chatting with friends. The universities de-identified all of the footage, which included blurring bystanders’ faces and removing any personally identifiable information.
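The universities’ exact de-identification pipeline isn’t described, but face blurring of this kind is commonly built from an off-the-shelf face detector plus a blur filter. Here is a minimal sketch using OpenCV’s bundled Haar cascade detector; the file names are placeholders, and this is not Ego4D’s actual tooling:

```python
import cv2

# Illustrative face-blurring sketch (not the Ego4D pipeline).
# Detects faces with OpenCV's bundled Haar cascade and blurs each region.
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def blur_faces(frame):
    """Detect faces in a BGR frame and blur each detected region in place."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        roi = frame[y:y + h, x:x + w]
        # Kernel size must be odd; larger kernels blur more aggressively.
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return frame

cap = cv2.VideoCapture("input.mp4")  # placeholder input file
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
writer = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if writer is None:
        h, w = frame.shape[:2]
        writer = cv2.VideoWriter("deidentified.mp4",
                                 cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    writer.write(blur_faces(frame))
cap.release()
if writer is not None:
    writer.release()
```

In practice, a production pipeline would use a stronger detector and human review, since any missed detection leaves a face exposed.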

The dataset is the “first of its kind in both scale and diversity,” according to Grauman. The closest comparable project, she says, contains 100 hours of first-person footage shot entirely in kitchens. “We’ve opened these AI systems’ eyes to footage from Saudi Arabia, Tokyo, Los Angeles, and Colombia, as well as kitchens in the UK and Sicily.”

Ego4D’s second component is a series of benchmarks, or tasks, that Facebook wants researchers around the world to try to solve using AI systems trained on its dataset. The company describes them as follows:

Episodic memory: What happened when? (e.g., “Where did I leave my keys?”)

Forecasting: What am I likely to do next? (e.g., “Wait, you’ve already added salt to this recipe.”)

Hand and object manipulation: What am I doing? (e.g., “Teach me how to play the drums.”)

Audio-visual diarization: Who said what, and when? (e.g., “What was the main topic during class?”)

Social interaction: Who is interacting with whom? (e.g., “Help me better hear the person talking to me at this noisy restaurant.”)

Any of these challenges would be extremely difficult for today’s AI systems to solve, but developing datasets and benchmarks is a tried-and-true strategy for accelerating AI development.
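Facebook hasn’t said how systems tackling these benchmarks would be built, but one crude way to make the episodic-memory task concrete is text-to-frame retrieval: embed sampled video frames and a text query into a shared space, then return the timestamp of the best-matching frame. The sketch below does this with a pretrained CLIP model via Hugging Face’s transformers library; the video path and query are placeholders, and this is an illustration, not Facebook’s method:

```python
import cv2
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative sketch only: a naive episodic-memory query via CLIP retrieval.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def find_moment(video_path, query, every_n_frames=30):
    """Return the timestamp (in seconds) of the frame best matching the query."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    frames, timestamps = [], []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:  # sample roughly one frame per second
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
            timestamps.append(idx / fps)
        idx += 1
    cap.release()

    # Score every sampled frame against the text query in CLIP's shared space.
    inputs = processor(text=[query], images=frames,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    best = outputs.logits_per_image.squeeze(1).argmax().item()
    return timestamps[best]

# Hypothetical usage: a crude "where did I leave my keys?" lookup.
# print(find_moment("day_recording.mp4", "a set of keys on a surface"))
```

Sampling one frame per second keeps the search cheap; a real system would need to index hours of continuous video far more efficiently.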

The classic precedent here is ImageNet, a dataset and annual competition often credited with kickstarting the contemporary AI boom. The ImageNet dataset contains labeled images of a huge variety of objects, which researchers used to train AI systems to recognize them. In 2012, the winning entry in the competition used a then-novel deep learning method to blow past its rivals, ushering in the current era of research.
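For a sense of what ImageNet-style recognition looks like in practice today, here is a minimal example that classifies a single image with a ResNet-50 pretrained on ImageNet via torchvision (the image path is a placeholder):

```python
import torch
from PIL import Image
from torchvision import models
from torchvision.models import ResNet50_Weights

# Classify one image with an ImageNet-pretrained ResNet-50.
weights = ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.eval()

preprocess = weights.transforms()  # resize/crop/normalize pipeline the model expects
image = Image.open("photo.jpg").convert("RGB")  # placeholder path
batch = preprocess(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    probs = model(batch).softmax(dim=1)
top_prob, top_idx = probs.max(dim=1)
print(weights.meta["categories"][top_idx.item()], float(top_prob))
```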

The Ego4D dataset from Facebook should encourage researchers to look at AI systems that can interpret first-person data.

Facebook hopes its Ego4D project will have a similar impact on the world of augmented reality. The company says systems trained on Ego4D could one day be used not only in wearable cameras but also in home assistant robots, which rely on first-person cameras to navigate the world around them.

“The project has the potential to accelerate work in this field in a way that hasn’t been possible before,” Grauman says. “It can move our field from the ability to analyze heaps of images and videos recorded by humans for a specific purpose to this fluid, ongoing first-person visual stream that AR systems, robots, and other AI systems must comprehend in the context of ongoing activity.”

The goals Facebook outlines seem plausible, but the company’s involvement in this area of research will worry many.

Facebook’s track record on privacy is dismal, marked by data leaks and a $5 billion fine from the FTC. The company has also repeatedly shown that it prioritizes growth and engagement over users’ well-being.

With this in mind, it’s worrying that the Ego4D project’s benchmarks lack obvious privacy safeguards. The “audio-visual diarization” task (transcribing what different people say), for example, makes no mention of removing data about people who do not wish to be recorded.

When asked about these concerns, a Facebook spokesperson said the company expects privacy safeguards to be introduced further down the line. “We hope that to the extent that organizations use this dataset and benchmark to develop commercial applications, they will develop protections for such applications,” the spokesperson said.

“For example, before AR glasses can enhance someone’s voice, there could be a protocol in place that requires them to ask permission from someone else’s glasses, or they could limit the range of the device to only pick up sounds from people with whom I am already conversing or who are in my immediate vicinity.”
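That protocol is, by the spokesperson’s own framing, hypothetical. Still, a toy sketch shows the shape of the idea: the listening device must obtain the speaker’s consent before any enhancement is applied. Every name below is invented for illustration:

```python
from dataclasses import dataclass

# Hypothetical sketch of the consent protocol the spokesperson describes:
# one device must request permission before enhancing another person's voice.
# None of these names come from Facebook; this is purely illustrative.

@dataclass
class ConsentRequest:
    requester_id: str
    capability: str  # e.g., "voice_enhancement"

class GlassesDevice:
    def __init__(self, device_id, allow_by_default=False):
        self.device_id = device_id
        self.allow_by_default = allow_by_default
        self.granted = set()  # requester IDs this wearer has approved

    def handle_request(self, req: ConsentRequest) -> bool:
        """Return True only if this wearer permits the capability."""
        return self.allow_by_default or req.requester_id in self.granted

def enhance_voice(listener: GlassesDevice, speaker: GlassesDevice) -> bool:
    req = ConsentRequest(requester_id=listener.device_id,
                         capability="voice_enhancement")
    if speaker.handle_request(req):
        # ... apply audio enhancement to the speaker's voice ...
        return True
    return False  # no consent: leave the audio untouched

alice = GlassesDevice("alice-glasses")
bob = GlassesDevice("bob-glasses")
bob.granted.add("alice-glasses")   # Bob opts in to Alice enhancing his voice
print(enhance_voice(alice, bob))   # True: Bob has consented
print(enhance_voice(bob, alice))   # False: Alice has not consented
```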
