Meta’s Fundamental AI Research (FAIR) team just rolled out five major innovations that bring machines closer to perceiving and interacting with the world like humans. These advancements span visual perception, language modeling, robotics, and collaborative AI — all feeding into the company’s overarching mission to build advanced machine intelligence (AMI).
Seeing Like Humans: The Perception Encoder
At the heart of Meta’s new releases is the Perception Encoder — a large-scale vision encoder that allows AI to excel in both image and video tasks. It acts as the “eyes” of intelligent systems, enabling them to recognize fine-grained visual details such as a stingray hidden in sand or a distant goldfinch in the background.
This encoder not only surpasses existing open-source and proprietary models in zero-shot classification and retrieval but also improves performance on vision-language tasks such as captioning, visual question answering, and understanding spatial relationships. When paired with large language models (LLMs), it shows strong potential for multimodal AI applications.
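To make the zero-shot setup concrete, here is a minimal sketch using the public CLIP checkpoint as a stand-in, since the Perception Encoder follows the same contrastive image-text approach; the model ID, image path, and candidate labels are illustrative assumptions, not Meta's release artifacts.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Stand-in checkpoint; the Perception Encoder itself is released separately.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("beach.jpg")  # any local image (path is illustrative)
labels = ["a stingray hidden in sand", "a goldfinch on a branch", "an empty beach"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity logits; the highest-probability caption is the
# zero-shot prediction, with no task-specific fine-tuning involved.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0]):
    print(f"{label}: {p:.3f}")
```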
Open-Source Vision-Language AI: The Perception Language Model
Alongside the encoder, Meta introduced the Perception Language Model (PLM), a vision-language model trained on open and synthetic datasets without distillation from proprietary models. To bolster PLM's performance on video tasks, FAIR compiled a new dataset of 2.5 million labeled samples, the largest of its kind, focused on nuanced video understanding.
PLM comes in 1B, 3B, and 8B parameter sizes and is accompanied by PLM-VideoBench, a new benchmark to test fine-grained activity recognition and spatio-temporal reasoning. Together, these tools aim to drive progress in open research on vision-language learning.
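For a rough feel of the visual question answering task PLM targets, the snippet below runs a small public VQA model through the transformers pipeline; the checkpoint and inputs are stand-ins, and nothing here reflects PLM's own loading code.

```python
from transformers import pipeline

# Small public VQA model as a stand-in; PLM's own checkpoints and exact
# loading code are not assumed here.
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

result = vqa(image="kitchen.jpg", question="What is the person holding?")
print(result[0]["answer"], result[0]["score"])  # top answer and its confidence
```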
Robots That Understand the World: Meta Locate 3D
Meta Locate 3D bridges language and spatial perception, allowing robots to understand and respond to complex language queries in a 3D environment. Using RGB-D sensors and a three-part system (2D-to-3D preprocessing, 3D-JEPA encoder, and Locate 3D decoder), the model can identify specific objects based on context — like distinguishing between “a vase on a table” and “a vase near the TV console.”
This technology, supported by a new dataset of over 130,000 annotations across 1,346 scenes, will play a vital role in enhancing human-robot interaction, especially within Meta’s own robotics initiatives.
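The three-stage pipeline can be pictured schematically, as in the sketch below. Every function name, shape, and score here is an illustrative assumption meant only to show how RGB-D input flows through 2D-to-3D lifting, a 3D-JEPA-style encoder, and a decoder that grounds a language query in a 3D box.

```python
import numpy as np

def lift_2d_to_3d(rgb: np.ndarray, depth: np.ndarray):
    """Stage 1 (assumed): back-project pixels with depth into a point cloud
    carrying 2D image features. Random arrays stand in for real outputs."""
    n = depth.size
    points = np.random.rand(n, 3)      # XYZ positions (placeholder)
    features = np.random.rand(n, 256)  # per-point features (placeholder)
    return points, features

def jepa_encode(points: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Stage 2 (assumed): contextualize the featurized point cloud,
    standing in for the 3D-JEPA encoder."""
    return features.mean(axis=0, keepdims=True).repeat(len(points), axis=0)

def locate(query: str, points: np.ndarray, embeddings: np.ndarray) -> dict:
    """Stage 3 (assumed): decode the language query against the scene
    and return a 3D bounding box. Scoring here is a placeholder."""
    idx = int(np.argmax(embeddings.sum(axis=1)))
    return {"query": query, "box_center": points[idx].tolist(),
            "box_size": [0.3, 0.3, 0.3]}

rgb, depth = np.zeros((48, 64, 3)), np.ones((48, 64))  # tiny toy frame
pts, feats = lift_2d_to_3d(rgb, depth)
print(locate("the vase near the TV console", pts, jepa_encode(pts, feats)))
```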
Rethinking Language Models: Byte-Level Transformers
Meta also introduced the Dynamic Byte Latent Transformer, an 8-billion-parameter language model that ditches traditional tokenization in favor of byte-level processing. This change enhances the model’s robustness, especially in handling typos, rare words, and adversarial inputs.
It outperforms token-based models across various benchmarks, with significant gains in perturbed reasoning tasks. With open weights and code, this release invites the AI community to explore a novel approach to natural language understanding.
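The input-representation idea is simple to demonstrate: operate on raw UTF-8 bytes with a fixed 256-symbol vocabulary, so a typo perturbs a few positions instead of fracturing subword tokens. The sketch below shows only that representation; the model's dynamic grouping of bytes into latent patches is not modeled here.

```python
def to_bytes(text: str) -> list[int]:
    # Every character becomes one or more UTF-8 bytes; the "vocabulary"
    # is fixed at 256 values and never needs retraining.
    return list(text.encode("utf-8"))

clean, typo = "perception", "percepiton"
print(to_bytes(clean))  # [112, 101, 114, 99, 101, 112, 116, 105, 111, 110]
print(to_bytes(typo))   # differs from the clean string in just two positions
# A subword tokenizer might map "percepiton" to entirely different token IDs,
# which is one reason byte-level models handle typos and rare words gracefully.
```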
Teaching AI to Collaborate: The Collaborative Reasoner
Finally, the Collaborative Reasoner tackles the challenge of socially intelligent AI. Meta wants its agents to help with tasks like homework or interview prep, but that requires more than knowledge — it requires communication, empathy, and the ability to reason with others.
This framework evaluates multi-turn dialogue between two agents, testing skills such as constructive disagreement, persuasion, and converging on a mutually agreed solution. Initial tests showed that today's models struggle in these areas, so Meta introduced a self-improvement technique that trains on synthetic conversations, generated at scale by its new Matrix inference engine, yielding performance gains of up to 29%.
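A minimal sketch of that two-agent, multi-turn loop appears below; agent_reply is a placeholder where each turn would in practice call an LLM, and all names are assumptions rather than the framework's actual API.

```python
def agent_reply(name: str, history: list[str]) -> str:
    # Placeholder policy; in practice each turn would be an LLM call,
    # served at scale by an engine such as Matrix for synthetic data.
    last = history[-1]
    return f"{name}: responding to '{last[:40]}...' with my reasoning."

history = ["Task: agree on which design, A or B, is more energy-efficient."]
agents = ["Solver", "Critic"]

for turn in range(4):            # alternate turns between the two agents
    speaker = agents[turn % 2]
    history.append(agent_reply(speaker, history))

print("\n".join(history))
# An evaluator would then check whether the pair converged on a correct,
# mutually agreed answer (the skill Collaborative Reasoner measures).
```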
A New Era of AI Building Blocks
Together, these five innovations showcase Meta’s ambitious push to create AI that doesn’t just calculate — it sees, speaks, understands, and collaborates. By releasing these tools to the open-source community, Meta invites researchers around the world to help shape the next generation of intelligent machines.
Sources: https://www.artificialintelligence-news.com/news/meta-fair-advances-human-like-ai-five-major-releases/, https://www.prnewswire.com/news-releases/meta-reports-second-quarter-2023-results-301886658.html