At OpenAI Developer Day in London, OpenAI outlined five core capabilities of the o1 model, including image understanding, and a spokesperson hinted that the image model would soon see significant enhancements. Live demonstrations showed applications for flying a drone, ordering food by phone, and explaining the solar system, to the delight of the developers in attendance. Recently, the full version of o1's image understanding appears to have been released early by accident. Many users have shared their experiences with the new feature, and a wave of varied tests has spread across the internet. Several signs suggest that the formal release of o1 may be imminent.
AI models are continually being upgraded and optimized. XXAI, for example, has improved its service by integrating popular AI platforms into a single product without raising prices. New AI models keep emerging, such as the previously impressive red_panda. We are following the models on the market closely and enthusiastically welcome each new arrival!
During the OpenAI Developer Day in London, Developer Experience Lead Romain Huet showcased the o1 model.
Some highlights from the live demonstrations included:
Using o1-mini together with Cursor, a fully interactive application that flies a drone through backflips was built in under two minutes.
A real-time AI voice agent was built with the Realtime API to place food orders with vendors.
OpenAI's Product Lead Olivier Godement described the new features of the o1 model presented at Developer Day: function calling, developer messages, streaming, structured outputs, and image understanding. He also indicated that significant advances in the image model are on the horizon, and the community is eagerly awaiting them.
The full version of o1's image understanding appears to have been released prematurely. Users reported that the o1 model can recognize images and draw reasoned conclusions from them.
Notably, the feature was never officially announced; its release may have resulted from an OpenAI backend error that had not yet been fixed. Nevertheless, users seized the opportunity to test o1's image understanding extensively.
Below are some test results:
The o1 model successfully interpreted what the images represented.
Given a map of submarine fiber-optic communication cables, the o1 model correctly inferred that the cables span oceans, connecting continents and regions around the world.
Although o1 processes images quickly, it has clearly not yet achieved full multimodal understanding.
The o1 model is currently unable to "read" videos.
It also struggles with certain visual problems.
Q: Should we expect more models like o1, or larger-scale models?
A: We hope to improve large language models across the board, but this reasoning process is crucial. While I can't disclose many specifics, I expect breakthrough progress in vision models.
Q: To what extent will there be technological integration? How should AI startups based on OpenAI's products plan their development?
A: I advise founders to build companies that benefit from the strengths of today's large language models and stand to gain even more as future models improve.
Q: What is an AI agent?
A: An AI agent is a system that can carry out long-running tasks with minimal supervision. Harrison Chase's definition on the LangChain blog is more rigorous, but from a commercial perspective this one is practical.
Q: What can AI agents do?
A: They can do things humans cannot, such as talking with 300 restaurants at once to gather information instantly. Alternatively, they can act as highly intelligent colleagues to whom you can confidently delegate a day's or a week's worth of work.
Honestly, I dislike the term "agentic." Let's brainstorm together and come up with a new term!