
If there is one product in China’s technology landscape that has transcended industry cycles, it is DJI’s Osmo Pocket 3. Since its release at the end of 2023, this tiny gimbal camera has ignited a global buying frenzy reminiscent of the AirPods Pro launch years earlier. For months, it remained sold out across official channels, with resale prices climbing more than 30 percent. Online, users joked that “buying one at retail price is already a victory.”
But the Pocket 3’s popularity goes far beyond sales figures. It has captured the imagination of professionals and ordinary users alike—from journalists and top-tier video creators to travelers, lifestyle enthusiasts, and parents documenting family trips. In many ways, it has become a symbol of “creative equality” in the age of universal video creation: a tool that allows anyone, not just professionals, to capture cinematic images effortlessly.
The phenomenon surrounding the Pocket 3 reflects a deeper shift in the consumer electronics market. DJI’s success in ground-based imaging has created a new growth segment that smartphone giants can no longer ignore. According to industry sources, leading Chinese smartphone makers including OPPO, vivo, Xiaomi, and Honor have all initiated internal projects to develop Pocket-style imaging devices, with commercial releases expected as early as mid-2026. A new race to challenge the Pocket 3 has quietly begun.
At the same time, questions persist. In an age when smartphone cameras boast advanced stabilization and computational photography, why does a separate handheld camera even need to exist? To answer that, one must look beyond the Pocket 3 itself and trace the seven-year evolution of the series, from its experimental beginnings to its current status as a cultural icon.
When DJI launched its first Osmo in 2015, the company’s reputation rested on drones, not handheld devices. Yet the idea of ground-based stabilization came from users rather than engineers. Enthusiasts had begun removing the Zenmuse gimbal cameras from DJI’s Inspire drones and attaching them to improvised 3D-printed grips to capture stable shots on the ground. The results were crude but revealing. They demonstrated that the same technology that stabilized aerial footage could revolutionize handheld shooting.
DJI’s product team quickly recognized the opportunity. The first-generation Osmo combined the company’s three-axis gimbal technology with a camera into a single handheld device. It offered a groundbreaking promise: the ability to shoot drone-quality stabilized footage on land. The product impressed professionals and filmmakers, but its high price and bulky design limited its audience. It was a technical milestone but not yet a mass-market success.
The team realized that what users truly needed was not just stable footage, but stability that fits in a pocket. That insight led to the birth of the Pocket series.
The first Osmo Pocket, released in 2018, arrived just before the global explosion of vlogging. The vlog was still a new concept in China; few users even knew the term. But DJI sensed a coming shift. Drawing on miniaturization breakthroughs from drone projects such as the Spark, the company created a gimbal camera small enough to fit in one hand and intuitive enough for anyone to use.
The original Pocket condensed a three-axis stabilizer, a camera, and a tiny screen into a lipstick-sized body. It resolved the trade-off among video quality, stability, and portability that the smartphones of the time could not overcome. At launch, it became an instant favorite among early vloggers and tech enthusiasts. Though imperfect (its fixed-focus lens made selfies blurry, and its audio quality was poor), it validated a new product category. It wasn’t just a gadget; it was the foundation on which a new content ecosystem would grow.
Two years later, in 2020, DJI released the Pocket 2. By then, short video and vlogging had boomed in the wake of the pandemic. Platforms like Bilibili saw record surges in creators, and high-quality video tools became essential. Staying true to its philosophy of patient iteration, DJI refined the Pocket 2 into a true all-round creative tool. It added a wider 20mm lens for better framing, a four-microphone array for clearer audio, and an optional Creator Combo that included a wireless mic and a mini tripod. For the first time, users could film, record, and share professional-quality vlogs with a device that fit in the palm of their hand. The Pocket 2 elevated the series from “usable” to “delightful,” winning widespread adoption among professional and amateur creators alike.
Then came a three-year silence, a period DJI’s engineers later described as “grinding through three great mountains.” The first was image quality. Earlier models, with their 1/2.3-inch and 1/1.7-inch sensors, were easily surpassed by smartphones within a year or two. Users demanded a leap, not an incremental upgrade. The team concluded that the only way forward was a 1-inch CMOS sensor, an industry benchmark for premium imaging. Yet power consumption, heat, and space constraints made integration nearly impossible. DJI waited patiently for the right component: a new-generation, low-power 1-inch stacked CMOS sensor developed through smartphone supply chains. Once available, it became the heart of the Pocket 3, giving the camera a decisive image-quality edge that phones could not match.
The second mountain was usability. Earlier Pocket models had been praised for their engineering but criticized for their tiny screens. The Pocket 3 team refused to accept that limitation. Instead of simply enlarging the body, they designed a rotating two-inch touchscreen that switches instantly between vertical and horizontal orientations, expanding the viewing area fourfold. The mechanism turned a functional adjustment into a tactile pleasure, as satisfying as flicking open a Zippo lighter.
The third mountain was aesthetics, specifically color science tuned for people rather than landscapes. For the first time, DJI made skin-tone rendering the top priority in color calibration. Engineers studied the tonal preferences of creators, particularly women, many of whom favored the rosy, fair skin tones associated with Canon cameras. Through blind tests, the team concluded that perception, not metrics, defines good skin tone. It then developed a proprietary color algorithm that gives subjects a natural yet cinematic look. The impact was immediate. On TikTok, RedNote, and Meitu, users began adopting filters that emulate the Pocket 3’s color profile, a sign that the device had reshaped the aesthetics of a generation.
With these breakthroughs, the Pocket 3 became a phenomenon. Within months of its launch in late 2023, it was nearly impossible to find in stock anywhere in the world. The media dubbed it the “Moutai of electronics”—a nod to China’s most coveted liquor—capturing its symbolic status as a premium yet populist product.
Its success stemmed from its balance of three core qualities: stability, image quality, and ease of use. It was sophisticated enough for national broadcasters and professional filmmakers, yet accessible enough for travelers and parents. For millions, it redefined what “recording life” could look like. The Pocket 3 became not just a product but an expression of cultural aspiration, a convergence of technology, creativity, and self-expression.
Competitors have since entered the race. GoPro, once dominant in the action-camera market, long dismissed the Pocket as a niche gadget for vloggers. Its focus on extreme sports blinded it to the growing demand for lightweight lifestyle recording. When the Pocket 3 exploded in popularity, GoPro realized it had lost users who wanted professional stability without bulk. Similarly, Sony’s ZV-1, once the go-to vlog camera, found itself overshadowed by DJI’s mechanical gimbal advantage.
Now, as smartphone brands prepare to release their own Pocket-like devices, DJI faces a new wave of competition. But as the series’ engineers point out, imitation does not equal parity. Beneath the Pocket’s sleek exterior lies a moat built over years of technical and experiential accumulation.
The first barrier is gimbal algorithm know-how. DJI’s expertise stems from its Ronin professional stabilizer line and drone technology. The challenge lies not in stabilization alone, but in predicting user intent—distinguishing a deliberate camera movement from an accidental shake within milliseconds. Achieving this requires millions of hours of motion data and continuous model optimization. It is a skill that cannot be reverse-engineered quickly.
The second barrier is a video-centric experience. Smartphone makers devote most of their imaging resources to still photography, optimizing for snapshots rather than long-form video. DJI’s DNA, by contrast, is video first. Its deep expertise in dynamic range, color science, and log profiles such as D-Log M gives the Pocket series a consistent cinematic look that smartphones struggle to reproduce.
The third barrier is physics. The Pocket’s three-axis gimbal mechanically isolates the camera from the movement of the handle. A smartphone, however advanced its sensor or software, moves with the hand that holds it. Electronic stabilization can smooth frame-to-frame shake by cropping and shifting the image, but it cannot fully eliminate the micro-jitters, motion blur, and rolling-shutter distortion that occur during walking or running. The Pocket’s stability is thus not just a feature but a physical advantage, rooted in mechanical engineering rather than software alone.
Finally, DJI’s ecosystem forms an invisible moat. Years of hardware development have produced a suite of accessories and integration tools, from wireless microphones to creator kits. The Pocket 3’s seamless pairing with the DJI Mic system has set a new bar for audio in content creation.
By 2025, the Pocket 3 stands not merely as a bestseller but as a manifestation of DJI’s long-term philosophy: to create tools that empower creativity rather than chase trends. Its evolution from Osmo to Pocket mirrors the company’s broader shift from defining technologies to defining cultural habits.
Source: DJI, Geekpark, Helico Micro, Flying Eye



