AI | Author: EqualOcean News

HiDream.ai completes another financing round worth hundreds of millions and releases HiDream-O1-Image-Pro, a native full-modal large model with over 200 billion parameters.

HiDream.ai (智象未来), Tao Mei (梅涛)

On May 19, HiDream.ai (智象未来) held its first Open Day with the theme "Imaging the World".

At the Open Day, HiDream.ai (智象未来) officially released HiDream-O1-Image-Pro, a large image model built on Unified Transformer (UiT), its new-generation native full-modal model architecture. With over 200 billion parameters, the model has set new state-of-the-art (SOTA) results on multiple benchmarks. It also marks HiDream.ai's (智象未来) move toward the "native full-modal" stage, in which modalities such as image, video, text, and audio are modeled in a unified way.

At the same time, HiDream.ai (智象未来) announced the completion of a new round of financing worth hundreds of millions, with participation from multiple institutions including Shenzhen Capital Group Co., Ltd. / SCGC (深创投), GP Capital / Jinpu Investment (金浦投资), Caixin Capital (财鑫资本), and Fuju Investment / Fuju Capital (复聚资本).

This marks the second time HiDream.ai (智象未来) has completed financing within half a month, reflecting the capital market's continued optimism regarding the direction of native full-modal large models. With the accelerated integration of frontier technologies such as visual generation and embodied intelligence, world models have become an important direction for AI evolution. HiDream.ai's (智象未来) continuous breakthroughs in underlying model architecture, productization capabilities, and industrial ecosystem layout have also gained further recognition from the market.

HiDream-O1-Image-Pro, a large image model with 200B+ parameters, is released with a comprehensively upgraded native full-modal architecture

Currently, image generation models are moving from the traditional U-Net architecture to the Diffusion Transformer (DiT) era. The mainstream approach represented by Latent Diffusion Models (LDM) has made significant progress in efficiency and generation capabilities by using VAEs to compress images and independent language models to encode text. However, the method of encoding images and text separately also creates natural bottlenecks for the model in areas such as complex semantic understanding, high-fidelity detail restoration, precise text rendering, and multi-task generalization.

To address this challenge, HiDream.ai (智象未来) officially released HiDream-O1-Image-Pro, a closed-source large image model with 200B+ parameters based on a native full-modal architecture. Unlike conventional encoding paradigms that stitch together separate modules, HiDream-O1-Image-Pro maps raw image pixels, discrete text tokens, and task conditions into a single continuous shared token space, fusing image, text, and multi-task conditions at the level of the underlying representation. This architectural change further unlocks the model's generation and generalization capabilities, enabling it to reach new SOTA levels in tasks such as general text-to-image generation, high-fidelity text rendering, diverse scene generation, and image editing, and it demonstrates HiDream.ai's (智象未来) leading exploration of native full-modal large model architectures.
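The idea of placing pixels, text tokens, and task conditions in one shared token space can be illustrated with a small toy sketch. It is not HiDream.ai's implementation, which is closed-source. The function names, the tiny dimensions, and the random weights standing in for learned parameters are all assumptions made for illustration. The sketch only shows the general pattern: image patches are projected directly from raw pixels, text and task IDs are looked up in embedding tables, and everything ends up in one sequence that a single Transformer could attend over.

```python
import random


def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def linear(vec, weight):
    """Project a vector with a (len(vec) x dim) weight matrix."""
    dim = len(weight[0])
    return [sum(v * row[j] for v, row in zip(vec, weight)) for j in range(dim)]


def build_sequence(image, text_ids, task_id, patch, dim,
                   vocab_size, num_tasks, seed=0):
    """Return one token sequence: [task token] + text tokens + image tokens.

    All weights are random placeholders for parameters a real model would learn.
    """
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    pixel_proj = rand_matrix(patch * patch, dim)  # raw pixels -> shared space
    text_table = rand_matrix(vocab_size, dim)     # text token id -> shared space
    task_table = rand_matrix(num_tasks, dim)      # task condition -> shared space

    tokens = [task_table[task_id]]
    tokens += [text_table[i] for i in text_ids]
    tokens += [linear(p, pixel_proj) for p in patchify(image, patch)]
    return tokens


# Hypothetical toy input: a 4x4 image, three text token ids, one task id.
image = [[(r * 4 + c) / 15 for c in range(4)] for r in range(4)]
sequence = build_sequence(image, text_ids=[3, 1, 4], task_id=1,
                          patch=2, dim=8, vocab_size=10, num_tasks=2)
print(len(sequence), len(sequence[0]))  # 8 8
```

In this toy setup the sequence has eight tokens (one task token, three text tokens, four image patches), all with the same width. In the latent-diffusion pipeline described earlier, by contrast, the image passes through a VAE and the text through a separate language model before the two meet.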

Tao Mei (梅涛), founder and CEO of HiDream.ai (智象未来), said the company chose the native full-modal path based on a long-term judgment the team formed while working to connect visual generation with the physical world: "Currently, many 'multimodal large models' are essentially 'unimodal splicing.' Native multimodality, on the other hand, involves engraving the 'rules of the world' into the model from the very beginning. It knows physical laws, spatial relationships, and causal logic, so it can truly understand, reason about, and reconstruct the world, rather than just 'generating content.' Therefore, we judge that native full-modality is the necessary path to achieving AGI."


Ting Yao (姚霆), co-founder and CTO of HiDream.ai (智象未来), said that the 8B-parameter open-source version of HiDream-O1-Image, which uses the native full-modal architecture, recently ranked first among open-source models on the text-to-image leaderboard of the independent evaluation platform Artificial Analysis. It outperformed mainstream open-source models such as Z-Image Turbo, Qwen-Image, and FLUX.2 [dev], and it has the smallest publicly disclosed parameter count of any model version in the leaderboard's top 20. The newly released HiDream-O1-Image-Pro is a closed-source version with over 200 billion parameters. It has set a new SOTA across tasks such as complex text rendering, instruction-based editing, and multi-subject personalization, validating the scalability of the native full-modal architecture paradigm.

Ting Yao (姚霆) stated: "Under the native full-modality (UiT) architecture, all modalities have grown up together from the initial stage like childhood sweethearts. The benefit of this is that after all modalities are interconnected, it can truly achieve 'Any to Any', where any input supports any output. This is also the capability required by world models—to understand, generate, and predict different states of the real world within a unified architecture."

From Visual Generation to World Models: Industry Discusses Key Paths to AGI

The focus of competition among large models is shifting from language understanding and content generation to the understanding, generation, and prediction of the real physical world. Various technical routes to world models have emerged within the industry, but they share one goal: to enable AI not only to generate content but also to build internal representations of world states and how those states change.

During the Open Day roundtable forum, Bing Wang (王兵), Partner at Oriental Fortune Capital (东方富海); Jianlong Fu (傅剑龙), Principal Researcher at Microsoft Research Asia; Jiangbin Ning (宁江彬), Senior Solution Director at Alibaba Cloud (阿里云); Yingwei Pan (潘颖伟), Technology Partner at HiDream.ai; and Hong Hu, initiator of AI Nao, held a discussion on the theme "From Multimodal to Full-Modal: Building World Models and Moving Towards AGI." The guests shared their views on the development path of world models from the perspectives of AI investment, embodied intelligence, AI infrastructure, and native full-modal technology in practice.

The panelists believe that AI is moving from "generating content" to "understanding the world." The convergence of visual generation, agents, embodied intelligence, and multimodal models points to the same key capability: whether a model can understand environmental states across different modalities, predict how those states change, and form a unified cross-modal representation.

Visual generation is therefore more than a content production tool. It inherently requires learning spatial structures, object relationships, motion trajectories, and state changes, which gives it a foundation for extending toward world models. The value of a native full-modal architecture lies precisely in providing a unified modeling framework for images, video, text, audio, and even action and embodied data, enabling a model to move from isolated single-modality capabilities to more complete world modeling.

Multiple financing rounds completed within half a month as three major agent products continue to expand the commercial ecosystem

Not long ago, HiDream.ai (智象未来) announced the completion of a financing round of over 500 million yuan, with shareholders including top investment institutions such as the provincial industrial investment company under Anhui Investment Group (安徽省投资集团旗下省产业投资公司), Hefei Industrial Investment Holdings (Group) Co., Ltd. (合肥产投), and Oriental Fortune Capital (东方富海). At the Open Day, HiDream.ai (智象未来) revealed that its financing continues to accelerate: within half a month it completed another round, with participation from Shenzhen Capital Group Co., Ltd. / SCGC (深创投), GP Capital / Jinpu Investment (金浦投资), Caixin Capital (财鑫资本), and Fuju Investment / Fuju Capital (复聚资本).

Public information shows that GP Capital / Jinpu Investment (金浦投资) is the manager of the Shanghai Financial Development Investment Fund. Thirteen of the portfolio companies in its first fund have gone public through IPOs or M&A, and the firm has invested extensively across multiple frontier AI fields such as computing infrastructure, large models, and agent applications. Caixin Capital is the core industrial investment platform of Caixin Group, a state-owned enterprise in Changde. It is dedicated to serving the real economy and promoting technological innovation through capital, and it focuses on hard-tech fields with clear prospects for industrial deployment, such as artificial intelligence and embodied intelligence. Fuju Investment / Fuju Capital specializes in identifying the value of leading enterprises in frontier niche sectors, with broad holdings in strategic emerging industries such as intelligent manufacturing, new energy, new materials, biomedicine, and artificial intelligence. With the entry of new investors such as Shenzhen Capital Group Co., Ltd. / SCGC, GP Capital / Jinpu Investment, Caixin Capital, and Fuju Investment / Fuju Capital, HiDream.ai now has a diversified investor base: continued follow-on investment from industrial funds in Anhui, Shanghai, Hunan, Hangzhou, and other regions, together with participation from top-tier market-oriented VCs such as Shenzhen Capital Group Co., Ltd. / SCGC, Oriental Fortune Capital, Fenghua Capital, and Dunhong Capital.

While accelerating its financing, HiDream.ai (智象未来) has built a "Model + Agent" dual-drive strategy that uses models as the foundation and agent applications as the wheels to drive technology deployment and monetization. It has also formed a clear "1+1+3" business architecture: the bottom layer is one large model family (the HiDream series), the middle layer is one capability middle platform (the HiHarness enterprise service platform), and the upper layer consists of agent applications covering three core scenarios: commercial marketing, film and television creation, and social media creation.

At the Open Day, three product heads from HiDream.ai (智象未来) each introduced the progress of their agent application products, demonstrating the company's "combat readiness" in commercial deployment.

The commercial marketing agent HiBurst covers scenarios such as cross-border e-commerce content marketing, media operations, and app globalization. It supports mainstream platforms including TikTok, Meta, Douyin, and Xiaohongshu, and has become one of TikTok's official top five service providers. It produces over one million e-commerce marketing videos annually, covering GMV of more than 100 million yuan.

"ZanZan" (帧赞), described as the world's first professional-grade AI agent for film creation and collaboration, gives professional film and television teams collaborative tools that balance high quality with high efficiency. Its core capabilities are movie-grade image generation and full-process integration of "idea-storyboard-final cut." The platform has produced over 5,000 minutes of short drama series in total, with over 1,000 professional teams and ecosystem partners onboarded.

The social media creation agent vivago recently completed a product upgrade. Leveraging its end-to-end long-thinking capability to stably output minute-level story videos, it quickly climbed to the top of the Product Hunt daily chart. vivago currently reaches over 40 million professional and individual users in more than 100 countries and regions.

At the event, HiDream.ai (智象未来) announced strategic partnerships with Shanghai Film New Vision Fund (上影新视野基金), BlueFocus (蓝色光标), Jetsen Century (捷成世纪), and Bei'er Health / Better Health (倍尔健康). The parties will cooperate closely on large model capabilities, agent application development, and co-creation of industry scenarios, jointly promoting the industrial deployment of native full-modal large models in sectors including film and television creation, commercial marketing, cross-border e-commerce, IP operation, and healthcare.

From visual generation to building the world

From the release of HiDream-O1-Image-Pro, to the rollout of its three major agent products, to ecosystem cooperation with industry partners, HiDream.ai (智象未来) is following a clear path: building on a native full-modal architecture, continuously improving its visual generation capabilities, and evolving toward the unified understanding, generation, and prediction capabilities that world models require.

This is what HiDream.ai (智象未来) means by "Imaging the World": not stopping at "generating visual content," but using native full-modal modeling to let AI gradually acquire the ability to understand, generate, and construct the world. Going forward, HiDream.ai (智象未来) will continue to build around the UiT native full-modal architecture, promoting the joint evolution of models, agents, and industrial scenarios as it moves toward a more complete world model.