DreamTech Launches Sora-like 3D Generation Model, Secures Tens of Millions of Yuan in Financing

EqualOcean reports that the AI startup "DreamTech" has completed two rounds of angel financing, raising tens of millions of yuan in total. The initial angel round was led by Oriza Holdings, with TUSStar and Yun Angel Fund participating. The subsequent angel+ round was funded exclusively by Initial Capital.

Just as advances in large language models drove the rise of text-generation AI such as ChatGPT, similar progress in large models is now advancing generative AI for images, video, and 3D content. "DreamTech" is an AI startup focused on native 3D generation that officially began operations in December 2023. Its CEO, Dr. Zhang Feihu, holds a Ph.D. from Oxford University, and the founding team includes members of the UK's Royal Academy of Engineering, national-level young talents, and founding members of Tencent Meeting. Team members have previously worked at leading companies such as Apple, Tencent, and Baidu, and have founded several benchmark companies in the 3D field that were later acquired by industry giants such as Apple, Google, and Bosch.

Beyond text, generative AI first made its mark on 2D image generation. Since 2022, companies specializing in AI-generated images, such as Midjourney and Stability AI, have emerged rapidly, and the text-to-image field has boomed. AI video generation applications such as OpenAI's Sora, Luma Dream Machine, and Kuaishou's Keling have also become hot topics. Several foundation models now exist in both the text-to-image and text-to-video fields, and their technical approaches are converging. By comparison, 3D content generation is at an earlier stage of development, and its technical approaches are still being explored.

There are two main technical routes for generating a 3D model with AI: 2D-to-3D upscaling and native 3D. Until now, most companies have adopted the 2D-to-3D upscaling route. This method first converts text or a single 2D image into multi-view images, then reconstructs a 3D model from those views. Its advantage is that it can be fine-tuned from existing image generation models (such as Stable Diffusion), which makes training easier. However, the pipeline is complex, and errors accumulated across its stages can cause problems such as distortions and multiple heads in the generated 3D models.

Additionally, because 2D images inherently lack 3D information and the model architecture used for 2D upscaling is optimized primarily for 2D data, this route cannot scale up in the same way as large language models. Its 3D generation quality has hit a bottleneck: even increasing model parameters and training data does not significantly improve the results.

In contrast, native 3D trains purely on 3D data. Because both the training data and the optimization targets are original 3D models, the generated models are of higher quality, more closely match the original 3D models, and are better suited to handling complex scenes.