
Released on April 3, 2026, Wan AI 2.7 by Alibaba quickly gained widespread attention for its cutting-edge features and professional-grade video generation capabilities.
I compared Wan AI versions 2.1 through 2.7 in depth, focusing on how the features, performance, and pricing have evolved. By the end of this post, you’ll have a clear picture of how Wan AI has changed and which version fits your needs, whether you're a hobbyist, a professional, or an enterprise.
If you want to know more about Wan 2.7, you can check out Wan 2.7 Review: Features, Price, Image & Video.
Additionally, Wan AI 2.7 is now available for use on GlobalGPT, allowing users to access these powerful tools seamlessly.

In the Wan AI series of products:
Wan 2.7 focuses on three major functional leaps:
Superior Multimodal Synchronization: Wan 2.7 aligns sound effects and musical cues with on-screen motion far more naturally than 2.6.
Enhanced Image-to-Video Fidelity: Wan 2.7 is significantly better at maintaining the details of a source image and translating them into fluid, realistic motion without distorting the original subject.
Professional Editing & Teamwork: 2.7 allows teams to work on the same project simultaneously and gain deeper control over AI character actions and scene transitions.
The Value King: For the best price-to-performance ratio, Wan 2.7 is the clear winner; at $6.00/min for video it is 33% cheaper than Wan 2.6 and Wan 2.5 Preview ($9.00/min) while delivering better Image-to-Video and audio-integrated results.
Wan 2.5 Preview, however, remains surprisingly relevant: it holds the highest Text-to-Image score (1149 Elo) at the lowest image price ($21.00/1k images).
Wan AI started as a simple AI video generator and has since transformed into an advanced platform with powerful customization options, faster generation times, and professional editing tools.
Wan AI is a family of AI video generation models launched by Alibaba Group under its Tongyi Lab and Wan AI research efforts. The series builds on a unified diffusion transformer architecture designed to handle multiple generation tasks—such as text‑to‑video, image‑to‑video, and instruction‑guided editing—across successive versions.
First publicly released as Wan 2.1 in February 2025, the model suite was made open source, providing complete model weights and inference code under the Apache 2.0 license, and quickly gained traction among developers and researchers on platforms such as GitHub and Hugging Face.

Wan AI models are accessible through the official website (https://create.wan.video), open‑source repositories (e.g., GitHub, Hugging Face), and community tools, enabling creators to experiment with state‑of‑the‑art video generation on both research and production workloads.
I will analyze the evolution of Wan AI from three key perspectives: a quick overview of each model's official role, a comparison of technical settings like resolution and duration, and a detailed breakdown of the specific features added in each version.
With the rapid advancements in AI-driven video generation, the Wan series has continuously pushed the boundaries of what’s possible in motion graphics and audio synchronization. Below is an overview of the key models within the Wan series, showcasing their unique capabilities and the features that set them apart in the world of creative production.

As the capabilities of each model in the Wan series have evolved, so have their resolution, aspect ratio, and duration options. The table below lists these settings for each model so you can match them to your creative needs.

Model | Resolution Options | Aspect Ratios | Duration Options |
Wan 2.7 | 1080P (VIP), 720P | 16:9, 9:16, 4:3, 3:4, 1:1 | 2s to 15s |
Wan 2.6 | 1080P (VIP), 720P | 16:9, 9:16, 4:3, 3:4, 1:1 | 3s to 15s |
Wan 2.5 | 1080P (VIP), 720P, 480P | 16:9, 9:16, 4:3, 3:4, 1:1 | 5s, 10s (VIP) |
Wan 2.2 | 1080P (VIP), 480P | 16:9, 9:16, 1:1 | 5s |
Wan 2.1 | No option | 16:9, 9:16, 4:3, 3:4, 1:1 | 5s |
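To make the settings table easier to query, here is a minimal Python sketch that encodes the aspect ratio and duration columns and returns the versions that support a given combination. The dictionary layout and the `models_supporting` helper are my own illustration, not part of any official Wan API, and I am assuming that ranges such as "2s to 15s" mean any whole number of seconds in that range.

```python
# Aspect ratios and durations copied from the settings table above.
# Assumption: duration ranges are inclusive and counted in whole seconds.
ALL_FIVE = frozenset({"16:9", "9:16", "4:3", "3:4", "1:1"})

SETTINGS = {
    "Wan 2.7": {"aspects": ALL_FIVE, "durations": range(2, 16)},
    "Wan 2.6": {"aspects": ALL_FIVE, "durations": range(3, 16)},
    "Wan 2.5": {"aspects": ALL_FIVE, "durations": (5, 10)},  # 10s is VIP-only
    "Wan 2.2": {"aspects": frozenset({"16:9", "9:16", "1:1"}), "durations": (5,)},
    "Wan 2.1": {"aspects": ALL_FIVE, "durations": (5,)},
}


def models_supporting(aspect: str, seconds: int) -> list[str]:
    """Return the versions (newest first) that list this aspect ratio and duration."""
    return [
        name
        for name, opts in SETTINGS.items()
        if aspect in opts["aspects"] and seconds in opts["durations"]
    ]


if __name__ == "__main__":
    print(models_supporting("4:3", 5))
    print(models_supporting("16:9", 12))
```

The lookup ignores resolution and VIP status for brevity; both could be added as extra keys if your plan limits them.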
Beyond the basic settings, the true power of the Wan series lies in the functional leaps made between releases. Here is a detailed look at how each version, from the initial foundation to the current flagship, has introduced new capabilities to redefine the AI video landscape:
Wan AI 2.1 marked the platform’s entry into AI video generation, focusing on creating basic video outputs from text prompts using simple, pre-defined templates. While revolutionary at its release, it offered limited customization for the video’s look, tone, or structure.
Wan 2.2 transformed the series by introducing multi-shot generation, allowing users to create multiple scenes within a single video for better storytelling. It also significantly boosted rendering speeds and provided basic trimming tools to adjust scenes.
What was added: Scene sequencing, multiple camera shots, and faster processing.
Wan 2.5 focused on high-quality outputs by introducing 1080p resolution and extending clip durations to 10 seconds (VIP), as listed in the settings table. This version gave creators much more control over character movements and environmental details, making the results look far more professional.
What was added: 1080p HD rendering, 10-second clips, and advanced character animations.
The standout feature of Wan 2.6 was the introduction of seamless audio-video synchronization, making it a powerful tool for music videos and narrative content. It added specific tools for building structured story arcs, including setups and conflicts.
What was added: Precise audio-sync, narrative structure tools, and immersive sound effects.
The latest flagship version, Wan 2.7, provides total creative freedom with professional-level editing controls over every element, from sound design to custom transitions. It also introduced real-time collaboration, allowing teams to work together on the same project simultaneously.
What was added: Real-time team collaboration, full sound design control, and custom AI behavior settings.
Features / Capabilities | Wan AI 2.1 | Wan AI 2.2 | Wan AI 2.5 | Wan AI 2.6 | Wan AI 2.7 |
Base Video Generation | Yes (T2V) | Yes | Yes | Yes | Yes |
Multi-shot / Multi-scene | No (Single shot) | Yes | Yes | Yes | Yes |
Rendering Speed | Standard | High Speed | Optimized | Optimized | Real-time level |
Editing Tools | None | Basic Trim | Standard | Refined Editing | Pro Controls |
Creative Customization | Very Limited | Sequence Adj. | Character/Env Ctrl | Advanced | Full AI Control |
Audio-Video Sync | No | No | No | Yes | Yes |
Narrative Storytelling | No | No | No | Story Arc Tools | Advanced |
Real-time Collaboration | No | No | No | No | Yes |
To provide an objective look at the series' growth, I will examine the professional benchmark data from Artificial Analysis, a leading authority in AI evaluation. The following sections will break down the evolution of Wan AI’s capabilities in both video generation and static image production.
The data across the various categories (Text to Video, Image to Video, and Audio-integrated tasks) reveals a clear upward trajectory in performance for the Wan AI series, with Wan 2.7 emerging as the new flagship leader in most categories.
Consistent Generational Improvement: There is a significant performance leap from the Wan 2.1 and 2.2 series to the latest 2.6 and 2.7 versions. In the "Image to Video" category, Wan 2.7 (1234 Elo) shows a substantial gain over the older Wan 2.2 A14B (1112 Elo).
Wan 2.7 vs. Wan 2.6: While the two models are neck-and-neck in "Text to Video" (Wan 2.6 slightly leads by 3 points), Wan 2.7 shows its true strength in complex multimodal tasks. Specifically, in "Image to Video (with Audio)", Wan 2.7 (989 Elo) outperforms Wan 2.6 (890 Elo) by nearly 100 points, indicating a massive breakthrough in synchronizing visual and auditory elements.
Specialization in Image-to-Video: In the two newest versions, Image-to-Video scores exceed Text-to-Video scores (1234 vs. 1189 for Wan 2.7, 1208 vs. 1192 for Wan 2.6), suggesting high fidelity in maintaining visual consistency from source images.
Scaling Effects: Within Wan 2.2, the 14B-parameter A14B model outscores the 5B model in both Text to Video (1112 vs. 955) and Image to Video (1112 vs. 985), which is consistent with larger parameter counts producing higher Elo scores and better video quality.
Model Version | Text to Video | Image to Video | Text to Video (with Audio) | Image to Video (with Audio) |
Wan 2.7 | 1189 | 1234 | 1050 | 989 |
Wan 2.6 | 1192 | 1208 | 1038 | 890 |
Wan 2.5 Preview | 1159 | - | - | - |
Wan 2.2 A14B | 1112 | 1112 | - | - |
Wan 2.1 14B | 1024 | 1000 | - | - |
Wan 2.2 5B | 955 | 985 | - | - |
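To put these Elo gaps in perspective, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the standard Elo logistic formula with a 400-point scale; I have not verified that Artificial Analysis computes its arena scores exactly this way, so treat the percentages as a rough guide rather than official figures.

```python
def expected_win_rate(elo_a: float, elo_b: float) -> float:
    """Expected share of matchups where A is preferred over B (standard Elo model)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


# (Wan 2.7 Elo, Wan 2.6 Elo) taken from the benchmark table above.
MATCHUPS = {
    "Image to Video (with Audio)": (989, 890),
    "Image to Video": (1234, 1208),
    "Text to Video": (1189, 1192),
}

if __name__ == "__main__":
    for task, (wan_27, wan_26) in MATCHUPS.items():
        rate = expected_win_rate(wan_27, wan_26)
        print(f"{task}: Wan 2.7 preferred in about {rate:.0%} of matchups")
```

Under this assumption, the 99-point lead in Image to Video (with Audio) corresponds to Wan 2.7 being preferred in roughly 64% of comparisons, while the 3-point Text-to-Video gap is effectively a coin flip.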
The benchmark data for the image-focused models reveals a strategic shift in capabilities between the 2.5, 2.6, and 2.7 iterations, specifically highlighting a trade-off between pure generation and functional editing.
Regression in Pure Text-to-Image Generation: Interestingly, the Elo scores for "Text to Image" show a slight downward trend in newer versions. Wan 2.5 Preview holds the highest score at 1149, while the latest Wan 2.7 scores lower at 1105. This suggests that newer versions might be prioritizing other attributes—such as prompt adherence or structural stability—over the specific aesthetic preferences currently favored in the Arena for pure generation.
Significant Breakthrough in Image Editing: While generation scores dipped, the Image Editing capability has seen a massive improvement. Wan 2.7 Pro leads this category with an Elo of 1201, a substantial jump from Wan 2.5 Preview’s 1129. This indicates that the 2.7 series is much more capable of handling complex modifications and maintaining consistency during the editing process.
The "Pro" Advantage: In both categories, the Wan 2.7 Pro variant consistently outperforms the standard Wan 2.7. The gap is particularly noticeable in Image Editing (+20 points), signaling that the Pro model is the preferred choice for professional workflows requiring precision.
Wan 2.6 as a Balanced Middle Ground: Wan 2.6 (specifically the "Text to Image" and "Image" variants) maintains high scores across both generation (1144) and editing (1188), serving as a robust bridge between the 2.5 and 2.7 series.
Model Version | Text to Image (Elo) | Image Editing (Elo) |
Wan 2.7 Pro | 1119 | 1201 |
Wan 2.7 | 1105 | 1181 |
Wan 2.6 Text to Image | 1144 | - |
Wan 2.6 Image | 1135 | 1188 |
Wan 2.5 Preview | 1149 | 1129 |
While performance is a key factor, understanding the cost structure is essential for sustainable production. In this section, I will break down the price for each version of the Wan AI series. By combining these costs with the performance benchmarks discussed earlier, you can determine which model offers the best value and identify which version specifically suits your needs—whether you are an individual creator, a professional designer, or a large-scale enterprise.
Combining the video pricing data (price per minute) with the benchmark scores above, here is an analysis of the Wan AI video series' cost-efficiency.
Significant Price Reduction in Wan 2.7: Despite being the top performer in Image-to-Video and Audio-integrated tasks, Wan 2.7 is priced at $6.00/min. This represents a 33% price cut compared to the older Wan 2.6 and Wan 2.5 Preview models, which are both priced at $9.00/min.
Entry-Level Economy: The Wan 2.2 5B remains the most affordable option at $1.80/min. It is suitable for high-volume testing where quality (955–985 Elo) is less critical than operating costs.
Model Version | Price ($/min) | T2V Elo (Quality) | I2V Elo (Quality) | Value Assessment |
Wan 2.7 | $6.00 | 1189 | 1234 | Top Tier Efficiency |
Wan 2.6 | $9.00 | 1192 | 1208 | High cost for 2.6 generation |
Wan 2.5 Preview | $9.00 | 1159 | - | Overpriced relative to 2.7 |
Wan 2.2 A14B | $4.80 | 1112 | 1112 | Mid-tier legacy choice |
Wan 2.1 14B | $4.80 | 1024 | 1000 | Low efficiency |
Wan 2.2 5B | $1.80 | 955 | 985 | Budget entry level |
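The arithmetic behind these comparisons is easy to reproduce. The following Python sketch computes the percentage price change between two models and picks the cheapest model that meets a minimum Text-to-Video Elo. The prices and scores come from the table above; the helper names and the idea of filtering by an Elo threshold are my own illustration.

```python
from typing import Optional

# name: (price in $/min, Text-to-Video Elo, Image-to-Video Elo or None if unlisted)
VIDEO_MODELS = {
    "Wan 2.7": (6.00, 1189, 1234),
    "Wan 2.6": (9.00, 1192, 1208),
    "Wan 2.5 Preview": (9.00, 1159, None),
    "Wan 2.2 A14B": (4.80, 1112, 1112),
    "Wan 2.1 14B": (4.80, 1024, 1000),
    "Wan 2.2 5B": (1.80, 955, 985),
}


def price_change_pct(old_model: str, new_model: str) -> float:
    """Percentage price change when moving from old_model to new_model."""
    old_price = VIDEO_MODELS[old_model][0]
    new_price = VIDEO_MODELS[new_model][0]
    return (new_price - old_price) / old_price * 100.0


def cheapest_meeting(min_t2v_elo: int) -> Optional[str]:
    """Cheapest model whose Text-to-Video Elo is at least min_t2v_elo, if any."""
    candidates = [
        (price, name)
        for name, (price, t2v, _i2v) in VIDEO_MODELS.items()
        if t2v >= min_t2v_elo
    ]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(f"2.6 -> 2.7: {price_change_pct('Wan 2.6', 'Wan 2.7'):+.1f}%")
    print(f"2.2 A14B -> 2.7: {price_change_pct('Wan 2.2 A14B', 'Wan 2.7'):+.1f}%")
    print("Cheapest with T2V Elo >= 1150:", cheapest_meeting(1150))
```

Note that the 33% saving applies only against the $9.00/min models; relative to Wan 2.2 A14B at $4.80/min, Wan 2.7 costs 25% more.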

The Best Value Flagship: Wan 2.7 (Standard)
At $26.00/1k images, Wan 2.7 offers a superior balance of performance and price. While it is only slightly more expensive than the 2.5 Preview, it delivers massive gains in video generation and multimodal tasks (such as video with audio). For users prioritizing modern video features without the "Pro" premium, this is the most efficient choice.
The Professional Premium: Wan 2.7 Pro
The Pro variant is positioned for high-end specialized workflows, priced at a significant premium of $64.00/1k images. This version excels in Image Editing (1201 Elo), where it holds a clear lead over the rest of the family. However, its lower score in pure Text-to-Image generation (1119 Elo) suggests its value lies in precision control rather than creative exploration.
The Pricing Anomaly: Wan 2.6 Series
Interestingly, the Wan 2.6 models are priced at $30.00/1k images, which is more expensive than the newer Wan 2.7 standard model ($26.00/1k). Unless you need the small edge Wan 2.6 holds over standard Wan 2.7 in Text to Image (1144 vs. 1105 Elo) and Image Editing (1188 vs. 1181 Elo), there is little financial incentive to remain on this version.
The Budget Baseline: Wan 2.5 Preview
At the entry-level price of $21.00/1k images, the 2.5 Preview remains the most affordable option. Notably, it still holds the highest Elo score for Text to Image generation (1149 Elo), making it the most cost-effective tool for users who only care about pure static image quality.
Model Version | Price ($/1k images) | Top Benchmark Strength | Value Proposition |
Wan 2.7 Pro | $64.00 | Image Editing (1201 Elo) | Professional precision & editing |
Wan 2.6 Image | $30.00 | Image to Video (1208 Elo) | Balanced but relatively expensive |
Wan 2.6 T2I | $30.00 | Text to Video (1192 Elo) | Legacy high performance |
Wan 2.7 | $26.00 | Video w/ Audio (1050 Elo) | Best overall value for video |
Wan 2.5 Preview | $21.00 | Text to Image (1149 Elo) | Best budget choice for static images |
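For budgeting, the per-1k prices translate directly into batch costs. This small Python sketch estimates what a batch of images would cost on each model and how many images a fixed budget covers. It assumes simple linear per-image billing with no minimum charge or volume tiers, which the pricing table does not confirm, and the function names are hypothetical.

```python
# Prices in USD per 1,000 images, from the table above.
IMAGE_PRICE_PER_1K = {
    "Wan 2.7 Pro": 64.00,
    "Wan 2.6 Image": 30.00,
    "Wan 2.6 T2I": 30.00,
    "Wan 2.7": 26.00,
    "Wan 2.5 Preview": 21.00,
}


def batch_cost(model: str, n_images: int) -> float:
    """Estimated USD cost of n_images, assuming linear per-image billing."""
    return IMAGE_PRICE_PER_1K[model] * n_images / 1000


def images_within_budget(model: str, budget_usd: float) -> int:
    """Largest whole number of images that fits within budget_usd."""
    return int(budget_usd * 1000 // IMAGE_PRICE_PER_1K[model])


if __name__ == "__main__":
    for model in IMAGE_PRICE_PER_1K:
        print(
            f"{model}: 250 images cost ${batch_cost(model, 250):.2f}; "
            f"$100 buys {images_within_budget(model, 100)} images"
        )
```

At these rates, a $100 budget covers roughly three times as many images on Wan 2.5 Preview as on Wan 2.7 Pro, which is the gap to weigh against the Pro model's editing lead.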

Based on the latest benchmark data and API pricing structures, the Wan AI ecosystem has evolved into a specialized lineup where newer isn’t always more expensive, and "Pro" serves a very specific niche.
Wan 2.7 is the current "sweet spot" for high-quality video production.
Performance: It holds the highest Image-to-Video Elo (1234) and leads in multimodal tasks involving audio.
Cost: At $6.00/min, it is actually 33% cheaper than the older Wan 2.6, offering a rare combination of better quality for less money.
Best For: Commercial video clips, social media content, and any project requiring high visual-audio synchronization.
If your workflow involves heavy iteration on existing assets, Wan 2.7 Pro is your tool.
Performance: It is the undisputed king of Image Editing with a 1201 Elo score.
Cost: It carries a heavy premium at $64.00 per 1k images.
Best For: Professional designers and product strategists who need exact modifications rather than random generation.
Interestingly, Wan 2.5 Preview, the oldest version in the image comparison, remains relevant for static imagery.
Performance: It maintains the highest Text-to-Image Elo (1149), outperforming even the 2.7 Pro in pure creative prompt generation.
Cost: It is the most affordable for images ($21.00/1k) but tied with Wan 2.6 as the most expensive for video ($9.00/min).
Best For: High-volume concept art or static AI photography where budget is the primary constraint.
If your priority is... | Recommended Version | Cost Consideration |
Best Video Quality | Wan 2.7 | Highly efficient ($6.00/min) |
Complex Image Edits | Wan 2.7 Pro | Luxury pricing ($64.00/1k) |
Pure Aesthetic Images | Wan 2.5 Preview | Cheapest image API ($21.00/1k) |
High Volume/Testing | Wan 2.2 5B | Lowest barrier to entry ($1.80/min) |
We’re excited to announce that Wan AI 2.7 is now available on GlobalGPT! Get access to advanced video generation tools, customization options, and professional editing features — all in one platform.
Q: Is Wan 2.7 a better choice than Wan 2.6? A: Yes, Wan 2.7 is the superior choice as it delivers higher Image-to-Video quality and advanced audio-sync features at a 33% lower cost ($6/min).
Q: Why does the Wan 2.7 Pro version carry such a high price premium? A: The Pro version is a specialized tool priced at $64/1k images because it offers industry-leading precision for professional image editing (1201 Elo).
Q: Which model offers the best results for users on a strict budget? A: For the lowest costs, use Wan 2.2 5B for video ($1.80/min) or Wan 2.5 Preview for high-quality static images at just $21/1k.
The evolution of the Wan AI series represents a significant leap from basic video templates to a comprehensive, professional-grade production suite.
Wan 2.7 has effectively redefined the "price-to-performance" standard in the industry. By delivering superior video consistency and breakthrough audio synchronization at a cost lower than its predecessors, it has become the definitive choice for creators and enterprises alike. While Wan 2.7 Pro caters to those requiring surgical precision in image modification, and Wan 2.5 Preview maintains its niche for budget-friendly high-aesthetic static images, Wan 2.7 stands as the most versatile and efficient model for modern video workflows.
With Wan 2.7 now integrated into GlobalGPT, advanced cinematic AI tools are more accessible and affordable than ever before.