Midjourney’s Evolution from V1 to V7

If you’ve ever played with AI image generators, you probably know Midjourney. And if you’ve been around since the early days, then yeah, you’ve seen things.

I’ve been using Midjourney since version one, and trust me, those early results were rough. Every prompt felt like rolling the dice: would I get a decent image or a cursed gremlin with extra teeth and three shadows? You never knew.

But that’s what makes this journey so interesting. Watching an AI go from awkward, clunky experiments to full-blown, professional-grade imagery in just a few updates is wild. And now that V7 is out, I figured it was the perfect time to look back and ask: how far have we really come?

So I ran the same prompt through every version, from V1 to V7, and the transformation speaks for itself. Let’s take a look.

Midjourney’s Improvement Told Through Images

Let me take you on a journey through the evolution of AI image generation. I’ve spent countless hours playing with Midjourney’s models, from the early days of V1 to the latest V7 release, and the transformation is nothing short of mind-blowing. Using the exact same prompts across all versions, I’ll show you just how far we’ve come.
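For anyone who wants to repeat the comparison, here is a minimal sketch of how one prompt could be expanded into one string per model version. It assumes each model is selected by appending Midjourney’s `--v` parameter to the prompt; the article itself doesn’t say how the versions were selected, and the helper name `versioned_prompts` is made up for illustration.

```python
# Sketch: build one prompt string per Midjourney model version.
# Assumption: a model is chosen by appending "--v <number>" to the prompt.
# versioned_prompts is a hypothetical helper, not part of any Midjourney tool.

VERSIONS = [1, 2, 3, 4, 5, 6, 7]


def versioned_prompts(prompt, versions=VERSIONS):
    """Return a dict mapping each version number to the prompt with a version flag."""
    base = prompt.strip()
    return {version: f"{base} --v {version}" for version in versions}


# The cat prompt used later in this article.
cat_prompts = versioned_prompts(
    "gray british shorthair cat, medium shot, grainy disposable"
)

for version, text in cat_prompts.items():
    print(f"V{version}: {text}")
```

Keeping the base text identical and changing only the version flag is what makes the side-by-side results comparable.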

Portrait (Human)

Prompt: a young man in a plain black top, indie, retro, medium format photography, warm light, dorm room aesthetics, candid

The early days of Midjourney were… rough, to put it kindly.

From V1 to V3, you could kinda tell it was trying to generate a man, but the anatomy was all kinds of wrong. Heads were shaped like malformed potatoes, placed at impossible angles, and the dorm background looked like an abstract painting rather than a living space. It was the uncanny valley, except the valley was more like a bottomless pit.

V4 marked the first major breakthrough, finally understanding the basic shape and form of a human being. The anatomical nightmares were gone, but something still felt off: that classic uncanny valley feeling that makes you uncomfortable without knowing exactly why.

By V5, the technical issues were mostly resolved, but instead of a casual candid shot, we got something that looked like a professional photography session. V6 improved things further but still struggled with shadows and lighting.

Then came V7, and it’s like someone actually snapped a photo with their smartphone. The casual posture, the natural lighting, the authentic dorm setting: it finally nailed what I was asking for all along.

Portrait (Non-Human)

Prompt: gray british shorthair cat, medium shot, grainy disposable

If you want to see nightmare fuel, just check out what V1 did with a simple cat prompt. The result looked like a tumor made of fur, something that would send children running and screaming.

V2 and V3 showed some improvement, but the eyes were still disturbingly wrong, and there wasn’t much evolution between these two versions. Looking at these images, you’d think the AI had never actually seen a cat before.

By V4 and V5, things got significantly better. The cats actually looked like cats, but there was still that telltale AI giveaway: fur that blended together in an unnatural way, lacking the individual strands you’d see in a real photo.

V6 nailed it with more realistic fur definition and texture that could pass as a real grainy disposable camera shot. Interestingly, in my opinion, V6 actually outperformed V7 for this one. The latest model’s image had some strange lighting issues and even some tearing near the cat’s ear.

Landscape

Prompt: the view from the peak of a mountain, sea of clouds, vast and mesmerizing landscape, Photography, captured with a Fujifilm GFX 100S medium format camera

The early versions of Midjourney really struggled with landscapes. V1 through V3 seemed confused about the fundamental difference between mountains and clouds; they blended together in a disorienting mess. The perspective looked off, the scale was all wrong, and the whole scene felt like a fever dream rather than a majestic mountain view.

V4 started to understand the assignment better but still had major issues with perspective and scaling. It also randomly inserted a person I never asked for in the prompt (a classic AI hallucination moment). By V5, these problems were mostly fixed, with much better distinction between mountains and clouds, and a more natural sense of scale.

The jump to V6 and V7, though? The atmospheric lighting, the texture of the clouds, the dramatic mountain peaks: these newer versions created images that could easily pass for professional landscape photography. The difference between V6 and V7 here was minimal, suggesting that Midjourney might be approaching the ceiling for this particular type of imagery.

Product Photography

Prompt: commercial photography, a scented candle, on pastel purple background, with flowers, minimal, dreamy, soft lighting, center composition

Early versions of Midjourney had no idea what a candle should look like. V1 and V2 produced strange, abstract interpretations that barely resembled anything cylindrical, let alone a functional candle.

V3 finally grasped the basic shape of a candle but still had that obvious “AI-generated” quality that plagued early text-to-image models: flat lighting, strange textures, and an overall lack of realism.

By V4, the model could produce a recognizable candle, but the result was painfully plain and lacked any understanding of how to render flowers or leaves realistically. V5 improved things but still didn’t have that professional product photography feel with the dramatic lighting and contrast that makes a product pop off the page.

V6 and V7 completely changed the game. They both produced images that could easily be mistaken for professional product photography, with perfectly centered candles, warm inviting lighting, and background flowers that complemented rather than distracted from the subject.

Pixel Art

Prompt: pixel art scene, a legendary medieval city with fog, dark fantasy, 8-bit game

Pixel art remains a fascinating case study in Midjourney’s evolution.

V1 through V3 were objectively terrible, but I was surprised that even at this early stage, the AI could recognize and generate the basic shapes of medieval towers, something it struggled with in other image categories. The results were crude and barely recognizable, but hey, the fundamental understanding was there.

Ironically, V4 might actually be the most successful version for true pixel art. It produced images with the authentic chunky pixels and limited color palette that define the genre.

As Midjourney advanced to V5, V6, and V7, the results became increasingly detailed and “HD”, which actually runs counter to my prompt. The newer versions seem to simulate pixel art rather than create actual pixel art, adding smooth gradients and details that wouldn’t be possible in the authentic medium.
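The difference between actual and simulated pixel art can be made a bit more concrete. The sketch below is not how I judged the images (that was by eye); it’s just one rough way to express the two traits mentioned above, a small palette and chunky uniform blocks, on a tiny made-up grid of RGB tuples. The function names and the sample data are hypothetical.

```python
# Sketch: two rough checks for "true" pixel art on a grid of RGB tuples.
# Assumption: an image is a list of rows, each row a list of (r, g, b) tuples.
# palette_size and is_blocky are illustrative helpers, not a standard test.


def palette_size(pixels):
    """Count the distinct colors used in the image."""
    return len({color for row in pixels for color in row})


def is_blocky(pixels, block):
    """Return True if every block x block tile contains exactly one color."""
    height, width = len(pixels), len(pixels[0])
    if height % block or width % block:
        return False
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = {
                pixels[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            }
            if len(tile) != 1:
                return False
    return True


DARK, FOG = (20, 20, 40), (180, 180, 200)

# Chunky image: two colors, drawn as 2x2 blocks.
chunky = [
    [DARK, DARK, FOG, FOG],
    [DARK, DARK, FOG, FOG],
    [FOG, FOG, DARK, DARK],
    [FOG, FOG, DARK, DARK],
]

# Smooth gradient: every pixel gets its own shade.
gradient = [[(16 * (4 * y + x), 0, 0) for x in range(4)] for y in range(4)]

print(palette_size(chunky), is_blocky(chunky, 2))
print(palette_size(gradient), is_blocky(gradient, 2))
```

On real images the block size and a sensible palette limit would have to be guessed or tuned, so treat this as an illustration of the idea rather than a reliable detector.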

Illustrations

Prompt: a costumed supervillain with fire powers looking down a street full of people while flying, overhead perspective, graphic novel illustration style of katsuhiro otomo, comic book style, jim lee, brian michael bendis

V1 through V3 understood maybe 10% of my prompt (the “supervillain” part) but ignored everything else. No background details, no proper perspective, and the subjects themselves looked like melted action figures rather than menacing antagonists.

V4 and V5 showed significant improvement but still missed critical elements that changed my prompt’s intent entirely. V4 had the villain standing on the ground instead of flying, while V5 created a formless blob of fire above a crowd rather than a costumed character. The angles were wrong, the perspectives confused, and the comic book style inconsistent.

Then came V6 and V7, which nailed every single aspect of the prompt. The overhead angle, the detailed street full of people, the flying supervillain with fire powers, and the distinctive comic book art style: they were perfect. The images looked like they were torn straight from the pages of a professional comic book, with dynamic poses, dramatic lighting, and that perfect balance of realism and stylization that defines the medium.

Overall Thoughts?

After looking through all seven versions, one thing is clear: the biggest leap in Midjourney’s evolution happened between V3 and V4. That’s where it finally stopped being “just interesting tech” and started becoming an actual creative tool. Faces became human, objects looked like what they were supposed to, and prompts actually meant something.

The jump from V6 to V7, though? It’s not as dramatic, but it’s still incredibly impressive. V7 tightens everything. Lighting is more natural. Scenes feel more lived-in. It’s like they smoothed out the last few bumps that V6 couldn’t quite polish. So, no, V7 isn’t a revolution, but it’s a refinement.

Where does it go from here? Who knows. But one thing’s for sure: V1 walked so V7 could run with two legs.