You've probably tried this once already. You signed up for Veo or Kling, typed "a cinematic shot of a woman walking through a field at sunset," paid your $1.92, waited three minutes, and got back something where the grass was the wrong colour, the woman had seven fingers, and the lighting looked like a porch lamp.

You closed the tab. You decided AI video wasn't ready.

It is ready. You just got bad output because you wrote a bad prompt. Most people do. The difference between you and the creators producing viral AI video is not skill, talent, or budget. It's that they know one specific thing about how to write the brief.

I'm going to show you that thing. Then I'm going to give you five prompts you can copy tonight. Then I'm going to give you a literal Monday-morning plan.

By the end of this article, you will know enough to produce a usable AI video by the end of the week.

The thing to understand (with one example)

Here is the same scene, prompted two different ways. Same model. Same cost. Wildly different output.

Prompt A — what most people write

"A cinematic shot of a woman walking through a field at sunset."

What you get: A 28-year-old in a flowing white dress walking through bright green grass at high noon. The "sunset" is harsh midday light. Her hand looks like a starfish.

Prompt B — what works

"A woman in her late 50s, cream linen shirt, walks slowly through waist-high dry golden grass. Late golden hour, 5pm, low sun streaming from camera-right at 30 degrees. Wide shot from behind, slow forward push, 50mm lens. Aesthetic reference: Past Lives by Celine Song. Audio: soft wind, distant birdsong, no music. NEGATIVE: text, watermark, fast motion, distorted hands, eye contact with lens."

What you get: Something much closer to what you imagined.

That's the entire trick. Specificity. Reference. Negative prompt.

Prompt A fails because the AI doesn't know what kind of woman, what kind of field, what time of day, what kind of camera, what kind of light, or what "cinematic" means. So it picks generic defaults for all of them. You get a stock-image-come-to-life.

Prompt B works because you've removed the AI's freedom to guess. You told it the age, the clothing, the height of the grass, the time of day, the angle of the light, the lens, the camera move, the visual reference, the sound, and what NOT to include. There's almost no room for it to drift.

Once you've seen this, you can't un-see it. Every prompt from here forward is just a longer or shorter version of Prompt B.

The 4-part framework (the only thing you need to remember)

You don't need to memorise seven components. You need four.

  1. SUBJECT — who/what, very specifically
  2. SCENE — where, when, what kind of light
  3. CAMERA — angle, lens, movement
  4. STYLE — reference a real film or director

Then add a one-line negative prompt at the end naming what you don't want.

That's it. Five lines. Memorise the structure once and you can write a working prompt in under a minute.

Here it is filled in:

SUBJECT: A woman in her late 40s, cream knit jumper, calm expression
SCENE: Sits at a wooden kitchen table at 10am, soft natural window
       light from camera-left
CAMERA: Mid-shot, 50mm lens, locked-off camera
STYLE: Documentary interview style, A24 cinematography
NEGATIVE: text, watermark, smiling exaggeratedly, distorted hands

Five lines. Drop it into Veo or Kling. Get usable output.
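If you'd rather keep your briefs in a script than in a notes app, the five-line structure maps onto a tiny data class. This is a minimal sketch, not anything Veo or Kling require: the `PromptBrief` name and its `render` method are made up for illustration, and the output is just text you paste into the tool.

```python
from dataclasses import dataclass


@dataclass
class PromptBrief:
    """The four framework parts plus the one-line negative prompt.

    Hypothetical helper for illustration; no video tool defines this class.
    """
    subject: str
    scene: str
    camera: str
    style: str
    negative: list[str]

    def render(self) -> str:
        # One labelled line per part, in the framework's order.
        lines = [
            f"SUBJECT: {self.subject}",
            f"SCENE: {self.scene}",
            f"CAMERA: {self.camera}",
            f"STYLE: {self.style}",
            f"NEGATIVE: {', '.join(self.negative)}",
        ]
        return "\n".join(lines)


brief = PromptBrief(
    subject="A woman in her late 40s, cream knit jumper, calm expression",
    scene="Sits at a wooden kitchen table at 10am, soft natural window light from camera-left",
    camera="Mid-shot, 50mm lens, locked-off camera",
    style="Documentary interview style, A24 cinematography",
    negative=["text", "watermark", "smiling exaggeratedly", "distorted hands"],
)

prompt_text = brief.render()
print(prompt_text)
```

Because every part is a required field, you can't forget the lens or the negative prompt without Python complaining first.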

Your 5-prompt starter kit (copy these tonight)

These are tested formats. Copy them exactly the first time. Change ONE thing the second time. Change another the third time. By generation 10, you'll be writing your own.

1. The talking head

SUBJECT: A woman in her [AGE] with [HAIR DESCRIPTION], wearing
[OUTFIT], calm and warm expression
SCENE: Sits at a wooden table by a window at 10am, soft natural
window light from camera-left
CAMERA: Mid-shot, 50mm lens, locked-off camera
STYLE: Documentary interview style, A24 cinematography
AUDIO: Quiet domestic ambience, no music
NEGATIVE: text, watermark, exaggerated smile, distorted hands,
deformed face

Use for: founder videos, brand introductions, testimonial-style content.

2. The mood piece

SUBJECT: A hand pours steaming coffee from a French press into a
ceramic mug
SCENE: Wooden countertop, soft morning light from camera-right,
cream walls blurred in background, single flower in a glass jar
CAMERA: Close-up overhead shot, slow pour over 5 seconds, locked-off
STYLE: Aftersun by Charlotte Wells, soft contemplative domestic
AUDIO: Pour sound, distant birdsong, no music
NEGATIVE: text, watermark, multiple hands, fast motion, plastic skin,
extra fingers

Use for: lifestyle B-roll, brand mood pieces, intro shots.

3. The "shot on iPhone" social piece

SUBJECT: A woman in her 50s, no makeup, hair slightly undone, sitting
on a kitchen stool with a coffee
SCENE: Suburban kitchen on a slow Sunday morning, soft natural light,
slightly cluttered background
CAMERA: Mid-shot, 28mm focal length feel, slight handheld camera
movement, slightly cooler colour grade
STYLE: Shot on iPhone 14, real Instagram footage, NOT cinematic
AUDIO: Distant suburban morning sounds, soft sip, no music
NEGATIVE: cinematic lighting, professional camera, polished colour
grade, posed expression, eye contact with lens

Use for: Reels, TikTok, anything that should look real. This format beats polished content by 93% on social right now.

4. The landscape

SUBJECT: A wide expanse of [LOCATION]
SCENE: [TIME OF DAY], [WEATHER], [LIGHT DIRECTION]
CAMERA: Aerial drone, slow forward push over 8 seconds, ascending
50 feet
STYLE: Terrence Malick wide nature shots, Tree of Life
AUDIO: Gentle wind, distant birdsong, no music
NEGATIVE: text, watermark, oversaturated colours, lens flare,
multiple drones, motion blur

Use for: opening shots, hero pieces, atmospheric content.

5. The product shot

SUBJECT: [YOUR PRODUCT] on [SURFACE]
SCENE: [LOCATION], [TIME OF DAY], [LIGHT DIRECTION]
CAMERA: Eye-level, slow dolly-in over 6 seconds, medium shot,
shallow focus on product
STYLE: Clean editorial commercial style
AUDIO: Quiet ambience, subtle chime at second 5
NEGATIVE: text, watermark, multiple products, oversaturated colours,
deformed perspective

Use for: sales pages, launch content, product reveals.
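The square-bracket slots in templates 1, 4 and 5 are easy to fill by hand, but it's also easy to leave one in by accident. Here's a rough sketch of a filler that refuses to run if a slot has no value. The function name `fill_template` and the example product values are invented for illustration; the template text is the product shot from above.

```python
import re

PRODUCT_TEMPLATE = (
    "SUBJECT: [YOUR PRODUCT] on [SURFACE]\n"
    "SCENE: [LOCATION], [TIME OF DAY], [LIGHT DIRECTION]\n"
    "CAMERA: Eye-level, slow dolly-in over 6 seconds, medium shot, "
    "shallow focus on product\n"
    "STYLE: Clean editorial commercial style\n"
    "AUDIO: Quiet ambience, subtle chime at second 5\n"
    "NEGATIVE: text, watermark, multiple products, oversaturated colours, "
    "deformed perspective"
)


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each [PLACEHOLDER] with its value; fail loudly if one is missing."""
    def swap(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"No value supplied for [{key}]")
        return values[key]

    # Placeholders are upper-case words (with spaces) inside square brackets.
    return re.sub(r"\[([A-Z ]+)\]", swap, template)


# Example values are made up; swap in your own product and setting.
filled = fill_template(
    PRODUCT_TEMPLATE,
    {
        "YOUR PRODUCT": "A matte black water bottle",
        "SURFACE": "a pale oak table",
        "LOCATION": "Bright home office",
        "TIME OF DAY": "9am",
        "LIGHT DIRECTION": "soft window light from camera-left",
    },
)
print(filled)
```

The same function works on the talking head and landscape templates, since their slots follow the same bracket convention.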

Your Monday morning plan

Stop reading after this section. Open the tool. Do this:

Monday 30 min · $5

Tuesday 30 min · $5

Wednesday 30 min · $5

Same as Tuesday. Change ONE different variable.

Thursday 30 min · $5

Same. One variable. Different one.

Friday 45 min · $10

End of week — what you'll have

This is faster than any course will teach you. The reason: you're learning by doing, on one variable at a time. By Friday you'll know more than 80% of people who post "ULTIMATE AI VIDEO GUIDE" content on LinkedIn.
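The one-variable-at-a-time habit is simple enough to script, which also stops you from quietly changing two things at once. A minimal sketch, assuming you keep a brief as a plain dictionary: `vary_one` and `render` are hypothetical helpers, and the alternative camera lines are examples, not tested recommendations.

```python
base_brief = {
    "SUBJECT": "A woman in her late 40s, cream knit jumper, calm expression",
    "SCENE": "Sits at a wooden kitchen table at 10am, soft natural window light from camera-left",
    "CAMERA": "Mid-shot, 50mm lens, locked-off camera",
    "STYLE": "Documentary interview style, A24 cinematography",
    "NEGATIVE": "text, watermark, smiling exaggeratedly, distorted hands",
}


def vary_one(base: dict[str, str], field: str, options: list[str]) -> list[dict[str, str]]:
    """Return one copy of the base brief per option, changing only `field`."""
    if field not in base:
        raise KeyError(f"Unknown field: {field}")
    return [{**base, field: option} for option in options]


def render(brief: dict[str, str]) -> str:
    """Turn a brief into the labelled multi-line prompt text."""
    return "\n".join(f"{label}: {text}" for label, text in brief.items())


# Illustrative alternatives for a single day's session: only CAMERA changes.
variants = vary_one(
    base_brief,
    "CAMERA",
    [
        "Close-up, 85mm lens, locked-off camera",
        "Wide shot, 24mm lens, slow forward push",
    ],
)

for variant in variants:
    print(render(variant))
    print("---")
```

When a generation comes out better or worse, you know exactly which line caused it, because nothing else moved.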

When it goes wrong (and it will)

Five things that fail and how to fix them:

  1. The hands look wrong. Always. Add "distorted hands, extra fingers, deformed hand" to your negative prompt. If they're still wrong, change the camera angle so hands aren't in frame, or use a close-up that crops them out.
  2. The face changes between shots. Use the image-to-video mode (upload one reference image), or generate using the same SEED value across multiple prompts. Most platforms let you save and reuse a seed.
  3. Subtitles appear on screen. Add "no subtitles, no text overlay" to your prompt. Veo especially loves adding text.
  4. Everything looks like a stock photo. You skipped the style reference. Add "Aesthetic reference: [REAL FILM TITLE]". Never use "cinematic" or "beautiful" alone.
  5. The motion is too fast or too dramatic. State the camera move as its own sentence and specify the duration. "The camera slowly pushes in over 6 seconds." Not buried in a longer description.
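Two of the fixes above are mechanical enough to script: topping up the negative prompt without repeating yourself (fix 1), and writing the camera move as its own sentence with a duration (fix 5). A rough sketch, with `merge_terms` and `camera_move_sentence` as invented helper names:

```python
HAND_FIX_TERMS = ["distorted hands", "extra fingers", "deformed hand"]


def merge_terms(existing: list[str], extra: list[str]) -> list[str]:
    """Append extra negative terms, keeping order and skipping duplicates."""
    merged = list(existing)
    for term in extra:
        if term not in merged:
            merged.append(term)
    return merged


def camera_move_sentence(move: str, seconds: int) -> str:
    """State the camera move as a standalone sentence with a duration."""
    return f"The camera slowly {move} over {seconds} seconds."


negative = merge_terms(["text", "watermark", "distorted hands"], HAND_FIX_TERMS)
print("NEGATIVE: " + ", ".join(negative))

move_line = camera_move_sentence("pushes in", 6)
print(move_line)
```

The seed and image-to-video fixes depend on whichever platform you use, so they're left out here.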

What to actually spend

Honest numbers, if you're using Veo 3 via PiAPI (the same prices apply on the direct API):

Use Veo 3 fast or Kling for learning. Switch to Veo 3 standard only for final pieces you'll publish.

If you follow the Monday-Friday plan above, you'll spend roughly $30 in your first week. After that, expect $50-100/month to maintain a real publishing cadence.
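The $30 figure is just the daily budgets from the plan added up. Here's the arithmetic as a quick sketch. The last line treats the $1.92 from the opening example as a flat per-generation price, which is an assumption for illustration only; your tool and tier will differ.

```python
# Daily time and spend from the Monday-Friday plan.
plan = {
    "Monday": {"minutes": 30, "usd": 5},
    "Tuesday": {"minutes": 30, "usd": 5},
    "Wednesday": {"minutes": 30, "usd": 5},
    "Thursday": {"minutes": 30, "usd": 5},
    "Friday": {"minutes": 45, "usd": 10},
}

total_usd = sum(day["usd"] for day in plan.values())
total_minutes = sum(day["minutes"] for day in plan.values())

# ASSUMPTION: a flat $1.92 per generation, borrowed from the opening anecdote.
assumed_price_per_generation = 1.92
full_generations = int(total_usd // assumed_price_per_generation)

print(f"Week one: ${total_usd} over {total_minutes} minutes")
print(f"Roughly {full_generations} generations at the assumed price")
```

On a cheaper tier the same budget stretches further, which is one reason to learn on the fast models.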

If you skip the plan and just YOLO prompts, you'll spend $200-400 in your first month and have nothing usable to show for it.

The bottom line

You don't need a course. You don't need a coach. You don't need to be a filmmaker.

You need:

By next Friday, you'll have produced something usable. You'll know which platform you prefer. You'll have built a starter library you can grow from.

The first 5 generations will feel awkward. The 10th will surprise you. The 20th will make you stop apologising for AI video and start using it.

You can do this.

— Jules