What We Learned Making AI Video with Google Flow and Veo 3.1

We rebuilt our website hero video this week using Google Flow and Veo 3.1. The brief was simple: a 30-second looping b-roll showing a day in the life of a franchise operation — dispatch, truck roll, arrival, proposal, payment.

It took a week of iteration to get something usable. Here is what worked, what did not, and the prompting rules we ended up with.

The Setup: Frames to Video

Flow's "Frames to Video" feature lets you provide a start frame and an end frame. The model generates the video that transitions between them. Chain five of these together and you get a continuous loop.

The key insight: each scene shares a keyframe with its neighbors. The end frame of Scene 1 is the start frame of Scene 2. That gives you continuity across the whole loop without asking the model to maintain consistency over a long sequence — something it is bad at.

Frame A → [Scene 1] → Frame B → [Scene 2] → Frame C → [Scene 3] → Frame D → ...
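The pairing is mechanical enough to sketch in a few lines of Python. This is an illustration, not part of Flow: the file names are hypothetical, and closing the loop by reusing the first keyframe as the last scene's end frame is our assumption about how to make the sequence loop.

```python
from typing import List, Tuple


def scene_pairs(keyframes: List[str], loop: bool = True) -> List[Tuple[str, str]]:
    """Return one (start_frame, end_frame) pair per scene.

    Neighboring scenes share a keyframe: the end frame of scene N
    is the start frame of scene N+1.
    """
    if len(keyframes) < 2:
        raise ValueError("need at least two keyframes")
    pairs = list(zip(keyframes, keyframes[1:]))
    if loop:
        # Assumption: the last scene returns to the first keyframe,
        # so the finished video can repeat without a visible jump.
        pairs.append((keyframes[-1], keyframes[0]))
    return pairs


# Hypothetical file names, one keyframe per scene boundary.
frames = ["frame_a.png", "frame_b.png", "frame_c.png", "frame_d.png", "frame_e.png"]
pairs = scene_pairs(frames)
for number, (start, end) in enumerate(pairs, start=1):
    print(f"Scene {number}: {start} -> {end}")
```

With five keyframes and `loop=True`, this yields five scenes, each of which would be generated as its own Frames to Video clip.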

Mistake #1: Over-Prompting

Our first attempt used detailed narrative prompts. 150+ words per scene. Character actions, dialogue cues, specific movements, environmental details.

The result was unwatchable. Characters walked into walls that materialized from nowhere. Doors appeared in solid walls. People morphed between genders mid-scene. Objects teleported. It looked like the model was on hallucinogens.

The problem: when you describe things that are not in the keyframe images, the model invents them. And it invents them badly.

We told it "she stands up and walks toward the door." There was no door in the frame. So Veo hallucinated one — in the middle of a wall, at the wrong scale, with impossible lighting.

Mistake #2: Describing What's Already Visible

The keyframes already show the model what the scene looks like. Your prompt does not need to describe the composition, the characters, the setting, or the lighting. The images handle all of that.

When you re-describe what is in the image via text, you create conflicts. The model tries to reconcile your text description with the image and produces artifacts. "Woman in navy polo at desk with monitors" — if that is already in the start frame, saying it again in the prompt just introduces noise.

What Actually Works

After a week of iteration, here is the formula that produces clean, professional-looking video.

1. Let the Keyframes Do the Heavy Lifting

Generate your keyframe images separately. We used Nano Banana. Make them look exactly like the shot you want. Match the lighting direction between consecutive frames. Keep character clothing and props consistent.

The keyframes are your storyboard. They define composition, color, characters, and setting. The prompt only defines what happens between them.

2. Keep Prompts Under 30 Words

The best results came from prompts like:

The van drives through a quiet suburban neighborhood. Golden hour light through the trees. Slow tracking shot. No audio.

The technician continues up the walkway toward the front door. Evening light. Steady follow camera. No audio.

That is it. No character descriptions. No invented actions. No environmental details the frames already show.

3. Use Camera Language, Not Narrative Language

Words that work: slow tracking shot, steady camera, gentle drift, slow motion, push in, pull back, hold.

Words that cause hallucinations: walks toward, picks up, opens the door, turns around, stands up, puts down.

Every verb that asks a character to perform a specific new action is a hallucination risk: the model attempts the action and tends to render it badly. Camera verbs are safe because they describe how the virtual camera moves, which is something the model is actually good at.

4. Specify "No Audio" If You Don't Want Audio

Veo 3.1 generates synchronized audio by default. If you do not say otherwise, it will add ambient sound, dialogue, or music of its own choosing. For background use like a website hero, always include "No audio" in the prompt.
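The three prompt rules so far (under 30 words, camera language only, explicit "No audio") are simple enough to check before spending a generation. Here is a minimal sketch of such a check. The function name and the phrase list are ours, taken from the examples in this post, and a substring match like this will miss plenty of risky wording.

```python
from typing import List

MAX_WORDS = 30  # "under 30 words" means 29 or fewer

# Phrases from this post that caused hallucinations for us.
# This is a starting list, not an exhaustive one.
RISKY_PHRASES = [
    "walks toward",
    "picks up",
    "opens the door",
    "turns around",
    "stands up",
    "puts down",
]


def lint_prompt(prompt: str, want_audio: bool = False) -> List[str]:
    """Return a list of human-readable problems; empty means the prompt passes."""
    problems = []
    word_count = len(prompt.split())
    if word_count >= MAX_WORDS:
        problems.append(f"too long: {word_count} words (limit is under {MAX_WORDS})")
    lowered = prompt.lower()
    for phrase in RISKY_PHRASES:
        if phrase in lowered:
            problems.append(f"narrative phrase: '{phrase}'")
    if not want_audio and "no audio" not in lowered:
        problems.append("missing 'No audio'")
    return problems


good = ("The van drives through a quiet suburban neighborhood. "
        "Golden hour light through the trees. Slow tracking shot. No audio.")
bad = "She stands up and walks toward the door."

print(lint_prompt(good))
print(lint_prompt(bad))
```

The first prompt passes with no problems. The second is flagged for two narrative phrases and the missing "No audio".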

5. Match Lighting Between Keyframes

The number one cause of flicker and visual artifacts in our clips was inconsistent lighting between start and end frames. If your start frame has golden hour light from the left, your end frame needs the same. When the lighting direction changes, the model produces unstable, flickery transitions.

6. The Hardest Transition Is Interior ↔ Exterior

Going from an indoor scene to an outdoor scene (or vice versa) in a single generated clip is extremely difficult for current AI video models. The lighting, color temperature, and depth of field change so dramatically that the model cannot interpolate smoothly.

Our fix: keep interior-to-exterior transitions at scene boundaries where you will add a crossfade in post-production anyway. Do not ask a single clip to handle that transition.

The Final Workflow

1. Write the shot list (what each scene shows).
2. Generate keyframe images, one per scene boundary.
3. Review keyframes for lighting consistency and character continuity.
4. Write minimal prompts: under 30 words, camera motion only.
5. Generate each scene in Flow using start/end frames.
6. Chain clips in iMovie or DaVinci Resolve with 0.5s crossfades.
7. Strip audio: ffmpeg -i input.mp4 -an -c:v copy output.mp4
8. Export as MP4, H.264, 8-12 Mbps.
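We did the chaining in a video editor, but the chain, strip-audio, and export steps could in principle be scripted with ffmpeg alone, using its xfade filter. The sketch below only builds the command so you can inspect it before running it. It is not what we ran: the clip names and the 6-second durations are hypothetical, and xfade expects all inputs to share resolution and frame rate.

```python
from typing import List


def xfade_offsets(durations: List[float], fade: float = 0.5) -> List[float]:
    """Start time of each crossfade on the output timeline.

    Each crossfade overlaps two clips by `fade` seconds, so the k-th
    crossfade starts at (sum of the first k durations) - k * fade.
    """
    offsets = []
    elapsed = 0.0
    for k, duration in enumerate(durations[:-1], start=1):
        elapsed += duration
        offsets.append(elapsed - k * fade)
    return offsets


def build_ffmpeg_command(clips: List[str], durations: List[float],
                         out_path: str, fade: float = 0.5) -> List[str]:
    """Build an ffmpeg argument list that crossfades clips and drops audio."""
    if len(clips) != len(durations) or len(clips) < 2:
        raise ValueError("need two or more clips, with one duration per clip")
    cmd = ["ffmpeg"]
    for clip in clips:
        cmd += ["-i", clip]
    parts = []
    previous = "[0:v]"
    for k, offset in enumerate(xfade_offsets(durations, fade), start=1):
        label = f"[v{k}]"
        parts.append(
            f"{previous}[{k}:v]xfade=transition=fade:"
            f"duration={fade}:offset={offset}{label}"
        )
        previous = label
    cmd += [
        "-filter_complex", ";".join(parts),
        "-map", previous,   # final crossfaded video stream
        "-an",              # no audio track in the output
        "-c:v", "libx264",  # H.264
        "-b:v", "10M",      # inside the 8-12 Mbps range
        out_path,
    ]
    return cmd


# Hypothetical clip names and durations.
clips = [f"scene_{n}.mp4" for n in range(1, 6)]
durations = [6.0] * 5
command = build_ffmpeg_command(clips, durations, "hero_loop.mp4")
print(" ".join(command))
```

Each crossfade shortens the total by the fade length, so five 6-second clips with four 0.5s crossfades come out at 28 seconds. Check the real clip durations (for example with ffprobe) before trusting the offsets.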

The Bottom Line

AI video generation is not "type a paragraph and get a movie." It is closer to directing. You control composition through keyframes and motion through minimal prompts. The less you say in the prompt, the less the model has to hallucinate.

The keyframes are the script. The prompt is just "action" and "cut."
