What We Learned Making AI Video with Google Flow and Veo 3.1

I'm a CTO and founder with nearly two decades of experience driving growth and transformation through technology. At Stronghold Investment Management, I led the development of a systematic real asset trading platform and modernized everything from Salesforce strategy to custom cloud-native infrastructure. My background spans commercial real estate, e-commerce, and private markets — always focused on delivering innovation, velocity, and meaningful business outcomes. I hold a PhD in Theoretical & Computational Biophysics and was recognized as a Google Developer Expert in Cloud. I build high-trust, high-output teams. I’ve rebuilt broken cultures, hired top-tier engineers, and helped early-stage and PE-backed companies scale with confidence. System modernization is my specialty — not just upgrading software, but aligning teams and infrastructure with what the business actually needs. Currently, I lead client engagements through Heavy Chain Engineering and am building Newroots.ai, an AI-driven relocation advisory platform.
We rebuilt our website hero video this week using Google Flow and Veo 3.1. The brief was simple: a 30-second looping b-roll showing a day in the life of a franchise operation — dispatch, truck roll, arrival, proposal, payment.
It took a week of iteration to get something usable. Here is what worked, what did not, and the prompting rules we ended up with.
The Setup: Frames to Video
Flow's "Frames to Video" feature lets you provide a start frame and an end frame. The model generates the video that transitions between them. Chain five of these together and you get a continuous loop.
The key insight: each scene shares a keyframe with its neighbors. The end frame of Scene 1 is the start frame of Scene 2. That gives you continuity across the whole loop without asking the model to maintain consistency over a long sequence — something it is bad at.
Frame A → [Scene 1] → Frame B → [Scene 2] → Frame C → [Scene 3] → Frame D → ...
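As a rough sketch of that chaining, the snippet below pairs consecutive keyframes into scenes. The `Scene` type, the `chain_scenes` helper, and the file names are hypothetical, and closing the loop by reusing the first keyframe as the final end frame is an assumption about how the loop is made seamless, not something Flow does for you.

```python
from typing import NamedTuple


class Scene(NamedTuple):
    name: str
    start_frame: str
    end_frame: str


def chain_scenes(keyframes: list[str], loop: bool = True) -> list[Scene]:
    """Pair consecutive keyframes into scenes that share boundary frames."""
    if len(keyframes) < 2:
        raise ValueError("need at least two keyframes")
    # Assumption: for a loop, the last scene ends on the first keyframe.
    count = len(keyframes) if loop else len(keyframes) - 1
    return [
        Scene(f"Scene {i + 1}", keyframes[i], keyframes[(i + 1) % len(keyframes)])
        for i in range(count)
    ]


# Hypothetical file names, one keyframe per scene boundary.
scenes = chain_scenes(
    ["frame_a.png", "frame_b.png", "frame_c.png", "frame_d.png", "frame_e.png"]
)
for scene in scenes:
    print(f"{scene.name}: {scene.start_frame} -> {scene.end_frame}")
```

Five keyframes yield five scenes here, and each scene's end frame is the next scene's start frame, so no single generation has to stay consistent for more than one clip.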
Mistake #1: Over-Prompting
Our first attempt used detailed narrative prompts. 150+ words per scene. Character actions, dialogue cues, specific movements, environmental details.
The result was unwatchable. Characters walked into walls that materialized from nowhere. Doors appeared in solid walls. People morphed between genders mid-scene. Objects teleported. It looked like the model was on hallucinogens.
The problem: when you describe things that are not in the keyframe images, the model invents them. And it invents them badly.
We told it "she stands up and walks toward the door." There was no door in the frame. So Veo hallucinated one — in the middle of a wall, at the wrong scale, with impossible lighting.
Mistake #2: Describing What's Already Visible
The keyframes already show the model what the scene looks like. Your prompt does not need to describe the composition, the characters, the setting, or the lighting. The images handle all of that.
When you re-describe what is in the image via text, you create conflicts. The model tries to reconcile your text description with the image and produces artifacts. "Woman in navy polo at desk with monitors" — if that is already in the start frame, saying it again in the prompt just introduces noise.
What Actually Works
After a week of iteration, here is the formula that produces clean, professional-looking video.
1. Let the Keyframes Do the Heavy Lifting
Generate your keyframe images separately. We used Nano Banana. Make them look exactly like the shot you want. Match the lighting direction between consecutive frames. Keep character clothing and props consistent.
The keyframes are your storyboard. They define composition, color, characters, and setting. The prompt only defines what happens between them.
2. Keep Prompts Under 30 Words
The best results came from prompts like:
The van drives through a quiet suburban neighborhood. Golden hour light through the trees. Slow tracking shot. No audio.
The technician continues up the walkway toward the front door. Evening light. Steady follow camera. No audio.
That is it. No character descriptions. No invented actions. No environmental details the frames already show.
3. Use Camera Language, Not Narrative Language
Words that work: slow tracking shot, steady camera, gentle drift, slow motion, push in, pull back, hold.
Words that cause hallucinations: walks toward, picks up, opens the door, turns around, stands up, puts down.
Every verb that implies a character doing something specific is a hallucination risk. The model will attempt the action and fail visually. Camera verbs are safe because they describe how the virtual camera moves — something the model is actually good at.
4. Specify "No Audio" If You Don't Want Audio
Veo 3.1 generates synchronized audio by default. If the prompt does not say otherwise, it will add ambient sound, dialogue, or music of its own choosing. For background use like a website hero, always include "No audio" in the prompt.
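Rules 2 through 4 are mechanical enough to check before spending a generation. Here is a minimal sketch of a prompt linter; `lint_prompt` and its constants are hypothetical names, the phrase list is just the examples from rule 3, and plain substring matching is a crude stand-in for real judgment.

```python
MAX_WORDS = 30  # rule 2: keep prompts under 30 words

# Rule 3: the action phrases that caused hallucinations for us.
RISKY_PHRASES = (
    "walks toward",
    "picks up",
    "opens the door",
    "turns around",
    "stands up",
    "puts down",
)


def lint_prompt(prompt: str, want_audio: bool = False) -> list[str]:
    """Return a list of warnings; an empty list means the prompt passes."""
    warnings = []
    word_count = len(prompt.split())
    if word_count >= MAX_WORDS:
        warnings.append(f"{word_count} words; keep it under {MAX_WORDS}")
    lowered = prompt.lower()
    for phrase in RISKY_PHRASES:
        if phrase in lowered:
            warnings.append(f"action phrase '{phrase}' is a hallucination risk")
    # Rule 4: ask for silence explicitly unless audio is wanted.
    if not want_audio and "no audio" not in lowered:
        warnings.append('missing "No audio"')
    return warnings


good = (
    "The van drives through a quiet suburban neighborhood. "
    "Golden hour light through the trees. Slow tracking shot. No audio."
)
bad = "She stands up and walks toward the door."

print(lint_prompt(good))
print(lint_prompt(bad))
```

The first prompt passes cleanly. The second trips two action phrases and the missing "No audio" check.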
5. Match Lighting Between Keyframes
In our experience, the most common cause of flicker and visual artifacts is inconsistent lighting between start and end frames. If your start frame has golden hour light from the left, your end frame needs the same. When the lighting direction changes, the model produces unstable, flickery transitions.
6. The Hardest Transition Is Interior ↔ Exterior
Going from an indoor scene to an outdoor scene (or vice versa) in a single generated clip is extremely difficult for current AI video models. The lighting, color temperature, and depth of field change so dramatically that the model cannot interpolate smoothly.
Our fix: keep interior-to-exterior transitions at scene boundaries where you will add a crossfade in post-production anyway. Do not ask a single clip to handle that transition.
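One way to catch both problems before generating is to hand-tag each keyframe and compare the tags per clip. This is only a sketch under that assumption: the `Keyframe` fields, the tag values, and the file names are all hypothetical, and nothing here inspects the actual images.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyframe:
    image: str
    light_from: str  # hand-assigned tag, e.g. "left" or "right"
    location: str    # hand-assigned tag: "interior" or "exterior"


def check_clip(start: Keyframe, end: Keyframe) -> list[str]:
    """Flag start/end pairs that ask one clip to do too much."""
    problems = []
    if start.light_from != end.light_from:
        problems.append(
            f"lighting changes from {start.light_from} to {end.light_from}"
        )
    if start.location != end.location:
        problems.append(
            f"{start.location} to {end.location} in one clip; "
            "split here and crossfade in post"
        )
    return problems


office = Keyframe("dispatch.png", light_from="left", location="interior")
street = Keyframe("van_street.png", light_from="left", location="exterior")
driveway = Keyframe("van_driveway.png", light_from="left", location="exterior")

print(check_clip(office, street))    # interior to exterior: flagged
print(check_clip(street, driveway))  # same light, same location: clean
```

When a pair is flagged for location, the shared-keyframe rule is deliberately broken at that boundary: the interior clip ends on its own frame, the exterior clip starts on a different one, and the crossfade covers the jump.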
The Final Workflow
Write the shot list (what each scene shows).
Generate keyframe images — one per scene boundary.
Review keyframes for lighting consistency and character continuity.
Write minimal prompts: under 30 words, camera motion only.
Generate each scene in Flow using start/end frames.
Chain clips in iMovie or DaVinci Resolve with 0.5s crossfades.
Strip audio:
ffmpeg -i input.mp4 -an -c:v copy output.mp4
Export as MP4, H.264, 8-12 Mbps.
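Crossfades overlap adjacent clips, so the finished loop is shorter than the sum of its parts. The sketch below works out that arithmetic and, for anyone who would rather script the join than use an editor, builds a filter graph for ffmpeg's `xfade` filter. The helper names are hypothetical, the 8-second clip length is an assumption, and we did our own joins in an editor, so treat the filter string as untested against real footage.

```python
def crossfade_offsets(durations: list[float], fade: float = 0.5) -> list[float]:
    """Start time of each crossfade on the output timeline, in seconds."""
    offsets = []
    elapsed = 0.0
    for k, clip_length in enumerate(durations[:-1], start=1):
        elapsed += clip_length
        # Each earlier crossfade has already shortened the timeline by `fade`.
        offsets.append(elapsed - k * fade)
    return offsets


def total_duration(durations: list[float], fade: float = 0.5) -> float:
    """Length of the joined video once the overlaps are removed."""
    if not durations:
        return 0.0
    return sum(durations) - fade * (len(durations) - 1)


def xfade_filter(durations: list[float], fade: float = 0.5) -> str:
    """Build a -filter_complex string that chains xfade across the inputs."""
    parts = []
    previous = "[0:v]"
    for i, offset in enumerate(crossfade_offsets(durations, fade), start=1):
        label = f"[v{i}]"
        parts.append(
            f"{previous}[{i}:v]xfade=transition=fade:"
            f"duration={fade:g}:offset={offset:g}{label}"
        )
        previous = label
    return ";".join(parts)


clips = [8, 8, 8, 8, 8]  # assumed clip lengths in seconds
print(crossfade_offsets(clips))
print(total_duration(clips))
print(xfade_filter(clips))
```

Five 8-second clips with four 0.5s crossfades come out at 38 seconds, so plan clip lengths with the overlap in mind if you are targeting a fixed runtime. The last label in the filter string (`[v4]` for five clips) is what you would pass to `-map`, and `xfade` expects all inputs to share resolution and frame rate.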
The Bottom Line
AI video generation is not "type a paragraph and get a movie." It is closer to directing. You control composition through keyframes and motion through minimal prompts. The less you say in the prompt, the less the model has to hallucinate.
The keyframes are the script. The prompt is just "action" and "cut."





