Veo 3.1 Prompt Guide: What's New

Veo 3.1 is Google's newest Veo model, and it rewards a slightly different prompting style than Veo 3. The fundamentals carry over, but three areas — audio direction, scene control, and consistency — got sharper. Here is how to prompt for them.

Veo 3 vs Veo 3.1 at a glance

AreaVeo 3Veo 3.1
Audio directionNative synchronized audioFiner control over voice tone and ambience
Scene controlStrong single-shot framingBetter at holding a described scene and staging
ConsistencyGood with careful descriptionImproved character and object persistence
Model IDsveo-3.0-generate-001, veo-3.0-fast-generate-001veo-3.1-generate-001, veo-3.1-fast-generate-001, veo-3.1-lite-generate-001

Note: there is no "Veo 4" — Veo 3.1 is the current newest model. See the model reference for the full lineup.

1. Audio direction got more precise

Veo 3 introduced native synchronized audio; Veo 3.1 lets you steer it. Instead of a vague "background music," describe tone, pacing and source. The model places dialogue, ambience and music as separate layers more reliably.

A barista chats with a customer at a cafe counter, warm spoken dialogue in a friendly tone, soft espresso-machine hiss and low indie music in the background, 50mm lens, warm window light, cozy realistic mood.

2. Scene control holds your staging

Veo 3.1 keeps described elements in their places across the clip. You can stage a scene — what is foreground, midground and background — and trust it to stay put as the camera moves.

A wooden desk in the foreground with an open notebook, a window with rain behind it in the midground, and a bookshelf in the background, slow dolly-in past the notebook, soft overcast light, calm study mood, gentle rain audio.

3. Consistency for characters and objects

Veo 3.1 is better at keeping a character looking like the same person. Lock identity with specific, repeatable descriptors — wardrobe, hair, distinguishing features — and reuse the exact phrasing across clips for a series.

A man with short curly black hair, a green corduroy jacket and round glasses laughs at a kitchen table, medium close-up, soft daylight, natural warm grade, friendly spoken audio. (Reuse this exact description in the next clip to keep him consistent.)

For the tightest match, start from a reference image — see our image-to-video prompts guide.

Which Veo 3.1 variant should you use?

veo-3.1-generate-001

Highest quality, up to 1080p in 16:9. Use for finals.

veo-3.1-fast-generate-001

Faster, cheaper iterations. Use while you refine phrasing.

veo-3.1-lite-generate-001

Most economical; outputs 720p. Great for volume and social.

All variants share the same pay-as-you-go pricing of 1 credit per $0.01 on veo3gen, with durations of 4, 6 or 8 seconds. Map quality and budget on the pricing page.

Updated workflow for Veo 3.1

Draft in Fast. Lock the composition and audio layers in veo-3.1-fast-generate-001.

Separate your audio cues. Name dialogue, ambience and music distinctly so the model layers them.

Reuse identity phrasing. Copy the exact character description between clips for a consistent series.

Finalize in Quality. Re-run the winner in veo-3.1-generate-001 at 1080p 16:9.

Try Veo 3.1 now

Pick a 3.1 variant in the generator and put the new audio and scene controls to work.