Overview

Generate audio that matches a given video and text prompt using ThinkSound.
  • Output Type: video
  • Estimated Cost: 0.5 * duration credits
  • Handler: replicate
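
As a rough illustration of the cost formula above, the following sketch computes the estimated credits for a clip. The function name `estimate_credits` is hypothetical, and the unit of `duration` is an assumption (seconds), since the page does not state it.

```python
# Multiplier taken from the "Estimated Cost: 0.5 * duration credits" line.
CREDITS_PER_DURATION_UNIT = 0.5


def estimate_credits(duration: float) -> float:
    """Return the estimated credit cost for a video of the given duration.

    ASSUMPTION: `duration` is measured in seconds; the source page does not
    specify the unit.
    """
    if duration < 0:
        raise ValueError("duration must be non-negative")
    return CREDITS_PER_DURATION_UNIT * duration


if __name__ == "__main__":
    # A 10-unit clip costs 0.5 * 10 credits.
    print(estimate_credits(10))
```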

Parameters

Required Parameters

video
video
required
URL of the base video

Optional Parameters

caption
string
Brief description of the video content
A short caption describing what’s happening in the video to help the model understand the context.
  • Label: Caption
cot
string
Detailed description of the sound generation process
A detailed description that begins with the sound process and includes texture, atmosphere, and timing details. This helps the model understand exactly what audio to generate and how it should match the video.
  • Label: Chain of thought
cfg_scale
float
default:5
Classifier-free guidance scale
Controls how closely the model follows the prompt. Higher values mean stricter adherence to the prompt.
  • Label: CFG Scale
  • Minimum: 1
  • Maximum: 20
num_inference_steps
integer
default:24
Number of inference steps
More steps generally produce higher quality but take longer to generate.
  • Label: Inference steps
  • Minimum: 10
  • Maximum: 50
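
The parameters above can be assembled and range-checked before a request is sent. This is a minimal sketch: the helper `build_thinksound_input` is hypothetical and not part of any documented client, and the example URL is a placeholder. Only the parameter names, defaults, and limits come from this page.

```python
from typing import Any, Dict, Optional


def build_thinksound_input(
    video: str,
    caption: Optional[str] = None,
    cot: Optional[str] = None,
    cfg_scale: float = 5.0,
    num_inference_steps: int = 24,
) -> Dict[str, Any]:
    """Validate the documented parameters and return them as a dict.

    Hypothetical helper: how the dict is submitted depends on the client
    you use and is not shown here.
    """
    if not video:
        raise ValueError("video (URL of the base video) is required")
    # Documented range for cfg_scale: 1 to 20.
    if not 1 <= cfg_scale <= 20:
        raise ValueError("cfg_scale must be between 1 and 20")
    # Documented range for num_inference_steps: 10 to 50 (integer).
    if isinstance(num_inference_steps, bool) or not isinstance(num_inference_steps, int):
        raise TypeError("num_inference_steps must be an integer")
    if not 10 <= num_inference_steps <= 50:
        raise ValueError("num_inference_steps must be between 10 and 50")

    payload: Dict[str, Any] = {
        "video": video,
        "cfg_scale": float(cfg_scale),
        "num_inference_steps": num_inference_steps,
    }
    # Optional text fields are included only when provided.
    if caption:
        payload["caption"] = caption
    if cot:
        payload["cot"] = cot
    return payload


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    print(build_thinksound_input("https://example.com/clip.mp4", caption="Rain on a tin roof"))
```

Validating locally catches out-of-range values before any credits are spent on a request that would be rejected.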