POST /api/v1/task/submit
curl --request POST \
  --url https://api.maxapi.io/api/v1/task/submit \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "veo3",
  "callBackUrl": "https://example.com/webhook",
  "input": {
    "prompt": "Ocean waves crashing on a rocky shore at golden hour",
    "ratio": "16:9",
    "resolution": "1080p"
  }
}'
{
  "code": 0,
  "msg": "ok",
  "data": {
    "taskId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  }
}
Use Google Veo 3.1 to generate high-quality videos from text descriptions. Supports multiple speed modes.

Available Models

veo3 · veo3_fast

Speed Modes

veo3: Best generation quality. Recommended for most use cases.
veo3_fast: Optimized for speed. Ideal for time-sensitive scenarios.
model (string, required): Model name, e.g. veo3, veo3_fast.
callBackUrl (string): Webhook callback URL.
input (object, required)
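
The same request can be assembled programmatically. The following is a minimal Python sketch using only the standard library, not an official client. The helper names (build_submit_request, extract_task_id, submit) are hypothetical, and treating any non-zero code as a failure is an assumption, since this page only shows a response with code 0.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
SUBMIT_URL = "https://api.maxapi.io/api/v1/task/submit"


def build_submit_request(token, prompt, model="veo3", ratio="16:9",
                         resolution="1080p", callback_url=None):
    """Build (but do not send) the POST request for a generation task."""
    payload = {
        "model": model,
        "input": {"prompt": prompt, "ratio": ratio, "resolution": resolution},
    }
    if callback_url is not None:
        # callBackUrl is not marked as required, so it is only sent when given.
        payload["callBackUrl"] = callback_url
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_task_id(body):
    """Return data.taskId from a decoded submit response."""
    if body.get("code") != 0:
        # Assumption: a non-zero code signals an error.
        raise RuntimeError(f"submit failed: {body.get('msg')}")
    return body["data"]["taskId"]


def submit(token, prompt, **kwargs):
    """Send the request and return the task ID (performs a network call)."""
    req = build_submit_request(token, prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_task_id(json.load(resp))
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made; switching to the faster mode would only require passing model="veo3_fast".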

Query Result

Retrieve the generation result via the Query Task endpoint or Webhook:
{
  "code": 0,
  "msg": "ok",
  "data": {
    "taskId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "status": "SUCCESS",
    "input": {
      "model": "veo3",
      "prompt": "Ocean waves crashing on a rocky shore at golden hour",
      "ratio": "16:9",
      "resolution": "1080p"
    },
    "result": {
      "type": "video",
      "urls": [
        "https://example.com/output/video-a1b2c3d4.mp4"
      ]
    },
    "created_at": "2026-02-12T10:00:00.000000Z",
    "updated_at": "2026-02-12T10:02:30.000000Z"
  }
}
Veo 3 returns one video per generation, typically 5-8 seconds long. No mode parameter is needed: the system auto-detects the mode based on the number of input images.
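
To illustrate how a result payload of the shape shown above might be consumed, here is a small Python sketch. The function name video_urls is hypothetical. The page documents only the SUCCESS status, so treating every other status as "not ready yet" is an assumption, as is raising on a non-zero code.

```python
import json


def video_urls(payload):
    """Return the result URLs from a task payload, or [] if it has not succeeded."""
    if isinstance(payload, (str, bytes)):
        payload = json.loads(payload)  # accept a raw webhook or response body
    if payload.get("code") != 0:
        # Assumption: a non-zero code signals an error.
        raise RuntimeError(f"query failed: {payload.get('msg')}")
    data = payload.get("data") or {}
    if data.get("status") != "SUCCESS":
        # Assumption: any other status means the video is not available.
        return []
    result = data.get("result") or {}
    return list(result.get("urls", []))


sample = {
    "code": 0,
    "msg": "ok",
    "data": {
        "taskId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "status": "SUCCESS",
        "result": {
            "type": "video",
            "urls": ["https://example.com/output/video-a1b2c3d4.mp4"],
        },
    },
}
print(video_urls(sample))
```

Because one video is returned per generation, the list is expected to hold a single URL on success; returning a list anyway mirrors the urls array in the payload.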