Job Submission

Submit jobs with gbatch (similar to Slurm sbatch). You can submit a command directly or run a script.

TIP

Use direct commands for short, single-step work. Switch to a script when the command needs setup, environment activation, or multiple shell steps.

Quick Start

bash
gbatch python train.py
gbatch --gpus 1 --time 2:00:00 --name train-resnet python train.py
gbatch --project ml-research python train.py
gbatch --max-retries 2 python train.py
gbatch --notify-email alice@example.com --notify-on job_failed,job_timeout python train.py

Submit a Command

bash
gbatch python train.py --epochs 100 --lr 0.01

For complex shell logic, prefer a script file.

Submit a Script

bash
cat > train.sh << 'EOF'
#!/bin/bash
# GFLOW --gpus=1
# GFLOW --time=2:00:00

python train.py
EOF

chmod +x train.sh
gbatch train.sh

Supported script directives

Only a small subset of options is parsed from scripts:

  • # GFLOW --gpus=<N>
  • # GFLOW --shared
  • # GFLOW --time=<TIME>
  • # GFLOW --memory=<LIMIT>
  • # GFLOW --gpu-memory=<LIMIT>
  • # GFLOW --priority=<N>
  • # GFLOW --conda-env=<ENV>
  • # GFLOW --depends-on=<job_id|@|@~N> (single dependency only)
  • # GFLOW --project=<CODE>
  • # GFLOW --notify-email=<EMAIL>
  • # GFLOW --notify-on=<EVENT1,EVENT2,...>
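
Because directives are plain comments, you can check what a script declares before submitting it. A quick local inspection with standard shell tools (this is just a sanity check, not a gflow feature):

```bash
# Write a demo script that declares three directives.
cat > demo.sh << 'EOF'
#!/bin/bash
# GFLOW --gpus=1
# GFLOW --time=2:00:00
# GFLOW --conda-env=myenv

python train.py
EOF

# List the directives gbatch would see.
grep '^# GFLOW ' demo.sh | sed 's/^# GFLOW //'
# Prints:
#   --gpus=1
#   --time=2:00:00
#   --conda-env=myenv
```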

INFO

CLI flags override script directives.

Memory Semantics

  • --memory (--max-mem / --max-memory) limits host RAM.
  • --gpu-memory (--max-gpu-mem / --max-gpu-memory) limits per-GPU VRAM.
  • Shared jobs must set both --shared and --gpu-memory.

WARNING

Shared GPU mode is incomplete unless both --shared and --gpu-memory are set.

Common Options

bash
# GPUs
gbatch --gpus 1 python train.py

# Time limit
gbatch --time 30 python quick.py

# Shared GPU mode (must set --gpu-memory)
gbatch --gpus 1 --shared --gpu-memory 20G python train.py

# Priority
gbatch --priority 50 python urgent.py

# Conda env
gbatch --conda-env myenv python script.py

# Project code
gbatch --project ml-research python train.py

# Automatic retries after execution failure
gbatch --max-retries 2 python train.py

# Per-job email notifications
gbatch --notify-email alice@example.com python train.py
gbatch --notify-email alice@example.com --notify-email oncall@example.com --notify-on job_failed,job_timeout python train.py

# Dependencies
gbatch --depends-on <job_id|@|@~N> python next.py
gbatch --depends-on-all 1,2,3 python merge.py
gbatch --depends-on-any 4,5 python process_first_success.py

# Disable auto-cancel on dependency failure
gbatch --depends-on <job_id> --no-auto-cancel python next.py

# Preview without submitting
gbatch --dry-run --gpus 1 python train.py

Dependency shorthands

  • @ refers to the most recently submitted job.
  • @~N refers to the Nth most recent submission. For example, @~1 means the previous submission.
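
The shorthands index into your submission history from the newest entry backwards. A local sketch of that resolution with hypothetical job IDs (gflow performs this lookup for you at submission time):

```bash
# Hypothetical submission history, oldest first, one job ID per line.
history="101
102
103"

last=$(printf '%s\n' "$history" | tail -n 1)              # what @ resolves to: 103
prev=$(printf '%s\n' "$history" | tail -n 2 | head -n 1)  # what @~1 resolves to: 102

echo "@ -> $last, @~1 -> $prev"   # @ -> 103, @~1 -> 102
```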

INFO

Project values are immutable after submission.

INFO

Per-job notifications reuse the SMTP transports configured in Notifications. If you set --notify-email without --notify-on, gflow defaults to terminal events: job_completed, job_failed, job_timeout, and job_cancelled.

Automatic Retries

  • Use --max-retries <N> to allow up to N automatic resubmissions after execution failure.
  • Current behavior is narrow by design: only jobs that exit non-zero from the Running state trigger automatic retries.
  • Timeouts and explicit fail requests stay terminal; use gjob redo when you want a manual resubmission.
  • If a failed job has queued dependents, gflow retargets them to the newest retry attempt automatically.
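
The control flow above can be emulated locally to see when a job becomes terminal. This is a sketch only: the fail-twice stand-in job and the loop are illustrative, not gflow internals.

```bash
max_retries=2
attempt=0

# Stand-in for the job: fails on the first two attempts, then succeeds.
run_job() { [ "$attempt" -ge 2 ]; }

until run_job; do
  attempt=$((attempt + 1))
  if [ "$attempt" -gt "$max_retries" ]; then
    echo "terminal: Failed"      # budget exhausted; no further automatic retries
    break
  fi
  echo "automatic retry $attempt of $max_retries"
done

echo "finished after $attempt retries"   # finished after 2 retries
```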

Job Arrays

bash
gbatch --array 1-10 python process.py --task '$GFLOW_ARRAY_TASK_ID'
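
Note the single quotes around '$GFLOW_ARRAY_TASK_ID': they keep your shell from expanding the variable at submission time, so each task sees its own value at run time. A local emulation of what three tasks would observe (gflow sets the variable for you in a real array job):

```bash
# Emulate tasks 1-3 of an array job; each task gets its own ID.
for GFLOW_ARRAY_TASK_ID in 1 2 3; do
  echo "processing task $GFLOW_ARRAY_TASK_ID"
done
# Prints:
#   processing task 1
#   processing task 2
#   processing task 3
```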

Monitor and Logs

bash
# Jobs and allocations
gqueue -f "JOBID,NAME,ST,NODES,NODELIST(REASON)"

# Details for one job (includes GPUIDs)
gjob show <job_id>

# Logs
tail -f ~/.local/share/gflow/logs/<job_id>.log

Adjust or Resubmit

  • Update queued/held jobs: gjob update <job_id> ...
  • Resubmit a job: gjob redo <job_id> (use --cascade to redo dependents)
