- Example workflow description
- Job Arrays (low overhead)
- SnakeMake (readability prioritized)
- Pegasus (portability prioritized)
- Makeflow (simplicity prioritized)
- Questions
- Lab/Project time
2022-04-15
git clone https://github.com/uschpc/workshop-workflows.git
index.html in browser
examples
wget --no-check-certificate https://g-3d96ec.a78b8.36fe.data.globus.org/symphonie_fantastique.tar (download tar file)
Once you have symphonie_fantastique.tar, you can extract it with tar xfv symphonie_fantastique.tar
data directory
input.flac song file as input
ffmpeg
input.flac
output.mp4
output
output/<song_name>/<song_name>.mp4
output/<song_name>/images/*.png
rosa_fft.py
python3 rosa_fft.py -f 09Teachers.flac -i 172 -d 4 -o 09Teachers
option | meaning |
---|---|
-f | which song file to read from (doesn’t have to be .flac format) |
-i | initial time to start reading data |
-d | how many seconds of song to read (default is 1) |
-o | which directory to save images |
ffmpeg
ffmpeg -threads 8 -framerate 60 -i 16Funk_Ad/images/frame%d.png -i 16Funk_Ad.flac -pix_fmt yuv420p 16Funk_Ad.mp4
option | meaning |
---|---|
-threads | How many CPU threads to use when converting files |
-framerate | How many pictures (frames) per second |
-i | name or name pattern for input file(s) |
-pix_fmt yuv420p | pixel format of the output video (yuv420p plays in most video players) |
output_file.mp4 | output file name |
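The option table above can also be read as a recipe for assembling the command. The following is a minimal sketch in Python, assuming a hypothetical helper named build_ffmpeg_command (it is not part of the workshop repository) and the same file layout as the 16Funk_Ad example.

```python
import shlex


def build_ffmpeg_command(song_name, audio_ext="flac", threads=8, framerate=60):
    """Hypothetical helper: return the ffmpeg argument list for one song."""
    return [
        "ffmpeg",
        "-threads", str(threads),                 # CPU threads to use
        "-framerate", str(framerate),             # pictures per second
        "-i", f"{song_name}/images/frame%d.png",  # image name pattern
        "-i", f"{song_name}.{audio_ext}",         # audio track
        "-pix_fmt", "yuv420p",                    # pixel format
        f"{song_name}.mp4",                       # output file name
    ]


cmd = build_ffmpeg_command("16Funk_Ad")
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to pass to subprocess.run later without worrying about shell quoting.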
rosa_fft_array.slurm:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1GB
#SBATCH --time=00:10:00
#SBATCH --array=0-9

module load python ffmpeg

# How many seconds to process per job
duration=4
time_index=$((SLURM_ARRAY_TASK_ID*duration))

song_file="data/symphonie_fantastique/berlioz_symphonie_2_un_bal_vals.mp3"
out_dir="output/berlioz_symphonie_2_un_bal_vals"

echo "Processing time_index:${time_index}"
# -d is the number of seconds to read, as listed in the rosa_fft.py option table
python3 ./scripts/rosa_fft.py -f ${song_file} \
    -i ${time_index} -d ${duration} -o ${out_dir}
$SLURM_ARRAY_TASK_ID will be any number from 0-9
examples/job_array
python3 ./scripts/rosa_fft.py -f ${song_file} -i ${time_index} -d ${duration} -o ${out_dir}
Variable | meaning | value |
---|---|---|
${song_file} | Song file to process | data/symphonie_fantastique/berlioz_symphonie_2_un_bal_vals.mp3 |
$SLURM_ARRAY_TASK_ID | Which array element are we? | 0-9 |
${duration} | How many seconds to process? | 4 |
${time_index} | Time to start processing | $((SLURM_ARRAY_TASK_ID*duration)) |
${out_dir} | Where to save output data | output/berlioz_symphonie_2_un_bal_vals |
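To see which part of the song each array task handles, the arithmetic in the table can be replayed outside of Slurm. This is a small sketch in Python; the names DURATION, ARRAY_TASK_IDS, and time_index are illustrative stand-ins for the shell variables, not code from the repository.

```python
DURATION = 4                    # seconds per array task (duration in the script)
ARRAY_TASK_IDS = range(0, 10)   # matches #SBATCH --array=0-9


def time_index(task_id, duration=DURATION):
    """Same arithmetic as $((SLURM_ARRAY_TASK_ID*duration))."""
    return task_id * duration


# (start, end) second covered by each array task
windows = [(time_index(t), time_index(t) + DURATION) for t in ARRAY_TASK_IDS]

for task_id, (start, end) in zip(ARRAY_TASK_IDS, windows):
    print(f"task {task_id}: seconds {start}-{end}")
```

With these settings the ten tasks together cover only the first 40 seconds of the song; a longer clip needs a larger array range or a larger duration.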
examples/job_array/ffmpeg.slurm
ffmpeg.slurm:
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1GB
#SBATCH --time=00:10:00

module load ffmpeg

song_file="data/symphonie_fantastique/berlioz_symphonie_2_un_bal_vals.mp3"
frame_dir="output/berlioz_symphonie_2_un_bal_vals/images"
out_file="output/berlioz_symphonie_2_un_bal_vals.mp4"

ffmpeg \
    -framerate 60 -i ${frame_dir}/frame%d.png \
    -i ${song_file} \
    -pix_fmt yuv420p \
    -threads ${SLURM_CPUS_PER_TASK} \
    ${out_file}
ffmpeg.slurm should run only after rosa_fft_array.slurm completes.
With bash we can capture the job id
sbatch prints Submitted batch job XXXXX
Use cut to parse text
examples/job_array/manager.sh:
#!/bin/bash
# Save the 4th word, using ' ' as the delimiter
# AKA the job id
jid=$(sbatch examples/job_array/job_array.slurm | cut -d ' ' -f 4)
# Start the ffmpeg job only after the first job finishes successfully
sbatch --dependency=afterok:${jid} examples/job_array/ffmpeg.slurm
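The same capture-and-chain idea can be sketched in Python for anyone who prefers it over cut. The helper names parse_job_id and dependency_command are hypothetical, and the job id 12345 is made up; the sketch only builds the second command and does not call sbatch.

```python
def parse_job_id(sbatch_output):
    """Return the 4th space-separated word, like `cut -d ' ' -f 4`."""
    words = sbatch_output.strip().split(" ")
    if len(words) < 4 or words[:3] != ["Submitted", "batch", "job"]:
        raise ValueError(f"unexpected sbatch output: {sbatch_output!r}")
    return words[3]


def dependency_command(job_id, script):
    """Build the sbatch call that waits for job_id to finish successfully."""
    return ["sbatch", f"--dependency=afterok:{job_id}", script]


sample = "Submitted batch job 12345\n"   # made-up job id for illustration
jid = parse_job_id(sample)
cmd = dependency_command(jid, "examples/job_array/ffmpeg.slurm")
print(" ".join(cmd))
```

Checking the first three words before trusting the fourth guards against chaining a job onto an error message.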
rule bwa:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        temp("mapped/{sample}.bam")
    conda:
        "envs/mapping.yaml"
    threads: 8
    shell:
        "bwa mem -t {threads} {input} | samtools view -Sb - > {output}"
Makefile used in building software
outputfile(s): inputfile(s)
# Leading whitespace below must be tab character, not spaces!
	command to generate outputfile(s)
CURL=/usr/bin/curl
CONVERT=/usr/bin/convert
URL="http://ccl.cse.nd.edu/images/capitol.jpg"

capitol.anim.gif: capitol.jpg capitol.90.jpg capitol.180.jpg capitol.270.jpg capitol.360.jpg
	LOCAL $(CONVERT) -delay 10 -loop 0 capitol.jpg capitol.90.jpg capitol.180.jpg capitol.270.jpg capitol.360.jpg capitol.270.jpg capitol.180.jpg capitol.90.jpg capitol.anim.gif

capitol.90.jpg: capitol.jpg
	$(CONVERT) -swirl 90 capitol.jpg capitol.90.jpg

capitol.180.jpg: capitol.jpg
	$(CONVERT) -swirl 180 capitol.jpg capitol.180.jpg

capitol.270.jpg: capitol.jpg
	$(CONVERT) -swirl 270 capitol.jpg capitol.270.jpg

capitol.360.jpg: capitol.jpg
	$(CONVERT) -swirl 360 capitol.jpg capitol.360.jpg

capitol.jpg:
	LOCAL $(CURL) -o capitol.jpg $(URL)
./berlioz_symphonie_1_reveries_pa/berlioz_symphonie_1_reveries_pa.mp4: ./berlioz_symphonie_1_reveries_pa/images/frame000000000.png ./berlioz_symphonie_1_reveries_pa/images/frame000000001.png ./berlioz_symphonie_1_reveries_pa/images/frame000000002.png ./berlioz_symphonie_1_reveries_pa/images/frame000000003.png ./berlioz_symphonie_1_reveries_pa/images/frame000000004.png ./berlioz_symphonie_1_reveries_pa/images/frame000000005.png ./berlioz_symphonie_1_reveries_pa/images/frame000000006.png ./berlioz_symphonie_1_reveries_pa/images/frame000000007.png ./berlioz_symphonie_1_reveries_pa/images/frame000000008.png ./berlioz_symphonie_1_reveries_pa/images/frame000000009.png ./berlioz_symphonie_1_reveries_pa/images/frame000000010.png ./berlioz_symphonie_1_reveries_pa/images/frame000000011.png ./berlioz_symphonie_1_reveries_pa/images/frame000000012.png ./berlioz_symphonie_1_reveries_pa/images/frame000000013.png ./berlioz_symphonie_1_reveries_pa/images/frame000000014.png ./berlioz_symphonie_1_reveries_pa/images/frame000000015.png
	ffmpeg -threads 4 -framerate 60 -i Homework/04Da_Funk/images/frame%09d.png -i data/Homework/04Da_Funk.flac -pix_fmt yuv420p Homework/04Da_Funk/04Da_Funk.mp4

./berlioz_symphonie_1_reveries_pa/images/frame000000000.png ./berlioz_symphonie_1_reveries_pa/images/frame000000001.png ./berlioz_symphonie_1_reveries_pa/images/frame000000002.png ./berlioz_symphonie_1_reveries_pa/images/frame000000003.png ./berlioz_symphonie_1_reveries_pa/images/frame000000004.png ./berlioz_symphonie_1_reveries_pa/images/frame000000005.png ./berlioz_symphonie_1_reveries_pa/images/frame000000006.png ./berlioz_symphonie_1_reveries_pa/images/frame000000007.png ./berlioz_symphonie_1_reveries_pa/images/frame000000008.png ./berlioz_symphonie_1_reveries_pa/images/frame000000009.png ./berlioz_symphonie_1_reveries_pa/images/frame000000010.png ./berlioz_symphonie_1_reveries_pa/images/frame000000011.png ./berlioz_symphonie_1_reveries_pa/images/frame000000012.png ./berlioz_symphonie_1_reveries_pa/images/frame000000013.png:
	python3 /scripts/rosa_fft.py -f /project/hpcroot/csul/workshop-workflows/data/symphonie_fantastique/berlioz_symphonie_1_reveries_pa.mp3 -i 1 -o ./berlioz_symphonie_1_reveries_pa
"rules":[ { "outputs":["input.txt"], "command":"echo \"Hello Makeflow!\" > input.txt", "local_job":true, }, { "outputs":[format("output.%d",i)], "inputs":["simulation.py","input.txt"], "command":format("./simulation.py %d < input.txt > output.%d", i, i), } for i in range(1,5) ], }
"define": { "OUTDIR":"output/berlioz_symphonie_2_un_bal_vals/images", "SONGNAME" : "berlioz_symphonie_2_un_bal_vals", "FFMPEG_THREADS" : 4, "FRAME_RATE" : 60, "SONG_DURATION": 370 }, "rules":[ # Merge all frames into 1 video { "command" : format("ffmpeg -threads %d -framerate %d -i output/%s/images/frame\%%d.png -i ../../data/symphonie_fantastique/%s.mp3 output/%s/%s.mp4",FFMPEG_THREADS,FRAME_RATE,SONGNAME,SONGNAME,SONGNAME,SONGNAME), "inputs" : [ format("output/%s/images/frame%d.png",SONGNAME, x) for x in range(0,SONG_DURATION*FRAME_RATE)], "output" : [format("output/%s/%s.mp4",SONGNAME,SONGNAME)], },{ "command" : format("python3 ../../scripts/rosa_fft.py -f ../../data/symphonie_fantastique/%s.mp3 -i "+N+" -o output/%s",SONGNAME,SONGNAME), "inputs" : ["../../scripts/rosa_fft.py" , format("../../data/symphonie_fantastique/%s.mp3",SONGNAME)], "outputs": [ format("%s/frame%d.png",OUTDIR,x) for x in range(N*FRAME_RATE,(N+1)*FRAME_RATE) ] } for N in range(0,370) ] }
"define": { "OUTDIR":"output/berlioz_symphonie_2_un_bal_vals/images", "SONGNAME":"berlioz_symphonie_2_un_bal_vals", "FFMPEG_THREADS":4, "FRAME_RATE":60, "SONG_DURATION":370 }, "rules": [ { "command":"ffmpeg -threads 4 -framerate 60 -i output/berlioz_symphonie_2_un_bal_vals/images/frame%d.png -i ../../data/symphonie_fantastique/berlioz_symphonie_2_un_bal_vals.mp3 output/berlioz_symphonie_2_un_bal_vals/berlioz_symphonie_2_un_bal_vals.mp4", "inputs": [ "output/berlioz_symphonie_2_un_bal_vals/images/frame0.png", "output/berlioz_symphonie_2_un_bal_vals/images/frame1.png", . . . "output/berlioz_symphonie_2_un_bal_vals/images/frame22199.png" ], "output": "output/berlioz_symphonie_2_un_bal_vals/berlioz_symphonie_2_un_bal_vals.mp4" }
salloc --ntasks=1 --cpus-per-task=8 --time=1:00:00 --mem-per-cpu=2GB
examples/clip/berlioz_symphonie_2_un_bal_vals.jx
$ makeflow --jx clip_berlioz_symphonie_2_un_bal_vals.jx
parsing clip_berlioz_symphonie_2_un_bal_vals.jx...
local resources: 32 cores, 191863 MB memory, 1438549300 MB disk
max running local jobs: 32
checking clip_berlioz_symphonie_2_un_bal_vals.jx for consistency...
clip_berlioz_symphonie_2_un_bal_vals.jx has 4 rules.
creating new log file clip_berlioz_symphonie_2_un_bal_vals.jx.makeflowlog...
checking files for unexpected changes... (use --skip-file-check to skip this step)
starting workflow....
submitting job: python3 ../../scripts/rosa_fft.py -f ../../data/symphonie_fantastique/berlioz_symphonie_2_un_bal_vals.mp3 -i 111 -o output/berlioz_symphonie_2_un_bal_vals
submitted job 211569
submitting job: python3 ../../scripts/rosa_fft.py -f ../../data/symphonie_fantastique/berlioz_symphonie_2_un_bal_vals.mp3 -i 110 -o output/berlioz_symphonie_2_un_bal_vals
submitted job 211570
submitting job: python3 ../../scripts/rosa_fft.py -f ../../data/symphonie_fantastique/berlioz_symphonie_2_un_bal_vals.mp3 -i 109 -o output/berlioz_symphonie_2_un_bal_vals
submitted job 211571
job 211569 completed
job 211570 completed
job 211571 completed
submitting job: ffmpeg -threads 4 -framerate 60 -start_number 6540 -i output/berlioz_symphonie_2_un_bal_vals/images/frame%d.png -ss 00:01:49 -to 00:01:52 -i ../../data/symphonie_fantastique/berlioz_symphonie_2_un_bal_vals.mp3 output/berlioz_symphonie_2_un_bal_vals/berlioz_symphonie_2_un_bal_vals.mp4
submitted job 216156
job 216156 completed
deleted makeflow.failed.0
nothing left to do.
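The ffmpeg line in the log uses -start_number 6540 with -ss 00:01:49 and -to 00:01:52, which matches a 3-second clip starting at second 109 at 60 frames per second. The Python sketch below reproduces that arithmetic; clip_options and hms are hypothetical helper names, and how the clip workflow actually derives these values is an assumption.

```python
def hms(seconds):
    """Format a whole number of seconds as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def clip_options(first_second, n_seconds, frame_rate=60):
    """Values for -start_number, -ss and -to for a clip of the song."""
    return {
        "start_number": first_second * frame_rate,  # first frame of the clip
        "ss": hms(first_second),                    # audio start time
        "to": hms(first_second + n_seconds),        # audio end time
    }


opts = clip_options(109, 3)
print(opts)
```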
examples/clips/output
/<song_name>/<song_name>.mp4
/<song_name>/images/*.png
images directory contains intermediate files
makeflow --clean=intermediates --jx examples/clip/01Daftendirekt.jx
$ makeflow --clean=intermediates --jx clip_berlioz_symphonie_2_un_bal_vals.jx
parsing clip_berlioz_symphonie_2_un_bal_vals.jx...
local resources: 32 cores, 191863 MB memory, 1438503624 MB disk
max running local jobs: 32
checking clip_berlioz_symphonie_2_un_bal_vals.jx for consistency...
clip_berlioz_symphonie_2_un_bal_vals.jx has 4 rules.
recovering from log file clip_berlioz_symphonie_2_un_bal_vals.jx.makeflowlog...
checking for old running or failed jobs...
checking files for unexpected changes... (use --skip-file-check to skip this step)
cleaning filesystem...
deleted output/berlioz_symphonie_2_un_bal_vals/images/frame6616.png
deleted output/berlioz_symphonie_2_un_bal_vals/images/frame6578.png
deleted output/berlioz_symphonie_2_un_bal_vals/images/frame6646.png
deleted output/berlioz_symphonie_2_un_bal_vals/images/frame6559.png
projects/project1/job_array.slurm_template
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1GB
#SBATCH --time=00:10:00
#SBATCH --array=0-9

module load python ffmpeg libsndfile

# How many seconds to process per job
duration=4
time_index=$((SLURM_ARRAY_TASK_ID*duration))

song_file="XXX.mp3"
out_dir="output/XXX"

echo "Processing time_index:${time_index}"
# -d is the number of seconds to read, as listed in the rosa_fft.py option table
python3 ./scripts/rosa_fft.py -f ${song_file} -i ${time_index} -d ${duration} -o ${out_dir}
sed 's/XXX/cool_song/g' job_array.slurm_template will find every XXX and replace it with cool_song.
The result is printed to standard output, so redirect it to a new file to save it.
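The same placeholder substitution can be sketched in Python, which may be convenient when generating scripts for many songs. The template string below is a shortened stand-in for job_array.slurm_template, and fill_template is a hypothetical helper name.

```python
# Shortened stand-in for the two template lines that contain the placeholder
template = 'song_file="XXX.mp3"\nout_dir="output/XXX"\n'


def fill_template(text, song_name, placeholder="XXX"):
    """Replace every occurrence of the placeholder, like sed 's/XXX/.../g'."""
    return text.replace(placeholder, song_name)


filled = fill_template(template, "cool_song")
print(filled, end="")
```

To process several songs, call fill_template once per song name and write each result to its own file.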
See examples/job_array as needed
Project was based on an earlier version of this presentation; the start point is not working right now :(
projects/project2/generate_makefile.py
makeflow -T wq projects/project3/berlioz_symphonie_1_reveries_pa.makeflow -p $PORT
export PORT=8888
$PORT
parsing makeflows/Blitz_it.makeflow...
local resources: 20 cores, 64334 MB memory, 5566556492 MB disk
max running remote jobs: 1000
max running local jobs: 20
checking makeflows/Blitz_it.makeflow for consistency...
makeflows/Blitz_it.makeflow has 72 rules.
submitted job 43
submitting job: python3 ./scripts/rosa_fft.py -f data/chirpy/Blitz_it.wav -i 27 -o ./Blitz_it
submitted job 44
submitting job: python3 ./scripts/rosa_fft.py -f data/chirpy/Blitz_it.wav -i 26 -o ./Blitz_it
submitted job 45
submitting job: python3 ./scripts/rosa_fft.py -f data/chirpy/Blitz_it.wav -i 25 -o ./Blitz_it
submitted job 46
submitting job: python3 ./scripts/rosa_fft.py -f data/chirpy/Blitz_it.wav -i 24 -o ./Blitz_it
submitted job 47
submitting job: python3 ./scripts/rosa_fft.py -f data/chirpy/Blitz_it.wav -i 23 -o ./Blitz_it
submitted job 48
submitting job: python3 ./scripts/rosa_fft.py -f data/chirpy/Blitz_it.wav -i 22 -o ./Blitz_it
submitted job 49
submitting job: python3 ./scripts/rosa_fft.py -f data/chirpy/Blitz_it.wav -i 21 -o ./Blitz_it
submitted job 50
submitting job: python3 ./scripts/rosa_fft.py -f data/chirpy/Blitz_it.wav -i 20 -o ./Blitz_it
srun work_queue_worker $HOSTNAME:$PORT projects/project3/berlioz_symphonie_1_reveries_pa.makeflow
$HOSTNAME must be the hostname where the job manager process is running
$PORT must be the same port that the manager is listening on
work_queue_worker: creating workspace /scratch/csul/makeflow/example/worker-268648-412
work_queue_worker: creating workspace /scratch/csul/makeflow/example/worker-268648-416
work_queue_worker: creating workspace /scratch/csul/makeflow/example/worker-268648-406
work_queue_worker: creating workspace /scratch/csul/makeflow/example/worker-268648-404
work_queue_worker: using 24 cores, 193123 MB memory, 185757288 MB disk, 0 gpus
connected to manager d06-15.hpc.usc.edu:8080 via local address 10.125.19.236:57664
work_queue_worker: using 24 cores, 193123 MB memory, 185757288 MB disk, 0 gpus
connected to manager d06-15.hpc.usc.edu:8080 via local address 10.125.19.236:57668
work_queue_worker: using 24 cores, 193123 MB memory, 185757288 MB disk, 0 gpus
connected to manager d06-15.hpc.usc.edu:8080 via local address 10.125.19.236:57672
work_queue_worker: using 24 cores, 193123 MB memory, 185757284 MB disk, 0 gpus
connected to manager d06-15.hpc.usc.edu:8080 via local address 10.125.19.234:59316