How do I make HunyuanVideo faster and/or use less VRAM?
I'm on Pop!_OS Linux with an RTX 3060 12GB, running the 12GB-VRAM workflow in ComfyUI inside Docker (Jammy base image). I'm using the old fp8 model with the bf16 VAE, clip-vit-large-patch-14, and the llava fp8 scaled text encoder. I have the FastHunyuanVideo LoRA but haven't tried it yet.
I've heard of SageAttention 2, Triton, and torch.compile, but I'm not sure how to install them. So far I've just been pip-installing individual packages in the Dockerfile and putting git clones of custom nodes there as well. VideoHelperSuite failed to import, so I use the native animated-WebP node instead. I'd use the Hunyuan wrapper if there were an alternative node for the LLM, the way there is for the CLIP.
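For context, this is roughly what I mean by installing things in the Dockerfile. This is a hedged sketch, not a tested build: it assumes a base image that already has ComfyUI and a CUDA-enabled PyTorch, and `/ComfyUI` is a placeholder for wherever ComfyUI actually lives in the image. The PyPI `sageattention` wheel is the v1 package; my understanding is that SageAttention 2 has to be built from its source repo.

```dockerfile
# Triton is needed by torch.compile and by SageAttention's fused kernels.
# (Recent PyTorch Linux wheels may already bundle it.)
RUN pip install --no-cache-dir triton

# SageAttention v1 from PyPI; v2 likely needs a source build instead.
RUN pip install --no-cache-dir sageattention

# Custom nodes: clone into ComfyUI's custom_nodes folder AND install the
# node pack's own requirements.txt -- skipping that step is a common cause
# of "failed to import" errors like the one I get from VideoHelperSuite.
RUN git clone https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite \
      /ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite && \
    pip install --no-cache-dir \
      -r /ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite/requirements.txt
```

If the requirements step is what I've been missing, that might also explain the VideoHelperSuite import failure.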
Also is there anything else that can make it faster and/or use less VRAM?