This article will show you how to install and use Windows-based software that can train Hunyuan video LoRA models, allowing the user to generate custom personalities in the Hunyuan Video foundation model:
Click to play. Examples from the recent explosion of celebrity Hunyuan LoRAs from the civit.ai community.
At the moment the two most popular ways of generating Hunyuan LoRA models locally are:
1) The diffusion-pipe-ui Docker-based framework, which relies on Windows Subsystem for Linux (WSL) to handle some of the processes.
2) Musubi Tuner, a new addition to the popular Kohya ss diffusion training architecture. Musubi Tuner does not require Docker and does not depend on WSL or other Linux-based proxies – but it can be difficult to get working on Windows.
Therefore this run-through will focus on Musubi Tuner, and on providing a completely local solution for Hunyuan LoRA training and generation, without the use of API-driven websites or commercial GPU-renting services such as Runpod.
Click to play. Samples from the LoRA training on Musubi Tuner for this article. All permissions granted by the person depicted, for the purposes of illustrating this article.
REQUIREMENTS
The installation will require at minimum a Windows 10 PC with a 30+/40+ series NVIDIA card that has at least 12GB of VRAM (though 16GB is recommended). The installation used for this article was tested on a machine with 64GB of system RAM and an NVIDIA 3090 graphics card with 24GB of VRAM. It was tested on a dedicated test-bed system using a fresh installation of Windows 10 Professional, on a partition with 600+GB of spare disk space.
WARNING
Installing Musubi Tuner and its prerequisites also entails the installation of developer-focused software and packages directly onto the main Windows installation of a PC. Taking the installation of ComfyUI into account, for the end stages, this project will require around 400-500 gigabytes of disk space. Though I have tested the procedure without incident several times in newly-installed test-bed Windows 10 environments, neither I nor unite.ai are liable for any damage to systems from following these instructions. I advise you to back up any important data before attempting this kind of installation procedure.
Considerations
Is This Method Still Valid?
The generative AI scene is moving very fast, and we can expect better and more streamlined Hunyuan Video LoRA training frameworks this year.
…and even this week! While I was writing this article, the developer of Kohya/Musubi produced musubi-tuner-gui, a sophisticated Gradio GUI for Musubi Tuner:
Obviously a user-friendly GUI is preferable to the BAT files that I use in this feature – once musubi-tuner-gui is working. As I write, it only went online five days ago, and I can find no account of anyone successfully using it.
According to posts in the repository, the new GUI is intended to be rolled directly into the Musubi Tuner project as soon as possible, which will end its current existence as a standalone GitHub repository.
Based on the current installation instructions, the new GUI gets cloned directly into the existing Musubi virtual environment; and, despite many efforts, I cannot get it to associate with the existing Musubi installation. This means that when it runs, it will find that it has no engine!
Once the GUI is integrated into Musubi Tuner, issues of this kind will surely be resolved. Though the author concedes that the new project is ‘really rough’, he is optimistic for its development and integration directly into Musubi Tuner.
Given these issues (also concerning default paths at install-time, and the use of the UV Python package, which complicates certain procedures in the new release), we will probably have to wait a little for a smoother Hunyuan Video LoRA training experience. That said, it looks very promising!
But if you can't wait, and are willing to roll your sleeves up a bit, you can get Hunyuan Video LoRA training running locally right now.
Let's get started.
Why Install Anything on Bare Metal?
(Skip this paragraph if you're not an advanced user)
Advanced users will wonder why I have chosen to install so much of the software on the bare-metal Windows 10 installation instead of in a virtual environment. The reason is that the essential Windows port of the Linux-based Triton package is far harder to get working in a virtual environment. All the other bare-metal installations in the tutorial could not be installed in a virtual environment, as they must interface directly with local hardware.
Installing Prerequisite Programs and Packages
For the programs and packages that need to be installed first, the order of installation matters. Let's get started.
1: Download Microsoft Redistributable
Download and install the Microsoft Redistributable package from https://aka.ms/vs/17/release/vc_redist.x64.exe.
This is a quick and simple installation.
2: Install Visual Studio 2022
Download the Microsoft Visual Studio 2022 Community edition from https://visualstudio.microsoft.com/downloads/?cid=learn-onpage-download-install-visual-studio-page-cta
Start the downloaded installer:
We don't need every available package, which would be a heavy and lengthy installation. On the initial Workloads page that opens, tick Desktop Development with C++ (see image below).
Now click the Individual Components tab at the top-left of the interface and use the search box to find ‘Windows SDK’.
By default, only the Windows 11 SDK is ticked. If you are on Windows 10 (this installation procedure has not been tested by me on Windows 11), tick the latest Windows 10 version, indicated in the image above.
Search for ‘C++ CMake’ and check that C++ CMake tools for Windows is checked.
This installation will take at least 13 GB of space.
Once Visual Studio has installed, it will attempt to open on your computer. Let it open fully. When Visual Studio's full-screen interface is finally visible, close the program.
3: Install Visual Studio 2019
Some of the subsequent packages for Musubi expect an older version of Microsoft Visual Studio, while others need a more recent one.
Therefore also download the free Community edition of Visual Studio 2019, either from Microsoft (https://visualstudio.microsoft.com/vs/older-downloads/ – account required) or Techspot (https://www.techspot.com/downloads/7241-visual-studio-2019.html).
Install it with the same options as for Visual Studio 2022 (see procedure above, except that Windows SDK is already ticked in the Visual Studio 2019 installer).
You'll see that the Visual Studio 2019 installer is already aware of the newer version as it installs:
When installation is complete, and you have opened and closed the installed Visual Studio 2019 application, open a Windows command prompt (type CMD in Start Search) and type in and enter:
where cl
The result should be the known locations of the two installed Visual Studio editions.
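If both editions are found, the output should look something along these lines (the exact MSVC version numbers will vary with your installation):
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\bin\Hostx64\x64\cl.exe
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64\cl.exe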
If you instead get INFO: Could not find files for the given pattern(s), see the Check Paths section of this article below, and use those instructions to add the relevant Visual Studio paths to the Windows environment.
Save any changes made according to the Check Paths section below, and then try the where cl command again.
4: Install CUDA 11 + 12 Toolkits
The various packages installed for Musubi need different versions of NVIDIA CUDA, which accelerates and optimizes training on NVIDIA graphics cards.
The reason we installed the Visual Studio versions first is that the NVIDIA CUDA installers search for and integrate with any existing Visual Studio installations.
Download an 11+ series CUDA installation package from:
https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Windows&target_arch=x86_64&target_version=11&target_type=exe_local (download ‘exe (local)’)
Download a 12+ series CUDA Toolkit installation package from:
https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64
The installation process is identical for both installers. Ignore any warnings about the existence or non-existence of installation paths in Windows Environment variables – we are going to attend to this manually later.
Install NVIDIA CUDA Toolkit V11+
Start the installer for the 11+ series CUDA Toolkit.
At Installation Options, choose Custom (Advanced) and proceed.
Uncheck the NVIDIA GeForce Experience option and click Next.
Leave Select Installation Location at defaults (this is important):
Click Next and let the installation conclude.
Ignore any warnings or notes that the installer gives about Nsight Visual Studio integration, which is not needed for our use case.
Install NVIDIA CUDA Toolkit V12+
Repeat the entire process for the separate 12+ NVIDIA Toolkit installer that you downloaded:
The installation process for this version is identical to the one listed above (the 11+ version), apart from one warning about environment paths, which you can ignore:
When the 12+ CUDA version installation is completed, open a command prompt in Windows and type and enter:
nvcc --version
This should confirm information about the installed CUDA version:
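Typical output looks something like the following (the exact release and build numbers depend on which 12+ toolkit you downloaded):
nvcc: NVIDIA (R) Cuda compiler driver
...
Cuda compilation tools, release 12.6, V12.6.xx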
To check that your card is recognized, type and enter:
nvidia-smi
5: Install GIT
GIT will be handling the installation of the Musubi repository on your local machine. Download the GIT installer at:
https://git-scm.com/downloads/win (‘64-bit Git for Windows Setup’)
Run the installer:
Use default settings for Select Components:
Leave the default editor at Vim:
Let GIT decide about branch names:
Use recommended settings for the Path Environment:
Use recommended settings for SSH:
Use recommended settings for HTTPS Transport backend:
Use recommended settings for line-ending conversions:
Choose the Windows default console as the Terminal Emulator:
Use default settings (Fast-forward or merge) for Git Pull:
Use Git-Credential Manager (the default setting) for Credential Helper:
In Configuring extra options, leave Enable file system caching ticked, and Enable symbolic links unticked (unless you are an advanced user who is using hard links for a centralized model repository).
Conclude the installation and test that Git is installed properly by opening a CMD window and typing and entering:
git --version
GitHub Login
Later, when you attempt to clone GitHub repositories, you may be challenged for your GitHub credentials. To anticipate this, log into your GitHub account (creating one, if necessary) in any browser installed on your Windows system. This way, the OAuth authentication method (a pop-up window) should take as little time as possible.
After that initial challenge, you should stay authenticated automatically.
6: Install CMake
CMake 3.21 or newer is required for parts of the Musubi installation process. CMake is a cross-platform development architecture capable of orchestrating diverse compilers, and of compiling software from source code.
Download it at:
https://cmake.org/download/ (‘Windows x64 Installer’)
Launch the installer:
Ensure that Add CMake to the PATH environment variable is checked.
Press Next.
Type and enter this command in a Windows Command prompt:
cmake --version
If CMake installed successfully, it will display something like:
cmake version 3.31.4
CMake suite maintained and supported by Kitware (kitware.com/cmake).
7: Install Python 3.10
The Python interpreter is central to this project. Download the 3.10 version (the best compromise between the different demands of the Musubi packages) at:
https://www.python.org/downloads/release/python-3100/ (‘Windows installer (64-bit)’)
Run the downloaded installer, and leave it at the default settings:
At the end of the installation process, click Disable path length limit (requires UAC admin confirmation):
In a Windows Command prompt, type and enter:
python --version
This should result in Python 3.10.0
Check Paths
The cloning and installation of the Musubi frameworks, as well as their normal operation after installation, requires that their components know the path to several important external components in Windows, notably CUDA.
So we need to open the path environment and check that all the requisites are in there.
A quick way to get to the controls for the Windows Environment is to type Edit the system environment variables into the Windows search bar.
Clicking this will open the System Properties control panel. In the lower right of System Properties, click the Environment Variables button, and a window called Environment Variables opens up. In the System Variables panel in the bottom half of this window, scroll down to Path and double-click it. This opens a window called Edit environment variables. Drag the width of this window wider so you can see the full path of the variables:
Here the important entries are:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\libnvvp
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\libnvvp
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\bin\Hostx64\x64
C:\Program Files\Git\cmd
C:\Program Files\CMake\bin
In most cases, the correct path variables should already be present.
Add any paths that are missing by clicking New on the left of the Edit environment variable window and pasting in the correct path:
Do NOT just copy and paste from the paths listed above; check that each equivalent path exists in your own Windows installation.
If there are minor path variations (particularly with Visual Studio installations), use the paths listed above to find the correct target folders (i.e., x64 in Hostx64 in your own installation), then paste those paths into the Edit environment variable window.
After this, restart the computer.
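After the reboot, an optional sanity check is to open a fresh CMD window and confirm that the main tools can now be found on the path; each of the following commands should print the location of an executable rather than an error:
where nvcc
where cl
where git
where cmake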
Installing Musubi
Upgrade PIP
Using the latest version of the PIP installer can smooth some of the installation stages. In a Windows Command prompt with administrator privileges (see Elevation, below), type and enter:
pip install --upgrade pip
Elevation
Some commands may require elevated privileges (i.e., to be run as an administrator). If you receive error messages about permissions in the following stages, close the command prompt window and reopen it in administrator mode by typing CMD into the Windows search box, right-clicking on Command Prompt and selecting Run as administrator:
For the next stages, we are going to use Windows Powershell instead of the Windows Command prompt. You can find this by entering Powershell into the Windows search box, and (as necessary) right-clicking on it to Run as administrator:
Install Torch
In Powershell, type and enter:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Be patient while the many packages install.
When completed, you can verify a GPU-enabled PyTorch installation by typing and entering:
python -c "import torch; print(torch.cuda.is_available())"
This should result in:
C:\WINDOWS\system32>python -c "import torch; print(torch.cuda.is_available())"
True
Install Triton for Windows
Next comes the installation of the Triton for Windows component. In elevated Powershell, enter (on a single line):
pip install https://github.com/woct0rdho/triton-windows/releases/download/v3.1.0-windows.post8/triton-3.1.0-cp310-cp310-win_amd64.whl
(The installer triton-3.1.0-cp310-cp310-win_amd64.whl works for both Intel and AMD CPUs as long as the architecture is 64-bit and the environment matches the Python version)
After running, this should result in:
Successfully installed triton-3.1.0
We can check whether Triton is working by importing it in Python. Enter this command:
python -c "import triton; print('Triton is working')"
This should output:
Triton is working
To check that Triton is GPU-enabled, enter:
python -c "import torch; print(torch.cuda.is_available())"
This should result in True:
Create the Virtual Environment for Musubi
From now on, we will install any further software into a Python virtual environment (or venv). This means that all you will need to do to uninstall all the following software is to drag the venv's installation folder to the trash.
Let's create that installation folder: make a folder called Musubi on your desktop. The following examples assume that this folder exists at: C:\Users\[Your Profile Name]\Desktop\Musubi
In Powershell, navigate to that folder by entering:
cd C:\Users\[Your Profile Name]\Desktop\Musubi
We want the virtual environment to have access to what we have installed already (especially Triton), so we will use the --system-site-packages flag. Enter this:
python -m venv --system-site-packages musubi
Wait for the environment to be created, and then activate it by entering:
.\musubi\Scripts\activate
From this point on, you can tell that you are in the activated virtual environment by the fact that (musubi) appears at the start of all your prompts.
Clone the Repository
Navigate to the newly-created musubi folder (which is inside the Musubi folder on your desktop):
cd musubi
Now that we are in the right place, enter the following command:
git clone https://github.com/kohya-ss/musubi-tuner.git
Wait for the cloning to complete (it will not take long).
Installing Requirements
Navigate to the installation folder:
cd musubi-tuner
Enter:
pip install -r requirements.txt
Wait for the many installations to finish (this will take longer).
Automating Access to the Hunyuan Video Venv
To easily activate and access the new venv for future sessions, paste the following into Notepad and save it with the name activate.bat, saving it with the All files option (see image below).
@echo off
call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate
cd C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner
cmd
(Replace [Your Profile Name] with the real name of your Windows user profile)
It does not matter where you save this file.
From now on you can double-click activate.bat and start work immediately.
Using Musubi Tuner
Downloading the Models
The Hunyuan Video LoRA training process requires the downloading of at least seven models in order to support all the possible optimization options for pre-caching and training a Hunyuan video LoRA. Together, these models weigh more than 60GB.
Current instructions for downloading them can be found at https://github.com/kohya-ss/musubi-tuner?tab=readme-ov-file#model-download
However, these are the download instructions at the time of writing:
clip_l.safetensors, llava_llama3_fp16.safetensors and llava_llama3_fp8_scaled.safetensors
can be downloaded at:
https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/tree/main/split_files/text_encoders
mp_rank_00_model_states.pt, mp_rank_00_model_states_fp8.pt and mp_rank_00_model_states_fp8_map.pt
can be downloaded at:
https://huggingface.co/tencent/HunyuanVideo/tree/main/hunyuan-video-t2v-720p/transformers
pytorch_model.pt
can be downloaded at:
https://huggingface.co/tencent/HunyuanVideo/tree/main/hunyuan-video-t2v-720p/vae
Although you may place these in any listing you select, for consistency with later scripting, let’s put them in:
C:Customers[Your Profile Name]DesktopMusubimusubimusubi-tunerfashions
That is according to the listing association prior thus far. Any instructions or directions hereafter will assume that that is the place the fashions are located; and remember to exchange [Your Profile Name] together with your actual Home windows profile folder title.
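Once all seven files are downloaded and in place, the models folder should contain something like the following (a sketch of the expected layout, based on the filenames listed above):
C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\
clip_l.safetensors
llava_llama3_fp16.safetensors
llava_llama3_fp8_scaled.safetensors
mp_rank_00_model_states.pt
mp_rank_00_model_states_fp8.pt
mp_rank_00_model_states_fp8_map.pt
pytorch_model.pt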
Dataset Preparation
Ignoring community controversy on the point, it is fair to say that you will need somewhere between 10-100 photos for a training dataset for your Hunyuan LoRA. Very good results can be obtained even with 15 images, as long as the images are well-balanced and of good quality.
A Hunyuan LoRA can be trained both on images or very short and low-res video clips, or even a mixture of each – although using video clips as training data is challenging, even for a 24GB card.
However, video clips are only really useful if your character moves in such an unusual way that the Hunyuan Video foundation model might not know about it, or be able to guess.
Examples would include Roger Rabbit, a xenomorph, The Mask, Spider-Man, or other personalities that possess unique characteristic movement.
Since Hunyuan Video already knows how ordinary men and women move, video clips are not necessary to obtain a convincing Hunyuan Video LoRA human-type character. So we will use static images.
Image Preparation
The Bucket List
The TLDR version:
You should either use images that are all the same size for your dataset, or use a 50/50 split between two different sizes, i.e., 10 images that are 512x768px and 10 that are 768x512px.
The training might go well even if you do not do this – Hunyuan Video LoRAs can be surprisingly forgiving.
The Longer Version
As with Kohya-ss LoRAs for static generative systems such as Stable Diffusion, bucketing is used to distribute the workload across differently-sized images, allowing larger images to be used without causing out-of-memory errors at training time (i.e., bucketing 'cuts up' the images into chunks that the GPU can handle, while maintaining the semantic integrity of the whole image).
For each size of image you include in your training dataset (i.e., 512x768px), a bucket, or 'sub-task', will be created for that size. So if you have the following distribution of images, this is how the bucket attention becomes unbalanced, and risks that some photos will be given greater consideration in training than others:
2x 512x768px images
7x 768x512px images
1x 1000x600px image
3x 400x800px images
We can see that bucket attention is divided unequally among these images:
Therefore either stick to one format size, or try to keep the distribution of different sizes relatively equal.
In either case, avoid very large images, as this is likely to slow down training, to negligible benefit.
For simplicity, I have used 512x768px for all the photos in my dataset.
Disclaimer: The model (person) used in the dataset gave me full permission to use these pictures for this purpose, and exercised approval of all AI-based output depicting her likeness featured in this article.
My dataset consists of 40 images, in PNG format (though JPG is fine too). My images were saved at C:\Users\Martin\Desktop\DATASETS_HUNYUAN\examplewoman
You should create a cache folder inside the training image folder:
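If you prefer the command line, the cache folder can be created from a CMD window with a single command (this uses the example dataset path above; substitute your own):
mkdir C:\Users\Martin\Desktop\DATASETS_HUNYUAN\examplewoman\cache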
Now let's create a special file that will configure the training.
TOML Files
The training and pre-caching processes of Hunyuan Video LoRAs obtain their file paths from a flat text file with the .toml extension.
For my test, the TOML is located at C:\Users\Martin\Desktop\DATASETS_HUNYUAN\training.toml
The contents of my training TOML look like this:
[general]
resolution = [512, 768]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false
[[datasets]]
image_directory = "C:\\Users\\Martin\\Desktop\\DATASETS_HUNYUAN\\examplewoman"
cache_directory = "C:\\Users\\Martin\\Desktop\\DATASETS_HUNYUAN\\examplewoman\\cache"
num_repeats = 1
(The double backslashes for the image and cache directories are not always necessary, but they can help to avoid errors in cases where there is a space in the path. I have trained models with .toml files that used single forward and single backward slashes.)
We can see in the resolution section that two resolutions will be considered – 512px and 768px. You can also leave this at 512, and still obtain good results.
Captions
Hunyuan Video is a text+vision foundation model, so we need descriptive captions for these images, which will be considered during training. The training process will fail without captions.
There are a multitude of open source captioning systems we could use for this task, but let's keep it simple and use the taggui system. Though it is stored at GitHub, and though it does download some very heavy deep learning models on first run, it comes in the form of a simple Windows executable that loads Python libraries and a straightforward GUI.
After starting Taggui, use File > Load Directory to navigate to your image dataset, and optionally put a token identifier (in this case, examplewoman) that will be added to all the captions:
(Be sure to turn off Load in 4-bit when Taggui first opens – it will throw errors during captioning if this is left on)
Select an image in the left-hand preview column and press CTRL+A to select all the images. Then press the Start Auto-Captioning button on the right:
You will see Taggui downloading models in the small CLI in the right-hand column, but only if this is the first time you have run the captioner. Otherwise you will see a preview of the captions.
Now, each photo has a corresponding .txt caption with a description of its image contents:
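Purely as an illustration (your own captions will naturally describe your own images), the caption file for one of the photos might contain something like:
examplewoman, a woman with long hair standing in a kitchen, wearing a blue sweater and smiling at the camera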
You can click Advanced Options in Taggui to increase the length and style of the captions, but that is beyond the scope of this run-through.
Quit Taggui and let's move on to…
Latent Pre-Caching
To avoid excessive GPU load at training time, it is necessary to create two types of pre-cached files – one to represent the latent images derived from the images themselves, and another to evaluate a text encoding relating to the caption content.
To simplify all three processes (2x cache + training), you can use interactive .BAT files that will ask you questions and undertake the processes when you have given the necessary information.
For the latent pre-caching, copy the following text into Notepad and save it as a .BAT file (i.e., name it something like latent-precache.bat), as before, ensuring that the file type in the drop-down menu of the Save As dialogue is All Files (see image below):
@echo off
REM Activate the virtual environment
call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate.bat
REM Get user input
set /p IMAGE_PATH=Enter the path to the image directory:
set /p CACHE_PATH=Enter the path to the cache directory:
set /p TOML_PATH=Enter the path to the TOML file:
echo You entered:
echo Image path: %IMAGE_PATH%
echo Cache path: %CACHE_PATH%
echo TOML file path: %TOML_PATH%
set /p CONFIRM=Do you want to proceed with latent pre-caching (y/n)?
if /i "%CONFIRM%"=="y" (
REM Run the latent pre-caching script
python C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\cache_latents.py --dataset_config %TOML_PATH% --vae C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\pytorch_model.pt --vae_chunk_size 32 --vae_tiling
) else (
echo Operation canceled.
)
REM Keep the window open
pause
(Make sure that you replace [Your Profile Name] with your real Windows profile folder name)
Now you can run the .BAT file for automatic latent caching:
When prompted by the various questions from the BAT file, paste or type in the paths to your dataset, cache folder and TOML file.
Text Pre-Caching
We'll create a second BAT file, this time for the text pre-caching.
@echo off
REM Activate the virtual environment
call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate.bat
REM Get user input
set /p IMAGE_PATH=Enter the path to the image directory:
set /p CACHE_PATH=Enter the path to the cache directory:
set /p TOML_PATH=Enter the path to the TOML file:
echo You entered:
echo Image path: %IMAGE_PATH%
echo Cache path: %CACHE_PATH%
echo TOML file path: %TOML_PATH%
set /p CONFIRM=Do you want to proceed with text encoder output pre-caching (y/n)?
if /i "%CONFIRM%"=="y" (
REM Use the python executable from the virtual environment
python C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\cache_text_encoder_outputs.py --dataset_config %TOML_PATH% --text_encoder1 C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\llava_llama3_fp16.safetensors --text_encoder2 C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\clip_l.safetensors --batch_size 16
) else (
echo Operation canceled.
)
REM Keep the window open
pause
Replace your Windows profile name and save this as text-cache.bat (or any other name you like), in any convenient location, as per the procedure for the previous BAT file.
Run this new BAT file, follow the instructions, and the necessary text-encoded files will appear in the cache folder:
Training the Hunyuan Video LoRA
Training the actual LoRA will take considerably longer than these two preparatory processes.
Though there are also multiple variables that we could worry about (such as batch size, repeats, epochs, and whether to use full or quantized models, among others), we will save these considerations for another day, and a deeper look at the intricacies of LoRA creation.
For now, let's minimize the choices a little and train a LoRA on ‘median’ settings.
We'll create a third BAT file, this time to initiate training. Paste this into Notepad and save it as a BAT file, like before, as training.bat (or any name you please):
@echo off
REM Activate the virtual environment
call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate.bat
REM Get user input
set /p DATASET_CONFIG=Enter the path to the dataset configuration file:
set /p EPOCHS=Enter the number of epochs to train:
set /p OUTPUT_NAME=Enter the output model name (e.g., example0001):
set /p LEARNING_RATE=Choose learning rate (1 for 1e-3, 2 for 5e-3, default 1e-3):
if "%LEARNING_RATE%"=="1" set LR=1e-3
if "%LEARNING_RATE%"=="2" set LR=5e-3
if "%LEARNING_RATE%"=="" set LR=1e-3
set /p SAVE_STEPS=How often (in steps) to save preview images:
set /p SAMPLE_PROMPTS=What is the location of the text-prompt file for training previews?
echo You entered:
echo Dataset configuration file: %DATASET_CONFIG%
echo Number of epochs: %EPOCHS%
echo Output name: %OUTPUT_NAME%
echo Learning rate: %LR%
echo Save preview images every %SAVE_STEPS% steps.
echo Text-prompt file: %SAMPLE_PROMPTS%
REM Prepare the command
set CMD=accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 ^
C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\hv_train_network.py ^
--dit C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\mp_rank_00_model_states.pt ^
--dataset_config %DATASET_CONFIG% ^
--sdpa ^
--mixed_precision bf16 ^
--fp8_base ^
--optimizer_type adamw8bit ^
--learning_rate %LR% ^
--gradient_checkpointing ^
--max_data_loader_n_workers 2 ^
--persistent_data_loader_workers ^
--network_module=networks.lora ^
--network_dim=32 ^
--timestep_sampling sigmoid ^
--discrete_flow_shift 1.0 ^
--max_train_epochs %EPOCHS% ^
--save_every_n_epochs=1 ^
--seed 42 ^
--output_dir "C:\Users\[Your Profile Name]\Desktop\Musubi\Output Models" ^
--output_name %OUTPUT_NAME% ^
--vae C:/Users/[Your Profile Name]/Desktop/Musubi/musubi/musubi-tuner/models/pytorch_model.pt ^
--vae_chunk_size 32 ^
--vae_spatial_tile_sample_min_size 128 ^
--text_encoder1 C:/Users/[Your Profile Name]/Desktop/Musubi/musubi/musubi-tuner/models/llava_llama3_fp16.safetensors ^
--text_encoder2 C:/Users/[Your Profile Name]/Desktop/Musubi/musubi/musubi-tuner/models/clip_l.safetensors ^
--sample_prompts %SAMPLE_PROMPTS% ^
--sample_every_n_steps %SAVE_STEPS% ^
--sample_at_first
echo The following command will be executed:
echo %CMD%
set /p CONFIRM=Do you want to proceed with training (y/n)?
if /i "%CONFIRM%"=="y" (
%CMD%
) else (
echo Operation canceled.
)
REM Keep the window open
cmd /k
As usual, be sure to replace all instances of [Your Profile Name] with your correct Windows profile name.
Make sure that the directory C:\Users\[Your Profile Name]\Desktop\Musubi\Output Models exists, and create it at that location if not.
Training Previews
There is a very basic training preview feature recently enabled for the Musubi trainer, which allows you to force the training model to pause and generate images based on prompts you have saved. These are saved in an automatically created folder called Sample, in the same directory that the trained models are saved.
To enable this, you will need to save at least one prompt in a text file. The training BAT we created will ask you to enter the location of this file; therefore you can name the prompt file anything you like, and save it anywhere.
Here are some prompt examples for a file that will output three different images when requested by the training routine:
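A minimal example of such a prompt file – three prompts, one per line, using the examplewoman token from the captioning stage and the flags explained below – might look like this:
examplewoman walking along a beach at sunset --w 512 --h 512 --f 1 --d 42 --s 20
examplewoman sitting at a cafe table drinking coffee --w 512 --h 512 --f 1 --d 42 --s 20
examplewoman standing in a park smiling at the camera --w 512 --h 512 --f 1 --d 42 --s 20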
As you can see in the example above, you can put flags at the end of the prompt that will affect the images:
--w is width (defaults to 256px if not set, according to the docs)
--h is height (defaults to 256px if not set)
--f is the number of frames. If set to 1, an image is produced; more than one, a video.
--d is the seed. If not set, it is random; but you should set it to see one prompt evolving.
--s is the number of steps in generation, defaulting to 20.
See the official documentation for additional flags.
Though training previews can quickly reveal issues that might cause you to cancel the training and reconsider the data or the setup, thus saving time, do remember that every extra prompt slows the training down a little more.
Also, the greater the training preview image's width and height (as set in the flags listed above), the more it will slow training down.
Launch your training BAT file.
Question #1 is ‘Enter the path to the dataset configuration file’. Paste or type in the correct path to your TOML file.
Question #2 is ‘Enter the number of epochs to train’. This is a trial-and-error variable, since it is affected by the amount and quality of images, as well as the captions, and other factors. In general, it is better to set it too high than too low, since you can always stop the training with Ctrl+C in the training window if you feel the model has advanced enough. Set it to 100 in the first instance, and see how it goes.
Question #3 is ‘Enter the output model name’. Name your model! It may be best to keep the name reasonably short and simple.
Question #4 is ‘Choose learning rate’, which defaults to 1e-3 (option 1). This is a good place to start, pending further experience.
Question #5 is ‘How often (in steps) to save preview images’. If you set this too low, you will see little progress between preview image saves, and it will slow down the training.
Question #6 is ‘What is the location of the text-prompt file for training previews?’. Paste or type in the path to your prompts text file.
The BAT then shows you the command it will send to the Hunyuan model, and asks you if you want to proceed, y/n.
Go ahead and begin training:
During this time, if you check the GPU section of the Performance tab of Windows Task Manager, you will see the process is taking around 16GB of VRAM.
This may not be an arbitrary figure, as this is the amount of VRAM available on quite a few NVIDIA graphics cards, and the upstream code may have been optimized to fit the tasks into 16GB for the benefit of those who own such cards.
That said, it is very easy to raise this usage, by sending more exorbitant flags to the training command.
During training, you will see in the lower-right side of the CMD window a figure for how much time has passed since training began, and an estimate of total training time (which will vary heavily depending on the flags set, number of training images, number of training preview images, and several other factors).
A typical training time is around 3-4 hours on median settings, depending on the available hardware, number of images, flag settings, and other factors.
Using Your Trained LoRA Models in Hunyuan Video
Choosing Checkpoints
When training is concluded, you will have a model checkpoint for each epoch of training.
This saving frequency can be changed by the user to save more or less frequently, as desired, by amending the --save_every_n_epochs [N] number in the training BAT file. If you added a low figure for saves-per-steps when setting up training with the BAT, there will be a high number of saved checkpoint files.
Which Checkpoint to Choose?
As mentioned earlier, the earliest-trained models will be the most flexible, while the later checkpoints may offer the most detail. The only way to test for these factors is to run some of the LoRAs and generate a few videos. In this way you can get to know which checkpoints are most capable, and represent the best balance between flexibility and fidelity.
ComfyUI
The most popular (though not the only) environment for using Hunyuan Video LoRAs at the moment is ComfyUI, a node-based editor with an elaborate web interface that runs in your browser.
Installation instructions are straightforward and available at the official GitHub repository (additional models will also need to be downloaded).
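For reference, a typical manual installation amounts to little more than cloning the repository and installing its requirements into a suitable Python environment (a rough sketch only; consult the ComfyUI README for the current, authoritative steps):
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt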
Converting Models for ComfyUI
Your trained models are saved in a (diffusers) format that is not compatible with most implementations of ComfyUI. Musubi is able to convert a model to a ComfyUI-compatible format. Let's set up a BAT file to implement this.
Before running this BAT, create the C:\Users\[Your Profile Name]\Desktop\Musubi\Output Models\CONVERTED folder that the script is expecting.
@echo off
REM Activate the virtual environment
call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate.bat
:START
REM Get user input
set /p INPUT_PATH=Enter the path to the input Musubi safetensors file (or type "exit" to quit):
REM Exit if the user types "exit"
if /i "%INPUT_PATH%"=="exit" goto END
REM Extract the file name from the input path and append 'converted' to it
for %%F in ("%INPUT_PATH%") do set FILENAME=%%~nF
set OUTPUT_PATH=C:\Users\[Your Profile Name]\Desktop\Musubi\Output Models\CONVERTED\%FILENAME%_converted.safetensors
set TARGET=other
echo You entered:
echo Input file: %INPUT_PATH%
echo Output file: %OUTPUT_PATH%
echo Target format: %TARGET%
set /p CONFIRM=Do you want to proceed with the conversion (y/n)?
if /i "%CONFIRM%"=="y" (
REM Run the conversion script with appropriately quoted paths
python C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\convert_lora.py --input "%INPUT_PATH%" --output "%OUTPUT_PATH%" --target %TARGET%
echo Conversion complete.
) else (
echo Operation canceled.
)
REM Return to start for another file
goto START
:END
REM Keep the window open
echo Exiting the script.
pause
As with the previous BAT files, save the script as ‘All files’ from Notepad, naming it convert.bat (or whatever you like).
Once saved, double-click the new BAT file, which will ask for the location of a file to convert.
Paste in or type the path to the trained file you want to convert, enter y, and press enter.
After saving the converted LoRA to the CONVERTED folder, the script will ask if you would like to convert another file. If you want to test multiple checkpoints in ComfyUI, convert a selection of the models.
When you have converted enough checkpoints, close the BAT command window.
You can now copy your converted models into the models\loras folder of your ComfyUI installation.
Typically the correct location is something like:
C:\Users\[Your Profile Name]\Desktop\ComfyUI\models\loras
Creating Hunyuan Videos with LoRAs in ComfyUI
Though the node-based workflows of ComfyUI seem complex at first, the settings of other, more experienced users can be loaded by dragging an image (made with the other user's ComfyUI) directly into the ComfyUI window. Workflows can also be exported as JSON files, which can be imported manually, or dragged into a ComfyUI window.
Some imported workflows will have dependencies that may not exist in your installation. Therefore install ComfyUI-Manager, which can fetch missing modules automatically.
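If you would rather install ComfyUI-Manager manually, it is normally cloned into the custom_nodes folder of the ComfyUI installation (a sketch; check the ComfyUI-Manager README for current instructions):
cd C:\Users\[Your Profile Name]\Desktop\ComfyUI\custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git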
To load one of the workflows used to generate videos from the models in this tutorial, download this JSON file and drag it into your ComfyUI window (though there are far better workflow examples available at the various Reddit and Discord communities that have adopted Hunyuan Video, and my own is adapted from one of these).
This is not the place for an extended tutorial in the use of ComfyUI, but it is worth mentioning a few of the crucial parameters that will affect your output if you download and use the JSON layout that I linked to above.
1) Width and Height
The larger your image, the longer the generation will take, and the higher the risk of an out-of-memory (OOM) error.
2) Length
This is the numerical value for the number of frames. How many seconds it adds up to depends on the frame rate (set to 30fps in this layout). You can convert seconds to frames based on fps at Omnicalculator.
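For example, at the 30fps set in this layout, a Length of 73 frames works out to roughly 2.4 seconds of video (73 ÷ 30 ≈ 2.43), while 30 frames is exactly one second.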
3) Batch Size
The higher you set the batch size, the quicker the result may come, but the greater the burden on VRAM. Set this too high and you may get an OOM error.
4) Control After Generate
This controls the random seed. The options for this sub-node are fixed, increment, decrement and randomize. If you leave it at fixed and do not change the text prompt, you will get the same image every time. If you amend the text prompt, the image will change to a limited extent. The increment and decrement settings allow you to explore nearby seed values, while randomize gives you a completely new interpretation of the prompt.
5) LoRA Name
You will need to select your own installed model here, before attempting to generate.
6) Token
If you have trained your model to trigger the concept with a token (such as ‘example-person’), put that trigger word in your prompt.
7) Steps
This represents how many steps the system will apply to the diffusion process. Higher steps may obtain better detail, but there is a ceiling on how effective this approach is, and that threshold can be hard to find. The common range of steps is around 20-30.
8) Tile Size
This defines how much information is handled at one time during generation. It is set to 256 by default. Raising it can speed up generation, but raising it too high can lead to a particularly frustrating OOM experience, since it comes at the very end of a long process.
9) Temporal Overlap
Hunyuan Video generation of people can lead to ‘ghosting’, or unconvincing movement, if this is set too low. In general, the current wisdom is that this should be set to a higher value than the number of frames, to produce better movement.
Conclusion
Although additional exploration of ComfyUI utilization is past the scope of this text, group expertise at Reddit and Discords can ease the educational curve, and there are a number of on-line guides that introduce the fundamentals.
First printed Thursday, January 23, 2025