This is probably the most complex workflow I have ever built with open-source tools. It took me four days. It takes four inputs (including author, title and style) and generates a complete animated visual story in one click in ComfyUI. There are still some mistakes, but here is the first preview.

Here is a short breakdown:
- The four inputs are sent with precise instructions to generate, first, the prompts for the images and image modifications; second, the prompts for the animations; and third, the request for the music.
- All voices are generated from the text and their durations are measured exactly, because they determine the length of each animation segment.
- The first image and video are created as the title, but also serve as a guide for all other images created for the video.
- Title and subtitles are also added automatically in a suitable position.
- I also developed a lot of custom nodes for small frame calculations, mainly for audio and video.
- The full system is one large loop that creates an image for each line of text and then a video from that image. The loop was the hardest part of this workflow to build, because the same setup has to handle either a 20-second video or a 2-minute video.
- Several LLMs are combined to understand the text as well as possible and to provide the best prompts for images and videos.
- The final video is compiled fully automatically.
- The music is created from the LLM output and matches the exact duration of the complete animation.
- Done!

For reference, this workflow uses many models and only works on an RTX 6000 Pro with a lot of RAM.

My goal is not to replace people. As I will try to explain later, this workflow is highly controlled and can be adapted or revised by real artists at certain points! My goal was to create a tool that can bring a text to life in one go, giving the AI some freedom while still following the text strictly.
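To make the timing idea from the breakdown concrete, here is a minimal sketch of how per-line voice durations could be turned into frame ranges for each animation segment. It is not the actual custom-node code: the names `Segment`, `plan_segments` and `total_duration_s`, as well as the 24 fps frame rate, are assumptions made for illustration. Segment boundaries are rounded from the cumulative time, so rounding errors do not pile up over a long video.

```python
from dataclasses import dataclass

FPS = 24  # assumed frame rate; the post does not state one


@dataclass
class Segment:
    text: str          # the line of text this segment illustrates
    start_frame: int   # first frame of the segment in the final video
    frame_count: int   # number of frames the animation must cover


def plan_segments(lines, durations_s, fps=FPS):
    """Map each text line and its voice duration (seconds) to a frame range.

    Boundaries are rounded from the running total, not per segment, so the
    last frame always lands on round(total_duration * fps).
    """
    if len(lines) != len(durations_s):
        raise ValueError("need exactly one voice duration per line of text")

    segments = []
    elapsed_s = 0.0
    start = 0
    for text, duration in zip(lines, durations_s):
        elapsed_s += duration
        end = round(elapsed_s * fps)
        segments.append(Segment(text, start, end - start))
        start = end
    return segments


def total_duration_s(segments, fps=FPS):
    """Length of the whole animation, e.g. as the target length for the music."""
    return sum(s.frame_count for s in segments) / fps


if __name__ == "__main__":
    # Hypothetical three-line text with made-up voice durations.
    plan = plan_segments(["line one", "line two", "line three"], [2.5, 3.1, 1.72])
    for seg in plan:
        print(seg)
    print("music length (s):", total_duration_s(plan))
```

Because the plan is derived only from the list of lines and durations, the same function covers a 20-second video and a 2-minute video; only the number of loop iterations changes.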
I still don't know how I will share this workflow with people, and I still have to polish it properly, but maybe through Patreon. Anyway, I hope you enjoy my research, and let's keep pushing! :)
prompts · 2 min read · 20.9.2025
ComfyUI: Text to Full Video (image, video, scene, subtitle, audio, music, etc.)
Source: Original