Create Ai Videos with Consistent Characters! - Midjourney + Kling

Tao Prompts
15 Oct 2024 · 06:06

TLDR: This video tutorial walks viewers through the process of creating AI videos featuring consistent characters. It begins with using Midjourney to generate reference photos with detailed prompts, focusing on aspects like hairstyle and clothing. The characters are then brought to life in AI videos, with tips on maintaining consistency and fixing any inconsistencies. The video showcases the characters, Kim and Lisa, on a vacation in Italy, demonstrating how to animate them using the AI video tool Kling, which provides dynamic motion and expressive human-like movements.

Takeaways

  • 🖼️ Use an AI image generator like Midjourney to create multiple photos of your characters that stay consistent from image to image.
  • 💡 Be detailed in your prompts, especially regarding hairstyle, ethnicity, age, and clothing for consistent character results.
  • 👥 Generate reference photos for each character separately to maintain their attributes.
  • 📸 Use the same camera or film type for both characters to maintain visual consistency.
  • 🏟️ Create 'Base images' by injecting characters into various settings using the exact same prompts as the reference photos.
  • 🖌️ Use the editor tool to fix inconsistencies like body proportions and facial features after injecting characters.
  • 📸 Copy the image URL of the reference character and paste it into the editor to use as a character reference.
  • 👁️‍🗨️ Look for inconsistencies and fix them along the way, such as body parts that don't match or facial expressions.
  • 👗 Adjust prompts to include full body views, like specifying clothing items to force the generation of legs.
  • 🍽️ Change prompts slightly for different scenes, like adding a smiling expression for a dinner scene.
  • 📹 Upscale photos in Midjourney for the highest resolution before using them in the AI video generator.
  • 🎥 Use an AI video tool like Kling to animate the images, creating dynamic motion and interactions for the characters.
  • 🤖 Kling offers the best expressiveness for human videos, with detailed arm gestures, facial expressions, and body language.
  • ✍️ If consistency is key, consider using Runway or Luma as alternatives to Kling for fewer artifacts.
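
The prompt-related takeaways above (detailed attributes, one shared camera or film type, small per-scene changes) can be sketched as a small prompt template. This is only an illustration: the `Character` class, the `FILM_STYLE` string, and every attribute value below are hypothetical and are not the prompts used in the video.

```python
from dataclasses import dataclass

# Hypothetical shared style string; reuse it verbatim for every character
# and every scene so the look of the photos stays the same.
FILM_STYLE = "shot on 35mm film"


@dataclass(frozen=True)
class Character:
    """Attributes that should be worded identically in every prompt."""
    name: str
    age: int
    ethnicity: str
    hairstyle: str  # state the hair length explicitly
    clothing: str   # naming shoes or trousers can encourage a full-body view

    def describe(self) -> str:
        return (f"{self.age} year old {self.ethnicity} woman with "
                f"{self.hairstyle}, wearing {self.clothing}")


def reference_prompt(character: Character) -> str:
    """Prompt for one character's standalone reference photo."""
    return f"portrait photo of {character.describe()}, {FILM_STYLE}"


def scene_prompt(characters, setting: str, extra: str = "") -> str:
    """Prompt for a base image: same descriptions, new setting."""
    people = " and ".join(c.describe() for c in characters)
    parts = [people, setting]
    if extra:  # e.g. "smiling" for a dinner scene
        parts.append(extra)
    parts.append(FILM_STYLE)
    return ", ".join(parts)


# Illustrative values only.
kim = Character("Kim", 25, "Korean", "long straight black hair",
                "a white sundress and sandals")
lisa = Character("Lisa", 25, "Italian", "shoulder-length wavy brown hair",
                 "a blue blouse, jeans and sneakers")

print(reference_prompt(kim))
print(scene_prompt([kim, lisa], "having dinner at a restaurant in Rome", "smiling"))
```

Keeping the descriptions in one place means the reference photos and the base images are always generated from the same wording, which is the point the takeaways stress.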

Q & A

  • What is the main topic of the video transcript?

    -The main topic of the video transcript is creating AI videos with multiple and consistent characters using tools like Midjourney and Kling.

  • Why is it important to be detailed in the prompts when using AI image generators like Midjourney?

    -Being detailed in the prompts is important to get consistent results, especially with aspects like hairstyle, ethnicity, age, and clothing details.

  • What are 'Base images' mentioned in the transcript?

    -'Base images' are the initial images generated that will have characters injected into them using the same prompts as the reference photos.

  • How does the process of injecting characters into base images work?

    -The process involves copying the image URL of the reference character, using the editor tool to erase the existing character's head in the base image, and pasting the reference photo URL to replace it.

  • Why is it necessary to look for inconsistencies when injecting characters into base images?

    -It is necessary to look for inconsistencies to ensure the characters look natural and fit well within the scene, fixing issues like incorrect body proportions or unwanted accessories.

  • What is the purpose of using the same camera or film type for both characters in the reference photos?

    -Using the same camera or film type ensures consistency in the lighting and style of the images, which helps when injecting the characters into the base images.

  • How does the video transcript suggest improving the results when injecting characters into images?

    -The transcript suggests trying different character reference photos if the initial results are not satisfactory, as it might take a few tries to find ones that inject properly.

  • What is the role of the AI video generator in the process described?

    -The AI video generator, such as Kling, is used to animate the final images, creating dynamic motion and bringing the characters to life.

  • Why is it recommended to upscale photos in Midjourney before using them in the AI video generator?

    -Upscaling photos in Midjourney ensures the highest resolution for the images before they are processed by the AI video generator, resulting in better quality videos.

  • What are some of the challenges mentioned when using AI video generators like Kling?

    -Challenges include artifacts, slight blurs, and deformities in the generated videos, which may affect the consistency and realism of the characters.

  • How does the transcript suggest enhancing the understanding of prompting for high-quality human motions in Kling?

    -The transcript suggests watching a separate video that teaches how to prompt for high-quality human motions in Kling for further insights.

Outlines

00:00

🎨 Creating AI-Generated Vacation Photos

This paragraph discusses the process of creating AI-generated vacation photos with consistent characters. The speaker explains how to use an AI image generator like Midjourney to create multiple photos of the characters, and then bring them to life using AI video. The process involves generating reference photos for characters with detailed prompts, focusing on consistent hairstyle, ethnicity, age, clothing, and colors. The speaker provides tips on using the same camera or film type for both characters and saving the prompts for later use. The 'base images' are created by specifying each character using the same prompts as the reference photos. The speaker demonstrates how to inject characters into these base images using the editor tool, fixing inconsistencies along the way. The goal is to create a series of vacation photos that appear natural and consistent.
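
As a rough sketch of the injection step, the reference photo's URL can be attached to the saved prompt before it is pasted into the editor. The summary only says the URL is pasted in as a character reference; the use of Midjourney's `--cref` parameter here is an assumption, and the helper name and example URL are hypothetical.

```python
from urllib.parse import urlparse


def with_character_reference(prompt: str, reference_url: str) -> str:
    """Append a character reference URL to a saved prompt.

    Assumes the reference is passed with Midjourney's --cref parameter.
    """
    parsed = urlparse(reference_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a usable image URL: {reference_url!r}")
    return f"{prompt.strip()} --cref {reference_url}"


# Hypothetical prompt and placeholder URL.
saved_prompt = "25 year old woman with long straight black hair, in front of the Colosseum"
print(with_character_reference(saved_prompt, "https://example.com/kim_reference.png"))
```

If the result does not match the reference well, the summary's advice is to try a different reference photo rather than keep rerunning the same one.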

05:00

🎥 Animating AI-Generated Photos with Video

The second paragraph delves into the next phase of the project: animating the AI-generated photos using video. The speaker praises the expressiveness of the AI video tool, Kling, for its ability to create dynamic motion and lifelike human gestures, expressions, and body language. While acknowledging some minor artifacts and inconsistencies, the speaker recommends Kling for its effectiveness in bringing characters to life. The paragraph concludes with a narrative of the characters' vacation in Italy, highlighting their spontaneous adventures and the timeless charm of Rome. The speaker also encourages viewers to watch another video for more detailed instructions on prompting high-quality human motions in AI video tools.

Keywords

💡AI videos

AI videos are videos created or enhanced using artificial intelligence. In the context of the video, AI is used to generate images and animate characters within those images, bringing them to life. The script discusses the process of creating AI videos with consistent characters, which involves using AI tools like Midjourney and Kling to generate and animate images.

💡Midjourney

Midjourney is an AI image generator mentioned in the script. It is used to generate reference photos for the characters and then multiple photos of those characters in different scenes. The tool is integral to the process of creating AI videos as it helps in generating base images and reference photos that are later used to inject characters into various scenes.

💡Consistent characters

Consistent characters are characters that maintain the same appearance and attributes throughout a video or series of images. The video's theme revolves around creating AI videos with multiple, consistent characters. The script provides tips on how to ensure characters remain consistent, such as using detailed prompts and the same camera or film type for both reference and base images.

💡AI video generator

An AI video generator is a tool that converts still images into moving videos using artificial intelligence. In the script, Kling is used as an AI video generator to animate the characters within the images. This tool is crucial for bringing the static images to life and creating a dynamic video narrative.

💡Reference photos

Reference photos are images used as a guide or template for creating other images or characters. In the video's context, reference photos are generated using Midjourney to ensure that the characters in the AI videos have consistent appearances. The script emphasizes the importance of detailed prompts when generating these photos to maintain consistency in hairstyles, clothing, and other attributes.

💡Prompts

Prompts are the detailed descriptions or instructions given to AI tools to guide the generation of images or videos. In the script, prompts are used to generate reference photos and base images, specifying details like hairstyle, ethnicity, age, clothing, and colors. The effectiveness of the AI-generated content depends on the specificity and detail of the prompts.

💡Base images

Base images are the initial scene images into which the reference characters are injected. The script describes generating base images of scenes, such as the Colosseum, using the same character prompts as the reference photos, and then using the editor tool to replace the generated characters with the reference ones. The base images serve as the backdrop for the AI videos, setting the scene for the characters' interactions.

💡Image to video tool

The image to video tool is a feature within the AI video generator that transforms still images into animated videos. In the script, this tool is used to animate the characters within the images of the Colosseum and other scenes, creating a dynamic and engaging video narrative.

💡Expressiveness

Expressiveness refers to the ability of the AI video generator to capture and display a range of emotions and movements in the characters. The script highlights the high level of expressiveness in Kling, noting the realistic arm gestures, facial expressions, and body language that the tool can produce, which are crucial for bringing the characters to life in the AI videos.

💡Artifacts and deformities

Artifacts and deformities are imperfections that can occur in AI-generated images or videos, such as blurs or slight distortions. The script mentions that while Kling does an excellent job of bringing characters to life, there can be some artifacts and deformities in the final video. These imperfections are a part of the AI video generation process and may require additional editing or selection of different reference photos to minimize.

💡Upscaling

Upscaling is the process of increasing the resolution of an image to improve its quality before further processing or use. In the script, the speaker advises to upscale the photos in Midjourney before putting them through the AI video generator to ensure the highest resolution and quality in the final AI videos.

Highlights

Creating AI videos with consistent characters using Midjourney and Kling.

The process involves generating multiple photos of the characters and then bringing them to life with AI video.

Tips and tricks are necessary for achieving the best results in character consistency.

Use AI image generator Midjourney to create reference photos with detailed prompts, including hairstyle, ethnicity, age, and clothing.

For female characters, specify hair length to avoid inconsistencies in generated images.

Use the same camera or film type for both characters to maintain visual consistency.

Generate base images by injecting characters into various settings using the same prompts as reference photos.

Edit images to fix inconsistencies such as facial features and body proportions using the editor tool.

Injecting characters into base images requires attention to detail and may take several attempts for proper alignment.

Kling is used for animating the images, creating dynamic motion for characters.

The image to video tool in Kling allows for describing interactions and actions for the characters.

Settings in Kling can be adjusted to increase the accuracy of the video following the prompt.

Kling offers high expressiveness in human videos, with detailed arm gestures, facial expressions, and body language.

Artifacts and deformities may occur, and alternative tools like Runway or Luma Video can be considered for more consistency.

The video showcases a vacation scenario with characters Kim and Lisa, demonstrating the effectiveness of the process.

The video includes scenarios like strolling through Rome, visiting the Colosseum, shopping, and dining, all with animated characters.

Upscaling photos in Midjourney is recommended for the highest resolution before AI video generation.

For those interested in learning more about prompting high-quality human motions in Kling, a separate video is available.