How to Make the Perfect AI Avatar Video with HeyGen and Eleven Labs
TLDR: This video explains how to create a high-quality AI avatar using HeyGen and Eleven Labs, covering everything from setup to automation. It highlights why realistic lighting, composition, and recording technique are crucial to avoiding the uncanny valley and achieving a convincing digital twin. The creator walks through capturing the right footage, preparing clean audio for voice cloning, and using Make.com to automate script-to-video production. The video also discusses when AI avatars are most effective—especially for business uses like training, sales, and internal updates—and offers practical tips to ensure your avatar looks professional and natural.
Takeaways
- 🎭 AI avatars are becoming increasingly common in training, social media, and business videos, and high-quality ones can be hard to distinguish from real footage.
- 🧍♂️ The creator demonstrates that his own video avatar—built with HeyGen—is realistic enough that many viewers can’t immediately tell it’s not him.
- 💸 Creating a polished AI avatar can cost around £1,000 if done with professional-level setup, though platforms like HeyGen offer free versions with limitations.
- 😬 The biggest barrier to mainstream adoption is the ‘uncanny valley,’ where poorly executed avatars feel robotic or unsettling.
- ⏱️ AI avatars work best for short videos, internal comms, training, and sales material—less ideal for long YouTube content where authenticity is key.
- 💡 Proper composition and lighting dramatically improve avatar quality: a front LED light offset slightly to the side for soft shadows, plus a backlight for separation; ring lights are not recommended.
- 🏠 For small spaces, a chair-mounted green screen combined with background removal tools (e.g., Runway) is a practical workaround.
- 📹 The avatar training video must be a steady, 2-minute continuous take with natural delivery—no swaying, mistakes, or exaggerated gestures.
- 🎤 To avoid uncanny audio, creators can either record their real voice separately or use Eleven Labs to clone their voice with 30+ minutes of clean audio.
- 🤖 HeyGen integrates with Eleven Labs and automation tools like Make.com, allowing users to auto-generate avatar videos directly from written scripts.
- 🗂️ Automation workflows can move scripts from Google Docs → Eleven Labs → HeyGen → Google Drive, streamlining the entire production pipeline.
- 🚀 Once the avatar and workflow are set up, creators can produce high-quality videos quickly without needing to repeatedly prepare lighting, recording, or reshoots.
Q & A
What is an AI avatar as described in the video?
-An AI avatar is a digital representation or "digital twin" of a person generated by tools like HeyGen that can present videos on behalf of the real person. It looks and behaves like the person but is produced by AI.
Which tools does the video focus on for creating avatar videos and voice clones?
-The video focuses on HeyGen for creating the visual AI avatar and Eleven Labs (11 Labs) for cloning the voice. It also mentions Make.com for automation and Google Docs for scripts.
Why haven't avatar videos become universally popular yet?
-Mainly because of the "uncanny valley": poorly made avatars can feel slightly wrong or unsettling. Also concerns about impersonation and the setup complexity slow wider adoption.
When is it appropriate to use an avatar video versus a real-camera video?
-Avatars are great for short videos, internal company training, business updates, sales material, or replacing small clips during fixes. They're less recommended for long, personal YouTube videos (the video suggests avoiding >5 minutes) where a close, authentic connection matters.
What is the recommended recording input HeyGen requires to build an avatar?
-HeyGen requires about 2 minutes of continuous video material of you presenting (a single take) to create an avatar.
What are the key setup elements for getting a convincing avatar?
-Composition and lighting are most important: a reasonably large room with a pleasing background, a stable camera/tripod, a front LED light placed slightly to the side for soft shadows, and a backlight (edge light) to separate you from the background. Avoid ring lights if possible.
What equipment does the presenter recommend for better audio and lighting?
-For audio, the presenter recommends a good microphone such as the Rode NT-USB (referred to as 'road NT USB') and suggests using a decent front LED lamp for lighting (priced around $200–$250 in the transcript).
Can you use a free HeyGen account to test avatars?
-Yes — you can create an avatar for free, but exports will carry a HeyGen watermark and be limited to a maximum of 720p, which is fine for testing but not ideal for production use.
What are HeyGen pricing notes mentioned in the video?
-The creator plan is mentioned at $29/month. For videos longer than about 5 minutes, the video says you need the team subscription, which is $10/month more (i.e., roughly $39/month).
How should you record footage for creating the avatar to avoid problems?
-Use a fixed camera on a tripod, position yourself centrally, stay relatively still, avoid obvious fluffs or distracting gestures, wear neat, lint-free clothing, and provide a single continuous take of at least two minutes in the style you want your avatar to use.
What are practical tips if you don't have a big filming space?
-Use a small green screen behind your chair, employ compact lights, or use background-removal tools like Runway (the presenter mentions Runway's background removal costs about $15) to replace and blur backgrounds.
How does voice cloning with Eleven Labs work and what are the audio requirements?
-Eleven Labs needs a substantial amount of clear audio to train a realistic clone — at least 30 minutes is required, with 2–3 hours recommended for best results. Clean, edited recordings (remove ums, fluffs) give the best clone.
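For readers who prefer to script this step rather than use the Eleven Labs web dashboard, the sketch below shows the general shape of an instant-voice-clone upload via the Eleven Labs REST API. It is a minimal illustration, not the presenter's exact workflow: the API key, file names, and voice name are placeholders, it assumes your plan includes voice cloning, and the longer "professional" clone trained on hours of audio is normally set up through the dashboard rather than this endpoint. Check the current Eleven Labs API reference before relying on the field names.

```python
import requests

ELEVEN_API_KEY = "YOUR_ELEVENLABS_API_KEY"  # placeholder; needs a plan with voice cloning

# Clean, edited sample recordings (ums and fluffs removed), as the video recommends.
sample_paths = ["clean_sample_01.mp3", "clean_sample_02.mp3"]
files = [("files", (path, open(path, "rb"), "audio/mpeg")) for path in sample_paths]

# POST /v1/voices/add creates a new cloned voice from the uploaded samples.
resp = requests.post(
    "https://api.elevenlabs.io/v1/voices/add",
    headers={"xi-api-key": ELEVEN_API_KEY},
    data={"name": "My avatar voice", "description": "Studio recordings, edited in Audacity"},
    files=files,
)
resp.raise_for_status()
print("New voice_id:", resp.json()["voice_id"])
```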
What trade-offs exist between using your real recorded voice versus a voice clone?
-Recording your own voice for each video yields the most natural audio but prevents full automation. A well-trained Eleven Labs clone enables automation but may never be 100% perfect and can reintroduce uncanny qualities if trained on noisy or unedited audio.
How can you automate the production pipeline for avatar videos?
-The presenter built a two-step automation in Make (Integromat): watch for a Google Docs script saved to a Drive folder, push the script to Eleven Labs (via HTTP API if needed) to generate audio, then send text/audio to HeyGen to create the video; completed videos are saved back to Google Drive via a webhook.
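The presenter builds this pipeline in Make.com, but the same flow can be sketched in plain Python to show which API calls are involved. Everything below is a hedged illustration rather than the presenter's scenario: the Eleven Labs text-to-speech call follows the public API reference, while the HeyGen request body is an assumed shape based on HeyGen's v2 generate endpoint and should be verified against their docs; reading `script.txt` stands in for the Google Docs/Drive trigger, and the Drive upload and webhook callback are omitted.

```python
import requests

ELEVEN_API_KEY = "YOUR_ELEVENLABS_API_KEY"   # placeholder
HEYGEN_API_KEY = "YOUR_HEYGEN_API_KEY"       # placeholder
VOICE_ID = "your_cloned_voice_id"            # from Eleven Labs
AVATAR_ID = "your_heygen_avatar_id"          # from HeyGen

def script_to_speech(script_text: str, out_path: str = "narration.mp3") -> str:
    """Generate narration audio with the Eleven Labs text-to-speech endpoint."""
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": ELEVEN_API_KEY},
        json={"text": script_text, "model_id": "eleven_multilingual_v2"},
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)  # response body is the audio stream
    return out_path

def request_avatar_video(script_text: str) -> str:
    """Ask HeyGen to render the avatar speaking the script.
    The payload and response shape here are assumptions -- check HeyGen's API docs."""
    resp = requests.post(
        "https://api.heygen.com/v2/video/generate",
        headers={"X-Api-Key": HEYGEN_API_KEY, "Content-Type": "application/json"},
        json={
            "video_inputs": [{
                "character": {"type": "avatar", "avatar_id": AVATAR_ID},
                "voice": {"type": "text", "input_text": script_text, "voice_id": VOICE_ID},
            }],
            "dimension": {"width": 1920, "height": 1080},
        },
    )
    resp.raise_for_status()
    return resp.json()["data"]["video_id"]  # assumed response field

if __name__ == "__main__":
    script = open("script.txt", encoding="utf-8").read()  # stand-in for the Google Docs step
    script_to_speech(script)
    video_id = request_avatar_video(script)
    print("Submitted HeyGen render:", video_id)
    # In the Make.com version, HeyGen notifies a webhook when rendering finishes and
    # the finished file is copied to Google Drive; here you would poll or await that callback.
```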
What file-handling tip does the presenter give for transferring large phone video files?
-On iPhone, save the recorded video to Files (instead of directly sharing from the Photos app) so you can access iCloud from a browser and download the high-resolution file to desktop for uploading to HeyGen.
How quickly does HeyGen process and make the avatar available after upload?
-According to the transcript, avatar processing in HeyGen takes only a couple of minutes — it's surprisingly quick.
What are common pitfalls to avoid when preparing audio for a voice clone?
-Avoid leaving in repeated filler words, ums, long hesitations, or pronunciation inconsistencies. Clean recordings (using tools like Audacity or hiring an editor) lead to a better-trained voice model.
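Manual editing in Audacity (or hiring an editor) is what the video suggests, but a first cleanup pass can also be automated. The snippet below is a rough sketch using pydub (which relies on ffmpeg for non-WAV formats): it strips long pauses and evens out loudness. The thresholds are guesses you would tune by ear, and it will not remove ums or fluffs, which still need manual editing.

```python
from pydub import AudioSegment
from pydub.effects import normalize
from pydub.silence import split_on_silence

# Load the raw recording (ffmpeg must be installed for mp3/m4a sources).
raw = AudioSegment.from_file("raw_recording.wav")

# Split on pauses longer than ~700 ms; anything 16 dB below the average level counts as silence.
chunks = split_on_silence(
    raw,
    min_silence_len=700,
    silence_thresh=raw.dBFS - 16,
    keep_silence=200,  # keep 200 ms of padding so speech doesn't sound clipped
)

# Reassemble the speech-only chunks and even out the loudness.
cleaned = AudioSegment.empty()
for chunk in chunks:
    cleaned += chunk
cleaned = normalize(cleaned)

cleaned.export("cleaned_recording.wav", format="wav")
```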
What final advice does the presenter give about using avatars on social media?
-Start with short, well-produced avatar videos and be transparent with your audience when appropriate. Avatars can save time and be very effective when used in the right contexts, but watch for uncanny artifacts and use them where the audience and message fit.
Outlines
🤖 AI Avatars: The Future of Video Creation
This paragraph introduces the concept of AI avatars, exploring how they are used in training videos, social media, and other digital content. The speaker reveals that the video they are presenting is not them in person, but an AI-generated version created using an application called HeyGen. They discuss the idea of a 'digital twin' and share their experience with creating AI avatars, noting that while the technology is promising, many people are hesitant due to concerns about impersonation. The speaker also touches on the limitations of avatars, including the uncanny valley effect, and explains that avatars work best for short videos and business use cases where audience size is smaller. They also highlight how avatars can save time and offer convenience for creating content without needing to reshoot videos.
💡 Setting Up Your Avatar: The Right Environment
In this paragraph, the speaker emphasizes the importance of setup in creating a high-quality avatar. They share their personal experience from an online course called On-screen Authority, which taught them the significance of composition and lighting for video creation. The speaker explains how a good background and lighting can drastically improve the quality of an avatar. They provide specific advice on lighting, recommending a round LED front light and a backlight to create an edge effect, as well as tips for small spaces where a green screen might be used. The focus here is on how lighting and composition can enhance the realism of an avatar and the setup required for a successful avatar video shoot.
🎥 Recording Your Avatar: Tips for a Smooth Process
This paragraph focuses on the recording process needed to create a realistic AI avatar. The speaker provides tips for a successful avatar recording, such as using a fixed camera, sitting up straight, and avoiding distractions or errors while recording. They also advise on the importance of wearing clean, wrinkle-free clothing and avoiding excessive movement, which could lead to unnatural avatar behaviors. The speaker mentions the importance of recording for at least two minutes and using the back camera of a phone for the best video quality. Finally, they discuss file management, suggesting ways to upload the footage quickly for avatar processing.
🔊 Voice Cloning: Making Your Avatar Speak Like You
This paragraph discusses the voice cloning process for avatars, detailing the steps to record a high-quality voice and upload it to create a realistic voice clone. The speaker explains that while HeyGen's voice capture is limited by the short recording time, a better result can be achieved by separately recording your voice with a quality microphone, such as the Rode NT-USB. They also recommend cleaning up the audio to remove any errors or background noise. The speaker then introduces 11 Labs, a tool for creating a more advanced voice clone, which requires several hours of pre-recorded audio. After explaining the importance of high-quality voice recordings, the speaker gives guidance on the setup and use of 11 Labs to clone a voice and integrate it with the avatar video.
⏩ Automating Avatar Creation: Streamlining the Process
In this paragraph, the speaker dives into automating the avatar creation process using tools like Make and Google Drive. They explain a straightforward two-step automation that saves time by automatically pushing a script from Google Docs to 11 Labs to generate the narration audio, and then to HeyGen for avatar video creation. Once the video is generated, it is uploaded to Google Drive. The speaker provides a technical overview of how to set up this automation, including the use of API keys and webhooks to trigger the next steps. This process, while not complex, can significantly speed up the creation of avatar videos, especially for those familiar with automation tools.
Keywords
💡AI avatar
💡HeyGen
💡Eleven Labs (11 Labs)
💡Uncanny valley
💡Voice clone
💡Composition
💡Lighting setup
💡Green screen / background removal
💡Automation (make.com)
💡Google Docs (script workflow)
💡Free vs paid tiers (watermark / export limits)
💡Use cases and time savings
Highlights
Introduction to creating AI avatars and digital twins using HeyGen.
The uncanny valley and how realistic avatars can avoid looking robotic.
The process of creating a high-quality avatar setup with good composition and lighting.
Using HeyGen to create an avatar and the importance of having a fixed camera for accurate avatar representation.
Key tips for presenting naturally to the camera for the best avatar results.
The benefits of using avatars for business purposes, like training videos and internal communication.
How avatars save time in video production by avoiding re-shoots and creating reusable assets.
The best lighting setup for creating a convincing avatar, including the use of LED lights and backlighting.
How a green screen can help when working with smaller spaces for avatar production.
The importance of using a good microphone for voice recording when creating AI avatars.
How 11 Labs can be used to clone your voice for more realistic AI avatars.
How to record and clean audio for voice cloning, including tips for better quality recordings.
Automation through make.com to streamline the process of generating avatar videos and uploading them to Google Drive.
How HeyGen integrates with 11 Labs for both avatar creation and voice cloning.
The potential for social media use of AI avatars and the benefits of this technology for content creators.