How to Make the Perfect AI Avatar Video with HeyGen and Eleven Labs

Absolutely Agentic
24 Aug 2025 · 16:30

TLDR: This video explains how to create a high-quality AI avatar using HeyGen and Eleven Labs, covering everything from setup to automation. It highlights why realistic lighting, composition, and recording technique are crucial to avoiding the uncanny valley and achieving a convincing digital twin. The creator walks through capturing the right footage, preparing clean audio for voice cloning, and using Make.com to automate script-to-video production. The video also discusses when AI avatars are most effective—especially for business uses like training, sales, and internal updates—and offers practical tips to ensure your avatar looks professional and natural.

Takeaways

  • 🎭 AI avatars are becoming increasingly common in training, social media, and business videos, and high-quality ones can be hard to distinguish from real footage.
  • 🧍‍♂️ The creator demonstrates that his own video avatar—built with HeyGen—is realistic enough that many viewers can’t immediately tell it’s not him.
  • 💸 Creating a polished AI avatar can cost around £1,000 if done with professional-level setup, though platforms like HeyGen offer free versions with limitations.
  • 😬 The biggest barrier to mainstream adoption is the ‘uncanny valley,’ where poorly executed avatars feel robotic or unsettling.
  • ⏱️ AI avatars work best for short videos, internal comms, training, and sales material—less ideal for long YouTube content where authenticity is key.
  • 💡 Proper composition and lighting (especially side lighting and backlighting) dramatically improve avatar quality; ring lights are not recommended.
  • 🏠 For small spaces, a chair-mounted green screen combined with background removal tools (e.g., Runway) is a practical workaround.
  • 📹 The avatar training video must be a steady, 2-minute continuous take with natural delivery—no swaying, mistakes, or exaggerated gestures.
  • 🎤 To avoid uncanny audio, creators can either record their real voice separately or use Eleven Labs to clone their voice with 30+ minutes of clean audio.
  • 🤖 HeyGen integrates with Eleven Labs and automation tools like Make.com, allowing users to auto-generate avatar videos directly from written scripts.
  • 🗂️ Automation workflows can move scripts from Google Docs → Eleven Labs → HeyGen → Google Drive, streamlining the entire production pipeline.
  • 🚀 Once the avatar and workflow are set up, creators can produce high-quality videos quickly without needing to repeatedly prepare lighting, recording, or reshoots.

Q & A

  • What is an AI avatar as described in the video?

    -An AI avatar is a digital representation or "digital twin" of a person generated by tools like HeyGen that can present videos on behalf of the real person. It looks and behaves like the person but is produced by AI.

  • Which tools does the video focus on for creating avatar videos and voice clones?

    -The video focuses on HeyGen for creating the visual AI avatar and Eleven Labs (11 Labs) for cloning the voice. It also mentions Make.com for automation and Google Docs for scripts.

  • Why haven't avatar videos become universally popular yet?

    -Mainly because of the "uncanny valley": poorly made avatars can feel slightly wrong or unsettling. Also concerns about impersonation and the setup complexity slow wider adoption.

  • When is it appropriate to use an avatar video versus a real-camera video?

    -Avatars are great for short videos, internal company training, business updates, sales material, or replacing small clips during fixes. They're less recommended for long, personal YouTube videos (the video suggests avoiding >5 minutes) where a close, authentic connection matters.

  • What recording input does HeyGen require to build an avatar?

    -HeyGen requires about 2 minutes of continuous video material of you presenting (a single take) to create an avatar.

  • What are the key setup elements for getting a convincing avatar?

    -Composition and lighting are most important: a reasonably large room with a pleasing background, a stable camera/tripod, a front LED light placed slightly to the side for soft shadows, and a backlight (edge light) to separate you from the background. Avoid ring lights if possible.

  • What equipment does the presenter recommend for better audio and lighting?

    -For audio, the presenter recommends a good microphone such as the Rode NT-USB, and for lighting a decent front LED lamp (priced around $200–$250 in the transcript).

  • Can you use a free HeyGen account to test avatars?

    -Yes — you can create an avatar for free, but exports will carry a HeyGen watermark and be limited to a maximum of 720p, which is fine for testing but not ideal for production use.

  • What are HeyGen pricing notes mentioned in the video?

    -The creator plan is mentioned at $29/month. For videos longer than about 5 minutes, the video says you need the team subscription, which is $10/month more (i.e., roughly $39/month).

  • How should you record footage for creating the avatar to avoid problems?

    -Use a fixed camera on a tripod, position yourself centrally, stay relatively still, avoid obvious fluffs or distracting gestures, wear neat clothing (remove dust), and provide a single continuous take of at least two minutes in the style you want your avatar to use.

  • What are practical tips if you don't have a big filming space?

    -Use a small green screen behind your chair, employ compact lights, or use background-removal tools like Runway (the presenter mentions Runway's background removal costs about $15) to replace and blur backgrounds.

  • How does voice cloning with Eleven Labs work and what are the audio requirements?

    -Eleven Labs needs a substantial amount of clear audio to train a realistic clone — at least 30 minutes is required, with 2–3 hours recommended for best results. Clean, edited recordings (remove ums, fluffs) give the best clone.

  • What trade-offs exist between using your real recorded voice versus a voice clone?

    -Recording your own voice for each video yields the most natural audio but prevents full automation. A well-trained Eleven Labs clone enables automation but may never be 100% perfect and can reintroduce uncanny qualities if trained on noisy or unedited audio.

  • How can you automate the production pipeline for avatar videos?

    -The presenter built a two-step automation in Make (Integromat): watch for a Google Docs script saved to a Drive folder, push the script to Eleven Labs (via HTTP API if needed) to generate audio, then send text/audio to HeyGen to create the video; completed videos are saved back to Google Drive via a webhook.
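
To make the Eleven Labs step concrete, here is a minimal Python sketch of the text-to-speech request the HTTP module would send. The API key and voice ID are placeholders, and the endpoint and fields follow Eleven Labs' v1 API; verify them against the current docs before relying on this:

```python
import requests

ELEVEN_API_KEY = "xi-..."          # placeholder: your Eleven Labs API key
VOICE_ID = "your-cloned-voice-id"  # placeholder: ID of your cloned voice
script_text = "Hello, this is my digital twin speaking."

# POST the script to the v1 text-to-speech endpoint for the cloned voice.
resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": ELEVEN_API_KEY},
    json={"text": script_text, "model_id": "eleven_multilingual_v2"},
)
resp.raise_for_status()

# The response body is the rendered MP3, ready to pass on to the HeyGen step.
with open("narration.mp3", "wb") as f:
    f.write(resp.content)
```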

  • What file-handling tip does the presenter give for transferring large phone video files?

    -On iPhone, save the recorded video to Files (instead of directly sharing from the Photos app) so you can access iCloud from a browser and download the high-resolution file to desktop for uploading to HeyGen.

  • How quickly does HeyGen process and make the avatar available after upload?

    -According to the transcript, avatar processing in HeyGen takes only a couple of minutes — it's surprisingly quick.

  • What are common pitfalls to avoid when preparing audio for a voice clone?

    -Avoid leaving in repeated filler words, ums, long hesitations, or pronunciation inconsistencies. Clean recordings (using tools like Audacity or hiring an editor) lead to a better-trained voice model.
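
A script can handle the mechanical part of that cleanup (dead air and long pauses) before the manual pass for ums and mispronunciations. A rough pydub sketch, with thresholds that are assumptions to tune by ear:

```python
from pydub import AudioSegment
from pydub.silence import split_on_silence

raw = AudioSegment.from_file("voice_training_raw.wav")  # hypothetical file name

# Split wherever silence lasts longer than ~700 ms, keeping 200 ms of natural
# pause around each spoken chunk (both values are assumptions; adjust by ear).
chunks = split_on_silence(
    raw,
    min_silence_len=700,
    silence_thresh=raw.dBFS - 16,
    keep_silence=200,
)

cleaned = sum(chunks, AudioSegment.empty())
cleaned.export("voice_training_clean.wav", format="wav")
# Filler words and odd pronunciations still need a manual pass (e.g. in Audacity).
```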

  • What final advice does the presenter give about using avatars on social media?

    -Start with short, well-produced avatar videos and be transparent with your audience when appropriate. Avatars can save time and be very effective when used in the right contexts, but watch for uncanny artifacts and use them where the audience and message fit.

Outlines

00:00

🤖 AI Avatars: The Future of Video Creation

This paragraph introduces the concept of AI avatars, exploring how they are used in training videos, social media, and other digital content. The speaker reveals that the video they are presenting is not them in person, but an AI-generated version created using an application called HeyGen. They discuss the idea of a 'digital twin' and share their experience with creating AI avatars, noting that while the technology is promising, many people are hesitant due to concerns about impersonation. The speaker also touches on the limitations of avatars, including the uncanny valley effect, and explains that avatars work best for short videos and business use cases where audience size is smaller. They also highlight how avatars can save time and offer convenience for creating content without needing to reshoot videos.

05:00

💡 Setting Up Your Avatar: The Right Environment

In this paragraph, the speaker emphasizes the importance of setup in creating a high-quality avatar. They share their personal experience from an online course called On-screen Authority, which taught them the significance of composition and lighting for video creation. The speaker explains how a good background and lighting can drastically improve the quality of an avatar. They provide specific advice on lighting, recommending a round LED front light and a backlight to create an edge effect, as well as tips for small spaces where a green screen might be used. The focus here is on how lighting and composition can enhance the realism of an avatar and the setup required for a successful avatar video shoot.

10:01

🎥 Recording Your Avatar: Tips for a Smooth Process

This paragraph focuses on the recording process needed to create a realistic AI avatar. The speaker provides tips for a successful avatar recording, such as using a fixed camera, sitting up straight, and avoiding distractions or errors while recording. They also advise on the importance of wearing clean, wrinkle-free clothing and avoiding excessive movement, which could lead to unnatural avatar behaviors. The speaker mentions the importance of recording for at least two minutes and using the back camera of a phone for the best video quality. Finally, they discuss file management, suggesting ways to upload the footage quickly for avatar processing.

15:02

🔊 Voice Cloning: Making Your Avatar Speak Like You

This paragraph discusses the voice cloning process for avatars, detailing the steps to record a high-quality voice and upload it to create a realistic voice clone. The speaker explains that while HeyGen's voice capture is limited by the short recording time, a better result can be achieved by separately recording your voice with a quality microphone, such as the Rode NT-USB. They also recommend cleaning up the audio to remove any errors or background noise. The speaker then introduces 11 Labs, a tool for creating a more advanced voice clone, which requires several hours of pre-recorded audio. After explaining the importance of high-quality voice recordings, the speaker gives guidance on the setup and use of 11 Labs to clone a voice and integrate it with the avatar video.

⏩ Automating Avatar Creation: Streamlining the Process

In this paragraph, the speaker dives into automating the avatar creation process using tools like Make and Google Drive. They explain a straightforward two-step automation process that saves time by automatically pushing a script from Google Docs to 11 Labs for voice cloning, and then to HeyGen for avatar creation. Once the video is generated, it is uploaded to Google Drive. The speaker provides a technical overview of how to set up this automation, including the use of approval codes and webhooks to trigger the next steps. This process, while not complex, can significantly speed up the creation of avatar videos, especially for those familiar with automation tools.
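
For readers who want to call HeyGen directly rather than through Make's visual modules, a hedged Python sketch of the generation request is below. The payload shape follows HeyGen's published v2 API, but the avatar ID, voice ID, and key are placeholders, and the schema should be checked against current documentation:

```python
import requests

HEYGEN_API_KEY = "hg-..."  # placeholder API key

payload = {
    "video_inputs": [{
        # Placeholder IDs: use the avatar created from your 2-minute recording
        # and the cloned voice registered inside HeyGen.
        "character": {"type": "avatar", "avatar_id": "my_avatar_id",
                      "avatar_style": "normal"},
        "voice": {"type": "text", "input_text": "Script text goes here.",
                  "voice_id": "my_voice_id"},
    }],
    "dimension": {"width": 1920, "height": 1080},
}

resp = requests.post(
    "https://api.heygen.com/v2/video/generate",
    headers={"X-Api-Key": HEYGEN_API_KEY},
    json=payload,
)
resp.raise_for_status()
video_id = resp.json()["data"]["video_id"]
print("Rendering started:", video_id)  # poll status or wait for the webhook
```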

Keywords

💡AI avatar

An AI avatar is a computer-generated, video representation of a real person created by artificial intelligence. In the video script, the speaker repeatedly refers to an "AI avatar" as a digital twin that can present on their behalf and save time when producing videos. Examples from the script include using the avatar for short social clips, internal company training, or as a patch when a live shoot has a mistake.

💡HeyGen

HeyGen is the specific platform used in the script to create the visual avatar and generate avatar videos. The narrator explains the steps for uploading footage to HeyGen, creating an avatar with a two-minute recording, and using the app's video creator; they also compare the free tier (watermark and 720p limit) to paid plans like the Creator plan. HeyGen is central to the video's workflow — it transforms the recorded human footage into the talking, moving digital representation.

💡Eleven Labs (11 Labs)

Eleven Labs (11 Labs) is presented as the tool for creating a realistic voice clone to pair with the avatar. The script explains that 11 Labs requires a substantial amount of clean audio (at least 30 minutes, ideally 2–3 hours) to train a convincing clone, and that integrating a cloned voice into HeyGen improves realism and helps avoid the 'uncanny valley'. The narrator also details using 11 Labs via its API and automating the flow from text→11 Labs→HeyGen.

💡Uncanny valley

The uncanny valley describes the unsettling feeling people get when a humanoid or realistic simulation is almost—but not quite—lifelike. The script uses this concept to explain why avatar videos sometimes feel 'robotic and weird' and why longer avatar videos increase the risk of viewers noticing imperfections. The narrator argues that recent advances have helped move past the uncanny valley for short or business-focused avatar use, but cautions against long personal YouTube videos where the effect could harm connection with the audience.

💡Voice clone

A voice clone is an AI-generated imitation of a person's voice created by training on recordings of that voice. In the video, the narrator discusses recording 30 minutes to several hours of high-quality audio (or hiring someone to clean it) so 11 Labs can build a usable voice clone; they note that a cloned voice reduces the need to record each script manually but requires careful editing to avoid capturing filler words or odd pronunciations. The clone is used in the text-to-speech editor and can be selected inside HeyGen for synchronized avatar speech.
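
Cloning itself can also be driven over HTTP. The sketch below assumes Eleven Labs' v1 instant-cloning endpoint (/v1/voices/add); the higher-fidelity professional clone the video describes is typically trained through the web app instead:

```python
import requests

ELEVEN_API_KEY = "xi-..."  # placeholder
clips = ["cleaned_take_01.mp3", "cleaned_take_02.mp3"]  # hypothetical cleaned recordings

# Upload the cleaned recordings and register the new voice under a name.
resp = requests.post(
    "https://api.elevenlabs.io/v1/voices/add",
    headers={"xi-api-key": ELEVEN_API_KEY},
    data={"name": "My digital twin"},
    files=[("files", (c, open(c, "rb"), "audio/mpeg")) for c in clips],
)
resp.raise_for_status()
print("New voice ID:", resp.json()["voice_id"])  # select this voice inside HeyGen
```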

💡Composition

Composition refers to how the subject is framed within the camera shot, including background elements and positioning. The script emphasizes good composition (e.g., sitting centered, aligning with a door in the background) as a key factor for a convincing avatar because the visual setup used for the avatar recording will determine how natural the avatar appears. The narrator contrasts poor mobile framing with a more polished setup to show how composition affects final avatar quality.

💡Lighting setup

Lighting setup covers the placement and type of lights used to illuminate the subject for recording. The narrator recommends a round LED front light positioned so one side of the face is lit and the other falls slightly into shadow, plus a backlight to create an edge that separates the subject from the background; these details directly improve avatar realism. Examples include avoiding ring lights (because of reflections in the eyes) and using inexpensive lamps for a backlight if necessary.

💡Green screen / background removal

A green screen is a backdrop used to easily separate the subject from the background and replace it in post-production. The script suggests a small green screen for cramped spaces and mentions using Runway (an AI tool) or traditional keying/rotoscoping in After Effects to remove backgrounds and insert a blurred or alternate scene, which helps create a convincing avatar environment when the recording space is limited. This technique enables more consistent, professional-looking avatar videos even in small rooms.

💡Automation (make.com)

Automation here means using make.com to connect apps and automatically produce avatar videos from scripts. The narrator describes a two-step automation: watch a Google Drive folder for a Google Doc script, trigger processing only when an approval code is written, send the script to 11 Labs (via HTTP request) to produce audio, then push that into HeyGen, and finally save the finished video back to Google Drive via webhook. This saves repetitive manual work and speeds up production when making many avatar videos.
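
The last hop (HeyGen back to Google Drive via webhook) could look like the Flask sketch below. The callback payload shape is an assumption, as is the idea that the completion event carries a downloadable video URL; the Drive upload itself is left as a stub:

```python
from flask import Flask, request
import requests

app = Flask(__name__)

@app.route("/heygen-callback", methods=["POST"])
def heygen_callback():
    event = request.get_json(force=True)
    # Assumption: the completion event includes a downloadable URL for the video.
    video_url = event["event_data"]["url"]
    video = requests.get(video_url, timeout=300)
    with open("avatar_video.mp4", "wb") as f:
        f.write(video.content)
    # From here, push the file to Google Drive (e.g. with google-api-python-client).
    return {"ok": True}

if __name__ == "__main__":
    app.run(port=8000)
```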

💡Google Docs (script workflow)

Google Docs is used as the place to write and approve video scripts within the automation pipeline. In the described workflow, a Google Doc containing the script triggers the automation when a specific approval code (e.g., "approved 12345") is added; make.com then reads that Doc and forwards the text into the TTS/production pipeline. Using Google Docs provides a familiar collaborative place for drafting, editing, and approving scripts before the avatars are generated.
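
Outside Make, the same approval gate could be checked directly with the Drive API. A sketch assuming a service account with read access and the example marker string from the workflow:

```python
from google.oauth2.service_account import Credentials
from googleapiclient.discovery import build

# Placeholder credentials file; the service account needs read access to the Doc.
creds = Credentials.from_service_account_file(
    "service_account.json",
    scopes=["https://www.googleapis.com/auth/drive.readonly"],
)
drive = build("drive", "v3", credentials=creds)

DOC_ID = "your-google-doc-id"  # placeholder

# Export the Doc as plain text and look for the approval marker.
text = drive.files().export(fileId=DOC_ID, mimeType="text/plain").execute().decode("utf-8")
if "approved 12345" in text.lower():
    script = text.replace("approved 12345", "").strip()
    # ...hand `script` to the Eleven Labs / HeyGen steps sketched above
```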

💡Free vs paid tiers (watermark / export limits)

The video explains that HeyGen’s free tier allows testing but includes a watermark and limits exports to 720p, while paid plans (Creator, Team) unlock higher-quality exports and longer video lengths. The narrator recommends the Creator plan ($29/month) for most creators and notes that videos longer than five minutes require the Team subscription; this distinction guides viewers on which plan suits testing versus professional use. The script uses pricing and feature examples to help viewers weigh cost against time-savings benefits.

💡Use cases and time savings

Use cases and time savings are the practical reasons the narrator promotes avatar videos: internal training, company updates, sales material, and short social clips that would otherwise require repeated onsite filming. Throughout the script the narrator emphasizes that avatars save time (once you have a good setup you don't need to reshoot) and allow busy leaders to deliver consistent messages; examples include swapping in an avatar clip to fix a small mistake instead of redoing a full shoot, or using avatars for frequent internal updates from executives.

Highlights

Introduction to creating AI avatars and digital twins using HeyGen.

The uncanny valley and how realistic avatars can avoid looking robotic.

The process of creating a high-quality avatar setup with good composition and lighting.

Using HeyGen to create an avatar and the importance of having a fixed camera for accurate avatar representation.

Key tips for presenting naturally to the camera for the best avatar results.

The benefits of using avatars for business purposes, like training videos and internal communication.

How avatars save time in video production by avoiding re-shoots and creating reusable assets.

The best lighting setup for creating a convincing avatar, including the use of LED lights and backlighting.

How a green screen can help when working with smaller spaces for avatar production.

The importance of using a good microphone for voice recording when creating AI avatars.

How 11 Labs can be used to clone your voice for more realistic AI avatars.

How to record and clean audio for voice cloning, including tips for better-quality recordings.

Automation through make.com to streamline the process of generating avatar videos and uploading them to Google Drive.

How HeyGen integrates with 11 Labs for both avatar creation and voice cloning.

The potential for social media use of AI avatars and the benefits of this technology for content creators.