I bought the books. Watched the videos. Still wanted more, especially after he talked about the agent he created.
All that material is publicly available. Enough to build my own Alex Hormozi Brain Agent?
"Hey Jules, how about it?" Jules is my AI coding assistant (Claude Code). Jules ran off, grabbed transcripts of videos, text of books, whatever is available online. Guest podcasts." then turned that into files I uploaded to a Claude Project so I can chat through Claude with Alex Hormozi.
Here's what Jules found - 99 long-form YouTube video transcripts - 3 complete audiobook transcripts - 15 guest podcast transcripts - X threads
What I Did in Four Phases
Phase 1 maps the full source landscape: YouTube channel (4,754 videos), The Game podcast (~900+ episodes), three books, guest podcast appearances, X/Twitter. Figure out what's worth downloading before you start.
Phase 2 downloads and converts. Top 100 longest video transcripts, full audiobook transcripts for all three books, 15 guest podcast transcripts from the highest-view-count appearances, and whatever X/Twitter content the API will give you.
Phase 3 runs voice pattern analysis. Sentence structure, reasoning skeleton, core frameworks, teaching style, verbal signatures. This is where the persona takes shape.
Phase 4 builds the system prompt and optimizes the knowledge base to fit within Claude Projects' limits. Then deploy.
Phase 1: Inventory
The @AlexHormozi YouTube channel has 4,754 videos. That number is misleading. 4,246 of those are Shorts (under 60 seconds or no duration metadata). Filter those out and you have 508 full-length videos. That's the real content library.
Beyond YouTube, the main sources worth pursuing:
- The Game podcast (~900+ episodes). His primary long-form output. The audiobooks for all three books are available free on the podcast and YouTube.
- Guest podcast appearances. DOAC, Impact Theory, School of Greatness, Modern Wisdom, Danny Miranda. Hosts push him off-script and into territory he doesn't cover in his own content. High value per byte.
- X/Twitter threads. Compressed, punchy formulations of his frameworks. Different texture than the long-form material.
- Skool community. Behind a login wall. Low ROI for this project.
- Acquisition.com. No blog. Courses are paywalled. Skip.
Phase 2: Collect
YouTube Transcripts
The first scrape of the YouTube channel only returned 494 videos. The channel has 4,754. The scraper was pulling from the /videos tab, which doesn't surface the full library. Re-running against the full channel URL (@AlexHormozi) returned everything. Easy to miss, significant difference.
After filtering Shorts: 508 full-length videos. I downloaded auto-generated captions for the top 100 longest videos (sorted by duration, so the meatiest content came first). Auto-generated captions from YouTube come as SRT files with timestamps, line numbers, and duplicate lines. Converting those to clean readable text required stripping all the formatting artifacts and deduplicating language variants (English vs English-Original).
Result: 99 transcripts. A few livestreams had no captions available.
Book Audiobook Transcripts
All three Hormozi books have full audiobook uploads on YouTube:
- $100M Offers (~4.4 hours)
- $100M Leads (~7 hours)
- $100M Money Models (~4.3 hours)
Same process as the video transcripts. Download the auto-generated captions, convert to clean text. Three files, 855KB total. These are non-negotiable core material for the knowledge base.
Guest Podcast Transcripts
Searched YouTube for Hormozi guest appearances sorted by view count. The top hit was Diary of a CEO at 4.7M views. Grabbed the 15 highest-view-count appearances.
The guest transcripts are 2.1MB total. Worth every byte. When a host like Steven Bartlett or Tom Bilyeu pushes back on a claim, Hormozi shifts into a different mode. He's more precise and sometimes reveals the edge cases he glosses over on his own channel. You can't get that from watching his channel alone.
X/Twitter Content
X's API rate limits capped the collection at 9 unique tweets. Not ideal, but enough to confirm the voice texture: "Aggressive with effort. Relaxed with outcome." His Twitter is his most compressed format. Each tweet is a framework distilled to a single line.
9 tweets is thin. For a more complete build, you'd want to manually curate 50-100 of his best threads. The API limitations made automated collection impractical.
Phase 3: Analyze
I ran voice analysis across the full corpus, looking at seven dimensions.
Hormozi's sentences are short, punchy declarations. Fragments for emphasis. "And so" as his default transition. Short bursts, then a longer sentence that lands the point. Nearly every argument follows the same five-step skeleton: bold claim, personal story, framework, math, then a reductio ad absurdum that makes the alternative sound insane. Once you see it, you can't unsee it.
The core frameworks are Grand Slam Offer, Value Equation, Supply and Demand, Leverage types, Core Four (lead generation methods), and Money Models. Define all of them precisely in the system prompt.
His default mode is intense-casual. Strategic profanity. He'll get vulnerable for a sentence, then pivot straight to the lesson. Never stays there. The teaching style is concentric repetition: same idea from four different angles in two minutes. Analogy, story, math, then back to the principle.
The verbal tics are critical for the persona. "Right?" as a check-in with the audience. "That's it." as a full stop after a framework. "The reality is..." to pivot from what people think to what's true. "you're like..." to voice the audience's resistance before dismantling it. His analogies pull from physical and competitive domains: poker, fighting, dating, weightlifting.
Coverage Assessment
The collected material captures an estimated 60-70% of his publicly available thinking. Two gaps stood out:
- Guest podcast appearances beyond the top 15. There are dozens more, each with unique material.
- X/Twitter threads. Only 9 tweets collected. His most compressed formulations live here.
Nice to have, not essential. The three books plus 99 video transcripts plus 15 guest appearances cover the core frameworks, teaching style, and reasoning patterns thoroughly.
Phase 4: Build
The System Prompt
The system prompt encodes everything from Phase 3 into a persona specification. It covers:
- Voice patterns and verbal tics (the specific phrases, the rhythm, the profanity style)
- The five-step reasoning structure
- All core frameworks with descriptions
- Teaching style (concentric repetition, the four-angle approach)
- Belief system and values
- Emotional register with examples of how he modulates it
- Seven conversational rules for how the agent handles advice-giving
- Background facts (business history, portfolio companies, personal story beats) to reference naturally
- Anti-patterns: what Hormozi doesn't do. No hedging. No "it depends" without immediately following up with when it does and doesn't depend. No abstract theory without a concrete example within 30 seconds.
Without the anti-patterns list, the model defaults to hedge-everything business coach. That's not Hormozi.
Hitting the Knowledge Limit
First attempt: 47 files, 11.4MB. Claude Projects lets you attach reference documents that persist across conversations, but the knowledge base caps out around 7MB of content. Not close.
The optimization process:
- Three books (855KB): kept as individual files. Non-negotiable.
- All 15 guest appearances merged into one file (2.1MB): unique material, high value per byte.
- Top 12 video transcripts split into two files (4.2MB total): the longest, meatiest content.
- X/Twitter threads (2KB): tiny footprint, worth including for voice calibration.
- System prompt (8KB): the persona specification.
Result: 8 files, 7.0MB. 94% of Claude Projects' capacity. The 84 remaining video transcripts didn't make the cut. The books and guest appearances got priority because they contain the most unique material. Video transcripts have significant overlap with each other (he repeats his frameworks constantly, which is great for learning but redundant in a knowledge base).
NotebookLM Alternative
Before settling on Claude Projects, I also bundled the transcripts for Google's NotebookLM, which has a 50 sources/notebook limit. That required combining 102 individual files into 38 uploadable text files: 3 books as individual files, 14 top video transcripts as individual files, and 21 "Video Bundle" files containing the remaining 84 videos in groups of 4.
Same content, completely different packaging decisions. Claude Projects has the tighter budget but a better conversational agent on the other end. NotebookLM lets you upload more but the agent doesn't use it as flexibly.
Option A: Go Build Yours
Hormozi was mine. Pick whoever matters to your business.
The material is out there for almost anyone with a substantial public body of work. Naval Ravikant, Patrick Bet-David, Seth Godin, Brene Brown. Podcasters, authors, YouTubers. If they've published 100+ hours of content, there's enough to build a useful advisor agent.
The process is the same regardless of who you pick. Inventory the sources. Download the transcripts. Analyze the voice. Package it for your LLM of choice. The whole project took about a day with Claude Code running the collection and analysis.
You still read the books and watch the videos. The agent gives you a different interface to the same material. Pressure-test your specific business problem against their frameworks instead of hoping you remember the right chapter when you need it.
Option B: The Packaging Bottleneck
The work is in the packaging. Auto-generated captions need cleaning. Files need deduplication. A 7MB knowledge limit means hard choices about what makes the cut. Voice analysis requires reading for patterns, not just content volume.
Most major business thinkers have enough publicly available material to build a useful advisor. The information exists. Turning hours of video into a structured knowledge base is where the effort goes.
Option C: What Changes When You Can Ask It
Watching a Hormozi video, you absorb frameworks passively. Whether you remember the right one when you actually need it is a coin flip.
Having a Hormozi brain agent means you can describe your specific offer and get it pressure-tested against his frameworks in real time. "Here's my SaaS pricing page. What would Hormozi say is wrong with this offer?" That's a different interaction than watching a video about pricing.
The questions I find myself asking it: How would you restructure this offer to increase perceived value without changing the deliverable? What's the biggest bottleneck in this lead generation approach? Where am I trading time for money when I should be trading money for time?
The answers aren't magic. They're his frameworks applied to your specifics. What offer would you pressure-test first?
Appendix: Full Reproduction Guide
Everything you need to build your own version. Assumes comfort with the command line.
Tools
- yt-dlp: YouTube metadata extraction and caption downloading. Install via Homebrew:
brew install yt-dlp - Python 3: File processing, deduplication, bundling
- Claude Code (or similar AI coding assistant): Voice analysis, system prompt writing, optimization
- xurl (optional): X/Twitter API search. Any Twitter API client works.
Step 1: Scrape the YouTube Channel
Pull the full video list with metadata:
bash yt-dlp --flat-playlist --print "%(id)s\t%(title)s\t%(duration)s" \ "https://www.youtube.com/@AlexHormozi" > hormozi_all_videos.tsv
Important: use the full channel URL (@AlexHormozi), not the /videos tab URL. The videos tab returns a subset. The full channel URL returns everything.
Step 2: Filter Out Shorts
Shorts are videos under 60 seconds. Filter them with a simple Python script or awk:
```python import csv
with open('hormozi_all_videos.tsv') as f: reader = csv.reader(f, delimiter='\t') full_length = [] for row in reader: try: dur = int(float(row[2])) except (ValueError, IndexError): continue if dur >= 60: full_length.append(row)
Sort by duration, longest first
full_length.sort(key=lambda r: int(float(r[2])), reverse=True)
with open('hormozi_full_length.tsv', 'w') as f: writer = csv.writer(f, delimiter='\t') writer.writerows(full_length) ```
Note: --flat-playlist returns incomplete duration data for some videos (shows as NA or empty). Full-length videos with missing durations will be dropped by this filter. For more complete results, drop --flat-playlist and let yt-dlp load each video page (much slower, but accurate durations).
This should yield ~500+ full-length videos depending on when you run it.
Step 3: Download Transcripts
Download auto-generated captions for your top N videos (we used the top 100 by duration):
```bash
For each video ID in your filtered list:
yt-dlp --write-auto-sub --sub-lang "en.*" --convert-subs srt --skip-download \ -o "transcripts/%(id)s" \ "https://www.youtube.com/watch?v=VIDEO_ID" ```
Use --sub-lang "en.*" to catch language variants (en, en-US, en-orig). The --convert-subs srt flag forces consistent output format. Some videos (especially livestreams) may not have auto-generated captions.
Step 4: Convert SRT/VTT to Clean Text
SRT files contain timestamps, line numbers, and duplicate lines from the auto-caption process. Strip all of that:
```python import re
def srt_to_text(srt_content): # Remove line numbers text = re.sub(r'\d+\s*$', '', srt_content, flags=re.MULTILINE) # Remove timestamps text = re.sub(r'\d{2}:\d{2}:\d{2}[.,]\d{3}\s-->\s\d{2}:\d{2}:\d{2}[.,]\d{3}', '', text) # Remove VTT headers text = re.sub(r'WEBVTT.*$', '', text, flags=re.MULTILINE) # Collapse whitespace lines = [l.strip() for l in text.splitlines() if l.strip()] # Deduplicate consecutive identical lines deduped = [lines[0]] if lines else [] for line in lines[1:]: if line != deduped[-1]: deduped.append(line) return ' '.join(deduped) ```
This deduplication handles the standard YouTube overlap artifact (each caption block repeats the prior line). The result is clean enough for an LLM knowledge base, though not perfectly formatted prose.
Also check for language variant duplicates. YouTube sometimes generates both en and en-orig captions for the same video. Keep one.
Step 5: Download Book Audiobook Transcripts
Search YouTube for the full audiobook uploads: - "$100M Offers full audiobook" (~4.4 hours) - "$100M Leads full audiobook" (~7 hours) - "$100M Money Models full audiobook" (~4.3 hours)
Same download and conversion process as the video transcripts. These three files are the highest-value content per byte.
Step 6: Guest Podcast Transcripts
Search YouTube for guest appearances sorted by view count:
bash yt-dlp --flat-playlist --print "%(id)s\t%(title)s\t%(view_count)s\t%(duration)s" \ "ytsearch100:Alex Hormozi interview podcast" > hormozi_guest_search.tsv
Manually curate the top 15-20 highest-quality appearances. Look for hosts who push back (Steven Bartlett, Tom Bilyeu, Lewis Howes). Download and convert transcripts the same way.
Step 7: X/Twitter Content (Optional)
If you have Twitter API access:
bash xurl search "from:AlexHormozi" -n 50
X's API requires paid access for meaningful timeline search, and even paid tiers have aggressive rate limits. You'll likely get far fewer results than requested. Manually curating tweets from his profile page is more practical for most people.
Step 8: Voice Analysis
Feed a representative sample of transcripts (10-15, mixing books, videos, and guest appearances) to Claude or another LLM with this prompt:
Analyze this person's communication style across these transcripts. Identify: sentence structure patterns, reasoning skeleton (how arguments are built), core recurring frameworks, emotional register and how it shifts, teaching methodology, verbal signatures and verbal tics, preferred analogy domains, and anti-patterns (what they never do).
Use the analysis output to write the system prompt.
Step 9: Build the System Prompt
The system prompt should cover:
- Voice patterns (sentence length, fragment usage, transitions)
- Reasoning structure (the step-by-step argument skeleton)
- All core frameworks with one-paragraph descriptions
- Teaching style (how to explain, re-explain, use examples)
- Emotional register (default mode, when it shifts, how profanity is deployed)
- Conversational rules (how to handle pushback, how to give advice, when to use stories)
- Background facts (career history, portfolio, personal story beats)
- Anti-patterns (what the persona never does, what to avoid)
Test the prompt with questions you know the real person has answered. Compare the agent's response to how they actually answered. Iterate.
Step 10: Package for Claude Projects
Claude Projects lets you attach reference documents to a Claude conversation that persist across sessions. The knowledge base caps out around 7MB of content (token-based under the hood, but ~7MB of clean text is the practical ceiling). If your total content exceeds that:
- Prioritize books (most structured, highest unique value per byte)
- Guest appearances next (unique material not available elsewhere)
- Merge remaining files by category (video bundles, podcast bundles)
- X/Twitter content last (small footprint, useful for voice calibration)
- System prompt as a separate file
Upload all files to a Claude Project. The system prompt goes in the Project Instructions, not as a knowledge file.
For NotebookLM, the limit is 50 sources per notebook with per-source size caps as well. You may need to bundle multiple transcripts into single files to stay under both limits.
Step 11: Test and Iterate
Ask the agent questions across different domains: - Offer construction ("Review this offer and tell me what's wrong") - Lead generation ("What would you change about my lead magnet?") - Business model ("I'm charging $X for Y. What should I change?") - Mindset ("I'm afraid to raise my prices. What am I getting wrong?")
Compare responses to how the real person has addressed similar topics. The system prompt almost always needs 2-3 rounds of refinement before the voice feels right.