[Video Summary App] Follow my journey building a webapp as a non-coder

Hi everyone! This is Robin. I don’t have a programming background. After trying to follow tutorials building projects that I didn’t care about, I thought: why don’t I build something I’m truly excited about? Trust me, I’ve been through the ‘omg I have this brilliant idea that will change the world so I can’t tell anyone about it’ phase. What I’ve learned is that ideas are worthless. It’s the execution that matters.

Unlike the people behind most ‘Building website with AI’ videos on YouTube, I have very little coding experience. Those videos often edit out all the moments of struggle, which made me think: am I the dumbest person on the planet? Maybe I can’t do this!

I hope if I document my journey of building my first app with Replit from scratch to deployment, two things will happen: 1) I’ll get help from the community, and 2) others will see my journey and know they’re not struggling alone—and most importantly, realize they can do this too!

Over the next few days, I’ll be documenting my journey of building this app. But first, let me tell you what I’m building and why.

I love learning and sharing. I enjoy watching lectures, podcasts, and TED Talks on YouTube. Sometimes the content makes me think of specific friends, and I really want to share the videos with them. I usually type a quick summary of the video and explain why I think it would benefit my friend. However, this process can be time-consuming. I just want to quickly have AI summarize the content so I can send it to my friend.

In my next update, I’ll share how I’m approaching this project and what information I’ve gathered to help me get started.

Step 1 - Scoping Phase
Goal: To define a minimum viable product and understand what programming languages and APIs to use.
Tools used: claude.ai

Since I am not a programmer, I had no clue how to get started. So I thought: why not let AI help me figure out what to do?

Here is the project instruction I gave to Claude:

You are an experienced software project manager specializing in web app development. You are going to help turning users needs into a product requirement document for a minimal viable product.

This is the prompt I gave to Claude:

I want to build a web app that can summarize the content of a YouTube video. I want to be able to grab a YouTube URL and paste into this app. The app will be able to extract the audio and then turn it into a transcript. Then the app will provide a summary of the transcript.

Could you help me generate a product requirement documents for an MVP that could achieve this? I want you to list all the required tools (APIs) and also provide me with a clear step-by-step development process.

Claude gave me a super complicated response that I couldn’t make sense of.

Claude's response (don't actually read it)

I’ll help you create a Product Requirements Document (PRD) for your YouTube video summarization web app MVP. Let me break this down into clear sections.

YouTube Video Summarization Web App - MVP PRD

Product Overview

A web application that allows users to input a YouTube video URL and receive an AI-generated summary of the video content by processing its audio transcript.

Core Features (MVP)

  1. URL Input Interface

    • Simple text input field for YouTube URL
    • URL validation to ensure valid YouTube format
    • Submit button to initiate processing
  2. Video Processing

    • YouTube video data extraction
    • Audio extraction from video
    • Audio-to-text transcription
    • Text summarization
  3. Results Display

    • Display video metadata (title, duration)
    • Display generated transcript
    • Display summarized content
    • Error handling messages

Technical Requirements

Required APIs and Tools

  1. YouTube Data API v3

    • Purpose: Video metadata retrieval and validation
    • Alternative: youtube-dl library for video information extraction
  2. Speech-to-Text Service (options)

    • OpenAI Whisper API
    • Google Cloud Speech-to-Text
    • Azure Speech Services
  3. Text Summarization Service

    • OpenAI GPT API
    • Alternative: Cohere or Anthropic Claude API
  4. Backend Framework

    • Node.js with Express.js
    • Alternative: Python with FastAPI
  5. Frontend Framework

    • React.js with TypeScript
    • TailwindCSS for styling
  6. Database (Optional for MVP)

    • MongoDB for storing processed results
    • Alternative: SQLite for simpler implementation

Development Process

Phase 1: Setup and Basic Infrastructure (1-2 weeks)

  1. Set up development environment
  2. Initialize frontend and backend projects
  3. Configure basic API routing
  4. Implement basic error handling
  5. Set up CI/CD pipeline

Phase 2: YouTube Integration (1 week)

  1. Implement URL input and validation
  2. Integrate YouTube Data API
  3. Set up video metadata retrieval
  4. Implement audio extraction functionality
  5. Add error handling for invalid URLs or unavailable videos

Phase 3: Transcription Service (1-2 weeks)

  1. Integrate chosen Speech-to-Text API
  2. Implement audio processing pipeline
  3. Set up transcript storage
  4. Add progress indicators for long-running processes
  5. Implement error handling for failed transcriptions

Phase 4: Summarization Service (1 week)

  1. Integrate chosen AI summarization API
  2. Implement text processing pipeline
  3. Add summary generation functionality
  4. Implement error handling for failed summarizations

Phase 5: Frontend Development (1-2 weeks)

  1. Design and implement user interface
  2. Create responsive layouts
  3. Add loading states and progress indicators
  4. Implement error message displays
  5. Add basic animations and transitions

Phase 6: Testing and Deployment (1 week)

  1. Conduct unit testing
  2. Perform integration testing
  3. Execute user acceptance testing
  4. Deploy MVP to chosen hosting platform
  5. Monitor initial usage and gather feedback

Technical Considerations

Rate Limiting and Quotas

  • Implement API request throttling
  • Monitor API usage and costs
  • Set up usage quotas per user (if needed)

Error Handling

  • Network connectivity issues
  • Invalid video URLs
  • API failures
  • Processing timeouts

Performance

  • Implement caching for processed videos
  • Consider implementing queue system for multiple requests
  • Optimize API calls to minimize costs

Future Considerations (Post-MVP)

  1. User authentication
  2. History of processed videos
  3. Custom summarization parameters
  4. Multiple language support
  5. Share functionality
  6. API access for other applications

Would you like me to elaborate on any of these sections or provide more specific technical details for any particular component?

This was way over my head. So I followed up with another prompt:

This seems very complicated. Can you simplify this? Please assume 1 college student are trying to do this project on their own. Use the easiest way to develop this.

Claude's Response (again, don't read it. I will summarize later)

Ah, you’re right - let me simplify this for a solo student project. Here’s a streamlined version that focuses on getting a working MVP with minimal complexity:

Simplified YouTube Video Summarizer - Student Project

Core Features

  1. Simple webpage with a YouTube URL input
  2. Get video transcript
  3. Generate a summary
  4. Display results

Technology Stack (Simplified)

  1. Frontend: Plain HTML, CSS, JavaScript

    • No need for React or complex frameworks
    • Just create a simple form with vanilla JavaScript
  2. Backend: Python with Flask

    • Flask is easier to learn and set up compared to Node.js
    • Perfect for simple API endpoints
    • Great Python library support for our needs
  3. Key APIs/Libraries:

    • youtube-transcript-api: Gets transcripts directly from YouTube
    • OpenAI API: For text summarization using GPT
    • That’s it! No need for complex speech processing

Step-by-Step Development Process

1. Setup (1 day)

  1. Install Python and required packages:
    pip install flask youtube-transcript-api openai python-dotenv
    
  2. Create a basic project structure:
    youtube-summarizer/
    ├── app.py
    ├── static/
    │   ├── style.css
    │   └── script.js
    └── templates/
        └── index.html
    

2. Basic Frontend (1-2 days)

Create a simple HTML page with:

  • URL input field
  • Submit button
  • Area to display results
  • Basic CSS for styling

3. Backend Development (2-3 days)

  1. Set up Flask server
  2. Create two main endpoints:
    • /get_transcript: Gets YouTube transcript
    • /summarize: Summarizes the transcript

4. Integration (1-2 days)

  1. Connect frontend to backend
  2. Add error handling
  3. Add loading states

Sample Code Structure

Backend (app.py):

from flask import Flask, request, jsonify
from youtube_transcript_api import YouTubeTranscriptApi
import openai

app = Flask(__name__)

@app.route('/get_transcript', methods=['POST'])
def get_transcript():
    video_url = request.json['url']
    video_id = extract_video_id(video_url)  # You'll need to write this function
    transcript = YouTubeTranscriptApi.get_transcript(video_id)
    return jsonify(transcript)

@app.route('/summarize', methods=['POST'])
def summarize_text():
    text = request.json['text']
    summary = get_summary(text)  # Using OpenAI API
    return jsonify({'summary': summary})

if __name__ == '__main__':
    app.run(debug=True)

Frontend (index.html):

<!DOCTYPE html>
<html>
<head>
    <title>YouTube Video Summarizer</title>
    <link rel="stylesheet" href="/static/style.css">
</head>
<body>
    <div class="container">
        <h1>YouTube Video Summarizer</h1>
        <input type="text" id="url-input" placeholder="Enter YouTube URL">
        <button onclick="processVideo()">Get Summary</button>
        <div id="result"></div>
    </div>
    <script src="/static/script.js"></script>
</body>
</html>

Required API Keys

  1. OpenAI API key (for GPT)
    • Sign up at OpenAI’s website
    • Get API key
    • Free credits for testing

Development Tips

  • Start with a single video URL to test
  • Use short videos during development
  • Print intermediate results to debug
  • Test with various types of videos
  • Keep the UI simple - focus on functionality

Common Issues to Watch For

  • Very long videos might cost more API credits
  • Some YouTube videos don’t have transcripts
  • API rate limits
  • Network timeouts

Estimated Timeline

  • Total time: 1-2 weeks for basic working version
  • Additional week for refinements and bug fixes

This simplified version:

  • Requires minimal setup
  • Uses fewer dependencies
  • Is easier to debug
  • Can be completed by one person
  • Still delivers core functionality

Would you like me to explain any part in more detail or provide more specific code examples?

From the response, I learned that the core features are:

  • Simple webpage with a YouTube URL input
  • Get video transcript
  • Generate a summary
  • Display results

The key API and library I need to use are:

  • youtube-transcript-api: Gets transcripts directly from YouTube
  • OpenAI API: For text summarization using GPT

The programming languages are:
Frontend: Plain HTML, CSS, JavaScript
Backend: Python with Flask

Okay, that’s easy enough. The only thing I didn’t agree with in this response is that youtube-transcript-api only extracts the existing transcript/subtitles for a given video. Ideally I want my app to be able to transcribe any video. But for an MVP, I will take that.

It’s always important to remember the goal: get to a functional MVP. Then we can add more features.
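Claude’s sample backend calls an extract_video_id function but leaves it unwritten. Here is a minimal sketch of what such a helper might look like, using only Python’s standard library. The URL shapes it handles (watch, youtu.be, embed, shorts) are an assumption about what is worth covering; this is not taken from Claude’s response or from the code Replit generated.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                video_id = parsed.path[len(prefix):].split("/")[0]
                return video_id or None

    # Not a URL shape this sketch recognizes.
    return None
```

Returning None for anything unrecognized lets the Flask endpoint answer with an "invalid URL" message instead of crashing.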

I’m pretty happy with this response. So I asked Claude to help me generate a prompt for Replit:

This is awesome. I just want you to re-write the core features and technology stack into a single paragraph AI prompt so I can hand it to Replit to build my prototype

Here is Claude’s response:

Here’s a clear prompt for Replit to build your YouTube video summarizer:

Create a simple web application that takes a YouTube video URL as input and generates a summary of its content. The app should have a single HTML page with a text input field for the URL and a submit button. Use Python with Flask for the backend, and vanilla JavaScript for the frontend. The core functionality should: 1) accept a YouTube URL, 2) fetch the video transcript using the youtube-transcript-api library, 3) use the OpenAI API to generate a summary of the transcript, and 4) display both the transcript and summary on the page. Keep the UI minimal with basic CSS styling. The app should handle errors gracefully, such as invalid URLs or missing transcripts. Include loading states to show when the app is processing. Use environment variables to store the OpenAI API key securely.
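The last sentence of that prompt, keeping the OpenAI API key in an environment variable, might look roughly like this in Python. This is a sketch under assumptions, not the code Replit generated: OPENAI_API_KEY is the conventional variable name the openai package looks for, and load_openai_api_key is a hypothetical helper name.

```python
import os


def load_openai_api_key(env=None):
    """Read the OpenAI API key from the environment instead of hard-coding it."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Fail early with a clear message rather than at the first API call.
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your environment.")
    return key
```

The point of the design is that the key never appears in the source code, so the code can be shared or synced to GitHub without leaking it.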

Step 1 Goal achieved!

That’s all for this update. No coding has been done yet. Will start that tomorrow!

Fun fact: I was trying to post this but got this error message :rofl::

Step 2: Building MVP with Replit

After I entered the prompt, Replit Agent quickly got to work. It built a very basic UI and asked me to enter a URL to test. After I entered the URL, the prototype said it failed to generate a summary. I told Replit to fix it. It returned an error:

{
  "error": {
    "message": "The model gpt-4o does not exist or you do not have access to it.",
    "type": "invalid_request_error",
    "param": null,
    "code": "model_not_found"
  }
}

I tried to ask Replit to fix it but then realized that maybe it’s because I hadn’t paid OpenAI. After I funded the OpenAI API account, the MVP worked! I now have a functional app that can take the existing transcript from a YouTube video and generate a summary of it!
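The error body is plain JSON, so the app could turn it into a more helpful hint than "failed to generate a summary". A minimal sketch, assuming the error arrives as a JSON string shaped like the one above; explain_openai_error is a hypothetical helper name, and the hint text simply reflects what fixed it in my case.

```python
import json


def explain_openai_error(raw_body):
    """Turn a raw OpenAI-style error body into a short human-readable hint."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        return "Could not parse the error response."

    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return "Could not parse the error response."

    message = error.get("message", "Unknown error")
    if error.get("code") == "model_not_found":
        # In this thread the cause was an unfunded API account, not a typo.
        return message + " Check the model name, and check that the API account has been funded."
    return message
```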

Step 3: Testing

Now that I had built the MVP, I logged into my OpenAI API account to see how much one summary cost. With the current setup, it was 1 cent for Matt’s video: https://www.youtube.com/watch?v=rGYdyb58wJY (about 5 minutes).

I know there are other ways to achieve the same result. For example, instead of using youtube-transcript-api, why can’t I have OpenAI Whisper extract the transcript from the audio? Would that be more accurate than the auto-generated captions?

Well let’s test it!

I asked Replit to create a second column with a second input field that uses OpenAI Whisper.

I used the same video again and here is what it looks like:

At first glance, Whisper separated all the sentences with punctuation.

Step 3 [continued]

I uploaded the transcripts to Claude and asked it to compare the two and provide a conclusion:

Claude Full Response for Transcription

I’ll help analyze the differences between these two transcripts. Let me compare them systematically:

  1. Formatting and Readability:
    • The Whisper transcript has better punctuation, capitalization, and paragraph breaks
    • The API (YouTube) transcript has more run-on sentences and missing punctuation
    • Whisper generally produces cleaner text formatting with proper spacing
  2. Company Name Representation:
    • API: Shows “repet” throughout
    • Whisper: Correctly shows “Repl.it”
    • This is a significant accuracy difference as Repl.it is the correct company name
  3. Domain Names and Technical Terms:
    • API: “hat tip. rep. apppp”
    • Whisper: “hat tip dot Repl.it dot app”
    • Whisper handles domain names more naturally by writing out “dot” instead of using periods
  4. Acronyms and Technical Terms:
    • API: “DNS” and “txt” are inconsistently formatted
    • Whisper: Maintains consistent capitalization for “DNS” and “TXT” records
  5. Speech Disfluencies:
    • Both transcripts capture filler words like “um” and “uh”
    • Whisper tends to handle these more cleanly and naturally in the sentence structure
  6. Small Word Accuracy:
    • API has more missing articles and prepositions
    • Whisper maintains better grammatical structure with proper articles (“the”, “a”, etc.)

Overall, the Whisper transcript appears to be significantly more accurate, particularly in:

  • Proper company name recognition
  • Technical terminology
  • Punctuation and readability
  • Natural language flow
  • Domain name formatting

The API (YouTube) transcript, while capturing the general content, has more errors in technical terms and formatting, making it slightly harder to follow for technical instructions.

Would you like me to analyze any specific sections in more detail or focus on particular aspects of the transcription differences?

Claude's full response for Summary Comparison

Let me compare these two summaries:

  1. Structure and Organization:
    • Summary 1 is more concisely organized with clear bullet points and a linear flow
    • Summary 2 is more detailed and includes additional context, using nested bullet points for better categorization
  2. Personal Elements:
    • Summary 1 uses a more neutral tone, referring to “the presenter”
    • Summary 2 includes the presenter’s name (Matt) and adds personal touches like his emphasis on patience
  3. Technical Details:
    • Both cover the core technical steps, but present them differently:
      • Summary 1 focuses on the sequential steps with numbered instructions
      • Summary 2 organizes information into thematic sections
  4. Additional Context:
    • Summary 2 provides more context about:
      • The sample project’s functionality (adds hats to images)
      • Matt’s role at Repl.it
      • The philosophical aspects about patience in technical tasks
      • Links to additional resources
  5. Level of Detail:
    • Summary 1 is more procedural and focused on the technical implementation
    • Summary 2 provides a broader context with more background information and supporting details
  6. Length and Comprehensiveness:
    • Summary 1 is more concise and focused on the essential steps
    • Summary 2 is longer and includes more contextual information, making it more comprehensive
  7. Formatting:
    • Summary 1 uses a combination of bullets and numbered lists
    • Summary 2 uses consistent bullet point formatting with clear section headers

The main difference is that Summary 1 serves as a more technical, step-by-step guide, while Summary 2 provides a more comprehensive overview with additional context and background information. Summary 2 might be more useful for beginners who need more context, while Summary 1 might be better for those who just need the technical steps.

It looks like there is no significant difference between the final results. Despite the missing punctuation in the YouTube transcript, OpenAI is able to extract all the relevant information and provide a summary anyway. But what is the cost difference between the two methods?

Cost:
YouTube-transcript-api: 1 cent
Whisper: 3 cents

For the same video, Whisper costs 3 times as much (200% more) for almost the same result!?
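The 1 cent vs 3 cent comparison can be stated two ways: Whisper costs 300% of the transcript method, which is a 200% increase. A tiny sketch of that arithmetic, using the two figures measured above (compare_costs is a hypothetical helper name):

```python
def compare_costs(baseline_cents, other_cents):
    """Return (ratio, percent_increase) of other relative to baseline."""
    ratio = other_cents / baseline_cents
    percent_increase = (other_cents - baseline_cents) / baseline_cents * 100
    return ratio, percent_increase


# Measured costs for the same ~5 minute video, in cents.
ratio, increase = compare_costs(1, 3)
print(f"Whisper costs {ratio:.0f}x the transcript method ({increase:.0f}% more)")
```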

Another thing to test is how video length affects the cost. So I decided to put in a 20-minute video.

For the ‘YouTube-transcript-api’ method, the cost was still 1 cent. For the ‘Whisper’ method, 20 minutes seems to be too long. I got an error:

Failed to transcribe video using Whisper: Error code: 413 - {'error': {'message': '413: Maximum content size limit (26214400) exceeded (26327126 bytes read)', 'type': 'server_error', 'param': None, 'code': None}}

There was no charge for the failed request. It’s worth noting that there is a significant wait time for the ‘Whisper’ method.
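The limit in that error, 26214400 bytes, is exactly 25 MiB (25 × 1024 × 1024), and the request for the 20-minute video was 26327126 bytes, just over it. A minimal sketch of a check the app could run before uploading, assuming the audio has already been saved to a local file. The function names are hypothetical, and actually splitting the audio would need an audio tool, which is not shown here.

```python
import math
import os

# Limit quoted in the 413 error message: 25 MiB.
MAX_UPLOAD_BYTES = 26_214_400


def chunks_needed(size_bytes, limit=MAX_UPLOAD_BYTES):
    """Return how many pieces of at most `limit` bytes cover size_bytes."""
    return max(1, math.ceil(size_bytes / limit))


def check_audio_file(path, limit=MAX_UPLOAD_BYTES):
    """Return (fits, pieces) for the audio file at `path`."""
    size = os.path.getsize(path)
    # The upload request adds some overhead on top of the file itself,
    # so a file right at the limit may still be rejected.
    return size <= limit, chunks_needed(size, limit)


# The size reported in the error needs two pieces.
print(chunks_needed(26_327_126))
```

Checking first would avoid the long wait before the upload fails.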

Another thing to mention is:

While I was waiting for my ISP issue to be resolved (I talked about this in another thread), I found a workaround to achieve my goal of summarizing a video:

  1. Download the audio of the video using yt-dlp (free)
  2. Upload the audio to otter.ai. Otter.ai will generate the full transcript and a summary.
  3. I can also take the full transcript generated by otter.ai and ask Claude or ChatGPT to summarize it
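The first step of that workaround, the audio download, can be scripted. A minimal sketch that builds and runs a yt-dlp command from Python, assuming yt-dlp and ffmpeg are installed and on the PATH. The mp3 format and the output filename template are my own choices for illustration, and both function names are hypothetical.

```python
import subprocess


def build_audio_download_command(video_url, output_template="%(id)s.%(ext)s"):
    """Build the yt-dlp command line that saves only the audio track as mp3."""
    return [
        "yt-dlp",
        "--extract-audio",          # keep audio only (needs ffmpeg)
        "--audio-format", "mp3",    # convert the audio to mp3
        "--output", output_template,
        video_url,
    ]


def download_audio(video_url):
    """Run yt-dlp; raises CalledProcessError if the download fails."""
    subprocess.run(build_audio_download_command(video_url), check=True)
```

Calling download_audio("https://www.youtube.com/watch?v=rGYdyb58wJY") would then leave an mp3 file in the current folder, ready to upload.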

This method works, but it has some cons:

  1. With a free membership, Otter.ai can only transcribe a lifetime total of 3 videos. After that, it’s almost $20 to upgrade to 10 videos per month.
  2. It takes a lot of manual copying and pasting.

Overall, building this tool with Replit allows me to quickly achieve my goal with a click of a button.

Some lessons learned while building this project:

  1. Your work is only as good as your preparation. It’s important to remember what your goal is and not get too caught up in adding all the crazy features on the frontend (guilty as charged).
  2. It’s important to know the options: there are many different ways to achieve the same goal, and different options come at different costs. For building an MVP, you don’t need to worry about the cost too much. But for building an app that scales, the difference in cost becomes significant (as demonstrated in my Step 3).
  3. MVP MVP MVP! Before this project, I had tried a lot of other AI coding tools. What I learned is that it’s better to build the core functionality first before adding other features.

Currently deploying the web app and will share the link in the original post.

I have deployed my first app!

The link is here: https://video-insight-generator-robinbobinpopin.replit.app/

I ran into a weird issue: the app works in development mode but once deployed, I was not able to fetch the transcript.

It’s getting late so I am going to troubleshoot tomorrow.

I also get this, especially when using APIs. Some APIs throttle requests or are geo-locked (some APIs aren’t allowed in some countries). You might want to check the error codes in the console. Awesome journey, learning from you.

Thank you for the kind words and thoughtful response. I did some googling and found that others are experiencing the same issue.

From what I gathered, the problem seems to be on YouTube’s side. YouTube blocks a lot of public IP addresses from using the API. That explains why the API works in the development environment but fails once it’s deployed.

I’ve found 2 potential fixes:
Option 1: use a proxy with an IP address that hasn’t been banned
Option 2: someone suggested using a different library called youtubei.js

Later today I will give Option 2 a try. I have already synced the code to GitHub, so this is a perfect time for me to learn how to fork the code so I don’t affect the deployed code.

I created a branch of the code on GitHub and had Replit implement youtubei.js instead of youtube-transcript-api. However, after many iterations, Replit could not get this solution to work well.

By googling, I found another potential option:

Someone suggested using invideoiq/video-transcript-scraper

That’s something I will try later.

What I’ve learned so far:

  • Just because something works in development doesn’t mean it works in production. Need to test more.
  • From this new discovery, I believe OpenAI Whisper would have avoided this issue.

I could easily switch from youtube-transcript-api back to OpenAI Whisper, deploy quickly, and have the app fully up and running. The issue is that I can’t transcribe videos that are about 20 minutes or longer.

If I were to treat this as a real project, I would launch the app using Whisper to get it into customers’ hands ASAP for feedback, while working on the proper fix in parallel.

Since I am my own customer and the objective is to learn, I am going to keep going and try the correct solution.

More to come.

Conclusion:

After a few hours of testing, I don’t think there is a free way to fix the deployment issue. One thing I am going to try tomorrow is to have my friend deploy this code on his DigitalOcean server to see if the API will work.

I was able to download the code and have it run on my local machine to do what I needed.

Also, when I was showing my friend the app, he told me that he found this idea familiar and showed me a few links:

All these threads were created way earlier. This really shows that it’s silly to try to hide your ideas because it has probably been done before. The best way to learn is to share.

I am no longer going to be working on this project. Tomorrow I will start a new project and will document and share the process.

If you have been following along, thank you!

Did you fix the deployment issue? It’s not working for me as well.

I didn’t fix the deployment issue. I ended up converting it into a local app using PyInstaller. That way I can run the app locally.

I asked my friend to deploy this app on his DigitalOcean server to see if it’s a public IP issue, but my friend hasn’t gotten around to doing it.

Did you have the agent convert it? I’m thinking of doing this as well. What setup did you use locally?

This is awesome. I love how you’re sharing the whole journey. Thanks for posting it!

Thanks for sharing your journey. The concept of minimum viable product is so powerful. I like hearing about your approach.

Does the app you made with PyInstaller still work locally? If not, I found an edtech solution today called Diffit that converts YouTube videos into transcripts (as just one of many features). Here’s a video of how it works: https://youtu.be/MUzokR7_ekc?si=qpeGa9dMPo9pEXHk

I registered there today at https://app.diffit.me/ and they gave me two months free with no credit card required.

Love your detailed approach. Some of the issues you came up against are similar to mine, although for different reasons.

I look forward to seeing what you do next

Thank you @CulturaFutura @sallycole @whereisthestart for your kind words! That’s my goal in sharing this process! Hearing your feedback made my day!

After deploying locally, it works great! I was able to launch the local app and fetch the transcript as intended! Thank you for your suggestions. I will look into them for my future projects!

So awesome, thanks for documenting your journey! I’m definitely going to use some of this. Sounds like you’ve picked up a lot of the dev knowledge along the way!
