Why I Moved a 950,000-Line Flask App from Replit Deployments to Hetzner
Hey Replit community,
I want to share my journey, not to criticize Replit (I still use it every day for development and love many things about it), but to help others understand when self-hosting makes sense and what workarounds I tried before making the switch.
What I'm Building
I build Bidmio — a full ERP/CRM platform for construction businesses, with modules for projects, invoicing, stock management, HR/payroll, accounting, e-invoicing (Peppol BIS), bank payments (ISO 20022), and more. It’s a Flask app with ~950,000 lines of code across Python, HTML, JavaScript, and CSS. 60+ database models, 40+ blueprints, 13,500+ translation keys in 5 languages.
This isn’t a weekend project. Real businesses use it every day.
The Cold Start Problem
Replit Deployments create a new container for every deploy. For a small app, that’s fine — maybe 5-10 seconds. For my app:
- 196 seconds (3+ minutes) from deploy to first page load
- Gunicorn starts → Python parses ~30,000 lines of data files at import time → loads 60+ SQLAlchemy models → registers 40+ blueprints → runs database migrations and seeding
- During all of this, users see a loading screen
Every. Single. Deploy. And I deploy 2-3 times per day.
And it’s not just deploys — if the container goes idle and Replit recycles it, the next visitor triggers the same 3-minute cold start.
The Workarounds I Built (Before Giving Up)
I didn’t leave immediately. I spent weeks building workarounds:
- Custom loading page with status polling: A `/startup-status` endpoint that returns 503 until the app is ready, with a frontend that polls and shows a loading animation. Users at least saw something instead of a blank error.
- Lazy imports: Moved ~30,000 lines of Python data (help center articles, glossary terms) from top-level imports to lazy loading. Cut startup from 196s to about 45s.
- Background initialization thread: Database migrations, seeding, and health checks run in a background thread so gunicorn can at least accept connections while the app bootstraps.
- Retry logic with exponential backoff: Our database (Neon, US region) sometimes wasn't reachable when the container started. If the background init failed, it failed permanently. I had to add automatic retries.
- Database connection timeouts: Added explicit timeouts so a single Neon cold start wouldn't block all 32 gunicorn threads for 100+ seconds.
- Aggressive query optimization: A context processor running on every page was taking 80-100 seconds due to N+1 queries hitting a cold Neon instance. I had to rewrite it with eager loading and caching.
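A minimal sketch of the lazy-import idea from the list above, assuming the heavy data lives in ordinary Python modules. The helper name `get_data` is hypothetical and not taken from the Bidmio codebase; the point is only that the import happens inside a function instead of at the top of the file.

```python
import importlib


def get_data(module_name, attribute):
    """Import a data module on first use and return one of its attributes.

    Nothing is parsed when this file is imported. Python keeps imported
    modules in sys.modules, so only the first call pays the parse cost.
    """
    module = importlib.import_module(module_name)
    return getattr(module, attribute)
```

A view would then call something like `get_data("help_data.glossary", "TERMS")` (a made-up module and attribute) only when a user actually opens the glossary, so gunicorn no longer parses that data during startup.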
All of this engineering effort was just to make deployments survivable — not even fast.
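For readers who want to try the same pattern, here is a rough sketch of how the background initialization, the retries with exponential backoff, and the status endpoint can fit together. It uses only the standard library; all names (`retry_with_backoff`, `start_background_init`, `startup_status`) and the default delays are assumptions for illustration, not the code Bidmio runs.

```python
import threading
import time

# Set once initialization has finished; the status handler reports 503 until then.
app_ready = threading.Event()


def retry_with_backoff(task, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call task() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(min(base_delay * 2 ** attempt, max_delay))


def start_background_init(init_task, **retry_kwargs):
    """Run init_task in a daemon thread so the server can accept requests meanwhile."""
    def worker():
        retry_with_backoff(init_task, **retry_kwargs)
        app_ready.set()  # only reached if init_task eventually succeeded

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def startup_status():
    """Return (body, HTTP status) for a hypothetical /startup-status handler."""
    if app_ready.is_set():
        return {"ready": True}, 200
    return {"ready": False}, 503
```

In a Flask app, `startup_status()` could back the `/startup-status` route, and `init_task` would wrap whatever migrations and seeding need to run. The `sleep` parameter exists only so the delays can be replaced in tests.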
The Move to Hetzner
On Hetzner (about €30 per month for the VPS plus about €7 for Object Storage), the deployment model is fundamentally different:
| | Replit Deployments | Hetzner VPS |
|---|---|---|
| Deploy process | Destroy old container, build new one from scratch | Upload new code, restart app process |
| Cold start | 196 seconds originally, about 45 seconds after our optimizations | 0 (server is always running) |
| Deploy downtime | 1-3 minutes of loading screen | ~5 seconds (process restart) |
| Idle behavior | Container may be recycled → cold start on next visit | Always on, always warm |
| DB latency | Neon US (cross-Atlantic for EU users) | Neon EU (same region) |
| Monthly cost | Usage-based (can spike) | Fixed, predictable |
| Setup effort | Managed (easy) | More setup, full control |
The key difference: Hetzner uploads new code to a running server. When a user loads a page after deploy, the new code is already there and running. No container build, no cold start, no loading page.
Database: Neon US → Neon EU
I also migrated our primary database, the default Replit database hosted on Neon US, to Neon EU. For European users, this cut database latency dramatically, because queries stay within Europe instead of crossing the Atlantic.
But I kept a hybrid setup:
- Neon EU: Primary database for European users (majority of our customer base)
- Neon US + original Replit deployment: Still running for certain domains serving users outside Europe
This way, everyone gets the best latency for their location.
When Replit Deployments Are Perfect
I still think Replit Deployments are great for:
- Smaller apps (under ~50,000 lines)
- Prototypes and MVPs
- Apps where a few seconds of cold start doesn't matter
- Teams that want zero DevOps overhead
But once your codebase grows to hundreds of thousands of lines and real users depend on instant availability, you’ll hit the same wall I did.
What I Still Use Replit For
I haven’t (YET :D) left Replit — I still use:
- Replit IDE for all development
- Replit AI Agent for building features
- Replit Deployments for non-EU domains
- The development workflow is still 100% Replit
Replit is incredible for building. For deploying a near-million-line production app to European users, I needed something different.
Hope this helps anyone facing similar scaling decisions.

