How I got GDPR compliance and how self-hosting helped me with a near-million-line (still 30% left) production app

Why I Moved a 950,000-Line Flask App from Replit Deployments to Hetzner

Hey Replit community,

I want to share my journey — not to criticize Replit (I still use it every day for development and love many things about it), but to help others understand when self-hosting makes sense, and what workarounds I tried before making the switch.

What I’m Building

I build Bidmio — a full ERP/CRM platform for construction businesses, with modules for projects, invoicing, stock management, HR/payroll, accounting, e-invoicing (Peppol BIS), bank payments (ISO 20022), and more. It’s a Flask app with ~950,000 lines of code across Python, HTML, JavaScript, and CSS: 60+ database models, 40+ blueprints, and 13,500+ translation keys in 5 languages.

This isn’t a weekend project. Real businesses use it every day.

The Cold Start Problem

Replit Deployments create a new container for every deploy. For a small app, that’s fine — maybe 5-10 seconds. For my app:

  • 196 seconds (3+ minutes) from deploy to first page load

  • Gunicorn starts → Python parses ~30,000 lines of data files at import time → loads 60+ SQLAlchemy models → registers 40+ blueprints → runs database migrations and seeding

  • During all of this, users see a loading screen

Every. Single. Deploy. 2-3 Per Day.

And it’s not just deploys — if the container goes idle and Replit recycles it, the next visitor triggers the same 3-minute cold start.

The Workarounds I Built (Before Giving Up)

I didn’t leave immediately. I spent weeks building workarounds:

  1. Custom loading page with status polling — A /startup-status endpoint that returns 503 until the app is ready, with a frontend that polls and shows a loading animation (rough sketch after this list). Users at least saw something instead of a blank error.

  2. Lazy imports — Moved ~30,000 lines of Python data (help center articles, glossary terms) from top-level imports to lazy loading. Cut startup from 196s to about 45s.

  3. Background initialization thread — Database migrations, seeding, and health checks run in a background thread so gunicorn can at least accept connections while the app bootstraps.

  4. Retry logic with exponential backoff — Our database (Neon, US region) sometimes wasn’t reachable when the container started. If the background init failed, it failed permanently. I had to add automatic retries.

  5. Database connection timeouts — Added explicit timeouts so a single Neon cold start wouldn’t block all 32 gunicorn threads for 100+ seconds.

  6. Aggressive query optimization — A context processor running on every page was taking 80-100 seconds due to N+1 queries hitting a cold Neon instance. I had to rewrite it with eager loading and caching (second sketch below).

All of this engineering effort was just to make deployments survivable — not even fast.
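To make workarounds 1, 3, and 4 concrete, here’s a minimal sketch of how they fit together in Flask. The /startup-status endpoint comes from the post; run_migrations and seed_data are hypothetical stand-ins for the real bootstrap work:

```python
import threading
import time

from flask import Flask, jsonify

app = Flask(__name__)
app_ready = threading.Event()

def run_migrations():
    """Placeholder for the real migration step (e.g. an Alembic upgrade)."""

def seed_data():
    """Placeholder for the real seeding step."""

def initialize_app(max_attempts=5):
    """Slow bootstrap work, off the request path, with exponential backoff."""
    delay = 1
    for attempt in range(1, max_attempts + 1):
        try:
            run_migrations()
            seed_data()
            app_ready.set()
            return
        except Exception:
            if attempt == max_attempts:
                raise  # give up; /startup-status keeps returning 503
            time.sleep(delay)
            delay *= 2  # back off while a cold Neon instance wakes up

@app.route("/startup-status")
def startup_status():
    # The loading page polls this until it gets a 200, then reloads.
    if app_ready.is_set():
        return jsonify(status="ready"), 200
    return jsonify(status="starting"), 503

# Kick off bootstrap in the background so gunicorn can accept
# connections immediately instead of blocking on migrations and seeding.
threading.Thread(target=initialize_app, daemon=True).start()
```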
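And the shape of the workaround-6 fix: one eager-loaded query instead of a lazy load per row. The models here are invented for illustration (SQLAlchemy 1.4+ style); the real context processor also added caching on top:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import (declarative_base, joinedload, relationship,
                            sessionmaker)

Base = declarative_base()

class Project(Base):
    __tablename__ = "project"
    id = Column(Integer, primary_key=True)
    invoices = relationship("Invoice", back_populates="project")

class Invoice(Base):
    __tablename__ = "invoice"
    id = Column(Integer, primary_key=True)
    project_id = Column(Integer, ForeignKey("project.id"))
    project = relationship("Project", back_populates="invoices")

engine = create_engine("sqlite://")  # stand-in for the real Neon URL
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# N+1 version: query the projects, then touch p.invoices per project,
# which fires one extra query per row. Eager version: a single JOIN.
projects = (
    session.query(Project)
    .options(joinedload(Project.invoices))
    .all()
)
```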

The Move to Hetzner

On Hetzner (ca. €30/month for the VPS plus ca. €7/month for Object Storage), the deployment model is fundamentally different:



| | Replit Deployments | Hetzner VPS |
| --- | --- | --- |
| Deploy process | Destroy old container, build new one from scratch | Upload new code, restart app process |
| Cold start | 45-196 seconds (after our optimizations) | 0 — server is always running |
| Deploy downtime | 1-3 minutes of loading screen | ~5 seconds (process restart) |
| Idle behavior | Container may be recycled → cold start on next visit | Always on, always warm |
| DB latency | Neon US (cross-Atlantic for EU users) | Neon EU (same region) |
| Monthly cost | Usage-based (can spike) | Fixed, predictable |
| Setup effort | Managed (easy) | More setup, full control |

The key difference: Hetzner uploads new code to a running server. When a user loads a page after deploy, the new code is already there and running. No container build, no cold start, no loading page.

Database: Neon US → Neon EU

I also migrated our primary database (the default Replit-provisioned Neon instance) from Neon US to Neon EU. For European users, this cut database latency dramatically — queries stay within Europe instead of crossing the Atlantic.

But I kept a hybrid setup:

  • Neon EU — Primary database for European users (majority of our customer base)

  • Neon US + original Replit deployment — Still running for certain domains serving users outside Europe

This way, everyone gets the best latency for their location.

When Replit Deployments Are Perfect

I still think Replit Deployments are great for:

  • Smaller apps (under ~50,000 lines)

  • Prototypes and MVPs

  • Apps where a few seconds of cold start doesn’t matter

  • Teams that want zero DevOps overhead

But once your codebase grows to hundreds of thousands of lines and real users depend on instant availability, you’ll hit the same wall I did.

What I Still Use Replit For

I haven’t (YET :D) left Replit — I still use:

  • Replit IDE for all development

  • Replit AI Agent for building features

  • Replit Deployments for non-EU domains

  • The development workflow is still 100% Replit

Replit is incredible for building. For deploying a near-million-line production app to European users, I needed something different.

Hope this helps anyone facing similar scaling decisions.

6 Likes

Ha, the timing on this is wild. Didn’t coordinate that at all.

The parallels are real though. I was on the default Neon DB and almost went to a self-hosted database sooner; with me updating gigabytes of data every few days, staying just didn’t make sense — the same shared-tenant latency problems on complex queries against a large dataset. Built months of caching infrastructure to compensate, then deleted the entire thing the day I moved to self-hosted PostgreSQL with 64GB of RAM, where the full dataset fits in memory. Your 196s-to-45s lazy-import work is the same pattern — solid engineering solving a problem that comes with scale, not bad code.

That’s really what it comes down to. Replit and managed services like Neon are great until your dataset or codebase hits a size where shared infrastructure can’t keep up. For me it was 1.3 million diamond records and search queries with dozens of filter combinations. For you it was 950K lines and 60+ models with a 3-minute cold start. Different apps, same inflection point.

Still on Replit for app hosting — they’re actually working with me on the plan situation, which I appreciate; the actual hosting won’t cost me more than the $20 plan anyway. Everything else is self-managed: database, AI compute, storage, all running on my own hardware behind Tailscale and Cloudflare Tunnel.

Good to see someone else talking about this honestly. 950K lines of production ERP serving real businesses is serious work.

1 Like

Hi there :waving_hand:, I asked the Agent to summarize the entire process, so please read the details below. It took me about 8 hours to set everything up, and the Replit Agent handled almost all of it—I just reviewed its suggestions and confirmed them after checking and researching.

Our setup, as written up by the Agent:

What We’re Running on Hetzner

  • VPS: Hetzner Cloud (cheapest tier works fine — ~€30/month)

  • OS: Ubuntu 22.04

  • App Server: Gunicorn (gthread workers, preload_app=True for fast restarts; sample config below)

  • Reverse Proxy: Caddy (automatic HTTPS, static file serving, zero-downtime buffering)

  • Background Tasks: Celery + Redis

  • Database: Neon PostgreSQL (EU region — Frankfurt)

  • CI/CD: GitHub Actions → SSH → deploy script

  • Storage: S3-compatible object storage for file uploads
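For reference, a gunicorn.conf.py matching the App Server bullet above might look like this (gunicorn’s config file is plain Python). The worker and thread counts are placeholders, not our production values:

```python
# gunicorn.conf.py (illustrative values, not the real production config)
bind = "127.0.0.1:5000"   # Caddy proxies to this port
worker_class = "gthread"  # threaded workers
workers = 2               # placeholder count
threads = 8               # placeholder count
preload_app = True        # import the app once in the master, fork fast
timeout = 120             # allow slow requests without killing workers
```

Run it with `gunicorn -c gunicorn.conf.py app:app` (the `app:app` module path is assumed).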

The Migration — It Was Surprisingly Straightforward

The actual migration was done through Replit Agent. I gave the Agent SSH access to the Hetzner server via the Hetzner API, and we worked through the entire setup conversationally — the Agent wrote the deployment scripts, systemd services, Caddy config, everything. The process went like this:

Step 1: Set up the VPS (30 min)

  • Created a Hetzner Cloud VPS (Falkenstein, Germany)

  • Ubuntu 22.04, added SSH key

  • Installed Python, pip, Redis, Caddy via apt

Step 2: Push code via GitHub (already had this)

  • Our code was already on GitHub (Replit syncs to GitHub)

  • Created a GitHub Actions workflow: on push to main → SSH into Hetzner → pull latest code → run deploy script

Step 3: The deploy script (deploy/deploy.sh)
This is the heart of it. On every deploy, it:

  1. Pulls latest code from origin/main

  2. Installs/updates Python dependencies (pip install -e ".[prod]")

  3. Runs database migrations (python manage.py migrate)

  4. Runs data seeding (python manage.py seed all)

  5. Safety check: Boots a temporary test Gunicorn on port 5099 and hits /health — if it fails, automatic rollback to the previous commit (rough sketch below)

  6. Restarts the real services: bidmio-web, bidmio-celery, bidmio-celery-beat

  7. Verifies production health on port 5000

  8. Updates Caddy with any new domains from the database

Total deploy time: ~15-20 seconds. Compare that to 3+ minutes of cold start on Replit.
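For the curious, here is the logic of step 5 in rough Python. The real deploy/deploy.sh is a shell script, so treat this purely as an illustration of the flow: port 5099, /health, and the service names come from the post, while `app:app` and the rollback command are assumptions.

```python
import subprocess
import time
import urllib.request

def healthy(url, attempts=10, delay=2):
    """Poll a health endpoint until it returns 200, or give up."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except Exception:
            pass
        time.sleep(delay)
    return False

# Boot a throwaway gunicorn on a test port against the freshly pulled code.
test_server = subprocess.Popen(["gunicorn", "--bind", "127.0.0.1:5099", "app:app"])
try:
    if not healthy("http://127.0.0.1:5099/health"):
        # New code is broken: roll back to the pre-pull commit and stop.
        subprocess.run(["git", "reset", "--hard", "ORIG_HEAD"], check=True)
        raise SystemExit("health check failed, rolled back")
finally:
    test_server.terminate()

# Health check passed: restart the real services (step 6).
for svc in ("bidmio-web", "bidmio-celery", "bidmio-celery-beat"):
    subprocess.run(["systemctl", "restart", svc], check=True)
```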

Step 4: Caddy as reverse proxy
Caddy is amazing — automatic HTTPS with Let’s Encrypt, zero config. Our Caddyfile is actually generated dynamically from the database. We have a script (deploy/get-db-domains.py) that queries which domains should point to Hetzner (hosting_server = 'vps' in our domain table), and update-caddy.sh regenerates the Caddy config and reloads it. New customer domain? Just add it to the database and deploy.

The killer feature: Caddy has lb_try_duration 15s — during the ~5 seconds when Gunicorn is restarting, Caddy buffers incoming requests and retries them. Users don’t even notice the deploy. Zero downtime.
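Putting those two paragraphs together, the generation step could look roughly like the sketch below. The table and column names (domain_language_defaults, hosting_server) come from the post; the driver, SQL, Caddyfile template, and paths are illustrative:

```python
import subprocess

import psycopg2  # assumed driver; the real script may use something else

# One Caddy site block per domain; lb_try_duration buffers requests
# while gunicorn restarts, giving zero-downtime deploys.
SITE_TEMPLATE = """{domain} {{
    reverse_proxy 127.0.0.1:5000 {{
        lb_try_duration 15s
    }}
}}
"""

conn = psycopg2.connect("postgresql://...")  # DATABASE_URL from .env
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT domain FROM domain_language_defaults WHERE hosting_server = 'vps'"
    )
    domains = [row[0] for row in cur.fetchall()]

# Regenerate the Caddyfile and reload Caddy gracefully.
with open("/etc/caddy/Caddyfile", "w") as f:
    f.write("".join(SITE_TEMPLATE.format(domain=d) for d in domains))
subprocess.run(["systemctl", "reload", "caddy"], check=True)
```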

Step 5: Celery for background tasks
On Replit, we had to use hacky in-process background threads with file-based locks (because there’s no Redis available). On Hetzner, we set up proper Celery + Redis — each background task (email sending, notification checks, data sync) runs as a separate Celery task with proper scheduling via Celery Beat.
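A minimal version of that wiring, with task names invented for illustration:

```python
from celery import Celery

celery_app = Celery(
    "bidmio",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)

@celery_app.task
def send_email(to, subject, body):
    ...  # the real email-sending logic lives here

@celery_app.task(name="tasks.check_notifications")
def check_notifications():
    ...  # periodic notification check

# Celery Beat schedule: run the notification check every 5 minutes.
celery_app.conf.beat_schedule = {
    "check-notifications": {
        "task": "tasks.check_notifications",
        "schedule": 300.0,
    },
}
```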

The Database Migration (Neon US → Neon EU)

This was the other big win. Steps:

  1. Created a new Neon project in the EU region (Frankfurt)

  2. Used pg_dump / pg_restore to copy the data (commands sketched below)

  3. Updated the DATABASE_URL in Hetzner’s .env to point to Neon EU

  4. Kept Neon US running for the Replit deployment (still serves certain domains)

Result: Database queries that used to cross the Atlantic (US East ↔ Europe) now stay within Germany. Latency dropped dramatically.
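Step 2 boils down to two commands, shown here wrapped in Python’s subprocess just to keep the examples in one language (in practice you’d run pg_dump/pg_restore straight from a shell); the connection strings are placeholders:

```python
import subprocess

US_URL = "postgresql://...neon-us..."  # placeholder source DSN
EU_URL = "postgresql://...neon-eu..."  # placeholder target DSN

# Custom-format dump from the US project...
subprocess.run(
    ["pg_dump", "--format=custom", "--file=bidmio.dump", US_URL], check=True
)
# ...restored into the fresh EU project.
subprocess.run(
    ["pg_restore", "--no-owner", "--dbname=" + EU_URL, "bidmio.dump"], check=True
)
```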

The Hybrid Architecture (We Kept Both!)

We didn’t fully abandon Replit Deployments. We run a hybrid setup:

| Domain | Server | Database | Purpose |
| --- | --- | --- | --- |
| app.bidmio.dk | Hetzner (Germany) | Neon EU | Danish users |
| app.bidmio.cz | Hetzner (Germany) | Neon EU | Czech users |
| app.bidmio.de | Hetzner (Germany) | Neon EU | German users |
| Other domains | Replit | Neon US | Non-EU users |

The routing is controlled by a hosting_server column in our domain_language_defaults database table. Each domain is either 'vps' or 'replit'. The Hetzner deploy script reads this to generate the Caddy config. It’s all database-driven — no hardcoded domain lists.

What Problems We Had to Solve Post-Migration

It wasn’t 100% smooth. Here’s what we hit and fixed:

  1. SSE (Server-Sent Events) saturating workers — Our notification stream held Gunicorn threads open for hours. With only 8 threads, the whole server ground to a halt. Had to add timeouts and optimize the SSE endpoint (first sketch below).

  2. Version cache busting — After deploying new code, users with cached pages didn’t know a new version existed. Built a version checker that polls /api/version every 5 minutes and shows a “New version available — Refresh” banner (second sketch below).

  3. Sentry environment tagging — Had to add SENTRY_ENVIRONMENT=hetzner-production so we could tell Hetzner errors apart from Replit errors in our error tracking.

  4. S3 storage migration — Moved 648 uploaded files (screenshots, documents) from Replit Object Storage to S3-compatible storage that both deployments can access.

  5. Subscription plan resurrection — Our deploy script re-seeded deleted subscription plans on every deploy. Had to unify the seeding logic to check a “deleted plans” table.
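On the SSE fix (#1), one common approach is to cap each stream’s lifetime and rely on the browser’s EventSource auto-reconnect, so no thread is held for hours. This is a generic sketch, not our exact endpoint:

```python
import json
import time

from flask import Flask, Response

app = Flask(__name__)

MAX_STREAM_SECONDS = 60  # release the gunicorn thread after a minute

def fetch_pending_notifications():
    return []  # placeholder for the real notification query

@app.route("/notifications/stream")
def notification_stream():
    def generate():
        deadline = time.monotonic() + MAX_STREAM_SECONDS
        while time.monotonic() < deadline:
            for event in fetch_pending_notifications():
                yield f"data: {json.dumps(event)}\n\n"
            time.sleep(5)  # poll interval
        # The generator ends here; the client's EventSource reconnects.
    return Response(generate(), mimetype="text/event-stream")
```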
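And the server side of the version checker (#2) is tiny; how APP_VERSION gets set (e.g. a git SHA written at deploy time) is an assumption:

```python
import os

from flask import Flask, jsonify

app = Flask(__name__)
APP_VERSION = os.environ.get("APP_VERSION", "dev")

@app.route("/api/version")
def version():
    # The frontend polls this every 5 minutes and shows the
    # "New version available" banner when it differs from the
    # version embedded in the page at load time.
    return jsonify(version=APP_VERSION)
```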

What We Still Love About Replit

  • Development: 100% done in Replit. The IDE + AI Agent is unbeatable for building features. We’ve merged 30+ tasks through the Agent on this project alone.

  • Quick iteration: For development and testing, Replit’s workflow system is instant.

  • Non-EU domains: Still deployed on Replit, still works fine for those users.

Key Takeaway

The migration wasn’t a dramatic “burn it all down” moment. It was: push code to GitHub → GitHub Action SSHs into Hetzner → runs a deploy script → done. The Replit Agent actually helped us set the whole thing up through chat. The hardest part was the post-migration fixes (SSE, caching, storage), not the migration itself.

3 Likes

I stayed on Replit for deployments too. It’s actually a good, easy setup, and the front end isn’t that expensive to run when everything else is off the system; it’s the best of both worlds. I think this will be the natural progression for most of these apps, at least until we sell them to someone, most likely.

The never-ending problem is that the Replit Agent cost is too high if you work on the system 10-14 hours per day, but again, it’s still cheaper and faster than in the old days :zipper_mouth_face:

With the Agent 4 updates allowing me to run about 10 agents independently, my daily cost exceeds $500-600. :confused:

1 Like

I had to switch to using the Claude Max plan directly. I’ve been working for days straight with almost no timeouts. If you get the $200/month one, it’s essentially infinite coding: the limit resets every 5 hours, and while they cap you weekly, I have yet to hit my weekly limit even though I’ve been hitting it hard. I was spending thousands too.

@RocketMan You nailed the summary — that’s basically it. Steps 1-7 are spot on.

But here’s the twist that might interest you for your compilation: I didn’t do any of those steps manually. The Replit Agent did it all.

I gave the Agent SSH access to the Hetzner server (via the Hetzner Cloud API), and we literally worked through the entire setup conversationally in chat. The Agent:

  • Wrote the deploy.sh script

  • Created the systemd service files (bidmio-web.service, bidmio-celery.service, bidmio-celery-beat.service)

  • Configured Caddy with the dynamic domain setup

  • Set up the GitHub Actions CI/CD workflow

  • Wrote the health check and automatic rollback logic

  • Figured out the SSE worker saturation issue and fixed it

So your wife’s point :smiley: about networking, DNS, SSL, and reverse proxy — those are exactly the things that would have taken me days to figure out on my own. But Caddy eliminates most of that pain:

  • SSL certificates? Caddy handles Let’s Encrypt automatically. Zero config. You just put the domain name in the Caddyfile and it gets a cert.

  • Reverse proxy? Like 3 lines in Caddy. Plus the lb_try_duration 15s trick for zero-downtime deploys.

  • DNS? Just point your domain’s A record to the Hetzner IP. That’s it.

  • Networking? Caddy listens on 443, proxies to Gunicorn on localhost:5000. Done.

Nginx is powerful but Caddy is just simpler for this use case. No messing with sites-available, no manual certbot renewals, no config reload headaches.

To your question about the Replit → GitHub → VS Code setup — no, we stayed fully inside Replit for development. The flow is:

Replit (development) → GitHub (sync) → GitHub Actions (CI/CD) → Hetzner (production)

We never touch VS Code. Replit is the IDE, GitHub is the bridge, and Hetzner is just where the production code lands. The Agent can also SSH into Hetzner for debugging when needed (with my permission — we actually added a guardrail rule that the Agent must ask before making any direct server changes).

The “untangling post-migration stuff” you mentioned — we had that too. The big ones were:

  1. SSE eating all Gunicorn threads — notification streams held connections open for hours; with only 8 threads total, the server became unresponsive

  2. Users getting stale cached pages after deploy — had to build a version checker that shows a “New version available” banner

  3. Background tasks — on Replit we used hacky in-process threads with file-based locks. On Hetzner we switched to proper Celery + Redis. But the Agent handled the Celery setup and service files.

The key insight: the migration itself was easy—a few hours with the Agent. Post-migration fixes took a few days, also handled through the Agent. The real challenge of self-hosting isn’t setup, but what surfaces in the first week of production traffic—firewalls, backups, extra servers in other locations, etc.

Thanks for the important post, you’ve saved me a lot of time. I’m only up to 62,000 lines right now, but maybe I’ll catch up with you :slight_smile: