Today proved one thing: most of us don't build software; we duct-tape services together.
On 18th November 2025, a lot of us had the same morning:
- X (Twitter) wasn't loading.
- ChatGPT was throwing errors.
- Spotify, Canva, gaming platforms, government portals: all shaky or down. (Reuters)
Developers scrambled to check their servers, only to realize: our code was fine.
The problem was further upstream, inside a company most normal users have never heard of: Cloudflare.
This outage was the perfect live demo of an uncomfortable truth:
We don't really "build" software anymore. We assemble stacks of third-party services, wrap them in code, and hope the duct tape holds.
Let's unpack what actually happened, and what it says about how we build.
So… what went wrong at Cloudflare?
Cloudflare later explained the root cause in a postmortem and public statements:
- They maintain an automatically generated configuration file that helps manage "threat traffic" (bot mitigation / security filtering). (The Cloudflare Blog)
- Over time, this file grew far beyond its expected size.
- A latent bug (one that only shows up under specific conditions) existed in the software that reads that file.
- On 18th November, a routine configuration change hit that edge case: the bloated config triggered that bug, causing the traffic-handling service to crash repeatedly. (Financial Times)
Because this service sits in the core path of Cloudflareโs network, the crashes produced:
- HTTP 500 errors
- Timeouts
- Large parts of the web effectively going dark for a few hours (The Verge)
Cloudflare stressed that:
- There's no evidence of a cyberattack
- It was a software + configuration issue in their own systems (ABC)
In very simple language:
One auto-generated file became too big, hit a hidden bug, crashed a critical service, and because that service sits in front of a huge portion of the internet, the whole world felt it.
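To make the idea of a "latent bug" concrete, here is a deliberately invented sketch of the general pattern. The names, the limit, and the file format are all made up; this is not Cloudflare's code, just the shape of the problem: a loader with a built-in assumption that holds for years, until the auto-generated input outgrows it.

```typescript
// Invented sketch of a "latent bug": a loader written with a size assumption
// that held for years, until an upstream job grew the file past it.
// Names, limit, and format are made up; this is not Cloudflare's code.

interface ThreatRule {
  id: string;
  score: number;
}

const MAX_RULES = 200; // assumption baked in back when the file was always small

export function loadThreatRules(raw: string): ThreatRule[] {
  const rules: ThreatRule[] = JSON.parse(raw);

  // The "latent" part: this path never triggered in normal operation,
  // so nobody noticed that failing here takes the whole service down.
  if (rules.length > MAX_RULES) {
    throw new Error(`config has ${rules.length} rules, limit is ${MAX_RULES}`);
  }
  return rules;
}

// The day the auto-generated file doubles in size, every process that reloads
// it starts crashing, and one routine config change becomes a global outage.
```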
What is Cloudflare to the average app?
For non-technical readers: Cloudflare is like a traffic cop + bodyguard + highway for your website.
A lot of modern apps use Cloudflare to:
- Speed up content delivery (CDN)
- Protect against attacks (DDoS, WAF)
- Filter bots and suspicious traffic
- Provide DNS and other network plumbing
Roughly one in five websites uses Cloudflare in some way. (AP News)
So if your app runs behind Cloudflare and Cloudflare can't route traffic properly, it doesn't matter if your code, database, and servers are perfect; users will still see error pages.
That's exactly what happened.
The uncomfortable mirror: we're shipping duct tape
Look at a typical "modern" SaaS or startup stack:
- DNS / proxy / security: Cloudflare
- Hosting: Vercel, Render, Netlify, AWS, GCP, Azure
- Authentication: Firebase, Auth0, Cognito, โSign in with Google/Appleโ
- Payments: Stripe, PayPal, M-Pesa gateways, Flutterwave, etc.
- Email & notifications: SendGrid, Mailgun, Twilio, WhatsApp APIs
- File storage & media: S3, Cloudinary, Supabase
- Analytics & tracking: 3–10 different scripts and SDKs
Our own code, the part we're proud of, is often just glue that ties all of this together.
When everything works, that glue feels like a "product".
When one critical service fails, you suddenly see how much of your app is just duct tape between other people's systems.
The Cloudflare incident exposed that:
- Tons of products had no plan for "What if Cloudflare is down?"
- For many businesses, Cloudflare might as well be part of their backend, even though they don't control it.
- Users don't care if it's your bug or Cloudflare's bug; they just see your app as unreliable.
Single points of failure are everywhere
Cloudflare isn't the villain here. Honestly, their engineering team is doing brutally hard work at insane scale, and they published details, owned the mistake, and are rolling out fixes. (The Cloudflare Blog)
The deeper problem is how we architect our systems:
- We centralize huge parts of the internet on a few giants (Cloudflare, AWS, Azure, Stripe, etc.).
- We treat them as if they are infallible, and design our products like theyโll never go down.
- We rarely ask, "If this service fails, what can my app still do?"
That's how a single oversized config file in one company's infrastructure turned into:
- Broken transit sites
- Broken banking/finance tools
- Broken productivity apps
- Broken AI tools and messaging platforms (AP News)
Not because everyone wrote bad code, but because everyone anchored on the same critical dependency.
What "actually building software" would look like
We're not going back to the 90s and self-hosting everything on bare metal. Using third-party infrastructure is smart and necessary.
But we can change how we depend on it.
Here are some practical shifts that move us from duct tape to engineering:
1. Design for failure, not just success
Ask explicitly:
- "What happens if Cloudflare is down?"
- "What happens if Stripe is down?"
- "What happens if our auth provider is down?"
Then design behaviours like:
- A degraded mode where non-critical features that depend on a broken service are temporarily disabled instead of crashing the whole app (a minimal sketch follows this list).
- Clear, friendly error messages that say, "Payments are currently unavailable. You can still do X and Y; we'll notify you when Z is back."
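Here is a minimal sketch of that kind of degraded mode, assuming a hand-rolled availability flag and a hypothetical checkout flow. None of this comes from a specific framework; it just shows the shape of "switch the feature off, keep the app up".

```typescript
// Degraded mode, sketched: track which external dependencies are healthy
// and disable only the features that need them, instead of failing the page.
// The dependency names and the checkout handler are hypothetical.

type Dependency = "payments" | "email" | "search";

const available: Record<Dependency, boolean> = {
  payments: true,
  email: true,
  search: true,
};

// Called by a health checker or circuit breaker when a dependency flips state.
export function setAvailability(dep: Dependency, ok: boolean): void {
  available[dep] = ok;
}

export function checkout(cartId: string): { ok: boolean; message: string } {
  if (!available.payments) {
    // Feature is switched off, the app keeps running, the user gets a clear message.
    return {
      ok: false,
      message:
        "Payments are currently unavailable. Your cart is saved; we'll notify you when checkout is back.",
    };
  }
  // ...normal payment flow would go here...
  return { ok: true, message: `Payment started for cart ${cartId}` };
}
```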
2. Keep something static and independent
For many businesses:
- Even when the backend is down, people should at least see:
- A simple marketing site
- Contact info
- A status update
You can:
- Host a status page or a minimal static site on a different provider or even a separate domain.
- Use that to communicate during incidents: what's down, what still works, and rough timelines (a minimal client-side check is sketched below).
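Assuming a status page hosted on a separate provider and domain (both URLs below are placeholders), one small client-side check might look like this sketch: probe a health endpoint with a timeout, and point users at the independent status page when the backend is unreachable.

```typescript
// Sketch: if the main API is unreachable, point users at a status page that
// lives on a different provider and domain, so it stays up during the incident.
// Both URLs are placeholders.

const API_HEALTH_URL = "https://api.example.com/health";
const STATUS_PAGE_URL = "https://status.example-status.net";

async function checkBackend(timeoutMs = 3000): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(API_HEALTH_URL, { signal: controller.signal });
    return res.ok;
  } catch {
    return false; // network error or timeout: treat the backend as down
  } finally {
    clearTimeout(timer);
  }
}

export async function showOutageBannerIfNeeded(): Promise<void> {
  if (!(await checkBackend())) {
    // In a real app this would render a banner instead of logging.
    console.warn(`Backend unreachable. See ${STATUS_PAGE_URL} for current status.`);
  }
}
```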
3. Use timeouts, not blind trust
When we integrate APIs, we often code like this:
"Call service. Wait forever. If it fails, crash the whole page."
Instead:
- Set sensible timeouts for each external call.
- Use circuit breakers: if a service is failing repeatedly, automatically stop calling it for a while and show a fallback (see the sketch after this list).
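Here is a hand-rolled sketch of both ideas together: a fetch with a hard timeout, wrapped in a very simple circuit breaker. The thresholds, URLs, and the payments example are arbitrary, and production-grade libraries do far more; this only shows the mechanism.

```typescript
// Timeout + circuit breaker, sketched by hand. Thresholds and URLs are arbitrary.

async function fetchWithTimeout(url: string, timeoutMs: number): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}

class CircuitBreaker {
  private failures = 0;
  private openUntil = 0; // timestamp until which we refuse to call the service

  constructor(
    private readonly maxFailures = 5,
    private readonly cooldownMs = 30_000
  ) {}

  async call<T>(fn: () => Promise<T>, fallback: () => T): Promise<T> {
    if (Date.now() < this.openUntil) {
      return fallback(); // circuit is open: don't even try, fail fast
    }
    try {
      const result = await fn();
      this.failures = 0; // a success closes the circuit again
      return result;
    } catch {
      this.failures += 1;
      if (this.failures >= this.maxFailures) {
        this.openUntil = Date.now() + this.cooldownMs;
      }
      return fallback();
    }
  }
}

// Usage: a payments call with a 2-second timeout and a graceful fallback.
const paymentsBreaker = new CircuitBreaker();

export async function getPaymentStatus(orderId: string): Promise<string> {
  return paymentsBreaker.call(
    async () => {
      const res = await fetchWithTimeout(
        `https://payments.example.com/status/${orderId}`,
        2000
      );
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return res.text();
    },
    () => "Payment status is temporarily unavailable."
  );
}
```

The important design choice is that the fallback is supplied at the call site, so each feature decides how it degrades instead of letting one slow dependency take the whole page down.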
This is boring work. It doesn't show up nicely in screenshots. But when things break, it's the difference between:
- "Everything is dead" vs.
- "Some features are temporarily limited, but you can still use most of the app."
4. Map your dependencies
Sit with your team and draw a very honest diagram:
- Core app
- Every external service: DNS, CDN, auth, payments, email, logging, analytics, etc.
- For each, ask:
- If this fails totally, what breaks?
- What can we keep working?
- How do we tell users what's going on?
Even this basic exercise can reshape your roadmap.
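That map doesn't need special tooling; even a data file checked into the repo forces the right conversation. One possible shape is sketched below, with invented example entries that are illustrative rather than prescriptive.

```typescript
// A dependency map as plain data: what each external service does for us,
// what breaks without it, and what we can still offer. Entries are examples.

interface ExternalDependency {
  name: string;
  role: string;
  ifDown: string;        // what breaks completely
  degradedMode: string;  // what we can still offer
  userMessage: string;   // what we tell users
}

export const dependencies: ExternalDependency[] = [
  {
    name: "Cloudflare",
    role: "DNS, CDN, and WAF in front of everything",
    ifDown: "All traffic to the main domain fails",
    degradedMode: "Static status page on a separate provider and domain",
    userMessage: "We're having network issues; see our status page.",
  },
  {
    name: "Stripe",
    role: "Card payments",
    ifDown: "Checkout cannot complete",
    degradedMode: "Save carts, queue orders, retry later",
    userMessage: "Payments are temporarily unavailable; your cart is saved.",
  },
  {
    name: "SendGrid",
    role: "Transactional email",
    ifDown: "Receipts and password resets are delayed",
    degradedMode: "Queue emails and send when the provider recovers",
    userMessage: "Emails may arrive late today.",
  },
];
```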
So what should we take away from this?
The Cloudflare outage wasn't just "someone else's bug".
It was a mirror.
It showed us:
- How dependent we are on a handful of infrastructure providers
- How thin our own "software" sometimes is, once you subtract all the external services
- How few of us design for the day the duct tape peels off
We're still going to use Cloudflare. And Stripe. And Firebase. And everything else. That's fine.
But maybe, after this, we'll:
- Build just a bit more resilience into our systems
- Think a bit more about failure modes
- Spend one sprint not shipping yet another feature, but hardening the foundations
Because today proved one thing very clearly:
Most of us don't really build the internet.
We stitch it together. The least we can do is make sure the stitching doesnโt explode the moment one thread snaps.