Sunday Morning - Monday Morning:
- A final lingering impact: some POS systems and card readers required manual restarts to pull the latest infrastructure changes, and some shops also needed to update to the latest V3 card reader. By end of day Monday, all systems were fully stable.
Root Cause Summary
The outage was caused by instability within our Redis cluster running on AWS, which handles real-time socket communication. When the physical hardware failed, a feedback loop formed: our systems attempted to sync at high frequency under degraded conditions, overwhelming the servers and preventing POS devices from maintaining a stable connection.
A socket is essentially a communication endpoint; think of it like a phone jack. When two programs want to communicate, each opens a socket, they connect, and data flows between them. One side typically listens and the other initiates.
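To make the analogy concrete, here is a minimal, self-contained Python example (purely illustrative, not Dripos production code) showing one program listening and another initiating, with data flowing between them once connected:

```python
import socket
import threading

# One side listens (the "phone jack" waiting for a call)...
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0 = let the OS pick a free port
server.listen(1)
host, port = server.getsockname()

def answer():
    conn, _ = server.accept()        # block until a client dials in
    data = conn.recv(1024)           # read the incoming message
    conn.sendall(b"ack: " + data)    # reply over the same connection
    conn.close()

t = threading.Thread(target=answer)
t.start()

# ...and the other side initiates the connection.
client = socket.create_connection((host, port))
client.sendall(b"sale:4.50")
reply = client.recv(1024)
print(reply.decode())                # -> ack: sale:4.50
client.close()
t.join()
server.close()
```

In our architecture, the POS tablet plays the initiating role and our servers play the listening role; when the servers were degraded, that connection could not be held open reliably.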
What went wrong with our fallback: Our offline mode is designed to be a safety net. When working as expected, offline mode allows offline swipe, tap, and dip payments via our updated V3 card reader connection. However, the specific failure mode of our servers prevented many tablets from successfully transitioning into offline mode during the outage. We recognize that a fallback is only useful if it works when the main system is down.
What We're Doing Next
Beyond fixing the symptoms, we are also changing the architecture itself. Our immediate and long-term roadmap now centers around:
- Strengthened Offline Reliability: We are re-engineering portions of our offline infrastructure to ensure POS systems can reliably transition into Offline Mode.
- Monitoring & Communication: We are expanding our automated recovery systems and strengthening our internal incident response protocols to provide faster, more transparent communication during a crisis. We are introducing quarterly disaster recovery drills, including offline-mode-only operation tests and all-hands-on-deck exercises. Additionally, we recommend subscribing to status.dripos.com if you have not already.
- Updated Support Platform: We are transitioning to Zendesk in the coming months to replace our current phone support platform, Quo, which was overwhelmed by Saturday’s volume, preventing our team from answering many incoming calls. This upgrade will expand our capacity to handle high-traffic events and ensure our support lines remain reliable during critical situations.
- Regional Redundancy: Over the coming months, we are migrating critical real-time systems to a multi-region architecture so a single data center failure cannot take down the platform. For example, we will have fallback systems to switch our AWS instances from their US-East data centers to US-West in the case of an AWS outage.
- Reduced Socket Dependency: We are reducing our reliance on high-frequency socket operations for non-critical tasks to lower infrastructure strain.
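One standard pattern for breaking the kind of retry feedback loop described above is exponential backoff with jitter between reconnect attempts. The sketch below is a generic illustration of that technique under assumed parameters, not our actual implementation:

```python
import random
import time

def backoff_delays(base=0.5, cap=30.0, attempts=6):
    """Yield reconnect delays that grow exponentially, with "full
    jitter" so thousands of clients don't all retry in lockstep."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield random.uniform(0.0, ceiling)

def connect_with_backoff(try_connect, **kwargs):
    """Call try_connect() until it succeeds, sleeping a jittered,
    exponentially growing delay after each failure."""
    for delay in backoff_delays(**kwargs):
        if try_connect():
            return True
        time.sleep(delay)
    return False  # give up and fall back to offline mode
```

Compared with retrying at a fixed high frequency, this spreads reconnection attempts out over time, so a degraded server sees a trickle of traffic instead of a synchronized stampede.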
Checklist to Ensure Stability in Your Shop
To ensure your POS is running on our most stable architecture, please verify the following settings on your tablets:
1. Transition to V3 Reader (for iPad users only)
- Disable V1: Go to POS Settings → Advanced Settings → Toggle “Enable V1 Reader” OFF.
- Migrate: Go to Card Reader Settings on the POS → Click the pencil icon next to your reader → Select “Migrate to V3.”
- Restart: Restart both your POS app and card reader.
- Reconnect: Once restarted, select your host device on the reader to pair.
- The V3 Reader allows offline card-present payments, has an improved UI, and processes transactions faster.
2. Reset Offline Status
- Go to POS Settings → Advanced Settings → Toggle Off “Enter Offline Mode.”
3. Standardize Fire Ticket Settings
- Go to POS Settings → Tickets & KDS → Fire Ticket Settings → Select “Checkout is Complete (Default).”
- The “Fire Ticket when Added to Cart” option will be fully restored this week.
Additional Note: Supply Chain & Inventory Update
- We have disabled real-time supply chain tracking during checkout to prioritize transaction stability and speed. This will remain in effect for the next week while we refactor the feature.
Closing Notes
We understand that an apology alone by no means makes up for the disruption this caused to your businesses, your staff, or your customers. Many of you were forced to troubleshoot operational issues in real time during Mother’s Day weekend, and we recognize the stress, loss, and frustration that created. It is our responsibility as a platform to own these mistakes.
Over the past 48 hours, our teams have worked around the clock not only to restore stability, but to identify the underlying failures that allowed this incident to escalate as it did. This incident is driving critical upgrades to our infrastructure, offline systems, and operational readiness going forward.
Our responsibility is not just to recover quickly when issues arise, but to build systems resilient enough for your shops to continue operating reliably even during broader infrastructure failures. This incident made it very clear that we still have work to do, and that work is already underway.
If you have any lingering issues or pending transactions that failed to process, please reach out to support@dripos.com. Our team is actively prioritizing all follow-up cases related to the incident and will work directly with you to resolve them as quickly as possible.
Thank you for your patience, your feedback, and for continuing to hold us to a higher standard.
Sincerely,
Jack Pawlik
CEO, Dripos