A few days ago I created this post https://discord.com/channels/564160730845151244/1072905050399191082/threads/1435303434495594698 about the database getting deadlocked during the periodic backup after migrating to 1.8.0.
I thought it was caused by a very large audit table (2.23 GB) that wasn't cleaned up properly (the cleanup worker errored on every attempt). During the Appwrite migration, a process also crashed due to OOM.
Assuming that was the problem, I went ahead and migrated the second application I manage to 1.8.0 (which has a much smaller database). Suddenly user support requests started coming in that nothing was working anymore. Here too, the database got deadlocked with the same issue as in my first post.
I also remembered where I got the backup commands I'm using: this post, https://appwrite.io/blog/post/how-to-back-up-your-appwrite-data. But something about that approach no longer seems to work with 1.8.0.
I'm not sure what changed with this update, but the backup script I was using worked fine for months until now. I would suggest updating that blog post so people don't use the wrong commands when they implement a backup strategy (which everyone should, of course).
See the logs of the current issue with this second application in the thread below. Attached are the "old" (problematic) and fixed backup.sh scripts, which are run through a cronjob.
Some AI-based investigation:
Technical Report: Production Outage Caused by mysqldump Database Deadlock in Appwrite 1.8.0
Environment:
- Appwrite Version: 1.8.0
- MariaDB Version: 10.11.11 (InnoDB)
- Total Database Size: 440.27 MB
- InnoDB Buffer Pool Size: 128 MB (undersized: less than a third of the database size)
- Audit Tables: _2_audit (31.67 MB), _1_audit (10.66 MB), _console_audit (1.70 MB)
- Backup Method: Hourly cron job using mysqldump --all-databases
MariaDB Configuration (Relevant Timeouts):
innodb_lock_wait_timeout: 50 seconds
lock_wait_timeout: 86400 seconds (24 hours!)
wait_timeout: 28800 seconds (8 hours)
interactive_timeout: 28800 seconds (8 hours)
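Both the table sizes and the timeout values above can be checked on a running instance with the same docker compose exec pattern the backup script already uses. A quick sketch, assuming the stock Appwrite setup where the schema is named appwrite and the mariadb container has MYSQL_USER/MYSQL_PASSWORD set:

docker compose exec -T mariadb sh -c 'exec mysql -u"$MYSQL_USER" -p"$MYSQL_PASSWORD"' <<'SQL'
-- largest tables (data + indexes) in the Appwrite schema
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = 'appwrite'
ORDER BY size_mb DESC
LIMIT 10;

-- the lock/connection timeouts listed above
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('innodb_lock_wait_timeout', 'lock_wait_timeout',
                        'wait_timeout', 'interactive_timeout');
SQL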
Timeline of Events:
Nov 6-9: Normal Operation
Backups completing successfully every 4 hours:
[2025-11-06 16:00:01] [INFO] Starting hourly database backup
[2025-11-06 16:00:13] [INFO] Database backup completed successfully (94M)
[2025-11-07 00:00:02] [INFO] Starting hourly database backup
[2025-11-07 00:00:14] [INFO] Database backup completed successfully (95M)
[2025-11-08 12:00:01] [INFO] Starting hourly database backup
[2025-11-08 12:00:14] [INFO] Database backup completed successfully (100M)
[2025-11-09 08:00:01] [INFO] Starting hourly database backup
[2025-11-09 08:00:15] [INFO] Database backup completed successfully (107M)
Normal completion time: 12-15 seconds
Nov 9, 12:00 - Critical Failure Event:
[2025-11-09 12:00:01] [INFO] Starting hourly database backup to /var/www/appwrite/backups/hourly/appwrite_db_2025-11-09_12-00-01.sql.gz
[2025-11-09 13:55:55] [INFO] Database backup completed successfully: /var/www/appwrite/backups/hourly/appwrite_db_2025-11-09_12-00-01.sql.gz
[2025-11-09 13:55:55] [INFO] Backup size: 4.0K
Critical Observations:
- Duration: 1 hour 55 minutes (normal: 12-15 seconds)
- File size: 20 bytes (normal: 95-107 MB)
- File is corrupt/incomplete
Backup File Evidence:
-rwxr-xr-x 1 <project> <project> 107M Nov 9 08:00 appwrite_db_2025-11-09_08-00-01.sql.gz ← Last successful
-rw-rw-r-- 1 <project> <project> 20 Nov 9 13:55 appwrite_db_2025-11-09_12-00-01.sql.gz ← FAILED/CORRUPT
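Worth noting: 20 bytes matches the size of an empty gzip stream, so gzip itself still considers the failed file structurally valid; the dump simply produced no SQL. A quick sketch (using the file names above) for telling the two apart without restoring anything:

# the failed file decompresses to nothing at all
zcat appwrite_db_2025-11-09_12-00-01.sql.gz | wc -c      # prints 0

# a healthy dump ends with mysqldump's closing comment
zcat appwrite_db_2025-11-09_08-00-01.sql.gz | tail -n 1  # "-- Dump completed on ..."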
Root Cause Analysis:
1. Problematic Backup Command:
The backup script was using this command:
docker compose exec -T mariadb sh -c 'exec mysqldump --all-databases --add-drop-database -u"$MYSQL_USER" -p"$MYSQL_PASSWORD"'
Critical Missing Flags:
- ❌ No --single-transaction → Forces table locks instead of a consistent MVCC snapshot
- ❌ No --skip-lock-tables → Explicitly requests LOCK TABLES on all tables
- ❌ No timeout → Can hang indefinitely (up to 24 hours due to lock_wait_timeout)
- ❌ Includes audit tables → Adds 43+ MB of non-critical data
2. Lock Acquisition Behavior:
When mysqldump runs without --single-transaction, it issues:
LOCK TABLES `_1__metadata` READ /*!32311 LOCAL */,
`_1__metadata_perms` READ /*!32311 LOCAL */,
`_1_audit` READ /*!32311 LOCAL */,
`_1_database_1_collection_154` READ /*!32311 LOCAL */,
`_1_database_1_collection_155` READ /*!32311 LOCAL */,
... [repeating for ~400+ tables]
This command attempts to acquire metadata locks on ALL tables simultaneously.
3. Database State During Deadlock:
When we investigated, SHOW PROCESSLIST revealed:
Id User Host db Command Time State Info
--- ---- -------------- ------- ------- ----- ---------------------------------- ------------------------------------------------
74 user localhost appwrite Query 6904 Waiting for table metadata lock LOCK TABLES `_1__metadata` READ /*!32311 LOCAL */,`_1__metadata_perms` READ...
75 user localhost appwrite Query 6904 Waiting for table metadata lock LOCK TABLES `_1__metadata` READ /*!32311 LOCAL */,`_1__metadata_perms` READ...
40 user 172.19.0.19 appwrite Query 5540 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
52 user 172.19.0.14 appwrite Query 6895 Waiting for table metadata lock INSERT INTO `appwrite`.`_2_stats` (`_createdAt`, `_permissions`, `_uid`...
53 user 172.19.0.19 appwrite Query 5460 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
72 user 172.19.0.19 appwrite Query 5372 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
78 user 172.19.0.19 appwrite Query 5291 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
82 user 172.19.0.19 appwrite Query 5215 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
86 user 172.19.0.19 appwrite Query 5130 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
90 user 172.19.0.19 appwrite Query 5053 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
94 user 172.19.0.19 appwrite Query 4976 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
98 user 172.19.0.19 appwrite Query 4900 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
102 user 172.19.0.19 appwrite Query 4824 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
106 user 172.19.0.19 appwrite Query 4747 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
110 user 172.19.0.19 appwrite Query 4660 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
114 user 172.19.0.19 appwrite Query 4579 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
118 user 172.19.0.19 appwrite Query 4497 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
122 user 172.19.0.19 appwrite Query 4413 Waiting for table metadata lock SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
Total: 18+ queries stuck waiting for metadata locks for 4,000-6,900 seconds (1.1-1.9 hours)
Additional Context:
- 6 concurrent mysqldump processes were running (duplicate cron jobs)
- All legitimate application queries were blocked behind the LOCK TABLES requests
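For anyone who ends up in this state: the immediate way out (before fixing the backup script) is to kill the blocking sessions; the queued application queries then drain on their own. A sketch using the same access pattern as above; the IDs to kill come from the PROCESSLIST output (74 and 75 here):

# list the LOCK TABLES sessions and everything stuck behind them
docker compose exec -T mariadb sh -c 'exec mysql -u"$MYSQL_USER" -p"$MYSQL_PASSWORD"' <<'SQL'
SELECT id, user, time, state, LEFT(info, 80) AS query
FROM information_schema.PROCESSLIST
WHERE info LIKE 'LOCK TABLES%'
   OR state = 'Waiting for table metadata lock'
ORDER BY time DESC;
SQL

# kill the mysqldump connections by id, and also stop the hung
# mysqldump/cron client processes on the host so they don't reconnect
docker compose exec -T mariadb sh -c 'exec mysql -u"$MYSQL_USER" -p"$MYSQL_PASSWORD" -e "KILL 74; KILL 75;"'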
4. Appwrite Application Layer Impact:
Starting at 12:01 (1 minute after backup started), Appwrite began logging database timeouts:
[Error] Timestamp: 2025-11-09T12:01:08+00:00
[Error] Method: PATCH
[Error] URL: /v1/databases/:databaseId/collections/:collectionId/documents/:documentId
[Error] Type: Utopia\Database\Exception\Timeout
[Error] Message: Query timed out
[Error] File: /usr/src/code/vendor/utopia-php/database/src/Database/Adapter/MariaDB.php
[Error] Line: 1778
[Error] Timestamp: 2025-11-09T12:01:38+00:00
[Error] Method: POST
[Error] URL: /v1/functions/:functionId/executions
[Error] Type: Utopia\Database\Exception\Timeout
[Error] Message: Query timed out
[Error] File: /usr/src/code/vendor/utopia-php/database/src/Database/Adapter/MariaDB.php
[Error] Line: 1778
[Error] Timestamp: 2025-11-09T12:02:25+00:00
[Error] Method: PATCH
[Error] URL: /v1/databases/:databaseId/collections/:collectionId/documents/:documentId
[Error] Type: Utopia\Database\Exception\Timeout
[Error] Message: Query timed out
[Error] File: /usr/src/code/vendor/utopia-php/database/src/Database/Adapter/MariaDB.php
[Error] Line: 1778
[... continuing every 30-60 seconds for nearly 2 hours]
5. Swoole Worker Pool Exhaustion:
By 13:50 (near the end of the hang), all Swoole workers were exhausted:
swoole_dispatch: Risky branch: did not find a idle worker, picking random worker 74
swoole_dispatch: Risky branch: did not find a idle worker, picking random worker 66
swoole_dispatch: Risky branch: did not find a idle worker, picking random worker 73
swoole_dispatch: Risky branch: did not find a idle worker, picking random worker 69
swoole_dispatch: Risky branch: did not find a idle worker, picking random worker 65
[... repeated 100+ times]
[Database] SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
[Database] Lost connection detected. Reconnecting...
[Database] SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
All Swoole workers were blocked waiting for database queries that were stuck behind metadata locks → Complete API unresponsiveness.
Why This Deadlock Occurs:
InnoDB Metadata Lock Contention Spiral:
- mysqldump without --single-transaction requests metadata locks on ~400 tables
- Active transactions/queries hold implicit metadata locks on tables they're accessing
- With an undersized buffer pool (128 MB vs. 440 MB database), queries experience disk I/O waits
- Disk-bound queries hold metadata locks longer than memory-bound queries would
- mysqldump cannot acquire all 400 locks simultaneously → enters a wait state (up to 24 hours!)
- All new queries queue behind the blocked LOCK TABLES request (metadata locks are FIFO)
- Swoole workers exhaust as they wait for their queries
- No new requests can be processed → complete deadlock
Why Buffer Pool Size Matters:
- 128 MB buffer can cache ~6,000-8,000 InnoDB pages (16KB each)
- 440 MB database requires ~28,000 pages
- Cache hit rate: ~28% when cold
- Every cache miss requires disk I/O (adding 5-15ms per page)
- Long-running queries → long-held metadata locks → deadlock vulnerability
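The actual hit rate (not just the cold-cache estimate) can be read from the server counters, and in MariaDB 10.11 innodb_buffer_pool_size is dynamic, so the undersizing is cheap to fix. A sketch; the 512M value is an assumption, size it to your data set and available RAM, and persist it in the MariaDB configuration so it survives restarts:

docker compose exec -T mariadb sh -c 'exec mysql -u"$MYSQL_USER" -p"$MYSQL_PASSWORD"' <<'SQL'
-- hit rate = 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Innodb_buffer_pool_reads', 'Innodb_buffer_pool_read_requests');

-- resize online (needs a privileged account, e.g. root; rounded to chunk size,
-- and lost on restart unless also set in the server config)
SET GLOBAL innodb_buffer_pool_size = 512 * 1024 * 1024;
SQL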
The Fix:
Updated backup command:
timeout 1800 docker compose exec -T mariadb sh -c 'exec mysqldump \
--single-transaction \
--skip-lock-tables \
--ignore-table=appwrite._1_audit \
--ignore-table=appwrite._2_audit \
--ignore-table=appwrite._console_audit \
--all-databases \
--add-drop-database \
-u"$MYSQL_USER" -p"$MYSQL_PASSWORD"'
How Each Flag Prevents the Deadlock:
--single-transaction:
- Uses START TRANSACTION WITH CONSISTENT SNAPSHOT
- Leverages InnoDB's MVCC (Multi-Version Concurrency Control)
- No table locks required - reads from a consistent point-in-time snapshot
- Zero impact on concurrent writes/reads
--skip-lock-tables:
- Explicitly prevents LOCK TABLES commands
- Safety net in case --single-transaction isn't sufficient for some tables
--ignore-table for audit logs:
- Excludes 43+ MB of non-critical audit data
- Audit logs can be truncated/cleaned without backup
- Reduces backup time by ~30-40%
timeout 1800 (30 minutes):
- Kills the backup if it hangs
- Prevents indefinite metadata lock holds
- Maximum disruption window: 30 minutes instead of 24 hours
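Two more things this incident exposed beyond the missing mysqldump flags: duplicate cron jobs ran six dumps in parallel, and the script logged "completed successfully" for a 20-byte file. A sketch of how the relevant part of backup.sh could guard against both, assuming the paths from the logs above; flock prevents overlapping runs and the dump is only kept if it decompresses and ends with mysqldump's closing comment:

#!/bin/bash
set -euo pipefail

BACKUP_DIR=/var/www/appwrite/backups/hourly
BACKUP_FILE="$BACKUP_DIR/appwrite_db_$(date +%F_%H-%M-%S).sql.gz"

# refuse to start while a previous backup is still running
# (prevents the duplicate concurrent mysqldump processes seen above)
exec 9>"$BACKUP_DIR/.backup.lock"
flock -n 9 || { echo "[WARN] Previous backup still running, skipping"; exit 0; }

if ! timeout 1800 docker compose exec -T mariadb sh -c 'exec mysqldump \
      --single-transaction \
      --skip-lock-tables \
      --ignore-table=appwrite._1_audit \
      --ignore-table=appwrite._2_audit \
      --ignore-table=appwrite._console_audit \
      --all-databases \
      --add-drop-database \
      -u"$MYSQL_USER" -p"$MYSQL_PASSWORD"' | gzip > "$BACKUP_FILE"; then
  echo "[ERROR] mysqldump failed or timed out, removing $BACKUP_FILE"
  rm -f "$BACKUP_FILE"
  exit 1
fi

# never report success for an empty/truncated dump:
# a complete mysqldump ends with a "-- Dump completed" comment
if ! zcat "$BACKUP_FILE" | tail -n 1 | grep -q "Dump completed"; then
  echo "[ERROR] $BACKUP_FILE is incomplete, removing"
  rm -f "$BACKUP_FILE"
  exit 1
fi
echo "[INFO] Database backup completed successfully ($(du -h "$BACKUP_FILE" | cut -f1))"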