Skip to content
Back

Second application got bricked after 1.7.4 > 1.8.0 migration at mysqldump backup

  • 0
  • Self Hosted
Ludwig
9 Nov, 2025, 14:25

A few days ago I created this post https://discord.com/channels/564160730845151244/1072905050399191082/threads/1435303434495594698 where after the migration to 1.8.0 the database got deadlocked during the periodic backup.

I thought it was due to a very large audit table that wasn't cleaned up properly (the worker for that had errors at each attempt), and was 2.23 GB in size. During the Appwrite migration, a process crashed due to OOM.

So I supposed that was the problem and now also migrated the second application that I manage to 1.8.0 (which has much smaller db size). Suddenly user support requests started coming in that nothing was working anymore. Also here, the database got deadlocked with the same issue as in my first post.

I also remembered where i got the backup commands from which i'm using, it was this post here: https://appwrite.io/blog/post/how-to-back-up-your-appwrite-data, but something doesn't seem to be working like that anymore with 1.8.0.

Not sure what changed with this update, but all those months up to now, the backup script i was using was working fine. I would suggest to update that blog post, so people don't use the wrong commands if they implement a backup strategy (which everyone should of course).

See the logs of the current issue with this second application in the thread below... Attached the "old" (problematic) and fixed backup.sh scripts, which are run through a cronjob.

TL;DR
Developers faced deadlocks during the backup process after upgrading to version 1.8.0. By adjusting the backup command to include `--single-transaction`, `--skip-lock-tables`, and a timeout, the deadlock issue was resolved. The root cause was traced to metadata lock contention due to missing flags in the backup command. Ensure to update backup strategies with the new commands for version 1.8.0.
Ludwig
9 Nov, 2025, 14:36

Some AI based investigation:

Technical Report: Production Outage Caused by mysqldump Database Deadlock in Appwrite 1.8.0

Environment:

  • Appwrite Version: 1.8.0
  • MariaDB Version: 10.11.11 (InnoDB)
  • Total Database Size: 440.27 MB
  • InnoDB Buffer Pool Size: 128 MB (undersized: 3.4x smaller than database)
  • Audit Tables: _2_audit (31.67 MB), _1_audit (10.66 MB), _console_audit (1.70 MB)
  • Backup Method: Hourly cron job using mysqldump --all-databases

MariaDB Configuration (Relevant Timeouts):

TypeScript
innodb_lock_wait_timeout: 50 seconds
lock_wait_timeout: 86400 seconds (24 hours!)
wait_timeout: 28800 seconds (8 hours)
interactive_timeout: 28800 seconds (8 hours)

Timeline of Events:

Nov 6-9: Normal Operation

Backups completing successfully every 4 hours:

TypeScript
[2025-11-06 16:00:01] [INFO] Starting hourly database backup
[2025-11-06 16:00:13] [INFO] Database backup completed successfully (94M)

[2025-11-07 00:00:02] [INFO] Starting hourly database backup
[2025-11-07 00:00:14] [INFO] Database backup completed successfully (95M)

[2025-11-08 12:00:01] [INFO] Starting hourly database backup
[2025-11-08 12:00:14] [INFO] Database backup completed successfully (100M)

[2025-11-09 08:00:01] [INFO] Starting hourly database backup
[2025-11-09 08:00:15] [INFO] Database backup completed successfully (107M)

Normal completion time: 12-15 seconds

Nov 9, 12:00 - Critical Failure Event:

TypeScript
[2025-11-09 12:00:01] [INFO] Starting hourly database backup to /var/www/appwrite/backups/hourly/appwrite_db_2025-11-09_12-00-01.sql.gz
[2025-11-09 13:55:55] [INFO] Database backup completed successfully: /var/www/appwrite/backups/hourly/appwrite_db_2025-11-09_12-00-01.sql.gz
[2025-11-09 13:55:55] [INFO] Backup size: 4.0K
Ludwig
9 Nov, 2025, 14:37

Critical Observations:

  • Duration: 1 hour 55 minutes (normal: 12-15 seconds)
  • File size: 20 bytes (normal: 95-107 MB)
  • File is corrupt/incomplete

Backup File Evidence:

TypeScript
-rwxr-xr-x 1 <project> <project> 107M Nov  9 08:00 appwrite_db_2025-11-09_08-00-01.sql.gzLast successful
-rw-rw-r-- 1 <project> <project>  20  Nov  9 13:55 appwrite_db_2025-11-09_12-00-01.sql.gzFAILED/CORRUPT

Root Cause Analysis:

1. Problematic Backup Command:

The backup script was using this command:

TypeScript
docker compose exec -T mariadb sh -c 'exec mysqldump --all-databases --add-drop-database -u"$MYSQL_USER" -p"$MYSQL_PASSWORD"'

Critical Missing Flags:

  • ❌ No --single-transaction → Forces table locks instead of consistent MVCC snapshot
  • ❌ No --skip-lock-tables → Explicitly requests LOCK TABLES on all tables
  • ❌ No timeout → Can hang indefinitely (up to 24 hours due to lock_wait_timeout)
  • ❌ Includes audit tables → Adds 43+ MB of non-critical data

2. Lock Acquisition Behavior:

When mysqldump runs without --single-transaction, it issues:

TypeScript
LOCK TABLES `_1__metadata` READ /*!32311 LOCAL */,
            `_1__metadata_perms` READ /*!32311 LOCAL */,
            `_1_audit` READ /*!32311 LOCAL */,
            `_1_database_1_collection_154` READ /*!32311 LOCAL */,
            `_1_database_1_collection_155` READ /*!32311 LOCAL */,
            ... [repeating for ~400+ tables]

This command attempts to acquire metadata locks on ALL tables simultaneously.

Ludwig
9 Nov, 2025, 14:39

3. Database State During Deadlock:

When we investigated, SHOW PROCESSLIST revealed:

TypeScript
Id   User  Host            db       Command  Time   State                               Info
---  ----  --------------  -------  -------  -----  ----------------------------------  ------------------------------------------------
74   user  localhost       appwrite Query    6904   Waiting for table metadata lock     LOCK TABLES `_1__metadata` READ /*!32311 LOCAL */,`_1__metadata_perms` READ...
75   user  localhost       appwrite Query    6904   Waiting for table metadata lock     LOCK TABLES `_1__metadata` READ /*!32311 LOCAL */,`_1__metadata_perms` READ...
40   user  172.19.0.19     appwrite Query    5540   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
52   user  172.19.0.14     appwrite Query    6895   Waiting for table metadata lock     INSERT INTO `appwrite`.`_2_stats` (`_createdAt`, `_permissions`, `_uid`...
53   user  172.19.0.19     appwrite Query    5460   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
72   user  172.19.0.19     appwrite Query    5372   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
78   user  172.19.0.19     appwrite Query    5291   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
82   user  172.19.0.19     appwrite Query    5215   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
Ludwig
9 Nov, 2025, 14:39
TypeScript
Id   User  Host            db       Command  Time   State                               Info
---  ----  --------------  -------  -------  -----  ----------------------------------  ------------------------------------------------
86   user  172.19.0.19     appwrite Query    5130   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
90   user  172.19.0.19     appwrite Query    5053   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
94   user  172.19.0.19     appwrite Query    4976   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...

98   user  172.19.0.19     appwrite Query    4900   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
102  user  172.19.0.19     appwrite Query    4824   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
106  user  172.19.0.19     appwrite Query    4747   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
Ludwig
9 Nov, 2025, 14:39
TypeScript
Id   User  Host            db       Command  Time   State                               Info
---  ----  --------------  -------  -------  -----  ----------------------------------  ------------------------------------------------
110  user  172.19.0.19     appwrite Query    4660   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
114  user  172.19.0.19     appwrite Query    4579   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
118  user  172.19.0.19     appwrite Query    4497   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...
122  user  172.19.0.19     appwrite Query    4413   Waiting for table metadata lock     SELECT `main`.* FROM `appwrite`.`_console_keys` AS `main` WHERE `main`.`_uid`...

Total: 18+ queries stuck waiting for metadata locks for 4,000-6,900 seconds (1.1-1.9 hours)
Ludwig
9 Nov, 2025, 14:39

Additional Context:

  • 6 concurrent mysqldump processes were running (duplicate cron jobs)
  • All legitimate application queries were blocked behind the LOCK TABLES requests

4. Appwrite Application Layer Impact:

Starting at 12:01 (1 minute after backup started), Appwrite began logging database timeouts:

TypeScript
[Error] Timestamp: 2025-11-09T12:01:08+00:00
[Error] Method: PATCH
[Error] URL: /v1/databases/:databaseId/collections/:collectionId/documents/:documentId
[Error] Type: Utopia\Database\Exception\Timeout
[Error] Message: Query timed out
[Error] File: /usr/src/code/vendor/utopia-php/database/src/Database/Adapter/MariaDB.php
[Error] Line: 1778

[Error] Timestamp: 2025-11-09T12:01:38+00:00
[Error] Method: POST
[Error] URL: /v1/functions/:functionId/executions
[Error] Type: Utopia\Database\Exception\Timeout
[Error] Message: Query timed out
[Error] File: /usr/src/code/vendor/utopia-php/database/src/Database/Adapter/MariaDB.php
[Error] Line: 1778

[Error] Timestamp: 2025-11-09T12:02:25+00:00
[Error] Method: PATCH
[Error] URL: /v1/databases/:databaseId/collections/:collectionId/documents/:documentId
[Error] Type: Utopia\Database\Exception\Timeout
[Error] Message: Query timed out
[Error] File: /usr/src/code/vendor/utopia-php/database/src/Database/Adapter/MariaDB.php
[Error] Line: 1778

[... continuing every 30-60 seconds for nearly 2 hours]
Ludwig
9 Nov, 2025, 14:40

5. Swoole Worker Pool Exhaustion:

By 13:50 (near the end of the hang), all Swoole workers were exhausted:

TypeScript
swoole_dispatch: Risky branch: did not find a idle worker, picking random worker 74
swoole_dispatch: Risky branch: did not find a idle worker, picking random worker 66
swoole_dispatch: Risky branch: did not find a idle worker, picking random worker 73
swoole_dispatch: Risky branch: did not find a idle worker, picking random worker 69
swoole_dispatch: Risky branch: did not find a idle worker, picking random worker 65
[... repeated 100+ times]

[Database] SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
[Database] Lost connection detected. Reconnecting...
[Database] SQLSTATE[HY000]: General error: 2006 MySQL server has gone away

All Swoole workers were blocked waiting for database queries that were stuck behind metadata locks → Complete API unresponsiveness.

Why This Deadlock Occurs:

InnoDB Metadata Lock Contention Spiral:

  1. mysqldump without --single-transaction requests metadata locks on ~400 tables
  2. Active transactions/queries hold implicit metadata locks on tables they're accessing
  3. With undersized buffer pool (128MB vs 440MB database), queries experience disk I/O waits
  4. Disk-bound queries hold metadata locks longer than memory-bound queries would
  5. mysqldump cannot acquire all 400 locks simultaneously → enters wait state (up to 24 hours!)
  6. All new queries queue behind the blocked LOCK TABLES request (metadata locks are FIFO)
  7. Swoole workers exhaust as they wait for their queries
  8. No new requests can be processed → complete deadlock

Why Buffer Pool Size Matters:

  • 128 MB buffer can cache ~6,000-8,000 InnoDB pages (16KB each)
  • 440 MB database requires ~28,000 pages
  • Cache hit rate: ~28% when cold
  • Every cache miss requires disk I/O (adding 5-15ms per page)
  • Long-running queries → long-held metadata locks → deadlock vulnerability
Ludwig
9 Nov, 2025, 14:41

The Fix:

Updated backup command:

TypeScript
timeout 1800 docker compose exec -T mariadb sh -c 'exec mysqldump \
  --single-transaction \
  --skip-lock-tables \
  --ignore-table=appwrite._1_audit \
  --ignore-table=appwrite._2_audit \
  --ignore-table=appwrite._console_audit \
  --all-databases \
  --add-drop-database \
  -u"$MYSQL_USER" -p"$MYSQL_PASSWORD"'

How Each Flag Prevents the Deadlock:

  1. --single-transaction:
    • Uses START TRANSACTION WITH CONSISTENT SNAPSHOT
    • Leverages InnoDB's MVCC (Multi-Version Concurrency Control)
    • No table locks required - reads from consistent point-in-time snapshot
    • Zero impact on concurrent writes/reads
  2. --skip-lock-tables:
    • Explicitly prevents LOCK TABLES commands
    • Safety net in case --single-transaction isn't sufficient for some tables
  3. --ignore-table for audit logs:
    • Excludes 43+ MB of non-critical audit data
    • Audit logs can be truncated/cleaned without backup
    • Reduces backup time by ~30-40%
  4. timeout 1800 (30 minutes):
    • Kills backup if it hangs
    • Prevents indefinite metadata lock holds
    • Maximum disruption window: 30 minutes instead of 24 hours
Reply

Reply to this thread by joining our Discord

Reply on Discord

Need support?

Join our Discord

Get community support by joining our Discord server.

Join Discord

Get premium support

Join Appwrite Pro and get email support from our team.

Learn more