Inspired by the very similar thread about school incidents.

  • nick@midwest.social
    2 months ago

    INC-224, never forget.

    I am an infra engineer at a fairly large scale (not like Amazon, but we have some BIG customers) SaaS company; despite our scale, we are only like 250 people and of them only about 90 engineers. We store a bunch of data in MySQL.

    15:30:00, I get a page “MySQL table is full.” I immediately know my day is ruined, since I’ve never heard of this error before, but know it ain’t great.

    15:30:10, every PagerDuty escalation policy in the entire company gets bombarded with pages.

    I look at the database instance. The table size is “only” 16TiB, so it’s a bit confusing.

    We are hard down for several hours as we scramble to delete data or somehow free up space. Turns out, Google backs Cloud SQL MySQL instances with ext4 disks instead of ZFS, and the max file size on ext4 is… you guessed it, 16TiB.
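
    For anyone wondering where the 16TiB comes from, here is a rough sketch of the arithmetic, assuming the common 4 KiB ext4 block size and the 32-bit logical block numbers that ext4 extents use. The function names and the headroom helper are hypothetical, just for illustration, not anything from our actual tooling.

```python
# Rough sketch: where the ext4 per-file limit comes from.
# Assumptions: 4 KiB blocks (the usual default) and 32-bit logical
# block numbers inside a file, as used by ext4 extents.

TIB = 2 ** 40
EXT4_BLOCK_SIZE = 4096          # bytes, assumed default
EXT4_LOGICAL_BLOCK_BITS = 32    # bits used to address blocks within one file


def ext4_max_file_size(block_size: int = EXT4_BLOCK_SIZE) -> int:
    """Approximate largest single file, in bytes, for a given block size."""
    return (2 ** EXT4_LOGICAL_BLOCK_BITS) * block_size


def headroom_fraction(file_size_bytes: int, block_size: int = EXT4_BLOCK_SIZE) -> float:
    """Fraction of the per-file limit still unused (hypothetical alerting helper)."""
    return 1 - file_size_bytes / ext4_max_file_size(block_size)


if __name__ == "__main__":
    print(ext4_max_file_size() // TIB, "TiB per file")
    print(f"{headroom_fraction(3 * TIB):.2%} headroom for a 3 TiB table file")
```

    With one file per table, the table's data file is what runs into this limit, so alerting on something like the headroom number well before it gets near zero would have paged us long before "table is full" did.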

    We learned a LOT of lessons from this, and are now offloading a shitload of JSON into either MongoDB or GCS, depending on the requirements. The largest table is down to 3TiB now :D