You can run a NAS with any Linux distro; the limiting factor is having enough drive storage. You might want to consider something that’s great at running virtual machines (e.g., Proxmox) if you don’t like Docker, but I have almost everything I want running in Docker and haven’t needed to spin up a single virtual machine.
If you want to generate audiobooks using your own or a hosted TTS server, check out one of these options:
If you don’t have a decent GPU, Kokoro is a great option as it’s fast enough to run on CPU and still sounds very good.
If you’re going to use Kokoro, Audiblez (posted by another commenter) looks like it makes that more of an all-in-one option.
If you want something you can use without building the whole audiobook up front, only OpenReader-WebUI supports that out of the options above. RealtimeTTS is a library that handles that, but I don’t know whether any apps already integrate it.
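To make the "no upfront build" idea concrete, here’s a rough Python sketch of the general approach: split the text into sentences and synthesize each one only when the player asks for it. This is not how OpenReader-WebUI or RealtimeTTS actually do it; `split_sentences`, `stream_audio`, and `fake_tts` are hypothetical names I made up, and `fake_tts` is just a stand-in for whatever TTS backend you’d really call.

```python
import re
from typing import Callable, Iterator, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stream_audio(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Lazily yield one audio chunk per sentence.

    Nothing is synthesized until the consumer requests the next chunk,
    so playback can start after the first sentence is ready.
    """
    for sentence in split_sentences(text):
        yield synthesize(sentence)


def fake_tts(sentence: str) -> bytes:
    """Placeholder for a real TTS call (assumption: swap in your own backend)."""
    return sentence.encode("utf-8")


if __name__ == "__main__":
    book = "It was a dark night. The rain fell hard! Did anyone notice?"
    for chunk in stream_audio(book, fake_tts):
        # A real player would queue the chunk for playback here.
        print(len(chunk), "bytes ready")
```

Because `stream_audio` is a generator, you only pay for the sentences you actually listen to, which is the main difference from generating the full audiobook file first.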
If you have the audiobook generation handled and just want to be able to follow along with text / switch between text and audio, check out https://storyteller-platform.gitlab.io/storyteller/