Hi all,
I’m having an issue with an NFS mount that I use for serving podcasts through Audiobookshelf. The issue has been ongoing for months, and I’m not sure where the problem is or how to start debugging.
My setup:
- Unraid with an NFS share “podcasts” set up
- Proxmox on another machine, with a VM running Fedora Server 40
- Storage set up in Fedora to mount the “podcasts” share on boot; this works fine
- A Docker container on the same Fedora VM runs Audiobookshelf, with the “podcasts” mount passed through in the docker-compose file
The issue:
The NFS mount randomly drops. When it does, I need to manually remount it, then restart the Audiobookshelf container (or reboot the VM, but that disrupts my other services).
There doesn’t seem to be any rhyme or reason to the unmount. It doesn’t coincide with any scheduled updates or spikes in activity, and there’s no issue on the Unraid side that I can see. Sometimes it drops overnight, sometimes midday. Sometimes it’s fine for a week, other times I’m remounting twice a day. What finally forced me to seek help: the other day I was listening to a podcast, paused for 10-15 minutes, and couldn’t restart the episode until I went through the manual mount procedure. I checked, and it was not due to the disk spinning down.
I’ve tried updating everything I could, but the issue persists. I only just updated to Fedora 40; it was on 38 previously and initially worked for many months without issue, then randomly started dropping the NFS mounts (I tried setting up other share mounts and hit the same problem). I updated to 39, then 40, and the issue persists.
I’m not great with logs but I’m trying to learn. Nothing sticks out so far.
Does anyone have any ideas how I can debug and hopefully fix this?
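For reference, the manual recovery I go through could be scripted as something like the sketch below. The mountpoint, log message, and container name are placeholders for my setup, not anything official; it assumes the share has an /etc/fstab entry so a bare `mount` works:

```shell
#!/bin/sh
# Hypothetical remount watchdog; mountpoint and container name are examples.

is_mounted() {
    # /proc/mounts lists every active mount; the surrounding spaces avoid
    # prefix matches (e.g. /mnt/podcasts vs /mnt/podcasts-old)
    grep -qs " $1 " /proc/mounts
}

MOUNTPOINT="${1:-/mnt/podcasts}"

if ! is_mounted "$MOUNTPOINT"; then
    echo "$(date): $MOUNTPOINT not mounted, remounting"
    # relies on an /etc/fstab entry for the share; container name is a guess
    mount "$MOUNTPOINT" 2>/dev/null && docker restart audiobookshelf || true
fi
```

Run from cron or a systemd timer, this would at least paper over the drops while I hunt for the real cause in the logs (journalctl -k and dmesg around the time of a drop are probably the first places to look).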
I’m going to have to cut up my nerd card here, but I had similar issues with NFS exports from my roll-your-own build.
After a month of troubleshooting I decided that working is better than purity, so I just mounted the shares over SMB instead and everything just worked from then on.
Best I can tell, NFS is just very, very finicky about hardware availability (drives spun down, etc.) and network reliability, and is a lot less robust than other options. I was never able to trace why NFS was the one and only thing that never seemed to work right, but at least there are other options as a workaround?
I did the hackiest, lamest thing back in the day… I had my client write the current date and time to a file on the share every two minutes as a cron job… Kept it working for months! I saw it on a forum somewhere, tried it, and… Shocked Pikachu face. I don’t know if I ever disabled that cron job! Haha!
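It was just a line like this in the client’s crontab (the mount path and filename here are examples, not what I actually used):

```shell
# crontab -e on the NFS client; path is an example
# the single '>' truncates the file each run, so it never grows
*/2 * * * * date > /mnt/podcasts/.nfs-keepalive
```

The idea being that a tiny write every two minutes keeps the mount from ever sitting idle long enough to go stale.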
And as a bonus, presumably you have a nice file filled with historic dates and times!
I checked, it’s still there! (It doesn’t append, it overwrites, so no, I just have a file with the current date and time accurate to within two minutes.)
NFSv3 (udp, stateless) was always as reliable as the network infra under Linux, I found. NFSv4 made things a bit more complicated.
You don’t want any NAT / stateful connection tracking in the network path (anything that could hiccup and forget), and wired connections only for permanent storage mounts, of course.
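If it helps, pinning v3 on the client is just a mount-option change; a sketch of an /etc/fstab line (hostname and paths are made up, and whether you go proto=udp as I did or stay on tcp is a separate choice):

```shell
# /etc/fstab on the NFS client; hostname and paths are examples
# vers=3 pins NFSv3; hard + timeo/retrans keep retrying instead of erroring out
unraid.lan:/mnt/user/podcasts  /mnt/podcasts  nfs  vers=3,hard,timeo=600,retrans=5  0 0
```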
Yeah it was NAS -> DAC -> Switch -> endpoints and for whatever reason, for some use cases, it would just randomly hiccup and break shit.
I could never figure out what the problem was, and as far as I could tell nothing in the network path stopped working or flapped (unless it happened so fast it didn’t trigger any of the monitoring), yet somehow it still broke NFS, and only NFS.
Figured after a bit that since everything else seemed fine, and the data was being exported via like six other methods anyway, meh, I’ll just use something else.