• 0 Posts
  • 37 Comments
Joined 2 years ago
Cake day: July 30th, 2023





    • I never said anything about EFI not supporting multi boot. I said that the partitions had to be kept in lockstep during updates. I recognize the term “manual” might have been a bit of a misnomer there, since I included systems where the admin has to take action to enable replication. ESXi (my main hardware OS for now) doesn’t even have software RAID for single-server datastores (only vSAN). Windows and Linux can both do it, but it’s a non-default, manual process of splicing the drives together with no apparent automatic replacement mechanism - full manual admin intervention. With a hardware RAID, you just plop the new disk in and it splices the drive back into the array automatically (if the drive matches).
    • Dell and HPE have both had RAM caching for reads and writes since at least 2011. That’s why the controllers have batteries :)
      • also, I said it only had to handle the boot disk. Plus you’re ignoring the fact that all modern filesystems will do page caching in the background regardless of the presence of a hardware cache. That’s not unique to ZFS; Windows and Linux both do it.
    • mdadm and hardware RAID offer the same level of block consistency validation, to my current understanding - you’d need filesystem-level checksumming no matter what, and since mdadm and hardware RAID are both filesystem agnostic, they support the same filesystem-level features about equally (Synology implements BTRFS on top of mdadm - I saw a small note somewhere that their implementation has btrfs request a block rebuild from mdadm when it detects issues, but I have been unable to verify this claim, so I don’t (yet) count it in my hardware vs md comparison)
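
    To make the “filesystem-level checksumming no matter what” point concrete: neither mdadm nor a hardware controller knows whether the bytes it faithfully mirrors are the bytes you originally wrote, so bit-rot detection has to happen above the block layer. Here’s a toy Python sketch of the idea (btrfs/ZFS do the equivalent per block, transparently); the paths and manifest name are made up.

    ```python
    # Toy illustration of filesystem-level checksumming: record SHA-256
    # sums for a tree of files, then verify them later. Neither mdadm nor
    # a hardware RAID controller can catch silent corruption on its own;
    # btrfs/ZFS do this per block internally and transparently.
    import hashlib
    import json
    import os
    import sys

    def sha256_of(path, chunk=1024 * 1024):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    def record(root, manifest="checksums.json"):
        sums = {}
        for dirpath, _, files in os.walk(root):
            for name in files:
                p = os.path.join(dirpath, name)
                sums[p] = sha256_of(p)
        with open(manifest, "w") as f:
            json.dump(sums, f, indent=2)

    def verify(manifest="checksums.json"):
        with open(manifest) as f:
            sums = json.load(f)
        for path, expected in sums.items():
            if sha256_of(path) != expected:
                print(f"possible bit rot: {path}")

    if __name__ == "__main__":
        # usage: python checksums.py record /srv/data   (hypothetical path)
        #        python checksums.py verify
        if sys.argv[1] == "record":
            record(sys.argv[2])
        else:
            verify()
    ```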

    Hardware RAID just works, and for many, that’s good enough. In more advanced systems, all it’s got to handle is a boot partition, and if you’re doing your job as a sysadmin there’s zero important data in there that can’t be easily rebuilt or restored.


  • I never said I didn’t use software RAID, I just wanted to add information about hardware RAID controllers. Maybe I’m blind, but I’ve never seen a good implementation of software RAID for the EFI partition or boot sector. During boot, most systems I’ve seen will always try one partition directly and only fall back to the second in order, which bypasses the concept of a RAID, so the two would need to be kept manually in sync during updates.
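
    To show what “kept manually in sync” ends up meaning in practice, here’s a minimal sketch of that copy step, assuming both EFI system partitions are already mounted at the (hypothetical) paths below - it just mirrors the primary ESP onto the secondary one after a bootloader update.

    ```python
    # Minimal sketch of the "keep the two ESPs in sync by hand" step.
    # Assumes both EFI system partitions are already mounted at the
    # (hypothetical) paths below; run it after every bootloader update.
    import filecmp
    import shutil
    import sys

    PRIMARY_ESP = "/boot/efi"      # ESP the firmware actually boots from
    SECONDARY_ESP = "/boot/efi2"   # hypothetical mount point of the mirror ESP

    def sync_esp(src=PRIMARY_ESP, dst=SECONDARY_ESP):
        # Mirror the primary ESP onto the secondary one in place.
        # dirs_exist_ok requires Python 3.8+; note this does not delete
        # files that were removed from the source.
        shutil.copytree(src, dst, dirs_exist_ok=True)

    def esps_match(a=PRIMARY_ESP, b=SECONDARY_ESP):
        # Top-level comparison only - a quick smoke test, not a full diff.
        cmp = filecmp.dircmp(a, b)
        return not (cmp.left_only or cmp.right_only or cmp.diff_files)

    if __name__ == "__main__":
        sync_esp()
        if not esps_match():
            sys.exit("ESPs still differ after sync - investigate before rebooting")
    ```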

    Because of that, there’s one notable place where I won’t - I always use hardware RAID for at minimum the boot disk, because Dell firmware natively understands everything about it from a detect/boot/replace perspective - or it sees nothing at all, which is also handled cleanly. All four of my primary servers have a boot disk on either a Startech RAID card similar to a Dell BOSS or an array on the PERC to boot from directly. It’s only enough space to store the core OS.

    Other than that, at home all my other physical devices are hypervisors (VMware ESXi for now, until I can plot a migration), dedicated appliance devices (Synology DSM uses mdadm), or don’t have redundant disks (my firewall, which is backed up to git, and my NUC Proxmox box; both firewalls and the PVE box run ZFS for its features).

    Three of my four ESXi servers run vSAN, which is like Ceph and replaces RAID. Like Ceph and ZFS, it requires using an HBA or passthrough disks for full performance. The last one is my standalone server. Notably, ESXi does not support any software RAID natively that isn’t vSAN, so both of the standalone server’s arrays are hardware RAID.

    When it comes time to replace that Synology, it’s going to be running TrueNAS.


  • For recovering hardware RAID: your best chance of success is a compatible controller with a similar enough firmware version. You might be able to find software that can stitch images back together, but that’s a long shot and requires a ton of disk space (which you might not have if it’s your biggest server).

    I’ve used dozens of LSI-based RAID controllers in Dell servers (both PERC- and LSI-branded) for both work and homelab, and they usually recover the old array onto the new controller pretty well. They also generally have a much lower failure rate than the drives themselves (I find myself replacing the cache battery more often than the controller itself).

    Only twice, out of the handful of times I’ve done this, have I gone to a RAID controller from a different generation:

    • first time, from an R815 with a failed motherboard (PERC H700), physically moving the disks to an R820 (PERC H710, might’ve been an H710P) - they foreign imported easily
    • Second time on homelab I went from an H710 mini mono to an H730P full size in the same chassis (don’t do that, it was a bad idea), but aside from iDRAC being very pissed off, the card ran for years with the same RAID-1 array imported.

    As others have pointed out, this is where backups come into play. If you have to replace the server with one from a different generation, you run the risk that the drives won’t import. At that point, you’d have to sanitize the superblock of the array and re-initialize it as a new array, then restore from backup. Now, the array might be just fine and you never notice a difference (like my users that had to replace a failed R815 with an R820), but the outcomes tend to be at the extremes - it either works or it faults, with no in between.

    Standalone RAID controllers are usually pretty resilient and fail less often than disks, but they are very much NOT infallible, as you are correct to assess. The advantage of software systems like mdadm, ZFS, and Ceph is that they remove the precise hardware compatibility requirements, but by no means do they remove the software compatibility requirements - you’ll still have to do your research and make sure the new version is compatible with the old format, or make sure it’s the same version.

    All that said, I don’t trust embedded motherboard RAIDs to the same degree that I trust standalone controllers. A friend of mine, about 8-10 years ago, ran a RAID-0 on a laptop that got its superblock borked when we tried to firmware update the SSDs - it stopped detecting the array at all. We did manage to recover data, but it needed multiple times the raw amount of storage to do so.

    • we made byte images of both disks with ddrescue to a server that had enough spare disk space
    • found a software package that could stitch images with broken superblocks back together if we knew the order the disks were in (we did), which wrote a new byte image back to the server (conceptually like the sketch after this list)
    • copied the result again and turned it into a KVM VM to network attach and copy the data off (we could have loop mounted the disk to an SMB share and been done, but it was more fun and rewarding to boot the recovered OS afterwards as kind of a TAKE THAT LENOVO…we were younger)
    • took in total a bit over 3TB to recover the 2x500GB disks to a usable state - and about a week of combined machine and human time to engineer and cook, during which my friend opted to rebuild his laptop clean after we had the images captured - one disk Windows, one disk Linux, not RAID-0 this time :P
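
    For the curious, the “stitching” those tools do is conceptually simple once you know the member order and the stripe size: re-interleave fixed-size stripes from each disk image into one flat volume image. A rough Python sketch of that idea (stripe size and file names are assumptions for illustration, not what we actually used):

    ```python
    # Rough sketch of what RAID-0 "stitching" tools do under the hood:
    # re-interleave fixed-size stripes from each member image, in member
    # order, into one flat volume image.
    STRIPE_SIZE = 64 * 1024                # 64 KiB stripes - depends on the controller
    MEMBERS = ["disk0.img", "disk1.img"]   # images in their original array order
    OUTPUT = "volume.img"

    def stitch(members=MEMBERS, output=OUTPUT, stripe=STRIPE_SIZE):
        files = [open(m, "rb") for m in members]
        try:
            with open(output, "wb") as out:
                while True:
                    wrote_any = False
                    # One full stripe row: one stripe from each member in turn.
                    for f in files:
                        chunk = f.read(stripe)
                        if chunk:
                            out.write(chunk)
                            wrote_any = True
                    if not wrote_any:
                        break
        finally:
            for f in files:
                f.close()

    if __name__ == "__main__":
        stitch()
    ```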



  • I don’t have a full answer to snapshots right now, but I can confirm Nextcloud has VFS support on Windows. I’ve been working on a project to move myself over to it from Synology Drive. Client-wise, the two have fairly similar features with one exception - Nextcloud generates one Explorer sidebar object per connection, whereas I think Synology handles these as shortcuts in the one directory. I’d prefer if NC did the latter, or allowed me to choose, but I’m happy with what I’ve got for now.

    As for the snapshotting, you should be able to snapshot the underlying FS/DB at the same time, but I haven’t poked deeply at that. Files I believe are stored plain (I will disassemble my Nextcloud server to confirm this tonight and update my comment), but some do preserve version history, so I want to be sure before I give you final confirmation. The Nextcloud root data directory is broken up by internal user ID, which is an immutable field (you cannot change your username, even in LDAP), probably because of this directory layout.

    One thing that may interest you is the external storage feature, which I’ve been working on migrating a large data set of mine to:

    • can be configured per-user or system-wide
    • password can be per-user, system-wide, or re-use the login password on the fly
    • data is stored raw on an external file server - supports a bunch of protocols, off hand SMB, S3, WebDAV, FTP
    • shows up as a normal-ish folder in the base user folder
    • can template names, such as including your username as part of the share name
    • Nextcloud does not independently contribute versioning data to the backend file server, so the only version control is what your backing server natively implements

    Admin docs for reference: https://docs.nextcloud.com/server/latest/admin_manual/configuration_files/external_storage_configuration_gui.html
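
    Since an external storage mount shows up as just another folder under your user root, you can poke at it over plain WebDAV like anything else in Nextcloud. A quick sketch with python-requests (host, user, and mount names are made up; note that with session-credential auth an app password may not be able to unlock the external mount, so you may need your real login):

    ```python
    # List an external storage mount over plain WebDAV to show it really
    # is "just a folder" under the user root. Host, user, and mount name
    # below are made up for illustration.
    import requests
    from xml.etree import ElementTree

    BASE = "https://cloud.example.com/remote.php/dav/files/alice/"
    MOUNT = "nas-home/"   # hypothetical external storage mount point

    def list_folder(path, auth=("alice", "app-password")):
        # Depth: 1 asks the server for the folder itself plus its children.
        resp = requests.request("PROPFIND", BASE + path,
                                headers={"Depth": "1"}, auth=auth)
        resp.raise_for_status()
        tree = ElementTree.fromstring(resp.content)
        ns = {"d": "DAV:"}
        return [e.text for e in tree.findall(".//d:response/d:href", ns)]

    if __name__ == "__main__":
        for href in list_folder(MOUNT):
            print(href)
    ```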

    I use LDAP user auth to my Nextcloud, with two external shares to my NAS using a pass-through session password (the NAS is AD-joined to the same domain Nextcloud uses for LDAPS). I don’t know if/how the “store password in database” option is encrypted - if anyone knows, I would be curious - because using session passwords prevents the user from sharing the folder to at least a federated destination (I tried with my friend’s NC server; haven’t tried with a local user yet, but I assume the same limitations apply). If that’s your vibe, then this is a feature XD.

    One of my two external storage mounts is a “common” share with multiple users accessing the same directory, and the second share is \\nas.example.com\home\nextcloud. Internally, I believe these are handled by PHP spawning smbclient subprocesses, so if you have lots of remote files and don’t want to nuke your Nextcloud, you will probably need to increase the PHP child limits (that took me too long to solve lol)

    That funny sub-mount name above handles an edge case where Nextcloud/DAV can’t handle directories with certain characters - notably the # that Synology uses to expose its #recycle and #snapshot structures. This means remote SMB mounting currently has a limitation where you can’t mount the base share of a Synology NAS that has this feature enabled. I tried a server-side Nextcloud plugin to filter this out before it was exposed to DAV, but it was glitchy. I’m unsure whether that was because I just had too many files for it to handle, thanks to the way Synology snapshots are exposed, or whether it was something else - either way, I worked around the problem for now by never mounting a base share of my Synology NAS. Other snapshot exposure methods may be affected - I have a ZFS TrueNAS Core box, so maybe I’ll throw that at it and see if I can break Nextcloud again :P

    Edit addon: OP just so I answer your real question when I get to this again this evening - when you said that Nextcloud might not meet your needs, was your concern specifically the server-side data format? I assume from the rest of your questions that you’re concerned with data resilience and the ability to get your data back without any vendor tools - that it will just be there when you need it.



  • Adding on one aspect to things others have mentioned here.

    I personally have both ports/URLs opened and VPN-only services.

    IMHO, it also depends on how much exposure the software can tolerate, and on what could be compromised if an attacker were to find the password.

    Start by thinking of the VPN itself (Tailscale, WireGuard, OpenVPN, IPsec/IKEv2, ZeroTier) as a service, just like the service you’re considering exposing.

    Almost all (working on the “all” part lol) of my externally exposed services require TOTP/2FA - i.e. VPN gateway, jump host, file server (Nextcloud), git server, PBX, a music reflector I used for D&D, game servers shared with friends. Those are ones I either absolutely need to be external (VPN, jump) or keep external so that I don’t have to deal with the complicated networking of per-user firewalls, so my friends don’t need to VPN to me to get something done.

    The second part for me is tolerance to be external and what risk it is if it got popped. I have a LOT of things I just don’t want on the web - my VM control panels (proxmox, vSphere, XCP), my UPS/PDU, my NAS control panel, my monitoring server, my SMB/RDP sessions, etc. That kind of stuff is super high risk - there’s a lot of damage that someone could do with that, a LOT of attack surface area, and, especially in the case of embedded firmware like the UPSs and PDUs, potentially software that the vendor hasn’t updated in years with who-knows-what bugs lurking in it.

    So there’s not really a one size fits all kind of situation. You have to address the needs of each service you host on a case by case basis. Some potential questions to ask yourself (but obviously a non-exhaustive list):

    • does this service support native encryption?
      • does the encryption support reasonably modern algorithms?
      • can I disable insecure/broken encryption types?
      • if it does not natively support encryption, can I place it behind a reverse proxy (such as nginx or haproxy) to mitigate this?
    • does this service support strong AAA (Authentication, Authorization, Auditing)?
      • how does it log attempts, successful and failed?
      • does it support strong credentials, such as appropriately complex passwords, client certificate, SSH key, etc?
      • if I use an external authenticator (such as AD/LDAP), does it support my existing authenticator?
      • does it support 2FA?
    • does the service appear to be resilient to internet traffic?
      • does the vendor/provider indicate that it is safe to expose?
      • are there well known un-patched vulnerabilities or other forum/social media indicators that hosting even with sane configuration is a problem?
      • how frequently does the vendor release regular patches (too few and too many can be a problem)?
      • how fast does the vendor/provider respond to past security threats/incidents (if information is available)?
    • is this service required to be exposed?
      • what do I gain/lose by not exposing it?
      • what type of data/network access risk would an attacker gain if they compromised this service?
      • can I mitigate a risk to it by placing a well understood proxy between the internet and it? (for example, a well configured nginx or haproxy could mitigate some problems like a TCP SYN DoS or an intermediate proxy that enforces independent user authentication if it doesn’t have all the authentication bells and whistles)
      • what VLAN/network is the service running on? (if you have several VLANs you can place services on, each with different access classes)
      • do I have an appropriate alternative means to access this service remotely than exposing it? (Is VPN the right option? some services may have alternative connection methods)

    So, as you can see, it’s not just cut and dry. You have to think about each service you host and what it does.

    Larger well known products - such as Guacamole, Nextcloud, Owncloud, strongswan, OpenVPN, Wireguard - are known to behave well under these circumstances. That’s going to factor in to this too. Many times the right answer will be to expose a port - the most important thing is to make an active decision to do so.


  • I’m not the commenter but I can take a guess - I would assume “data source” refers to a machine readable database or aggregator.

    Making the system capable of turning off a generic external service in an automated way isn’t necessarily trivial, but it’s doable given appropriate systems.

    Knowing when to turn a service off is going to be the million dollar question. It not only has to determine what the backend application version is during its periodic health check, it also needs to then make an autonomous decision that a vulnerability exists and is severe enough to take action.

    Home Assistant probably provides a “safe list” of versions that instances regularly pull down, automatically disconnecting if they determine themselves to be affected; or, if the remote UI connection passes through the Home Assistant central servers, those servers could maintain that safety database and off switch. (Note - I don’t have a Home Assistant so I can’t check myself.)
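
    I don’t know how Home Assistant actually implements it, but the core of such a check could be as small as this sketch: pull a safe list, compare the installed version, and pull the plug on the remote UI if it isn’t on the list (the URL, version string, and disable hook are all hypothetical).

    ```python
    # Sketch of what a "safe list" check could look like - a guess at the
    # logic, not how Home Assistant actually implements it. The URL,
    # version string, and disable hook are hypothetical.
    import json
    import urllib.request

    SAFE_LIST_URL = "https://example.com/remote-ui-safe-versions.json"  # hypothetical
    INSTALLED_VERSION = "2023.12.3"                                     # example value

    def fetch_safe_versions(url=SAFE_LIST_URL):
        with urllib.request.urlopen(url, timeout=10) as resp:
            return set(json.load(resp)["safe_versions"])

    def disable_remote_ui():
        # Placeholder for whatever actually tears down the remote UI tunnel.
        print("Known-vulnerable version detected - remote UI disabled")

    def periodic_health_check(installed=INSTALLED_VERSION):
        try:
            safe = fetch_safe_versions()
        except OSError:
            # Fail open or fail closed? That's a product decision; here we just skip.
            return
        if installed not in safe:
            disable_remote_ui()

    if __name__ == "__main__":
        periodic_health_check()
    ```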


  • Somewhat halfway between practical use and just messing around for fun.

    Several years ago I built a GPS NTP clock out of an RPi3 and an Adafruit GPS hat. Once I had the PPS driver installed, its precision/drift got pretty good. According to its own self-measurements, I got pretty dang close to NIST stratum 1 NTP servers, but those are hundreds of miles away, so that measurement isn’t super precise. It’s still running today, clocking nearly 24/7 operation since (checks shopping history) 2017, though I replaced the breadboard and mini module with a full-sized hat using the same chipset in 2021.

    Recently I acquired a proper hardware GPS clock, stacked the two against each other, and found out my RPi did not do half bad - it gets within 0.5-10 ms of the professional unit (literally, I’m pretty sure I’d need more precise measuring equipment than a regular computer to tell the difference between the two at this point). Now my homelab has fully redundant, internet-disconnected stratum 1 time. I’ve been half considering writing a GPSD driver for it as a joke, but I know upstream won’t accept it because it wouldn’t offer so many of the features they’d need.
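
    If anyone wants to run a similar comparison, one simple way (not exactly what I did, just the idea) is to query both sources from a third machine and compare the measured offsets - a sketch using the ntplib package from PyPI, with made-up hostnames; network jitter on the third machine limits how precise this can be:

    ```python
    # Query two local stratum-1 NTP servers from a third machine and
    # compare the measured offsets. Requires the ntplib package
    # (pip install ntplib); hostnames are made up.
    import ntplib

    SOURCES = ["rpi-gps.lan", "hw-clock.lan"]   # hypothetical local NTP servers

    def measure(host):
        client = ntplib.NTPClient()
        response = client.request(host, version=3)
        # offset: estimated difference between local clock and the server;
        # delay: round-trip time of the query. Both in seconds.
        return response.offset, response.delay, response.stratum

    if __name__ == "__main__":
        results = {host: measure(host) for host in SOURCES}
        for host, (offset, delay, stratum) in results.items():
            print(f"{host}: stratum {stratum}, offset {offset * 1000:+.3f} ms, "
                  f"delay {delay * 1000:.3f} ms")
        spread = abs(results[SOURCES[0]][0] - results[SOURCES[1]][0])
        print(f"disagreement between the two sources: {spread * 1000:.3f} ms")
    ```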

    As for what else - I just kind of keep an eye out for projects related to GPS and high precision time, like the open source atomic PCI card that was released a few years ago. Finding out what people are doing to get better and better time is just downright interesting.

    Outside of the time world, it’s just fun to see what projects people come up with relating to maps and navigation. Stretch goal once I have enough server horsepower is to make a render-capable Open Street Map server with my home region loaded to start with, but eventually I’d like to get it to the point where I can load and process world.osm. That… Requires a LOT of CPU and SSD space.



  • As others said, depends on your use case. There are lots of good discussions here about mirroring vs single disks, different vendors, etc. Some backup systems may want you to have a large filesystem available that would not be otherwise attainable without a RAID 5/6.

    Enterprise backups tend to follow the recommendation called 3-2-1:

    • 3 copies of the data, of which
    • 2 are backups, and
    • 1 is off-site (and preferably offline)

    On my home system, I have 3-2-0 for most data and 4-3-0 for my most important virtual machines. My home system doesn’t have an off-site, but I do have two external hard drives connected to my NAS.

    • All devices are backed up to the NAS for fast recovery access between 1w and 24h RPO
    • The NAS backs up various parts of itself to the external hard drives every 24h
      • Data is split up by role and convenience factor - just putting stuff together like Tetris pieces, spreading out the NAS between the two drives
      • The most critical data for me to have first during a recovery is backed up to BOTH external disks
    • Coincidentally, the two drives happen to be from different vendors, but I didn’t initially plan it that way - the Seagate drive was a gift and the WD drive was on sale
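
    The 3-2-1 counting is simple enough to sanity-check in code, if that helps it stick - a toy sketch over a made-up inventory of copies (the names are illustrative, not my actual layout):

    ```python
    # Toy sanity check for the 3-2-1 rule over a made-up copy inventory.
    # Each copy records whether it is a backup (not the primary) and
    # whether it lives off-site.
    from dataclasses import dataclass

    @dataclass
    class Copy:
        name: str
        is_backup: bool
        offsite: bool

    def classify(copies):
        total = len(copies)
        backups = sum(c.is_backup for c in copies)
        offsite = sum(c.is_backup and c.offsite for c in copies)
        return total, backups, offsite

    def meets_321(copies):
        total, backups, offsite = classify(copies)
        return total >= 3 and backups >= 2 and offsite >= 1

    if __name__ == "__main__":
        # Illustrative example: source + NAS backup + one external disk, no off-site.
        inventory = [
            Copy("laptop (primary)", is_backup=False, offsite=False),
            Copy("NAS backup",       is_backup=True,  offsite=False),
            Copy("external USB",     is_backup=True,  offsite=False),
        ]
        total, backups, offsite = classify(inventory)
        print(f"{total}-{backups}-{offsite}",
              "- meets 3-2-1" if meets_321(inventory) else "- no off-site copy yet")
    ```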

    Story time

    I had one of my two backup drives fail a few months ago. Literally nothing of value was lost - I just went down to the electronics shop, bought a bigger drive from the same vendor (preserving the one-drive-per-vendor approach), reformatted the disk, recreated the backup job, then ran the first transfer. Pretty much not a big deal; all the data was still in 2 other places - the source itself, and the NAS primary array.

    The most important thing to determine when you plan a backup: think about how valuable the data is to you. That’s how much you might be willing to spend on keeping it safe.


  • I’m running Nextcloud (non-Docker version) and I don’t see nearly so many client updates - usually one every few weeks, which is a reasonable pace to expect. Server updates are less frequent.

    On Windows (all of my primary devices), I just install the NC client update and skip the explorer restart, pending full reboot later. Tis the nature of literally anything that deeply integrates with Explorer. I’ve seen explorer “death” during updates from several vendors that have similar explorer plugins, not just NC. Explorer sometimes just decides to nope out even without NC updating.

    Now on one device I hadn’t opened for a while, I saw NC run two updates in a row, but that was my fault for procrastinating the first one.

    Here’s the desktop release history: https://github.com/nextcloud/desktop/releases
    I don’t see a “one every day” within the block of time between Dec 6 and today, unless you had the release candidate builds which may have been more frequent in a few spots.
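
    The cadence is also easy to sanity-check against the GitHub releases API if you’d rather not eyeball the page - a quick sketch that counts stable releases versus release candidates published since a given date (repo and date taken from this thread; needs the requests package):

    ```python
    # Count Nextcloud desktop releases published since a given date using
    # the public GitHub releases API (no auth needed for light use).
    from datetime import datetime, timezone
    import requests

    REPO = "nextcloud/desktop"
    SINCE = datetime(2023, 12, 6, tzinfo=timezone.utc)   # "Dec 6" from the thread

    def releases_since(repo=REPO, since=SINCE):
        url = f"https://api.github.com/repos/{repo}/releases"
        resp = requests.get(url, params={"per_page": 100}, timeout=30)
        resp.raise_for_status()
        stable, prerelease = [], []
        for rel in resp.json():
            if not rel.get("published_at"):
                continue
            published = datetime.fromisoformat(rel["published_at"].replace("Z", "+00:00"))
            if published < since:
                continue
            (prerelease if rel["prerelease"] else stable).append(rel["tag_name"])
        return stable, prerelease

    if __name__ == "__main__":
        stable, pre = releases_since()
        print(f"{len(stable)} stable releases since {SINCE.date()}: {stable}")
        print(f"{len(pre)} release candidates since {SINCE.date()}: {pre}")
    ```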


  • In the IT world, we just call that a server. The usual golden rule for backups is 3-2-1:

    • 3 copies of the data total, of which
    • 2 are backups (not the primary access), and
    • 1 of the backups is off-site.

    So, if the data is only server side, it’s just data. If the data is only client side, it’s just data. But if the data is fully replicated on both sides, now you have a backup.

    There’s a related adage regarding backups: “if there’s two copies of the data, you effectively have one. If there’s only one copy of the data, you can never guarantee it’s there”. Basically, it means you should always assume one copy somewhere will fail and you will be left with n-1 copies. In your example, if your server failed or got ransomwared, you wouldn’t have a complete dataset since the local computer doesn’t have a full replica.

    I recently had a backup drive fail on me, and all I had to do was buy a new one. No data loss; I just regenerated the backup as soon as the drive was spun up. I’ve also had to restore entire servers that have failed - minimal data loss since the last backup, but nothing I couldn’t rebuild.

    Edit: I’m not saying what you’re asking for is wrong or bad, I’m just saying “backup” isn’t the right word to ask about. It’ll muddy some of the answers as to what you’re really looking for.



  • I don’t have an immediate answer for you on encryption. I know most AD communication is encrypted in flight, and on-disk passwords are stored hashed unless the “store passwords using reversible encryption” field is checked. There are (in Microsoft terms) gMSAs (group-managed service accounts), but other than using one for ADFS (their federation/SSO provider), I have little knowledge of how they actually work on the inside.

    AD also provides encryption key backup services for BitLocker (MS full-volume encryption for NTFS) and the local account manager I mentioned, LAPS. Recovering those keys requires either a global admin account or specific permission delegation. On disk, I know MS has an encryption provider that works with the TPM, but I don’t have any data on whether that system is used (or where the decryptor is located) for these account types with recoverable credentials.

    I did read a story recently about a cyber security firm working with an org; they had gotten their way all the way down to domain admin, but needed a biometric-unlocked Bitwarden to pop the final backup server to “own” the org. They indicated that there was native Windows encryption going on, and they managed to break in using a now-patched vulnerability in Bitwarden to recover a decryption key, achieved by resetting the domain admin’s password and doing some Windows magic. On my DC at home, all I know is that it doesn’t need my password to reboot, so there’s credential recovery going on somewhere.

    Directly to your question about short-term-use passwords: I’m not sure there’s a way to do it out of the box in MS AD without getting into some overcomplicated process. Accounts themselves can have per-OU password expiration policies that are nanosecond accurate (I know because I once accidentally set a password policy to 365 nanoseconds instead of a year), and you can even set whole-account expiry (which would prevent the user from unlocking their expired password with a changed one). Theoretically, you could design/find a system that interacts with your domain to set, impound/encrypt, and manage the account and password expiration of a given set of users, but that would likely be add-on software.
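
    For reference, those expiry fields are stored under the hood as 64-bit counts of 100-nanosecond intervals since 1601-01-01 UTC (the Windows FILETIME format used by attributes like accountExpires), which is where the extreme precision comes from. A small worked conversion, not tied to any particular management tool:

    ```python
    # Convert between Python datetimes and the 100-nanosecond-interval
    # timestamps AD uses internally (Windows FILETIME, e.g. accountExpires).
    from datetime import datetime, timedelta, timezone

    WINDOWS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
    HUNDRED_NS_PER_SECOND = 10_000_000

    def to_filetime(when: datetime) -> int:
        # Exact integer arithmetic, no float rounding.
        delta = when - WINDOWS_EPOCH
        return ((delta.days * 86_400 + delta.seconds) * HUNDRED_NS_PER_SECOND
                + delta.microseconds * 10)

    def from_filetime(value: int) -> datetime:
        return WINDOWS_EPOCH + timedelta(microseconds=value // 10)

    if __name__ == "__main__":
        # e.g. a short-lived account you want to expire in 7 days:
        expires = datetime.now(timezone.utc) + timedelta(days=7)
        ft = to_filetime(expires)
        print(f"accountExpires value: {ft}")
        print(f"round-trips to:       {from_filetime(ft).isoformat()}")
    ```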