• 7 Posts
  • 240 Comments
Joined 2 years ago
Cake day: June 2nd, 2023


  • I think you’d be right that the direct cost of running the crawler and index would not be the issue. But fighting SEO to keep your results decent is probably a cost that dwarfs the basic technical cost of running the crawler and index.

    And you’d need a technical security team on top of things, as link farms aren’t your only risk. I’m sure there are countless ways to manipulate the algorithm to put your site on top, and Google probably has multiple teams fighting them full time.

    Many of these things would likely not be a problem for a startup, though. No one is paying SEO firms big money to get into a search index no one has heard of and hardly anyone uses, so these costs probably grow exponentially over time as you become more well known.


  • I’m not disputing that you might be right, but the Internet Archive runs a very different service. The main difference is that Google needs to continuously prune its 400 billion page index because of link rot, while the Internet Archive has the opposite aim: preserving sites that no longer exist.

    I’m also not sure they even crawl. Do sites get added on user request? When looking at a medium popularity page, you see it only has a couple of scrapes a year.

    None of them. At least, none that I’m aware of. I just don’t think that direct expenses are the reason that there are only two major web search tools. I also don’t think Google and Bing are good examples to point at when estimating the cost of running a complete search engine.

    I would suggest direct expenses are the barrier, but perhaps crawling is not the main expense. I would be interested to know any speculations you have outside of expenses that cause a barrier?


  • That website claims they add 3-5 billion pages a month. Google is doing that in a day or three, as recency of information is very important in search. Plus that site claims 100 billion pages to Google’s 400 billion. It’s still an impressive project.

    Size isn’t everything, so the real question is: what search site uses only the Common Crawl index and has results on par with Bing or Google?
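To put the figures above side by side, here is a rough back-of-the-envelope sketch. The numbers are the claims quoted in the comment, not measured values, and the 30-day month is a simplifying assumption.

```python
DAYS_PER_MONTH = 30  # simplifying assumption

# Claimed figures from the comment, not verified measurements.
crawl_pages_per_month = (3e9, 5e9)     # "3-5 billion pages a month"
google_days_for_same_volume = (1, 3)   # "a day or three" (the commenter's estimate)

# If Google covers a month of that crawl in d days, it is roughly 30/d times faster.
speedup_low = DAYS_PER_MONTH / max(google_days_for_same_volume)
speedup_high = DAYS_PER_MONTH / min(google_days_for_same_volume)

# Index size ratio: 400 billion pages versus 100 billion pages.
index_ratio = 400e9 / 100e9

print(f"Crawl rate gap: about {speedup_low:.0f}x to {speedup_high:.0f}x")
print(f"Index size gap: about {index_ratio:.0f}x")
```

On those assumptions the gap in crawl rate is larger than the gap in index size, which fits the point that recency matters at least as much as raw size.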





  • Remember, sync isn’t a good backup. You’re thinking of loss of drives, but if this is important data you also need to consider mistakes.

    If you accidentally delete files you shouldn’t, you don’t want this deletion to sync to all your copies so it’s gone for good and the backup doesn’t help.

    Personally I use borgmatic to keep incremental, deduplicated backups. Then I can go back to previous states.

    If you install Nextcloud All-in-One, it comes with a backup solution (also borg based). Then devices don’t need a copy of every file. But you’ll want your server to have a backup drive for this.

    I then sync my borg backup to a Backblaze B2 bucket using rclone, for an offsite, encrypted backup. That then meets the 3-2-1 backup rule.

    I notice you mention Jellyfin. I don’t back up my Jellyfin media, the cloud storage for that could get very expensive and I could get it again if I needed it.
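A minimal sketch of why a sync is not a backup, using plain dictionaries in place of real drives. `mirror_sync` and `SnapshotStore` are hypothetical names made up for this illustration. The snapshot store only imitates the idea of incremental, deduplicated backups and is not how borg or borgmatic work internally.

```python
import hashlib


def mirror_sync(source: dict[str, bytes]) -> dict[str, bytes]:
    """One-way sync: the replica becomes an exact copy, deletions included."""
    return dict(source)


class SnapshotStore:
    """Toy incremental, deduplicated backup (illustration only)."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}         # content hash -> data, stored once
        self.snapshots: list[dict[str, str]] = []  # each snapshot: filename -> content hash

    def backup(self, source: dict[str, bytes]) -> int:
        """Record the current state and return the new snapshot's id."""
        manifest = {}
        for name, data in source.items():
            digest = hashlib.sha256(data).hexdigest()
            self.chunks.setdefault(digest, data)   # unchanged content is not stored again
            manifest[name] = digest
        self.snapshots.append(manifest)
        return len(self.snapshots) - 1

    def restore(self, snapshot_id: int) -> dict[str, bytes]:
        """Rebuild the files exactly as they were at the given snapshot."""
        manifest = self.snapshots[snapshot_id]
        return {name: self.chunks[digest] for name, digest in manifest.items()}


files = {"photos.tar": b"irreplaceable", "notes.txt": b"v1"}
store = SnapshotStore()
replica = mirror_sync(files)
first = store.backup(files)

# An accidental deletion, followed by the next scheduled sync and backup run.
del files["photos.tar"]
replica = mirror_sync(files)
second = store.backup(files)

lost_from_replica = "photos.tar" not in replica
recovered = store.restore(first)["photos.tar"]
print("Replica still has the file:", not lost_from_replica)
print("Recovered from first snapshot:", recovered)
```

The synced replica faithfully copies the mistake, while the earlier snapshot still holds the deleted file. The unchanged `notes.txt` is stored only once across both snapshots, which is the deduplication part.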










  • I think it’s reasonable to respond with something like “I’m really not a kid person, I don’t much enjoy talking about kids or being around kids. I’m still happy to meet for coffee, but maybe we plan to keep it a short chat and see how it goes?”

    They’re mostly just going to be the focus of the occasion because they need constant attention, and I don’t really like kids in general. And, if they cry or act up and attract attention I will hate that.

    Many places will have toy areas for kids, so maybe you can find one (or ask if they can suggest one, since they are more likely to know which ones nearby have that). A 2 year old can probably keep themselves mostly entertained off and on for 30 mins or an hour, depending on the specific kid and whether there is a good selection of toys. The 6 month old will need more attention but may well spend a lot of the time sleeping.

    An old friend/acquaintance I’ve not spoken to in a few years popped up recently and we got chatting a little over text.

    I don’t want to put you off, but I’d probably have a plan for what you’re going to do if they start an MLM pitch.