this post was submitted on 22 Jun 2026
Selfhosted
I use ArchiveBox occasionally to archive websites into a browsable, offline copy. The copy stays available even if the data disappears online, and it doesn't depend on ArchiveBox still running after the archiving finishes, provided of course you persist the data locally. I've archived several self-hosted sites because they contained data I wanted to keep for personal use at a later date. It does the job quite thoroughly, though obviously large sites take a little time to ingest. It might be worth spinning up a Docker instance and running it through its paces to see if it fits your criteria.
I wonder if an authorised remote user (i.e. an affiliated researcher) can easily instruct ArchiveBox to store a URL and later retrieve it. Also, ideally a random user should be able to retrieve the archived web page or file (e.g. a PDF or CSV). The idea is that authorised researchers can get URLs archived, and then any user reading our reports can click on a citation and get our archived source if the original is no longer available. I'll need to run it and see, but it looks promising.
Keeping the archive alive for years, possibly after funding dries up, is another challenge, but there are public repositories that may be suitable for that.
Once you download the data and persist it on local storage, it's available to whoever has access to that drive or server.
For rando access, you could put the data on a public FTP server, or even get fancier with HTML-styled pages. If I understand you correctly, you want a random user reading your report to be able to click a citation and be presented with whatever you downloaded with ArchiveBox. Kind of Wikipedia style. Speaking of which, a wiki framework might be just the ticket.
Download the data, integrate it into a self-hosted wiki, and it would be available to rando users. Of course your wiki server will need all the accoutrements of security so you don't get hacked by a bazillion bots.
A wiki is a good idea. Putting a SingleFile or similar all-in-one file in a repository and providing index numbers organised as a look-up table would also work for easy retrieval by a random research user. Both require some admin and more effort from the researchers.
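To make the look-up table idea concrete, here is a minimal sketch in Python. It assumes the snapshots are single files already saved into one folder, and it keeps the table as a plain CSV next to them. The function names, the CSV columns and the checksum column are all made up for illustration; none of this comes from ArchiveBox or SingleFile.

```python
import csv
import hashlib
from pathlib import Path

# Hypothetical column layout for the look-up table.
FIELDS = ["index", "url", "file", "sha256"]


def load_index(index_csv: Path) -> dict[int, dict[str, str]]:
    """Read the look-up table into {index number: row}."""
    if not index_csv.exists():
        return {}
    with index_csv.open(newline="", encoding="utf-8") as fh:
        return {int(row["index"]): row for row in csv.DictReader(fh)}


def add_snapshot(index_csv: Path, url: str, snapshot: Path) -> int:
    """Register a saved snapshot file and return its index number.

    If the URL is already in the table, the existing number is returned
    and nothing is written.
    """
    rows = load_index(index_csv)
    for number, row in rows.items():
        if row["url"] == url:
            return number

    number = max(rows, default=0) + 1
    # The checksum lets a reader verify the file was not altered later.
    digest = hashlib.sha256(snapshot.read_bytes()).hexdigest()
    write_header = not index_csv.exists()
    with index_csv.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if write_header:
            writer.writerow(FIELDS)
        writer.writerow([number, url, snapshot.name, digest])
    return number


def lookup(index_csv: Path, number: int) -> str:
    """Return the snapshot file name for a citation's index number."""
    return load_index(index_csv)[number]["file"]
```

A report would then cite something like "source 12", and a reader resolves 12 to a file name with `lookup`. A CSV keeps the table readable without any tooling, which matters if the archive has to outlive the project.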
I wish there was a hostable version of archive.is for near-zero maintenance. You just submit a URL over the internet and the web page is cached once along with a screenshot. Then, anyone can access the archived version. This can be done already with archive.is but we have no control over its future, which is critical for long-term dependable archiving.
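The "cached once" part of that wish is simple enough to sketch. The following is only an illustration of the submit-once logic, not how archive.is or ArchiveBox actually work: it stores the raw response bytes, takes no screenshot, does no page rendering, and has no access control. `archive_once`, `fetch_bytes` and the `.bin` naming are hypothetical.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable


def fetch_bytes(url: str) -> bytes:
    """Default fetcher: download the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def archive_once(
    url: str,
    store: Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Path:
    """Save `url` into `store` the first time it is submitted.

    Later submissions of the same URL return the existing copy
    untouched, so the archived version never changes.
    """
    store.mkdir(parents=True, exist_ok=True)
    # A hash of the URL gives a stable, filesystem-safe file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".bin"
    target = store / name
    if not target.exists():
        target.write_bytes(fetch(url))
    return target
```

The hard parts of a real service are elsewhere: rendering JavaScript-heavy pages, screenshots, abuse prevention, and serving the copies publicly.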
Did a little digging this morning. I honestly can't find a self-hosted archive.is alternative. All the solutions I came up with are either paid and online-only, or free but still online-only.
Thanks for doing the digging. An archivist may know something more. Or the archive.is people.
It might be worthwhile to run your scenario by the folks at https://lemmy.world/c/datahoarder