this post was submitted on 22 Jun 2026

Selfhosted


I am one of a network of academic researchers from around the world working on collecting media market data. One problem is that referenced sources often disappear, which makes later validation difficult or impossible. So I thought I would recommend self-hosting something like archive.org that would let affiliated researchers submit their web references and have the sources efficiently archived in a central project repository. That would allow validation and continuity when web-hosted text and files disappear or researchers leave.

I have been looking at ArchiveBox. If you have experience with it or a similar solution, would it fit the bill? The important things are efficiency for researchers submitting and retrieving pages and files, and openness in structure and formats, so that the archive would remain useful if ArchiveBox or similar disappears. FOSS of course means you can't be locked out anyway.

[–] moonpiedumplings@programming.dev 3 points 4 days ago* (last edited 4 days ago) (1 children)

Check out Zotero: https://www.zotero.org/

Zotero is an open source bibliography manager. It's my main go-to tool for generating works cited pages, for example when writing essays.

But it also has a browser extension, which can download and archive the sites or academic articles you are adding to your sources. I would then use the full-text search that Zotero provides for easy searching of sources.

Unfortunately, it's not hosted, which would make it difficult to share.

EDIT: It does look like the server component is open source, AGPLv3: https://github.com/zotero/dataserver/

I cannot find any deployment instructions, though. It does look like their hosted version lets users create groups of shared items, including sharing archived snapshots of the various items.

[–] Stopwatch1986@lemmy.ml 1 points 4 days ago (1 children)

I have been using Zotero every day for more than two decades and somehow it hasn't crossed my mind. You may be on to something.

Zotero supports public and private shared bibliographies that you can subscribe to through the client or their web interface. Each entry contains the bibliographical details, note attachments, file attachments and links to local files. It also captures webpages and metadata through the browser add-on. The local database can be backed up and, if self-hosted, you have control. The best part is that academic researchers will be familiar with the software and process. One downside is that the cached file is not independently archived, so it could be tampered with. Thanks for the idea.

> One downside is that the cached file is not independently archived so it could be tampered with. Thanks for the idea.

You could have multiple researchers archive it and store copies independently. Tampering would then show up as differences across the copies.
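As a rough sketch of how that comparison could work, assuming each researcher holds a copy of the same captured file: hash every copy with SHA-256 and flag any copy whose digest differs from the most common one. The function names and the holder-to-path mapping here are made up for illustration and are not part of Zotero or ArchiveBox.

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_outliers(copies: dict[str, Path]) -> dict[str, str]:
    """Map each holder whose copy differs from the most common digest to that holder's digest.

    `copies` is a hypothetical mapping of holder name -> path to their copy.
    """
    digests = {holder: sha256_of(path) for holder, path in copies.items()}
    # The most common digest is treated as the reference; ties go to the first one seen.
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return {holder: d for holder, d in digests.items() if d != majority}
```

This only makes sense for copies of the same captured file (or static files such as PDFs). Two snapshots of a live page taken at different times will usually differ byte for byte even without tampering. It also assumes the majority of holders are honest.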

Unfortunately, central hosting doesn't guarantee that an archive is tamper-free. The host could be hacked, or could be malicious. Archive.is was caught tampering with its archived pages:

https://en.wikipedia.org/wiki/Wikipedia:Archive.today_guidance