#1020217 snapshot.debian.org: write a generic file driver supporting multiple backends (such as object storage)
- Package: snapshot.debian.org
- Source: snapshot.debian.org
- Submitter: Baptiste Beauplat
- Date: 2024-08-14 12:54:01 UTC
- Severity: normal
Storing files in object-like storage could improve cost and performance over the conventional storage that snapshot currently uses (over 130 TB at the moment!).
Multiple components need to access the snapshot farm: the importer, the web app (if not redirected), and several other scripts. We should write a generic file driver to allow all those components to access, update, and delete files on a config-defined backend.
This driver would need to be usable in at least two languages: Ruby and Python. I'm not sure what the best course of action is here: some kind of bindings, or maintaining two separate drivers. Note that there is also a C program as part of snapshot (the fsck program).
I was thinking of writing at least two backends for the driver:
- a standard flat filesystem storage (what we have currently)
- an object-like storage. S3 would be a good candidate, since a couple of open-source storage solutions provide an S3-compatible API.
That would allow a two-step transition: start using the driver, then switch the backend.
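To make the proposal above concrete, here is a minimal Python sketch of what such a driver could look like. It is not taken from the snapshot code base: the class names, the method names, the two-level directory fan-out, and the config keys (`backend`, `root`, `bucket`, `endpoint_url`) are all assumptions made for illustration. The S3 variant relies on boto3, which must be installed and configured separately.

```python
import os
import shutil
from abc import ABC, abstractmethod
from typing import Optional


class StorageBackend(ABC):
    """Hypothetical interface; method names are illustrative only."""

    @abstractmethod
    def put(self, key: str, source_path: str) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class FilesystemBackend(StorageBackend):
    """Stores each object as a plain file under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, key: str) -> str:
        # Assumed layout: two directory levels taken from the key prefix.
        return os.path.join(self.root, key[0:2], key[2:4], key)

    def put(self, key: str, source_path: str) -> None:
        dest = self._path(key)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(source_path, dest)

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as fh:
            return fh.read()

    def exists(self, key: str) -> bool:
        return os.path.isfile(self._path(key))

    def delete(self, key: str) -> None:
        os.remove(self._path(key))


class S3Backend(StorageBackend):
    """S3-compatible backend; requires boto3 and valid credentials."""

    def __init__(self, bucket: str, endpoint_url: Optional[str] = None) -> None:
        import boto3  # imported lazily so the filesystem backend works without it
        self.bucket = bucket
        self.client = boto3.client("s3", endpoint_url=endpoint_url)

    def put(self, key: str, source_path: str) -> None:
        self.client.upload_file(source_path, self.bucket, key)

    def get(self, key: str) -> bytes:
        return self.client.get_object(Bucket=self.bucket, Key=key)["Body"].read()

    def exists(self, key: str) -> bool:
        from botocore.exceptions import ClientError
        try:
            self.client.head_object(Bucket=self.bucket, Key=key)
            return True
        except ClientError as err:
            # Only treat "not found" as absence; re-raise anything else.
            if err.response["Error"]["Code"] in ("404", "NoSuchKey", "NotFound"):
                return False
            raise

    def delete(self, key: str) -> None:
        self.client.delete_object(Bucket=self.bucket, Key=key)


def backend_from_config(config: dict) -> StorageBackend:
    """Pick a backend from a config mapping (keys are illustrative)."""
    if config["backend"] == "filesystem":
        return FilesystemBackend(config["root"])
    if config["backend"] == "s3":
        return S3Backend(config["bucket"], config.get("endpoint_url"))
    raise ValueError(f"unknown backend: {config['backend']}")
```

With this shape, the two-step transition amounts to first routing all file access through `backend_from_config(...)`, and later changing only the `backend` entry in the configuration.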
Hi Baptiste, I was wondering if you have made any progress on this? Your plan looks very good. I agree that an S3 backend would make a lot of sense (usable both with self-hosted solutions like MinIO and with managed services). Let me know if I can help somehow. Lucas
The Ruby part (in charge of importing data) already has an abstraction layer: https://salsa.debian.org/snapshot-team/snapshot/-/blob/master/snapshot#L59 The Python part (in charge of the web app) doesn't: https://salsa.debian.org/snapshot-team/snapshot/-/blob/master/web/app/snapshot/controllers/archive.py#L192 Lucas
Hi Bastian, I'm playing with the idea of an S3-backed snapshot.d.o implementation (see #1020217). Could we use the Debian AWS account to host that service? It would require one fairly powerful VM, and a large S3 bucket (approximately 150-200 TB). Best, Lucas
Hi Lucas,
I would assume that a service like snapshot would be within the scope of our AWS usage. Noah? 200 TB should be no problem.
However, we need to talk about that "one […] VM", because this sounds like you intend to use AWS as VM hosting, which it is not. Please think about this in terms of services, of which there should be at least two:
- the ingestor, which can only exist once and writes, and
- the web frontend, which should be able to exist several times and only reads.
So you want to plan on running multiple web frontends with load balancers and maybe even CloudFront.
Regards,
Bastian
Hi Lucas, Unfortunately I have not made any progress regarding this feature, nor do I plan on working on it anytime soon (due to a lack of time). You're very welcome to have a go at it. Best,
Hi,
I'm copying here, for reference, an IRC log from #debian-admin:
19:38 <@jcristau> lucas: right, so the importer would indeed somehow need to learn about copying stuff to other locations/backends.
the python/web bit in practice doesn't matter, i think, because it's not used for /file/... requests, those are
serviced by apache directly
(https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/production/modules/roles/templates/snapshot/snapshot.debian.org.conf.erb#L76)
19:42 <@jcristau> lucas: the way the copying of stuff to the replica happens currently is very async
(https://salsa.debian.org/snapshot-team/snapshot/-/blob/master/master/import-run#L47-L52) which means we just hope
things are eventually consistent, but there's a window where the replica would 404 for new files. that might be ok,
because in practice users probably aren't requesting the very latest mirror run, but
19:42 <@jcristau> it might be nice to somehow keep track of what's synced where and what's new, also to reduce how much time we're
spending walking directories.
21:38 < lucas> ah right, apache could just redirect to S3 or act as a proxy, or the web app could point to S3 directly
21:40 < lucas> re replica, I wonder if we shouldn't instead aim for several parallel instances of snapshot (at least 2), each living
its own life (not in a master/slave setup)
03:46 <@pabs> lucas: then you would get different snapshot timestamps?
07:47 < lucas> pabs: do we care?
07:52 <@pabs> having multiple sets of timestamps served from different servers seems confusing for end-users
08:38 < lucas> ah, my idea was more that they would be exposed as different services, with one being used for snapshot.debian.org
08:40 < lucas> the snapshot services could actually share timestamps afterwards (for example to cover a period of downtime)
15:26 <@jcristau> lucas: my idea would be to add the s3 bucket as a backend in
https://salsa.debian.org/dsa-team/mirror/cdn-fastly/-/blob/master/services/snapshot.yaml and use that for /file/
21:14 < lucas> jcristau: I'm not familiar with how CDNs work. could you tell fastly to fallback to other backends if the first one
replied with 404?
21:57 <@Mithrandir> we can do that
21:57 <@Mithrandir> it might not be wise
It makes sense and I will look into it. Let's not start anything until we hear definitive confirmation. Do we have a sense of how much outgoing traffic the current snapshot service generates? Agreed. I agree that it would be best to design something more cloud-oriented. However, if there's an existing infrastructure that can be moved as a "lift & shift" into AWS now, with architectural refactoring happening later, that's an OK place to start. noah
Hi,
From #debian-admin:
<Mithrandir> lucas: https://munin.debian.org/debian.org/sallinen.debian.org/ip_193_62_202_27.html and https://munin.debian.org/debian.org/sallinen.debian.org/ip_2001_630_206_4000_1a1a_0_c13e_ca1b.html I think, so average of 35Mbit/sec over the last week.
replacing the filesystem-backed storage backend with an S3-backed one. Then look at other aspects.
Lucas
OK, let's do it. noah
Hi,
I made some minor progress on this, and I thought I'd report back (I'll try to attend the meeting tomorrow, but I'm not sure I'll manage).
# What I did:
- I got the OK to host an S3-backed snapshot mirror using the Debian AWS account (see thread in #1020217)
- I got access to the account, and set up a VM with a Debian mirror.
- I could run the file-backed snapshot importer on it
- I modified the snapshot importer code to make it import to S3 (basically it means creating an S3Backend class that inherits from StockageBackend), and tested it by importing the debian-security archive.
# What I plan to work on:
- Set up a real development environment. I plan to use Vagrant, which is not a perfect solution for many reasons, but anyway the provisioning scripts will likely be re-usable with something else.
- Change the web frontend to allow using S3.
- Improve (parallelize) the importer code, specifically the sha1-hashing (to process multiple files in parallel, one per core) and the file copying/uploading-to-S3 (this is especially important for S3 because, to achieve good throughput, you need many transfers in parallel).
# Open questions
## What to do with this?
Assuming all this works and we can have an S3-backed snapshot service, there's the question of what to do with it. We have several options I think:
### A. s3-snapshot as a mirror of snapshot.debian.org
The imports would continue to be done on snapshot.debian.org, but everything would be mirrored on a regular basis to S3. That would allow faster access to the data, but would not help with the performance of imports.
### B. dual-stack snapshot.debian.org
The importer on snapshot.debian.org would import both to local storage and to S3. The web app could proxy requests to both. That would allow more resilience, but does not help with the performance of imports (on the contrary).
### C. s3-snapshot as a fork of snapshot.debian.org
After an initial import of snapshot.debian.org data, s3-snapshot would live its own independent life. The main downside is that both databases will become out of sync (not the same mirror runs; they might each miss some packages, but not the same ones).
### D. do both at the same time
Do C, but also make sure that every file that ever gets stored in snapshot.debian.org gets imported in the bucket used for s3-snapshot, to be able to expose a full read-only mirror of the snapshot.debian.org DB.
### E. Nice experiment, but let's forget about it
(That should be mentioned as well)
In any case, it probably makes sense to keep at least two different instances of the snapshot service (and data), preferably on different implementations, to make sure that we don't lose everything in case of a catastrophic incident.
I plan to aim for C as a first step.
## How to do an initial import of snapshot.debian.org data?
That's more of a technical question. The PostgreSQL DB should not be a problem, as it's quite small (~ 20 GB). For the data itself, it could probably be uploaded directly from local storage on the snapshot.d.o hosts to an S3 bucket. I could upload from an EC2 VM to S3 at about 10 Gbps (limited by the bandwidth of local storage). I don't know about the performance (storage, network) of snapshot.debian.org, but that probably means that an import into S3 is doable in a couple of weeks in the worst case.
In any case, that's a question to keep in mind, but that does not need to be resolved now.
Lucas
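As a rough check of the initial-import estimate above, the following Python sketch converts a data size and a sustained transfer rate into days. The 150 TB figure is the lower end of the bucket size mentioned earlier in the thread and 10 Gbps is the rate measured from EC2; the slower 1 Gbps rate is purely an assumption, chosen to illustrate what "a couple of weeks" would correspond to.

```python
def transfer_days(size_tb: float, rate_gbps: float) -> float:
    """Days needed to move size_tb terabytes at a sustained rate_gbps."""
    bits = size_tb * 1e12 * 8            # decimal terabytes to bits
    seconds = bits / (rate_gbps * 1e9)   # gigabits per second to bits per second
    return seconds / 86400


if __name__ == "__main__":
    # 10 Gbps was measured from an EC2 VM; 1 Gbps is an assumed slower link.
    for rate in (10, 1):
        print(f"150 TB at {rate} Gbps: {transfer_days(150, rate):.1f} days")
```

Under these assumptions, 150 TB takes about 1.4 days at 10 Gbps and about 13.9 days at 1 Gbps, so the "couple of weeks" worst case matches a sustained rate of roughly 1 Gbps from the snapshot.d.o hosts.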
Lucas Nussbaum <lucas@debian.org> writes:
Is this S3 bucket public, or will it be?
I have been worried about the state of snapshot and I am mirroring its data into local Git LFS. Since snapshot.debian.org doesn't support rsync and doesn't make the PostgreSQL database dumps available (so that I can identify SHA1 objects and speed up downloads), I am using HTML web scraping to find out what files exist on snapshot.d.o.
My goal has been to put all the Git LFS objects in a publicly accessible S3 bucket too. While imports were running I didn't work on the bucket side, and I suspect my download will take months to complete at current speeds. I publish Git LFS versions of archive.debian.org, ftp.debian.org and ftp.ports.debian.org already, though, so perhaps I could start on the bucket publishing part for them and see about adding an incremental snapshot.d.o copy while it is still working.
/Simon
It's my plan to make it public, yes
If you are a DD, you could:
ssh lw08.debian.org psql service=snapshot-guest -c '\\dt'
                List of relations
 Schema |        Name         | Type  |  Owner
--------+---------------------+-------+----------
 public | archive             | table | snapshot
 public | binpkg              | table | snapshot
 public | config              | table | snapshot
 public | directory           | table | snapshot
 public | farm_journal        | table | snapshot
 public | file                | table | snapshot
 public | file_binpkg_mapping | table | snapshot
 public | file_srcpkg_mapping | table | snapshot
 public | indexed_mirrorrun   | table | snapshot
 public | mirrorrun           | table | snapshot
 public | node                | table | snapshot
 public | removal_affects     | table | snapshot
 public | removal_log         | table | snapshot
 public | srcpkg              | table | snapshot
 public | symlink             | table | snapshot
(15 rows)
The 'file' table is the one that lists all known hashes.
Lucas
Hi there!
I'm hereby cc-ing our DPL, to get him involved in a possible storage cluster purchase for the project.
For years, I have been mentioning such an object storage driver, so that we could use OpenStack Swift for snapshot.d.o. I am happy that it is finally gaining traction, and that Lucas is implementing this. Thanks Lucas.
However, it is disappointing to see it moving toward an S3 implementation, which is a protocol from a closed-source service. I already wrote multiple times that my company (Infomaniak) was willing to sponsor storage space on Swift for it. FYI, we currently manage more than 110 PB of storage over 7500+ HDDs and growing, so I am not scared at all about storage space. Some clusters we manage are around 40 PB, with billions of files.
However, I do not envision *any* sponsor providing the storage space, but rather Debian maintaining its own storage cluster. To give you a rough idea of what this would represent, let me describe the type of hardware involved, and its pricing.
I would currently recommend this type of 2U server: https://www.aicipc.com/en/productdetail/51224
They provide 24 HDD slots, plus 2x SSD for the system. Equipped with a decent amount of RAM (128 GB) and a CPU, the cost is around 4000 EUR per server without the HDDs. Currently, 22 TB Seagate HDDs are at around 350 EUR apiece. So such a server fully equipped with HDDs would be at around 12000 EUR.
If we want 6 of them (which is IMO the bare minimum for redundancy, as each file is stored 3 times), we're talking of around 75000 EUR. Adding 3 smaller servers to act as auth servers (i.e. Keystone), at let's say 4000 EUR each (which is the average price for a decent server with 128 GB of RAM, 2x SSD for the system, and a 32-core CPU), we would end up spending around 90 kEUR for such a storage cluster. This would provide 1 PB of redundant (i.e. copied 3 times) storage space. This would need 15U of rack space, plus possibly a switch.
If we want to be safe, though, we could purchase at least one spare node and a few HDDs. So all together, we're looking at spending around 100 kEUR.
Note that this type of Swift cluster could also be used for artifact storage for Salsa (GitLab has a Swift backend storage driver).
Also note that we're currently (at Infomaniak) using these AIC chassis with amd64, but we're looking at replacing the boards with Gigabyte motherboards using Ampere CPUs (i.e. ARM64-based, with 80 cores).
If we need to save on costs at first, we could lower the number of HDDs (let's say by half), and add more HDDs later on. But you get my point: it's not *that* expensive, and for sure something we could afford (we do have the budget).
I am hereby volunteering to set up such an OpenStack Swift cluster for snapshot.d.o, or other Debian use. It'd be easy to find other people interested in helping me maintain this (I know some people who have already volunteered to help me when I'm away, on holiday or otherwise).
Your thoughts? Would the DPL agree to such spending? Do we have somewhere to host this? At UBC? What would be the DSA's opinion about this? Would they get involved? (IMO, we can do without DSA if they don't want to get involved, but I'd prefer if they would...)
Cheers,
Thomas Goirand (zigo)
P.S: Please CC me.
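The cost estimate in the message above can be reproduced with a few lines of Python. All prices, counts, and capacities are the figures quoted in that message; the only addition is the assumption that each auth server occupies 1U, which the message does not state but which makes the 15U total work out.

```python
# Figures quoted in the message above.
CHASSIS_EUR = 4000        # 2U server with RAM and CPU, without HDDs
HDD_EUR = 350             # one 22 TB HDD
HDDS_PER_SERVER = 24
HDD_TB = 22
STORAGE_SERVERS = 6
AUTH_SERVERS = 3
AUTH_SERVER_EUR = 4000
REPLICAS = 3              # each file is stored 3 times

per_server = CHASSIS_EUR + HDDS_PER_SERVER * HDD_EUR
cluster_total = STORAGE_SERVERS * per_server + AUTH_SERVERS * AUTH_SERVER_EUR
with_spare_node = cluster_total + per_server

raw_tb = STORAGE_SERVERS * HDDS_PER_SERVER * HDD_TB
usable_tb = raw_tb // REPLICAS

# Assumption: auth servers are 1U each; storage servers are 2U as stated.
rack_units = STORAGE_SERVERS * 2 + AUTH_SERVERS * 1

print(f"per storage server: {per_server} EUR")        # 12400
print(f"cluster:            {cluster_total} EUR")     # 86400
print(f"with one spare:     {with_spare_node} EUR")   # 98800
print(f"usable capacity:    {usable_tb} TB")          # 1056
print(f"rack space:         {rack_units}U")           # 15
```

These exact values are consistent with the rounded figures in the message: about 12000 EUR per server, about 90 kEUR for the cluster, about 100 kEUR with a spare node, and roughly 1 PB of usable space.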
Hi Thomas,
On Sun, Jun 09, 2024 at 12:17:05PM +0200, Thomas Goirand wrote:
As far as I understand, the problem you see is that a valuable service of
Debian might become dependent on a closed-source service, right? I would
like to hear the opinion of the people who are actually working on this.
Kind regards
Andreas.
Hi Thomas,
It looks like you see the work on an object-storage backend as a procurement/infrastructure question. I don't think that this is the main issue.
Based on what I've done so far (and I still plan to continue working on this, but I have limited time for Debian nowadays), the code also needs deep changes because, if you want an object-storage-based backend to perform adequately, you need more parallelism for backend-related operations. This is true whether the storage service is AWS S3, OpenStack Swift, Azure Blob Storage, Ceph Object Storage, or whatever. If you increase the latency between the importer/indexer and the storage service, you need parallelism to hide it and stay with a bandwidth-bound problem.
To work on this, you need an object storage backend, but I suspect that once it works with one of them, porting it to another one will be trivial, as the S3-specific bits are really minimal. (and Swift is S3-compatible anyway)
Help is welcomed -- my code is at https://salsa.debian.org/lucas/snapshot/-/commits/s3snap/?ref_type=heads
Typically a good way to test this is to try to import a small archive (e.g. debian-security with one architecture only) and see if you can get an import time on object storage that is similar to the one on file-based storage.
Lucas
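The point about hiding latency with parallelism can be illustrated with a small, self-contained Python sketch. It is not the code from the repository linked above: `fake_put` is a hypothetical stand-in that only sleeps to mimic the per-request latency of an object store, and keying objects by their SHA-1 is an assumption based on the hashes mentioned elsewhere in this thread.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def fake_put(key: str, data: bytes, latency_s: float = 0.02) -> str:
    """Stand-in for an object-store PUT: sleeps to mimic request latency."""
    time.sleep(latency_s)
    return key


def import_one(data: bytes) -> str:
    """Hash the content, then store it under its SHA-1 (assumed key scheme)."""
    key = hashlib.sha1(data).hexdigest()
    return fake_put(key, data)


def import_all(blobs, workers: int):
    """Import every blob with `workers` concurrent transfers, keeping order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(import_one, blobs))


def timed(workers: int, blobs):
    """Return the imported keys and the wall-clock time the import took."""
    start = time.perf_counter()
    keys = import_all(blobs, workers)
    return keys, time.perf_counter() - start


if __name__ == "__main__":
    blobs = [f"package-{i}".encode() for i in range(32)]
    serial_keys, serial_s = timed(1, blobs)
    parallel_keys, parallel_s = timed(16, blobs)
    assert serial_keys == parallel_keys
    print(f"1 worker: {serial_s:.2f}s, 16 workers: {parallel_s:.2f}s")
```

With one worker, the total time is roughly the number of files times the latency; with many workers, the waits overlap, which is the effect described above. Threads are sufficient here because the work is dominated by waiting on the network; CPU-bound hashing of large files across cores might instead call for processes.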
Sponsored cloud services are a very dangerous drug to be addicted to. Someone else's computer, as the FSFE puts it!
As for the work, again, that'd be on me for the setup and maintenance (though not physical setup, but we can pay a service provider for that if we have no other option...).
Thomas
On Jun 9, 2024 8:10 PM, Andreas Tille <tille@debian.org> wrote:
Hi Thomas,
On Sun, Jun 09, 2024 at 12:17:05PM +0200, Thomas Goirand wrote:
As far as I understand, the problem you see is that a valuable service of
Debian might become dependent on a closed-source service, right? I would
like to hear the opinion of the people who are actually working on this.
Kind regards
Andreas.
FYI, I stopped working on that, since the FS-backed service is back in a good state. My work is pushed to the above git repo and I cleaned up the infrastructure bits I set up on AWS. Lucas