#1020217 snapshot.debian.org: write a generic file driver supporting multiple backends (such as object storage)

#1020217#3
Date:
2022-09-18 09:29:43 UTC
From:
To:
Storing files in an object-like storage could improve cost and
performance over the conventional storage that snapshot is currently
using. (Currently using over 130 TB!)

Multiple components need to access the snapshot farm: the importer, the
web app (if not redirected) and several other scripts.

We should write a generic file driver to allow all those components to
access/update/delete files from a config-defined backend.

This driver would need to be usable in at least two languages: Ruby and
Python. I'm not sure what the best course of action is here: some kind
of bindings, or maintaining two separate drivers.

Note that there is also a C program that is part of snapshot (the fsck
program).

I was thinking of writing at least two backends for the driver:

- a standard flat filesystem storage (what we have currently)
- an object-like storage. S3 would be a good candidate since a couple of
  open-source storage solutions provide an S3-compatible API.

That would allow a two-step transition: start using the driver, then
switch the backend.
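
A minimal Python sketch of what such a driver could look like. All
class, function and config key names here are hypothetical, the
two-level directory fan-out is an assumption about the farm layout, and
the in-memory class only stands in for an S3-like store (a real S3
client would replace the dictionary):

```python
import os
import shutil
from abc import ABC, abstractmethod


class FarmBackend(ABC):
    """Operations every backend must offer (hypothetical interface)."""

    @abstractmethod
    def put(self, digest: str, source_path: str) -> None: ...

    @abstractmethod
    def get(self, digest: str) -> bytes: ...

    @abstractmethod
    def delete(self, digest: str) -> None: ...

    @abstractmethod
    def exists(self, digest: str) -> bool: ...


class FilesystemBackend(FarmBackend):
    """Flat-file storage; the aa/bb/<digest> layout is an assumption."""

    def __init__(self, root: str):
        self.root = root

    def _path(self, digest: str) -> str:
        return os.path.join(self.root, digest[0:2], digest[2:4], digest)

    def put(self, digest, source_path):
        target = self._path(digest)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copyfile(source_path, target)

    def get(self, digest):
        with open(self._path(digest), "rb") as handle:
            return handle.read()

    def delete(self, digest):
        os.remove(self._path(digest))

    def exists(self, digest):
        return os.path.exists(self._path(digest))


class InMemoryObjectBackend(FarmBackend):
    """Stand-in for an object store: a flat key -> bytes mapping."""

    def __init__(self, prefix: str = ""):
        self.prefix = prefix
        self.objects = {}

    def put(self, digest, source_path):
        with open(source_path, "rb") as handle:
            self.objects[self.prefix + digest] = handle.read()

    def get(self, digest):
        return self.objects[self.prefix + digest]

    def delete(self, digest):
        del self.objects[self.prefix + digest]

    def exists(self, digest):
        return (self.prefix + digest) in self.objects


def backend_from_config(config: dict) -> FarmBackend:
    """Pick the backend named in the configuration."""
    kind = config["backend"]
    if kind == "filesystem":
        return FilesystemBackend(config["root"])
    if kind == "memory-object":
        return InMemoryObjectBackend(config.get("prefix", ""))
    raise ValueError(f"unknown backend: {kind}")
```

With callers depending only on the shared interface, the two-step
transition amounts to changing the "backend" value in the configuration.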

#1020217#8
Date:
2023-09-21 16:26:27 UTC
From:
To:
Hi Baptiste,

I was wondering if you made some progress on this?

Your plan looks very good. I agree that an S3 backend would make a lot
of sense (usable both with self-hosted solutions like MinIO, or with
managed services).

Let me know if I can help somehow.

Lucas

#1020217#13
Date:
2023-09-21 17:01:16 UTC
From:
To:
The Ruby part (in charge of importing data) already has an abstraction
layer:
https://salsa.debian.org/snapshot-team/snapshot/-/blob/master/snapshot#L59

The Python part (in charge of the web app) doesn't:
https://salsa.debian.org/snapshot-team/snapshot/-/blob/master/web/app/snapshot/controllers/archive.py#L192

Lucas

#1020217#18
Date:
2023-09-22 06:42:10 UTC
From:
To:
Hi Bastian,

I'm playing with the idea of an S3-backed snapshot.d.o implementation
(see #1020217).

Could we use the Debian AWS account to host that service? It would
require one fairly powerful VM, and a large S3 bucket (approximately
150-200 TB).

Best,

Lucas

#1020217#23
Date:
2023-09-22 15:12:21 UTC
From:
To:
Hi Lucas

I would assume that a service like snapshot would be within the scope
for our AWS usage.  Noah?

200 TB should be no problem.

However we need to talk about that "one […] VM", because this sounds
like you intend to use AWS as VM hosting, which it is not.

Please think about this in terms of services; there should be at least
two:
- the ingestor, which can only exist once and writes, and
- the web frontend, which should be able to exist several times and only
  reads.

So you want to plan on running multiple web frontends behind load
balancers and maybe even CloudFront.

Regards,
Bastian

#1020217#28
Date:
2023-09-23 09:26:21 UTC
From:
To:
Hi Lucas,

Unfortunately I have not made any progress regarding this feature, nor
do I plan on working on it anytime soon (due to a lack of time).

You're very welcome to have a go at it.

Best,

#1020217#33
Date:
2023-09-23 15:30:09 UTC
From:
To:
Hi,

I'm copying here, for reference, an IRC log from #debian-admin:

19:38 <@jcristau> lucas: right, so the importer would indeed somehow need to learn about copying stuff to other locations/backends.
                  the python/web bit in practice doesn't matter, i think, because it's not used for /file/... requests, those are
                  serviced by apache directly
(https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/production/modules/roles/templates/snapshot/snapshot.debian.org.conf.erb#L76)
19:42 <@jcristau> lucas: the way the copying of stuff to the replica happens currently is very async
                  (https://salsa.debian.org/snapshot-team/snapshot/-/blob/master/master/import-run#L47-L52) which means we just hope
                  things are eventually consistent, but there's a window where the replica would 404 for new files.  that might be ok,
                  because in practice users probably aren't requesting the very latest mirror run, but
19:42 <@jcristau> it might be nice to somehow keep track of what's synced where and what's new, also to reduce how much time we're
                  spending walking directories.
21:38 < lucas> ah right, apache could just redirect to S3 or act as a proxy, or the web app could point to S3 directly
21:40 < lucas> re replica, I wonder if we shouldn't instead aim for several parallel instances of snapshot (at least 2), each living
               its own life (not in a master/slave setup)
03:46 <@pabs> lucas: then you would get different snapshot timestamps?
07:47 < lucas> pabs: do we care?
07:52 <@pabs> having multiple sets of timestamps served from different servers seems confusing for end-users
08:38 < lucas> ah, my idea was more that they would be exposed as different services, with one being used for snapshot.debian.org
08:40 < lucas> the snapshot services could actually share timestamps afterwards (for example to cover a period of downtime)
15:26 <@jcristau> lucas: my idea would be to add the s3 bucket as a backend in
https://salsa.debian.org/dsa-team/mirror/cdn-fastly/-/blob/master/services/snapshot.yaml and use that for /file/
21:14 < lucas> jcristau: I'm not familiar with how CDNs work. could you tell fastly to fallback to other backends if the first one
               replied with 404?
21:57 <@Mithrandir> we can do that
21:57 <@Mithrandir> it might not be wise
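
For reference, the "fall back to another backend on 404" idea from the
log can be sketched in a few lines of Python. This is not how Fastly is
configured; the backend names and contents below are made up, and a
return value of None plays the role of an HTTP 404:

```python
from typing import Callable, Iterable, Optional, Tuple

# A fetcher returns the file content, or None when the backend lacks it.
Fetcher = Callable[[str], Optional[bytes]]


def fetch_with_fallback(
    digest: str, backends: Iterable[Tuple[str, Fetcher]]
) -> Tuple[str, bytes]:
    """Return (backend_name, content) from the first backend holding the file."""
    for name, fetch in backends:
        content = fetch(digest)
        if content is not None:
            return name, content
    raise FileNotFoundError(digest)


# Hypothetical stores: the replica has not yet received the newest file.
primary = {"aa11": b"old file", "bb22": b"new file"}
replica = {"aa11": b"old file"}

# Ask the replica first, then fall back to the primary.
backends = [("replica", replica.get), ("primary", primary.get)]
```

Here fetch_with_fallback("bb22", backends) is served by the primary
because the replica misses it. One cost visible in the sketch is that
every miss adds a full round trip to the request.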

#1020217#38
Date:
2023-09-24 23:09:31 UTC
From:
To:
It makes sense and I will look into it.  Let's not start anything until
we hear definitive confirmation.  Do we have a sense of how much
outgoing traffic the current snapshot service generates?

Agreed.

I agree that it would be best to design something more cloud-oriented.
However, if there's an existing infrastructure that can be moved as a
"lift & shift" into AWS now, with architectural refactoring happening
later, that's an OK place to start.

noah

#1020217#43
Date:
2023-09-25 11:03:29 UTC
From:
To:
Hi,

From #debian-admin:

<Mithrandir> lucas:
https://munin.debian.org/debian.org/sallinen.debian.org/ip_193_62_202_27.html
and
https://munin.debian.org/debian.org/sallinen.debian.org/ip_2001_630_206_4000_1a1a_0_c13e_ca1b.html
I think, so average of 35Mbit/sec over the last week.
replacing the filesystem-backed storage backend with an S3-backed one.
Then look at other aspects.
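
As a rough conversion of the 35 Mbit/sec average quoted above into
transfer volume (assuming decimal units, i.e. 1 Mbit = 10^6 bits, and
that the weekly average is representative):

```python
AVERAGE_MBIT_PER_S = 35      # weekly average quoted from munin
SECONDS_PER_DAY = 86_400

bytes_per_second = AVERAGE_MBIT_PER_S * 1_000_000 / 8
gb_per_day = bytes_per_second * SECONDS_PER_DAY / 1e9
tb_per_30_days = gb_per_day * 30 / 1000

print(f"{gb_per_day:.0f} GB/day, {tb_per_30_days:.1f} TB per 30 days")
# prints "378 GB/day, 11.3 TB per 30 days"
```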

Lucas

#1020217#48
Date:
2023-10-04 14:12:45 UTC
From:
To:
OK, let's do it.

noah

#1020217#53
Date:
2024-05-05 19:31:38 UTC
From:
To:
Hi,

I made some minor progress on this, and I thought I'd report back (I'll
try to attend the meeting tomorrow, but I'm not sure I'll manage).


# What I did:

- I got the OK to host an S3-backed snapshot mirror using the Debian AWS
  account (see thread in #1020217)
- I got access to the account, and set up a VM with a Debian mirror.
- I could run the file-backed snapshot importer on it
- I modified the snapshot importer code to make it import to S3
  (basically it means creating an S3Backend class that inherits from
  StockageBackend), and tested it by importing the debian-security
  archive.


# What I plan to work on:

- Set up a real development environment. I plan to use Vagrant, which is
  not a perfect solution for many reasons, but anyway the provisioning
  scripts will likely be re-usable with something else.
- Change the web frontend to allow using S3.
- Improve (parallelize) the importer code, specifically the sha1-hashing
  (to process multiple files in parallel, one per core) and the file
  copying/uploading-to-S3 (this is especially important for S3 because,
  to achieve good throughput, you need many transfers in parallel).
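
The parallelization idea can be sketched as follows. This is Python,
not the importer's Ruby code, and every name in it is hypothetical;
`upload` is any callable taking (path, digest), for example a wrapper
around an S3 client. Threads are used for hashing as well, relying on
hashlib releasing the GIL while it hashes large buffers:

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def sha1_of_file(path: str) -> str:
    """Hash a file in 1 MiB chunks to keep memory use flat."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def import_files(paths, upload, hash_workers=None, upload_workers=32):
    """Hash files in parallel, then upload them with many transfers in flight."""
    hash_workers = hash_workers or os.cpu_count() or 1

    # One hashing worker per core by default.
    with ThreadPoolExecutor(max_workers=hash_workers) as pool:
        digests = list(pool.map(sha1_of_file, paths))

    # Uploads are latency-bound, so use far more workers than cores.
    with ThreadPoolExecutor(max_workers=upload_workers) as pool:
        # list() waits for completion and re-raises any upload error.
        list(pool.map(upload, paths, digests))

    return dict(zip(paths, digests))
```

The value 32 for upload_workers is an arbitrary starting point, not a
measured optimum.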


# Open questions

## What to do with this?

Assuming all this works and we can have an S3-backed snapshot
service, there's the question of what to do with it.
We have several options I think:

### A. s3-snapshot as a mirror of snapshot.debian.org

The imports would continue to be done on snapshot.debian.org, but
everything would be mirrored on a regular basis to S3.
That would allow faster access to the data, but would not help with the
performance of imports.

### B. dual-stack snapshot.debian.org

The importer on snapshot.debian.org would import both to local storage,
and to S3. The web app could proxy requests to both.
That would allow more resilience, but does not help with the performance
of imports (on the contrary).

### C. s3-snapshot as a fork of snapshot.debian.org

After an initial import of snapshot.debian.org data, s3-snapshot would
live its own independent life.
The main downside is that both databases will become out of sync
(not the same mirror runs; they might each miss some packages, but not
the same ones).

### D. do both at the same time

Do C, but also make sure that every file that ever gets stored in
snapshot.debian.org gets imported in the bucket used for s3-snapshot, to
be able to expose a full read-only mirror of the snapshot.debian.org DB.

### E. Nice experiment, but let's forget about it

(That should be mentioned as well)



In any case, it probably makes sense to keep at least two different
instances of the snapshot service (and data) on preferably different
implementations, to make sure that we don't lose everything in case of
catastrophic incident.

I plan to aim for C as a first step.


## How to do an initial import of snapshot.debian.org data?

That's more of a technical question. The PostgreSQL DB should not be a
problem, as it's quite small (~ 20 GB). For the data itself, it could
probably be uploaded directly from local storage on the snapshot.d.o
hosts to an S3 bucket. I could upload from an EC2 VM to S3 at about 10
Gbps (limited by the bandwidth of local storage). I don't know about the
performance (storage, network) of snapshot.debian.org, but that probably
means that an import into S3 is doable in a couple of weeks in the worst
case.  In any case, that's a question to keep in mind, but that does not
need to be resolved now.
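
A back-of-the-envelope check of that estimate, using the 150-200 TB
bucket size mentioned earlier in this bug. The 10 Gbit/s rate is the one
measured above; the 1 Gbit/s rate is only an illustrative slower case,
not a measurement of snapshot.debian.org:

```python
def transfer_days(terabytes: float, gbit_per_s: float) -> float:
    """Days needed to move `terabytes` at a sustained `gbit_per_s` (decimal units)."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (gbit_per_s * 1e9)
    return seconds / 86_400


for size_tb in (150, 200):
    for rate in (10, 1):
        days = transfer_days(size_tb, rate)
        print(f"{size_tb} TB at {rate} Gbit/s: {days:.1f} days")
```

For 200 TB this gives about 1.9 days at 10 Gbit/s and about 18.5 days
at 1 Gbit/s, assuming the rate is sustained for the whole transfer.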

Lucas

#1020217#58
Date:
2024-05-06 12:40:16 UTC
From:
To:
Lucas Nussbaum <lucas@debian.org> writes:

Is this s3 bucket public, or will it be?

I have been worried about the state of snapshot and I am mirroring its
data into local Git LFS.  Since snapshot.debian.org doesn't support
rsync and doesn't make the postgres database dumps available (so that I
could identify SHA1 objects and speed up downloads), I am using HTML web
scraping to find out which files exist on snapshot.d.o.

My goal has been to put all the Git LFS objects in a publicly-accessible
S3 bucket too.  While imports were running I didn't work on the bucket
side, and I suspect my download will take months to complete at current
speeds.  I publish Git LFS versions of archive.debian.org,
ftp.debian.org and ftp.ports.debian.org already, though, so perhaps I
could start on the bucket publishing part for them and see about adding
an incremental snapshot.d.o copy while it is still working.

/Simon

#1020217#63
Date:
2024-05-06 13:12:49 UTC
From:
To:
It's my plan to make it public, yes

If you are a DD, you could:

ssh lw08.debian.org psql service=snapshot-guest -c '\\dt'
                List of relations
 Schema |        Name         | Type  |  Owner
--------+---------------------+-------+----------
 public | archive             | table | snapshot
 public | binpkg              | table | snapshot
 public | config              | table | snapshot
 public | directory           | table | snapshot
 public | farm_journal        | table | snapshot
 public | file                | table | snapshot
 public | file_binpkg_mapping | table | snapshot
 public | file_srcpkg_mapping | table | snapshot
 public | indexed_mirrorrun   | table | snapshot
 public | mirrorrun           | table | snapshot
 public | node                | table | snapshot
 public | removal_affects     | table | snapshot
 public | removal_log         | table | snapshot
 public | srcpkg              | table | snapshot
 public | symlink             | table | snapshot
(15 rows)

The 'file' table is the one that lists all known hashes.

Lucas

#1020217#68
Date:
2024-06-09 10:17:05 UTC
From:
To:
Hi there!

I'm hereby cc-ing our DPL, to get him involved in a possible storage
cluster purchase for the project.

I have been mentioning such an object storage driver for years, so that
we could use OpenStack swift for snapshot.d.o. I am happy that it is
finally gaining traction, and that Lucas is implementing this. Thanks
Lucas.

However, it is disappointing to see it moving toward an s3
implementation, which is a protocol from a closed-source service. I
already wrote multiple times that my company (Infomaniak) was willing to
sponsor storage space on Swift for it.

FYI, we currently manage more than 110 PB of storage over 7500+ HDD and
growing, so I am not scared at all about storage space. Some clusters we
manage are around 40PB, with billions of files.

Though I do not envision *any* sponsor providing the storage space, but
rather Debian maintaining its own storage cluster. To give you a rough
idea of what this would represent, let me give you some idea of the
type of hardware involved, and its pricing.

I would currently recommend this type of 2U server:
https://www.aicipc.com/en/productdetail/51224

They hold 24 HDDs, plus 2x SSD for the system. Equipped with a decent
amount of RAM (128 GB) and a CPU, the cost is around 4000 EUR per
server without the HDDs. Currently, 22TB Seagate HDDs are at around 350
EUR per piece. So such a server fully equipped with HDDs would be at
around 12000 EUR. If we want 6 of them (which is IMO the bare minimum
for redundancy, as each file is stored 3 times), we're talking of around
75000 EUR. Add 3 smaller servers to act as auth servers (ie: Keystone),
at let's say 4000 EUR each (which is the average price for a decent
server with 128 GB of RAM, 2x SSD for the system, and a 32-core CPU),
and we would end up spending around 90kEUR for such a storage cluster.
This would provide 1 PB of redundant (ie: copied 3 times) storage space.

This would need 15U of rack space, plus possibly a switch.

Though if we want to be safe, we could purchase at least one spare node
and a few HDDs.

So all together, we're looking at spending around 100kEUR. Note that
this type of swift cluster could also be used for artifact storage for
Salsa (gitlab has a swift backend storage driver).
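
The figures above can be re-derived as follows. All inputs are the
prices and counts quoted in this message; the only added assumption is
that the three auth servers are 1U each, which is what makes the total
come to 15U:

```python
STORAGE_SERVERS = 6
HDD_PER_SERVER = 24
HDD_TB = 22
HDD_EUR = 350
CHASSIS_EUR = 4000          # server without HDDs
AUTH_SERVERS = 3
AUTH_SERVER_EUR = 4000
REPLICAS = 3                # each file is stored 3 times

server_eur = CHASSIS_EUR + HDD_PER_SERVER * HDD_EUR               # 12400
storage_eur = STORAGE_SERVERS * server_eur                        # 74400
cluster_eur = storage_eur + AUTH_SERVERS * AUTH_SERVER_EUR        # 86400
with_spare_node_eur = cluster_eur + server_eur                    # 98800

raw_tb = STORAGE_SERVERS * HDD_PER_SERVER * HDD_TB                # 3168
usable_tb = raw_tb / REPLICAS                                     # 1056.0

rack_units = STORAGE_SERVERS * 2 + AUTH_SERVERS * 1               # 15
```

So the rounded numbers in the text (12000, 75000, 90k, 100k EUR, 1 PB)
are consistent with the unit prices given.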

Also note that we're currently (at Infomaniak) using these AIC chassis
with amd64, but we're looking at replacing the boards with some Gigabyte
motherboard using Ampere CPU (ie: ARM64 based, with 80 cores).

If we need to save on costs at first, we could reduce the number of
HDDs (let's say by half), and add more HDDs later on. But you get my
point: it's not *that* expensive, and for sure, something we could
afford (we do have the budget).

I am hereby volunteering to set up such an OpenStack swift cluster for
snapshot.d.o, or other Debian use. It'd be easy to find other people
interested in helping me maintain this (I know some people who already
volunteered to help me when I'm away, on holiday or otherwise).

Your thoughts? Would the DPL agree to such spending? Do we have
somewhere to host this? At UBC? What would be the DSA opinion about
this? Would they get involved? (IMO, we can do without DSA if they don't
want to get involved, but I'd prefer if they would...)

Cheers,

Thomas Goirand (zigo)

P.S: Please CC me.

#1020217#73
Date:
2024-06-09 18:10:34 UTC
From:
To:
Hi Thomas,

Am Sun, Jun 09, 2024 at 12:17:05PM +0200 schrieb Thomas Goirand:

As far as I understand, the problem you see is that a valuable service
of Debian might become dependent on a closed-source service, right?  I
would like to hear the opinion of the people who are actually working on
this.

Kind regards
    Andreas.

#1020217#78
Date:
2024-06-10 11:31:02 UTC
From:
To:
Hi Thomas,

It looks like you see the work on object-storage backend as a
procurement/infrastructure question. I don't think that this is the main
issue. Based on what I've done so far (and I still plan to continue
working on this, but I have limited time for Debian nowadays), the code
also needs deep changes because, if you want an object-storage-based
backend to perform adequately, you need more parallelism for
backend-related operations.

This is true whether the storage service is AWS S3, OpenStack Swift,
Azure Blob Storage, or Ceph Object Storage, or whatever. If you increase
the latency between the importer/indexer and the storage service,
you need parallelism to hide it and stay with a bandwidth-bound problem.
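
A toy model of this effect, under simplifying assumptions: every object
costs one fixed round-trip latency, transfers share the link bandwidth
evenly, and nothing else is a bottleneck. All numbers used with it
below are hypothetical, not measurements:

```python
def import_seconds(n_files, avg_bytes, latency_s, bandwidth_bytes_s, concurrency):
    """Estimated wall time: latency is divided by concurrency, bandwidth is not."""
    latency_term = n_files * latency_s / concurrency
    bandwidth_term = n_files * avg_bytes / bandwidth_bytes_s
    return latency_term + bandwidth_term


# 100,000 files of 1 MB, 50 ms latency, 125 MB/s link (1 Gbit/s).
sequential = import_seconds(100_000, 1e6, 0.05, 125e6, concurrency=1)
parallel = import_seconds(100_000, 1e6, 0.05, 125e6, concurrency=64)
```

In this example the bandwidth term is 800 s in both cases, while the
latency term drops from 5000 s to about 78 s with 64 parallel transfers,
which is what "staying bandwidth-bound" means here.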

To work on this, you need an object storage backend, but I suspect that
once it works with one of them, porting it to another one will be
trivial, as the S3-specific bits are really minimal. (and Swift is
S3-compatible anyway)

Help is welcome -- my code is at
https://salsa.debian.org/lucas/snapshot/-/commits/s3snap/?ref_type=heads

Typically a good way to test this is to try to import a small archive
(e.g. debian-security with one architecture only) and see if you can get
an import time on object storage that is similar to the one on
file-based storage.

Lucas

#1020217#83
Date:
2024-06-10 16:44:49 UTC
From:
To:
Sponsored cloud services are a very dangerous drug to be addicted to. Someone else's computer, as the FSFE puts it!


As for the work, again, that'd be on me for the setup and maintenance (though not physical setup, but we can pay a service provider for that if we have no other option...).


Thomas



#1020217#88
Date:
2024-08-14 12:39:48 UTC
From:
To:
FYI, I stopped working on that, since the FS-backed service is back in a
good state. My work is pushed to the above git repo and I cleaned up the
infrastructure bits I set up on AWS.

Lucas