some container layers fail to download #800

Open
opened 2 weeks ago by dachary · 26 comments

Update 18/11/2022: check out [the workaround used for forgejo](https://codeberg.org/forgejo/forgejo/issues/2#issuecomment-690018).


[This package](https://codeberg.org/codename/-/packages/container/codename/2.1.2) fails to download like so:

$ docker pull codeberg.org/codename/codename:2.1.2
2.1.2: Pulling from codename/codename
ca7dd9ec2225: Retrying in 1 second 
9b23cba192ec: Download complete 
976964cb3cc4: Download complete 
aca243fe15cb: Download complete 
2c8382c3dc4d: Download complete 
b9ff04336109: Download complete 
507c6f5b4499: Download complete 
4f4fb700ef54: Retrying in 2 seconds 

Any idea how to debug this?

Poster

Asked in the [Gitea general chatroom](https://matrix.to/#/!bXIgKPtRdSXHBnVBaz:osgeo.org/$Lk1Qwp76W3D-GNYqVOYPxJD3cvyf0aLLWU96OYuPdrE?via=libera.chat&via=t2bot.io&via=matrix.org) for advice:

> @KN4CK3R: I'm having [an issue at Codeberg downloading some container layers](https://codeberg.org/Codeberg/Community/issues/800). Do you have a tip that would help figure out what happens?

Collaborator

I have experienced this issue as well when building the weblate container in Woodpecker CI. It works correctly when built locally and pushed again.

Cc @6543


According to `journalctl -u docker --since 2022-11-16` ([after activating debug](https://github.com/docker/cli/issues/336#issuecomment-317014956)), there is a 500 error:

Nov 16 08:28:02 oliskull dockerd[4479]: time="2022-11-16T08:28:02.587116167+01:00" level=debug msg="Calling POST /v1.41/images/create?fromImage=codeberg.org%2Fcodename%2Fcodename&tag=2.1.2"
Nov 16 08:28:02 oliskull dockerd[4479]: time="2022-11-16T08:28:02.594154921+01:00" level=debug msg="hostDir: /etc/docker/certs.d/codeberg.org"
Nov 16 08:28:02 oliskull dockerd[4479]: time="2022-11-16T08:28:02.594186647+01:00" level=debug msg="Trying to pull codeberg.org/codename/codename from https://codeberg.org v2"
Nov 16 08:28:02 oliskull dockerd[4492]: time="2022-11-16T08:28:02.785803667+01:00" level=debug msg="(*service).Write started" expected="sha256:09dd9700022bd1beb0f7284006712113e50ee5c10ac0b0e0291a762a27e76b9c" ref="manifest-sha256:09dd9700022bd1beb0f7284006712113e50ee5c10ac0b0e0291a762a27e76b9c" total=1992
Nov 16 08:28:02 oliskull dockerd[4479]: time="2022-11-16T08:28:02.813745865+01:00" level=debug msg="Pulling ref from V2 registry: codeberg.org/codename/codename:2.1.2"
Nov 16 08:28:02 oliskull dockerd[4479]: time="2022-11-16T08:28:02.814164764+01:00" level=debug msg="pulling blob \"sha256:ca7dd9ec2225f2385955c43b2379305acd51543c28cf1d4e94522b3d94cce3ce\""
Nov 16 08:28:02 oliskull dockerd[4479]: time="2022-11-16T08:28:02.814170446+01:00" level=debug msg="pulling blob \"sha256:9b23cba192ecb9bbbfcdb85655f7a360cd9f911dfa8cedfb75126344b39a3ace\""
Nov 16 08:28:02 oliskull dockerd[4479]: time="2022-11-16T08:28:02.814175367+01:00" level=debug msg="pulling blob \"sha256:976964cb3cc4d4acd39eff0efe66af10fcfc29b3cbc607f59040db9f1368507a\""
Nov 16 08:28:02 oliskull dockerd[4479]: time="2022-11-16T08:28:02.895044190+01:00" level=debug msg="Downloaded 976964cb3cc4 to tempfile /var/lib/docker/tmp/GetImageBlob3969790121"
Nov 16 08:28:02 oliskull dockerd[4479]: time="2022-11-16T08:28:02.895160308+01:00" level=debug msg="pulling blob \"sha256:aca243fe15cb005835ad276cd207cf216d0964aec6ea161cbc73a58f73c88b5e\""
Nov 16 08:28:02 oliskull dockerd[4479]: time="2022-11-16T08:28:02.929114792+01:00" level=info msg="Download failed, retrying (1/5): received unexpected HTTP status: 500 Internal Server Error"
Nov 16 08:28:02 oliskull dockerd[4479]: time="2022-11-16T08:28:02.974716399+01:00" level=debug msg="Downloaded aca243fe15cb to tempfile /var/lib/docker/tmp/GetImageBlob1130057441"
Nov 16 08:28:02 oliskull dockerd[4479]: time="2022-11-16T08:28:02.974820785+01:00" level=debug msg="pulling blob \"sha256:2c8382c3dc4de5dae3a6239e11f6d50e9fae63001f5dd1a68a1144618fc2dea2\""
Nov 16 08:28:03 oliskull dockerd[4479]: time="2022-11-16T08:28:03.826190269+01:00" level=debug msg="Downloaded 9b23cba192ec to tempfile /var/lib/docker/tmp/GetImageBlob943543718"
Nov 16 08:28:03 oliskull dockerd[4479]: time="2022-11-16T08:28:03.826260453+01:00" level=debug msg="pulling blob \"sha256:b9ff04336109416c29a33d2e8fb706394e1191755c678fac6408341924b72793\""
Nov 16 08:28:04 oliskull dockerd[4479]: time="2022-11-16T08:28:04.377105169+01:00" level=debug msg="Downloaded b9ff04336109 to tempfile /var/lib/docker/tmp/GetImageBlob3021174715"
Nov 16 08:28:04 oliskull dockerd[4479]: time="2022-11-16T08:28:04.377161281+01:00" level=debug msg="pulling blob \"sha256:507c6f5b4499de1e91632653addf146bca8dd2f9e6b638ead201e83fc97bf366\""
Nov 16 08:28:04 oliskull dockerd[4479]: time="2022-11-16T08:28:04.471737172+01:00" level=debug msg="Downloaded 507c6f5b4499 to tempfile /var/lib/docker/tmp/GetImageBlob296954574"
Nov 16 08:28:04 oliskull dockerd[4479]: time="2022-11-16T08:28:04.471805211+01:00" level=debug msg="pulling blob \"sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1\""
Nov 16 08:28:04 oliskull dockerd[4479]: time="2022-11-16T08:28:04.650626186+01:00" level=info msg="Download failed, retrying (1/5): received unexpected HTTP status: 500 Internal Server Error"
Nov 16 08:28:05 oliskull dockerd[4479]: time="2022-11-16T08:28:05.592367548+01:00" level=debug msg="Downloaded 2c8382c3dc4d to tempfile /var/lib/docker/tmp/GetImageBlob2548748740"
Nov 16 08:28:07 oliskull dockerd[4479]: time="2022-11-16T08:28:07.929789496+01:00" level=debug msg="pulling blob \"sha256:ca7dd9ec2225f2385955c43b2379305acd51543c28cf1d4e94522b3d94cce3ce\""
Nov 16 08:28:08 oliskull dockerd[4479]: time="2022-11-16T08:28:08.031011321+01:00" level=info msg="Download failed, retrying (2/5): received unexpected HTTP status: 500 Internal Server Error"
Nov 16 08:28:09 oliskull dockerd[4479]: time="2022-11-16T08:28:09.651064493+01:00" level=debug msg="pulling blob \"sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1\""
Nov 16 08:28:09 oliskull dockerd[4479]: time="2022-11-16T08:28:09.753537949+01:00" level=info msg="Download failed, retrying (2/5): received unexpected HTTP status: 500 Internal Server Error"
Nov 16 08:28:18 oliskull dockerd[4479]: time="2022-11-16T08:28:18.032175957+01:00" level=debug msg="pulling blob \"sha256:ca7dd9ec2225f2385955c43b2379305acd51543c28cf1d4e94522b3d94cce3ce\""
Nov 16 08:28:18 oliskull dockerd[4479]: time="2022-11-16T08:28:18.136759490+01:00" level=info msg="Download failed, retrying (3/5): received unexpected HTTP status: 500 Internal Server Error"
Nov 16 08:28:19 oliskull dockerd[4479]: time="2022-11-16T08:28:19.753637890+01:00" level=debug msg="pulling blob \"sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1\""
Nov 16 08:28:19 oliskull dockerd[4479]: time="2022-11-16T08:28:19.857011581+01:00" level=info msg="Download failed, retrying (3/5): received unexpected HTTP status: 500 Internal Server Error"
Nov 16 08:28:28 oliskull dockerd[4479]: time="2022-11-16T08:28:28.940802587+01:00" level=error msg="Not continuing with pull after error: context canceled"
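For reference, a minimal sketch of how the dockerd debug logging quoted above can be enabled on a systemd host (assuming `/etc/docker/daemon.json` does not exist yet; otherwise merge the key into the existing file):

```
# enable daemon debug logging, then restart dockerd
echo '{ "debug": true }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker

# follow the daemon log while reproducing the failing pull
journalctl -u docker -f
```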

Debugging with https://gist.github.com/nithu0115/5a10dbb647c7457f1f75d3ad6681c982, the offending request is the following:

https://codeberg.org/v2/codename/codename/blobs/sha256:ca7dd9ec2225f2385955c43b2379305acd51543c28cf1d4e94522b3d94cce3ce (needs an `authorization` header)

> GET /v2/codename/codename/blobs/sha256:ca7dd9ec2225f2385955c43b2379305acd51543c28cf1d4e94522b3d94cce3ce HTTP/2
> Host: codeberg.org
> user-agent: curl/7.86.0
> accept: */*
> authorization: Bearer ...
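To poke at this without the gist, roughly the same request can be replayed with plain curl. This is only a sketch: the anonymous token endpoint and the `scope` parameter are assumptions based on the standard registry token flow, not something verified against Codeberg.

```
# the registry advertises its token endpoint in the Www-Authenticate header
curl -sI https://codeberg.org/v2/ | grep -i www-authenticate

# fetch an anonymous pull token (endpoint and scope assumed from the standard token flow)
TOKEN=$(curl -s "https://codeberg.org/v2/token?scope=repository:codename/codename:pull" | jq -r .token)

# replay the failing blob request; -v shows the status line and response headers of the 500
curl -v -o /dev/null -H "Authorization: Bearer $TOKEN" \
  "https://codeberg.org/v2/codename/codename/blobs/sha256:ca7dd9ec2225f2385955c43b2379305acd51543c28cf1d4e94522b3d94cce3ce"
```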
Owner

In the past, we had the issue that common containers are cached, but get broken. This was a Gitea bug IIRC, and I'm not sure if it was fixed.

The thread is a little messy, because it was only reported in another thread: start reading at [#745](https://codeberg.org/Codeberg/Community/issues/745#issuecomment-648589).

Owner

... I cannot explain this problem, but my suspicion is still that shared container layers also only share one db record. If a user deletes one, it's no longer accessible, because reference counting is missing. If one package has restricted access rights, no other user can reuse the base image, because the db record for that layer restricts its access.

I hope I'm wrong, because this would mean that the container registry on Gitea is pretty broken. On the other hand, I know that there were many many bugs reported with it, so it wouldn't really surprise me.

Can someone with more knowledge of the codebase or with more time confirm, e.g. by uploading two containers with a single base, then removing the first? Or playing with access rights?
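A rough sketch of that experiment, in case it helps whoever tries it (the image names and the `<user>` namespace are placeholders, and this only exercises the delete path, not the access-rights one):

```
# two trivial images on top of the same base, so the base layers are shared
printf 'FROM alpine:3.16\nRUN echo one > /one\n' | docker build -t codeberg.org/<user>/layertest-a:latest -
printf 'FROM alpine:3.16\nRUN echo two > /two\n' | docker build -t codeberg.org/<user>/layertest-b:latest -
docker push codeberg.org/<user>/layertest-a:latest
docker push codeberg.org/<user>/layertest-b:latest

# delete the layertest-a package in the Codeberg web UI, then, from a host that does not
# have the layers in its local cache, check whether the shared base layer still downloads
docker pull codeberg.org/<user>/layertest-b:latest
```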

Collaborator

That theory sounds a bit unlikely to me. When I build from the same Dockerfile locally and push it to Gitea, the image works. When I let it build via Woodpecker, it's broken. (And it's broken instantly.)

Poster

It would be useful to have the output in both cases for debugging, if at all possible. Maybe **all** layer SHAs are different from any existing layers when you push from your local build, while some layer SHAs may be identical to existing ones when it happens in Woodpecker?

Just a thought...

Collaborator

Maybe, but that sounds highly unlikely, doesn't it? Both use the same Debian Python base image at least.

I can try getting the layer SHAs for both cases.
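One possible way to do that, as a sketch (`<org>/<image>:<tag>` are placeholders; `docker manifest inspect` needs a reasonably recent client, `jq` is assumed, and a multi-arch image has to be resolved to a per-platform manifest first):

```
# compressed layer digests as the registry stores them (single-architecture manifest)
docker manifest inspect codeberg.org/<org>/<image>:<tag> | jq -r '.layers[].digest'

# uncompressed diff IDs of a local image (note: these differ from the registry blob digests)
docker image inspect <image>:<tag> --format '{{range .RootFS.Layers}}{{println .}}{{end}}'
```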

Poster

I confess that I'm indeed speculating 😄

The Gitea server logs for the 500 error would provide much more valuable information.


I should have refreshed this page to save some time.
Yes, https://codeberg.org/v2/codename/codename/blobs/sha256:ca7dd9ec2225f2385955c43b2379305acd51543c28cf1d4e94522b3d94cce3ce leads to a 500 error. The Gitea log may show the reason.

> ... I cannot explain this problem, but my suspicion is still that shared container layers also only share one db record. If a user deletes one, it's no longer accessible, because reference counting is missing. If one package has restricted access rights, no other user can reuse the base image, because the db record for that layer restricts its access.

The hierarchy is
`Package 1 : n PackageVersion 1 : n PackageFile 1 : 1 PackageBlob`
Multiple `PackageFile`s can share the same `PackageBlob`. `PackageBlob`s are only deleted if no `PackageFile` references them.

Collaborator

When I had this problem initially, this was the log of the 500 error:

2022/10/16 20:18:50 ...uters/common/repo.go:38:func1() [E] [634c4b0a-29] ServeBlob: Close: io: read/write on closed pipe
2022/10/16 20:19:06 ...ntainer/container.go:84:apiError() [E] [634c4b1a-53] open /mnt/ceph-cluster/git/data/packages/4f/4f/4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1: no such file or directory
2022/10/16 20:19:06 ...ntainer/container.go:84:apiError() [E] [634c4b1a-54] open /mnt/ceph-cluster/git/data/packages/4f/4f/4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1: no such file or directory
2022/10/16 20:19:08 ...ntainer/container.go:416:GetBlob() [E] [634c4b1a-37] Error whilst copying content to response: write tcp 127.0.0.1:3000->127.0.0.1:58740: write: broken pipe
2022/10/16 20:19:13 ...uters/common/repo.go:38:func1() [E] [634c4b21-32] ServeBlob: Close: io: read/write on closed pipe
2022/10/16 20:19:13 ...uters/common/repo.go:38:func1() [E] [634c4b21-51] ServeBlob: Close: io: read/write on closed pipe
2022/10/16 20:19:24 ...ntainer/container.go:84:apiError() [E] [634c4b2c-42] open /mnt/ceph-cluster/git/data/packages/4f/4f/4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1: no such file or directory
2022/10/16 20:19:24 ...ntainer/container.go:84:apiError() [E] [634c4b2c-48] open /mnt/ceph-cluster/git/data/packages/4f/4f/4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1: no such file or directory

I'm not sure how helpful that is. It sounds like it's already in a broken state and this is just the result of that.
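If someone with shell access to the server wants to tell a genuinely missing file apart from a stale database record, a quick check along these lines might help (path copied from the log above; the package store is content-addressed, so the file's checksum should match its name):

```
# does the blob the 500 complains about exist in the package store at all?
ls -l /mnt/ceph-cluster/git/data/packages/4f/4f/4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1

# if it exists, its sha256 should equal the file name (the blob digest)
sha256sum /mnt/ceph-cluster/git/data/packages/4f/4f/4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1
```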


There is a problem with parallel uploads of the same layer for which I have not found a solution yet: https://github.com/go-gitea/gitea/issues/19586

The problem should be gone if you limit the uploads with [`max-concurrent-uploads`](https://docs.docker.com/engine/reference/commandline/dockerd/).

Poster

Filed [an issue against plugin-docker-buildx](https://codeberg.org/woodpecker-plugins/plugin-docker-buildx/issues/42) to mitigate the issue as suggested.

Poster

@KN4CK3R as @Bubu noticed, setting `max-concurrent-uploads` does not appear to always solve the problem, although it sometimes does. Do you have a suggested reliable workaround while a proper fix/diagnostic is being worked on?

Poster

@KN4CK3R for input and hopefully ideas on the root cause, I did the following:

  • set /etc/docker/daemon.json on my laptop with { "max-concurrent-uploads": 1 }
  • systemctl restart docker
  • docker push codeberg.org/forgejo/forgejo:1.18.0-rc0

I could see the layers being uploaded one after the other and not in parallel.

$ docker push codeberg.org/forgejo/forgejo:1.18.0-rc0
The push refers to repository [codeberg.org/forgejo/forgejo]
6356b893a0b3: Pushed 
b2e1ea320493: Pushed 
2fa961a0ac4b: Pushed 
48613f6e153e: Pushed 
f95d5510106c: Pushed 
578f9fb06180: Pushed 
05e092db705f: Pushed 
e5e13b0c77cb: Pushed 
1.18.0-rc0: digest: sha256:023d19dfff53a8a608bf1076ed6ff04e55ba91476cc185937dc520b7ec27eadb size: 1999

When trying to pull I got the following:

$ docker image prune --all
$ docker images --all
REPOSITORY   TAG       IMAGE ID   CREATED   SIZE
$ docker pull codeberg.org/forgejo/forgejo:1.18.0-rc0
1.18.0-rc0: Pulling from forgejo/forgejo
ca7dd9ec2225: Retrying in 2 seconds 
1ab0a22b7948: Download complete 
2610385728d3: Download complete 
5a60798e3989: Download complete 
a47f8476c66e: Download complete 
2c23285999fc: Download complete 
6a56d7cda38d: Download complete 
b869862c3ed8: Download complete 
Poster

The layer that fails is `ca7dd9ec2225`, which comes from `docker.io/library/golang`:

docker build --disable-content-trust=false -t gitea/gitea:latest .
Sending build context to Docker daemon  313.2MB
Step 1/27 : FROM golang@sha256:d171aa333fb386089206252503bc6ab545072670e0286e3d1bbc644362825c6e AS build-env
docker.io/library/golang@sha256:d171aa333fb386089206252503bc6ab545072670e0286e3d1bbc644362825c6e: Pulling from library/golang
ca7dd9ec2225: Pull complete 
c41ae7ad2b39: Pull complete 
15b491bbc2fe: Pull complete 
8cce5c603bfa: Pull complete 
Digest: sha256:d171aa333fb386089206252503bc6ab545072670e0286e3d1bbc644362825c6e
Status: Downloaded newer image for golang@sha256:d171aa333fb386089206252503bc6ab545072670e0286e3d1bbc644362825c6e
 ---> d2745094e3b1
Step 2/27 : ARG GOPROXY
...

This suggests that all container images on codeberg.org that depend on `docker.io/library/golang` will fail in the same way.

Poster
See also the [associated conversation in Gitea general](https://matrix.to/#/!bXIgKPtRdSXHBnVBaz:osgeo.org/$A96sg8kPuMrY86RkY5igWnCcM2Yh2l084BWWNDwRcuk?via=libera.chat&via=t2bot.io&via=matrix.org).
Poster

More information in [#803](https://codeberg.org/Codeberg/Community/issues/803), which is a minimal reproducer.


I will add a PR later which prevents the problem and allows fixing the existing failures.

Poster

@KN4CK3R great. I must admit that I've learned a few things about docker images while chasing this. I'll try to understand where the `ca7dd9ec2225` layer comes from to complete my education.

Update: it is derived from the checksum of the tarball in the base image.
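To see that mapping concretely, the layer digests of the pinned base image can be listed; one of them should start with `ca7dd9ec2225`, since the short ID docker prints is just the first 12 hex digits of the compressed layer's sha256. A sketch (assumes `jq`; if the digest resolves to a multi-arch manifest list, the per-platform manifest it references has to be inspected instead):

```
# compressed layer digests of the pinned golang base image
docker manifest inspect \
  docker.io/library/golang@sha256:d171aa333fb386089206252503bc6ab545072670e0286e3d1bbc644362825c6e \
  | jq -r '.layers[].digest'
```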

Poster

Workaround used for forgejo at [forgejo/forgejo#2](https://codeberg.org/forgejo/forgejo/issues/2#issuecomment-690018), for inspiration.

https://github.com/go-gitea/gitea/pull/21862
Poster

I would comment in the pull request if I had a GitHub account but I do not (yes, I sound like a broken record, my apologies for that). Here it is:

The workaround is going to be relevant for as long as there is a possibility that the inconsistency exists in the instance from which the upgrade is done. If someone upgrades from 1.17.3 to 1.40.4 (you very well know that happens: people procrastinate 😄 ), they will need the workaround.

Poster

@6543 @fnetX to avoid the creation of new inconsistent images while @KN4CK3R's patch makes its way into a backport, it would be great if the docker daemon running behind ci.codeberg.org was configured with `max-concurrent-uploads` set to 1. It would immediately enable Woodpecker pipelines using [plugin-docker-buildx](https://codeberg.org/woodpecker-plugins/plugin-docker-buildx) to reliably publish new images.

`echo '{"max-concurrent-uploads": 1}' > /etc/docker/daemon.json && systemctl restart docker`

See also [the chat log](https://matrix.to/#/!LLCLqwCmFazlBXIXjT:obermui.de/$FPlhNksTHBNVqbIqrXW8R7lPc9gvkpA9YP6O2dpge-0?via=freeyourgadget.org&via=matrix.org&via=tchncs.de).

Poster
@6543 [applied the setting](https://matrix.to/#/!LLCLqwCmFazlBXIXjT:obermui.de/$zbl3aUIRGeeknzla5wIt0YUelSRg9FJPwK-PX6l2sbo?via=freeyourgadget.org&via=matrix.org&via=tchncs.de) 🚀