#48 Enable Libravatar support

Open
opened 3 months ago by strk · 16 comments
strk commented 3 months ago

So the system will honour my self-hosted avatar rather than giving me a digital drawing as in https://codeberg.org/strk

ashimokawa commented 3 months ago
Owner

@strk

Do you know if it is possible to enable libravatar support without enabling gravatar.com support?

What I tried is to set:

DISABLE_GRAVATAR        = true
ENABLE_FEDERATED_AVATAR = true

As soon as Gravatar is enabled, I see XHR requests to gravatar.com (as tested on codeberg-test.org).

hw commented 3 months ago
Owner

> […] rather than giving me a digital drawing as in https://codeberg.org/strk

Have you tried uploading your picture on your personal profile settings page?

strk commented 3 months ago
Poster

This is what you want:

[picture]
DISABLE_GRAVATAR        = false
GRAVATAR_SOURCE         = https://seccdn.libravatar.org/avatar/
ENABLE_FEDERATED_AVATAR = true

Setting `DISABLE_GRAVATAR = true`, as you did, disables fetching avatars from any external host at all, not just from gravatar.com. I know the name is misleading, but it has been kept for historical reasons.

ashimokawa commented 3 months ago
Owner

@strk That seems to work. I did it on codeberg-test.org

I am not really sure if this is what *I* want. Since this is federated, I think it might be better if we host our own Libravatar server/proxy (whatever that is called).

I like the fact that when browsing Codeberg, uMatrix shows only codeberg.org as a source and no third-party service ;)

What do others think?

@hw ?

strk commented 3 months ago
Poster

Once you set up your own Libravatar server you can change `GRAVATAR_SOURCE`, but the important thing here is `ENABLE_FEDERATED_AVATAR`: with it enabled, `GRAVATAR_SOURCE` is not even contacted if (and only if) the domain of the user's email address is configured for Libravatar support.

Check how my avatar is served directly by my mail domain's service.
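
To make the lookup order concrete, here is a minimal Python sketch of how a federated client could pick the avatar URL. It is not Gitea's actual code. It assumes the Libravatar convention of an `_avatars-sec._tcp` SRV record on the email domain and an MD5 hash of the lower-cased address; `avatar_url`, `SrvResolver`, and the `fake_dns` records are hypothetical names used only for illustration, and the resolver is injected so that no real DNS query is made.

```python
import hashlib
from typing import Callable, Optional, Tuple

# Hypothetical resolver type: given an SRV name, return (host, port) or None.
SrvResolver = Callable[[str], Optional[Tuple[str, int]]]

# The GRAVATAR_SOURCE value suggested earlier in this thread.
DEFAULT_SOURCE = "https://seccdn.libravatar.org/avatar/"


def email_hash(email: str) -> str:
    """MD5 of the trimmed, lower-cased address, as used in avatar URLs."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def avatar_url(email: str, resolve_srv: SrvResolver,
               fallback: str = DEFAULT_SOURCE) -> str:
    """Return the URL a federated client would put into the <img> tag."""
    domain = email.strip().lower().rpartition("@")[2]
    target = resolve_srv(f"_avatars-sec._tcp.{domain}")
    if target is None:
        # The domain has no avatar server configured: use the fallback source.
        return fallback + email_hash(email)
    host, port = target
    netloc = host if port == 443 else f"{host}:{port}"
    return f"https://{netloc}/avatar/{email_hash(email)}"


# Stand-in for a real DNS query (illustrative records only).
fake_dns = {"_avatars-sec._tcp.example.org": ("avatars.example.org", 443)}

print(avatar_url("alice@example.org", fake_dns.get))  # served by the user's own domain
print(avatar_url("bob@example.net", fake_dns.get))    # served by the fallback source
```

Note that the domain is only available because the client knows the full email address; a server that receives just the hash cannot do this lookup.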

strk commented 3 months ago
Poster

See how my avatar is served here (check with your browser's debug tools): https://gitea.com/strk/

ashimokawa commented 3 months ago
Owner

@strk On https://gitea.com/strk/ I saw in my browser tools that your avatar was served from avatars.kbt.io, after I allowed XHR requests to that domain.

I am really not sure how the federation is supposed to work, but it seems to me to be something that the avatar servers do, not Gitea itself. So running our own server seems best to me.

But then again, maybe I missed something…

Also I saw blocked calls to www.googletagmanager.com all over gitea.com - well .com meets .com - good match.

ashimokawa commented 3 months ago
Owner

@strk Sorry, I missed that avatars.kbt.io is your domain.

Well, I know this is debatable, but making all Codeberg users' browsers do XHR requests to different servers (e.g. avatars.kbt.io) allows the server owners to collect the IPs of Codeberg users. I don't think that this is good.

Is there something like a libravatar proxy we can host which does the federation - so codeberg users only communicate with codeberg, and their IP remains hidden to avatar servers?

hw commented 3 months ago
Owner

@strk: in my personal opinion, XHR requests for every avatar displayed in the repo history or even on the discover page don't seem right, as this implies significant abuse/tracking potential, which is clearly against our mission.

If avatar images were cached on disk (loaded once into the profile picture and then refreshed or updated either on request or in intervals that don't allow tracking), then I think enabling support is a no-brainer. Would you like to volunteer for such a patch that stores images to file to inhibit continuous tracking traffic to third-party servers?
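
To illustrate the idea, here is a minimal Python sketch of such an on-disk cache with a refresh interval. It is not an existing Gitea feature: `AvatarDiskCache`, `fake_fetch`, and the example URL are hypothetical, and the upstream fetch is injected as a plain function so the sketch runs without network access.

```python
import hashlib
import os
import tempfile
import time
from typing import Callable


class AvatarDiskCache:
    """Stores fetched avatars as files and refetches them only when stale."""

    def __init__(self, directory: str, fetch: Callable[[str], bytes], max_age: float):
        self.directory = directory
        self.fetch = fetch        # server-side fetch, e.g. an HTTP GET
        self.max_age = max_age    # refresh interval in seconds

    def _path(self, url: str) -> str:
        # File name derived from the URL so arbitrary URLs are safe on disk.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name)

    def get(self, url: str) -> bytes:
        path = self._path(url)
        try:
            fresh = time.time() - os.path.getmtime(path) < self.max_age
        except OSError:
            fresh = False  # not cached yet
        if not fresh:
            data = self.fetch(url)  # the only contact with the external server
            with open(path, "wb") as fh:
                fh.write(data)
            return data
        with open(path, "rb") as fh:
            return fh.read()


calls = []


def fake_fetch(url: str) -> bytes:
    """Stand-in for the real upstream request; records each call."""
    calls.append(url)
    return b"image-bytes"


cache = AvatarDiskCache(tempfile.mkdtemp(), fake_fetch, max_age=86400)
cache.get("https://avatars.example.org/avatar/abc")
cache.get("https://avatars.example.org/avatar/abc")
print(len(calls))  # number of upstream requests for two page views
```

With a refresh interval of a day, the external server sees one request per day from the caching server, however many visitors view the avatar.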

strk commented 3 months ago
Poster

@ashimokawa yes, you are right that avatar server owners will be able to track who views/fetches avatars from that server. In my case that's _me_ tracking who views/fetches _my_ avatar (the only avatar on my server). And this ability is given to anyone in control of their email domain.

Those who cannot control their email domain can still upload an image to the Gitea server OR to the avatar server of the Gitea server owner's choice (I suggested the libravatar.org servers, but you can of course install your own, since libre Libravatar server implementations are available).

I'd also like users to be able to specify an `avatar URL` directly in their profile settings, so they can decide where their avatar is fetched from, but this is currently not available in Gitea (I might have submitted an issue for that).

@hw: to recap, the tracking potential is given to the user as a choice. Note that recent Libravatar server implementations cache on disk the avatars that are looked up from the fallback (as opposed to those looked up from a domain configured by the email owner). The browser itself will usually cache those images too. Again, when the user controls her mail domain, we're talking about a 1st-party server (the user's own server) rather than a 3rd-party server. Just install a Libravatar server and you'll become the 2nd party (for the user), serving avatars for whoever doesn't use their own domain.

strk commented 3 months ago
Poster

In reply to:

> Is there something like a libravatar proxy we can host which does the federation - so codeberg users only communicate with codeberg, and their IP remains hidden to avatar servers?

You can install a caching libravatar server with a fallback to libravatar.org instance. The libravatar client (gitea) would then:

1. See if the user's domain asks to fetch the avatar from a specific server, and use that one in an `<img>` tag (no `XHR`) sent to the browser.
2. Failing the above, use a "local" server (on the same domain as Gitea) in the `<img>` tag sent to the browser.
3. The local server would see if an avatar for that email hash exists locally; if not, it will fetch one from the fallback (in your case it could be libravatar.org) and cache it locally, goto 2.
4. Only if libravatar.org receives the fetch request from point 3 will it see if an avatar for that email hash exists locally (either as first-class citizen or cache) and serve it. Failing to find it locally, it will fetch it from its own fallback (usually gravatar.com) and cache it locally, returning it to the requestor.

So in the worst case tracking would be as follows:

1. The name server of the user's email domain would receive a request from the Gitea server (at most once a day) to check if an avatar server is configured.
2. [failing the above] avatar.codeberg.org would receive a request from the user's browser for the avatar associated with a hash.
3. [failing to find it in cache] cdn.libravatar.org would receive a request from avatar.codeberg.org.
4. [failing to find it in cache] realserver.libravatar.org would receive a request from cdn.libravatar.org.
5. [failing to find it in cache] gravatar.com would receive a request from realserver.libravatar.org.

At the end of the above pipeline the avatar is cached locally, so from then on it will be found in step 2.
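
The chain above can be simulated with a small Python sketch. It is a toy model, not any real server's code: `CachingAvatarServer` is a hypothetical class, the chain is shortened to three hops, and the hash `abc123` is made up. Each server records who contacted it, which shows what each party gets to see.

```python
from typing import Dict, List, Optional


class CachingAvatarServer:
    """Toy avatar server: serves from its store, else asks its fallback and caches."""

    def __init__(self, name: str,
                 fallback: Optional["CachingAvatarServer"] = None,
                 avatars: Optional[Dict[str, bytes]] = None):
        self.name = name
        self.fallback = fallback
        self.store = dict(avatars or {})
        self.seen_from: List[str] = []  # who contacted this server

    def get(self, email_hash: str, requester: str) -> Optional[bytes]:
        self.seen_from.append(requester)
        if email_hash in self.store:
            return self.store[email_hash]
        if self.fallback is None:
            return None
        # The upstream server sees this server as the requester, not the browser.
        image = self.fallback.get(email_hash, requester=self.name)
        if image is not None:
            self.store[email_hash] = image  # cache for later requests
        return image


gravatar = CachingAvatarServer("gravatar.com", avatars={"abc123": b"png"})
libravatar = CachingAvatarServer("libravatar.org", fallback=gravatar)
local = CachingAvatarServer("avatar.codeberg.org", fallback=libravatar)

local.get("abc123", requester="browser-1")
local.get("abc123", requester="browser-2")

print(local.seen_from)       # both browsers
print(libravatar.seen_from)  # only the local server, once
print(gravatar.seen_from)    # only libravatar.org, once
```

In this model only the local server ever sees a browser; every upstream hop sees a single request from the server directly below it.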

strk commented 3 months ago
Poster
Useful pointer: https://github.com/go-gitea/gitea/issues/6046
ashimokawa commented 3 months ago
Owner

@strk

> yes you are right that avatar server owners will be able to track who views/fetches avatars from that server. In my case that's me tracking who views/fetches my avatar (the only avatar on my server). And this ability is given to anyone in control of their email domain.

> […]

> to recap, the tracking potential is given to the user as a choice.

I have to disagree here. Avatars are fetched all the time when someone sees that user in a commit list, user list, issue comment, whatever. Basically, an active Codeberg user can track a big chunk of users' IPs if he has control over his avatar URL, even if they do not visit his profile directly. I do not think you have such intentions - but I want to highlight the abuse potential.

> You can install a caching libravatar server with a fallback to libravatar.org instance. The libravatar client (gitea) would then:
>
> 1. See if user's domain asks to fetch avatar from a specific server, and use that one in an `<img>` tag (no `XHR`) sent to browser.
> 2. Failing the above, use a "local" server (on the same domain as gitea) in the `<img>` tag sent to browser.
> 3. The local server would see if an avatar for that email hash exists locally, if not, it will fetch one from the fallback (in your case it could be libravatar.org) and cache it locally, goto 2.
> 4. IFF libravatar.org would receive the fetch request from point 3, it will see if an avatar for that email hash exists locally (either as first-class citizen or cache) and serve it. Failing to find it locally it will fetch it from its own fallback (usually gravatar.com) and cache it locally, returning it to the requestor.

The problem lies in step 1. If we can prevent that and proxy it through avatar.codeberg.org - I am convinced.

EDIT: To make that clear, I think it is okay if an avatar server owner sees an anonymized request from Codeberg, but I do not think it is OK if an avatar server owner sees IP, browser, referer etc.

strk commented 3 months ago
Poster

When any user visits a `user list`, the DNS lookup is done by the Gitea server, so the only IP received by the name server of a user's email domain is that of the Gitea server.

You are right that the actual avatar server (discovered via DNS lookup) will receive the IP of the fetcher, so if the fetcher is the browser, it will receive the IPs of the users. If you want to restrict the number of servers receiving users' IPs, then the best choice would be to have Gitea fetch the avatar and serve it (proxy it) instead of redirecting the browser there. The cost of that would be higher traffic and disk occupation.

Disabling "federated avatars" and simply specifying a local server for `GRAVATAR_SOURCE` would prevent the use of user-determined servers: the Gravatar API takes only a hash, so the server never gets the user's domain and is thus unable to do the DNS lookup. User-determined servers are exactly what I would like to have (I don't want to be forced to upload my avatar all over the world).

You can refer to the issue link I pasted before to track or tackle implementation of that kind of proxying in Gitea itself.

hw commented 3 months ago
Owner

> the best choice would be to have Gitea fetch the avatar and serve it

Yeah, that's exactly the "caching and storing avatar on disk" model above.

> The cost of that would be higher traffic and disk occupation

Pretty much unchanged relative to "normal" user avatars, with the advantage that all data goes through the same HTTP connection (with keep-alive or HTTP/2), so higher performance for all users?

> You can refer to the issue link I pasted before to track or tackle implementation of that kind of proxying in Gitea itself

Yes, this is on point. If the current behaviour were changed to store the avatar file on disk and serve it just like normal avatar pictures, pretty much all issues above would be solved(*)…

(*) One exception though: Codeberg.org's current ToS and privacy statement declare that no user data is shared with external services. As this is an opt-in, a disclaimer seems appropriate on the profile settings page, right where a user enables the avatar service. Adjusting the privacy statement and properly explaining in layman's terms how this works seems appropriate.

We would also have to ensure that external services are technically blocked from setting user cookies (with the current client-side/browser-based XHR avatar model they can), as the ePrivacy regulation ("Cookie directive") allows only plain session cookies without explicit opt-in of users.
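
As a rough illustration of the cookie point, here is a minimal Python sketch of how a server-side avatar proxy could relay only a fixed set of response headers, so that an upstream `Set-Cookie` never reaches the browser. The `relay_headers` function and the `ALLOWED_HEADERS` set are hypothetical and not part of Gitea; which headers to allow is an assumption.

```python
from typing import Iterable, List, Tuple

# Only these upstream headers are relayed; everything else is dropped.
ALLOWED_HEADERS = {"content-type", "content-length", "last-modified", "etag"}


def relay_headers(upstream: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep allow-listed headers (case-insensitive) and drop the rest."""
    return [(name, value) for name, value in upstream
            if name.lower() in ALLOWED_HEADERS]


upstream = [
    ("Content-Type", "image/png"),
    ("Set-Cookie", "track=42; Path=/"),  # must not reach the user's browser
    ("ETag", '"abc"'),
]
print(relay_headers(upstream))
```

An allow-list is used rather than a block-list so that any unexpected header from an external server is dropped by default.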

Ghost commented 3 months ago

> as this implies significant abuse/tracking potential, which is clearly against our mission.

> privacy statement declare that no user data is shared with external services.

+1. I will simply block external requests if you add Gravatar.
