During the Codeberg hackathon, I was made aware that this idea was only propagated via voice and never actually written down, so let's do it now:
We had the idea to create a bot that allows users of Codeberg (or other Gitea instances) to create issues and PRs on proprietary forges. While Gitea is working on federation, it's unlikely that the proprietary forges will participate any time soon.
- fork an upstream project to a repo on a Gitea instance
- create an issue or PR there (local branches etc.)
- mention @codebergforgeforwarderbotservicethingy (or some shorter, more intuitive name)
- specify metadata in a machine-readable format (e.g. just put the URL of the upstream project after the mention)
- the pull request is forwarded upstream by a Codeberg-owned bot user with the content created downstream, ideally with a friendly description like "Hey, some people don't want to join your proprietary walled garden. You normally miss contributions from those people, but we were so nice to forward them to you here ..." (let's fine-tune this later :D)
- content is bidirectionally mirrored: mentions of the Codeberg bot are turned into mentions of the downstream user, and labels, comments etc. are mirrored, too
Note that this approach shifts the responsibility for enabling submissions from the project (which would otherwise have to set up a mirror etc.) to the user, who can quickly contribute to any project from their home instance without permanently mirroring the whole project.
It is a great idea and, IMHO, complementary to the long term efforts towards federation. Its main benefit would be to spread the idea that **remote pull requests are possible and desirable with actual examples**.
In the context of the https://forgefriends.org project we did something similar but with a human being instead of a bot. Here are a few examples:
* https://github.com/go-gitea/gitea/pull/18124 & https://lab.forgefriends.org/forgefriends/forgefriends/-/merge_requests/30
* https://github.com/go-gitea/gitea/pull/18203 & https://lab.forgefriends.org/forgefriends/forgefriends/-/merge_requests/32
There are more at [singuliere](https://github.com/go-gitea/gitea/pulls?q=is%3Apr+author%3Asinguliere+is%3Aclosed) and [realaravinth](https://github.com/go-gitea/gitea/pulls?q=is%3Apr+author%3Arealaravinth+is%3Aclosed), and they document the problems and solutions that were found.
If the developer issuing the pull request already has a **ProprietaryForge** account, the PR could be relayed on their behalf by the bot. There would not be a need for a shared / proxy account managed by the bot. Such a shared / proxied account is problematic because it would be complicated or sometimes impossible for the user to claim ownership of their PR on **ProprietaryForge** at a later time.
fnetX changed title from *Contribute to proprietary forges* to *Contribute to projects on proprietary forges* 1 year ago
> If the developer issuing the pull request already has a **ProprietaryForge** account, the PR could be relayed on his/her behalf by the bot. There would not be a need for a shared / proxy account managed by the bot.
Such a bot would actually work much better if implemented client side (i.e., from JavaScript in the browser). GitHub, GitLab, Bitbucket and Gitea all have REST APIs that should be rich enough to support implementing this.
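To give an idea of how little is needed: GitHub's REST endpoint for opening a pull request is `POST /repos/{owner}/{repo}/pulls`, where `head` takes the `fork-owner:branch` form for cross-repository PRs. A sketch of the request-building part (shown in Python for brevity; the same request works from browser JavaScript with `fetch` — authentication and the network call are omitted):

```python
import json

def build_pr_request(owner: str, repo: str, *, title: str, head: str,
                     base: str, body: str) -> tuple[str, str]:
    """Return the URL and JSON body for GitHub's create-a-pull-request endpoint.

    `head` uses "fork-owner:branch" when the PR comes from another repository.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls"
    payload = json.dumps({"title": title, "head": head, "base": base, "body": body})
    return url, payload
```

GitLab and Bitbucket expose analogous endpoints, though the payload shapes differ.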
> Such a shared / proxied account is problematic because it would be complicated or sometimes impossible for the user to claim ownership of their PR on **ProprietaryForge** at a later time.
You could always sign your git commits.
I would like to mention that [ForgeFlux](https://forgeflux.org) _(disclosure: I'm one of the core devs)_ is working on implementing software forge federation external to the forge. Implementing federation for Gitea, SourceHut, GitLab and GitHub is part of the project's objective, very similar to the requirements mentioned by @fnetX.
We felt that native federation on software forges will take time to implement, as all forge developers will have to be convinced to participate. But federation built entirely on a forge's API can be implemented independently, without the forge developers' involvement.
So far, we are close to federating issues on Gitea (Gitea issues can be viewed on compatible ActivityPub implementations like Pleroma), and we plan to start work on GitHub as soon as we get Gitea working.
User-experience-wise, foreign users will be able to send contributions via a bot account on the native forge, and foreign repositories will be synchronized under a bot account's namespace in the local forge.
## Federated contribution workflow
So if a foreign contributor is interested in contributing to a project in another forge, the following process will take place:
1. Contributor requests synchronization of upstream repository
2. ForgeFlux software synchronizes upstream repository under a bot account
3. Contributor forks and creates PR against synchronized, bot account owned repository
4. ForgeFlux will synchronize the PR on upstream forge
5. If upstream maintainers give feedback on PR, ForgeFlux will synchronize comments
6. If the PR is good and the upstream maintainers are satisfied, they can merge it. ForgeFlux will then close the contributor's PR on the contributor's forge, do a `git pull upstream`, and update the bot-owned repository on the contributor's forge
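The six steps above can be sketched with in-memory stand-ins for the two forges (`ForgeStub` and its interface are hypothetical; real ForgeFlux would talk to the forges' HTTP APIs instead):

```python
class ForgeStub:
    """In-memory stand-in for one forge's API (hypothetical interface)."""

    def __init__(self) -> None:
        self.mirrors: set[str] = set()
        self.prs: dict[int, dict] = {}
        self._next_id = 1

    def mirror(self, upstream_url: str) -> None:
        """Synchronize an upstream repo under the bot account's namespace."""
        self.mirrors.add(upstream_url)

    def open_pr(self, title: str) -> int:
        pr_id = self._next_id
        self._next_id += 1
        self.prs[pr_id] = {"title": title, "state": "open", "comments": []}
        return pr_id


def federated_contribution(local: ForgeStub, upstream: ForgeStub,
                           upstream_url: str, title: str) -> tuple[int, int]:
    """Walk the six workflow steps; returns the (local, upstream) PR ids."""
    local.mirror(upstream_url)                 # 1+2: request + sync the mirror
    local_pr = local.open_pr(title)            # 3: PR against the bot's mirror
    upstream_pr = upstream.open_pr(title)      # 4: bot relays the PR upstream
    upstream.prs[upstream_pr]["comments"].append("maintainer feedback")
    local.prs[local_pr]["comments"] = list(upstream.prs[upstream_pr]["comments"])  # 5
    upstream.prs[upstream_pr]["state"] = "merged"  # 6: maintainers merge...
    local.prs[local_pr]["state"] = "closed"        # ...bot closes the local PR
    return local_pr, upstream_pr
```

In a real implementation each step would be triggered by webhooks or polling rather than running sequentially in one function.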
Well, if synchronize means synchronizing everything (e.g. issues, PRs etc.) even if not related to the change, this adds a lot of overhead. Big projects are already trapped because they can't migrate within their API limits (e.g. Gitea), and fully migrating all issues with attachments etc. takes ages (and eats up precious storage). So this won't scale, and we'll surely go for a lighter approach as described in my comment above.
I agree the lighter, specialized approach that you propose makes sense. But I'd be curious to know what makes you think fully synchronizing projects between forges does not scale or takes too long.
I do understand the problem is real for organizations providing forge hosting for free, of course: it is very easy for a user to consume massive amounts of disk space by mirroring very large projects. But that's different for people self-hosting, and I'm under the impression that mirroring projects with updates every ten minutes is already a very popular feature.
Issues, pull requests etc. could be downloaded from a repository that contains a representation in an open format (which is what I'm working on with the Friendly Forge Format) and the forge could update its database only with the latest changes. Assuming this is available, do you think it won't scale either?
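For illustration, the "update only with the latest changes" part could look like a simple incremental filter over such an open-format representation (the field names here are hypothetical examples, not the actual Friendly Forge Format schema, which is still being defined):

```python
def changed_since(items: list[dict], last_sync: str) -> list[dict]:
    """Keep only records updated after the last sync.

    ISO 8601 UTC timestamps in the same format compare correctly as strings,
    so no datetime parsing is needed for the filter itself.
    """
    return [item for item in items if item["updated_at"] > last_sync]
```

The forge would then upsert only the returned records into its database instead of re-importing the whole project.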
Hmm, you are right that this might work if people self-host. But as described in my blog article about community maintenance, this still creates cost: people still need to pay for their systems, and increased load on them also comes with increased cost.
If you host a representation of the repos somewhere else, you basically shift the cost to a third party. Fine if someone wants to take this on, but still somewhat costly. The Codeberg database is in the gigabyte range for the metadata alone, and we don't yet have super-large projects on Codeberg. I assume this will quickly scale up once we mirror repositories. To make issues really browsable, you also need to mirror all of the attachments, which isn't even included in my metadata calculation. So this will quickly take a lot of storage, even for personal instances.
I can't find the exact numbers, but someone once calculated that Gitea is well above 100 GB if you really migrate all the attachments and files (probably including releases).
Speaking of Gitea: as already mentioned, there's a big issue with hitting API limits. The Gitea project itself hasn't yet managed to fully migrate away. The largest recurring issues for user support on Codeberg (next to Pages configuration) are migration issues. I can't imagine that a third party will successfully bear the headache of providing the metadata of those repos and dealing with the providers' APIs for these large repos.
Last but not least, I do agree that the vendor lock-in of big platforms is an issue, and that locking people out is even worse. So it'd obviously be cool if folks could also browse repositories from outside these platforms when the big platforms limit access, and there might be a use for your idea to fully mirror projects. However, I'd first focus on the "mirror one issue / PR at a time" feature, also because it sounds much easier to implement than what you propose. I hope to spread awareness, and would rather have more projects move off the proprietary services than spend ages working on a workaround for them.
> Such a shared / proxied account is problematic because it would be complicated or sometime impossible for the user to claim ownership of their PR on ProprietaryForge at a later time.
AFAIK, GitHub will assign `author` to the account opening the PR when using a `squash` merge. Using `merge` or `rebase` will keep the authors of each commit intact, however. Just putting it out there, in case someone cares about that.
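Assuming the description of GitHub's merge strategies above is accurate, the consequence for a proxy bot can be modelled as a small lookup:

```python
def merged_authors(commit_authors: list[str], pr_opener: str,
                   strategy: str) -> list[str]:
    """Model who ends up as commit author after merging a PR.

    A squash merge produces one commit authored by the PR opener (the proxy
    bot, in the forwarding scenario); merge and rebase keep the original
    per-commit authors intact.
    """
    if strategy == "squash":
        return [pr_opener]
    if strategy in ("merge", "rebase"):
        return list(commit_authors)
    raise ValueError(f"unknown merge strategy: {strategy}")
```

So a forwarding bot would want maintainers to use merge or rebase, or it would need to rewrite the squashed commit's author afterwards.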
> if synchronize means synchronizing everything (e.g. issues, PRs etc.) even if not related to the change, this adds a lot of overhead. Big projects are already trapped because they can't migrate within their API limits (e.g. Gitea), and fully migrating all issues with attachments etc. takes ages (and eats up precious storage).
[quoted from](https://codeberg.org/Codeberg/Community/issues/607#issuecomment-465271)
Synchronisation is on-demand. Fully synchronising state, as you mention, will exhaust the bot's API quota.
If a forge natively supports federation, then it is possible to achieve abuse prevention via rate-limiting using HTTP signatures. With such forges, it is possible to set federated content caching rules to keep resource usage to a minimum.
But the problem with forges that don't/won't support federation is that we'll have to get around the rate limits imposed by them, as you've already mentioned. In such cases, I think permanent storage is necessary.
There is also the UX issue, which I think will increase resource usage on the forge: if we are to use the native forge's user interface, then we'll have to store issues/PRs on the forge to mirror them.
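The on-demand, quota-conserving synchronisation described here could be sketched as a cache-first fetcher (the `fetch` callable and the TTL value are placeholders, not ForgeFlux's actual design):

```python
import time
from typing import Any, Callable

class OnDemandMirror:
    """Cache-first fetcher: only call the upstream API when the cached copy
    is older than `ttl` seconds, conserving the bot's rate-limit quota."""

    def __init__(self, fetch: Callable[[str], Any], ttl: float = 600.0) -> None:
        self.fetch = fetch
        self.ttl = ttl
        self.cache: dict[str, tuple[float, Any]] = {}
        self.api_calls = 0  # how often we actually hit the upstream API

    def get(self, key: str) -> Any:
        now = time.monotonic()
        hit = self.cache.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]        # fresh enough: serve from cache
        self.api_calls += 1      # cache miss or stale entry: spend quota
        value = self.fetch(key)
        self.cache[key] = (now, value)
        return value
```

With permanent storage, the cache dict would simply be replaced by the forge's own database.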
> AFAIK, GitHub will assign author to the account opening the PR when using a squash merge. Using merge or rebase will keep the authors of each commit intact, however. Just putting it out there, in case someone cares about that.
[quoted from](https://codeberg.org/Codeberg/Community/issues/607#issuecomment-467074)
True. The PR creator (the bot) will have privileges to control the interaction on the proprietary forge, but:
> Such a shared / proxied account is problematic because it would be complicated or sometimes impossible for the user to claim ownership of their PR on ProprietaryForge at a later time
[quoted from](https://codeberg.org/Codeberg/Community/issues/607#issuecomment-418608)
[Git has facilities to set commit author and committer](https://libgit2.org/libgit2/#HEAD/group/commit/git_commit_create) where the author will be the dev wishing to participate in the proprietary forge and the committer being the bot.
This is demonstrated in @dachary's [PR on the Gitea repository](https://github.com/go-gitea/gitea/pull/19935/commits/1a3567b43263a41c346ab6a51d580d9a8b835e44) that I proxied :)
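For illustration, the commit metadata such a proxy would create keeps the contributor as author and makes the bot only the committer, exactly the split libgit2's `git_commit_create` supports (the bot name and email below are hypothetical):

```python
def proxied_commit(author_name: str, author_email: str, message: str,
                   bot_name: str = "forwarder-bot",
                   bot_email: str = "bot@example.org") -> dict:
    """Sketch of the metadata for a commit relayed by a proxy bot.

    The contributor remains the recorded author; the bot appears only as
    the committer, so authorship is preserved on the proprietary forge.
    """
    return {
        "author": {"name": author_name, "email": author_email},
        "committer": {"name": bot_name, "email": bot_email},
        "message": message,
    }
```

The same split is available from the git CLI via `git commit --author="Name <email>"` while the committer identity comes from the bot's own configuration.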