Persistent "500 (Internal Server Error)" when creating PRs through API #1317

Open
opened 2023-10-16 18:12:47 +00:00 by pat-s · 6 comments

Comment

We've configured renovate through Woodpecker (WP) for the woodpecker-plugins org.

However, when running WP it always stops after a few repos with 500 errors from the API during PR creation.

Example: https://ci.codeberg.org/repos/12705/pipeline/6/3

PRs were created in some repos, so the workflow in general works and token scopes are correct. Is some rate limiting going on?
Or a proxy/LB which complains?
The error persisted across three runs and always appeared at the same repo (here https://codeberg.org/woodpecker-plugins/trivy).

Here's an excerpt of the error from the build linked above:

             "message": "Response code 500 (Internal Server Error)",
             "stack": "HTTPError: Response code 500 (Internal Server Error)\n    at Request.<anonymous> (/opt/containerbase/tools/renovate/37.22.0/node_modules/got/dist/source/as-promise/index.js:118:42)\n    at processTicksAndRejections (node:internal/process/task_queues:95:5)",
             "options": {
               "headers": {
                 "user-agent": "RenovateBot/37.22.0 (https://github.com/renovatebot/renovate)",
                 "accept": "application/json",
                 "authorization": "***********",
                 "content-type": "application/json",
                 "content-length": "2806",
                 "accept-encoding": "gzip, deflate, br"
               },
               "url": "https://codeberg.org/api/v1/repos/woodpecker-plugins/trivy/pulls",
               "hostType": "gitea",
               "username": "",
               "password": "",
               "method": "POST",
               "http2": false
             },
             "response": {
               "statusCode": 500,
               "statusMessage": "Internal Server Error",
               "body": {"message": "", "url": "https://codeberg.org/api/swagger"},
               "headers": {
                 "cache-control": "max-age=0, private, must-revalidate, no-transform",
                 "content-type": "application/json;charset=utf-8",
                 "date": "Mon, 16 Oct 2023 17:09:07 GMT",
                 "content-length": "56",
                 "strict-transport-security": "max-age=63072000; includeSubDomains; preload",
                 "permissions-policy": "interest-cohort=()",
                 "x-frame-options": "sameorigin",
                 "x-content-type-options": "nosniff",
                 "content-security-policy-report-only": "default-src data: 'self' https://*.codeberg.org https://codeberg.org; script-src 'self' https://*.codeberg.org https://codeberg.org; style-src data: 'self' 'unsafe-inline' https://*.codeberg.org https://codeberg.org; img-src *; media-src *; object-src 'none'",
                 "connection": "close"
               },
               "httpVersion": "1.1",
               "retryCount": 0
             }

This is likely due to rate limiting. The software upstream has no proper rate limiting and no way to pass a custom response to the user from the code that triggers the rate limiting.

The code can be found here and suggestions for improvements are welcome: https://codeberg.org/Codeberg/forgejo/commit/6b44939debf720e436bce9b8a3a60979a67de0d8

The rate limiting is 3 issues / five minutes (pull requests are also counted as issues here). It only applies if the issue itself contains hyperlinks, but I suppose it does in your case.

We didn't have automation in mind when hotfixing this to fight our spam problem. Most issues created by users do not contain hyperlinks, and if they do, the users usually don't create many consecutive issues containing links.

Yeah OK I see. I understand and see the reasoning though I'd argue that it needs a different/modified solution as this is a bummer for any kind of semi-automated development.

Meanwhile renovate succeeded here: https://ci.codeberg.org/repos/12705/pipeline/8/3 so it might have been only a one-time issue, i.e. when renovate is initially run on an org and hence creates a bunch of PRs in a short time. Yet it can happen again easily if e.g. a widely used image dependency is bumped and renovate tries to open PRs in > 3 repos in a run.

Maybe you could whitelist some URLs? Sure, manual work needed but I guess maybe the only solution?

For renovate specifically, the issues usually contain links to github.com and renovate, but also to common package/registry sources. Maybe starting with these for the exclusion list would help?
Surely, this could be exploited by motivated bots, but will they really ever find out about a whitelist and then first create ad-like repos on GH?
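To make the whitelist idea concrete, here is a rough Go sketch of what such a check could look like: an issue body only counts towards the rate limit if it links to a host outside an allowlist. Everything here is an assumption for illustration: the `allowedHosts` entries (only github.com is named in this thread), the `onlyAllowedLinks` helper, and the simple URL regex.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// linkPattern is a rough URL matcher, good enough for a sketch.
var linkPattern = regexp.MustCompile(`https?://[^\s)>\]"']+`)

// allowedHosts is a hypothetical allowlist; the entries are examples only.
var allowedHosts = map[string]bool{
	"github.com":           true,
	"codeberg.org":         true,
	"docs.renovatebot.com": true,
}

// onlyAllowedLinks reports whether every hyperlink in body points to an
// allowlisted host. A body without any links trivially passes.
func onlyAllowedLinks(body string) bool {
	for _, raw := range linkPattern.FindAllString(body, -1) {
		u, err := url.Parse(raw)
		if err != nil || !allowedHosts[strings.ToLower(u.Hostname())] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(onlyAllowedLinks("Release notes: https://github.com/renovatebot/renovate"))
	fmt.Println(onlyAllowedLinks("Cheap offers at https://spam.example/github.com"))
}
```

Matching on the parsed hostname rather than on a substring avoids the obvious bypass of putting an allowed domain into the path of a spam URL.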

Maybe Codeberg should send a 429 instead of a 500 when the rate limit is hit

We should, but we don't know how to do this. There is no precedent for rate limiting in the code base AFAICT, and we didn't figure out a convenient way to send the custom code at the point where the rate limiting is triggered. I suppose we should probably move the code somewhere else, but I have no clue where best to do this.

We also apply some rate limiting on the reverse proxy. We send a proper status code there (and I think also a nice page explaining why the rate limiting is necessary etc).

So after a few days of observing this and how renovate acts on it, it's quite limiting from a dev perspective. The woodpecker-plugins org is only operating on ~10 repos, and it takes me multiple weeks with a daily renovate schedule to get around the rate limit (to create all issues and PRs as needed). Yes, I can adapt the schedule but in the end this also means a lot of runs which consumed resources in the CI that could have been prevented.

Maybe an option would be to filter against the user-agent? I.e. renovate sends "user-agent": "RenovateBot/37.22.0 (https://github.com/renovatebot/renovate)" and a regex match against this could be a good whitelist approach?

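> but we don't know how to do this

You create a new error type and check in the API router if the error has this type. Here is an example: https://codeberg.org/forgejo/forgejo/src/commit/5f83399d296fffefa5b8feddf23befa811cdecb4/routers/api/v1/repo/repo.go#L258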

Reference: Codeberg/Community#1317