Add Code search #379

Open
opened 9 months ago by davidak · 20 comments
davidak commented 9 months ago
Collaborator

Gitea now supports it with Elasticsearch.

https://github.com/go-gitea/gitea/pull/10273

I don't know if it works well and is usable. It seems not to be activated on https://try.gitea.io/

davidak added the
enhancement
label 9 months ago
Collaborator

Gitea has had code search for a while; 1.13 just introduced the option to use Elasticsearch (ES) as the indexer. Code search is not available in any form on try.gitea.io because people love to push giant repos to it and eat up all our disk space (hence the notice that repos may disappear at any time), and enabling code search would only increase the resources needed.

Collaborator

While I use Gitea code search daily, an alternative for Codeberg could be to set up an install of https://about.sourcegraph.com/, as it is significantly more robust and more featureful. The non-enterprise edition is licensed under Apache 2.

hw commented 8 months ago
Owner

Interesting, ... do you imagine some kind of integration, or running it side-by-side?

Collaborator

I see it as running side-by-side. To make integration easier, though, I could send a customization PR to Codeberg that would swap out the standard code search text box for one that opens the results in Sourcegraph, and also add a button for something like "open this repo in sourcegraph".

Whichever way is decided (enabling code search via bleve, es, or connecting to a sourcegraph install) will add additional effort for the infra team though, so while one way may be better in terms of accuracy and usability, we should also take into account those managing the infra so as not to cause burnout.

It would be great to have this feature! It would significantly boost productivity on codeberg for me. I am happy with whatever works best for the devs.

hw commented 8 months ago
Owner

@6543 : do you have any gut feeling how sourcegraph compares to the elasticsearch index in terms of storage requirements?

Owner

@techknowlogick

I assume the ES instance could run on a different server, right?

Collaborator

@ashimokawa it can be configured as you wish ...

@hw sadly I don't have any real-world data, since I haven't tested big instances with either of them :/

Why not just test it with Codeberg's test instance and look at how much each takes?

hw commented 8 months ago
Owner

> why not just test it with codeberg's test instance and look at how much each takes?

Running such tests (in a dedicated VM for isolation) is high on the todo list.

EDIT: Please don't use this for viewing non-unique exports like master/main/HEAD, because each tarball is cached on the dezip side using its URL as the key, so a moving ref would keep showing a stale snapshot!

So instead of my teddit example here is another one using lemmy-js-client (teddit doesn't have any tagged releases yet):
https://dezip.org/https://codeberg.org/LemmyNet/lemmy-js-client/archive/0.9.0.tar.gz

Also, to remove a tarball from the dezip server, use the following URL form:
https://dezip.org/https://codeberg.org/LemmyNet/lemmy-js-client/archive/0.9.0.tar.gz?remove
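
The two URL forms above can be sketched as a small helper. This is only an illustration, not anything dezip or Codeberg provides: the function name `dezipUrl`, its parameters, and the `MOVING_REFS` list are made up here, and the URL layout is simply copied from the examples in this comment. The check only catches the three ref names mentioned above; any other branch name is a moving ref too.

```javascript
// Sketch only: builds a dezip.org link for a Codeberg/Gitea archive tarball.
// The URL layout is copied from the examples in this thread, not from dezip docs.
const DEZIP_BASE = 'https://dezip.org/';

// Refs named in this thread as non-unique; dezip caches by URL, so these go stale.
// (Hypothetical list: other branch names are moving refs as well.)
const MOVING_REFS = new Set(['master', 'main', 'HEAD']);

function dezipUrl(host, owner, repo, ref, { remove = false } = {}) {
  if (MOVING_REFS.has(ref)) {
    throw new Error(`"${ref}" is not a unique ref; use a tagged release instead`);
  }
  const tarball = `https://${host}/${owner}/${repo}/archive/${ref}.tar.gz`;
  // Appending "?remove" is the removal form described above.
  return DEZIP_BASE + tarball + (remove ? '?remove' : '');
}

console.log(dezipUrl('codeberg.org', 'LemmyNet', 'lemmy-js-client', '0.9.0'));
console.log(dezipUrl('codeberg.org', 'LemmyNet', 'lemmy-js-client', '0.9.0', { remove: true }));
```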


While this by no means replaces proper builtin search functionality, it is a useful shortcut that can be utilized until we get there: https://dezip.org

Reading their mission statement makes me think that searching Codeberg/Gitea autogenerated tarball exports might be an ideal service use case:

> discomfort with the centralization of software development into sites like github and gitlab. convenient source code browsing shouldn't be coupled so tightly to repository hosting services.

For example, this is how one could use dezip to search teddit's source code:
https://dezip.org/https://codeberg.org/teddit/teddit/archive/main.tar.gz

Hit F after landing there or just click on the magnifying glass icon in the upper right corner.

They don't say anything about usage limits but I would assume fair share usage policy is implied 😉

EDIT: Please don't use this per reasons explained above. Leaving original content below for sake of transparency.


I'm sure it can be improved a lot but I believe it is a good start 😉

Here is a quick n dirty bookmarklet I just made:
https://codepan.egoist.sh/gist/a088ec968a35bb6dcd098dc0215ea85c

javascript:(function () {
  const segments = location.pathname.split('/').filter(Boolean);
  const [owner, repo, ...path] = segments;
  const ref = path.includes('branch') ? path[path.indexOf('branch') + 1] : 'HEAD';
  const tarball = `/${owner}/${repo}/archive/${ref}.tar.gz`;
  location = `https://dezip.org/${new URL(tarball, location)}`;
}());
@vladimyr this is actually quite awesome, it seems to be cached at dezip. Did you figure out a way to make a link to a particular line?

@vanous Every tarball is cached using its URL as a key. Regarding linking to a particular line - no, I would like to know how to do it too 😉

fnetX added the
infrastructure
label 6 months ago

Hello all, dezip author here. I found this page doing a search for the dezip.org URL. I'm happy to see it being used! Archives are automatically cleaned up when disk usage hits a high-water mark, so don't worry about removing them manually. Also, each line number should be a link now, assuming you have JavaScript enabled. Feel free to use it for whatever you need, as long as it's not blowing through my monthly bandwidth cap (2000 GB) too quickly.

This is great. Thank you for stopping by.

I am not sure how far the research into Codeberg's own search feature is at this point, while new Codeberg server hardware is being burned in and configured.

The line anchors are very useful. They are compatible with Gitea deep links, so simply by replacing the base URL (everything before the file path), one can make the link go from dezip back to the repo:

http://dezip.org/v1/9/https/codeberg.org/Freeyourgadget/Gadgetbridge/archive/0.58.0.tar.gz/gadgetbridge/app/src/main/java/nodomain/freeyourgadget/gadgetbridge/devices/pebble/PebbleCoordinator.java?line=173#L173

https://codeberg.org/Freeyourgadget/Gadgetbridge/src/branch/master/app/src/main/java/nodomain/freeyourgadget/gadgetbridge/devices/pebble/PebbleCoordinator.java?line=173#L173
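
The rewrite between those two links could be sketched like this. It is a guess based on the single dezip link above: the function name `dezipToGitea` is made up, and the path layout (`/v1/<n>/<scheme>/<host>/<owner>/<repo>/archive/<ref>.tar.gz/<top-level dir>/<file path>`) is an assumption read off that one example. Since the archive here is a tagged release, the sketch points at Gitea's `/src/tag/<ref>/` by default rather than `/src/branch/master/`; pass `'branch'` as the second argument for branch archives.

```javascript
// Sketch only: turns a dezip file link back into a Gitea source link.
// Assumed dezip path layout (inferred from one example in this thread):
//   /v1/<n>/<scheme>/<host>/<owner>/<repo>/archive/<ref>.tar.gz/<top-level dir>/<file path...>
function dezipToGitea(dezipLink, refType = 'tag') {
  const url = new URL(dezipLink);
  const segments = url.pathname.split('/').filter(Boolean);
  // Skip "v1" and "<n>", then skip the tarball's top-level directory.
  const [, , scheme, host, owner, repo, archive, tarball, , ...filePath] = segments;
  if (archive !== 'archive' || !tarball || !tarball.endsWith('.tar.gz') || filePath.length === 0) {
    throw new Error('unexpected dezip link layout');
  }
  const ref = tarball.slice(0, -'.tar.gz'.length);
  // Keep the query and fragment so the line anchor survives the rewrite.
  return `${scheme}://${host}/${owner}/${repo}/src/${refType}/${ref}/${filePath.join('/')}${url.search}${url.hash}`;
}

console.log(dezipToGitea(
  'http://dezip.org/v1/9/https/codeberg.org/Freeyourgadget/Gadgetbridge/archive/0.58.0.tar.gz/gadgetbridge/app/src/main/java/nodomain/freeyourgadget/gadgetbridge/devices/pebble/PebbleCoordinator.java?line=173#L173'
));
```

Refs that contain a slash are not handled by this sketch.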

Owner

Current status:

We are experimenting with OpenSearch (an Elasticsearch fork). It is enabled on codeberg-test.org.

Gitea somehow only indexes some repos, which seems like an upstream bug to me.

The (outdated) Gadgetbridge repo on codeberg-test.org has an index and can be searched.

https://codeberg-test.org/ashimokawa/Gadgetbridge

This is awesome! Yes, some repos work, some don't:

https://codeberg-test.org/test2/gitea/search?q=git works

https://codeberg-test.org/bigrepos/linux/search?q=enter_secure_mode doesn't work
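
For checking more repos the same way, the per-repo search links above can be generated with a few lines. This is just a convenience sketch: `repoSearchUrl` is a made-up name, and the `/<owner>/<repo>/search?q=` layout is taken from the two links above rather than from Gitea documentation.

```javascript
// Sketch only: builds a per-repository code search link in the form used above.
function repoSearchUrl(base, owner, repo, query) {
  const url = new URL(`/${owner}/${repo}/search`, base);
  // searchParams takes care of escaping spaces and special characters.
  url.searchParams.set('q', query);
  return url.toString();
}

console.log(repoSearchUrl('https://codeberg-test.org', 'test2', 'gitea', 'git'));
console.log(repoSearchUrl('https://codeberg-test.org', 'bigrepos', 'linux', 'enter_secure_mode'));
```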

Also, Gitea's work on indexing wikis seems to have stalled, but even code search alone would be a huge improvement.

@ashimokawa any news on this?

Collaborator

This might solve the not-indexing repo bug: https://github.com/go-gitea/gitea/pull/16991

Most of the repos are actually migrated for testing purposes, few were pushed directly.

We will test and hit you up.

@fnetX Thank you 👍 looking forward to further news.
