How to render Jupyter notebooks on Codeberg properly? #671

Open
opened 2 weeks ago by penguinsfly · 9 comments

How can Jupyter notebooks be rendered properly on Codeberg instead of being shown as raw JSON?

I was recommended to look at [this blog post](https://blog.gitea.io/2022/04/how-to-render-jupyter-notebooks-on-gitea/). From my naive understanding, this is configured on the [Gitea](https://docs.gitea.io/en-us/external-renderers/#example-jupyter-notebook) backend (the `app.ini` file) rather than per user or per repository. Is that correct?

Thanks!
Collaborator

Hi @penguinsfly!

You're right: this has to be configured in `app.ini` and cannot be achieved through a repository setting.

Judging by the [blog post](https://blog.gitea.io/2022/04/how-to-render-jupyter-notebooks-on-gitea/), the approach is a *bit hacky* and requires installing Jupyter and its dependencies on the server, which is quite a lot. Unfortunately, there doesn't seem to be any other software that can convert `.ipynb` to HTML.
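For reference, the kind of `app.ini` section being discussed looks roughly like this. It is a minimal sketch modeled on the Jupyter example in the Gitea external renderer docs, not Codeberg's actual configuration. The helper name `jupyter_markup_section` is made up for illustration, and the render command assumes `jupyter nbconvert` is installed where Gitea can run it.

```shell
# Print a candidate [markup.jupyter] section for Gitea's app.ini.
# Hypothetical helper: it only prints the text so an admin can review it
# before adding it to the real configuration.
jupyter_markup_section() {
    cat <<'EOF'
[markup.jupyter]
ENABLED = true
; File extensions this renderer is responsible for
FILE_EXTENSIONS = .ipynb
; Assumes nbconvert is installed; reads the notebook on stdin, writes HTML to stdout
RENDER_COMMAND = "jupyter nbconvert --stdin --stdout --to html --template basic"
; false = pass the file content on stdin instead of a temporary file path
IS_INPUT_FILE = false
EOF
}

jupyter_markup_section
```

Since this lives in `app.ini`, it applies to the whole instance, which is why it cannot be switched on per repository.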
Poster

Thank you for your reply. I agree that `jupyter` can be quite heavy.

Just for reference, I created a `venv` environment (~20 MB initially), ran `pip install nbconvert`, and it grew to ~50 MB, so I'm guessing `nbconvert` adds around 30 MB.

There's also [`nbviewer`](https://nbviewer.org/) (I think it uses `nbconvert` as its backend), which renders a notebook from its raw link (e.g., see [here](https://nbviewer.org/urls/codeberg.org/lcsrr/jupyter_notebooks/raw/branch/main/Univesp/Introdu%C3%A7%C3%A3o%20a%20Ci%C3%AAncia%20de%20Dados/Pandas/Pandas_intro.ipynb); I just grabbed a random [link](https://codeberg.org/lcsrr/jupyter_notebooks/raw/branch/main/Univesp/Introdu%C3%A7%C3%A3o%20a%20Ci%C3%AAncia%20de%20Dados/Pandas/Pandas_intro.ipynb) on Codeberg). So I wonder whether it would be possible to fetch the HTML body from that, or to automatically attach a link pointing to the `nbviewer` site with the raw link appended.

Another option is `pandoc`, which might be [lighter](https://github.com/jgm/pandoc/releases/tag/2.18). I usually use it for Markdown conversion, but I believe it can also convert notebooks, though it might need a few tweaks to get the output to look right.

Do you think any of these suggestions are feasible? Or do you anticipate another solution any time soon?
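To make the "raw link appended" idea concrete, here is a small sketch that derives the `nbviewer` link from a raw notebook URL. The mapping is inferred only from the example pair above (drop `https://`, prefix `https://nbviewer.org/urls/`); the function name `nbviewer_url` and the example repository are hypothetical.

```shell
# Build an nbviewer link from a raw HTTPS notebook URL.
# Assumption: nbviewer serves HTTPS sources under /urls/<host>/<path>,
# as in the Codeberg example link in this thread.
nbviewer_url() {
    raw_url="$1"
    case "$raw_url" in
        https://*)
            # Strip the scheme and append the rest to the nbviewer prefix.
            printf 'https://nbviewer.org/urls/%s\n' "${raw_url#https://}"
            ;;
        *)
            echo "expected an https:// URL, got: $raw_url" >&2
            return 1
            ;;
    esac
}

# Example with a placeholder repository (not a real notebook):
nbviewer_url 'https://codeberg.org/example-user/example-repo/raw/branch/main/notebook.ipynb'
```

This only produces a link; the rendering would still happen on the third-party `nbviewer` site rather than on Codeberg.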
Collaborator

I bookmarked that blog post *quite a while ago*, but I haven't wanted to tinker with the setup yet.

What would be optimal in my opinion: have (Docker?) containers where everything is installed, and instead of calling `pandoc` or whatever directly, call something like `containertool run in-container-xy pandoc`. Anyone interested to work on that with us?
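A rough sketch of what such a wrapper could look like, assuming Docker (or a compatible tool such as Podman) as the `containertool` and an image whose entrypoint is the converter. The function name `run_in_container` and the `CONTAINER_TOOL` variable are made up for illustration; nothing here reflects an existing Codeberg setup.

```shell
# Container runtime to use; override with e.g. CONTAINER_TOOL=podman.
CONTAINER_TOOL="${CONTAINER_TOOL:-docker}"

# Run a converter inside a throwaway container instead of on the host.
#   $1   image name (its entrypoint is assumed to be the converter)
#   $2.. arguments passed through to the converter
run_in_container() {
    image="$1"
    shift
    # --rm            remove the container when the command exits
    # -i              keep stdin open so the document can be piped in
    # --network none  the converter gets no network access
    "$CONTAINER_TOOL" run --rm -i --network none "$image" "$@"
}

# Example (needs a container runtime and the pandoc/minimal image):
#   run_in_container pandoc/minimal --from ipynb --to html < notebook.ipynb
```

Reading from stdin and writing to stdout keeps the wrapper usable as a drop-in render command without mounting any host directories.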
rwa added the infrastructure label 6 days ago
Collaborator

> Anyone interested to work on that with us?

Happy to take a stab at it (add it to the backlog). Using something like a pandoc container seems useful in the long term, since it also provides a lot of other transformations.
Collaborator

I noticed that Codeberg already seems to be using pandoc for `.rst` files: https://codeberg.org/Codeberg-Infrastructure/build-deploy-gitea/src/commit/77a0e2828f8a78df42493da8f155b412cc7e71cf/etc/gitea/conf/app.ini#L183

So it seems only a small configuration change is needed to add support for this.
Collaborator

Yes, but we actually don't want to use pandoc directly on the server, but use containers for that instead.

Collaborator

> Yes, but we actually don't want to use pandoc directly on the server, but use containers for that instead.

Hmm okay. Either way, I did some fiddling with how pandoc converts notebooks to HTML. It doesn't seem to add any syntax-highlighting classes or language information when the output is rendered "directly"; they are only added when you ask for self-contained output (`--self-contained`). The same goes for images: they are only converted to `data:...` URIs in self-contained mode. So I looked into just using an `<iframe>` via Gitea, and it became clear that unless you fully trust the output not to contain malicious scripts, using the `<iframe>` is not possible.
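For anyone who wants to compare the two modes, the invocations would look something like this. It is a sketch only: the function names and the `PANDOC` variable are hypothetical, and `--self-contained` is the flag as of pandoc 2.18 (newer releases deprecate it in favor of `--embed-resources --standalone`).

```shell
# pandoc binary to use; override to point at a wrapper or container script.
PANDOC="${PANDOC:-pandoc}"

# "Direct" rendering: an HTML fragment without embedded styles or images.
render_fragment() {
    "$PANDOC" --from ipynb --to html "$1"
}

# Self-contained rendering: one HTML file with styles and images inlined.
render_self_contained() {
    "$PANDOC" --from ipynb --to html --self-contained "$1"
}

# Example (needs pandoc installed):
#   render_fragment notebook.ipynb > fragment.html
#   render_self_contained notebook.ipynb > standalone.html
```

Diffing the two outputs shows which styles and `data:` URIs only appear in the self-contained file.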
Poster

So I played around with Docker images for a bit and found that `nbconvert`, though it doesn't generalize to other formats, might not be so bad (relatively). This is my first time using Docker, so excuse my naive approach.

For `pandoc`, [`pandoc/minimal`](https://hub.docker.com/r/pandoc/minimal) seems to be the smallest image available (?) at 79.3 MB. Which `pandoc` container source is currently used for Gitea/Codeberg?

Anyway, like @Gusted said, it needs additional configuration and more tinkering to get things to look right, and I haven't found a source for a template yet.

For `nbconvert` (i.e. `dev:nbconvert-alpine`), I started from `python:3.8-alpine` and installed `nbconvert`; the image came to around 83.9 MB (base image 46.8 MB), so not far from `pandoc/minimal`. I tested with this [demo file on nbviewer](https://nbviewer.org/github/jrjohansson/qutip-lectures/blob/master/Lecture-1-Jaynes-Cumming-model.ipynb), and even without installing `pandoc` it somehow renders the math parts quite nicely (unsure how).

Here's what I used:

``` Dockerfile
# Dockerfile
FROM python:3.8-alpine
RUN pip install --no-cache-dir -U nbconvert
WORKDIR /data
ENTRYPOINT ["jupyter", "nbconvert"]
```

``` bash
# build (the last argument, docker-images, is the build context directory)
docker build -t dev:nbconvert-alpine -f Dockerfile docker-images

# run with a specific file or multiple ones
# (the volume mount and --user flags follow what pandoc did)
docker run --rm -v "$(pwd):/data" --user "$(id -u):$(id -g)" \
    dev:nbconvert-alpine --to html <path/to/ipynb-file or path/to/*.ipynb>
```

Here's the `docker images` output:

```
REPOSITORY       TAG                SIZE
dev              nbconvert-alpine   83.9MB
pandoc/minimal   latest             79.3MB
pandoc/core      latest             371MB
python           3.8-alpine         46.8MB
python           3.8-slim           124MB
```

As for timing, with the same file, `pandoc/minimal` took a bit less than a second while `nbconvert` took around 3 seconds. Of course, one would probably have to set file-size and time limits for conversion to prevent `nbconvert` from taking too long or using too much space.
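The file-size and time limits could be enforced by a small wrapper around whichever converter ends up being used. A minimal sketch, assuming GNU coreutils `timeout` is available; the function name and the default limits (5 MiB, 10 seconds) are arbitrary placeholders, not values anyone has agreed on.

```shell
# Hypothetical limits; tune to whatever the instance can afford.
MAX_BYTES="${MAX_BYTES:-5242880}"   # 5 MiB
TIME_LIMIT="${TIME_LIMIT:-10}"      # seconds

# Usage: convert_with_limits <notebook> <command> [args...]
# Feeds the notebook to the command on stdin, but only if it is small
# enough, and kills the command if it runs longer than TIME_LIMIT.
convert_with_limits() {
    file="$1"
    shift
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    # $(( ... )) strips the padding some wc implementations print.
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -gt "$MAX_BYTES" ]; then
        echo "refusing $file: $size bytes exceeds limit of $MAX_BYTES" >&2
        return 2
    fi
    # timeout exits with status 124 when the time limit is hit.
    timeout "$TIME_LIMIT" "$@" < "$file"
}

# Example (needs nbconvert installed):
#   convert_with_limits notebook.ipynb jupyter nbconvert --stdin --stdout --to html
```

Limiting the output size would need a separate check, since a small notebook can still produce a large HTML file.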
6543 added the docs label 3 days ago
Gusted added the enhancement and codeberg labels and removed the docs label 2 days ago
Collaborator

FWIW, we will need [this PR](https://github.com/go-gitea/gitea/pull/20180); otherwise we can't use an iframe to render the HTML.
Reference: Codeberg/Community#671