Based on hexP language. |
1 week ago | |
|---|---|---|
| doc | 7 months ago | |
| seed | 1 week ago | |
| src | 1 week ago | |
| tools | 1 year ago | |
| .gitignore | 1 year ago | |
| LICENSE | 1 year ago | |
| Makefile | 1 week ago | |
| README.md | 1 week ago | |
| README.org | 1 week ago | |
README.md
Table of Contents
Objective
The objective of this project is to provide a minimal binary seed for booting either bare hardware or software e.g. in a linux environment.
The minor objective is to do it in code that is understandable by 75% of programmers.
Credits
Most of the ideas are from other projects. In particular:
- PreForth by Ulrich Hoffmann, c.f. https://github.com/uho/preForth
- SectorForth by Cesar Blum , c.f. https://github.com/cesarblum/sectorforth
- JonesForth by Richard WM Jones, c.f. http://rwmj.wordpress.com/2010/08/07/jonesforth-git-repository/ or https://github.com/nornagon/jonesforth
- MiniForth by Jakub Kądziołka, c.f. https://github.com/NieDzejkob/miniforth
One of its ideas is to put variable addresses (&here, …) onto the
initial stack. We extend this to execution tokens and the constant
-1. - stage0 by Jeremiah Oriansj with a binary seed for x86 of 357 bytes, c.f. https://github.com/oriansj/stage0/
- Sectorlisp by Justine Tunney with 436 bytes, c.f. https://justine.lol/sectorlisp2/
- Planckforth by Koichi NAKAMURA, c.f. https://github.com/nineties/planckforth It starts with one character words. Therefore, his parser is really simple, i.e. it reads the next character. Later he extends this to normal forth.
- tiny.asm by Brian Raiter with 76 bytes, c.f. http://www.muppetlabs.com/~breadbox/software/tiny/revisit.html
- The new Gforth header by Bernd Paysan and M. Anton Ertl, c.f. https://publik.tuwien.ac.at/files/publik_287599.pdf
The most important original new ideas are:
-
We start with a small forth nuclei or pre-forth system, called character indexed pre forth or CIPF for short. We use it to define a regular forth by forth source code. The pre-forth CIPF is just enough to define new code words. To minimize it we use:
- Tail code sharing.
- We place variables directly in the machine instructions.
- We use direct threading code.
- We drop the return stack, at first. Therfore, we can only call primitives initially.
- We start with only 5 (yes, five) primitives in the boot
binary; i.e.
: (simple-colon) ( "c" a -- )p (psp@) ( -- x) \ parameter stack pointer@ (fetch) ( a -- x )- (minus) ( n0 n1 -- n2 ; n2 = n0 - n1 )! (store) ( x a -- x )
- We don't even provide a dictionary in the binary. Instead, we
put the xt of our five primitives on the initial stack. By
chance we can compute with them
hereand-1and put them on the stack.- For other CPU we can add
nop-codes or rearrange the next part to calculate-1. - The value of
hereis much easier. It must only be greater than the address of the last needed maschine code.
- For other CPU we can add
- We replaced the linked list directory with a lookup table for words of one character.
In total we gain:
- A simple parser: read one character.
- A simple find: lookup the execution token in a table which is indexed by the character.
- Define the first words by consuming the execution tokens (xt)
on the initial stack. And calculate
hereand-1in between. - Define new code words (i.e. words in machine code) using the first five words
- Simplify the definition of new code words by the first new code words
- Define a 'normal' forth interpreter with a 'normal' forth directory.
-
Using the simpler octal monitor instead of an hex monitor.
-
Using hexP encoding, i.e. using the 16 uppercase letters A to P to encode the hexadecimal numbers. Now, we can decode them as easily as octal numbers!
-
The octal monitor with the shifting flag, which controls, if we read the next octal digit or emit the next octet into the memory.
Purpose
The purpose of this project is to minimize the binary seed size. E.g. for
- https://bootstrappable.org/
- https://guix.gnu.org/manual/en/html_node/Reduced-Binary-Seed-Bootstrap.html
- https://guix.gnu.org/en/blog/2020/guix-further-reduces-bootstrap-seed-to-25/
- https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf
State
-
The monitor sizes are
-
for octal linux input on x86: 48 bytes
-
for octal keyboard input on x86: 33 bytes
-
for octal uninitialized serial input on x86: 35 bytes
-
for octal initialized serial input on x86: 42 bytes
-
for hexP linux elf on x86 with 32 bit: 109 bytes
- 76 bytes for the elf header
- 40 bytes for the real code
- NB: 7 bytes of the code are in the elf header
-
for octet keyboard input on x86: 18 bytes (!)
-
for octet uninitialized serial input on x86: 20 bytes
-
for octet initialized serial input on x86: 27 bytes
NB: It's less than one tenth of a boot sector! (Besides the ELF-header.)
-
-
CIPF (character indexed pre forth) binary code size is:
- 71 bytes for bios monitor on x86
- 77 bytes for bios standalone on x86
- 103 bytes for linux monitor on x86 ( ?? bytes, if preserving the linux stack)
- 165 bytes as a standalone linux elf binary on x86 with preserving the linux stack
At these small sizes, a mere human can rigorously check each bit with due diligence.
At these small sizes, there is really no bit available for malicious code.
Improvements are welcome.
-
The source code which transforms CIPF into a quite normal forth is in still not complete:
- We use a modified 'new gforth header'. We put the
cfainto the virtual table of antand use adoPrimitivsimiliar todoColon. - We use a
perciveto obtain extendable input parsing.
The next steps are:
- Define input and output stream handling.
- wordlists and vocabularies
- floating point arithmetic
- scheme interpreter
- maybe a c interpreter or tiny c compiler
- We use a modified 'new gforth header'. We put the
Notes on Gnu assembler as
I have found no way to compile a short jump without doubt. Gnu as optimizes jmp on its own and do not provide something like jmpb AFAIK. Any hints are welcome.
Possible hacks
-
fill to much
.macro jmpb target pre_jump_byte_\@ : jmp \target .fill 0xffffffff * (. - 2 - pre_jump_byte_\@), 8, 0xdeadbeef # its size is zero or to big .endm -
assemble op-code byte by byte
.macro jmpb target .db 0xEB, target - ( $ + 1) ; hex code and relative offset .fill 0xffffffff * ((target - $) & ~ 0xff), 8, 0xdeadbeef .endm
Bootstrapping
Seeding
Binary seed
The binary seed is only a monitor, i.e. a code loader. The default one reads octal numbers and writes them as octets into memory.
The last octet overwrites the last jump offset of the monitor. Thus, we jump into the new code.
-
Why octal?
Octal code is easier to parse and convert to octets. We only have to substract '0'. With hex code we must substract either '0' or 'A' or 'a'. Therefore, the octal monitor is smaller than a hex monitor.
-
Alt numpad monitor
On x86 hardware we can use a monitor which reads octets, because of the alt numpad input method. It allows us to enter octals numbers directly.
Otherwise octet input would be cheating. We would enter a binary manually. That's still a binary seed.
Octal seed or cipf, i.e. character indexed pre forth
The monitor input is a minimal pre forth interpreter compiled to machine code, all bytes reversed and encoded as octal numbers.
- The purist compiles the assembler to machine code by pen and paper; then he reverses the bytes, does the octal encoding and enter the result by hand.
- The lazy one uses make, yasm, and python.
- If you don't accept octal code as source code you can take the binary of cipf as a small binary seed.
The forth systems fits easily into a boot sector of 512 octets (aka bytes). Thus, it's small by itself but huge in comparison to the octal monitor.
Binary Seed Pre-Forth
For those, who don't like reverse octal code as source input, we provide a binary seed of our forth nucleus or pre forth system.
Feeding the code
Because of the small size, we don't provide visual feedback at the start. This means entering blindly a lot of characters. If you don't like it we recommend a serial connection. That has the advantage, that we can attach a simple electric non-digital logger to it, which prints all send and received bytes on a paper trail. That can be checked rigorously with due diligence.
TODO:
- Learn from: http://collapseos.org/ a OS in forth.
- Learn from other forth implementations
- Copy an/some assembler in forth, e.g. https://git.savannah.gnu.org/cgit/gforth.git/tree/arch/386/asm.fs
We start with a test environment for i386 linux. In this environment it is easier to develop and debug the code.
- copy'n'paste forth code to qemu simulation
- Learn how to use the serial input in qemu while booting.
- implement in Forth a hexcode word, e.g. hex0, , cf. [1]. [1] https://github.com/oriansj/stage0 It's seed takes 357 bytes; i.e. much more than 2wf or 0w4tf
- implement a scheme interpreter
- implement a c or a "tiny c" compiler Check https://github.com/pzembrod/cc64 https://groups.google.com/g/comp.lang.forth/c/lBYFfVJ1qhc#98fc97704cda1b07
- implement a tiny c compiler https://github.com/meithecatte/2klinux
- bin2octal-revers
- bin2octet-revers
- elf-text-writable
- strip-forth-comments