Haku is a toy functional language with grammar, syntax and vocabulary inspired by Japanese.

Haku

A toy functional programming language based on literary Japanese.

Is Haku for you?

  • Haku lets you write programs that look very much like written Japanese. So you need to be familiar with written Japanese to program in Haku. I have added translations and explanations to the documentation.
  • Haku is an experiment, not a practical programming language. Several of its features are rather contrary.
  • The implementation is incomplete and buggy, and error messages are poor.

Requirements

To run Haku you'll need to install the Raku programming language. If you plan to use Raku (it is a wonderful language), I recommend you use the Rakubrew installation tool.

Running Haku

I am assuming you'll run Haku on the command line, in the directory cloned from Git or where you unzipped the downloaded archive.

In that directory is a script haku. If you want to run haku outside this directory, you'll need to add the path to the environment variable RAKULIB.

Example programs are in the subdirectory examples (horizontal writing) and examples/tategaki (vertical writing).

Usage: haku <Haku program, written horizontally or vertically, utf-8 text file>
    [--tategaki, -t] : do not run the program but print it vertically.
    [--miseru, -m] : just print the Raku source code, don't execute.
    [--yomudake, -y] : just print the Haku source after reading, as a single line. Don't execute.
  • The tategaki option is a pretty-printer, but the generated code is valid Haku: if you save the result in a text file, haku can read it in and run it.
  • The miseru option shows the Raku code generated by the compiler.
  • The yomudake option prints the string that the compiler uses as input, i.e. the program source converted to a single line.

Example:

$ ./haku examples/tategaki/iroha_tategaki.haku

This should print

7188
557188

The first time you run haku, it will take quite a long time (several minutes) because Raku needs to compile the modules to bytecode. After that, it should only take a few seconds.

Haku by example

Example 1: Iroha

Iroha is the name of a famous poem in which every kana occurs only once. The i,ro,ha sequence is used for ordered lists, so it is similar to a,b,c (or x,y,z in this case).

Consider the following Haku program:

註 例のはくのプログラム。

本とは
ラムダは或エクスでエクス掛けるエクスです、
カズ達は八十八と七千百と五十五で、
イ・ロ・ハ・空はカズ達で、
シンカズはイとロの和で、
シンカズを見せる、
ケッカは〈七百四十壱をラムダする〉足す九百十九、
【ケッカとシンカズの和】を見せる
のことです。

In Romaji (Japanese written using the Latin script) this reads:

Chuu Rei no Haku no PUROGURAMU.

Hon toha
RAMUDA ha aru EKUSU de EKUSU Kakeru EKUSU desu,
KAZUtachi ha 88 to 7100 to 55 de,
I:RO:HA:Kuu ha KAZUtachi de,
SHINKAZU ha I to RO no Wa de,
SHINKAZU wo Miseru,
KEKKA ha (741 wo RAMUDA suru) Tasu 919,
(KEKKA to SHINKAZU no Wa) wo Miseru
no Koto desu.

In this documentation I use capitals to indicate words starting with kanji (nouns, verbs and adjectives), and all caps for words originally written in katakana.

And the translation is

Comment: an example Haku program

Main is (the following things):
LAMBDA is, with a given X, X * X;
NUMBERs is 88 and 7100 and 55;
I:RO:HA:Empty is NUMBERs;
NEWNUMBER is the Sum of I and RO;
Show NEWNUMBER;
RESULT is (do LAMBDA with 741) + 919;
Show (the Sum of RESULT and NEWNUMBER)

which compiles to the following Scheme code:

(define (displayln str) (display str) (newline))

; 例のはくのプログラム    
(define (hon)
    (let* (
            (RAMUDA (lambda (EKUSU) (* EKUSU EKUSU )))
            (KAZUTACHI (list 88 7100 55))
            (I (car KAZUTACHI))
            (RO (cadr KAZUTACHI))
            (HA (caddr KAZUTACHI))
            (SHINKAZU (+ I RO ))
            (KEKKA (+ (RAMUDA 741) 919 ))
        )
        (displayln SHINKAZU)
        (displayln (+ KEKKA SHINKAZU ))
    )
)

(hon)

Let's take it apart:

註 ... 。

is a comment (註 chuu means "note").

The main program (called 本, hon, here meaning "main") has a fixed begin and end string:

本とは
...
のことです。

In Romaji this reads "Hon to wa ... no koto desu.", roughly "Main is the following thing(s): ...".

In Scheme I emit a function hon whose body is a let*-binding (i.e. the bindings are sequential):

(define (hon)
    (let* (
        ...
        )
        ...
    )
)

In the example we have a number of different types of assignments:

ラムダは或エクスでエクス掛けるエクスです、

"RAMUDA wa aru EKUSU de EKUSU kakeru EKUSU desu"

Katakana is for variables, kanji for functions and keywords, hiragana for keywords and verb endings (e.g. in 掛ける and 見せる).

This roughly reads as "as for RAMUDA, with a given X it is X times X", so RAMUDA binds to a lambda function. In Scheme this becomes:

(RAMUDA (lambda (EKUSU) (* EKUSU EKUSU )))

Next we have an assignment to a list of number constants:

カズ達は八十八と七千百と五十五で、

"KAZUTachi wa 88 to 7100 to 55 de,"

Numbers are written in kanji. The particle to ("and") is the list separator. You can use the pluralising suffix 達 tachi to show that a variable name is plural. In Scheme this becomes:

(KAZUTACHI (list 88 7100 55))

Next we have a bit of syntactic sugar borrowed from Haskell (cons):

イ・ロ・ハ・空はカズ達で、

"I:RO:HA:Kuu wa KAZUTachi de,"

kuu means "empty". This means that the list is deconstructed into the elements I, RO, HA and an empty list. Scheme does not have this kind of pattern matching, so each assignment is generated separately.

The next assignment,

シンカズはイとロの和で、

"SHINKAZU wa I to RO no Wa de"

is simply

"SHINKAZU is the sum of I and RO"

(SHINKAZU (+ I RO ))

Then we have a print statement:

シンカズを見せる、

"SHINKAZU wo Miseru"

"To show SHINKAZU"

In Scheme:

(displayln SHINKAZU)

Then follows another assignment:

ケッカは〈七百四十壱をラムダする〉足す九百十九、

"KEKKA wa (741 wo RAMUDA suru) Tasu 919"

"KEKKA is (RAMUDA of 741) plus 919"

(KEKKA (+ (RAMUDA 741) 919 ))

And finally we show the result of an expression:

【ケッカとシンカズの和】を見せる

"(KEKKA to SHINKAZU no Wa) wo Miseru"

"To show the sum of KEKKA and SHINKAZU"

(displayln (+ KEKKA SHINKAZU ))
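
As a cross-check of the walkthrough above, the same program can be sketched in Python. This is only an illustration, not output of the Haku compiler; the lower-case names simply mirror the Romaji identifiers.

```python
# Python rendering of the Iroha example; names mirror the Romaji identifiers.
def hon():
    """Return the two values the Haku program shows, in order."""
    ramuda = lambda ekusu: ekusu * ekusu   # RAMUDA: with a given X, X * X
    kazutachi = [88, 7100, 55]             # KAZUtachi: 88 and 7100 and 55
    i, ro, ha = kazutachi                  # I:RO:HA:Kuu deconstruction
    shinkazu = i + ro                      # the sum of I and RO
    kekka = ramuda(741) + 919              # (RAMUDA of 741) plus 919
    return [shinkazu, kekka + shinkazu]


for value in hon():
    print(value)
```

Running it prints 7188 and 557188, matching the output shown in "Running Haku".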

Example 2: Length of a list

This example shows the use of named functions, conditionals and recursion. I will use Haskell as pseudocode instead of Scheme.

長さとはカズ達と回数で
若しカズ達が空に等しいなら
回数ですけど、
そうでない、
【カズ達の尻尾】と【回数足す壱】の長さ
の事です。

丈とはカズ達で
カズ達と零の長さ
の事です。

本とは
カズ達は壱〜四十弐、
ナガサはカズ達の丈、
ナガサを見せる
の事です。

In Romaji:

Nagasa toha KAZUtachi to KAISUU de
Moshi KAZUtachi ga Kuu ni Hitoshii nara
KAISUU desukedo,
soudenai,
(KAZUtachi no Shippo) to (KAISUU Tasu Ichi) no Nagasa
no Koto desu.

Jou toha 
KAZUtachi de KAZUtachi to Zero no Nagasa 
no Koto desu.

Hon toha
KAZUtachi ha ichi .. yonjuuni,
NAGASA ha KAZUtachi no Jou,
NAGASA wo Miseru
no Koto desu.

Translation:

Longness is (the following thing):
With NUMBERs and COUNT,
If NUMBERs is equal to Empty, COUNT
but if this is not so
Longness of (Tail of NUMBERs) and (COUNT + 1)

Length is (the following thing):
With NUMBERs,
Longness of NUMBERs and 0

Main is (the following things):
NUMBERs is 1 .. 42;
LENGTH is Length of NUMBERs;
Show LENGTH

(長さ and 丈 both mean "length" but to avoid a naming conflict, I literally translate nagasa as "long-ness".)

Let's start with the main program:

カズ達は壱〜四十弐、
"KAZUtachi ha ichi .. yonjuuni"

kazutachi = [1 .. 42]

We create a list kazutachi with the range operation 〜.

ナガサはカズ達の丈、
"NAGASA ha KAZUtachi no Jou"

nagasa = jou kazutachi

We call the function jou on kazutachi and bind the result to nagasa.

ナガサを見せる
"NAGASA wo Miseru"

print nagasa

The function jou is quite simple:

丈とはカズ達でカズ達と零の長さの事です。
"Jou toha KAZUtachi de KAZUtachi to Zero no Nagasa no Koto desu."

jou kazutachi = nagasa kazutachi 0

Finally, nagasa:

Nagasa toha KAZUtachi to KAISUU de
Moshi KAZUtachi ga Kuu ni Hitoshii nara
KAISUU desukedo,
soudenai,
(KAZUtachi no Shippo) to (KAISUU Tasu Ichi) no Nagasa
no Koto desu.

In Haskell:

nagasa kazutachi kaisuu = 
    if kazutachi == [] 
        then kaisuu 
        else nagasa (tail kazutachi) (kaisuu+1)

Example 3: Summing a list

This example shows the use of comments, named functions, conditionals, let-bindings and recursion. I will again use Haskell for the equivalent code.

註 再帰関数の例の「加える」。

加えるとは
カズ達とサムで
〈カズ達の長さ〉が零に等しい場合は
サムですけれど、
そうでない場合は、
このノコカズ達とシンサムを加えるに
カズ・ノコカズ達がカズ達、
シンサムがサムにカズを足す
の事です。

註 主プログラム。

本とは
コタエは[三十四と八]と零を加える、
コタエを見せる
の事です。

In Romaji:

Chuu Saikikansuu no Rei no "Kuwaeru".

Kuwaeru toha
KAZUtachi to SAMU de
(KAZUtachi no Nagasa) ga zero ni hitoshii baaiha,
SAMU desukeredo,
soudenai baaiha,
kono NOKOKAZUtachi to SHINSAMU wo Kuwaeru ni
KAZU:NOKOKAZUtachi ga KAZUtachi,
SHINSAMU ga SAMU ni KAZU wo Tasu
no Koto desu.

Chuu ShuPUROGURAMU.

Hon toha
KOTAE ha [34 to 8] to 0 wo Kuwaeru,
KOTAE wo Miseru
no Koto desu.

Translation:

Comment: Example recursive function "Sum"

Sum is:
Given NUMBERs and SUM,
in case the length of NUMBERs is zero,
it is SUM but
in case this is not so,
in this Sum of REMAININGNUMBERS and NEWSUM,
NUMBER:REMAININGNUMBERS is NUMBERs;
NEWSUM is to Add NUMBER to SUM

Comment: Main program.

Main is:
ANSWER is Sum of [34 and 8] and 0;
to show the ANSWER    

Starting again with the main program:

"KOTAE ha [sanjuuyon to hachi] to zero wo Kuwaeru"

kotae = kuwaeru [34,8] 0

Then the function 加える (kuwaeru, "to sum"):

Kuwaeru to ha
KAZUtachi to SAMU de
(KAZUtachi no Nagasa) ga zero ni hitoshii baai ha
SAMU desukeredo,
soudenai baai ha,
kono NOKOKAZUtachi to SHINSAMU wo Kuwaeru ni
KAZU:NOKOKAZUtachi ga KAZUtachi
SHINSAMU ga SAMU ni KAZU wo Tasu
no Koto desu.    

This function uses the 場合 variant of the if-then-else:

〈カズ達の長さ〉が零に等しい場合は
サムですけれど、
そうでない場合は、
...

It also has a let-expression:

このノコカズ達とシンサムを加えるに
カズ・ノコカズ達がカズ達、
シンサムがサムにカズを足す

In Haskell this becomes:

kuwaeru kazutachi samu =
    if length kazutachi == 0 
        then samu 
        else
            let                
                kazu:nokokazutachi = kazutachi
                shinsamu = samu + kazu
            in
                kuwaeru nokokazutachi shinsamu

Language guide

  • Haku is a simple, mostly-pure, implicitly typed, strict functional language.
  • Think Scheme with a sprinkling of Haskell.
  • TODO: At the moment, all type checking is deferred to Raku, the implementation language. So currently Haku is not really strict and is closer to dynamically typed.

Punctuation

  • As Haku does not rely on whitespace, spaces and newlines are not delimiters.
  • Bindings in a let and expressions and bindings in the main program must be delimited by 、or 。.
  • At some places (e.g. after delimiters), newlines are allowed to ease readability.

Comments

All comments must start with 註 (chuu, "note") or 注 (chuu, "comment", so you can write 注意, "warning") and end with a 。. A newline is allowed after a comment.

Program structure

  • A Haku program source file can contain named function definitions and must contain a main program, called 本 (pronounced hon and meaning "main").

  • The main program differs from the functions in that functions must be pure and therefore consist of a single expression, whereas the main program can be a sequence of expressions, similar to the do-sequence in a Haskell main program. The main program is defined as

      本とは 
      <var1>は<rhs-expression1>、
      <var2>は<rhs-expression2>、
      ...
      <expression1>、
      <expression2>、
      ...
      の事です。
    

There are a few variants to make it sound a bit more formal:

  • 本とは can also be 本真とは. 本真 honma means "truth".
  • の事です。(no koto desu, "is this thing") can also be と言う事です (to iu koto desu, "the said thing")。

A newline is allowed after 本とは and after all bindings and expressions.

Identifiers

  • variables: the first character must be katakana; further characters katakana or number kanji. The last character can be 達 (tachi), to indicate a plural.
  • function names: must start with a kanji. If they are nouns, further characters are also kanji; if they are verbs, further characters are hiragana verb endings. If they are adjectives, the final character must be い i or な na.
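
The identifier rules above can be approximated with two regular expressions. This is a rough Python sketch, not the grammar Haku actually uses: the Unicode ranges and the set of number kanji are my assumptions, and the function-name pattern is deliberately looser than the noun/verb/adjective rules.

```python
import re

# Assumed character classes (not taken from the Haku sources).
KATAKANA = "\u30A1-\u30FA\u30FC"   # small a .. vo, plus the long-vowel mark
NUMBER_KANJI = "一二三四五六七八九十百千万壱弐参"
KANJI = "\u4E00-\u9FFF"
HIRAGANA = "\u3041-\u3096"

# Variable: katakana first, then katakana or number kanji, optional final 達.
VARIABLE = re.compile(f"[{KATAKANA}][{KATAKANA}{NUMBER_KANJI}]*達?")
# Function name (simplified): a kanji followed by more kanji or hiragana.
FUNCTION = re.compile(f"[{KANJI}][{KANJI}{HIRAGANA}]*")


def is_variable(name: str) -> bool:
    return VARIABLE.fullmatch(name) is not None


def is_function_name(name: str) -> bool:
    return FUNCTION.fullmatch(name) is not None


print(is_variable("カズ達"), is_function_name("見せる"))
```

With these patterns カズ達 and シンカズ are accepted as variables, while 本 and 見せる are only accepted as function names.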

Constants

  • integer: written using number kanji. For zero, either 零 (rei, means zero), ゼロ (ZERO, also means zero) or ◯ (maru, also zero). Negative number prefix is マイナス (MAINASU, "minus"), optional positive number prefix is プラス (PURASU, "plus"). All kanji for large numbers (億, 兆, 京, etc.) are supported; please read my article if you are not familiar with them.
  • rational: two integers separated by 点 (ten, "point")
  • string: quotes are「」or 『』
  • list: consists of identifiers or constants separated by "と" (to, "and"). To nest lists, wrap them in [...].
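
To make the integer notation concrete, here is a small Python sketch that converts kanji numerals to integers. It is an illustration under my own assumptions (the digit and unit tables, and only the large units 万, 億, 兆 and 京), not the conversion routine used by Haku.

```python
DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5, "六": 6, "七": 7,
          "八": 8, "九": 9, "壱": 1, "弐": 2, "参": 3}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12, "京": 10**16}
ZEROS = {"零", "ゼロ", "◯"}


def kanji_to_int(text: str) -> int:
    """Convert a kanji numeral, with optional sign prefix, to an int."""
    sign = 1
    if text.startswith("マイナス"):
        sign, text = -1, text[len("マイナス"):]
    elif text.startswith("プラス"):
        text = text[len("プラス"):]
    if text in ZEROS:
        return 0
    total = 0   # sum of completed 万/億/... groups
    group = 0   # value of the current group (below 10**4)
    digit = 0   # pending digit waiting for a unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            group += (digit or 1) * SMALL_UNITS[ch]   # a bare 十 counts as 10
            digit = 0
        elif ch in LARGE_UNITS:
            total += ((group + digit) or 1) * LARGE_UNITS[ch]
            group = digit = 0
        else:
            raise ValueError(f"not a number kanji: {ch!r}")
    return sign * (total + group + digit)


print(kanji_to_int("七千百"))
```

The call prints 7100, the second element of the list in Example 1.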

Named function definitions

In Haku, named function definitions are statements. The structure is

<function-name> とは <argument-list> で <expression> の事です。

The same closing variants as for 本 are allowed; a newline is allowed after とは, で and the expression. The function name should be either a verb, noun or adjective (-i or -na).

Lambdas

或 <argument-list> で <expression>

aru means "a certain ..., a given ..., some ...".

Function application

There are a few forms of function application:

  • Verb form:

      <arg-list> を [ <arg-list> で ] <function>
    
      味噌汁をスプーンで食べる
    
      misoshiru wo SUPUUN de Taberu
    
      To eat miso soup with a spoon
    
  • Adjectival verb form (single argument only):

      <function> <arg>
    
      青い信号
    
      Aoi Shingou
    
      green traffic light
    
  • Noun form:

      <arg-list> の [ 、 <arg-list> での ] <function>
    
      六と七の積
    
      Roku to Nana no Seki
    
      the product of six and seven
    
  • Adjective form:

      <function> <argument>
    
      送ったメッセージ
    
      Okutta MESSEEJI
    
      the sent message
    

The argument list can optionally be followed by の皆 (no minna, "all of"). This is used in particular when applying map or fold. Also, instead of で you can use のために or の為に (no tame ni, "in order to").

Partial application

The arg list can be followed by だけ or 丈 (dake, "only") to indicate partial application.
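
For readers unfamiliar with the idea, partial application supplies only some of a function's arguments and yields a function that waits for the rest. A minimal Python sketch, in which seki and roku_bai are hypothetical names and the choice of which argument gets fixed is an assumption rather than documented Haku behaviour:

```python
from functools import partial


def seki(x, y):
    """Stand-in for the noun-form product 積."""
    return x * y


# Supplying only the first argument yields a function that still
# expects the remaining one.
roku_bai = partial(seki, 6)
print(roku_bai(7))
```

Here roku_bai(7) evaluates to 42, the same as seki(6, 7).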

Map and Fold

Haku has built-in map and foldl:

  • foldl: 畳み込む (Tatamikomu, "to fold")

      <list>と<accumulator>を<nominal-function>で畳み込む
    
  • map: 写像する (Shazou suru, "to map")

      <list>の皆を<nominal-function>で写像する
    

A 'nominal function' is either a noun, variable, lambda expression, a verb nominalised with の (no) or こと (koto), or a -na adjective nominalised by dropping -na.

零〜四を〈或カズでカズ掛ける弐足す壱〉で写像する
仮二を逆で写像する

zero~yon wo (aru KAZU de KAZU Kakeru Ni Tasu Ichi) de Shazou suru

Map (with a given NUMBER, NUMBER times two plus one) to 0 .. 4
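
The meaning of the two built-ins can be sketched in Python as follows. The function names are just Romaji labels for this illustration, and the argument order of the combining function (accumulator first) follows the usual foldl convention, which I am assuming Haku shares.

```python
from functools import reduce


def tatamikomu(kazutachi, aku, f):
    """foldl: combine the accumulator with each element, left to right."""
    return reduce(f, kazutachi, aku)


def shazou(kazutachi, f):
    """map: apply f to every element."""
    return [f(kazu) for kazu in kazutachi]


kazutachi = list(range(0, 5))   # 零〜四, an inclusive range: 0 .. 4
print(tatamikomu(kazutachi, 0, lambda aku, kazu: aku + kazu))
print(shazou(kazutachi, lambda kazu: kazu * 2))
```

The fold sums the list to 10, and the map doubles every element, giving [0, 2, 4, 6, 8].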

Function composition

We use 後, 'のち' (nochi, "and then"), to compose functions:

<function1>後<function2>

Note that

f1後f2 

corresponds to

f2 . f1 

Currently, you have to bind them to a variable or wrap them in a lambda to apply them (TODO).
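
The order of composition is easy to get backwards, so here is a small Python sketch of it. The helper nochi and the two example functions are hypothetical names used only for this illustration.

```python
def nochi(f1, f2):
    """f1後f2: apply f1 first, and then f2 (Haskell's f2 . f1)."""
    return lambda x: f2(f1(x))


bai = lambda x: x * 2         # hypothetical "double"
tasu_ichi = lambda x: x + 1   # hypothetical "add one"

bai_nochi_tasu = nochi(bai, tasu_ichi)
print(bai_nochi_tasu(5))
```

Doubling first and then adding one gives 11 for the input 5; composing the other way round would give 12.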

Let binding

There are two forms. The first is more like a where-clause (expression at the start):

この <expression> に <variable> が <expression> 、... 。

A newline is allowed after この (kono, "this"), に and every bind expression.

The second is more like a conventional let (expression at the end):

●<variable>は <expression>、
●...
では<expression>

A newline is allowed after every bind expression. The ● is not full-width, so for vertical writing alternative bullet characters are also supported.

Conditional expressions

Similar to other Japanese natural programming languages, we use 若し or もし (moshi, "if, supposing") as the keyword to introduce an if-then-else. The condition can be either なら, ならば or 〜たら (all of them have a meaning close to "would be the case"). The 'true' expression can optionally be followed by ですけれども (desukeredomo, a polite "but") or variants; the 'false' branch is introduced by そうでなければ or そうでないなら (soudenakereba or soudenainara, "if that would not be the case").

For example

もし <cond-expression> ならば <expression> そうでなければ <expression> 。

そうでなければ can also be written そうでない. A newline and/or comma is allowed after なら and before and after そうでなければ. Before そうでなければ, an optional ですけれど/ですけど/ですけれども or ですが is allowed.

Currently, there is another form of if-then-else expression supported, which uses 場合 (baai, "in case"):

<cond-expression>場合は<expression>ですけれど、そうでない場合は<expression>

The ですけれど is optional and can also be ですけど, ですけれども or ですが; a newline and/or comma is allowed after 場合は and before そうでない.

Operators

Haku provides a minimal set of arithmetic and logical operations and numerical comparisons. Built-in operators in Haku can have a different syntax from ordinary function calls. There is no operator precedence handling, so combined expressions need parentheses.

Arithmetic

Verb form:

<expression> <verb> <expression>

or

<expression> と <expression> を <verb>

+: 足す (Tasu)
-: 引く (Hiku)
*: 掛ける (Kakeru)
/: 割る (Waru)

Noun form:

<expression> と <expression> の <noun>

+: 和 (Wa)
-: 差 (Sa)
*: 積 (Seki)
/: 除 (Jo)

Logical (TODO)

Boolean values:

True: 陽 (You)
False: 陰 (In)

(These are the kanji for yang and yin, so they do not really mean "true" and "false" but close connotations such as light and darkness, sun and moon, positive and negative.)

Operations:

A and B: A も B も
A or B: A また[は] B 
not A: 不A 

A mo B mo means "both A and B", mata means "or"

I will likely also support the formal names:

XOR: 排他的論理和 (Haitateki Ronriwa, exclusive logical sum)
OR: 論理和 (Ronriwa, logical sum)
AND: 論理積 (Ronriseki, logical product)
NOT: 論理否定 (Ronrihitei, logical negation)

Comparison

<expression> が <expression> <comparison-operation>

==: に等しい (ni hitoshii)
>: より多い (yori ooi)
<: より少ない (yori sukunai)

TODO:

>=: 以上 (ijou)
<=: 以下 (ika)

Lists

  • Haku lists are simply expressions separated by と (to), or by に (ni, "at, into"), から (kara, "from") or まで (made, "to"), without parentheses.

  • Square brackets (角括弧 kakugakko) [] are used for nesting lists.

  • The empty list is 空 (Kuu, "empty").

  • Lists have a minimal set of list manipulation functions:

      length: <list>の長さ (Nagasa)
      head: <list>の頭 (Atama)
      tail: <list>の尻尾 (Shippo)
      cons:・(中黒) (Nakaguro)
      concatenation: <list1>と<list2>を合わせる (Awaseru)
      range operator: <integer>〜<integer> 
      reverse: 逆な<list> (Gyaku na, an adjectival function)
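
For comparison, these are the Python list operations I would consider equivalent to the functions above. This is only a sketch of the intended semantics, with Romaji variable names chosen for the illustration.

```python
kazutachi = list(range(1, 43))      # range 壱〜四十弐: 1 .. 42, both ends included

nagasa = len(kazutachi)             # length: の長さ
atama = kazutachi[0]                # head: の頭
shippo = kazutachi[1:]              # tail: の尻尾
cons = [0] + kazutachi              # cons: element ・ list
awaseru = [1, 2] + [3, 4]           # concatenation: を合わせる
gyaku = list(reversed([1, 2, 3]))   # reverse: 逆な

print(nagasa, atama, shippo[0], cons[0], awaseru, gyaku)
```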
    

Maps

  • Maps ("dictionaries") are created from lists of pairs,

      <key1>と<value1>と...で図を作る 
    

    or from an empty list

      空で図を作る
    

    or shorter

      空図
    

図を作る Zu wo Tsukuru means "to create a map".

  • Maps support the following functions:

      has: <map>に<key>が有る (ga Aru)
      insert: <map>に<key>と<value>を入れる (wo Ireru)
      lookup: <map>に<key>を正引きする (TODO: 探索する) (Seibiki suru, Tansaku suru) 
      delete: <map>から<key>を消す
      length: <map>の長さ 
      keys:    <map>の鍵 (Kagi)
      values:  <map>の値 (Atai)
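
The map operations correspond closely to a dictionary in most languages. The Python sketch below assumes two things that this guide does not state: that the argument list of 図を作る alternates keys and values, and that insert and delete return a new map rather than modifying the old one.

```python
def zu_wo_tsukuru(*items):
    """Build a map from a flat key, value, key, value, ... sequence."""
    if len(items) % 2:
        raise ValueError("expected an even number of items (key/value pairs)")
    return dict(zip(items[0::2], items[1::2]))


zu = zu_wo_tsukuru("ichi", 1, "ni", 2)
has_ni = "ni" in zu                               # has
zu = {**zu, "san": 3}                             # insert (new map)
atai = zu["ichi"]                                 # lookup
zu = {k: v for k, v in zu.items() if k != "ni"}   # delete (new map)
nagasa = len(zu)                                  # length
kagi = list(zu.keys())                            # keys
atai_tachi = list(zu.values())                    # values

print(has_ni, atai, nagasa, kagi, atai_tachi)
```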
    

Interpolation in strings (TODO)

《バリュー》 returns a string.

System call (TODO)

機関で「<system call string>」する

機関 kikan means "system".

Any string will be passed on to the shell for execution.

I/O

  • The print function is called 見せる miseru, "to show", and returns the stringified argument.

      <arg>を見せる
    

TODO:

  • Minimal file I/O for text files only.

      open:
          <file>を<mode>の為に開ける (Akeru)
          where
          <mode>: 読む (Yomu, read) or 書く (Kaku, write)
          or
          <file>を開ける
          for read-write
    
      write: <string>を<filehandle>で書く
      read: 
          a single line: <filehandle>から一線を読む、(Issen, a single line)
          all lines: <filehandle>から全線を読む、(Zensen, all lines)
    
      close: <filehandle>を閉める (Shimeru)
    
      eof: <filehandle>の終了 (Shuuryou, TODO)
    

Types

TODO

Modules and imports

TODO

Expressiveness

Haku tries to be more like a natural language. Apart from adopting Japanese writing and word order, it does this mainly in two ways:

Verb conjugation on function calls

Haku lets you conjugate the verbs for a function call. For example:

  • Given a function send:

      送るとは... 
    

    and an argument message:

      メッセージ  
    
  • In Scheme:

      (send message)
    
  • Plain Haku

      メッセージを送る。
      MESSEEJI wo Okuru
      “To send a message”
    
  • Polite Haku

      メッセージを送って下さい。
      MESSEEJI wo Okutte kudasai
      “Please send the message”
    
  • Insistent Haku

      メッセージを送りなさい。
      MESSEEJI wo Okurinasai
      “Do send the message”
    
  • Adjectival

      送ったメッセージ。
      Okutta MESSEEJI
      “The sent message”
    

TODO: Not all of this works yet. Currently you can use the dictionary form (adjectival or plain) and the -te form with or without 下さい, しなさい and even くれて. And です can be で御座います (degozaimasu).

Choice of list separators and function application constructs

There is a lot of choice in how to express certain constructs, in particular function application. For example,

書類を読むの為に開ける
Shorui wo Yomu no tame ni Akeru
"to open the document for reading"  
open doc ReadOnly

辞書にカギとバリューを入れる
Jisho ni KAGI to BARYUU wo Ireru
"to insert a key and value in the dictionary"
insert dict key value

ジショにカギを正引きする
JISHO ni KAGI wo Seibiki suru
"to lookup a key in the dictionary"
lookup dict key

辞書からカギを消す
Jisho kara KAGI wo Kesu
"to delete a key from the dictionary"
delete dict key

カギとバリューから図を作る 
KAGI to BARYUU kara Zu wo Tsukuru
"to create a map from key and value"
fromList [(key,value)]

カズ達とアクを足すので畳み込む
KAZUtachi to AKU wo Tasu no de Tatamikomu
"to fold the numbers and the acc with addition"
foldl add acc xs

数達の皆を二倍で写像する
Kazutachi no minna wo nibai de Shazou suru
"to map all numbers with double"
map double xs

六に七を足す
Roku ni Nana wo Tasu
"add seven to six"
6+7

六で掛ける七
Roku de Kakeru Nana
"multiply seven with six"
6*7

六と七の積
Roku to Nana no Seki
"the product of six and seven"
6*7

I plan to add support for adjectives as well (TODO).

Motivation

The Wikipedia page on Non-English-based programming languages lists eight different languages based on Japanese. So why make a ninth one? The short answer is, to see what I would end up with. The slightly longer answer is that these other eight languages serve a practical purpose: they want to make programming easier for Japanese native speakers, and most of them target education.

My motivation to create Haku is very different. I don't want to create a practical language. I want to explore what the result is of creating a programming language based on a non-English language, in terms of syntax, grammar and vocabulary. In particular, I want to allow the programmer to control the register of the language to some extent (informal/polite/formal).

Syntax

I also want the language to be closer, at least visually, to literary Japanese. Therefore Haku does not use Roman letters, Arabic digits or common arithmetic, logical and comparison operators. And it supports top-to-bottom, right-to-left writing.

Grammar

The main motivation for Haku is the difference in grammar between Japanese and most Indo-European languages. In particular, it has subject-object-verb order. This makes the familiar programming constructs quite different.

Some time ago I ran a poll about how coders perceive function calls, and 3/4 of respondents answered "imperative" (other options were infinitive, noun, -ing form).

In Japanese, the imperative (命令形 meireikei, "command form") is rarely used. Therefore in Haku you can't use this form. Instead, you can use the plain form, -masu form or -te form, including -te kudasai. Whether a function is perceived as a verb or a noun is up to you, and the difference is clear from the syntax. If it is a noun, you can turn it into a verb by adding suru, and if it is a verb, you can add the 'no' or 'koto' nominalisers. And you can conjugate the verb forms.

Naming and giving meaning

In principle, a programming language does not need to be based on natural language at all. The notorious example is APL, which uses symbols for everything. Agda programmers also tend to use lots of mathematical symbols. It works because they are very familiar with those symbols. An interesting question is whether an experienced programmer who does not know Japanese could understand a Haku program; or if not, what the minimal changes would be to make it understandable.

To allow investigating that question, the Scheme and Raku emitters for Haku support (limited) transliteration to Romaji. And there is also Roku, but more about that later ...

Parsing

Japanese does not use spaces. So how do we tokenise a string of Japanese?

  • There are three writing systems: katakana (angular), hiragana (squiggly) and kanji (complicated).
  • Katakana is used in a similar way to italics.
  • Nouns, verbs, adjectives and adverbs normally start with a kanji.
  • Hiragana is used for verb/adjective/adverb endings and "particles", small words or suffixes that help identify the words in a sentence.
  • A verb/adjective/adverb can't end with a hiragana character that represents a particle.

So we have some simple tokenisation rules:

  • a sequence of katakana
  • a kanji followed by more kanji or hiragana that do not represent particles
  • hiragana that represent particles

Where that fails, we can introduce parentheses. In practice, only specific adverbs and adjectives are used in Haku. For example:

ラムダ|は|或|エクス|で|エクス|掛ける|エクス|です

ラムダ: katakana word
は: particle
或: pre-noun adjective
エクス: katakana word
で: particle
エクス: katakana word
掛ける: verb
エクス: katakana word 
です: verb (copula)
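
The tokenisation rules can be tried out with a few lines of Python. This is a simplified sketch and not the Raku grammar Haku uses: the particle list is a small assumed subset, and the Unicode ranges for the three scripts are approximations.

```python
# Assumed, incomplete particle list; longer entries come first so that
# です wins over で.
PARTICLES = ("です", "は", "で", "を", "の", "と", "が", "に")


def is_katakana(ch):
    return "\u30A1" <= ch <= "\u30FA" or ch == "\u30FC"


def is_hiragana(ch):
    return "\u3041" <= ch <= "\u3096"


def is_kanji(ch):
    return "\u4E00" <= ch <= "\u9FFF"


def particle_at(text, pos):
    """Return the particle starting at pos, or None."""
    return next((p for p in PARTICLES if text.startswith(p, pos)), None)


def tokenise(text):
    tokens, pos = [], 0
    while pos < len(text):
        start = pos
        if is_katakana(text[pos]):
            # Rule 1: a sequence of katakana.
            while pos < len(text) and is_katakana(text[pos]):
                pos += 1
        elif is_kanji(text[pos]):
            # Rule 2: kanji, then hiragana that do not start a particle.
            while pos < len(text) and is_kanji(text[pos]):
                pos += 1
            while (pos < len(text) and is_hiragana(text[pos])
                   and particle_at(text, pos) is None):
                pos += 1
        else:
            # Rule 3: hiragana that represent a particle.
            particle = particle_at(text, pos)
            if particle is None:
                raise ValueError(f"cannot tokenise at {pos}: {text[pos:]!r}")
            pos += len(particle)
        tokens.append(text[start:pos])
    return tokens


print("|".join(tokenise("ラムダは或エクスでエクス掛けるエクスです")))
```

For the example sentence this reproduces the split shown above. The sketch depends on the rule that a verb ending never coincides with a particle; with a fuller particle list, more care (or the parentheses mentioned earlier) would be needed.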

Implementation

Haku is implemented in Raku, a gradually-typed multi-paradigm language.
The parser uses Raku’s Grammars. It is a recursive descent, longest token match parser. The parser populates an AST using Raku’s Actions. The AST is an algebraic datatype implemented using Raku’s Parameterized Roles. The emitter generates Raku code which is executed via dynamic module loading. Currently all type checking is delegated to Raku.

About the name

I call it 'haku' because I like the sound of it, and also because that word can be written in many ways and mean many things in Japanese. I was definitely thinking about Haku from Spirited Away. Also, I like the resemblance with Raku, the implementation language. I would write it 珀 (amber) or 魄 (soul, spirit).