2015-04-26

いつかきっと

なんとなく夢想したことをメモしておく。

きっかけはRustでOSを書くというツイートを見たことから始まる。もとのツイートが探せない程度のツイート検索力なのはご容赦いただきたい。それを見たときにCでOSが書けるのはGCCがアセンブラとリンカを持っていてるからではないかと思ったわけだ。まぁ、実際にはリンカが重要で後は単にファイルフォーマットだと思うのだが。そこで、こいつらを全部Schemeで書いてしまえば、SchemeがC並みのことをできる、つまり現状ではほぼCの独占状態である低レベルプログラミングの一角をSchemeが崩すことができるとふと夢想した。

実際にこれらを実装するとなるとかなりの労力が必要になるので夢想するだけで終わらせようかなと思ったのだが、せっかくだし暇を見つけてできるところまでやってみようかなぁと思い立った。まず必要になりそうなのは以下だろうか?
  • アセンブラ(とりあえずx86、x64で)
  • リンカ(ELFとPE辺りで)
っで、この上にSchemeの処理系を作るとさらに以下が必要になる(と思われる):
  • libc相当の何か
  • コンパイラ
リンカを作るのならlibc相当の何かはlibcでいいような気もするが、折角なのでCに全く依存しない処理系にしたいところである。

とりあえず下調べしたところでは、Schemeで実装されたアセンブラは2つある。一つはx86用のsassyで、もう一つはindustriaに実験的に組み込まれているx86、x64用ライブラリ。どちらもELFを吐くことができる。個人的にはsassyがいいかなぁと思ったのだが、これをx64用に拡張するのは酷く面倒そうに見える。問題になるのはx86用のレジスタ等がハードコーディングされているので、いろいろ考えるとこの辺はルールベースで良しなに扱ってほしいところ。例えば、x86だけをサポートするようとかだとフラグ立てるよりは、組み込むルールを限定した方が変なバグを埋め込まない気がする。一番の問題は、僕がこの辺に明るくないことか。とりあえずNASMのソースを見つつ、Intelのマニュアルを見つつやるしかないだろう。

残念ながらSchemeで書かれたリンカは見つけられなかった。 リンカのことを扱った本があった気がするのだが、名前を思い出せない。LLVMのリンカ部分を読めばいいだろうか?一番の不安材料である(ここで挫折して放置する可能性が高い)。

libc相当はとりあえずLinux(気が向いたらWindowsも)限定にすればシステムコール上に構築すればいいだけなので、ひたすら地味な作業になるだけと踏んでいる。dietlibcのような小さい実装もあるし(読みやすいと踏んでいるだけ)。

コンパイラはCPS変換するものにしておけばcall/ccの実装が楽になる(はず)。マクロ展開は辛いところではあるが、 既に実装しているので何とかなると思いたい。最悪ポータブルなものを使ってしまってもいいわけだし。

全てをR6RSポータブルにすれば、ホスト処理系を(理論上は)限定する必要がなくなるはず。R7RSじゃないのは浮動少数点の扱いが弱いから。いずれ実装する必要があるとしても、楽できるところは楽したい。

さて、これ何かしら動くものができるのは何年後になるのかね?

2015-04-14

Comparison of er-macro-transformer 2

Previous article: Comparison of er-macro-transformer

I've heard er macro and syntax-case can do the same things (or they have almost the equal power). Now I'm wondering if this is really true or not. Once upon a time, there was an R6RS portable OOP library called nausicaa-oopp. For some reason its repository is removed from GitHub. The library used syntax-case to its limit (I think) and provided CL like multi method and other OOP things.

One of the remarkable thing of the library was it inserts identifiers which renamed by syntax-case or syntax and binds a macro to it, then inside of the macro it refers the original identifier. For example, it can do like this thing:
;; <beta> is a macro which represents a class of object o.
;; with-tags binds a macro which name is o, thus the o in its body
;; refers to the macro. however inside of the macro o, it refers
;; given o.
(define (beta-def-ref o)
  (with-tags ((o <beta>))
    (list (o d) (o e) (o f))))
If you just read the comment, it seems pretty much normal thing however the macro with-tags requires to have the same name as the o passed to beta-def-ref. Thus this would raise an error.
(define (beta-def-ref o)
  (with-tags ((oo <beta>))
    (list (oo d) (oo e) (oo f))))
Can this be done on er macro? The answer is yes. This is the piece of code: https://gist.github.com/ktakashi/63745cf1b7e0a018b64f

Then how portable is this? I know there is no concrete specification of er macro so I would expect this type of code may not be portable. So I've run it on Chicken, Chibi, Gauche and Sagittarius. And the following is the result:
Chicken:
dispatched!
dispatched!
dispatched!
(a a a)
dispatched!
dispatched!
dispatched!

Error: unbound variable: oo

 Call history:

 <eval>   (cdr2441 x2345)
 <eval>   (pair?2514 x2440)
 <eval>   (null?2515 (cdr2516 x2440))
 <eval>   (cdr2516 x2440)
 <eval>   (car2519 x2440)
 <eval>   (pair?2536 w2518)
 <eval>   (car2539 w2518)
 <eval>   (cdr2541 w2518)
 <eval>   (display (quote dispatched!))
 <eval>   (newline)
 <eval>   (r src)
 <syntax>   (beta-def-ref2 (quote a))
 <syntax>   (quote a)
 <syntax>   (##core#quote a)
 <eval>   (beta-def-ref2 (quote a))
 <eval>   [beta-def-ref2] (list2684 (oo2680 d2685) (oo2680 e2686) (oo2680 f2687)) <--

Chibi:
dispatched!
dispatched!
dispatched!
(a a a)
dispatched!
dispatched!
dispatched!
ERROR in beta-def-ref2: undefined variable: oo
  called from <anonymous> on line 1039 of file /home/takashi/sandbox/share/chibi/init-7.scm
  called from <anonymous> on line 542 of file /home/takashi/sandbox/share/chibi/init-7.scm
  called from <anonymous> on line 622 of file /home/takashi/sandbox/share/chibi/init-7.scm
Searching for modules exporting oo ...
... none found.

Gauche:
dispatched!
dispatched!
dispatched!
*** ERROR: unbound variable: o
    While loading "./er.scm" at line 154
Stack Trace:
_______________________________________
  0  o

  1  (beta-def-ref 'a)
        At line 154 of "./er.scm"

Sagittarius:
dispatched!
dispatched!
dispatched!
(a a a)
dispatched!
dispatched!
dispatched!
Unhandled exception
  Condition components:
  1. &undefined
  2. &who oo
  3. &message unbound variable oo in library user

stack trace:
  [1] beta-def-ref2
    src: (lambda (o) (with-tags ((oo <beta>)) (list (oo d) 
    "er.scm":157
  [2] load
Chicken, Chibi and Sagittarius worked as I expected. It seems Gauche can't resolve the original binding. I guess Chicken requires to put explicit phase when importing a library if I want to use it in procedural macro but I don't know how on R7RS extension. Updated on 11/05/2015: Using import-for-syntax resolves this. Thanks evhan!

During writing the code, I've notice a small thing. Chibi and Chicken accepts the following:
(define-syntax foo (er-macro-transformer (lambda (f r c) (r '(display "hoge")))))
(foo)
;;-> prints "hoge"
However Gauche and Sagittarius raise an error. As a user, it's convenient if I can rename a list with rename procedure but is this actually common in sense of er macro?

2015-04-10

Improving string->utf8 and utf8->string

The procedures string->utf8 and utf8->string can be categorised in those mostly used procedures (at least in my mind). Until 0.6.2, the conversion was done via textual port allocated however this approach allocates lots of chunk of memory (one conversion allocates at least 32bytes for binary, and 128bytes for text) and calls bunch of C function which may not be inlined because of function pointer.

If we don't know which encoding to use, then this is not bad. But we know it's UTF8. So I've made changes to use UCS4->UTF8 and UTF8->UCS4 conversion directly. And making sure memory allocation only happens once at a conversion. Now, if changes are for performance improvements, then I must measure how much it's improved. So, I've used the following piece of code to benchmark.
#!r6rs
(import (rnrs) (time))

(define t)

(define (file->string file) (call-with-input-file file get-string-all))

(define-syntax dotimes
  (syntax-rules ()
    ((_ (i count) body ...)
     (do ((i 0 (+ i 1)))
         ((= i count) #t)
       body ...))))

;; CMakeLists.txt has appox 20KB contents.
(define bv (string->utf8 (file->string "CMakeLists.txt")))
(define s (file->string "CMakeLists.txt"))

(time
 (dotimes (i 10000)
   (set! t (utf8->string bv))))

(time
 (dotimes (i 10000)
   (set! t (string->utf8 s))))
If I use small string or bytevector, then I probably wouldn't see much improvements, so I used a bit bigger one. (the CMakeLists.txt now contains 20KB of data... I feel like I need to refactor it...)

And the result.
$ sash test2.scm

;;  (dotimes (i 10000) (set! t (utf8->string bv)))
;;  4.170478 real    4.163981 user    2.8e-500 sys

;;  (dotimes (i 10000) (set! t (string->utf8 s)))
;;  2.646136 real    2.638321 user    0.003989 sys

$ ./build/sagittarius test2.scm

;;  (dotimes (i 10000) (set! t (utf8->string bv)))
;;  1.437317 real    1.431425 user    0.003978 sys

;;  (dotimes (i 10000) (set! t (string->utf8 s)))
;;  1.210154 real    1.208894 user    0.000000 sys
It's now as twice as faster than befure. The result doesn't show but the GC occurrence is also improved. (approx. it occurs a half number.) This is kinda obvious because it doesn't allocate unnecessary memory at all now.

I've also compared with other implementations, Vicare, Mosh and Ypsilon. To run on Vicare and Mosh, I needed to create the compatible layered (time) library like this:
;; time.vicare.sls
(library (time)
    (export time)
    (import (only (vicare) time)))

;; time.mosh.sls
(library (time)
    (export time)
    (import (only (mosh) time)))
I couldn't find out how to create such a library on Racket, so just skipped.
Then this is the results.
$ vicare -L . test2.scm
running stats for (dotimes (i 10000) (set! t (utf8->string bv))):
    103 collections
    1818 ms elapsed cpu time, including 98 ms collecting
    1821 ms elapsed real time, including 100 ms collecting
    864268560 bytes allocated
running stats for (dotimes (i 10000) (set! t (string->utf8 s))):
    26 collections
    1539 ms elapsed cpu time, including 7 ms collecting
    1539 ms elapsed real time, including 7 ms collecting
    216186272 bytes allocated
$ mosh --loadpath=. test2.scm

;;3.1819961071014404 real 3.178515 user 0.0 sys

;;6.042346000671387 real 6.038440000000001 user 0.0 sys

$ ypsilon test2.scm

;;  0.198673 real    0.211882 user    0.021317 sys

;;  0.087744 real    0.108816 user    0.012085 sys
Ypsilon is the fastest. It's because Ypsilon uses UTF-8 as its internal string representation. Thus it just needs to copy it without conversion.Whenever I saw this type of difference because of the different choice, I would always think I've made a wrong decision. Well, it's too late anyway...

For now, it's only for UTF8 conversion procedures but if I start using other encodings (e.g. UTF16) a lot, I might apply the same trick.