Same here. The main design decision is how to represent the secondary stack. I was thinking we could reserve a pair of registers to keep the parameter stack pointer and return stack pointer, and swap them out with SP as needed.
1 2 + >r
becomes something like
SET PUSH, 0x1
SET PUSH, 0x2
SET A, POP
ADD A, POP
SET X, SP ; back up data stack pointer
SET SP, Y ; switch to return stack
SET PUSH, A
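A minimal Python model of what that sequence does to the two stacks. The memory layout is an assumption, not something from the thread: the data stack starts at 0xF000 and the return stack at 0x0000, so its first push wraps around to 0xFFFF. The overflow register is ignored.

```python
# Model of parking one stack pointer in a register while the other is in SP.
# Assumed layout: data stack base 0xF000, return stack base 0x0000 (its first
# push lands at 0xFFFF because SP wraps). The O (overflow) register is ignored.

MASK = 0xFFFF


class Cpu:
    def __init__(self, data_base=0xF000, return_base=0x0000):
        self.mem = [0] * 0x10000
        # Start in "data mode": SP is the data stack, Y parks the return stack.
        self.reg = {"A": 0, "X": 0, "Y": return_base, "SP": data_base}

    def push(self, value):
        # PUSH means [--SP]
        self.reg["SP"] = (self.reg["SP"] - 1) & MASK
        self.mem[self.reg["SP"]] = value & MASK

    def pop(self):
        # POP means [SP++]
        value = self.mem[self.reg["SP"]]
        self.reg["SP"] = (self.reg["SP"] + 1) & MASK
        return value


def one_two_plus_to_r(cpu):
    """Execute the '1 2 + >r' sequence, one line per instruction."""
    r = cpu.reg
    cpu.push(0x1)                         # SET PUSH, 0x1
    cpu.push(0x2)                         # SET PUSH, 0x2
    r["A"] = cpu.pop()                    # SET A, POP
    r["A"] = (r["A"] + cpu.pop()) & MASK  # ADD A, POP
    r["X"] = r["SP"]                      # SET X, SP (back up data stack pointer)
    r["SP"] = r["Y"]                      # SET SP, Y (switch to return stack)
    cpu.push(r["A"])                      # SET PUSH, A


cpu = Cpu()
one_two_plus_to_r(cpu)
print(hex(cpu.reg["SP"]), cpu.mem[cpu.reg["SP"]])  # 0xffff 3
print(hex(cpu.reg["X"]))                           # 0xf000
```

The sum ends up on the return stack, and X holds the data stack pointer back at its base, i.e. the data stack is empty again.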
My first try would probably be to use SP for one stack and some other fixed register for the other stack. Also, traditional Forth interpreters don't compile to inline machine code like you showed. I think the main benefit of using Forth (unless you just happen to love RPN) is the compactness of the interpreted code. If you're not going to use an interpreter, then I'm not sure Forth is really a win.
Edit: To clarify the interpreter point: I'd expect something like "1 2 + >r" to be represented at run time as four words in memory. An interpreter written in machine code would read that sequence of code words and, for each one, jump to a small bit of machine code that either performs a built-in function or pushes the return address onto the call stack and enters a colon definition.
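A rough Python model of that kind of inner interpreter, just to make the control flow concrete. It assumes `1` and `2` are predefined constant words so that `1 2 + >r` really is four cells; many Forths instead compile a literal as a `LIT` cell followed by the value. The class and word names are hypothetical, and the model doesn't distinguish indirect from direct threading.

```python
# Toy threaded-code inner interpreter. A thread is a list of word objects;
# primitives stand in for "small bits of machine code", colon definitions
# are nested threads entered by pushing the return address.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Primitive:
    name: str
    fn: Callable  # called with the VM


@dataclass
class Colon:
    name: str
    body: list    # cells of the definition, ending with EXIT


class Vm:
    def __init__(self):
        self.data = []     # parameter stack
        self.rstack = []   # return stack: saved IPs and >r values
        self.ip = None     # (thread, index) of the next cell to execute

    def run(self, thread):
        self.ip = (thread, 0)
        while self.ip is not None:
            code, i = self.ip
            word = code[i]
            self.ip = (code, i + 1)          # NEXT: advance before executing
            if isinstance(word, Colon):
                self.rstack.append(self.ip)  # nest: push the return address
                self.ip = (word.body, 0)
            else:
                word.fn(self)                # built-in: run it directly


def _binary(op):
    def fn(vm):
        b, a = vm.data.pop(), vm.data.pop()
        vm.data.append(op(a, b) & 0xFFFF)    # 16-bit cells
    return fn


ONE = Primitive("1", lambda vm: vm.data.append(1))
TWO = Primitive("2", lambda vm: vm.data.append(2))
PLUS = Primitive("+", _binary(lambda a, b: a + b))
STAR = Primitive("*", _binary(lambda a, b: a * b))
DUP = Primitive("dup", lambda vm: vm.data.append(vm.data[-1]))
TO_R = Primitive(">r", lambda vm: vm.rstack.append(vm.data.pop()))
EXIT = Primitive("exit", lambda vm: setattr(vm, "ip", vm.rstack.pop()))
BYE = Primitive("bye", lambda vm: setattr(vm, "ip", None))

vm = Vm()
vm.run([ONE, TWO, PLUS, TO_R, BYE])      # 1 2 + >r
print(vm.data, vm.rstack)                # [] [3]

SQUARE = Colon("A", [DUP, STAR, EXIT])   # : A dup * ;
vm = Vm()
vm.run([TWO, SQUARE, BYE])
print(vm.data)                           # [4]
```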
What you've described is a threaded Forth, which is indeed the most common type of implementation. I was thinking about writing a subroutine-threaded (no inner interpreter) Forth that inlines common primitives and does some peephole optimization. It's not necessarily as compact, but it's much faster. Forth still provides a flexible, structured programming environment compared to raw assembly, to say nothing of Forth's metaprogramming facilities. I'd also say that a fair amount of Forth's compactness comes from the programming style: lots of tiny subroutines allowing extensive code reuse.
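Peephole optimization matters for the subroutine-threaded approach because naive per-token code generation switches stacks constantly. A hypothetical sketch in Python, assuming the convention of parking the data stack pointer in X and the return stack pointer in Y while the other one is in SP: a switch immediately followed by the opposite switch can be deleted. The function name and instruction spellings are illustrations only.

```python
# Hypothetical peephole pass over assembly emitted one Forth token at a time.
# Assumed convention: in "data mode" SP is the data stack and Y parks the
# return stack pointer; in "return mode" SP is the return stack and X parks
# the data stack pointer. A switch followed by the opposite switch leaves SP
# unchanged and only rewrites whichever of X/Y is not parking a pointer in
# that mode, so the four lines can go (safe only if X and Y have no other use).

TO_RETURN = ["SET X, SP", "SET SP, Y"]
TO_DATA = ["SET Y, SP", "SET SP, X"]
REDUNDANT = [TO_RETURN + TO_DATA, TO_DATA + TO_RETURN]


def peephole(lines):
    out = []
    for line in lines:
        out.append(line.strip())
        # Checking after every append also catches pairs that only become
        # adjacent once an inner pair has been deleted.
        if out[-4:] in REDUNDANT:
            del out[-4:]
    return out


# "2 1" compiled naively, switching stacks around each literal.
naive = [
    "SET Y, SP", "SET SP, X", "SET PUSH, 2", "SET X, SP", "SET SP, Y",
    "SET Y, SP", "SET SP, X", "SET PUSH, 1", "SET X, SP", "SET SP, Y",
]
for line in peephole(naive):
    print(line)
```

The ten naive lines shrink to six: one switch to the data stack, both pushes, and one switch back.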
That's interesting. Sorry for not getting it right away. It's making more sense on second reading. :)
Now I'm curious how you'd encode subroutine calls. Your example was completely inlined, but the call sequence will be critical in determining how compact the code ends up.
Also, were you thinking of using a cross-compiler and keeping the dictionary out of the DCPU-16's memory? Let me know if you create a public repo. I'd be interested to see what you come up with!
No worries, I should've been more detailed in my original post; what I'm describing is definitely an atypical (if not unprecedented) Forth. I agree, the call sequence is critical. My initial thought was to ensure that we always leave the stack pointers in "return stack mode" before calling, and return them to that state before exiting a routine. Thus, the procedures:
: A dup * ;
: B 2 A A 1 A ;
would look like:
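w_A: ; Forth word A (prefixed so JSR w_A can't be read as register A)
SET Y, SP ; save return stack pointer
SET SP, X ; switch to data
SET PUSH, PEEK ; dup
SET A, POP ; *
SET B, POP
MUL A, B
SET PUSH, A
SET X, SP ; save data stack pointer
SET SP, Y ; switch to return
SET PC, POP ; return to caller
w_B: ; Forth word B
SET Y, SP
SET SP, X ; switch to data
SET PUSH, 2
SET X, SP
SET SP, Y ; switch to return
JSR w_A
JSR w_A
SET Y, SP
SET SP, X ; switch to data
SET PUSH, 1
SET X, SP
SET SP, Y ; switch to return
JSR w_A
SET PC, POP ; return to caller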
As you can see, it's a little bulky, but doable. Optimizing stack operations to make better use of the registers can get rather nasty; I'm still thinking about the best way to approach it. I'm definitely thinking in terms of a cross-compiler, and ignoring things like defining words for the sake of simplicity, at least at first. I'll drop you a comment if I get a prototype working.
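A rough sketch of such a cross-compiler in Python, assuming the "return stack mode" convention described above. `compile_forth`, the `w_` label prefix and the three inlined primitives are hypothetical. The dictionary lives on the host, so none of it ends up in DCPU-16 memory. Because the compiler tracks the current mode, it only emits a switch when the mode actually changes; for `: A dup * ; : B 2 A A 1 A ;` it produces the same instruction sequence as the listing above, using `w_`-prefixed labels (so a call can't be read as `JSR` to register A) and ending every word with `SET PC, POP`.

```python
# Sketch of a subroutine-threaded cross-compiler. Every word is entered and
# left in "return mode" (SP = return stack, X parks the data stack pointer).
# Hypothetical names throughout; only non-negative decimal literals handled.

PRIMITIVES = {  # inlined primitives; they run in data mode
    "dup": ["SET PUSH, PEEK"],
    "*": ["SET A, POP", "SET B, POP", "MUL A, B", "SET PUSH, A"],
    "+": ["SET A, POP", "ADD A, POP", "SET PUSH, A"],
}
TO_DATA = ["SET Y, SP", "SET SP, X"]    # return mode -> data mode
TO_RETURN = ["SET X, SP", "SET SP, Y"]  # data mode -> return mode


def compile_forth(source):
    dictionary = {}  # lives on the host: Forth name -> assembly label
    out = []
    tokens = source.split()
    i = 0
    while i < len(tokens):
        if tokens[i] != ":":
            raise ValueError(f"expected ':' but found {tokens[i]!r}")
        name, label = tokens[i + 1], "w_" + tokens[i + 1]
        out.append(label + ":")
        mode = "return"  # callers always JSR in return mode
        i += 2
        while tokens[i] != ";":
            tok = tokens[i]
            if tok in PRIMITIVES or tok.isdigit():
                if mode == "return":
                    out += TO_DATA
                    mode = "data"
                if tok in PRIMITIVES:
                    out += PRIMITIVES[tok]
                else:
                    out.append(f"SET PUSH, {tok}")
            elif tok in dictionary:
                if mode == "data":
                    out += TO_RETURN
                    mode = "return"
                out.append("JSR " + dictionary[tok])
            else:
                raise ValueError(f"undefined word {tok!r}")
            i += 1
        if mode == "data":
            out += TO_RETURN  # leave the word in return mode
        out.append("SET PC, POP")
        dictionary[name] = label  # as in Forth, visible after ';'
        i += 1
    return out


for line in compile_forth(": A dup * ; : B 2 A A 1 A ;"):
    print(line)
```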
Oh yeah, you could whip a sweet little Forth system up for this processor.
The first thing to do is to write a DCPU-16 assembler in Forth, and use that to write the primitives. That's pretty simple -- just look at the 6502 assembler: http://www.forth.org/fd/FD-V03N5.pdf
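Most of what such an assembler does is pack fields into 16-bit words. A Python sketch of that encoding for a few instructions, assuming the DCPU-16 1.1 layout (later spec revisions changed the field widths and how JSR is encoded). Only registers, PUSH/POP/PEEK/SP/PC/O and numeric literals are handled: no memory operands and no labels. Python is used here only so the bit layout is easy to check; the suggestion above is to write the real assembler in Forth.

```python
# Encoder for a subset of DCPU-16 1.1 instructions.
# Basic:     bbbbbbaaaaaaoooo  (o = opcode, a = first operand, b = second)
# Non-basic: aaaaaaoooooo0000  (o = 0x01 is JSR)
# Operands that need a "next word" append it after the instruction word,
# a's before b's.

REGISTERS = ["A", "B", "C", "X", "Y", "Z", "I", "J"]
SPECIAL = {"POP": 0x18, "PEEK": 0x19, "PUSH": 0x1A,
           "SP": 0x1B, "PC": 0x1C, "O": 0x1D}
BASIC = {"SET": 0x1, "ADD": 0x2, "SUB": 0x3, "MUL": 0x4, "DIV": 0x5,
         "MOD": 0x6, "SHL": 0x7, "SHR": 0x8, "AND": 0x9, "BOR": 0xA,
         "XOR": 0xB, "IFE": 0xC, "IFN": 0xD, "IFG": 0xE, "IFB": 0xF}


def operand(text):
    """Return (6-bit operand code, next word or None)."""
    text = text.strip()
    if text in REGISTERS:
        return REGISTERS.index(text), None
    if text in SPECIAL:
        return SPECIAL[text], None
    n = int(text, 0)              # "10", "0x30", ...
    if 0 <= n <= 0x1F:
        return 0x20 + n, None     # short literal, no extra word
    return 0x1F, n & 0xFFFF       # literal stored in the next word


def assemble(line):
    """Assemble one instruction, e.g. 'SET PC, POP', into 16-bit words."""
    op, _, rest = line.strip().partition(" ")
    args = [operand(t) for t in rest.split(",")]
    extra = [word for _, word in args if word is not None]
    if op == "JSR":
        (a, _), = args
        return [(a << 10) | (0x01 << 4)] + extra
    (a, _), (b, _) = args
    return [(b << 10) | (a << 4) | BASIC[op]] + extra


for text in ["SET A, 0x30", "SET PUSH, PEEK", "MUL A, B", "JSR 0x1000"]:
    print(text, "->", " ".join(f"{w:04x}" for w in assemble(text)))
# SET A, 0x30 -> 7c01 0030
# SET PUSH, PEEK -> 65a1
# MUL A, B -> 0404
# JSR 0x1000 -> 7c10 1000
```

With the target address in the following word, each `JSR` costs two words, against one cell per call in a threaded Forth, which is the compactness trade-off discussed earlier in the thread.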
Using a metacompiler with a Forth DCPU-16 assembler would be the best way to go. Then you could easily experiment with different threading schemes, stack architectures, etc.
FORTH ?KNOW IF HONK ELSE FORTH LEARN THEN