My first try would probably be to use SP for one stack and some other fixed register for the other stack. Also, traditional Forth interpreters don't use code like you showed. I think the main benefit of using Forth (unless you just happen to love RPN) is the compactness of the interpreted code. If you're not going to use an interpreter, then I'm not sure if Forth is really a win.
Edit: Just to clarify about the interpreter point, I'd expect something like "1 2 + >r" to be represented at run time as four words in memory and there'd be an interpreter, written in machine code, reading code word sequences and jumping to small bits of machine code to either do a built-in function or push the call stack.
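To make the interpreter point concrete, here is a minimal Python sketch of that kind of inner interpreter. Everything in it is an assumption made for illustration: the names (`run`, `PRIMITIVES`, `WORDS`) are made up, `1` and `2` are treated as ordinary words that push a constant so the phrase really is four cells, and the call stack is kept separate from the `>r` stack to keep the toy simple (a real Forth uses one return stack for both).

```python
# Hypothetical threaded-code model: a word body is a list of cells, and each
# cell names either a primitive or another colon definition.
data_stack = []
return_stack = []

def lit(n):
    # Build a primitive that pushes a constant (stands in for words like 1 and 2).
    def push():
        data_stack.append(n)
    return push

def plus():
    b = data_stack.pop()
    a = data_stack.pop()
    data_stack.append((a + b) & 0xFFFF)  # 16-bit wraparound, as on the DCPU-16

def to_r():
    # >r moves the top of the data stack to the return stack.
    return_stack.append(data_stack.pop())

PRIMITIVES = {"1": lit(1), "2": lit(2), "+": plus, ">r": to_r}
WORDS = {}  # colon definitions: name -> list of cells

def run(body):
    """Inner interpreter: fetch the next cell, then dispatch on what it names."""
    call_stack = []  # saved (body, ip) pairs for nested colon definitions
    ip = 0
    while True:
        if ip == len(body):          # end of this body: return to the caller
            if not call_stack:
                return
            body, ip = call_stack.pop()
            continue
        cell = body[ip]
        ip += 1
        if cell in PRIMITIVES:       # built-in: run its machine-code stand-in
            PRIMITIVES[cell]()
        else:                        # colon definition: push the call stack
            call_stack.append((body, ip))
            body, ip = WORDS[cell], 0

program = ["1", "2", "+", ">r"]      # four cells in memory
run(program)
```

After `run(program)` the data stack is empty and the return stack holds the sum. The compactness comes from each cell costing one memory word regardless of how much machine code the primitive behind it needs.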
What you've described is a classic threaded-code Forth with an inner interpreter, which is indeed the most common type of implementation. I was thinking about writing a subroutine-threaded Forth (no inner interpreter; each word compiles to native calls) that inlines common primitives and does some peephole optimization. Not necessarily as compact, but much faster. Forth still provides a flexible, structured programming environment compared to raw assembly, to say nothing of Forth's metaprogramming facilities. I'd also say that a fair amount of Forth's compactness comes from the programming style: lots of tiny subroutines allowing extensive code reuse.
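As a rough idea of what one peephole rule could look like, here is a small Python sketch that works on emitted DCPU-16 assembly text. The rule itself is an assumption chosen for illustration, not something settled in this thread: a push immediately followed by a pop into a register is rewritten as a single move. The function name `peephole` is made up, and the sketch assumes one instruction per line with no trailing comments.

```python
import re

# Matches "SET PUSH, <operand>" and "SET <register>, POP" respectively.
PUSH = re.compile(r"^SET PUSH, (.+)$")
POP = re.compile(r"^SET ([A-CX-ZIJ]), POP$")

def peephole(lines):
    """Fuse 'SET PUSH, x' + 'SET r, POP' into 'SET r, x' (hypothetical rule)."""
    out = []
    for line in lines:
        pop = POP.match(line)
        push = PUSH.match(out[-1]) if out else None
        if pop and push:
            # The pushed value would be popped straight back off, so move it directly.
            out[-1] = "SET %s, %s" % (pop.group(1), push.group(1))
        else:
            out.append(line)
    return out

# Inlined body of "dup *" before optimization.
dup_mul = ["SET PUSH, PEEK", "SET A, POP", "SET B, POP", "MUL A, B", "SET PUSH, A"]
optimized = peephole(dup_mul)
```

On the inlined `dup *` sequence this drops one instruction: the duplicate never touches the stack and is read directly into `A`. A label between the two instructions blocks the rewrite, since the previous line then no longer matches the push pattern.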
That's interesting. Sorry for not getting it right away. It's making more sense on second reading. :)
Now I'm curious how you'd encode subroutine calls. Your example was completely inlined but the call-sequence will be critical in determining how compact the code ends up.
Also, were you thinking of using a cross-compiler and keeping the dictionary out of the DCPU-16's memory? Let me know if you create a public repo. I'd be interested to see what you come up with!
No worries, I should've been more detailed in my original post. What I'm describing is definitely an atypical (if not unprecedented) Forth. I agree, the call sequence is critical. My initial thought was to ensure that we always leave the stack pointers in "return stack mode" before calling, and return them to that state before exiting a routine. Thus, the procedures:
: A dup * ;
: B 2 A A 1 A ;
would look like:
A:
SET Y, SP
SET SP, X // switch to data
SET PUSH, PEEK
SET A, POP
SET B, POP
MUL A, B
SET PUSH, A
SET X, SP
SET SP, Y // switch to return
SET PC, POP
B:
SET Y, SP
SET SP, X // switch to data
SET PUSH, 2
SET X, SP
SET SP, Y // switch to return
JSR A
JSR A
SET Y, SP
SET SP, X // switch to data
SET PUSH, 1
SET X, SP
SET SP, Y // switch to return
JSR A
SET PC, POP // return to B's caller
As you can see, a little bulky, but doable. Optimizing stack operations to make better use of the registers can get rather nasty; I'm still thinking about the best way to approach it. I'm definitely thinking in terms of a cross-compiler, and ignoring things like defining words for the sake of simplicity, at least at first. I'll drop you a comment if I get a prototype working.
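For what it's worth, the calling convention above is mechanical enough to sketch as a tiny cross-compiler pass. The Python below is only an illustration under assumptions: the function names (`compile_word`, `compile_program`) and the two-entry primitive table are made up, only integer literals and previously defined words are handled, and every definition ends with `SET PC, POP`. It enters "data stack mode" before a run of inlined primitives or literals and switches back to "return stack mode" before each `JSR` and before returning.

```python
# Inlined bodies for the two primitives used in the example (hypothetical table).
PRIMITIVES = {
    "dup": ["SET PUSH, PEEK"],
    "*": ["SET A, POP", "SET B, POP", "MUL A, B", "SET PUSH, A"],
}
TO_DATA = ["SET Y, SP", "SET SP, X // switch to data"]
TO_RETURN = ["SET X, SP", "SET SP, Y // switch to return"]

def compile_word(name, tokens, words):
    """Emit assembly for one colon definition; `words` holds earlier definitions."""
    out = [name + ":"]
    in_data_mode = False
    for tok in tokens:
        if tok in words:                      # call: must be in return stack mode
            if in_data_mode:
                out += TO_RETURN
                in_data_mode = False
            out.append("JSR " + tok)
        else:                                 # primitive or literal: data stack mode
            if not in_data_mode:
                out += TO_DATA
                in_data_mode = True
            if tok in PRIMITIVES:
                out += PRIMITIVES[tok]
            else:
                out.append("SET PUSH, " + str(int(tok)))
    if in_data_mode:
        out += TO_RETURN
    out.append("SET PC, POP")                 # return via the return stack
    return out

def compile_program(source):
    """Compile a whitespace-separated string of ': NAME ... ;' definitions."""
    words, out = set(), []
    toks = source.split()
    i = 0
    while i < len(toks):
        assert toks[i] == ":", "expected a colon definition"
        end = toks.index(";", i)
        name = toks[i + 1]
        out += compile_word(name, toks[i + 2:end], words)
        words.add(name)
        i = end + 1
    return out

asm = compile_program(": A dup * ; : B 2 A A 1 A ;")
```

Printing `"\n".join(asm)` gives output in the same shape as the listing. Tracking the current mode in the compiler means consecutive calls such as `A A` need no switching between them, which is where most of the bulk would otherwise come from.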