What you've described is a threaded Forth, which is indeed the most common type of implementation. I was thinking about writing a subroutine-threaded (no inner interpreter) Forth that inlines common primitives and does some peephole optimization. Not necessarily as compact, but much faster. Forth still provides a flexible, structured programming environment compared to raw assembly, to say nothing of Forth's metaprogramming facilities. I'd also say that a fair amount of Forth's compactness comes from the programming style: lots of tiny subroutines that allow extensive code reuse.
That's interesting. Sorry for not getting it right away. It's making more sense on second reading. :)
Now I'm curious how you'd encode subroutine calls. Your example was completely inlined, but the call sequence will be critical in determining how compact the code ends up.
Also, were you thinking of using a cross-compiler and keeping the dictionary out of the DCPU-16's memory? Let me know if you create a public repo. I'd be interested to see what you come up with!
No worries, I should've been more detailed in my original post; what I'm describing is definitely an atypical (if not unprecedented) Forth. I agree, the call sequence is critical. My initial thought was to ensure that we always leave the stack pointers in "return stack mode" (SP points at the return stack, X holds the data stack pointer) before calling, and return them to that state before exiting a routine. Thus, the procedures:
: A dup * ;
: B 2 A A 1 A ;
would look like:
A:
SET Y, SP
SET SP, X // switch to data
SET PUSH, PEEK
SET A, POP
SET B, POP
MUL A, B
SET PUSH, A
SET X, SP
SET SP, Y // switch to return
SET PC, POP
B:
SET Y, SP
SET SP, X // switch to data
SET PUSH, 2
SET X, SP
SET SP, Y // switch to return
JSR A
JSR A
SET Y, SP
SET SP, X // switch to data
SET PUSH, 1
SET X, SP
SET SP, Y // switch to return
JSR A
SET PC, POP
As you can see, it's a little bulky, but doable. Optimizing stack operations to make better use of the registers can get rather nasty; I'm still thinking about the best way to approach it. I'm definitely thinking in terms of a cross-compiler, and ignoring things like defining words for the sake of simplicity, at least at first. I'll drop you a comment if I get a prototype working.
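For what it's worth, here is a rough Python sketch of how a cross-compiler might emit that call sequence. Everything in it (compile_word, PRIMITIVES, TO_DATA, TO_RETURN) is a name made up for illustration, not part of any existing tool. It assumes only non-negative integer literals, treats any unknown word as a call to a label of the same name, and ends every routine with SET PC, POP. It only tracks which stack SP currently points at, so runs of inlined primitives and literals share one pair of mode switches.

```python
# Hypothetical sketch of a cross-compiler back end for the convention above.
# Routines are entered and left in "return stack mode": SP is the return
# stack pointer and X holds the data stack pointer.

TO_DATA = ["SET Y, SP", "SET SP, X // switch to data"]
TO_RETURN = ["SET X, SP", "SET SP, Y // switch to return"]

# Primitives that get inlined; each body assumes SP is the data stack pointer.
PRIMITIVES = {
    "dup": ["SET PUSH, PEEK"],
    "*": ["SET A, POP", "SET B, POP", "MUL A, B", "SET PUSH, A"],
}


def compile_word(source):
    """Compile one colon definition (': NAME body ;') to assembly lines."""
    tokens = source.split()
    if len(tokens) < 3 or tokens[0] != ":" or tokens[-1] != ";":
        raise ValueError("expected ': NAME body ;'")
    name, body = tokens[1], tokens[2:-1]

    out = [name + ":"]
    in_data_mode = False  # every routine starts in return stack mode
    for tok in body:
        if tok in PRIMITIVES:
            inline = PRIMITIVES[tok]
        elif tok.isdigit():
            inline = ["SET PUSH, " + tok]  # literal goes on the data stack
        else:
            inline = None  # assumed to be a call to another compiled word

        if inline is not None:
            if not in_data_mode:
                out += TO_DATA
                in_data_mode = True
            out += inline
        else:
            if in_data_mode:
                out += TO_RETURN
                in_data_mode = False
            out.append("JSR " + tok)

    if in_data_mode:
        out += TO_RETURN
    out.append("SET PC, POP")  # return to the caller
    return out


if __name__ == "__main__":
    for definition in (": A dup * ;", ": B 2 A A 1 A ;"):
        print("\n".join(compile_word(definition)))
```

Running it on the two definitions above should reproduce the listings for A and B. The mode flag is the obvious place to hang later optimizations, such as keeping the top of stack in a register across inlined primitives.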