No worries, I should've been more detailed in my original post. What I'm describing is definitely an atypical (if not unprecedented) Forth. I agree that the calling sequence is critical. My initial thought was to always leave the stack pointers in "return stack mode" (SP pointing at the return stack, with the data stack pointer parked in X) before calling, and to restore that state before exiting a routine. Thus, the definitions:
```forth
: A dup * ;
: B 2 A A 1 A ;
```
would look like:
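```
:word_A               ; A and B are also register names, so the labels get a prefix
    SET Y, SP         ; save the return stack pointer
    SET SP, X         ; switch to data stack
    SET PUSH, PEEK    ; dup
    SET A, POP
    SET B, POP
    MUL A, B          ; *
    SET PUSH, A
    SET X, SP         ; park the data stack pointer
    SET SP, Y         ; switch to return stack
    SET PC, POP       ; return

:word_B
    SET Y, SP
    SET SP, X         ; switch to data stack
    SET PUSH, 2
    SET X, SP
    SET SP, Y         ; switch to return stack
    JSR word_A
    JSR word_A
    SET Y, SP
    SET SP, X         ; switch to data stack
    SET PUSH, 1
    SET X, SP
    SET SP, Y         ; switch to return stack
    JSR word_A
    SET PC, POP       ; return
```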
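To sanity-check the convention, here's a small Python model of it. It's a sketch, not an emulator: one 64K-word memory, SP as the return stack pointer, X holding the parked data stack pointer, and Y as scratch. PUSH pre-decrements and POP post-increments with 16-bit wraparound, as on the DCPU-16. The `Cpu` class, the stack base addresses, and the return addresses are all made up for illustration, and B is assumed to end with a `SET PC, POP` return like A does.

```python
from typing import Callable

MASK = 0xFFFF  # DCPU-16 words and addresses are 16 bits


class Cpu:
    """Hypothetical model: just memory plus the SP, X and Y registers."""

    def __init__(self, data_sp: int, ret_sp: int) -> None:
        self.mem = [0] * 0x10000
        self.sp = ret_sp  # "return stack mode": SP is the return stack
        self.x = data_sp  # data stack pointer parked in X
        self.y = 0        # scratch: holds the return SP while in data mode

    def push(self, value: int) -> None:  # SET PUSH, value
        self.sp = (self.sp - 1) & MASK
        self.mem[self.sp] = value & MASK

    def pop(self) -> int:  # SET ..., POP
        value = self.mem[self.sp]
        self.sp = (self.sp + 1) & MASK
        return value

    def to_data(self) -> None:  # SET Y, SP / SET SP, X
        self.y, self.sp = self.sp, self.x

    def to_return(self) -> None:  # SET X, SP / SET SP, Y
        self.x, self.sp = self.sp, self.y

    def jsr(self, routine: Callable[["Cpu"], int], ret_addr: int) -> None:
        self.push(ret_addr)     # JSR pushes the return address
        popped = routine(self)  # the callee's SET PC, POP pops it
        assert popped == ret_addr, "return stack out of balance"


def word_a(cpu: Cpu) -> int:  # : A dup * ;
    cpu.to_data()
    cpu.push(cpu.mem[cpu.sp])  # SET PUSH, PEEK  (dup)
    a = cpu.pop()              # SET A, POP
    b = cpu.pop()              # SET B, POP
    cpu.push(a * b)            # MUL A, B / SET PUSH, A (low 16 bits kept)
    cpu.to_return()
    return cpu.pop()           # SET PC, POP


def word_b(cpu: Cpu) -> int:  # : B 2 A A 1 A ;
    cpu.to_data()
    cpu.push(2)                # SET PUSH, 2
    cpu.to_return()
    cpu.jsr(word_a, 0x0101)    # JSR A (made-up return addresses)
    cpu.jsr(word_a, 0x0102)    # JSR A
    cpu.to_data()
    cpu.push(1)                # SET PUSH, 1
    cpu.to_return()
    cpu.jsr(word_a, 0x0103)    # JSR A
    return cpu.pop()           # SET PC, POP


def data_stack(cpu: Cpu, base: int) -> list[int]:
    """Data stack contents, bottom to top (the stack grows downward)."""
    return [cpu.mem[addr] for addr in range(base - 1, cpu.x - 1, -1)]


cpu = Cpu(data_sp=0xF000, ret_sp=0x0000)
cpu.jsr(word_b, 0x0100)
print(data_stack(cpu, 0xF000))  # [16, 1]
print(hex(cpu.sp))              # 0x0, return stack back where it started
```

The asserts in `jsr` are the interesting part: they only pass if every routine hands SP back in return-stack mode at exactly the depth it found it.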
As you can see, it's a little bulky, but doable: every stretch of data-stack code is wrapped in four stack-switching instructions. Optimizing stack operations to make better use of the registers can get rather nasty, and I'm still thinking about the best way to approach it. I'm definitely thinking in terms of a cross-compiler, and ignoring things like defining words for the sake of simplicity, at least at first. I'll drop you a comment if I get a prototype working.
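Since a cross-compiler is the plan anyway, here's roughly how that call sequence could be generated, as a Python sketch. The primitive table covers only `dup` and `*`, every other non-numeric token is assumed to be a call to another colon definition, and the `word_` label prefix (needed because A and B are also DCPU-16 register names) is my own choice. One cheap reduction of the bulk: consecutive literals and inline primitives share a single switch into and out of the data stack, so a run like `2 3 *` gets one switch pair instead of one per token.

```python
# Hypothetical primitive table: inline DCPU-16 code that runs with SP
# pointing at the data stack.
PRIMITIVES = {
    "dup": ["SET PUSH, PEEK"],
    "*": ["SET A, POP", "SET B, POP", "MUL A, B", "SET PUSH, A"],
}

TO_DATA = ["SET Y, SP", "SET SP, X"]    # enter data-stack mode
TO_RETURN = ["SET X, SP", "SET SP, Y"]  # back to return-stack mode


def label(name: str) -> str:
    # Prefix so Forth names like A and B don't collide with registers.
    return "word_" + name


def compile_word(name: str, body: str) -> list[str]:
    """Compile ': name body ;' to DCPU-16 assembly lines."""
    out = [":" + label(name)]
    inline: list[str] = []  # pending data-mode code

    def flush() -> None:
        # Wrap any pending data-mode code in one switch in and out.
        if inline:
            out.extend(TO_DATA + inline + TO_RETURN)
            inline.clear()

    for token in body.split():
        if token in PRIMITIVES:
            inline.extend(PRIMITIVES[token])
        elif token.lstrip("-").isdigit():
            inline.append(f"SET PUSH, {int(token) & 0xFFFF}")
        else:
            flush()  # calls happen in return-stack mode
            out.append(f"JSR {label(token)}")
    flush()
    out.append("SET PC, POP")
    return out


print("\n".join(compile_word("A", "dup *")))
print("\n".join(compile_word("B", "2 A A 1 A")))
```

With this table, `compile_word("A", "dup *")` reproduces the body of A instruction for instruction, and B comes out with one switch pair around each literal and a closing `SET PC, POP`.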