The update that describes why 4KB of RAM was necessary was interesting to me. In 1998 I wrote a Java Virtual Machine in assembly language for a PIC-like microcontroller that only had 4k of ROM (actually it was 4k instructions, but they were 12-bit words, giving 6kB). I managed to get all of the important features of the language to fit, including exception handling and an interactive debugger. However, towards the end, each time I wanted to add a new instruction to my code I had to find something else to optimize first so it would still fit in 4k.
By the end I convinced myself that 4k was the minimum possible code size for implementing a JVM.
As an aside, the SX microcontroller only had 256 bytes (yes bytes!) of RAM internally so I used an external 32kB SRAM for the Java heap and stack. The 256 bytes of internal RAM held the internal state of the interpreter and memory manager.
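For anyone checking the "4k instructions, giving 6kB" figure above, the arithmetic can be spelled out as a quick sketch. The variable names are made up for illustration; only the 4k and 12-bit numbers come from the comment.

```python
# Program memory size from the figures quoted above.
INSTRUCTION_WORDS = 4 * 1024   # 4k instruction words
BITS_PER_WORD = 12             # each instruction word is 12 bits wide

total_bits = INSTRUCTION_WORDS * BITS_PER_WORD
total_bytes = total_bits // 8

print(total_bytes)             # 6144
print(total_bytes / 1024)      # 6.0
```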
Around the same time, I wrote a JVM for the PalmPilot [1]. I think it had to fit into 128K of RAM, with an additional 256K of memory for data (like the class libraries). I was able to fit the entire JDK 1.0.2 runtime into it by transforming the byte code.
The hardware was interesting too. It bit-banged the video display circuit, leaving only the re-trace interval for general purpose computation, including running BASIC programs. Yet the result was an extremely simple and inexpensive design.
Every early microcomputer had to come up with a solution to the video display problem. As I understand it, the Apple II had a bus architecture that interleaved the clock timing of the video circuit and the CPU, sharing a single bus. The Commodore computers had custom video graphics chips. And so forth.
Reading back over the assembly code gives me fond memories - but I am glad I don't spend my nights debugging my code with an oscilloscope and flashing LEDs anymore. It is a bit more productive coding in Ruby!
The JVM had some limitations to fit into 4k: no floating point, 16-bit integers and no dynamic linking. There was a PC program that took the Java class files, statically linked them and translated the bytecode to a more compressed form for download to the chip.
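To give a feel for what "statically linked and translated" could mean, here is a toy sketch. It is not the actual PC tool: the opcode names, the `classes` layout and the `static_link` function are all invented for illustration. The idea shown is only that symbolic call targets get replaced by small fixed indices before download, so the chip never has to look anything up by name.

```python
# Hypothetical, heavily simplified model of class files: each method is a
# list of (opcode, operand) pairs, and calls name their target symbolically.
classes = {
    "Blink": {
        "main": [("push", 1), ("call", "Led.set"), ("return", None)],
    },
    "Led": {
        "set": [("store", 0), ("return", None)],
    },
}

def static_link(classes):
    """Resolve every symbolic call to a fixed index into one method table."""
    # Pass 1: assign each method a fixed index.
    index = {}
    for cls_name, methods in classes.items():
        for m_name in methods:
            index[f"{cls_name}.{m_name}"] = len(index)

    # Pass 2: rewrite call operands from names to indices.
    table = [None] * len(index)
    for cls_name, methods in classes.items():
        for m_name, code in methods.items():
            linked = [
                (op, index[arg]) if op == "call" else (op, arg)
                for op, arg in code
            ]
            table[index[f"{cls_name}.{m_name}"]] = linked
    return index, table

index, table = static_link(classes)
print(table[0])  # the call operand is now a number, not a name
```

After linking, nothing in `table` refers to a name, which is roughly why dynamic linking can be dropped from the interpreter on the chip.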
At the time I wrote this code there was a lot of talk about Java-specific processors. Ultimately that was a mistake because Moore's law meant the x86 architecture (and JIT technology) got faster more quickly than anyone could get chips to market.
These days I am about as far from embedded systems as you can get. I write Ruby on Rails code for aha.io. There is a lot of satisfaction in seeing the perfect waveform on your oscilloscope when the assembly code finally works - but these days I think there is much more satisfaction in being able to crank out a complex algorithm with some elegant Ruby and have it in use by customers before going home for dinner.
Well, it has been 16 years since I wrote most of this, so my memory is a bit fuzzy...
The stack and heap are both stored in an external 32kB SRAM. The stack is accessed simply by pushing and popping using the JVMPush and JVMPop routines around line 4553. The CPU is 8-bit, but the JVM is 16-bit, so everything takes two operations to write both bytes. You can see the stack frame format at line 48.
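The "two operations per value" point can be sketched like this. It is an illustration only, not a transcription of JVMPush and JVMPop: the class name, the byte order (low byte first) and the upward-growing stack are assumptions, and a `bytearray` stands in for the external SRAM.

```python
class Stack16:
    """16-bit values stored on a byte-wide memory, one byte per access."""

    def __init__(self, size=32 * 1024):
        self.mem = bytearray(size)  # stand-in for the external SRAM
        self.sp = 0                 # next free byte; assumed to grow upward

    def push(self, value):
        value &= 0xFFFF                            # keep 16 bits
        self.mem[self.sp] = value & 0xFF           # first write: low byte (assumed order)
        self.mem[self.sp + 1] = (value >> 8) & 0xFF  # second write: high byte
        self.sp += 2

    def pop(self):
        self.sp -= 2
        low = self.mem[self.sp]                    # first read
        high = self.mem[self.sp + 1]               # second read
        return low | (high << 8)

stack = Stack16()
stack.push(0x1234)
print(hex(stack.pop()))  # 0x1234
```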
Java objects and arrays are allocated on another stack that acts as the heap. Objects are allocate-only - they are never freed. This isn't as big a deal as you might think, since in embedded applications the code tends to just repeat the same operations over and over, so you write your code to reuse the same instances instead of allocating new objects (which is slow anyway).
_do_new on line 2044 allocates a new object. Arrays are allocated at line 1991 in _l_j_newarray.
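An allocate-only heap like the one described is often called a bump allocator. A minimal sketch of the idea follows; it is not a port of _do_new, and the `BumpHeap` name and the choice to raise `MemoryError` when space runs out are assumptions made for the example.

```python
class BumpHeap:
    """Allocate-only heap: a single pointer that only ever moves forward."""

    def __init__(self, size):
        self.size = size
        self.top = 0  # address of the next free byte

    def alloc(self, nbytes):
        if self.top + nbytes > self.size:
            raise MemoryError("heap exhausted")  # there is no free(), so this is final
        addr = self.top
        self.top += nbytes
        return addr

heap = BumpHeap(32)
a = heap.alloc(8)   # first object
b = heap.alloc(8)   # second object lands right behind it
print(a, b)         # 0 8
```

Allocation is just a bounds check and an add, which is about as small as a memory manager gets. The cost is that a program which keeps calling `alloc` in a loop will eventually run out, hence the advice to reuse instances.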
The nice thing about the JVM (at least version 1 which this implements) is that there are not many variations on operations - and if you statically link you can reduce some of the variations to common cases too.
I had a hunch it might be like this. The embedded Java I did on the Dallas part was using fixed sized arrays.
I'd actually love an annotation for Python that ran it under the null-collector. Lots of times in short-run or steady-state programs one doesn't generate any garbage, and the constant GC or ref count overhead could be done away with.
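The closest thing in standard CPython is probably the `gc` module, which can pause the cyclic collector. It is only a partial match for a null collector: reference counting still runs, so this removes the periodic cycle scans and nothing more. The decorator name below is hypothetical; `gc.disable()`, `gc.enable()` and `gc.isenabled()` are the real standard-library calls.

```python
import functools
import gc

def without_cyclic_gc(func):
    """Hypothetical decorator: run func with the cyclic collector paused."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        was_enabled = gc.isenabled()
        gc.disable()                 # reference counting is NOT affected
        try:
            return func(*args, **kwargs)
        finally:
            if was_enabled:
                gc.enable()          # restore the previous state
    return wrapper

@without_cyclic_gc
def steady_state_loop(n):
    total = 0
    for i in range(n):
        total += i
    return total

print(steady_state_loop(10))  # 45
```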
Did you look at the Bob language by David Betz? It was the spiritual seed for Java. http://www.xlisp.org/
The JVM was used in the Parallax Javelin Stamp - a Java version of the BASIC Stamp (http://en.wikipedia.org/wiki/BASIC_Stamp).