History of T (2001) (paulgraham.com)
69 points by swatson741 on July 15, 2023 | 22 comments



A totally tangential comment on CS at Yale during that era: Yale was very nice to the local towns and allowed high school students to take classes, so in 1981 I took their intro to CS. The bulk of the course was run in APL, with a bit of Pascal at the end.

But the thing I remember most? The computer room was in the basement of the music annex, which had a bunch of practice rooms. So the geeks would all be hacking away on a bunch of VT100s while the sounds of musicians training to be professionals rained down from above. It was a great place to hack on machines!

[I then took Sussman/Abelson's intro course (SICP) at MIT about a year later, which was quite a contrast. Or maybe I was much more mature, having left high school. :) ]


Related:

History of T (2001) - https://news.ycombinator.com/item?id=17784416 - Aug 2018 (38 comments)

Olin Shivers: History of T - https://news.ycombinator.com/item?id=8897473 - Jan 2015 (1 comment)

History of T - https://news.ycombinator.com/item?id=6778754 - Nov 2013 (42 comments)

Olin Shivers: History of T - https://news.ycombinator.com/item?id=3646093 - Feb 2012 (1 comment)

History of T - https://news.ycombinator.com/item?id=255245 - July 2008 (27 comments)


It has a homepage, with sources, now:

http://mumble.net/~jar/tproject/


In particular, I find the comment about CLOS using send and Smalltalk message-passing semantics amusing. XLISP was the last Common Lisp I remember that used Smalltalk-style message passing for its OO, and it deprecated that in the early 90s in favor of a new system before rewriting itself as a Scheme instead of a Common Lisp.

In fact, when we were taught Common Lisp in '91, we were specifically warned that the XLISP we were using was an outlier in using messages, and that we should semi-ignore that when we came across a 'real' Common Lisp variant.


Can someone in the know please elaborate on why dynamic scoping was considered more efficient than lexical scoping?


Dynamic scoping is usually implemented with shallow binding, where a single value cell for a variable is located at a fixed address, with each new binding saving and restoring the enclosing value on the stack. This allows constant time variable access, even in an interpreter working directly from S-expressions without a compilation phase. Lexical scoping in a pure interpreter requires a linear search through an environment. In compiled code, lexically-scoped variable access is more efficient than dynamically-scoped variable access, since it can be implemented with either a stack ___location at a fixed offset in the frame, or inside a closure object at a fixed offset.
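
To make this concrete, here is a rough Python sketch of the two lookup strategies (the names and structure are my own illustration, not any particular Lisp's internals):

    # Shallow binding: every variable has one fixed value cell; binding saves
    # the enclosing value and restores it on exit, so a read is a single
    # cell access no matter how deeply bindings are nested.
    cells = {}                      # symbol -> current value ("fixed address" cells)

    def dynamic_bind(bindings, body):
        saved = [(name, cells.get(name)) for name in bindings]
        cells.update(bindings)
        try:
            return body()
        finally:
            for name, old in saved:        # restore enclosing values on exit
                cells[name] = old

    # Lexical scoping in a naive S-expression interpreter: the environment is
    # a chain of frames, and a read is a linear search outward through it.
    def lexical_lookup(name, env):
        while env is not None:
            frame, env = env
            if name in frame:
                return frame[name]
        raise NameError(name)

    cells["x"] = 1
    print(dynamic_bind({"x": 2}, lambda: cells["x"]))   # 2; the 1 is restored afterwards
    env = ({"y": 10}, ({"x": 1}, None))                 # inner frame chained to outer
    print(lexical_lookup("x", env))                     # 1, found by walking the chain

The dynamic read is one fetch from a fixed place regardless of nesting; the interpreted lexical read has to walk frames, which is the cost being compared.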


Because it's just one global instead of constant space added to all the closures/stacks. One map of symbols to values, not a whole world of them. If I had to guess.


i don't think dynamic scoping saves you space; with either dynamic scoping or lexical scoping, each call to a function that binds three variables must allocate space for three values. the only difference from that perspective is that, with dynamic scoping (and shallow binding!), the newly allocated space holds the old values of the variables, while with lexical scoping (or deep binding) it holds the new values

some implementations of dynamic scoping with shallow binding store each of the three values in a separate cons cell, doubling the space cost, but that's not essential to the dynamic/lexical distinction


You can get away with having a single global state variable so long as it's an association table and you keep track of what variables you've introduced. I believe that the reason why people thought it would be slow then was that no one had used a data structure that could store multiple values for the same key.


useful context is that in most programs a very significant fraction of the instruction mix consists of reads of local variables; another significant fraction consists of passing arguments to subroutines and calling them. so the efficiency of these two operations is usually what is relevant

with dynamic scoping, the current value of a given variable is always in the same ___location in memory, so you can just fetch it from that constant ___location; when you enter or exit a scope that binds that variable, you push or pop that value onto a stack. a function pointer is just a (tagged) pointer to a piece of code, typically in machine language but sometimes an s-expression or something

with lexical scoping, in an environment that supports recursion, the usual implementation is that you have to index off the frame pointer to find the current ___location of the variable. worse, if your lexical scoping supports nested scopes (which is generally necessary for higher-order programming), you may need to index off the frame pointer to find the closure display pointer, then index off the closure pointer to find the variable's value. and then, if it's mutable, it probably needs to be separately boxed (rather than just copied into the closure when the closure is created, which would mean changes made to its value by the inner scope wouldn't be visible to the outer scope that originally created it), which means that accessing its value involves indexing off the frame pointer to find the context or closure display pointer, indexing off the closure display pointer to find the pointer to the variable, and then dereferencing that pointer to get the actual value. also, supporting closures in this way means that function pointers aren't just pointers to compiled code; they're a dynamically allocated record containing a pointer to the compiled code and a context pointer, like in pascal, with a corresponding extra indirection (and hidden argument, but that's free) every time you call a function.

there's a tradeoff available where they can just be those two pointers, instead of taking the approach I described earlier where the closure display is a potentially arbitrarily large object; but in that case accessing a variable captured from a surrounding scope can involve following an arbitrarily long chain of context pointers to outer scopes, and it also kind of requires you to heap-allocate your stack frames and not garbage-collect them until the last closure from within them dies, so it's a common approach in pascal (and gcc uses it for nested functions in c, with a little bit of dynamic code generation on the stack to keep its function pointers to a single word) but not in lisps
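
here's a toy python sketch of that representation, with invented names, just to illustrate the shape of it: a closure is a (code, environment) pair, and a compiled variable reference is a (depth, offset) pair, so access means chasing depth context links and then indexing into a frame

    class Frame:
        def __init__(self, slots, parent):
            self.slots = slots      # values bound in this scope, at fixed offsets
            self.parent = parent    # context pointer to the enclosing scope

    class Closure:
        def __init__(self, code, env):
            self.code = code        # "pointer to compiled code"
            self.env = env          # captured environment (the hidden argument)

    def lexical_ref(env, depth, offset):
        for _ in range(depth):      # follow context pointers out to the right scope
            env = env.parent
        return env.slots[offset]

    # (lambda (x) (lambda (y) (+ x y))) applied to 3, then to 4:
    outer = Frame([3], None)
    adder = Closure(lambda env: lexical_ref(env, 1, 0) + lexical_ref(env, 0, 0), outer)
    inner = Frame([4], adder.env)   # calling the closure extends its captured env
    print(adder.code(inner))        # 7; x needed one hop outward, y needed none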

dybvig's dissertation about how he wrote chez scheme is a good source for explanations on this, and also explains how to implement call/cc with reasonable efficiency. http://agl.cs.unm.edu/~williams/cs491/three-imp.pdf

current cpus and operating systems favor lexical scoping more than those in the late 01970s and early 01980s for a variety of reasons. one is that they are, of course, just much faster and have much more memory. also, though, address sizes are larger, and instruction sizes are smaller, so it's no longer such a slam-dunk trivial thing to just fetch a word from some absolute address; the absolute address doesn't fit inside a single instruction, so you have to compute it by offsetting from a base register, fetch it from memory (perhaps with a pc-relative load instruction), or compose it over the course of two or more instructions. and currently popular operating systems prefer all your code to be position-independent, so it may be inconvenient to put the variable value at a fixed absolute address anyway; if you try, you may find upon disassembling the code that you're actually indexing into a got or something. finally, though i don't know if this figured into the lispers' world, 8-bit cpus like the 6502 and 8080 were especially terrible at indexing, in a way that computers hadn't been since the 01950s and haven't been since, which made the performance gain of absolute addressing more significant

i don't know enough about addressing modes on machines like the vax, the dorado, and the pdp-10 to know what the cost was, but on arm and risc-v (and so i assume mips) you can index off the stack pointer with an immediate constant for free; it doesn't even require an extra clock cycle

a couple of points in favor of the efficiency of lexical scoping: when you have a cache, having all your local variables packed together into a stack frame is better than having them scattered all over the symbol table; popping a stack frame is a constant-time operation (adding a constant to the stack pointer, typically), while restoring n values on exiting a dynamic scope requires n operations; and the space usage of local variables on a call stack only includes the variables currently in scope, rather than all the distinct variable names used anywhere in the program, as is the case with dynamic scoping

in a sense, the way you handle callee-saved registers in assembly language is pretty much the same as how you handle dynamically-scoped variables: on entry to a subroutine that changes r4 or rbp, you save its current value on a stack, and on exit you restore that value, and any value you put there can be seen by your callees. oddly enough, this turns out to be another efficiency advantage for lexical scoping; with dynamic scoping, when you call another function, the language semantics are that it could change any of your local variable values before it returns, so it's not safe for the compiler to cache its value in a cpu register across the call. (unless the variable is assigned to that cpu register for your entire program, that is, which requires whole-program optimizations that run counter to the lisp zeitgeist.) this has become an increasingly big deal as the relative cost of accessing memory instead of a cpu register has steadily increased over the last 40 years

i'm no expert in this stuff, though i did implement a subset of scheme with recursion, lexical scoping, closures, and enough power to run its own compiler (and basically nothing else): http://canonical.org/~kragen/sw/urscheme so i could totally be wrong about any of this stuff


Don't forget that lexicals can be subject to some nice optimizations. A lexical variable that is loaded with a constant and never assigned can be constant-propagated, and disappear. Lexical variables that aren't captured in closures can be moved into registers.

There are more opportunities to do these kinds of things with lexical variables compared to dynamic, because we don't know what happens with a dynamic variable when some function is called that we know nothing about. If a scope calls any functions at all whose definitions are not available or cannot be analyzed, then it has to be assumed that any dynamic variable used in the scope is also accessed and mutated by those functions, which defeats these optimizations.
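
A contrived Python illustration of that point (the names are made up, and a module-level cell stands in for a special variable):

    # Lexical case: nothing the callee does can touch `scale` or `total`,
    # so a compiler may fold `scale` to 10 and keep `total` in a register.
    def lexical_version(items, log_it):
        scale = 10                         # never reassigned, never captured
        total = 0
        for x in items:
            log_it(x)                      # opaque call, but it can't see these locals
            total += x * scale
        return total

    # Dynamic case: any callee might assign the special variable, and the
    # change is visible to the caller, so its value must be re-fetched after
    # every call rather than cached.
    scale_cell = {"value": 10}

    def sneaky_callee(x):
        scale_cell["value"] = x            # a callee really can change it

    def dynamic_version(items):
        total = 0
        for x in items:
            sneaky_callee(x)
            total += x * scale_cell["value"]   # caching 10 here would be wrong
        return total

    print(lexical_version([1, 2, 3], print))   # 60
    print(dynamic_version([1, 2, 3]))          # 1*1 + 2*2 + 3*3 = 14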

When native code is generated, VM temporary registers can be allocated to real machine registers; any variables that have been moved to VM temp registers benefit.

Your observation is spot on that the discipline for saving and restoring registers is like binding and unbinding dynamic variables. But there is only a fixed, relatively small number of registers, which makes it practical to save and restore them transparently at thread context switches. This means that even under multi-threading, procedures can just save and restore registers as if everything were single-threaded. Dynamic variables under threading cannot be treated by saving and restoring. That would require each thread to have its own, which would be expensive to switch. (The top-level/global bindings of dynamics have to be seen by all threads, also, which means that if threads have their own file of dynamic variables, there nevertheless has to be some indirection so that threads can share the global bindings.)

Under a deep binding strategy for dynamic scope, you have an environment chain of scopes for dynamic variables. That chain is rooted at a single pointer, which can be turned into a thread context: so one word of thread context switches dynamic scopes. With deep binding, some optimizations are possible: when a scope is entered that references a dynamic variable, that variable can be looked up in the deep environment to retrieve its value cell, and that value cell can be forwarded into the lexical frame or a register. All the accesses of the variable in the scope refer to that register, which provides a quick indirection to the variable, without the environmental lookup having to be repeated. Dynamic variables can also be forwarded to aliases at the top of the environment for faster lookup next time. We cannot move a dynamic variable from a deeper frame to the top frame, because that would make it disappear from some upstream scope we have to return to. But we can plant an alias which references the same ___location.
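
A small Python sketch of that arrangement (the names are invented, and it only models the environment chain and the value cells):

    class Cell:
        def __init__(self, value):
            self.value = value

    class DeepEnv:                        # one link in the chain of dynamic scopes
        def __init__(self, name, cell, parent):
            self.name, self.cell, self.parent = name, cell, parent

    def lookup_cell(env, name):
        while env is not None:
            if env.name == name:
                return env.cell           # the cell itself can be forwarded/aliased
            env = env.parent
        raise NameError(name)

    # the thread's dynamic environment is just one pointer to the newest link
    env = DeepEnv("x", Cell(1), None)     # outermost binding of x
    env = DeepEnv("y", Cell(2), env)      # entering a scope pushes a new link

    x_cell = lookup_cell(env, "x")        # search once, then hold on to the cell
    print(x_cell.value)                   # 1
    x_cell.value = 42                     # writes go through the same cell
    print(lookup_cell(env, "x").value)    # 42, visible wherever x is looked up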


now that i understand your comment after a few readings, these are all good points (and, for the sake of those following along, you have a lot more expertise in this than i do)

thank you


as usual, thanks for commenting


you're welcome


As someone who loves Lisp and PL research in general, I enjoyed reading this article! I'm familiar with the original Scheme papers and I've been slowly reading Compiling with Continuations. It's really cool hearing the personal accounts of those who worked on interesting projects!


I think this is from 2001 per Wayback. I hate that PG's essays don't have dates.


PG's essays do have dates. Try clicking through to all the items in http://paulgraham.com/articles.html. I can't remember ever seeing one that didn't have month and year.

OP wasn't written by PG, though. There's no index for such posts that I'm aware of, so I don't know what their information architecture is.


Same. Even if essays are timeless, it still helps to know the context in which they were written.


Olin Shivers’s PhD thesis is critical reading for anyone looking to analyze modern dynamic languages. Super interesting. Look back to look forward.


He taught my CS 101 course in undergrad. One of the most brilliant people I’ve ever met.

Towards the end of the course he guided us through how to build a Y combinator, and that alone was the best computer science education that I’ve ever received; I was left with a deep interest and curiosity which I’ve found more valuable than any coursework I can remember before or since.


Small error in this essay: the “west coast” Lisp Interlisp was being developed at Xerox PARC at the time but had also originated on the east coast when Danny Bobrow left MIT for BBN (IIRC it was originally called BBN Lisp). Interlisp for the PDP-10 moved with him to PARC.


> I fashionably decried premature optimisation in college without really understanding it until I once committed an act of premature opt so horrific that I can now tell when it is going to rain by the twinges I get in the residual scar tissue.

What an amazing line!



