praptak 2 days ago [-]
SBCL uses a single zero bit to tag integers. This trick means the representation of n is just 2n, so you can add the values directly without any decoding.
It obviously also means that all the other tag values have to use 1 as the last bit.
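To make the 2n trick concrete, here is a minimal sketch in Python that simulates a 64-bit word. The helper names (`tag_fixnum`, `untag_fixnum`, `is_fixnum`, `add_tagged`) are made up for illustration and are not SBCL's; overflow into bignums is ignored.

```python
MASK64 = (1 << 64) - 1  # simulate a 64-bit machine word


def tag_fixnum(n):
    """Encode n as a tagged word: shifting left by one leaves a 0 in the low bit."""
    return (n << 1) & MASK64


def untag_fixnum(word):
    """Decode a tagged word back to a signed integer (arithmetic shift right)."""
    if word & (1 << 63):      # top bit set: reinterpret the word as negative
        word -= 1 << 64
    return word >> 1


def is_fixnum(word):
    """A word is a fixnum exactly when its low bit is 0."""
    return word & 1 == 0


def add_tagged(a, b):
    """Add two tagged fixnums without decoding: 2a + 2b == 2(a + b)."""
    return (a + b) & MASK64


total = add_tagged(tag_fixnum(20), tag_fixnum(22))
print(untag_fixnum(total))    # 42
```

The sum of two tagged words is already the tagged form of the sum, so the add needs no shift or mask on either side.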
guenthert 1 days ago [-]
That also implies that NIL cannot be represented as 0, which is a pity since testing the Z flag would be quick. I'd think someone ran the numbers and found the chosen encoding superior, but that would have been long ago (in the CMUCL code base).
twoodfin 1 days ago [-]
I suspect with modern CPU pipelining and branch prediction that most of the (very interesting) debates about exactly how to tag values and pointers have become inconsequential above the noise level.
Would love to see recent work demonstrating this isn’t true!
HexDecOctBin 1 days ago [-]
Why do they use the bottom bit for tag and not the top bit?
kryptiskt 1 days ago [-]
Traditionally it has been done because the last three bits of an object pointer are typically zero because of alignment, so you can just put a tag there and mask it off (or fold it into an lea with an offset, which is especially useful for data structures where you'd use an offset anyway, like pairs or vectors). On 64-bit architectures there are two bytes at the top that aren't used (one byte with five-level paging), but they must be masked, since they must be 0x00 or 0xff when used as pointers. On 32-bit archs the high bits were in use and so unsuitable for tags. All in all, I think the low bits are still the most useful for tags, even if 32-bit is not an important consideration anymore.
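A rough sketch of the low-bit scheme described above, using plain Python integers as stand-in addresses. The 8-byte alignment, the `CONS_TAG` value, and the function names are assumptions for illustration, not any particular implementation's layout.

```python
ALIGN = 8                 # assumed object alignment in bytes
TAG_MASK = ALIGN - 1      # 0b111: the three low bits that alignment leaves free
CONS_TAG = 0b111          # hypothetical tag value for a pair


def tag_pointer(addr, tag):
    """Store a tag in the low bits of an aligned address."""
    assert addr % ALIGN == 0, "address must be aligned"
    assert tag & ~TAG_MASK == 0, "tag must fit in the low bits"
    return addr | tag


def pointer_of(word):
    """Mask the tag off to recover the real address."""
    return word & ~TAG_MASK


def tag_of(word):
    """Extract the tag bits."""
    return word & TAG_MASK


def field_address(word, tag, offset):
    """Fold tag removal into the field offset, as lea with a displacement would:
    one add, no separate mask instruction."""
    return word + (offset - tag)


w = tag_pointer(0x1000, CONS_TAG)
print(hex(pointer_of(w)))                    # 0x1000
print(hex(field_address(w, CONS_TAG, 8)))    # 0x1008
```

When the tag is known statically, subtracting it costs nothing extra, because the field access needed an offset anyway.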
birdgoose 1 days ago [-]
The sibling comment explains why we prefer to use the lower bits as a tag (these are guaranteed to be zero if the value is a pointer on a 64-bit system).
Another reason why we wouldn’t want to use the top bit is that, as the parent comment suggested, the tagged representation of a fixnum isn’t a pointer at all but is instead twice the number it represents. Generally speaking, we represent integers in two's-complement form, which uses that top bit to indicate whether the value is positive or negative.
thecloudlet 2 days ago [-]
That’s so cool. I did not know that.
vintagedave 1 days ago [-]
FYI thecloudlet, the last quote from a Reddit user at the end seems to have duplicated content (copy-paste error?)
I read both articles and am looking forward to your next! I’d be interested in understanding more about the relationship of Emacs to GCC since you noted the authors were the same and the internals were written with some compiler awareness.
thecloudlet 1 days ago [-]
Thank you for the kind reminder! I have removed the duplicate.
You made a great point. Since the original authors are the same, the fundamentals of the Emacs C core are indeed highly compiler-optimized. I hope I can come up with a way to fully understand and write about that history and relationship. (But to be honest, diving into that level of compiler history is a really hard topic to tackle!)
Thanks for the great inspiration and for taking the time to read!
_dky 1 days ago [-]
One similarity I noticed was the intermediate representation from GCC frontend looks and feels very Lispy. Could be just a coincidence too.
vintagedave 11 hours ago [-]
That is fascinating. I have never seen GCC IR, only LLVM IR, which to me is more assembly-like in a generic no-specific-machine kind of way.
I (briefly) searched for more info but this SO answer[0] implies they're quite similar. I'd be interested in understanding more. I feel like much of computer science has normalised to specific approaches that 'won' in the industry, where Lisp is an example of something interesting that largely didn't win -- with notable exceptions like HN itself! [1] :) And so hearing about uses of things like Lisp concepts in areas we might not know they're being used would be wonderful HN content.
Symbols are just lists of numbers. A variable is just a nameless place in memory, but is often associated with a symbol.
Numbers in symbols are printed out as ASCII characters when it seems appropriate, like after SETQ.
Or we could decide that a number-list that ends with 0 and contains only range(0x21,0x7F) is printed out as a symbol. It does not matter, it is just syntactic sugar.
And we do not need strings for much of anything. We could of course decide that a number-list with ord('"') is printed out as a string. The reader could also follow this protocol.
I had all this figured out at one time, and I don't remember any major issues. B-)
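One possible reading of that printing rule, sketched in Python. The function name `print_form` and the fallback of printing a parenthesized list of numbers are assumptions; the comment only specifies the "ends with 0, everything else in range(0x21,0x7F)" test.

```python
def print_form(numbers):
    """Print a number-list as a symbol when it looks like one, else as a plain list.

    Symbol rule (from the comment above): the list ends with 0 and every other
    element is in range(0x21, 0x7F), i.e. printable non-space ASCII.
    """
    body = numbers[:-1]
    looks_like_symbol = (
        len(numbers) >= 2
        and numbers[-1] == 0
        and all(0x21 <= n < 0x7F for n in body)
    )
    if looks_like_symbol:
        return "".join(chr(n) for n in body)
    # Assumed fallback: show the raw numbers in list syntax.
    return "(" + " ".join(str(n) for n in numbers) + ")"


print(print_form([83, 69, 84, 81, 0]))   # SETQ
print(print_form([1, 2, 3]))             # (1 2 3)
```

The data is the same either way; only the printer decides whether you see `SETQ` or five numbers.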
__patchbit__ 1 days ago [-]
Is AI able to measure the ratio of craft to cruft source lines of code in Emacs over its lifetime?
sillysaurusx 1 days ago [-]
Not AI, but I studied it extensively for about 6 months. I was trying to port Emacs to JS, line by line, about eight years ago.
I love Emacs' design. I think the cruft is minimal, and pretty much every line of code I studied had a good reason for being there.
And I also think there's a lot to learn from studying how Emacs is implemented. Few people will probably do this, but it was a nice experience for me. I learned a lot about gaps, text properties, how buffers are implemented, how the eval function works (it's surprisingly complicated between buffer-local variables and thread-local variables, but it's hard to think of a simpler alternative), and how intervals are implemented.
thecloudlet 1 days ago [-]
It’s not easy… I’m working to understand it, and recording the process along the way.
sillysaurusx 1 days ago [-]
I'd like to hear about it as you go! If you have any questions, definitely contact me. Love me some emacs.
thecloudlet 1 days ago [-]
Thanks!
scroy 1 days ago [-]
Why were you trying to port it to JS?
sillysaurusx 1 days ago [-]
It was fun, and I wanted to see how far I could get. Plus emacs in a browser without emscripten was appealing to me.
[0] https://stackoverflow.com/questions/40799696/how-is-gcc-ir-d...
[1] https://news.ycombinator.com/item?id=43089415