Ask HN: How do you maintain personal annotations for code you don't control?
49 points by weinzierl 4 months ago | 41 comments
I spend significant time reading and understanding codebases that I don't control (open source libraries, internal legacy systems, etc.). As I build understanding, I need to document my insights, gotchas, and mental models - but these notes are purely personal and shouldn't be part of the actual codebase.

My challenges:

1. These annotations need to be tightly coupled with specific locations in the source code (particular functions, variables, or even specific lines)

2. The underlying code changes regularly (new versions, updates from maintainers) which can break the connection between my notes and the code

3. My notes are private - they include half-formed thoughts, questions, and sometimes critical observations that wouldn't be appropriate as public comments

4. I want to preserve this knowledge across different machines and working environments

I've tried various approaches:

- Local IDE bookmarks (lost between sessions)
- Separate markdown files (hard to maintain precise code references)
- Private forks with comments (becomes unmaintainable as source evolves)

I'm curious how others solve this problem. Do you have a systematic approach for maintaining personal annotations on code that's not under your control? How do you handle the challenge of the code evolving while keeping your notes relevant?

Would especially love to hear from people working with large codebases or those who regularly need to dive deep into external dependencies.




I recently came across a VS Code extension that does pretty much what you're looking for -

> Out-of-Code Insights is a Visual Studio Code extension that allows you to add annotations, notes, and comments without modifying your source files.

https://marketplace.visualstudio.com/items?itemName=JacquesG...

GitHub:

https://github.com/JacquesGariepy/out-of-code-insights/


Neat, I've been looking for something like that for some time.


This is the first time I've ever heard of someone keeping private source-line-attached notes in a codebase. I work with very large codebases, but if I discover things about the codebase that required spelunking, I generally turn them into comments or documentation.

Of the requirements that you've laid out, I'd suggest that you need to either relax requirement 2 or 3:

If you relax requirement 2, you could keep your notes in a private fork.

If you relax requirement 3, and make your notes suitable for public consumption, you could submit your notes as comments and make the codebase easier for everyone to understand. (Or, at least, you could submit some of your comments, making the remainder easier to maintain privately.)


You wouldn't even need to relax requirement 2 too much, rebasing your commented fork on the trunk would actually help you keep your comments up to date.


The closest I've come to doing something like this is commenting poorly-commented code, and keeping my in-progress comments in a branch that I regularly rebase.

You said that becomes unmaintainable as the source evolves, but that's surely a fundamental property of keeping notes on changing code? You have to do work keeping your private comments up to date with any method.


If you're working within git, maybe `git notes` fit your use case? You can basically attach notes to various Git objects, without changing the objects themselves.

https://git-scm.com/docs/git-notes
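Per the git-notes documentation, a note can be attached to any object, and `git notes add` annotates `HEAD` only when no object is given. A minimal sketch, assuming you want a note keyed to a file's exact contents: compute the blob ID the way `git hash-object` does (SHA-1 over a `blob <size>\0` header plus the bytes), then hand it to `git notes add`. It assumes a SHA-1 repository, that the file is committed so the blob exists, and that no clean filters or line-ending conversion change the bytes. `git_blob_id` and `notes_add_command` are hypothetical helpers, and the command is only built, not run.

```python
from __future__ import annotations

import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to `content` stored as a blob (SHA-1 repos)."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def notes_add_command(content: bytes, message: str) -> list[str]:
    """Build (but do not run) a `git notes add` call targeting the file's blob."""
    return ["git", "notes", "add", "-m", message, git_blob_id(content)]
```

You could run the resulting list with `subprocess.run(cmd, check=True)` inside the repository. Because the ID changes whenever the file's bytes change, a blob-keyed note detaches on the next upstream edit, which is exactly requirement 2.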


It’s an interesting option, but AFAICT, it can only be used to add a note to a commit. So the note might not be that close to a particular function/variable/etc.


Is this what GitHub reviews use?


No, they're unrelated.


But wouldn’t it be nice if they were stored in git notes? It’ll never happen for a commercial git hosting product, because they want it to be hard to leave their service (you lose your PR review comment history), and storing them in git makes it too easy to migrate all your history to a competitor.

Building an open source code review system using git notes would be great though.


Leo editor can keep its outline, which combines your annotations and external files, in sync with those files.

Obviously it isn't bulletproof and needs maintenance when it can't merge external changes automatically.

https://leo-editor.github.io/leo-editor/


I came here to post this.

To expand: With Leo editor, you convert the document/file into a tree of nodes (one way to do this is to make each function a node - they have plugins to do it automatically for well known languages like C++). Let's say you make a particular function a node. You can then make a new document in your own filesystem which has your notes, but you can make a "live" copy of the node linking to that function.

You now can see your notes along side that function. If you modify the "live" node, it will actually modify the original source file. Similarly, if the code changes (e.g. with a git pull), then Leo tends to do a good job of updating the references so that your node still points to the correct function.

The editor is a bit weird to learn, but once you get the hang of it, it's extremely powerful. I used this technique often while debugging messy bugs. I'd have my own document with live nodes to the test case, the test collateral, relevant source code, etc. Each node was simply a view to a portion of some corresponding file. This way, even though everything related to the test was scattered across several files, I could see everything related to the bug (test + source code) all in one document.

It's the one powerful feature that has yet to be replicated in Emacs.


> It's the one powerful feature that has yet to be replicated in Emacs.

Emacs cannot do this. There are many Emacs libraries or packages that need this feature (Org babel is a big one, transclusion is another), and have to work around its absence in hacky ways.


Not sure why you got downvoted - you are correct.[1]

I don't know if there is any fundamental limitation in Emacs/Elisp that prevents it, or that no one has succeeded in doing it. I suspect it's the latter. I also suspect the problem is trying to shoehorn this to work with Org Mode (which is what I want as well), but there may be a significant impedance mismatch between the Org code base and this feature.

Frankly, trying to manipulate the Org tree using Elisp is a nightmare compared to how simple it is in Leo (using Python). I've been trying to do with Org mode what is fairly basic in Leo: Traverse the tree, make changes, copy the tree to another file, with some headlines demoted/promoted due to rules, etc. Although I finally got something working, it took a lot of research as well as multiple packages. Whereas in Leo, people without a programming background manage to write the Python code to do this very easily.

[1] karthink has written a number of heavily used Emacs packages. He likely knows what he's talking about.


Coincidentally, I'm in the middle [1] of building something for https://CoCalc.com that is exactly what you're describing. For collaborative document editing (e.g., Google Drive and Overleaf) it's a common feature, but for code editors it isn't. CoCalc is both. Anyway, nothing to see yet, but you might want to check back with us in a month. After thinking about this problem a lot recently, I think it’s critical to store the comment locations with all versions of the file, so you don’t lose comment locations, or at least maximize the information you have available to locate comments when they get lost.

[1] https://github.com/sagemathinc/cocalc/pull/8071
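One way to use a stored older version when a comment's location gets lost is to diff the old and new file and carry the line number through the unchanged blocks. A minimal sketch using Python's `difflib`, assuming a note is anchored to a 0-based line index in the old version; `remap_line` is a hypothetical helper, not CoCalc's implementation.

```python
from __future__ import annotations

import difflib
from typing import Optional


def remap_line(old_lines: list[str], new_lines: list[str], old_index: int) -> Optional[int]:
    """Map a 0-based line index in old_lines to its position in new_lines.

    Returns None when the anchored line was replaced or deleted, which is the
    signal to fall back to a fuzzier search or to ask the user.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only lines inside an unchanged ("equal") block can be carried over exactly.
        if tag == "equal" and i1 <= old_index < i2:
            return j1 + (old_index - i1)
    return None
```

Keeping every version matters here because the diff needs the exact text the note was written against.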


I maintain a branch with my comments inline.

If the underlying code changes, I just update my comments.


> Separate markdown files (hard to maintain precise code references)

That shouldn’t be difficult. Most code repository systems support links to exact line numbers in specific commits, like [0]. Even if the links stop working, you can still identify the commit hash, file name and line number from the URL.

[0] https://github.com/curl/curl/blob/3b057d4b7a7e6b811245fd0312...
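A sketch of building and parsing such a link, assuming GitHub's `blob/<commit>/<path>#L<start>-L<end>` layout; the owner, repository, and commit used in any example are placeholders. Parsing the URL back into fields is what lets you recover the commit, path, and line range after the link itself stops resolving. `CodeRef`, `to_permalink`, and `from_permalink` are hypothetical names.

```python
import re
from typing import NamedTuple


class CodeRef(NamedTuple):
    owner: str
    repo: str
    commit: str
    path: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def to_permalink(ref: CodeRef) -> str:
    """Build a GitHub URL pinned to one commit and a line range."""
    anchor = f"#L{ref.start}" if ref.start == ref.end else f"#L{ref.start}-L{ref.end}"
    return f"https://github.com/{ref.owner}/{ref.repo}/blob/{ref.commit}/{ref.path}{anchor}"


_PERMALINK = re.compile(
    r"https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)/blob/"
    r"(?P<commit>[0-9a-f]{7,40})/(?P<path>[^#]+)#L(?P<start>\d+)(?:-L(?P<end>\d+))?$"
)


def from_permalink(url: str) -> CodeRef:
    """Recover commit, path and lines from a commit-pinned line link."""
    m = _PERMALINK.match(url)
    if m is None:
        raise ValueError(f"not a commit-pinned GitHub line link: {url}")
    start = int(m["start"])
    end = int(m["end"]) if m["end"] else start
    return CodeRef(m["owner"], m["repo"], m["commit"], m["path"], start, end)
```

Storing the parsed fields next to the URL in your markdown notes keeps them usable with any other host or a local checkout.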


This is my approach as well. I write a doc to myself with commit+line links for code, relevant snippets from documentation linked to the source, and annotated screenshots where that makes more sense, also linked if the source is web based.


I don't anymore and when I did the code didn't change much. But I haven't seen anyone mentioning processing the AST. Some things would break between changes, but if the language the code uses has a good AST traversal library, you could assign your notes to parts of the tree rather than source code locations, falling back to source code locations when that fails. It would still need manual maintenance, but would at least be less fragile than using solely line locations.
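A minimal sketch of that idea for Python sources, using the standard `ast` module: a note stores a dotted qualified name such as `Parser.parse` plus the line where the target was last seen, and the lookup prefers the name and falls back to the stored line. The `Note` type and the `definitions` and `resolve` functions are hypothetical; other languages would need their own parser.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass


@dataclass
class Note:
    qualname: str   # dotted class/function path, e.g. "Parser.parse"
    last_line: int  # 1-based line where the target was last seen
    text: str


def definitions(source: str) -> dict[str, int]:
    """Map qualified names of classes and functions to their 1-based start lines."""
    found: dict[str, int] = {}

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                name = f"{prefix}{child.name}"
                found[name] = child.lineno
                visit(child, name + ".")  # nested defs get a dotted prefix
            else:
                visit(child, prefix)      # look inside if/try/with blocks too

    visit(ast.parse(source), "")
    return found


def resolve(note: Note, source: str) -> tuple[int, bool]:
    """Return (line, matched_by_name); fall back to the stored line if the name is gone."""
    line = definitions(source).get(note.qualname)
    if line is not None:
        return line, True
    return note.last_line, False
```

When `matched_by_name` is `False`, the note is flagged for manual review, which is the "still needs manual maintenance" part.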


BABLR should eventually offer strong support for this use case!


I keep comments committed in a separate branch.

The lack of syncing doesn't bother me, because the purpose of taking notes always falls into one of these categories:

1. I read the code to get an idea of how something works. The code is there to make examples/variable names concrete, but I don't need to know the exact implementation.

If the notes need to sit in the code, usually that's because the answer spans multiple methods (eg "what does an e2e request look like?"). A set of comments on outdated code is always good enough for me.

Otherwise, a lot of times the answer can be summarized in one line (eg "where is the state tracked?" -> in FooBarClass). These can go into personal notes.

2. I need to know the implementation and it is complex and hard to follow.

If I need to know the implementation, either it is because I'm actively working on it, or I need to make [complex idea] more concrete in my head.

If it's the former, usually I'll have memorized it by the time I read through it.

If it's the latter, by the end of it I'll have gotten the main idea and it's fine to forget the implementation details.


I do my absolute best to write code that does not require many (or any) comments or annotations because of the pain points described. I assume you're not referring to things like documenting "infrastructure" or "overall design" or "how to get started", as they don't change much and I just put those in a readme in the repo. For the nuts and bolts themselves, this involves:

Carefully naming variables and classes in obvious and consistent ways. I will spend time refactoring code so that it is named consistently and behaves as named.

Very small functions and classes (but not smaller than they need to be). This lets me use more named functions which gives me more description. It also typically gives me a nice hierarchy of how things occur, so whatever main "driver" function I have is pretty declarative and light on logic. It avoids big "god" functions or classes which tend to get cluttered and are often the hardest to break down or read.

Enforce obvious and established patterns. These again go in names, but if I'm using CQRS, then I'll have lots of CQRS, handler, registrar, etc in the names. If I have a factory it has Factory in the name. When you see these you know what and how things are organized.

Related to the above, no "clever" code and no inconsistent code. I'll write more "inefficient" code if it's not a bottleneck rather than something tight which was a premature optimization. If something doesn't fit the established patterns but could be reshaped into something consistent, I reshape it.

Lots and lots of tests. Tests describe behavior which tends to be pretty immutable OR if I have a requirement on behavior change, the test will fail at some point and needs to be reconsidered so gets my renaming attention. That last part is very important. Most testing frameworks let you add plain-language names/failure conditions, so if the behavior has changed the test starts going red and it doesn't let you forget about it. Those often become my documentation/annotations.

I will use comments when I've written something that needs to be structured outside of the above. These tend to be rare and typically pretty dense "black box" places, like when I've implemented a numerical or other very specific algorithm. As such they don't tend to get touched very often and I will write unit tests to make sure behavior is enforced.
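As a small illustration of tests doubling as annotations, here is a `unittest` sketch in which the test names state the behavior in plain language. `parse_port` is a hypothetical function standing in for real code; if someone changes its behavior, the failing test's name says which documented assumption broke.

```python
import unittest


def parse_port(value: str) -> int:
    """Hypothetical function under test: parse a TCP port number from text."""
    port = int(value.strip())
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortBehavior(unittest.TestCase):
    def test_surrounding_whitespace_is_ignored(self):
        self.assertEqual(parse_port(" 8080\n"), 8080)

    def test_zero_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_port("0")

    def test_values_above_65535_are_rejected(self):
        with self.assertRaises(ValueError):
            parse_port("70000")
```

Run it with `python -m unittest <module>`; the runner reports failures by test name, so the names carry the annotation.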


I used CodeStream with two of my previous teams and absolutely loved it. I don’t remember if you can keep annotations private, but I see plenty of value in allowing the rest of the team to see what questions/notes you have. In any case, I believe they open sourced the whole thing, so you could see how they handled code changes.


GitHub issue comments. You can link to code in GitHub that's anchored to a specific commit. If it's in the same repo, GitHub will inline the code into the issue for you. For separate repos I sometimes link and then manually copy in the code block myself.


I use Sublime Text and put my notes in a file. I don't use file/line references but rather name the thing I'm noting (e.g. class/method/variable). Other times I'll use a commit and a literal string as a (nearly) unique reference.
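A sketch of the commit-plus-literal-string reference, assuming a note records the commit it was written against plus a short snippet copied from the code. The snippet only works as an anchor while it occurs exactly once in the file. `find_unique` is a hypothetical helper.

```python
from typing import Optional


def find_unique(source: str, snippet: str) -> Optional[int]:
    """Return the 1-based line where `snippet` starts, or None if absent or ambiguous."""
    first = source.find(snippet)
    if first == -1 or source.find(snippet, first + 1) != -1:
        return None  # missing, or not unique enough to anchor a note
    return source.count("\n", 0, first) + 1
```

When it returns `None`, the recorded commit still lets you open the old version and search for the snippet there.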


> Private forks with comments (becomes unmaintainable as source evolves)

If swdev-grade merging tools aren't sufficient to get it done, then that's probably a bad sign that your requirements can be met at all.


That's a good point. Good enough but a pain. I was hoping for something more tailored to my use case.


> 1. These annotations need to be tightly coupled with specific locations in the source code (particular functions, variables, or even specific lines)

> 2. The underlying code changes regularly (new versions, updates from maintainers) which can break the connection between my notes and the code

Maybe depend on more loosely coupled notes?

You say they "need", but realistically they don't really need "to be tightly coupled with specific locations in the source code", that's just a nice to have.


I would like to see (better) solutions not only for source code, but for general web pages and applications. For example, bookmarks in a browser are OK, but it would be a lot better if you could easily annotate and later reference / rank / prioritize. A browser is a pretty good proxy for the world's knowledge, including source code. It'd be nice if browsers would level up in these regards.

There are tools for aspects of all these areas, but the problem still feels unsolved (nothing that is both easy and feature-full).


I’d try to drop requirement 3. Any insights made could be beneficial to somebody else working on the code (especially in closed-source environments only touched by people employed by your organization).

Re: critical tone, instead of saying “this is a useless garbage fire” maybe something like “it is not yet apparent how this interacts with blah blah.” There’s always a way to phrase it where it’ll plant the seeds of how you want the reader to feel about it without being overt.

My 2c, anyway.


The weAudit VSCode extension [1] works pretty well. It's designed for security work, but there's no reason why you couldn't use it for general note-keeping.

[1] https://blog.trailofbits.com/2024/03/19/read-code-like-a-pro...


https://github.com/nobiot/org-remark

handy, if you’re in the Emacs ecosystem.


Maybe a combination of private fork with comments and separate markdown files with notes (maybe in the same private fork)

Consider using special "symbols" in comments like "MYDOCS_XXX" that you search for in your modified version of the code base, and refer to in other places. These will survive renames of function names etc by the upstream authors.
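A sketch of the indexing side, assuming markers follow the `MYDOCS_` prefix from the comment (`MYDOCS_AUTH_FLOW` below is a made-up name) and that you scan your modified copy of the code base. `index_text` and `index_markers` are hypothetical helpers; adjust the glob pattern for other languages.

```python
from __future__ import annotations

import re
from collections import defaultdict
from pathlib import Path
from typing import Iterable

# Personal marker tokens such as MYDOCS_AUTH_FLOW (made-up example name).
MARKER = re.compile(r"\bMYDOCS_[A-Z0-9_]+\b")


def index_text(files: Iterable[tuple[str, str]]) -> dict[str, list[tuple[str, int]]]:
    """Map each marker to the (file name, 1-based line) pairs where it appears."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for name, text in files:
        for lineno, line in enumerate(text.splitlines(), start=1):
            for marker in MARKER.findall(line):
                index[marker].append((name, lineno))
    return dict(index)


def index_markers(root: Path, pattern: str = "*.py") -> dict[str, list[tuple[str, int]]]:
    """Recursively scan files under `root` that match `pattern` for markers."""
    files = (
        (path.relative_to(root).as_posix(), path.read_text(encoding="utf-8", errors="replace"))
        for path in sorted(root.rglob(pattern))
        if path.is_file()
    )
    return index_text(files)
```

Because the markers are your own tokens, upstream renames don't touch them; your markdown notes can refer to the marker, and the index tells you where it currently lives.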


A lot of times I just take notes in a personal fork, but that's imperfect for all of the obvious reasons.

I also take notes in my notes app. This obviously is imperfect too, but the codebases I work on aren't typically churning so much that these notes become out of date too quickly.



you describe a nightmare, to which the only solution is to keep one single commit with all the comments on a branch.

update and rebase the branch. solve conflicts if they changed code around the comments. with anything else you'll just be delaying this exact same chore and possibly making it impossible down the road.


I make a mind map in FreeMind with method/property names and even pieces of code as nodes.


I teach a class on computer graphics, where I want to embed my working source code into my web-based explanations, so perhaps the following could help you.

I have my source code in one directory, and in another I use Sphinx to make the documentation. In the documentation, I reference certain sections of code, which you can do by line number, or you can do by some pattern to begin and end.

Since I control all my source code, I put in comments with certain flags for regions of code.

I can then reference said section of code as follows

  .. literalinclude:: ../../src/demo06/demo.py
     :language: python
     :start-after: doc-region-begin define uniform scale
     :end-before: doc-region-end define uniform scale
     :linenos:
     :lineno-match:
     :caption: src/demo06/demo.py
https://github.com/billsix/modelviewprojection/blob/master/b...

The generated book is here https://billsix.github.io/modelviewprojection/

For your purposes, using a third party's code, I would make a new git repository and copy the current state of their code in. I would then annotate the sections that I want to with comments, and then generate the documentation using Sphinx, referencing your annotations of their code.
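With copied-in third-party code, a small script can check that every marker pair still brackets a region after you refresh the copy and before you rebuild the docs. This sketch roughly mimics how Sphinx's `literalinclude` applies `start-after` and `end-before` (keep the lines after the first line containing the start text, then stop before the first following line containing the end text). `extract_region` is a hypothetical helper, not part of Sphinx.

```python
def extract_region(source: str, start_after: str, end_before: str) -> list[str]:
    """Return the lines strictly between the first start marker and the next end marker."""
    lines = source.splitlines()
    try:
        begin = next(i for i, line in enumerate(lines) if start_after in line)
        end = next(i for i in range(begin + 1, len(lines)) if end_before in lines[i])
    except StopIteration:
        raise ValueError(f"markers not found in order: {start_after!r} / {end_before!r}") from None
    return lines[begin + 1 : end]
```

A `ValueError` here means an upstream update moved or deleted code around one of your annotations, which is the point where the region needs manual attention.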


I use a reMarkable to write thoughts as they happen


[flagged]


This sounds like human-generated nonsense


Indeed, when one discovers someone changed how parameters are interpreted... the results of keeping parallel copies of documentation form a contradiction.

i.e. your input can become less than helpful, and becomes a liability or outright errata.

Have a nice day, =3
