Where should the debugger set a breakpoint? (pernos.co)
101 points by hasheddan on Nov 12, 2021 | 17 comments



Interesting article. I'm curious how another debugging feature, reverse stepping, works. Does anyone have a source on that?


There are a lot of different techniques. In Pernosco we more or less do "SELECT program_state WHERE time=T_current-1". gdb's built-in reverse debugging feature creates an undo stack while executing forwards, and executing backwards pops modifications off of that stack. rr uses forward execution to emulate reverse execution: it backs up to a checkpoint, executes forwards once to figure out where to stop, and then executes a second time from that checkpoint to the stopping point.


To add to that for people who don't know how rr works: rr records everything needed to replay the program deterministically by saving the results of its syscalls. When replaying the execution, you can run the executable natively, only replacing syscalls with their recorded results.
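To make that concrete, here is a toy Linux/x86-64 sketch of the replay idea. This is not how rr is actually implemented (rr, as I understand it, relies on seccomp filtering and an in-process syscall buffer rather than stopping at every syscall with ptrace), and it ignores signal stops; it just shows where a replayer could substitute a recorded result (the recorded_result_for() helper is made up):

    #include <signal.h>
    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/user.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t child = fork();
        if (child == 0) {
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            raise(SIGSTOP);              /* wait until the tracer is ready */
            getpid();                    /* stand-in for "the replayed program" */
            return 0;
        }

        int status;
        waitpid(child, &status, 0);      /* child stopped itself */
        int entering = 1;
        for (;;) {
            ptrace(PTRACE_SYSCALL, child, NULL, NULL);  /* run to next syscall stop */
            waitpid(child, &status, 0);
            if (WIFEXITED(status))
                break;
            if (!entering) {             /* syscall-exit stop */
                struct user_regs_struct regs;
                ptrace(PTRACE_GETREGS, child, NULL, &regs);
                /* A replayer would substitute the recorded result here, e.g.
                   regs.rax = recorded_result_for(regs.orig_rax);  (made-up helper)
                   ptrace(PTRACE_SETREGS, child, NULL, &regs);     */
                printf("syscall %lld returned %lld\n",
                       (long long)regs.orig_rax, (long long)regs.rax);
            }
            entering = !entering;
        }
        return 0;
    }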

In GDB, in addition to reverse stepping, you can also take checkpoints (saving the entire state of the program) and restore them later. GDB's checkpointing feature is based on fork: the checkpoint is a forked copy of the inferior, and "going back" switches execution over to the saved copy, abandoning the current one. One big limitation is that this does not work for debugging programs with multiple processes, and fork is also not well defined for multi-threaded programs.

As part of my PhD, I extended [1] GDB using its Python API to allow checkpointing with CRIU (Checkpoint/Restore In Userspace) [2], which does not have this limitation. The "small" difficulty is that both GDB and CRIU use ptrace to attach to the process, and it is not possible to attach to the same process twice. The trick was to send SIGSTOP to the program, detach GDB, run CRIU on the suspended program, then reattach GDB and let it continue. This is hacky but it does work, amazingly, including breakpoints and watchpoints, which are restored. CRIU even restores the original PID.
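To make the fork-based idea concrete, here is a toy sketch (this is not GDB's implementation, and the names are made up): the "checkpoint" is a forked copy that freezes itself, and "restoring" means abandoning the diverged process and waking the frozen copy.

    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int counter = 0;

    /* Fork a frozen copy of the current process. Returns the copy's pid in
       the original process, and 0 in the copy once it is eventually resumed. */
    static pid_t take_checkpoint(void) {
        pid_t pid = fork();
        if (pid == 0) {
            raise(SIGSTOP);             /* the checkpoint copy freezes here */
            return 0;                   /* ...and resumes here when restored */
        }
        waitpid(pid, NULL, WUNTRACED);  /* wait until the copy is stopped */
        return pid;
    }

    int main(void) {
        counter = 1;
        pid_t ckpt = take_checkpoint();
        if (ckpt > 0) {
            counter = 99;                           /* diverge after the checkpoint */
            printf("before restore: counter=%d\n", counter);
            kill(ckpt, SIGCONT);                    /* "restore": wake the saved copy */
            waitpid(ckpt, NULL, 0);
            return 0;                               /* abandon the diverged state */
        }
        /* Only the restored checkpoint copy reaches this point. */
        printf("after restore: counter=%d\n", counter);  /* prints 1, the saved value */
        return 0;
    }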

An advantage of checkpointing is that you can restore the execution and try another path (rr cannot change the execution; it can only replay what it recorded), and you are not limited by the maximum size of an undo stack. But you need to save in advance. With rr, you can explore the whole execution; any state of the program is reachable. It's easier and more robust than checkpointing, and I'd argue that's what is most useful most of the time. Checkpointing can be useful for exploring multiple executions without having to run the program from the start, or for exploring difficult-to-reproduce or otherwise unreachable states by modifying values with the debugger.

[1] https://gricad-gitlab.univ-grenoble-alpes.fr/jakser/verde

[2] https://criu.org/


rr is capable of very limited diversion from the originally executed trace. This functionality exists to support things like `call myProgramObject->dump()` in gdb during replay. It will give up immediately on any non-trivial syscall, though.

Being able to change something and explore another execution path is interesting but it's extremely difficult to do in general. I think it would be very useful for verifying fixes, which is something that tools like rr are not particularly helpful with.


How does rr work with multi-threading? I would imagine threads can mess with the order that syscalls are executed.


rr emulates a single-core machine, and only one thread in the process is making progress at any given time.


I recall that in either gdb or Qt Creator's gdb wrapper, parameters sometimes appeared uninitialized when I set a breakpoint on the function declaration line, then appeared initialized once I single-stepped onto (or just before) the first line of code. I can't replicate it now, though.

----

I'm interested in the promise of omniscient debugging to partially replace debuggers, tracing tools, and logging (including printf debugging) for bugs in complex systems that you can reproduce in controlled environments (not in production). However, I don't believe it will be perfect even there, because I've noticed that when debugging many binaries (especially unsymboled or, worse, stripped ones) in gdb, it prints many variable values as "optimized out". Working around this requires special debug builds, which only partly alleviates the problem (I still get better results with compiled-in debug prints).

I've recently been tracing the execution of the PipeWire daemon. Since it's a large library filled with virtual function calls constructed from C macros, it's painful to analyze statically by reading the code. I found a Callgrind profiling trace useful as a control-flow analysis tool. Unlike perf, it logs every function called, counts the calls to each function, records all callers and how many times each caller called a function, and even records how many times each line of code calls another function (so it can distinguish between two calls to the same child within the same function).

For example, I recently used KCacheGrind to identify that spa_alsa_open() was called 6 times: 4 times by one caller and once each by 2 other callers. It returned early twice and continued on to snd_pcm_open() 4 times.

I also wanted to trace all ALSA functions being called across the trace (ltrace didn't work well since libasound.so.2 was loaded via dlopen, and I couldn't figure out how to stop ltrace from burning CPU intercepting calls from one .so function to another). This was difficult to extract from the Callgrind trace, so I had to rig together a script using `gprof2dot --format=callgrind -n 0 -e 0`, rg to convert the output XML file into a CSV of callers and callees, and csvsql to isolate all calls from non-snd-containing functions into snd-containing functions, so I could read the source code to see where those ALSA functions were being called. The result didn't tell me the order or parameters of the calls, so I had to infer the order in some cases and insert logging statements in others.

I think Pernosco is promising in this area, but I haven't tried it yet because it's commercial and has limited free uploads. One idea (I don't know if it's implemented or not) is having the ability to run SQL-like queries to list all calls from one binary to another .so library. Another idea (on top of the caller-callee lists you already have) is creating control flow graphs like KCacheGrind.


> in gdb, it prints many variable values as "optimized out"

To a large extent this is still just compiler bugs: variable locations are not preserved by various optimizations in cases where they could be.

Some of the remaining cases are fixable with extensions to debuginfo that leverage omniscient debugging. For example, we need a DWARF extension for variable locations which means "in PC range <START>-<END>, the value of variable <X> is the value it had when the PC was last at ___location <LOC>". This would take care of situations where the register containing <X> has been reused because <X> is dead. It's no trouble for an omniscient debugger to look backwards in time to get <X>'s value at <LOC>.
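A concrete (made-up) example of the kind of code this is about:

    /* Hypothetical illustration, not from the article. After its last use,
       the register holding `x` is free to be reused for `y`, so in the tail
       of the function a traditional debugger reports `x` as <optimized out>.
       The extension described above would let the debug info say "x has the
       value it had when the PC was last at the line `int y = x + b;`", which
       an omniscient debugger can resolve by looking back in time. */
    int f(int a, int b) {
        int x = a * 3;
        int y = x + b;   /* last use of x; its register can be reused from here on */
        y *= y;          /* asking for x here may print <optimized out> */
        return y;
    }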

> I think Pernosco is promising in this area, but I haven't tried it yet because it's commercial and has limited free uploads

Feel free to email us at [email protected] and we can maybe work something out.

> One idea (I don't know if it's implemented or not) is having the ability to run SQL-like queries to list all calls from one binary to another .so library. Another idea (on top of the caller-callee lists you already have) is creating control flow graphs like KCacheGrind.

We could do those things but we don't. Before implementing features like that we'd want to understand what the underlying problem is that you're trying to solve --- maybe there's a more direct way.


I realized that AddressSanitizer recording the stack trace of an allocation and printing it upon a memory error is a very limited form of introspection, almost a subset of what rr can capture. It would be cool to take an ASAN-enabled binary and ask the sanitizer who created a particular memory allocation, to trace causality across an unfamiliar codebase and learn the code structure.
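A limited version of this already exists, if I understand the sanitizer interface correctly: <sanitizer/asan_interface.h> declares __asan_describe_address(), which prints what ASan knows about an address, including the allocation stack it recorded. A minimal sketch (file and function names are mine):

    /* Build with something like: cc -g -fsanitize=address describe.c */
    #include <sanitizer/asan_interface.h>
    #include <stdlib.h>

    static int *make_buffer(void) {
        return malloc(16 * sizeof(int));    /* ASan records this allocation stack */
    }

    int main(void) {
        int *p = make_buffer();
        /* Ask ASan "who created this allocation?" without triggering an error: */
        __asan_describe_address(p);
        free(p);
        return 0;
    }

It should also be callable from gdb on a live ASan-instrumented process, e.g. `call __asan_describe_address(p)`.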

> Feel free to email us at [email protected] and we can maybe work something out.

Tell me if my email got spam-filtered or not.

> We could do those things but we don't. Before implementing features like that we'd want to understand what the underlying problem is that you're trying to solve --- maybe there's a more direct way.

The reason I wanted SQL-like queries is that I was debugging a complex bug (a PipeWire setup for low latencies, using ALSA to output through a USB audio interface, gets stuck in XRUN during startup) and wanted to find all the calls PipeWire made to libasound.so.2, to build a self-contained test case reproducing the bug. But I lost interest in the exact userspace API calls once I started tracing execution in the kernel, and I lost interest in the bug altogether once I discovered that USB audio latencies on Linux are awful even in the "low-latency" modes of PipeWire/jack2.

I do think this is useful in more situations, though. For example, GTK3 and GTK4 talk to the X server in different ways, causing GTK4 apps to have incorrect positioning/title bars on KWin and XFCE. It would be cool to take an rr dump, then log all calls into Xlib/xcb in an automated fashion (I never did find the difference; the GTK4 maintainers closed my bug report without investigating, and I don't know enough about X11/Xorg or KWin/XFCE to diagnose it from the symptoms and my gdb poking). Or perhaps log all interaction between a volume control and libpulse, and so on.

----

Why do I like KCacheGrind-style call graphs? Partly because it's the closest automated visualization to my preferred technique of a hierarchical/tree execution trace. The way I learn/debug, refactor, and design codebases is by tracing control flow as bullets in a "notes" Google Doc. I have a screenshot at https://drive.google.com/uc?id=1KLmGsQS7c9pKan6byh0oZ7VYXSB6....

I generally use tracing when I want to see what high-level operations affect particular low-level state, or to find the effects of a function call, or to see how a program enters a specific/erroneous state. I usually read type definitions and record them in a "types" section, then hand-trace the execution of relevant functions and record the control flow tree (with or without the values of variables) as nested bullet points. Crucially, I turn function calls into nested blocks (just like local scopes), and write down which branches are taken in which situations. The resulting tree (assuming no loops) can be read from top to bottom as a straight-line representation of what the CPU (or C abstract machine) is doing, where causes always come before effects. Whereas in the raw source code, functions are out of order or even in separate files, and virtual calls are impossible to statically resolve without nonlocal information, obscuring control flow to readers.

I personally find this approach very effective for learning code, understanding bugs, and tracing the effects of refactors and bugfixes. Though I find it time-consuming to build control flow bullets while reading/learning existing code, and faster to write bullets before writing new code. And I don't know if others find this technique as useful as I do.

----

I don't know if there's a better way for me or others to understand code, but I haven't found any yet. UML is a joke(?). Doc generators (like Rustdoc and Doxygen) are built to teach APIs and abstract away implementations, not to teach implementations. Sourcetrail is a (discontinued) code browsing/reading tool. In my experience, it functioned like IDE "go to definition" but with fancier graph-based presentation, and didn't help me understand code better than an IDE did. Maybe it was because it didn't show multiple levels of function calls on a single screen, or didn't let me annotate code with values and branches taken when running the code, or it didn't produce "notes" summarizing what the code does in a particular situation, which I can read without having the source on hand.

I like how in Google Docs, I can use highlighting to color-code types/variables in different expressions as having the same value. I dislike Google Docs's slow loading and bloat, deep bullet indentations, bullet nesting limit, and lack of code blocks. I dislike how Markdown interprets asterisks and underscores in code as italics (in-band coding ambiguity) unless you spam backslashes (impairing plaintext readability) or backticks, lacks color-coding, and lacks cloud sync. Other than those issues, Markor is a pretty good WYSIWYG-ish Markdown editor (I find it awkward to cross-reference in a dual-pane editor, and editing Markdown in a monospace code editor is "good enough" if you're fine with monospace and no images).

----

Unfortunately this technique doesn't extend well to asynchronous code, where one function "schedules" another function to run in an event loop in the same or another process. Though I feel asynchronous code intrinsically makes it difficult to understand what causes what (I often give up debugging when I see a process in poll() waiting for "who knows what"). I usually struggle at first, then once I figure it out, I hand-wave "this causes this to happen" in my notes (in the above pic, core_method_marshal_sync in one process "invokes" core_method_demarshal_sync in another).

(I wonder if you can more easily trace poll() causality by running a process in strace and looking at what created the file descriptor that poll() is blocking on. But I haven't had the chance to practice this much. Maybe I'll try it on the Firefox extension "copy multiple tabs" notification hang.)


> in gdb, it prints many variable values as "optimized out".

This is because your contract with the C/C++ compiler is not "produce machine instructions that faithfully implement the program as written", but "produce machine instructions that have the same side-effects as the program".

Anything that is not explicitly marked as observable (and whose state is therefore not a side-effect) is fair game. What does count as observable? Arguments passed to functions outside the compiler's view (remember LTO...), memory those functions can reach (same caveat), volatile-qualified storage... that's about it. Debuggability of an optimised program is an accident, not an objective.
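A tiny made-up illustration, compiled with something like -O2 -g:

    #include <stdio.h>

    int main(void) {
        int temp = 6 * 7;           /* no observable effect of its own; the      */
        int answer = temp;          /* compiler may fold both away, and gdb will */
                                    /* typically show them as <optimized out>    */
                                    /* (or as bare constants, if it is lucky)    */
        volatile int kept = answer; /* volatile storage is observable, so `kept` */
                                    /* must actually exist and can be inspected  */
        printf("%d\n", kept);       /* the output is the side-effect the         */
                                    /* compiler must preserve                    */
        return 0;
    }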


I suppose. When I debug system software, I usually recompile Arch Linux packages with options=(!strip debug), which turns off stripping and enables debug symbols. I don't know whether it disables the optimizations that impair debugging, though.


That's not really true. A ton of intentional work has been done to make optimized builds debuggable ... unfortunately, not enough yet, but it is definitely "an objective".


Have you seen the demos about how Pernosco can help with printf debugging? Things like being able to click any line of output to timewarp the debugger to the call that printed it, or being able to add print statements and see the results without even having to re-execute the program.


I looked at the list of demos a while back, but didn't watch in detail at the time.


I suggest you give it a try; it's quite remarkable (disclaimer: I'm a Pernosco customer). You need to adjust your mindset to the way it works (I haven't seen any similar tool elsewhere, so to me it looks very different from a traditional forward debugger), but it's well worth spending a little bit of time with it.

In particular, because it shows the whole 'dataflow history' (how a specific value flows through your program), it makes learning an unknown codebase (or a codebase you think you know for that matter...) much easier. Other features such as 'show me all calls to function f along with their parameters' help provide similar insights.

I don't like debuggers, and I used to put off reaching for gdb and its cousins as long as possible, but with Pernosco it's pretty much the opposite: I know that I will find my (potentially nasty) bug within a few minutes, by focusing on the "big picture" Pernosco provides instead of wasting time working around the limitations of a regular debugger.


I tried uploading a trace... which promptly failed due to Qt running RDRAND and Pernosco detecting a desync on playback.

I do think it's a cool idea (assuming the binaries have symbols and everything). However, in my recent debugging I've noticed that sometimes I recompile and rerun programs to change what logging I add, or rerun them because I screwed up my breakpoints (which Pernosco could fix), but sometimes I recompile to change the code's behavior and observe how the visible output of the app changes (which probably can't be simulated in a trace in general, aside from identifying what downstream code would be affected by changing a variable). So this won't be a complete solution for debugging, especially given that uploading more traces costs extra money, but I expect it will be a powerful tool to have.


Yeah we had to hack around this Qt bug https://bugreports.qt.io/browse/QTBUG-91071 but it's fixed now.



