Pandoc 3.0 (pandoc.org)
504 points by zczc on Jan 19, 2023 | 72 comments



I've been frustrated by Markdown previews not supporting Pandoc features, so I created a Pandoc-based Markdown preview for VS Code [1]. The preview supports all Pandoc extensions to Markdown syntax, because Pandoc itself generates the preview. There is also optional support for code execution with Jupyter kernels. I'm currently in the process of adding support for non-Markdown formats (including scroll sync), plus taking advantage of some of the new Pandoc 3.0 features.

[1]: Examples and animations: https://codebraid.org/presentations/scipy2022/. Installation for VS Code: https://marketplace.visualstudio.com/items?itemName=gpoore.c.... Installation for VSCodium: https://open-vsx.org/extension/gpoore/codebraid-preview.


Funny. During my bachelor thesis I added Pandoc as a renderer to an Atom markdown preview extension. (instead of actually writing my thesis)

https://github.com/atom-community/markdown-preview-plus/pull...

Old is new; the editor and the extension are now defunct. The best part of the exercise was that I got so well versed in the Markdown and Pandoc features of the time that I didn't need the preview at all.


Release 3.0 is unusable on Windows. Running `pandoc --version` spawns many instances and never returns the value. Has anyone else encountered this issue?


Does that mean that you are adding support for latex etc?

this is awesome, thank you for your work.


Yes, I'm adding support for arbitrary text-based formats, including LaTeX. So it will be possible to write LaTeX and see a live HTML preview generated by Pandoc.

In principle, it should be possible to create a PDF preview with proper SyncTeX support for synchronizing LaTeX source and PDF preview locations, but that gets complicated when Pandoc+LaTeX generate the PDF. It may be best to leave LaTeX-PDF previews to dedicated LaTeX previewers that don't involve Pandoc.


> Yes, I'm adding support for arbitrary text-based formats

That would be extremely powerful, and also would allow you to differentiate your extension from the Quarto one.


I actually released my extension around the same time that the Quarto extension came out. Quarto is great for documents running R code or needing some of Quarto's advanced document features. My extension has scroll sync and the preview updates live while you type. If you need code execution, you can use multiple Jupyter kernels per document and execute inline code. Also, code execution is non-blocking, so the preview still updates when you type, and code output appears live as it becomes available.


Pandoc is a great piece of software. As a university teacher and researcher, I use it in three ways:

1. I write markdown for my website and for the websites for my research projects and simply generate standalone html out of it. Done.

2. When we create electronic exams, the exam platform takes questions through an HTML-backed rich text editor. We write our exam questions in markdown and generate HTML document fragments that we simply paste into the exam platform.

3. When students take electronic exams, we receive XML files from our exam platform. We use Python to pass the submissions on to different submission checkers (akin to autograders or static analysis) and create YAML files with the student submission, grading suggestions and static analysis annotations. We manually review, grade and comment within the YAML file (that works incredibly well), collect all the data using Python and generate markdown reports for each student, including their submission, our comments and scoring. We pass this markdown through pandoc, creating well-laid-out PDFs which we either print and hand out or send out electronically.

Pandoc fits our yaml+markdown-based processes very well. Only for the actual research papers we still write LaTeX and build pdfs without pandoc.
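
For anyone wanting to try the report step described above, here is a minimal sketch in Python. The dictionary layout, the field names and the function names are all assumptions made for illustration (the actual YAML schema is not shown in this thread); only the final `pandoc input.md -o output.pdf` invocation is standard Pandoc usage.

```python
def render_report(student: dict) -> str:
    """Render one graded submission (already loaded from YAML) as Markdown.

    The keys used here (name, tasks, title, points, ...) are hypothetical.
    """
    lines = [f"# Exam report: {student['name']}", ""]
    total = 0
    max_total = 0
    for task in student["tasks"]:
        total += task["points"]
        max_total += task["max_points"]
        lines += [
            f"## {task['title']} ({task['points']}/{task['max_points']} points)",
            "",
            "~~~sql",                 # tilde fence, understood by Pandoc's Markdown
            task["submission"],
            "~~~",
            "",
            f"Comment: {task['comment']}",
            "",
        ]
    lines.append(f"**Total: {total}/{max_total} points**")
    return "\n".join(lines) + "\n"


def pandoc_pdf_command(md_path: str, pdf_path: str) -> list:
    """Build the argument list for converting one report to PDF."""
    return ["pandoc", md_path, "-o", pdf_path]

# Typical use (requires pandoc and a PDF engine to be installed):
#   subprocess.run(pandoc_pdf_command("ada.md", "ada.pdf"), check=True)
```

Keeping the rendering separate from the Pandoc call means the Markdown can be inspected or diffed before any PDF is built.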


Interesting! I use a very similar process for creating exams and student projects, but am the only one in my department who does so. Are any of your processes/tools publicly available? (Mine are basically cobbled together in Haskell and Python.)


Sorry, same. There's such a myriad of e-learning platforms in Germany and I guess it's the same for most countries.

I suspect the same goes for our own research static analysis and autograding platform (in our case for SQL); probably every CS department has quite a few of those as well.

I wouldn't get my hopes up for anything publicly available that fits your platform and has a bus factor higher than 1.


Ah well, I can dream, I guess :)

Every now and then, I ponder putting some of my scripts together into something I could actually hand over to someone else, but have not yet had the time.


for the research papers which you write in LaTeX you should have a look at MonsterWriter.

Disclaimer: I'm the creator of MonsterWriter and very keen to receive feedback and learn about how universities and their students write papers, theses, ...


Though I don't really like you advertising, thank you for the suggestion. As a computer science researcher I'll give you some feedback why your application is a total deal-breaker for me and my colleagues:

* It's not running on Linux. Nobody in our department runs windows or mac.

* We already have huge BibTeX citation libraries that we use in papers, referencing just the papers we need. These citation library files grow and grow. I won't manually add citations for each paper.

* We collaborate and version through git. If collaborative writing and version control do not work at least as easily as our plaintext-git handling, that's a hard no.

* You do know that conferences and journals provide Word and LaTeX templates with fixed page limits for submissions, right? How would I use, say, LNCS in MonsterWriter? Writing seems not to be page-based. How do I know that I'm over the limit?

* My wife is a researcher in the social sciences, and they extensively use MS Word's change tracking and merging feature to write papers. If MonsterWriter does not support this in an accessible and visually appealing manner, it would be a hard no for her as well.

With your feature set, you're not really targeting researchers, even if you think you do.


Thank you for your time to answer, very much appreciated :)


It seems you're being downvoted for proposing a tool you created and disclaimed as such. I think that's perfectly fine and in the spirit of HN.


Just tried opening it. It looks nice, but I'm going to write some quick, slightly negative, comments, based on your claims about using it.

The table formatting is not good enough. It's not obvious how to left-justify a column. It's also not clear how to line a column up along "." (which I often use for numbers). Both of these are fairly easy in LaTeX.

The outputted LaTeX looks OK, but it's not obvious how to format -- most journals, and Universities (for PhDs) will have a fixed style you have to use. I suppose I could take the LaTeX and randomly hack it, but then I need to learn LaTeX to fix any issues that causes.


Both fair points; tables still need improvement.

Regarding the outputted LaTeX, the idea is to grow the number of supported templates, so there would be templates for every important journal. For now the focus is on making the thesis template flexible enough that it works for most bachelor's/master's theses.


It's cool you're using SetApp, thank you.

For HN reading along, SetApp is a way to distribute apps and get paid outside the app store. Really, that exists.

// Disclosure: Unless you are disavowing your ability as author to offer a recommendation that can be trusted, you probably mean "Disclosure" not "Disclaimer". Disclosure = here is my potential bias. Disclaimer = YMMV, no warranties express or implied.


You can also download the app without SetApp. No Subscription needed!


I love Pandoc. I don't often write "documents for office consumption" but when I do, I just write a markdown file and spit out docx or PDF. I was congratulated more than once on how coherent my documents are in their structure.

Plus, having a git history is a great boon.


It's also not too difficult to hook up a GH actions job to generate the documents with pandoc and spit them out directly into dropbox/sharepoint for "non-techie consumption". Great for semi-technical documentation that biz/sales/support people need to be in the loop on.


Oh that's a great idea! I wish I had pandoc in my university days--I ended up writing a lot of (non-technical) papers in latex just because I hated using word for the task.


Out of every tool I’ve ever used to make a .docx file from Markdown, Pandoc is the only one that has consistent results with converting Markdown headers to Word styles rather than just a bigger font size. Lots of Markdown tools in my tool belt, and would love to know of any more that can do this, because it’s really useful on the (unfortunate) occasions something needs to live as a Word doc.


I do wish there was an easy way to create Word document titles from H1s in Markdown. It makes sense that they should be converted to top-level headings, but it adds an annoying bit of friction to my workflow.


Oh, but there is!

    pandoc --shift-heading-level-by=-1 input.md -o output.docx
This will promote level-2 headings to level-1, and promote a level-1 heading at the top of the document to the document's title.
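
To make the rule concrete, here is a rough Python model of what the Pandoc manual describes for a shift of -1: a level-1 heading at the very beginning of the document replaces the title, any other heading that would drop below level 1 becomes a regular paragraph, and everything else moves up one level. This is only an illustration on ATX-style `#` headings, not Pandoc's implementation; the function name is made up, and it does not skip `#` lines inside code blocks.

```python
import re

ATX_HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def shift_headings_up(markdown: str):
    """Approximate `--shift-heading-level-by=-1`; returns (title, body)."""
    title = None
    out = []
    seen_content = False
    for line in markdown.splitlines():
        match = ATX_HEADING.match(line)
        if not match:
            if line.strip():
                seen_content = True
            out.append(line)
            continue
        level, text = len(match.group(1)), match.group(2)
        if level == 1:
            if not seen_content:
                title = text          # leading H1 becomes the document title
            else:
                out.append(text)      # later H1s degrade to plain paragraphs
        else:
            out.append("#" * (level - 1) + " " + text)
        seen_content = True
    return title, "\n".join(out).strip() + "\n"
```

If the H1 is not the first thing in the file, the model (like the documented behaviour) leaves the title unset, which may explain cases where the title promotion seems not to work.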


Oh really? I've tried --shift-heading in the past and it worked to move headings up a level, but not to the title. I'll have to read the docs more carefully and give it another go. Thank you.


Quarto, excellent software for building publications, websites, etc…, is leveraging Pandoc: https://quarto.org/


Pandoc is such a great conversion program, and this new 3.0 release has so many improvements, especially for figures.

I write in markdown and export to PDF, using pygments for code syntax coloring and .tex files to adjust layouts, tables, and the like.

https://github.com/SixArm/pandoc-from-markdown-to-pdf


Love this program. Means I can write in plain text, markdown or zim-wiki syntax, and export to word no hassles.

If I'm writing markdown I use Pandoc's version, as it has support for advanced tables.

Brilliant software.


That's great news. I've been waiting for years for a dedicated 'Figure' element. The workaround was pretty brittle. It'll make pandoc-plot [0] easier to maintain as well.

[0]: https://github.com/LaurentRDC/pandoc-plot


Does it still automatically generate "smart" quotes (which are anything but) from traditional ones during conversion?

Love the tool, but this is the most awful default setting I've seen in a program in a while, especially if you include any code that depends on quotes not being mangled.


If you specify that you just want bog-standard markdown then it will not generate smart quotes. To wit:

- `pandoc foo.tex -t markdown -o foo.md` will not produce smart quotes.

- `pandoc foo.tex -t markdown-smart -o foo.md` will produce smart quotes.


I think the examples you give might be the opposite of what happens - in the docs:

https://pandoc.org/MANUAL.html#extension-smart

The meaning of the -smart extension on option names is inverted in some cases, and enabled by default on markdown output.


This I agree with. I don't know the exact current status, but having debugged related rendering issues many times over the years, I wish it had always been hard to enable conversion to so-called smart quotes, rather than hard to prevent it.


I should add: the above is about the only quibble I can think of, which is impressive. I love love love pandoc! It's a highly dependable and capable swiss army docs tool. I use it constantly, eg to help generate CLI help text and HTML, man, info and plain text manuals from (mostly) markdown sources. Huge congrats and thanks to the developers for their hard work and for this latest release.


It’s configurable, and, in any case, Pandoc will not alter the quotes in your code blocks nor in inline code.


I love pandoc. With its Lua filters, I love using it for generating HTML and blog posts. One thing that always annoys me about most static website generators is that they make you use some very limited templating language, when I just want to use a proper programming language.

My only irritation -- while I understand why one would want to do it for neatness, it's annoying that the "pandoc" package no longer provides the "pandoc" program! Maybe instead introducing "pandoc-core" and renaming "pandoc-cli" to "pandoc" would be better (it would certainly avoid breaking existing scripts, like mine).


If I understand correctly, the package pandoc-cli installs the pandoc binary [1], so it shouldn't break that many scripts.

[1]: https://github.com/jgm/pandoc/blob/535bd0393fe7b2f287903b942...


This (I also generate my blog using pandoc). In another case, I wanted to go from Markdown to groff -mom and it was a totally straightforward matter with a custom Writer in Lua.


I've been looking for a tech stack to replace latex for decades now. As a very recent development, the combination of pandoc+weasyprint (plus a little bit of homebrewed pandoc filter magic) has now become good enough for my needs, and I have finally been able to take the plunge. Feels great.

For those who are a little less adventurous and who happen to be in the social sciences, humanities, journalism, etc., pandoc+msword is also definitely worth looking into. It's a much better tech stack than standalone msword. -- It's really only in the STEM fields that, in my mind, there really is no way around latex.


No way around LaTeX, but Pandoc + LaTeX is still a much nicer experience than plain LaTeX in my experience.


Pandoc powers my little static site maker:

cf. https://github.com/adityaathalye/shite/blob/master/bin/templ...

  __shite_templating_compile_source_to_html() {
      # If content has front matter metadata, it is presumed to be in a format
      # that the content compiler can safely process and elide or ignore.
      local file_type=${1:?"Fail. We expect file type of content like html, org, md etc."}
  
      case ${file_type} in
          html )
              pandoc -f html -t html
              ;;
          md )
              pandoc -f markdown -t html
              ;;
          org )
              pandoc -f org -t html
              ;;
      esac
  }


Mine too. I author in org and have a plugin that converts the org files to rst files.


Ah, I make it compile partial HTML (just page content), and stick that into my own HTML page template(s) (written as HEREDOCS :D).

That's part of the joy of using Pandoc. I can pipeline it, no problem.

Like this:

cf. https://github.com/adityaathalye/shite/blob/master/bin/templ...

  cat "${watch_dir}/sources/${url_slug}" |
      __shite_templating_compile_source_to_html ${file_type} |
      __shite_templating_wrap_content_html ${content_type} ${watch_dir} |
      __shite_templating_wrap_page_html \
          > "${watch_dir}/public/${html_url_slug}"
Templates look like this. Notice the $(cat -) in the middle. That's how the HTML content produced by Pandoc gets injected in the middle of everything else.

  shite_template_common_default_page() {
      local maybe_page_id=${shite_page_data[page_id]:+"id=\"${shite_page_data[page_id]}\""}
      local maybe_canonical_url=${shite_page_data[canonical_url]:+"<link rel=\"canonical\" href=\"${shite_page_data[canonical_url]}\">"}
  
      cat <<EOF
  <!DOCTYPE html>
  <html lang="en">
      <head>
          $(shite_template_common_meta)
          $(shite_template_common_links)
          ${maybe_canonical_url}
      </head>
      <body ${maybe_page_id}>
        <div id="the-very-top" class="stack center box">
            $(shite_template_common_header)
            <main id="main">
              $(cat -)
            </main>
            $(shite_template_common_footer)
        </div>
      </body>
  </html>
  EOF
  }
edit: substantiate Pandoc's role.
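
For readers who don't speak Bash, the same "inject Pandoc's HTML fragment into a page shell" idea can be sketched in Python. This is not how shite does it (that is the heredoc with `$(cat -)` above); the template, the `wrap_page` name and the title parameter are assumptions for illustration only.

```python
from html import escape
from string import Template

# Page shell; $content marks where the Pandoc-generated fragment lands.
PAGE = Template("""<!DOCTYPE html>
<html lang="en">
  <head>
    <title>$title</title>
  </head>
  <body>
    <main id="main">
$content
    </main>
  </body>
</html>
""")


def wrap_page(content_html: str, title: str) -> str:
    """Wrap an HTML fragment (e.g. from `pandoc -f org -t html`) in the page shell."""
    return PAGE.substitute(title=escape(title), content=content_html.strip())

# In a pipeline this would read the fragment from stdin, e.g.
#   pandoc -f org -t html post.org | python wrap.py
# with wrap.py calling: print(wrap_page(sys.stdin.read(), "My post"))
```

Because Pandoc emits a bare fragment unless asked for a standalone document, the wrapper stays in full control of the surrounding markup.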


In my case I'm using Pelican which has builtin support for rst, so it was easier for me to just convert to rst rather than full blown HTML.


Since I've rolled my own SSG, I wanted it to compile $FORMAT -> HTML directly, so I write fewer bugs :D

I chose Pandoc because it does a reasonably OK job compiling orgmode, _and_ has good support for other formats I use from time to time (e.g. markdown, ASCIIDOC).

Before this, I was using hugo, with a compile cycle similar to your setup, viz. org -> markdown (via ox-hugo), and then hugo did the md -> HTML thing. hugo sort of supports org -> html, but their batteries-included compiler is not very good. Points for trying, though.

edit: typo, context


I've been using Pandoc to write Latex-lite for a couple years now. Just write .md files with basic Markdown syntax for all the major text content, and add some Latex when I need to do something more particular. Best of both worlds, really.


Do you want to change the world? Write software like this. Pandoc is great.


One day I wish to see the AsciiDoc(tor) Reader. I'd love to be freed from Ruby as AsciiDoc is superior to Markdown and most other lightweight markup syntax options in features and syntax. This lack of features is why we have an incompatible group of Markdown syntax forks (aka "flavors" to mask that forks are incompatible).


As others wrote, Pandoc is Haskell so it compiles to a fairly efficient binary.

But more importantly, unlike the various Markdown flavors or AsciiDoc, it is incredibly extensible thanks to the combination of custom filters and the possibility to add HTML classes and attributes. One can write filters to leverage the class/attribute information and perform transformations at the AST level, which basically lets you define a DSL with an arbitrary number of custom elements.

I wrote a collection of filters for the publication of a large online legal playbook. Not only did Pandoc make it possible to introduce different kind of custom elements that don't exist in plain Markdown or AsciiDoc, but by using different filters it was possible to use a single Markdown source to generate both the book and various summaries such as a list of examples, a list of civil code clauses etc. I don't know Haskell that well so I used Rust for the filters, but that worked very well.
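
As a rough idea of what such a filter can look like (in Python rather than Rust, and not the code from that project), the sketch below works on Pandoc's JSON AST, where a Div is encoded as `{"t": "Div", "c": [[id, classes, key-values], blocks]}`. It keeps only the Divs carrying a given class, which is one way to derive a "list of examples" from the same source. The class name `example` and all function names are assumptions, and for brevity it only descends into nested Divs, not into lists or block quotes.

```python
import json


def has_class(block: dict, name: str) -> bool:
    """True if `block` is a Div whose attribute classes include `name`."""
    if block.get("t") != "Div":
        return False
    (_identifier, classes, _keyvals), _children = block["c"]
    return name in classes


def collect_divs(blocks: list, name: str) -> list:
    """Recursively collect Divs carrying class `name`, in document order."""
    found = []
    for block in blocks:
        if has_class(block, name):
            found.append(block)
        elif block.get("t") == "Div":
            found.extend(collect_divs(block["c"][1], name))
    return found


def keep_only(doc: dict, name: str) -> dict:
    """Return a copy of the document whose body is just the matching Divs."""
    return {**doc, "blocks": collect_divs(doc["blocks"], name)}

# As a JSON filter, the script would read the AST on stdin and write it back:
#   json.dump(keep_only(json.load(sys.stdin), "example"), sys.stdout)
# and be invoked as: pandoc book.md --filter ./examples.py -o examples.html
```

In the Markdown source, such a Div would be written as a fenced div, e.g. `::: example` ... `:::`.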

Pandoc is IMO a very underrated tool.


> Pandoc is Haskell so it compiles to a fairly efficient binary.

This is nebulous. Haskell's compiled binaries are not ideal, for a number of reasons.[^1] GHC does very little to optimise for many typical metrics of "efficient". The binaries it produces are enormous because it (unavoidably) bundles the runtime along with the program itself, and there is a lot of empty space in the binaries. Shrinking them can improve startup times significantly especially on spinning rust drives.

That said, Haskell programs are at least _compiled_, and they do result in binaries which, if well written, can result in running times comparable to (or, sometimes, shorter than) your average hand-rolled C code that achieves the same goals.

Of course, none of this casts any shadow on the fact that Pandoc is, indeed, an excellently engineered piece of software that stands as a testament to the value of Haskell for real-world business logic and problem solving.

[^1]: This problem is fairly well-understood in the Haskell community: https://dixonary.co.uk/small


Take a look at https://djot.net/, you might like that.


It does do some things better and I appreciate calling it a new name and 'starting over' instead of another fork, but what's not covered is metadata. If I want to add author, license, tags, keywords, description, etc. there is no in-document way to do this. Almost all other media format types from images, audio, to other documents like ODF have a way to do metadata and this (and Markdown) doesn't cover said important use case.

Seems there's a long-standing (for the project) open issue where it's still being mulled over.

Imports are also very nice for writing longer texts--especially how AsciiDoc lets you +1 all of your headings so the heading hierarchy works as a standalone document and a part of a larger whole.


Yeah, it’s definitely a wip at this point, though it is already general enough to express meta and imports. You could write

    # Title
    : key = value
    
    ```include
    ./examples/hello.rs
    ```
today and write a simple filter to extract meta from the first definition list and resolve includes.
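
To give a feel for how little is needed, here is a text-level approximation in Python. It is not a djot filter and does not use djot's AST or filter API; it just scans lines, treats `: key = value` lines directly after the first heading as metadata, and replaces an `include` code block with the contents of the named file. All names are hypothetical, and `read_file` is passed in so the file access can be swapped out.

```python
import re

FENCE = "`" * 3                       # a three-backtick code fence
META_LINE = re.compile(r"^:\s*(\w+)\s*=\s*(.*)$")


def extract_meta(lines):
    """Split off `: key = value` lines that follow the first heading."""
    meta, rest = {}, []
    state = "before"                  # before -> meta -> done
    for line in lines:
        match = META_LINE.match(line) if state == "meta" else None
        if match:
            meta[match.group(1)] = match.group(2).strip()
            continue
        if state == "before" and line.startswith("# "):
            state = "meta"
        elif state == "meta" and line.strip():
            state = "done"            # first other content ends the meta block
        rest.append(line)
    return meta, rest


def resolve_includes(lines, read_file):
    """Replace each `include` block with the lines of the files it names."""
    out = []
    inside = False
    for line in lines:
        stripped = line.strip()
        if not inside and stripped == FENCE + "include":
            inside = True
        elif inside and stripped == FENCE:
            inside = False
        elif inside:
            if stripped:
                out.extend(read_file(stripped).splitlines())
        else:
            out.append(line)
    return out
```

A real filter would do the same two passes on the parsed document instead of on raw lines.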


I appreciate that you pointed it out though to give me a chance to reevaluate my thoughts on the project. It seems it has a better trajectory than when I had last looked.


I've successfully used it from Clojure (I think it's through JRuby). With a few lines of code you can configure AsciiDoctor to whatever you need. It's way easier than fiddling at the command-line (I couldn't immediately understand how to get extensions and how it played with whatever version of the software I got through `apt`). It'd be good to have alternatives just for the sake of it - but I didn't find anything particularly lacking

Here is how I made some reveal slides

    (import [org.asciidoctor
             Asciidoctor
             OptionsBuilder
             SafeMode])

    (let [input-file    (clojure.java.io/file "path/to/adoc/file")
          adoctor       (org.asciidoctor.Asciidoctor$Factory/create)
          reveal-option (doto (org.asciidoctor.OptionsBuilder/options)
                          (.backend "revealjs")
                          (.safe org.asciidoctor.SafeMode/UNSAFE)
                          (.attributes
                            (.attribute
                              (org.asciidoctor.AttributesBuilder/attributes)
                              "revealjsdir"
                              "../reveal.js")))]
      (.requireLibrary adoctor (into-array String ["asciidoctor-revealjs"]))
      (.convertFile adoctor input-file reveal-option))
You get all the codez from Maven so you don't need to install anything on your system

    {'org.asciidoctor/asciidoctorj-revealjs {:mvn/version "5.0.0.rc1"}
     'org.asciidoctor/asciidoctorj-pdf      {:mvn/version "1.6.2"}
     'org.asciidoctor/asciidoctorj          {:mvn/version "2.5.3"}}
The maintainers seem very responsive and active on Github. It's not as nice as a spec and multiple implementations - and I guess you're locked in to one library, but at least it's not as bad as Orgmode - where you're locked in to an editor as well


Yes, the maintainer is great. I have at this point just used Nix and post-processed Asciidoctor instead of trying any sort of other tools, but it gets trickier, as you noted, if you want to use it inside something else. It's not a compiled binary nor is it a C lib other languages could get at. Much of that could be attributed to the spec being quite.


Pandoc is written in Haskell.


AsciiDoc is written in Python.

Though when it comes to annoyance with Markdown forks: AsciiDoctor is basically that to AsciiDoc. It's mostly compatible, but when it isn't, it really bites.


The more important part is that you end up with a binary instead of needing an interpreted language which makes the tooling a mess. Python and Ruby are the same thing to most people.


fantastic software, never build it from source, or if you have to, make sure you have an OS that bundles all the Haskell dependencies into a single meta package


Haskell tooling went from awful to best in class after stack came out. Look for “Quick stack method” on the installing page. It should be easy to build from source now with just a few commands. It will no doubt take a long time to compile all the packages, and you might still have issues tracking down any non-Haskell dependencies (C libraries).


this got replaced with ghcup iirc


Pandoc is really easy to build with just Haskell and Cabal installed via ghcup, without needing any Haskell packages from your distro.


It or one of its libraries did not compile for me OOTB in Gentoo. Granted it's marked as nonstable (~*). That's unfortunate, because I really liked to use it when I was mainly running Debian. Though I didn't really put any debugging effort into making it work.


I did this years back, it took, ehm, quite some time. I use a binary now :D ISTR it starts with building a self-hosting Haskell compiler...


Yeah, I ended up upgrading my machine's ram when I was building it from source. It's sizable.


One new feature that will make Python documentarians happy is the `--list-tables` flag for rST output: you can now convert any table to the list table syntax of reStructuredText, which is, in the opinion of many, superior to classic tables with ASCII borders.
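
For those who haven't seen list tables: each row is a bullet and each cell a nested bullet under the `list-table` directive. The Python sketch below renders a small table in that shape by hand, purely to show the target syntax; it is not Pandoc's writer, the function name is made up, and Pandoc's actual output may differ in details such as widths or extra options.

```python
def to_list_table(rows, header_rows=1):
    """Render rows of plain-text cells as a reStructuredText list table."""
    lines = [".. list-table::", f"   :header-rows: {header_rows}", ""]
    for row in rows:
        for index, cell in enumerate(row):
            # The first cell opens a new row; later cells continue it.
            marker = "* -" if index == 0 else "  -"
            lines.append(f"   {marker} {cell}")
    return "\n".join(lines) + "\n"

# With Pandoc 3.0 itself, the conversion would be something like:
#   pandoc input.md -t rst --list-tables -o output.rst
```

The appeal is that cell contents never have to be aligned against ASCII borders, so diffs stay small when a cell changes.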


I'm in a job where I pretty much never need to output a PDF, but whenever the occasional thing comes around, pandoc is always there for me. Such a useful tool.


I use pandoc to convert GitHub style markdown to PDF/EPUB ebooks. The default output is good and there are plenty of customization options too. I didn't know LaTeX/CSS but stitched a few things together with help from stackexchange sites to customize the output produced. Later came to know there are third-party templates that I could've used/started with.


Man I'd love an asciidoc(tor) reader for pandoc sooo much. The existing toolchain is a big pain.


Since Pandoc has Lua built in, I wonder if it can also run LuaLaTeX in full? Because then it could really support all features of LaTeX and become a kind of SuperLaTeX.



