Pandoc 3.0

gpoore · on Jan 19, 2023

I've been frustrated by Markdown previews not supporting Pandoc features, so I created a Pandoc-based Markdown preview for VS Code [1]. The preview supports all Pandoc extensions to Markdown syntax, because Pandoc itself generates the preview. There is also optional support for code execution with Jupyter kernels. I'm currently in the process of adding support for non-Markdown formats (including scroll sync), plus taking advantage of some of the new Pandoc 3.0 features.

[1]: Examples and animations: https://codebraid.org/presentations/scipy2022/. Installation for VS Code: https://marketplace.visualstudio.com/items?itemName=gpoore.c.... Installation for VSCodium: https://open-vsx.org/extension/gpoore/codebraid-preview.

leipert · on Jan 19, 2023

Funny. During my bachelor thesis I added Pandoc as a renderer to an Atom markdown preview extension. (instead of actually writing my thesis)

https://github.com/atom-community/markdown-preview-plus/pull...

Old is new: the editor and the extension are now defunct. The best part of this exercise was that I got so well versed in the Markdown and Pandoc features of the time that I didn’t need the preview at all.

iconhacker · on Jan 19, 2023

Release 3.0 is unusable on Windows. When running pandoc --version, it spawns many instances and won’t return the version. Has anyone else encountered this issue?

dosshell · on Jan 19, 2023

Does that mean that you are adding support for LaTeX etc.?

This is awesome, thank you for your work.

gpoore · on Jan 19, 2023

Yes, I'm adding support for arbitrary text-based formats, including LaTeX. So it will be possible to write LaTeX and see a live HTML preview generated by Pandoc.

In principle, it should be possible to create a PDF preview with proper SyncTeX support for synchronizing LaTeX source and PDF preview locations, but that gets complicated when Pandoc+LaTeX generate the PDF. It may be best to leave LaTeX-PDF previews to dedicated LaTeX previewers that don't involve Pandoc.

noiwillnot · on Jan 19, 2023

> Yes, I'm adding support for arbitrary text-based formats

That would be extremely powerful, and also would allow you to differentiate your extension from the Quarto one.

gpoore · on Jan 19, 2023

I actually released my extension around the same time that the Quarto extension came out. Quarto is great for documents running R code or needing some of Quarto's advanced document features. My extension has scroll sync and the preview updates live while you type. If you need code execution, you can use multiple Jupyter kernels per document and execute inline code. Also, code execution is non-blocking, so the preview still updates when you type, and code output appears live as it becomes available.

maweki · on Jan 19, 2023

Pandoc is a great piece of software. As a university teacher and researcher, I use it in three ways:

1. I write Markdown for my website and for the websites of my research projects and simply generate standalone HTML out of it. Done.

2. When we create electronic exams, the exam platform takes questions using an HTML-backed rich text editor. We write our exam questions in Markdown and generate HTML document fragments that we simply paste into the exam platform.

3. When students take electronic exams, we receive XML files from our exam platform. We use Python to pass submissions on to different submission checkers (akin to autograders or static analysis) and create YAML files with each student's submission, grading suggestions, and static analysis annotations. We manually review, grade, and comment within the YAML file (that works incredibly well), then collect all the data using Python and generate a Markdown report for each student, including their submission, our comments, and the scoring. We pass this Markdown through Pandoc, creating well laid out PDFs which we either print and hand out or send out electronically.

Pandoc fits our YAML+Markdown-based processes very well. Only for the actual research papers do we still write LaTeX and build PDFs without Pandoc.
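
For illustration only (this is not the tooling described above), the report step in item 3 could look roughly like the sketch below. The record fields (`student`, `task`, `submission`, `score`, `max_score`, `comments`) and the function name are invented, and a plain dict stands in for a parsed YAML file.

```python
def render_report(record):
    """Render one graded submission (a dict, e.g. loaded from YAML) as Markdown.

    All field names are hypothetical; adapt them to the real data layout.
    """
    lines = [
        f"# Exam report: {record['student']}",
        "",
        f"## {record['task']}",
        "",
        f"**Score:** {record['score']} / {record['max_score']}",
        "",
        "### Submission",
        "",
        # Tilde fences are accepted by pandoc's Markdown reader.
        "~~~sql",
        record["submission"].rstrip(),
        "~~~",
        "",
        "### Comments",
        "",
    ]
    # One bullet per reviewer comment.
    lines += [f"- {comment}" for comment in record["comments"]]
    return "\n".join(lines) + "\n"
```

The resulting text could be written to a file and converted with something like `pandoc report.md -o report.pdf`, which needs a PDF engine such as a LaTeX installation.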

phlummox · on Jan 19, 2023

Interesting! I use a very similar process for creating exams and student projects, but am the only one in my department who does so. Are any of your processes/tools publicly available? (Mine are basically cobbled together in Haskell and Python.)

maweki · on Jan 19, 2023

Sorry, same. There's such a myriad of e-learning platforms in Germany and I guess it's the same for most countries.

I would guess the same goes for our own research static analysis and autograding platform (in our case for SQL), of which probably every CS department also has quite a few.

I wouldn't get my hopes up for anything publicly available that fits your platform and has a bus factor higher than 1.

phlummox · on Jan 20, 2023

Ah well, I can dream, I guess :)

Every now and then, I ponder putting some of my scripts together into something I could actually hand over to someone else, but have not yet had the time.

WolfOliver · on Jan 19, 2023

For the research papers which you write in LaTeX, you should have a look at MonsterWriter.

Disclaimer: I'm the creator of MonsterWriter and very keen to receive feedback and learn about how universities and their students write papers, theses, ...

maweki · on Jan 19, 2023

Though I don't really like you advertising, thank you for the suggestion. As a computer science researcher I'll give you some feedback why your application is a total deal-breaker for me and my colleagues:

* It's not running on Linux. Nobody in our department runs Windows or Mac.

* We already have huge BibTeX citation libraries that we use in papers and just reference the necessary papers. These citation library files grow and grow. I won't manually add citations for each paper.

* We collaborate and version through git. If collaborative writing and version control does not work at least as easily as our plaintext-git-handling, that's a hard no.

* You do know that conferences and journals provide Word and LaTeX templates with fixed page limits for submissions, right? How would I use, say, LNCS in MonsterWriter? Writing seems not to be page-based. How do I know that I'm over the limit?

* My wife is a researcher in the social sciences, and they extensively use MS Word's change tracking and merging feature to write papers. If MonsterWriter does not support this in an accessible and visually appealing manner, it would be a hard no for her as well.

With your feature set, you're not really targeting researchers, even if you think you do.

WolfOliver · on Jan 19, 2023

Thank you for your time to answer, very much appreciated :)

tambourine_man · on Jan 19, 2023

It seems you're being downvoted for proposing a tool you created and disclaimed as such. I think that's perfectly fine and in the spirit of HN.

CJefferson · on Jan 19, 2023

Just tried opening it. It looks nice, but I'm going to write some quick, slightly negative, comments, based on your claims about using it.

The table formatting is not good enough. It's not obvious how to left-justify a column. It's also not clear how to line a column up along "." (which I often use for numbers). Both of these are fairly easy in LaTeX.

The outputted LaTeX looks OK, but it's not obvious how to format -- most journals, and Universities (for PhDs) will have a fixed style you have to use. I suppose I could take the LaTeX and randomly hack it, but then I need to learn LaTeX to fix any issues that causes.

WolfOliver · on Jan 19, 2023

Both fair points; tables still need improvement.

Regarding the outputted LaTeX, the idea is to grow the number of supported templates. So there would be templates for every important journal. For now the focus is to make the thesis template flexible enough that it works for most bachelor's/master's theses.

Terretta · on Jan 19, 2023

It's cool you're using SetApp, thank you.

For HN reading along, SetApp is a way to distribute apps and get paid outside the app store. Really, that exists.

// Disclosure: Unless you are disavowing your ability as author to offer a recommendation that can be trusted, you probably mean "Disclosure" not "Disclaimer". Disclosure = here is my potential bias. Disclaimer = YMMV, no warranties express or implied.

WolfOliver · on Jan 19, 2023

You can also download the app without SetApp. No Subscription needed!

bronikowski · on Jan 19, 2023

I love Pandoc. I don't often write "documents for office consumption" but when I do, I just write a markdown file and spit out docx or PDF. I was congratulated more than once on how coherent my documents are in their structure.

Plus, having a git history is a great boon.

yabones · on Jan 19, 2023

It's also not too difficult to hook up a GH actions job to generate the documents with pandoc and spit them out directly into dropbox/sharepoint for "non-techie consumption". Great for semi-technical documentation that biz/sales/support people need to be in the loop on.

qbasic_forever · on Jan 19, 2023

Oh that's a great idea! I wish I had pandoc in my university days--I ended up writing a lot of (non-technical) papers in LaTeX just because I hated using Word for the task.

mrehler · on Jan 19, 2023

Out of every tool I’ve ever used to make a .docx file from Markdown, Pandoc is the only one that has consistent results with converting Markdown headers to Word styles rather than just a bigger font size. Lots of Markdown tools in my tool belt, and would love to know of any more that can do this, because it’s really useful on the (unfortunate) occasions something needs to live as a Word doc.

Veen · on Jan 19, 2023

I do wish there was an easy way to create Word document titles from H1s in Markdown. It makes sense that they should be converted to top-level headings, but it adds an annoying bit of friction to my workflow.

fiddlosopher · on Jan 19, 2023

Oh, but there is!

    pandoc --shift-heading-level-by=-1 input.md -o output.docx

This will promote level-2 headings to level-1, and promote a level-1 heading at the top of the document to the document's title.
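
As far as the manual describes it, the rule can be modelled with the toy function below. It only works on a list of heading levels and does not call pandoc; the function name and return shape are made up, and it assumes the first heading is also the very first block of the document (the title promotion only applies there).

```python
def shift_headings(levels, shift):
    """Toy model of --shift-heading-level-by applied to a list of heading levels.

    Returns (title_promoted, new_levels). None in new_levels marks a heading
    that would fall below level 1 and so becomes an ordinary paragraph.
    Assumption: levels[0] is the first block of the document.
    """
    title_promoted = False
    result = []
    for index, level in enumerate(levels):
        # With a shift of -N, a leading level-N heading becomes the title.
        if shift < 0 and index == 0 and level == -shift:
            title_promoted = True
            continue
        new_level = level + shift
        result.append(new_level if new_level >= 1 else None)
    return title_promoted, result
```

So a document with headings at levels 1, 2, 3, 2 would end up with a title and headings at levels 1, 2, 1, while a level-1 heading that is not at the top would be demoted to a plain paragraph.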

Veen · on Jan 19, 2023

Oh really? I've tried --shift-heading in the past and it worked to move headings up a level, but not to the title. I'll have to read the docs more carefully and give it another go. Thank you.

kelsolaar · on Jan 19, 2023

Quarto, excellent software for building publications, websites, etc…, is leveraging Pandoc: https://quarto.org/

jph · on Jan 19, 2023

Pandoc is such a great conversion program, and this new 3.0 release has so many improvements, especially for figures.

I write in Markdown and export to PDF, using Pygments for code syntax coloring, with .tex files to adjust layouts, tables, and the like.

https://github.com/SixArm/pandoc-from-markdown-to-pdf

account-5 · on Jan 19, 2023

Love this program. Means I can write in plain text, markdown or zim-wiki syntax, and export to word no hassles.

If I'm writing Markdown I use Pandoc's version as it has support for advanced tables.

Brilliant software.

cosmic_quanta · on Jan 19, 2023

That's great news. I've been waiting for years for a dedicated 'Figure' element. The workaround was pretty brittle. It'll make pandoc-plot [0] easier to maintain as well.

[0]: https://github.com/LaurentRDC/pandoc-plot

zdw · on Jan 19, 2023

Does it still automatically generate "smart" quotes (which are anything but) from traditional ones during conversion?

Love the tool, but this is the most awful default setting I've seen in a program in a while, especially if you include any code that depends on quotes not being mangled.

johnday · on Jan 19, 2023

If you specify that you just want bog-standard markdown then it will not generate smart quotes. To wit:

- `pandoc foo.tex -t markdown -o foo.md` will not produce smart quotes.

- `pandoc foo.tex -t markdown-smart -o foo.md` will produce smart quotes.

zdw · on Jan 19, 2023

I think the examples you give might be the opposite of what happens - in the docs:

https://pandoc.org/MANUAL.html#extension-smart

The meaning of the -smart extension on option names is inverted in some cases, and enabled by default on markdown output.
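
Extensions are switched on or off by appending `+name` or `-name` to the format name. A small sketch for building both variants so they can be compared on your own input; the helper names are hypothetical and the commands are only constructed, not run. Which variant yields straight quotes is best checked against the manual section linked above.

```python
def format_spec(base, enable=(), disable=()):
    """Build a pandoc format string such as 'markdown-smart' or 'markdown+smart'."""
    enabled = "".join(f"+{ext}" for ext in enable)
    disabled = "".join(f"-{ext}" for ext in disable)
    return base + enabled + disabled


def pandoc_command(source, target, to_format):
    """Return the argument list for one conversion; nothing is executed here."""
    return ["pandoc", source, "-t", to_format, "-o", target]


# The two variants under discussion, written to separate files for comparison.
commands = [
    pandoc_command("foo.tex", "with-smart.md", format_spec("markdown", enable=["smart"])),
    pandoc_command("foo.tex", "without-smart.md", format_spec("markdown", disable=["smart"])),
]
```

Each list can be passed to `subprocess.run` if pandoc is installed.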

simonmic · on Jan 19, 2023

This I agree with. I don't know the exact current status, but having debugged related rendering issues many times over the years, I wish it had always been hard to enable conversion to so-called smart quotes, rather than hard to prevent it.

simonmic · on Jan 19, 2023

I should add: the above is about the only quibble I can think of, which is impressive. I love love love pandoc! It's a highly dependable and capable swiss army docs tool. I use it constantly, eg to help generate CLI help text and HTML, man, info and plain text manuals from (mostly) markdown sources. Huge congrats and thanks to the developers for their hard work and for this latest release.

leephillips · on Jan 19, 2023

It’s configurable, and, in any case, Pandoc will not alter the quotes in your code blocks nor in inline code.

CJefferson · on Jan 19, 2023

I love pandoc. With its Lua filters, I love using it for generating HTML and blog posts. One thing that always annoys me about most static website generators is that they make you use some very limited templating language, when I just want to use a proper programming language.

My only irritation -- while I understand why one would want to do it for neatness, it's annoying that the "pandoc" package no longer provides the "pandoc" program! Maybe instead introducing "pandoc-core" and renaming "pandoc-cli" to "pandoc" would be better (it would certainly avoid breaking existing scripts, like mine).

y04nn · on Jan 19, 2023

If I understand correctly, the package pandoc-cli installs the pandoc binary [1], so it shouldn't break that many scripts.

[1]: https://github.com/jgm/pandoc/blob/535bd0393fe7b2f287903b942...

sramsay · on Jan 19, 2023

This (I also generate my blog using pandoc). In another case, I wanted to go from Markdown to groff -mom and it was a totally straightforward matter with a custom Writer in Lua.

gyulai · on Jan 19, 2023

I've been looking for a tech stack to replace latex for decades now. As a very recent development, the combination of pandoc+weasyprint (plus a little bit of homebrewed pandoc filter magic) has now become good enough for my needs, and I have finally been able to take the plunge. Feels great.

For those who are a little less adventurous and who happen to be in the social sciences, humanities, journalism, etc., pandoc+msword is also definitely worth looking into. It's a much better tech stack than standalone msword. -- It's really only in the STEM fields that, in my mind, there really is no way around latex.

tikhonj · on Jan 19, 2023

No way around LaTeX, but Pandoc + LaTeX is still a much nicer experience than plain LaTeX in my experience.

adityaathalye · on Jan 19, 2023

Pandoc powers my little static site maker:

cf. https://github.com/adityaathalye/shite/blob/master/bin/templ...

  __shite_templating_compile_source_to_html() {
      # If content has front matter metadata, it is presumed to be in a format
      # that the content compiler can safely process and elide or ignore.
      local file_type=${1:?"Fail. We expect file type of content like html, org, md etc."}
  
      case ${file_type} in
          html )
              pandoc -f html -t html
              ;;
          md )
              pandoc -f markdown -t html
              ;;
          org )
              pandoc -f org -t html
              ;;
      esac
  }

BeetleB · on Jan 19, 2023

Mine too. I author in org and have a plugin that converts the org files to rst files.

adityaathalye · on Jan 19, 2023

Ah, I make it compile partial HTML (just page content), and stick that into my own HTML page template(s) (written as HEREDOCS :D).

That's part of the joy of using Pandoc. I can pipeline it, no problem.

Like this:

cf. https://github.com/adityaathalye/shite/blob/master/bin/templ...

  cat "${watch_dir}/sources/${url_slug}" |
      __shite_templating_compile_source_to_html ${file_type} |
      __shite_templating_wrap_content_html ${content_type} ${watch_dir} |
      __shite_templating_wrap_page_html \
          > "${watch_dir}/public/${html_url_slug}"

Templates look like this. Notice the $(cat -) in the middle. That's how the HTML content produced by Pandoc gets injected in the middle of everything else.

  shite_template_common_default_page() {
      local maybe_page_id=${shite_page_data[page_id]:+"id=\"${shite_page_data[page_id]}\""}
      local maybe_canonical_url=${shite_page_data[canonical_url]:+"<link rel=\"canonical\" href=\"${shite_page_data[canonical_url]}\">"}
  
      cat <<EOF
  <!DOCTYPE html>
  <html lang="en">
      <head>
          $(shite_template_common_meta)
          $(shite_template_common_links)
          ${maybe_canonical_url}
      </head>
      <body ${maybe_page_id}>
        <div id="the-very-top" class="stack center box">
            $(shite_template_common_header)
            <main id="main">
              $(cat -)
            </main>
            $(shite_template_common_footer)
        </div>
      </body>
  </html>
  EOF
  }

edit: substantiate Pandoc's role.

BeetleB · on Jan 19, 2023

In my case I'm using Pelican which has builtin support for rst, so it was easier for me to just convert to rst rather than full blown HTML.

adityaathalye · on Jan 20, 2023

Since I've rolled my own SSG, I wanted it to compile $FORMAT -> HTML directly, so I write fewer bugs :D

I chose Pandoc because it does a reasonably OK job compiling orgmode, _and_ has good support for other formats I use from time to time (e.g. markdown, ASCIIDOC).

Before this, I was using hugo, with a compile cycle similar to your setup, viz. org -> markdown (via ox-hugo), and then hugo did the md -> HTML thing. hugo sort of supports org -> html, but their batteries-included compiler is not very good. Points for trying, though.

edit: typo, context

snet0 · on Jan 19, 2023

I've been using Pandoc to write Latex-lite for a couple years now. Just write .md files with basic Markdown syntax for all the major text content, and add some Latex when I need to do something more particular. Best of both worlds, really.

quijoteuniv · on Jan 19, 2023

Do you want to change the world? Write software like this. Pandoc is great.

toastal · on Jan 19, 2023

One day I wish to see the AsciiDoc(tor) Reader. I'd love to be freed from Ruby as AsciiDoc is superior to Markdown and most other lightweight markup syntax options in features and syntax. This lack of features is why we have an incompatible group of Markdown syntax forks (aka "flavors" to mask that forks are incompatible).

binarycoffee · on Jan 19, 2023

As others wrote, Pandoc is Haskell so it compiles to a fairly efficient binary.

But more importantly, unlike the various Markdown flavors or AsciiDoc, it is incredibly extensible thanks to the combination of custom filters and the possibility to add HTML classes and attributes. One can write filters to leverage the class/attribute information and perform transformations at the AST level, which basically lets you define a DSL with an arbitrary number of custom elements.

I wrote a collection of filters for the publication of a large online legal playbook. Not only did Pandoc make it possible to introduce different kinds of custom elements that don't exist in plain Markdown or AsciiDoc, but by using different filters it was possible to use a single Markdown source to generate both the book and various summaries such as a list of examples, a list of civil code clauses etc. I don't know Haskell that well so I used Rust for the filters, but that worked very well.
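
To give a feel for the approach, here is a minimal sketch in Python rather than Rust. It walks the block list from pandoc's JSON output (`pandoc -t json`) and collects the identifiers of Divs carrying a given class, which is roughly what a "list of examples" summary needs. The class name `example` is hypothetical, and the node layout is the JSON serialization as I understand it.

```python
def collect_divs_with_class(blocks, wanted):
    """Return the identifiers of all Div blocks carrying the class `wanted`.

    `blocks` is a list of block nodes as found under the "blocks" key of
    pandoc's JSON output. A Div node is expected to look like
    {"t": "Div", "c": [[identifier, [classes], [[key, value], ...]], [blocks]]}.
    Only Divs nested directly inside other Divs are followed; block quotes
    and list items are not descended into in this sketch.
    """
    found = []
    for block in blocks:
        if block.get("t") != "Div":
            continue
        (identifier, classes, _attributes), children = block["c"]
        if wanted in classes:
            found.append(identifier)
        # Divs can nest, so look inside as well.
        found.extend(collect_divs_with_class(children, wanted))
    return found
```

A real filter would read the whole JSON document from stdin, transform or summarize it, and write JSON back to stdout so pandoc can continue with the chosen writer.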

Pandoc is IMO a very underrated tool.

johnday · on Jan 19, 2023

> Pandoc is Haskell so it compiles to a fairly efficient binary.

This is nebulous. Haskell's compiled binaries are not ideal, for a number of reasons.[^1] GHC does very little to optimise for many typical metrics of "efficient". The binaries it produces are enormous because it (unavoidably) bundles the runtime along with the program itself, and there is a lot of empty space in the binaries. Shrinking them can improve startup times significantly especially on spinning rust drives.

That said, Haskell programs are at least _compiled_, and they do result in binaries which, if well written, can result in running times comparable to (or, sometimes, shorter than) your average hand-rolled C code that achieves the same goals.

Of course, none of this casts any shadow on the fact that Pandoc is, indeed, an excellently engineered piece of software that stands as a testament to the value of Haskell for real-world business logic and problem solving.

[^1]: This problem is fairly well-understood in the Haskell community: https://dixonary.co.uk/small

matklad · on Jan 19, 2023

Take a look at https://djot.net/, you might like that.

toastal · on Jan 19, 2023

It does do some things better and I appreciate calling it a new name and 'starting over' instead of another fork, but what's not covered is metadata. If I want to add author, license, tags, keywords, description, etc. there is no in-document way to do this. Almost all other media format types from images, audio, to other documents like ODF have a way to do metadata and this (and Markdown) doesn't cover said important use case.

Seems there's a long-standing (for the project) open issue where it's still being mulled over.

Imports are also very nice for writing longer texts--especially how AsciiDoc lets you +1 all of your headings so the heading hierarchy works as a standalone document and a part of a larger whole.

matklad · on Jan 19, 2023

Yeah, it’s definitely a wip at this point, though it is already general enough to express meta and imports. You could write

    # Title
    : key = value
    
    ```include
    ./examples/hello.rs
    ```

today and write a simple filter to extract meta from the first definition list and resolve includes.
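
A very rough sketch of that idea, with loud caveats: it works on raw text lines rather than on the djot AST a real filter would see, it collects every `: key = value` line instead of only the first definition list, and the function name and the `read_file` callback are invented for illustration.

```python
FENCE = "`" * 3  # built this way so the sketch can itself sit inside a fenced block


def preprocess(text, read_file):
    """Return (meta, body) for a document using the two conventions above.

    Lines of the form ': key = value' are collected into `meta` and dropped
    from the body. A fenced block tagged 'include' is replaced by the contents
    of the files it lists, one path per line. `read_file` maps a path to text.
    """
    meta = {}
    body = []
    in_include = False
    for line in text.splitlines():
        stripped = line.strip()
        if in_include:
            if stripped == FENCE:
                in_include = False  # closing fence ends the include block
            elif stripped:
                body.append(read_file(stripped).rstrip("\n"))
            continue
        if stripped == FENCE + "include":
            in_include = True
        elif stripped.startswith(": ") and " = " in stripped:
            key, value = stripped[2:].split(" = ", 1)
            meta[key.strip()] = value.strip()
        else:
            body.append(line)
    return meta, "\n".join(body)
```

Passing `read_file` in as a parameter keeps the sketch testable without touching the file system; in practice it could be a small wrapper around `pathlib.Path(path).read_text()`.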

toastal · on Jan 26, 2023

I appreciate that you pointed it out though to give me a chance to reevaluate my thoughts on the project. It seems it has a better trajectory than when I had last looked.

geokon · on Jan 19, 2023

I've successfully used it from Clojure (I think it's through JRuby). With a few lines of code you can configure AsciiDoctor to whatever you need. It's way easier than fiddling at the command-line (I couldn't immediately understand how to get extensions and how it played with whatever version of the software I got through `apt`). It'd be good to have alternatives just for the sake of it, but I didn't find anything particularly lacking.

Here is how I made some reveal slides

    (import [org.asciidoctor
             Asciidoctor
             OptionsBuilder
             SafeMode])

    (let [input-file    (clojure.java.io/file "path/to/adoc/file")
          adoctor       (org.asciidoctor.Asciidoctor$Factory/create)
          reveal-option (doto (org.asciidoctor.OptionsBuilder/options)
                          (.backend "revealjs")
                          (.safe org.asciidoctor.SafeMode/UNSAFE)
                          (.attributes
                            (.attribute
                              (org.asciidoctor.AttributesBuilder/attributes)
                              "revealjsdir"
                              "../reveal.js")))]
      (.requireLibrary adoctor (into-array String ["asciidoctor-revealjs"]))
      (.convertFile adoctor input-file reveal-option))

You get all the codez from Maven so you don't need to install anything on your system

    {'org.asciidoctor/asciidoctorj-revealjs {:mvn/version "5.0.0.rc1"}
     'org.asciidoctor/asciidoctorj-pdf      {:mvn/version "1.6.2"}
     'org.asciidoctor/asciidoctorj          {:mvn/version "2.5.3"}}

The maintainers seem very responsive and active on GitHub. It's not as nice as a spec and multiple implementations, and I guess you're locked in to one library, but at least it's not as bad as Org mode, where you're locked in to an editor as well.

toastal · on Jan 19, 2023

Yes, the maintainer is great. I have at this point just used Nix and post-processed Asciidoctor instead of trying any sort of other tools, but it gets trickier, as you noted, if you want to use it inside something else. It's not a compiled binary nor is it a C lib other languages could get at. Much of that could be attributed to the spec being quite.

thekaleb · on Jan 19, 2023

Pandoc is written in Haskell.

chungy · on Jan 19, 2023

AsciiDoc is written in Python.

Though when it comes to annoyance with Markdown forks: AsciiDoctor is basically that to AsciiDoc. It's mostly compatible, but when it isn't, it really bites.

toastal · on Jan 19, 2023

The more important part is that you end up with a binary instead of needing an interpreted language which makes the tooling a mess. Python and Ruby are the same thing to most people.

tetris11 · on Jan 19, 2023

fantastic software, never build it from source, or if you have to, make sure you have an OS that bundles all the Haskell dependencies into a single meta package

gregwebs · on Jan 19, 2023

Haskell tooling went from awful to best in class after stack came out. Look for “Quick stack method” on the installing page. It should be easy to build from source now with just a few commands. No doubt it will take a long time to compile all the packages, and you might still have issues tracking down any non-Haskell dependencies (C libraries).

bitmapper · on Jan 20, 2023

this got replaced with ghcup iirc

josephcsible · on Jan 19, 2023

Pandoc is really easy to build with just Haskell and Cabal installed via ghcup, without needing any Haskell packages from your distro.

jesprenj · on Jan 19, 2023

It or one of its libraries did not compile for me OOTB in Gentoo. Granted it's marked as nonstable (~*). That's unfortunate, because I really liked to use it when I was mainly running Debian. Though I didn't really put any debugging effort into making it work.

mrspuratic · on Jan 19, 2023

I did this years back, it took, ehm, quite some time. I use a binary now :D ISTR it starts with building a self-hosting Haskell compiler...

Tyr42 · on Jan 19, 2023

Yeah, I ended up upgrading my machine's ram when I was building it from source. It's sizable.

theletterf · on Jan 19, 2023

One new feature that will make Python documentarians happy is the `--list-tables` flag for rST output: You can now convert any table to the list table syntax of reStructuredText, which is, in the opinion of many, superior to classic tables with ASCII borders.
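
For readers who haven't seen the syntax: a list table is a directive whose body is a two-level bullet list, with one outer item per row and one inner item per cell. The hand-rolled sketch below only shows the shape of that syntax; it is not pandoc's writer, the function name is made up, and it handles single-line cells only.

```python
def to_list_table(rows, header_rows=1):
    """Render rows of single-line cell strings as an rST list-table directive."""
    lines = [".. list-table::", f"   :header-rows: {header_rows}", ""]
    for row in rows:
        for index, cell in enumerate(row):
            # The first cell opens a new row; later cells continue it.
            prefix = "   * - " if index == 0 else "     - "
            lines.append(prefix + cell)
    return "\n".join(lines) + "\n"
```

Because every cell sits on its own line, adding a column or a long cell never forces you to redraw borders, which is the usual argument for this form.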

Brendinooo · on Jan 19, 2023

I'm in a job where I pretty much never need to output a PDF, but whenever the occasional thing comes around, pandoc is always there for me. Such a useful tool.

asicsp · on Jan 19, 2023

I use pandoc to convert GitHub style markdown to PDF/EPUB ebooks. The default output is good and there are plenty of customization options too. I didn't know LaTeX/CSS but stitched a few things together with help from stackexchange sites to customize the output produced. Later came to know there are third-party templates that I could've used/started with.

jszymborski · on Jan 19, 2023

Man I'd love an asciidoc(tor) reader for pandoc sooo much. The existing toolchain is a big pain.

amai · on Jan 19, 2023

Since Pandoc has Lua built in, I wonder if it can also run LuaLaTeX in full? Because then it could support really all features of LaTeX and become a kind of SuperLaTeX.