Difference between revisions of "Convert text documents with Pandoc"

From Parallel Library Services
https://pandoc.org

Pandoc is a "universal document converter" which converts from one markup language to another.

In this guide, we convert downloaded wiki pages (plain text in the .wiki format) to HTML files.

More extensive documentation is available in [https://pandoc.org/MANUAL.html the official Pandoc manual] or through the [[command line]] by typing  
<pre>man pandoc</pre>

== Getting started ==

You can find instructions for installation on [https://pandoc.org/installing.html the Pandoc website] for your particular operating system. Once you have pandoc installed, open a terminal session to use its command-line interface.

=== Example 1: Convert an HTML string to Markdown ===
Enter a string of HTML and pipe it to pandoc:


<pre>echo "<h1>Hello Pandoc</h1><p>from html to markdown</p>" | pandoc -f html -t markdown</pre>  

=== Example 2: Convert a MediaWiki file to HTML===

# Save the content of a wiki page to a [[Plain text|plain-text]] file, for example <code>page.wiki</code>.
# Convert:

<pre>pandoc page.wiki -f mediawiki -t html -o page.html</pre>

=== Common pandoc arguments===

<code>-f</code> or <code>--from</code>: specifies the input format (e.g. <code>html</code>, <code>markdown</code>, <code>mediawiki</code>);

<code>-t</code> or <code>--to</code>: specifies the output format;

<code>-s</code> or <code>--standalone</code>: produces a complete document with the appropriate header and footer, rather than a fragment;

<code>-o</code> or <code>--output</code>: writes the output to the named file instead of standard output;

<code>page.wiki</code>: the MediaWiki input file, given as a plain argument without an option.

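== Changing the default template ==
To customize pandoc's HTML output, save its default HTML5 template to a file, edit that file, and pass it back with <code>--template</code>: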
<pre>
pandoc --from markdown --to html5 --print-default-template=html5 > template.html
pandoc --from markdown --to html5 --template template.html input.md -o output.html
</pre>
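The two commands above can be tried end to end. The sketch below is an illustration under stated assumptions, not part of the original recipe. It creates a throwaway <code>input.md</code> and dumps the default HTML5 template. It then uses awk to add a made-up footer line after the template line containing the <code>$body$</code> variable, saving the result as <code>custom.html</code>. Finally, it renders the page with that edited copy. All file names and the footer text are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample input so the sketch is self-contained.
cat > input.md <<'EOF'
# Templated page

Some body text.
EOF

# Step 1: save pandoc's built-in HTML5 template to a file.
pandoc --from markdown --to html5 --print-default-template=html5 > template.html

# Step 2: "edit" the template. As an illustration, copy it to custom.html and
# print a made-up footer line right after the line that contains $body$.
awk '{ print } index($0, "$body$") { print "<footer>Generated with pandoc</footer>" }' \
  template.html > custom.html

# Step 3: render with the edited template. --metadata title=... supplies the
# page title that the HTML5 template expects.
pandoc --from markdown --to html5 --template custom.html \
  --metadata title="Templated page" input.md -o output.html
```

Editing a copy rather than <code>template.html</code> itself keeps the pristine default around for comparison.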


[[Category:Cookbook]]

Revision as of 14:29, 11 October 2021

https://pandoc.org

Pandoc is a "universal document converter" which converts from one markup language (e.g. HTML, Markdown) to another. Here are some basic recipes for converting documents.

You can find instructions for installation on the Pandoc website (https://pandoc.org/installing.html) for your particular operating system. Once you have pandoc installed, open a terminal session to use its command-line interface.

More extensive documentation is available in the official Pandoc manual (https://pandoc.org/MANUAL.html) or through the command line by typing:

man pandoc
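Besides the man page, pandoc can describe its own installation. The sketch below uses the standard --version, --list-input-formats and --list-output-formats options. It confirms that pandoc is installed and shows the exact format names pandoc accepts. The grep check for mediawiki is only an illustration.

```shell
#!/bin/sh
# Confirm pandoc is on the PATH and show which version is installed.
pandoc --version | head -n 1

# List every format pandoc can read (for -f/--from) and write (for -t/--to).
pandoc --list-input-formats
pandoc --list-output-formats

# Check for one specific format name, e.g. before converting a wiki page.
if pandoc --list-input-formats | grep -qx mediawiki; then
  echo "mediawiki input is supported"
fi
```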

Common pandoc arguments

-f or --from: specifies the input format (e.g. html, markdown, mediawiki);

-t or --to: specifies the output format;

-s or --standalone: produces a complete document with the appropriate header and footer, rather than a fragment;

-o or --output: writes the output to the named file instead of standard output.
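To see these arguments working together, the sketch below converts a small Markdown file twice. The first run omits -s and yields only an HTML fragment; the second adds -s and yields a complete page. The input file hello.md is a hypothetical sample created on the spot. The --metadata title=... option is a standard pandoc option not listed above; it supplies the page title that standalone HTML expects.

```shell
#!/bin/sh
# Hypothetical sample input, created here so the example is self-contained.
cat > hello.md <<'EOF'
# Hello Pandoc

This paragraph is written in *Markdown*.
EOF

# Fragment only: without -s, pandoc writes just the converted body.
pandoc -f markdown -t html -o fragment.html hello.md

# Standalone: -s wraps the body in a complete HTML document (doctype, head, body).
# Without a title, pandoc may warn and fall back to a default one.
pandoc -f markdown -t html -s --metadata title="Hello Pandoc" -o standalone.html hello.md
```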

Convert plain text files (.txt) that are structured with Markdown to PDF

For this, you need an up-to-date LaTeX distribution that provides the xelatex engine, such as BasicTeX. On macOS, you can install BasicTeX with Homebrew:

brew install --cask basictex

Then convert the file (here, MANUAL.txt) using xelatex as the PDF engine:

pandoc MANUAL.txt --pdf-engine=xelatex -o example13.pdf
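A self-contained version of this recipe might look like the sketch below. It assumes a Homebrew BasicTeX install, which normally places xelatex in /Library/TeX/texbin. The PATH tweak only matters if the current terminal was opened before the install. notes.txt is a hypothetical stand-in for MANUAL.txt, and -f markdown is added so pandoc does not have to guess the format from the .txt extension.

```shell
#!/bin/sh
# Assumption: BasicTeX's binaries live in /Library/TeX/texbin (the usual macOS location).
# A new terminal normally has this on PATH already; add it for this session if not.
if ! command -v xelatex >/dev/null 2>&1 && [ -d /Library/TeX/texbin ]; then
  PATH="/Library/TeX/texbin:$PATH"
  export PATH
fi

# Hypothetical Markdown-structured text file standing in for MANUAL.txt.
cat > notes.txt <<'EOF'
# Meeting notes

- First item
- Second item
EOF

# -f markdown states the input format explicitly instead of relying on
# pandoc's guess for the .txt extension.
pandoc notes.txt -f markdown --pdf-engine=xelatex -o notes.pdf
```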

Convert downloaded wiki pages (plain text in the .wiki format) to HTML files

Example 1: Convert an HTML string to Markdown

Enter a string of HTML and pipe it to pandoc:

echo "<h1>Hello Pandoc</h1><p>from html to markdown</p>" | pandoc -f html -t markdown
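When the HTML is already in a file, the same conversion can read that file and write the Markdown with -o. A pipe gives pandoc no file extension to go on, so -f html is required in the command above. For a file ending in .html, pandoc infers the input format itself. The sketch below uses a made-up greeting.html, runs both forms, and compares the results.

```shell
#!/bin/sh
# Hypothetical HTML input written to a file.
cat > greeting.html <<'EOF'
<h1>Hello Pandoc</h1>
<p>from html to <em>markdown</em></p>
EOF

# Explicit input format, output written to a file instead of the terminal.
pandoc -f html -t markdown -o greeting.md greeting.html

# No -f: pandoc guesses "html" from the .html extension.
pandoc -t markdown -o greeting-inferred.md greeting.html

# Both runs should produce identical Markdown.
cmp greeting.md greeting-inferred.md && echo "same output"
```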

Example 2: Convert a MediaWiki file to HTML

  1. Save the content of a wiki page to a plain-text file, for example page.wiki.
  2. Convert:
pandoc page.wiki -f mediawiki -t html -o page.html
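The sketch below extends Example 2 in two ways. First, it adds -s with a title so page.html opens as a complete web page. Second, it loops over every .wiki file in the directory. The sample pages are hypothetical and created inline. On a MediaWiki site, the raw wikitext of a page can usually be downloaded by adding action=raw to the page URL. That step appears only as a commented-out line with a made-up address.

```shell
#!/bin/sh
# Getting the wikitext: MediaWiki serves a page's raw source with action=raw.
# The address below is hypothetical; replace it with a real wiki and page title.
# curl -o page.wiki "https://wiki.example.org/index.php?title=Some_Page&action=raw"

# Hypothetical sample pages so the sketch runs without network access.
cat > page.wiki <<'EOF'
== Overview ==
This page has '''bold''' text and a [[Main Page|link]].

* first item
* second item
EOF
cat > other.wiki <<'EOF'
== Another page ==
Plain paragraph.
EOF

# Single file: -s wraps the result in a full HTML document with a <title>.
pandoc page.wiki -f mediawiki -t html -s --metadata title="page" -o page.html

# Batch: convert every .wiki file in the current directory to a matching .html file.
for f in *.wiki; do
  [ -e "$f" ] || continue   # the glob matched nothing
  name="${f%.wiki}"         # strip the extension: page.wiki -> page
  pandoc "$f" -f mediawiki -t html -s --metadata title="$name" -o "$name.html"
done
```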