Difference between revisions of "Convert text documents with Pandoc"

(21 intermediate revisions by the same user not shown)
Line 1: Line 1:
https://pandoc.org
https://pandoc.org


Pandoc is a "universal document converter" which converts from one markup language to another.
Pandoc is a "universal document converter" which converts from one [[Markup language|markup language]] to another. Here are some basic recipes for converting documents.


In this guide, we try converting downloaded wiki pages (plain text in the .wiki format) to HTML files.
You can find instructions for installation on [https://pandoc.org/installing.html the Pandoc website] for your particular operating system. Once you have pandoc installed, open a terminal session to use its [[command line]] interface.


More extensive documentation is available in [https://pandoc.org/MANUAL.html the official Pandoc manual] or through the [[command line]] by typing  
More extensive documentation is available in [https://pandoc.org/MANUAL.html the official Pandoc manual] or through the [[command line]] by typing  


<pre>man pandoc</pre>
<syntaxhighlight lang="bash">man pandoc</syntaxhighlight>


== Getting started ==
==Common pandoc arguments==


You can find instructions for installation on [ the Pandoc website] for your particular operating system. Once you have pandoc installed, open a terminal session to use its command line interface.
<code>-f</code> or <code>--from</code> Option which is followed by the input format;


=== Example 1: Convert an HTML string to Markdown ===
<code>-t</code> or <code>--to</code> Option which is followed by the output format;
Enter a string of HTML and pipe it to pandoc:
 
<code>-o</code> or <code>--output</code> Option for file output
 
For example, using this command tells pandoc to use the file <code>example.md</code>, change it from <code>markdown</code> to <code>html</code>, with the output file to be named <code>example.html</code>:
 
<syntaxhighlight lang="bash">pandoc example.md -f markdown -t html -o example.html</syntaxhighlight>
 
This saves the converted file as <code>example.html</code>, in HTML, in the directory the command is run from.
 
== Converting plain text Markdown ==
Markdown files can be saved with either a <code>.txt</code> or <code>.md</code> extension. Both are [[plain text]] formats.
 
=== .md to .mediawiki ===
To convert files from Markdown to Mediawiki syntax (the same syntax this wiki uses):


<pre>echo "<h1>Hello Pandoc</h1><p>from html to markdown</p>" | pandoc -f html -t markdown</pre>  
<syntaxhighlight lang="bash">pandoc -w mediawiki filename.md -o filename.wiki</syntaxhighlight>


=== Example 2: Convert a MediaWiki file to HTML===
=== .md to .pdf ===
You can easily convert these pages into PDFs with flowing text, using a pdf engine.


# Save the content of a wiki page on to a [[plain-text]] file, example: <code>page.wiki</code>
For this, you need to have an up-to-date version of BasicTeX installed. On Mac, first install with homebrew:
# Convert:


<pre>pandoc page.wiki -f mediawiki -t html -o page.html</pre>
<syntaxhighlight lang="bash">brew install --cask basictex</syntaxhighlight>


=== Common pandoc arguments===
This may take a while. After installing BasicTeX, close the terminal session and begin a new one. Once you are in a directory that contains a [[Plain text|plain text]] file you want to convert (e.g. MANUAL.txt or example.md), you can convert it into a PDF using the pdf engine <code>xelatex</code>.


'''-f''' - option standing for “from”, is followed by the input format;
<syntaxhighlight lang="bash">pandoc MANUAL.md --pdf-engine=xelatex -o example13.pdf</syntaxhighlight>


'''-t''' - option standing for “to”, is followed by the output format;
<syntaxhighlight lang="bash">pandoc MANUAL.txt --pdf-engine=xelatex -o example13.pdf</syntaxhighlight>


'''-s''' - option standing for “standalone”, produces output with an appropriate header and footer;
== Convert HTML files ==


'''-o''' - option for file output;
=== Example 1: Convert an HTML string to Markdown ===
Enter a string of HTML and pipe it to pandoc:


'''page.wiki''' - mediawiki input filename
<syntaxhighlight lang="bash">echo "<h1>Hello Pandoc</h1><p>from html to markdown</p>" | pandoc -f html -t markdown</syntaxhighlight>


== Changing the default template ==
=== Example 2: Convert a MediaWiki file to HTML===


<pre>
# Save the content of a wiki page to a [[Plain text|plain-text]] file, for example <code>page.wiki</code>
pandoc --from markdown --to html5 --print-default-template=html5 > template.html
# Convert:
pandoc --from markdown --to html5 --template template.html input.md -o output.html
</pre>


<syntaxhighlight lang="bash">pandoc page.wiki -f mediawiki -t html -o page.html</syntaxhighlight>


[[Category:Cookbook]]
[[Category:Cookbook]]
[[Category:Pandoc]]

Latest revision as of 21:06, 2 November 2021

https://pandoc.org

Pandoc is a "universal document converter" which converts from one markup language to another. Here are some basic recipes for converting documents.

You can find instructions for installation on the Pandoc website for your particular operating system. Once you have pandoc installed, open a terminal session to use its command line interface.

More extensive documentation is available in the official Pandoc manual or through the command line by typing

man pandoc

Common pandoc arguments

-f or --from: followed by the input format;

-t or --to: followed by the output format;

-o or --output: followed by the name of the output file.

For example, this command tells pandoc to read the file example.md, convert it from markdown to html, and name the output file example.html:

pandoc example.md -f markdown -t html -o example.html

This saves the converted file as example.html, in HTML, in the directory the command is run from.
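
By default this conversion produces an HTML fragment rather than a complete page. As a minimal sketch, assuming you want a full page instead, pandoc's -s (--standalone) option adds the document header and footer. The function name to_standalone_html and the title value are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: convert a Markdown file to a complete HTML page.
# Usage: to_standalone_html INPUT.md OUTPUT.html
to_standalone_html() {
  # -s / --standalone wraps the result in <html>, <head> and <body>;
  # -M title=... sets the page title (the value here is only an example).
  pandoc "$1" -f markdown -t html -s -M title="Example" -o "$2"
}
```

Calling to_standalone_html example.md example.html would then write the full page to example.html.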

Converting plain text Markdown

Markdown files can be saved with either a .txt or .md extension. Both are plain text formats.

.md to .mediawiki

To convert files from Markdown to MediaWiki syntax (the same syntax this wiki uses):

pandoc -w mediawiki filename.md -o filename.wiki
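
If several Markdown files need converting, the same command can be wrapped in a loop. The following is a sketch under assumptions: the function name md_to_wiki is hypothetical, and each output name is derived by replacing the input file's extension with .wiki. It uses -t, which pandoc treats as equivalent to -w.

```shell
# Hypothetical helper: convert each given Markdown file to MediaWiki syntax.
# Usage: md_to_wiki notes.md draft.md
md_to_wiki() {
  for f in "$@"; do
    # ${f%.*} strips the last extension, so notes.md becomes notes.wiki
    pandoc -f markdown -t mediawiki "$f" -o "${f%.*}.wiki"
  done
}
```

For example, md_to_wiki *.md would convert every .md file in the current directory.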

.md to .pdf

You can convert Markdown files into PDFs with flowing text using a PDF engine.

For this, you need to have an up-to-date version of BasicTeX installed. On macOS, first install it with Homebrew:

brew install --cask basictex

This may take a while. After installing BasicTeX, close the terminal session and begin a new one. Once you are in a directory that contains a plain text file you want to convert (e.g. MANUAL.txt or example.md), you can convert it into a PDF using the pdf engine xelatex.

pandoc MANUAL.md --pdf-engine=xelatex -o example13.pdf
pandoc MANUAL.txt --pdf-engine=xelatex -o example13.pdf
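
Because the PDF step depends on xelatex being available in the current terminal session, it can help to check for it before converting. This is only a sketch: the function name to_pdf and its return codes are hypothetical, and the output name is assumed to be the input name with its extension replaced by .pdf.

```shell
# Hypothetical helper: convert a plain text file to PDF with xelatex.
# Usage: to_pdf MANUAL.md   (writes MANUAL.pdf next to the input)
to_pdf() {
  # Stop early if the input file does not exist.
  [ -f "$1" ] || { echo "input file not found: $1" >&2; return 2; }
  # xelatex must be on the PATH; a new terminal session may be needed after installing.
  command -v xelatex >/dev/null 2>&1 || { echo "xelatex not found" >&2; return 1; }
  pandoc "$1" --pdf-engine=xelatex -o "${1%.*}.pdf"
}
```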

Convert HTML files

Example 1: Convert an HTML string to Markdown

Enter a string of HTML and pipe it to pandoc:

echo "<h1>Hello Pandoc</h1><p>from html to markdown</p>" | pandoc -f html -t markdown
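
The same piping approach works in the other direction. As a small sketch, assuming a Markdown string as input, the hypothetical function md_string_to_html below pipes its first argument to pandoc and prints the resulting HTML.

```shell
# Hypothetical helper: print the HTML for a Markdown string.
# Usage: md_string_to_html '# Hello Pandoc'
md_string_to_html() {
  # printf passes the string through unchanged and adds a trailing newline
  printf '%s\n' "$1" | pandoc -f markdown -t html
}
```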

Example 2: Convert a MediaWiki file to HTML

  1. Save the content of a wiki page to a plain-text file, for example page.wiki
  2. Convert it:
pandoc page.wiki -f mediawiki -t html -o page.html