Pandoc Html To Markdown



  1. Pandoc Html To Markdown Download
  2. Pandoc Convert Html Table To Markdown
  3. Pandoc Html To Markdown Code

I'd been looking for a way to convert my notes to webpages. Typically I wrote my notes in .txt form and then, when I was ready to blog them, went through them and added links and formatting. Whereas Markdown was originally designed with HTML generation in mind, pandoc is designed for multiple output formats. Thus, while pandoc allows the embedding of raw HTML, it discourages it, and provides other, non-HTMLish ways of representing important document elements like definition lists, tables, mathematics, and footnotes. For example, markdown_strict+footnotes is strict Markdown with footnotes enabled, while markdown-footnotes is pandoc's Markdown with footnotes disabled.
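
The extension syntax can be tried directly from the command line. Below is a minimal sketch, assuming pandoc is on the PATH; the helper name convert_with_extensions and the file name notes.docx are hypothetical. Setting DRY_RUN=1 prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a .docx file to a Markdown flavour chosen with
# pandoc's FORMAT+EXTENSION (enable) / FORMAT-EXTENSION (disable) syntax.
convert_with_extensions() {
  local input="$1"    # e.g. notes.docx (hypothetical file name)
  local format="$2"   # e.g. markdown_strict+footnotes
  local cmd=(pandoc -f docx -t "$format" -o "${input%.docx}.md" "$input")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"  # only show the command
  else
    "${cmd[@]}"       # run pandoc
  fi
}
```

For example, `convert_with_extensions notes.docx markdown_strict+footnotes` writes strict Markdown with footnotes, and `convert_with_extensions notes.docx markdown-footnotes` writes pandoc's Markdown with footnotes turned off.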

I prefer to use Microsoft Word for most of my writing because its spell and grammar checker is superior to every other word processor or text editor I have tried. In addition, Word has text-to-speech built in. I use text-to-speech to have my text read aloud to me in order to catch errors, and I catch a lot of errors this way. While I write my blog posts in English, English is not my first language, and I need these tools to keep spelling and grammar errors to a minimum.

I use the static site generator Pelican for this blog, and it generates the blog from either reStructuredText or markdown files. I have written about Pelican in my blog post The Static Site Generator Pelican VS WordPress.

I have been using Pandoc to convert markdown to Word documents or PDFs for years. A Google search for a way to convert from Word to markdown did not give any usable result. Therefore, up until now I have just copied and pasted the text, making sure not to add any markdown syntax until after I had done the spell checking in Word.

Pandoc Html To Markdown Download

Then, a couple of weeks ago, I was reading the Pandoc docs to solve a different problem and came across the section describing how Pandoc can convert from docx to markdown. I do not know if this is new or why Google did not find it for me, but I immediately forgot the problem I was trying to solve and began testing it.

It turns out to be quite simple to convert a docx to markdown. The following example is from the Pandoc demos site.
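
A minimal docx-to-markdown conversion might look like the sketch below, assuming pandoc is installed; docx_to_markdown and example.docx are hypothetical names, and DRY_RUN=1 prints the command without running it.

```bash
#!/usr/bin/env bash
# Minimal sketch: convert one Word document to pandoc's Markdown.
#   -f docx      read the input as a Word document
#   -t markdown  write pandoc's Markdown
#   -o FILE      write the result to FILE instead of standard output
docx_to_markdown() {
  local input="$1"
  local output="${input%.docx}.md"   # example.docx -> example.md
  local cmd=(pandoc -f docx -t markdown -o "$output" "$input")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"  # only show the command
  else
    "${cmd[@]}"       # run pandoc
  fi
}
```

Calling `docx_to_markdown example.docx` writes example.md next to the original file.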

However, the markdown generated by the above command has a few issues.

The lines are only 80 characters long. I do not know why an 80-character line length is the default, but I do not like it. Fortunately, this is easy to fix with the option --no-wrap (newer Pandoc versions spell it --wrap=none).
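
Current Pandoc releases control wrapping with --wrap, which takes none, auto (wrap at the --columns width) or preserve (keep the line breaks of the source). A sketch using a hypothetical helper, docx_to_md_wrap, that defaults to none:

```bash
#!/usr/bin/env bash
# Sketch: convert a .docx file and choose how pandoc wraps the Markdown.
#   none      one line per paragraph, no hard wrapping
#   auto      hard-wrap at the --columns width
#   preserve  keep the line breaks of the source document
# docx_to_md_wrap is a hypothetical helper; DRY_RUN=1 only prints the command.
docx_to_md_wrap() {
  local input="$1"
  local wrap="${2:-none}"   # default to no wrapping
  local cmd=(pandoc -f docx -t markdown "--wrap=$wrap" -o "${input%.docx}.md" "$input")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```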

Pandoc Convert Html Table To Markdown


Links do not use the reference style. I prefer reference-style links because they make the text less cluttered by moving the link itself to the bottom of the file. This is also easy to fix with the option --reference-links.

With the two options added, the command looks like this.
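
A sketch of the conversion with both options, written with the current --wrap=none spelling and a hypothetical helper name, docx_to_blog_md:

```bash
#!/usr/bin/env bash
# Sketch: docx -> markdown with
#   --wrap=none        no hard line wrapping (older versions: --no-wrap)
#   --reference-links  reference-style links instead of inline links
# docx_to_blog_md is a hypothetical helper; DRY_RUN=1 only prints the command.
docx_to_blog_md() {
  local input="$1"
  local cmd=(pandoc -f docx -t markdown --wrap=none --reference-links
             -o "${input%.docx}.md" "$input")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

By default the link definitions are collected at the end of the document; --reference-location=block or --reference-location=section places them closer to where they are used.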

Now the generated markdown is very readable and close to what I would write myself. I only use Word to write text with simple formatting like lists, italic, bold, and links. I add the syntax for images and code to the generated markdown file, along with the metadata that Pelican needs. Although I do not use it at this time, Pandoc can also extract images from a docx.
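
Pelican reads a post's metadata from `Key: value` lines at the top of a Markdown file, followed by a blank line. The hypothetical helper below is one way to prepend a minimal header (only Title and Date) to a file pandoc has generated; the helper name, its argument order and the date format are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prepend a minimal Pelican metadata header
# (Title and Date) to an existing Markdown file, in place.
add_pelican_header() {
  local file="$1" title="$2" date="$3"   # date such as YYYY-MM-DD
  local tmp
  tmp=$(mktemp) || return 1
  {
    printf 'Title: %s\n' "$title"
    printf 'Date: %s\n' "$date"
    printf '\n'        # a blank line ends the metadata block
    cat -- "$file"     # then the original Markdown body
  } > "$tmp" || { rm -f -- "$tmp"; return 1; }
  # Copy back (rather than mv) so the original file keeps its permissions
  cat -- "$tmp" > "$file"
  rm -f -- "$tmp"
}
```

Other fields Pelican understands, such as Category or Tags, can be added as further `Key: value` lines in the same way.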


Pandoc Html To Markdown Code

The option to extract images from the docx file, and many more, can be found on the Pandoc options page.

Edit: The options page URL has changed and is now http://pandoc.org/README.html#reader-options
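
The --extract-media option takes a directory name: pandoc saves the images embedded in the docx under that directory and points the Markdown image links at the saved files. A sketch, where docx_to_md_with_images and the default directory name images are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: convert a .docx file and also extract its embedded images.
# pandoc saves the images under MEDIA_DIR and the generated Markdown
# links to them there (the exact sub-path can vary by pandoc version).
# docx_to_md_with_images and the "images" default are hypothetical.
docx_to_md_with_images() {
  local input="$1"
  local media_dir="${2:-images}"
  local cmd=(pandoc -f docx -t markdown --wrap=none
             "--extract-media=$media_dir" -o "${input%.docx}.md" "$input")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```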

So there you have it: sometimes what you need is right under your nose :)

pandoc-docx-md.bat
:: pandoc-docx-md.bat
::
:: Don't show these commands to the user
@ECHO off
:: Set the title of the window
TITLE Convert docx to markdown with Pandoc
:: Select file marker
:selectfile
:: Clear any preexisting filename variables
SET filename=
:: Ask which file we're converting.
SET /p filename=Which file? (Don't include the .docx file extension):
:: Feedback
ECHO Running pandoc...
:: Run pandoc (batch files quote arguments with double quotes, not single quotes)
CALL pandoc -f docx -t markdown_mmd "%filename%.docx" --output="%filename%.md" --atx-headers --wrap=none --toc --extract-media=""
:: Feedback
ECHO Done. Ready for another file.
:: Let the user easily run that again
SET repeat=
SET /p repeat=Hit enter to convert another file, or any other key and enter to stop.
IF "%repeat%"=="" GOTO selectfile
:: Otherwise end
:end
pandoc-docx-md.sh
#!/bin/bash
# The line above tells the system to run this script with bash.
# The cd below makes the script run from the folder it is saved in.
cd -- "$(dirname "$0")" || exit 1
# Don't echo these commands:
set +v
repeat=
while [ "$repeat" = "" ]
do
  # Clear any preexisting filename variables
  filename=
  # Ask which file we're converting.
  echo "Which file? (Don't include the .docx file extension): "
  read -r filename
  # Feedback
  echo "Running pandoc..."
  # Run pandoc (double quotes let the shell expand $filename)
  pandoc -f docx -t markdown_mmd "$filename".docx --output="$filename".md --atx-headers --wrap=none --toc --extract-media=""
  # Feedback
  echo "Done. Ready for another file."
  # Let the user easily run that again
  repeat=
  echo "Hit enter to convert another file, or any other key and enter to stop. "
  read -r repeat
  # Otherwise end
done

commented May 23, 2018

Dear

Thanks for the script to convert docx to markdown.

I am looking to generate the correct markdown for Bitbucket.

I used the format 'gfm'. When I read the result with a standard markdown viewer, the pictures are rendered correctly. But when I push it to Bitbucket and read it with a markdown viewer there, I can't see:

  • pictures
  • Table of contents: it is generated, but the entries are not links
  • Table of images: it is generated, but the entries are not links

Do you have any ideas, or do you know which format I need to use to fix that?

Best Regards,
Youssef

commented Aug 22, 2018
