TeX to HTML

I want to create a TeX to HTML service so that I can upload notes in TeX and convert them to HTML. I am going to try to start a habit of reading a research paper per day, taking notes in TeX, and uploading the notes on this site.


Introduction


LaTeX to HTML conversion has long been tricky. There are basically three approaches:

  1. Assume that your LaTeX code is basically just flavored markup, and use a converter that reads such simple LaTeX, like pandoc. If your LaTeX code is simple, this works great.
    1. Blog Post Giving an Example
  2. Use a package that compiles your LaTeX using LaTeX itself but adds extra information to the resulting file, then turn that file into HTML. That's the approach TeX4ht and lwarp use.
  3. LaTeXML: This is a reimplementation of the TeX kernel, but it outputs XML instead of DVI. It natively supports a large number of popular packages and classes, and packages it does not support can be loaded and "compiled" using the --includestyles flag.
    1. This is the solution that Richard Zach recommends.
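For the first approach, converting a simple note with pandoc might look like the sketch below. It assumes pandoc is installed and that the note only uses LaTeX that pandoc's reader understands. pandoc_note is a hypothetical helper name, and setting RUN=echo prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch: convert one simple LaTeX note to a standalone HTML page with pandoc.
# RUN is empty by default (execute); set RUN=echo for a dry run.
RUN="${RUN:-}"

# pandoc_note <file.tex> <out.html>  (hypothetical helper name)
pandoc_note() {
  # -s: standalone page with <head>; --mathjax: leave math as TeX for MathJax
  $RUN pandoc "$1" -s --mathjax -o "$2"
}

# Only convert when arguments are given, so the file can also be sourced.
if [ "$#" -ge 2 ]; then
  pandoc_note "$1" "$2"
fi
```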


TeX4ht


TeX4ht is a system for converting documents written in TeX/LaTeX/ConTeXt/etc. to HTML, various XML flavors, braille, etc., optionally using MathML.

Features

  • it supports most LaTeX packages and custom commands
  • it supports various input formats
  • extensive support for modification of the output


Basic Invocation for Modern Output

TeX4ht can be invoked in several ways. The original way is to use the htlatex command. To convert a LaTeX source file.tex to HTML5 that uses UTF-8, with MathML:

$ htlatex file.tex "xhtml,html5,mathml,charset=utf-8" " -cunihtf -utf8"

An easier way is to use make4ht. The following command produces the same output as the previous one, HTML5 in UTF-8 encoding with MathML:

$ make4ht file.tex "mathml"
# If you want MathJax to render the MathML
$ make4ht file.tex "mathml,mathjax"
# Perhaps the best method of all is to keep the LaTeX math in the HTML output and have MathJax render it
$ make4ht file.tex "mathjax"
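Since the plan is to publish a note per day, a small batch wrapper around the "mathjax" invocation could look like this. It is a sketch assuming make4ht is on PATH and that it is run inside the notes directory; make4ht_note is a hypothetical helper name, and RUN=echo prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: batch-convert the .tex notes in the current directory with make4ht.
# Assumes make4ht is on PATH. Set RUN=echo to print the commands instead.
RUN="${RUN:-}"

# make4ht_note <file.tex> <outdir>  (hypothetical helper name)
make4ht_note() {
  # -d copies the generated files into <outdir>;
  # the "mathjax" option keeps the math as LaTeX for MathJax to render.
  $RUN make4ht -d "$2" "$1" "mathjax"
}

# Usage: pass the output directory; run inside the notes directory.
if [ "$#" -ge 1 ]; then
  for tex in *.tex; do
    [ -e "$tex" ] || continue  # no .tex files: the glob stayed literal
    make4ht_note "$tex" "$1"
  done
fi
```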

Documentation

TeX4ht is a system that converts LaTeX to various output formats, including HTML.

Basic Usage

$ make4ht filename.tex

By default, TeX4ht converts to HTML. You can convert to other formats using the -f option:

$ make4ht -f odt filename.tex

Because tex4ht requires you to edit the .tex file, I think I am going to stick with LaTeXML.

lwarp


The lwarp package converts LaTeX to HTML by using LaTeX to process the user’s document and directly generate HTML tags. External utility programs are only used for the final conversion of text and images. Math may be represented by SVG images or MathJax. More than 500 packages and classes are supported, of which more than 60 also support MathJax.

LaTeX is popular due to the visibility, stability, and portability of plain-text markup, regular expression search and replace of both text and formatting commands, easy revision control, the ability to handle large and complex documents, extensive programming capabilities, and the large number of user-supplied packages solving real-world problems. In many cases, it's still faster to type a few arguments than it is to open a dialog box and select and fill in entries, and a powerful programming text editor is more responsive than a word processor.

Conversion from LaTeX to HTML is needed because of the rise of self-publishing and the need for scientists, professors, and engineers to publish their own papers on their own websites.

Both HTML5 and CSS3 are quite capable, to the point where they can be used to produce technical books. Nevertheless, there are some practical problems to overcome in order to create a good conversion from LaTeX to HTML.

LaTeX to HTML with lwarp

The lwarp package produces an HTML version of your document with accessibly rendered mathematics while allowing you to use macros, theorem environments, tikz pictures, and all the other bells and whistles you are used to. This note tells you how to get started with lwarp.

Long Story Short

Starting with file.tex, add the following snippet right after the \documentclass line:

\usepackage[mathjax]{lwarp}

Run pdflatex on file.tex as usual, enough times to resolve references. Then, in the terminal, invoke this incantation:

$ lwarpmk html
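The recipe above can be scripted roughly as follows. This is a sketch that follows the steps as described here (two pdflatex passes, then lwarpmk html) and assumes pdflatex and lwarpmk are installed; has_lwarp and lwarp_build are hypothetical helper names, and RUN=echo prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch of the lwarp recipe: check the preamble, run pdflatex, then lwarpmk.
# Set RUN=echo to print the commands instead of running them.
RUN="${RUN:-}"

# has_lwarp <file.tex>: succeeds if the file loads lwarp (hypothetical helper)
has_lwarp() {
  grep -Eq '\\usepackage(\[[^]]*\])?[{]lwarp[}]' "$1"
}

# lwarp_build <file.tex>: two pdflatex passes, then the HTML conversion
lwarp_build() {
  $RUN pdflatex "$1"
  $RUN pdflatex "$1"   # second pass resolves cross-references
  $RUN lwarpmk html
}

if [ "$#" -ge 1 ]; then
  if has_lwarp "$1"; then
    lwarp_build "$1"
  else
    printf '%s\n' "$1 does not load lwarp; add \\usepackage[mathjax]{lwarp} after \\documentclass" >&2
    exit 1
  fi
fi
```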

I am not going to use lwarp, so I am going to stop reading here. LaTeXML seems easier.


LaTeXML: A LaTeX to XML/MathML/HTML Converter


LaTeXML

In the process of developing the Digital Library of Mathematical Functions, we needed a means of transforming LaTeX sources of our material into XML which would be used for further manipulations, rearrangements and construction of the web site. In particular, a true 'Digital Library' should focus on the semantics of the material, so we should convert the mathematical material into both content and presentation MathML. At the time, we found no suitable software for our needs, so we began the development of LaTeXML in-house.

The approach of this software is to emulate TeX as far as possible (in Perl), converting a TeX/LaTeX document into LaTeXML's XML format. That format is then further transformed into HTML of various flavors, with MathML and SVG.

Usage

In most cases, all that should be needed to convert a TeX file to XML and then to HTML would be:

$ latexml --dest=mydoc.xml mydoc
$ latexmlpost --dest=somewhere/mydoc.html mydoc.xml

This will carry out default transformation into HTML5, which represents mathematics using MathML. Different file extensions (or the --format option) imply different output formats, including XHTML, HTML 4 w/images for math, JATS, TEI. There are also options to split large documents into several pages, or to combine multiple documents into a single site.

The functionality of latexml and latexmlpost is conveniently combined into the single executable latexmlc, which does not create the intermediate XML file. The above commands are equivalent to:

$ latexmlc --dest=somewhere/mydoc.html mydoc
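For the note-publishing service, the one-step command could be looped over a folder of notes. This is a sketch assuming latexmlc is on PATH; the folder names in the comments are only examples, html_dest and latexml_note are hypothetical helper names, and RUN=echo prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: one latexmlc call per .tex note in a folder.
# Assumes latexmlc is on PATH. Set RUN=echo to print the commands instead.
RUN="${RUN:-}"

# html_dest <file.tex> <outdir>: notes/paper.tex -> <outdir>/paper.html
html_dest() {
  printf '%s/%s.html\n' "$2" "$(basename "$1" .tex)"
}

# latexml_note <file.tex> <outdir>: one-step equivalent of latexml + latexmlpost
latexml_note() {
  $RUN latexmlc --dest="$(html_dest "$1" "$2")" "$1"
}

# Usage: pass <notes-dir> <site-dir>, e.g. notes site (example names)
if [ "$#" -ge 2 ]; then
  $RUN mkdir -p "$2"
  for tex in "$1"/*.tex; do
    [ -e "$tex" ] || continue  # no .tex files in the folder
    latexml_note "$tex" "$2"
  done
fi
```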

Download

$ sudo dnf install LaTeXML # RPM-based systems; this installs prerequisites as well
$ sudo yum install LaTeXML # RPM-based alternative

C:> choco install latexml # Windows; this may require a TeX distribution to be installed

Prerequisites

These are installed using the commands above.

  • Perl Modules
  • Image::Magick or Graphics::Magick
  • UUID::Tiny
  • perl-doc

Installing Prerequisites:

# RPM-based systems
$ sudo dnf install \
    perl-Archive-Zip perl-DB_File perl-File-Which \
    perl-Getopt-Long perl-Image-Size perl-IO-String perl-JSON-XS \
    perl-libwww-perl perl-Parse-RecDescent perl-Pod-Parser \
    perl-Text-Unidecode perl-Test-Simple perl-Time-HiRes perl-URI \
    perl-XML-LibXML perl-XML-LibXSLT \
    perl-UUID-Tiny texlive ImageMagick ImageMagick-perl
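After installing, a quick way to confirm the executables are reachable is a PATH check like the sketch below. missing_tools is a hypothetical helper name; it only checks that the commands resolve, not that the Perl prerequisites are complete.

```bash
#!/usr/bin/env bash
# Sketch: list any of the given executables that are not found on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
  local tool
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to something runnable
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

if [ "$#" -ge 1 ]; then
  missing_tools "$@"
fi
```

Running it with latexml latexmlpost latexmlc latexmlmath as arguments prints only the names that are missing, so empty output means all four were found.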

This software is in the public domain and is not subject to copyright protection.

The Manual

The design goals of LaTeXML are:

  • Faithful emulation of TeX's behavior
  • Easily extensible
  • Lossless, preserving both semantic and presentation cues
  • Use an abstract LaTeX-like, extensible document type
  • Infer the semantics of the mathematical content

Using LaTeXML

The main commands provided by the system are:

  • latexml for converting TeX and BibTeX sources to XML
    • Converts a TeX document (or standard input) to XML. It loads any required definition bindings, reads, tokenizes, expands and digests the document, creating an XML structure. It then performs some document rewriting, parses the mathematical content and writes the result to an XML file.
    • Useful options:
      • --verbose or --quiet, depending on whether or not you want to see details of progress and debugging messages printed during processing. They can be added multiple times to get more or less detail.
      • --path={directory}: Directories to search (in addition to the working directory) for various files can be specified using this option.
      • --includestyles can be used to tell LaTeXML to process style files (it doesn't process these files by default).
  • latexmlpost for various postprocessing tasks including conversion to HTML, processing images, conversion to MathML and so on
    • The command carries out a set of appropriate transformations in sequence:
      • scanning of labels and ids
        • Collects information about all labels, ids, indexing commands, cross-references and so on, to be used in the following stages
      • filling in the index and bibliography
        • An index is built from \index markup
        • When a document contains a request for bibliographies, typically due to the \bibliography{..} command, the postprocessor will look for the named bibliographies.
      • cross-referencing
        • In this stage, the scanned information is used to fill in the text and links of cross-references within the document. The option --urlstyle can control the format of urls within the document.
      • conversion of math
        • Specific representations of the mathematics can be requested with these options:
          • --mathimages - converts math to png images
          • --presentationmathml - creates Presentation MathML
          • --contentmathml - creates Content MathML
          • --openmath - creates OpenMath
          • --keepXMath - preserves XMath
      • conversion of graphics and picture environments to web format
        • Conversion of graphics (e.g., from the graphics/graphicx packages' \includegraphics) can be enabled or disabled using --graphicimages or --nographicimages. Similarly, the conversion of picture environments can be controlled with --pictureimages or --nopictureimages.
      • applying an XSLT stylesheet
        • If you wish to restyle the generated HTML either by adding CSS or by customizing the XSLT, change its functionality by adding JavaScript, or even generate an alternative output format with XSLT, some combination of the following options will be useful.
          • --nodefaultresources - Omits the default resources
          • --css=stylesheet.css - Adds a new CSS stylesheet
          • --javascript=program.js - Adds a JavaScript
    • The output format is determined by the file extension of the --destination option or by the --format option. The recognized formats are:
      • html or html5
      • html4
      • xhtml
      • xml
  • latexmlc combines both latexml and latexmlpost into a single command, with some extra functionality
$ latexmlc doc.tex --dest=doc.html # converts doc.tex to a simple HTML5 document
$ latexml --dest=doc.xml doc # converts TeX to XML
$ latexmlpost --dest=doc.html doc.xml # converts XML to HTML
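The two-step pipeline with a few of the options listed above might be combined like this. It is a sketch with hypothetical assumptions: the note's custom .sty files live in a folder named macros, and notes.css is a stylesheet of your own. to_xml and to_html are hypothetical helper names, and RUN=echo prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: the two-step pipeline with some of the options described above.
# Assumptions (hypothetical): custom .sty files live in ./macros and
# notes.css is your own stylesheet. Set RUN=echo for a dry run.
RUN="${RUN:-}"

# to_xml <file.tex>: TeX -> LaTeXML XML
to_xml() {
  # --includestyles: also process style files LaTeXML has no binding for
  # --path=macros: extra directory to search for input files
  $RUN latexml --quiet --includestyles --path=macros --dest="${1%.tex}.xml" "$1"
}

# to_html <file.xml>: LaTeXML XML -> HTML5
to_html() {
  # --presentationmathml: Presentation MathML for the math
  # --css=notes.css: add an extra stylesheet to the page
  $RUN latexmlpost --presentationmathml --css=notes.css --dest="${1%.xml}.html" "$1"
}

if [ "$#" -ge 1 ]; then
  to_xml "$1" && to_html "${1%.tex}.xml"
fi
```

Keeping the two steps separate leaves the intermediate XML on disk, which is handy when debugging a note; latexmlc is simpler once everything works.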

Architecture

The casual user needs only a superficial understanding of the architecture. The processing is broken into the following stages:

  • Digestion
  • Construction
  • Rewriting
  • Math Parsing
  • Serialization

Flow of Data Through LaTeXML's Digestive Tract


Customization

The processing of the document, its conversion into XML and ultimately to XHTML or other formats can be customized in various ways, at different stages of processing and at different levels of complexity. By far, the easiest way to customize the style of the output is by modifying the CSS.
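As a sketch of the CSS route, the script below writes a small stylesheet and passes it to latexmlc with --css. The file name notes.css and its rules are hypothetical examples, write_css and styled_note are hypothetical helper names, and RUN=echo prints the conversion command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch: restyle the output by writing a small stylesheet and handing it to
# latexmlc with --css. notes.css and its rules are hypothetical examples.
RUN="${RUN:-}"

# write_css <path>: write a minimal stylesheet for the notes pages
write_css() {
  cat > "$1" <<'EOF'
/* narrow, centered reading column */
body { max-width: 42rem; margin: 0 auto; line-height: 1.5; }
/* keep wide images inside the column */
img { max-width: 100%; height: auto; }
EOF
}

# styled_note <file.tex>: convert with the extra stylesheet attached
styled_note() {
  $RUN latexmlc --css=notes.css --dest="${1%.tex}.html" "$1"
}

if [ "$#" -ge 1 ]; then
  write_css notes.css
  styled_note "$1"
fi
```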

Commands

  • latexml [options] texfile
    • Transforms a TeX/LaTeX file into XML.
    • If texfile is '-', latexml reads the source from standard input. If texfile has an explicit extension of .bib, it is processed as a BibTeX bibliography.
  • latexmlpost [options] xmlfile
    • Postprocesses an XML file generated by latexml to perform common tasks, such as converting math to images and processing graphics inclusions for the web.
  • latexmlc
    • An omni-executable for LaTeXML, capable of stand-alone, socket-server and webservice conversion. Supports both core processing and post-processing.
    • Can be used to create ePub documents
  • latexmlmath [options] texmath
    • Transforms a TeX/LaTeX math expression into various formats
    • If texmath is '-', latexmlmath reads the TeX from standard input. If any of the output files are '-', the result is printed on standard output.
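For a single formula, latexmlmath can write Presentation MathML straight to standard output. This is a sketch assuming latexmlmath is installed; math_to_mathml is a hypothetical helper name, and RUN=echo prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch: convert one formula to Presentation MathML with latexmlmath.
# math_to_mathml is a hypothetical helper. Set RUN=echo for a dry run.
RUN="${RUN:-}"

# math_to_mathml '<TeX math>': --pmml=- sends the MathML to standard output
math_to_mathml() {
  $RUN latexmlmath --pmml=- "$1"
}

# The expression can also come from standard input by passing '-':
#   printf '%s' '\frac{a}{b}' | latexmlmath --pmml=- -
if [ "$#" -ge 1 ]; then
  math_to_mathml "$1"
fi
```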

