HTML fixer

THIS PROJECT IS NO LONGER MAINTAINED. YOU CAN USE THE SOURCE CODE IN YOUR PROJECT AS YOU WISH.

LAST UPDATE: 06/07/2010
STOP bad html inserted by your clients or by the users of your community!

This PHP class cleans and repairs HTML code. It is especially useful when you cannot install PHP’s HTML Tidy extension. Here is a quick list of what it can do.

WHAT IT DOES:

  1. delete closing tags that have no matching opening tag
  2. automatically close opening tags that were left unclosed
  3. detect and fix bad nesting (a bold inside a bold, or a paragraph that contains a table…)
  4. fix bad quotes in attributes (quotes left open or missing…)
  5. merge multiple style attributes in the same tag into one
  6. remove HTML comments
  7. remove empty tags and other bad tags
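
As an illustration of item 5, here is a minimal sketch of merging several style attributes of one tag into a single attribute. It is written in Python only to show the idea and is not taken from the class's PHP source; the name merge_styles and both regular expressions are assumptions made for this example, and only quoted attribute values are handled.

```python
import re

# Matches one quoted style attribute (assumption: unquoted values are ignored).
STYLE_ATTR = re.compile(r"""\sstyle\s*=\s*(?:"([^"]*)"|'([^']*)')""", re.IGNORECASE)
# Matches the end of a tag: optional whitespace, optional "/", then ">".
TAG_END = re.compile(r"\s*/?>$")


def merge_styles(tag):
    """Collapse all style attributes of a single opening tag into one."""
    matches = list(STYLE_ATTR.finditer(tag))
    if len(matches) < 2:
        return tag  # zero or one style attribute: nothing to merge

    values = []
    for match in matches:
        value = match.group(1) if match.group(1) is not None else match.group(2)
        # Use single quotes inside the value so it is safe between double quotes.
        value = value.replace('"', "'").strip().rstrip(";").strip()
        if value:
            values.append(value)

    stripped = STYLE_ATTR.sub("", tag)  # the tag without any style attribute
    end = TAG_END.search(stripped)
    if end is None:
        return tag  # not a complete tag; leave it untouched
    if not values:
        return stripped  # every style attribute was empty
    merged = ' style="' + "; ".join(values) + '"'
    return stripped[:end.start()] + merged + stripped[end.start():]


print(merge_styles('<p style="color:red" class="x" style="font-weight:bold;">'))
# prints: <p class="x" style="color:red; font-weight:bold">
```

The merged attribute is placed at the end of the tag, so the order of the other attributes is kept.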

How does it work?
It’s a bit complex to explain. The class analyzes the HTML code character by character, detecting nodes, looking inside each node to fix quotes, attributes and more, and finding each node’s closing tag. It saves every node it finds, together with its inner content, in a matrix.
Then it reads the matrix to rebuild the fixed HTML.
The matrix stores opening tags, closing tags and content, which also makes it possible to count the errors.
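
The two passes described above can be sketched roughly as follows. This is a simplified Python illustration, not the class's actual PHP code: it tokenizes with a regular expression instead of reading character by character, it does not repair attributes, and the names scan, rebuild, fix_html and VOID_TAGS are assumptions made for this example.

```python
import re

# Tags that never take a closing tag (a partial list, assumed for this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

# One token is a comment, a tag, a run of text, or a stray "<".
# Limitation: a ">" inside an attribute value ends the tag too early.
TOKEN = re.compile(
    r"(<!--.*?-->)|<(/?)([a-zA-Z][a-zA-Z0-9]*)([^<>]*)>|([^<]+)|(<)",
    re.DOTALL,
)


def scan(html):
    """First pass: store every node as a (kind, name, text) row."""
    nodes = []
    for match in TOKEN.finditer(html):
        comment, slash, name, rest, text, stray = match.groups()
        if comment is not None:
            continue  # HTML comments are removed
        if stray is not None:
            nodes.append(("text", None, "&lt;"))  # escape a lone "<"
        elif text is not None:
            nodes.append(("text", None, text))
        elif slash:
            nodes.append(("close", name.lower(), "</%s>" % name.lower()))
        elif name.lower() in VOID_TAGS or rest.rstrip().endswith("/"):
            nodes.append(("void", name.lower(), match.group(0)))
        else:
            nodes.append(("open", name.lower(), match.group(0)))
    return nodes


def rebuild(nodes):
    """Second pass: read the rows back, fixing tag balance and counting errors."""
    out, stack, errors = [], [], 0
    for kind, name, text in nodes:
        if kind == "open":
            stack.append(name)
            out.append(text)
        elif kind == "close":
            if name not in stack:
                errors += 1  # closing tag without an opening tag: drop it
                continue
            while stack[-1] != name:
                out.append("</%s>" % stack.pop())  # close tags left open inside
                errors += 1
            stack.pop()
            out.append(text)
        else:
            out.append(text)  # text and void tags pass through unchanged
    while stack:
        out.append("</%s>" % stack.pop())  # close whatever is still open
        errors += 1
    return "".join(out), errors


def fix_html(html):
    return rebuild(scan(html))


print(fix_html("<p><b>bold</p></i>text<div>"))
# prints: ('<p><b>bold</b></p>text<div></div>', 3)
```

Keeping the scan and the rebuild separate means the list of rows can also be inspected on its own, for example to report which tags were wrong.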

Watch a demo of the HTML FIXER CLASS with debug.
Watch a demo of the HTML FIXER CLASS with a textarea to insert dirty html code.
Download the class and the example.

HISTORY
NEW. version 2.05 date 06/07/2010
fixed a bug with quotes (fix by emmanuel (at) evobilis.com)

version 2.04
added a CSS style filter (contributed by Martin Vool)

version 2.03
strips php code.

version 2.02
fixed a bug with non closing quotes.