This class can solve many problems that come from user-generated HTML content, and it can also repair HTML before your bots do heavier work on it. (It’s especially useful for websites that don’t have PHP’s HTML Tidy module.)
Here is a quick list of the things it can do:
- removes closing tags that have no matching opening tag
- closes tags that were opened but never closed
- detects and fixes bad nesting (for example, a bold inside a bold, or a paragraph that contains a table)
- fixes bad quotes in attributes (for example, adds quotes where they are missing)
- merges multiple style attributes on the same tag into one
- removes HTML comments
- removes empty tags and other bad tags
It works in a fairly complex way: it analyzes the HTML code character by character, searching for tags. When it finds a tag, it cleans the tag’s attributes, stores the data it found in a matrix, and then searches for the matching closing tag.
The data saved in the matrix is later used to rebuild the corrected HTML.
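To make the scan-then-rebuild idea more concrete, here is a minimal sketch of the general technique. It is not the HtmlFixer source: the function names (scanTokens, makeTagToken, rebuildBalanced), the token layout, and the short list of void tags are all assumptions made for illustration, and it only balances tags (no attribute cleaning, no handling of ">" inside quoted attributes).

```php
<?php
// Illustrative sketch only: NOT the HtmlFixer implementation.
// All names below are hypothetical.

// Classify one raw tag string ("<b>", "</p>", "<br />") as a token row.
function makeTagToken(string $raw): array
{
    $void = ['br', 'hr', 'img', 'input', 'meta', 'link'];
    preg_match('/^<\/?\s*([a-zA-Z][a-zA-Z0-9]*)/', $raw, $m);
    $name = isset($m[1]) ? strtolower($m[1]) : '';
    if ($name === '') {
        // Not a recognizable tag (e.g. a comment): keep it as plain text.
        return ['type' => 'text', 'name' => '', 'raw' => $raw];
    }
    if (strncmp($raw, '</', 2) === 0) {
        $type = 'close';
    } elseif (in_array($name, $void, true) || substr($raw, -2) === '/>') {
        $type = 'void';
    } else {
        $type = 'open';
    }
    return ['type' => $type, 'name' => $name, 'raw' => $raw];
}

// Walk the input one character at a time and collect rows of tokens.
function scanTokens(string $html): array
{
    $tokens = [];
    $buffer = '';
    $inTag = false;
    $len = strlen($html);
    for ($i = 0; $i < $len; $i++) {
        $ch = $html[$i];
        if (!$inTag && $ch === '<') {
            if ($buffer !== '') {
                $tokens[] = ['type' => 'text', 'name' => '', 'raw' => $buffer];
            }
            $buffer = '<';
            $inTag = true;
        } elseif ($inTag && $ch === '>') {
            $tokens[] = makeTagToken($buffer . '>');
            $buffer = '';
            $inTag = false;
        } else {
            $buffer .= $ch;
        }
    }
    if ($buffer !== '') {
        // Trailing text, or an unterminated tag kept as-is.
        $tokens[] = ['type' => 'text', 'name' => '', 'raw' => $buffer];
    }
    return $tokens;
}

// Rebuild the HTML from the stored rows, balancing open and close tags.
function rebuildBalanced(array $tokens): string
{
    $out = '';
    $stack = []; // names of currently open tags
    foreach ($tokens as $t) {
        if ($t['type'] === 'open') {
            $stack[] = $t['name'];
            $out .= $t['raw'];
        } elseif ($t['type'] === 'close') {
            // Find the most recent matching opener, if any.
            $pos = array_search($t['name'], array_reverse($stack, true), true);
            if ($pos === false) {
                continue; // closing tag without an opening tag: drop it
            }
            // Close everything opened since the opener, then the opener itself.
            while (count($stack) > $pos) {
                $out .= '</' . array_pop($stack) . '>';
            }
        } else {
            $out .= $t['raw']; // text and void tags pass through unchanged
        }
    }
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>'; // close tags left open
    }
    return $out;
}
```

For example, rebuildBalanced(scanTokens('<p>Hello <b>world</i></p>')) drops the stray closing tag and closes the bold before the paragraph ends. The real class does considerably more, so treat this only as a picture of the general approach.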
It’s very simple to use. Suppose you have a variable, $dirty_html, that holds the dirty HTML:
$a = new HtmlFixer();
$clean = $a->getFixedHtml($dirty_html);
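If you call the fixer from several places, a small wrapper can keep the call in one spot. The sketch below is only a suggestion: cleanUserHtml is a hypothetical helper, not part of the class, and it assumes the file that defines HtmlFixer has already been included (the file name is not given here, so no include line is shown). When the class is not loaded, the helper simply returns the input unchanged.

```php
<?php
// Hypothetical helper, not part of the HtmlFixer class.
// Assumption: the file defining HtmlFixer has been included elsewhere.

function cleanUserHtml(string $dirtyHtml): string
{
    if (!class_exists('HtmlFixer')) {
        // Class not available: hand back the input untouched.
        return $dirtyHtml;
    }
    $fixer = new HtmlFixer();
    return $fixer->getFixedHtml($dirtyHtml);
}

// Example: clean a user comment before saving or displaying it.
$comment = '<p>Thanks for the <b>great post</p>';
$clean = cleanUserHtml($comment);
```

Returning the input unchanged is a deliberate simplification; depending on your site you may prefer to log the problem or stop instead.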