The CDATA contains an HTML document. In some cases, the document
begins with <html> and ends with </html>; in others the
html element is implied. Generally the HTML includes a
head element with a CSS stylesheet. The HTML body often begins with
<BR>. The actual content ranges from trivial to simple:
just discarding the CSS and tags yields readable results.
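
As an illustration of the "discard the CSS and tags" approach, here is a minimal sketch using Python's standard html.parser module. The names TextExtractor and html_to_text are hypothetical helpers, not part of the corpus tooling, and the sketch assumes the CDATA payload has already been read into a string. It also skips script elements and treats <BR> as a line break, which are assumptions about what gives readable output rather than anything the corpus requires.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <style> or <script>."""

    SKIPPED = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <BR> arrives here as "br".
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "br":
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace within each line and drop empty lines.
        joined = "".join(self._chunks)
        lines = (" ".join(line.split()) for line in joined.splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Return the readable text of an HTML string, with or without <html>."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()
```

Because the parser does not require a well-formed document, the same function handles both payloads that begin with <html> and those where the html element is implied.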
This element has the following attributes.
This always contains
en in the corpus.