D.1.5 The html Element

Parent: text
Contents: CDATA

The CDATA section contains an HTML document. In some cases the document starts with <html> and ends with </html>; in others the html element is implied. The HTML generally includes a head element containing a CSS stylesheet, and the body often begins with <BR>. The actual content ranges from trivial to simple in structure, so just discarding the CSS and the tags yields readable text.
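As a rough illustration of the "discard the CSS and tags" approach, the following minimal sketch uses Python's standard html.parser module. It assumes the stylesheet sits inside a <style> element (the description above says only "a CSS stylesheet"), and it treats <BR> as a line break, which is an added choice rather than something the corpus requires. The names CorpusTextExtractor and html_to_text are hypothetical, chosen for this example.

```python
from html.parser import HTMLParser


class CorpusTextExtractor(HTMLParser):
    """Hypothetical helper: keep text content, drop tags and <style> contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._style_depth = 0  # > 0 while inside a <style> element (assumed CSS location)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <BR> and <br> both arrive as "br".
        if tag == "style":
            self._style_depth += 1
        elif tag == "br":
            self._chunks.append("\n")  # assumption: render <BR> as a line break

    def handle_endtag(self, tag):
        if tag == "style" and self._style_depth:
            self._style_depth -= 1

    def handle_data(self, data):
        # Text inside <style> is the CSS we want to discard.
        if not self._style_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse whitespace within each line and drop empty lines.
        raw = "".join(self._chunks)
        lines = (" ".join(line.split()) for line in raw.splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html_source):
    """Hypothetical helper: return readable text from one html element's CDATA."""
    parser = CorpusTextExtractor()
    parser.feed(html_source)
    parser.close()  # flush any trailing text still buffered by the parser
    return parser.text()


sample = (
    "<html><head><style>p { color: red; }</style></head>"
    "<body><BR>Hello <b>world</b><br>Second line</body></html>"
)
print(html_to_text(sample))
```

Because html.parser tolerates missing <html>, <head>, and <body> tags, the same function should also handle the documents in which the html element is implied.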

This element has the following attribute.

Required: lang

Throughout the corpus, its value is always en.
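A consumer could read this element with Python's standard xml.etree.ElementTree module, as in the minimal sketch below. It assumes the html element appears as a direct child of its text parent and that the CDATA section is the element's only content. The function name read_html_element is hypothetical. ElementTree exposes a CDATA section as ordinary element text, so the raw HTML comes back as a plain string.

```python
import xml.etree.ElementTree as ET


def read_html_element(text_element_xml):
    """Hypothetical helper: return (lang, html_source) for the html child of a text element."""
    text_elem = ET.fromstring(text_element_xml)
    html_elem = text_elem.find("html")  # assumes html is a direct child of text
    if html_elem is None:
        raise ValueError("text element has no html child")
    lang = html_elem.get("lang")
    if lang is None:
        # lang is a required attribute of the html element
        raise ValueError("html element is missing its required lang attribute")
    # The CDATA section is reported as ordinary character data (element text).
    return lang, html_elem.text or ""


sample = '<text><html lang="en"><![CDATA[<BR>Hello <b>world</b>]]></html></text>'
lang, html_source = read_html_element(sample)
print(lang, html_source)
```

The returned HTML string can then be passed to any tag-stripping step; checking lang here is cheap, even though the corpus only ever uses en.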