Well-formed Tags

In the section, "What is HTML?," I said that tags are the parts of speech of a web page. That was a slight simplification. Tags and the text they enclose are called elements and, strictly speaking, elements are the parts of speech of a web page. I mention this because there are two ways to look at constructing a web page:

  1. If you are taking an existing text page and turning it into HTML, you will probably put tags around each section of text, marking it up. The document already exists; you are carving it up into headlines and paragraphs and lists and (please don't do this at home, kids) blinks. From this perspective, tags predominate.

  2. If you are creating a brand-new web page without benefit of a prior document, you will probably create tags and enter text in the tags in close coordination (type a sentence and turn it into a headline element, then create a following paragraph tag and fill it in with another sentence). From this perspective, elements predominate.

Righteousness

In this page, I describe how to make well-formed tags. "Well-formed" is an actual technical term; if you want to really talk tech, you would describe the well-formedness of an element. Some of the rules that follow can be (and too frequently are) broken. Can you get away with it? Perhaps, for now. But when a new version of a browser requires stricter well-formedness, as has happened on more than one occasion, pages that used to work fine suddenly become a mess. For any experienced websmiths reading this, XML (a newer markup language that may or may not displace HTML) requires rigid well-formedness; my advice is to get disciplined now.

Opening and Closing Tags

One of the most fundamental tags is the paragraph tag:

<P>This is good.</P>

In most cases, what we call a tag is actually a pair of tags: an opening tag (<P>) and a closing tag (</P>). In the example above, the words "This is good." are said to be enclosed by a paragraph tag, and the entire package is a paragraph element.

Empty Tags

Some tags do not enclose any text. For instance, the horizontal-rule tag (<HR>), which draws a horizontal line across the page, is an element unto itself. Such a tag is called an empty tag or empty element, and it does not require a closing tag.

If an element is not empty, it should have both an opening tag and a closing tag. In the case of the paragraph tag, this is the most frequently broken rule. Some people think you do not need </P> at the end of a paragraph, and HTML itself tolerates the omission. You can get away with this. For now.
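If you would rather have a script hunt for missing closing tags than find them by eye, here is a minimal sketch in Python. It assumes the standard html.parser module is available; the names EMPTY_TAGS, TagCounter and unclosed_tags are my own inventions for illustration, and the list of empty tags is deliberately partial.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: a small, partial set of empty elements, for illustration only.
EMPTY_TAGS = {"hr", "br", "img"}


class TagCounter(HTMLParser):
    """Count opening and closing tags so unclosed elements can be reported."""

    def __init__(self):
        super().__init__()
        self.opened = Counter()
        self.closed = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser hands over tag names already converted to lower case.
        if tag not in EMPTY_TAGS:
            self.opened[tag] += 1

    def handle_endtag(self, tag):
        self.closed[tag] += 1


def unclosed_tags(html_text):
    """Return {tag name: number of opening tags that lack a closing tag}."""
    counter = TagCounter()
    counter.feed(html_text)
    counter.close()
    return {
        tag: count - counter.closed[tag]
        for tag, count in counter.opened.items()
        if count > counter.closed[tag]
    }


print(unclosed_tags("<P>This is good.</P><HR>"))
print(unclosed_tags("<P>One.<P>Two.</P>"))
```

The first call should report nothing missing, while the second should flag one paragraph that was opened and never closed. This only counts tags; it says nothing about whether they are nested properly.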

Delimiters

A tag always starts with a less-than sign and ends with a greater-than sign (a.k.a. angle brackets), as in <P>. In tech talk, a tag is delimited by angle brackets.

A closing tag always has a virgule ("/" - a.k.a. oblique stroke or slash, but not backslash, which is "\") immediately following the <, as in </P>.

There are no spaces between < and the code term, or between < and /, or between </ and the code term, or between the code term and >.
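The delimiter rules above are simple enough to express as a pattern. The following is only a sketch, in Python, and it assumes tags without attributes (attributes are a separate topic); SIMPLE_TAG and is_well_delimited are hypothetical names of my own.

```python
import re

# Assumption: attribute-free tags only, such as <P> or </CITE>.
# "<", an optional "/", the code term, then ">", with no spaces anywhere.
SIMPLE_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*>")


def is_well_delimited(tag):
    """Return True if the tag follows the delimiter rules described above."""
    return SIMPLE_TAG.fullmatch(tag) is not None


for candidate in ["<P>", "</P>", "< P>", "< /P>", "</ P>", "<P >", "<\\P>"]:
    print(repr(candidate), is_well_delimited(candidate))
```

Only the first two candidates should pass; the rest break one of the rules, either with a stray space or with a backslash where the slash belongs.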

Nesting

Tags must nest neatly. In other words, elements cannot overlap. Consider the phrase, "See Titanic!"

The proper way to code it is ...

<STRONG>See <CITE>Titanic!</CITE></STRONG>

... while non-well-formed code would be ...

<STRONG>See <CITE>Titanic!</STRONG></CITE>

The innermost element must start and end within the opening and closing tags of the outer element.

Also notice that — in general — an outer tag affects everything it encloses, including the contents of enclosed elements. In the example above, the <STRONG> tag makes both "See" and "Titanic!" bold, even though "Titanic!" is enclosed in a <CITE> tag, which makes its contents italic. This behavior is true only in general and not universally. There are important and useful exceptions.
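The nesting rule can be checked mechanically with a stack: each opening tag goes on top, and each closing tag must match whatever is on top at that moment. Here is a minimal sketch in Python, assuming the standard html.parser module; NestingChecker and nesting_error are hypothetical names, and the sketch makes no allowance for empty tags such as <HR>.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Remember the first closing tag that does not match the innermost open element."""

    def __init__(self):
        super().__init__()
        self.stack = []    # names of the elements currently open, outermost first
        self.error = None  # (closing tag, innermost open tag) for the first mismatch

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.error is not None:
            return  # keep only the first problem found
        innermost = self.stack[-1] if self.stack else None
        if innermost == tag:
            self.stack.pop()
        else:
            self.error = (tag, innermost)


def nesting_error(html_text):
    """Return None if closing tags match in order, else the first mismatch."""
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    return checker.error


print(nesting_error("<STRONG>See <CITE>Titanic!</CITE></STRONG>"))
print(nesting_error("<STRONG>See <CITE>Titanic!</STRONG></CITE>"))
```

The well-formed line should come back clean. The overlapping one should report that </STRONG> arrived while <CITE> was still the innermost open element.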

Case Insensitivity

HTML tags are not case-sensitive. Thus, <CITE> and <cite> mean the same thing. However, if XML catches on, you will find that XML is strictly case-sensitive, so my advice is to pick a preference, whether upper-case or lower-case, and stick with it.
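To see the difference for yourself, you can hand the same markup to an HTML parser and to an XML parser. This sketch uses Python's standard html.parser and xml.etree.ElementTree modules; the helper names tag_names and is_well_formed_xml are my own, made up for the illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagNames(HTMLParser):
    """Collect the names of opening tags as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.append(tag)


def tag_names(html_text):
    parser = TagNames()
    parser.feed(html_text)
    parser.close()
    return parser.names


def is_well_formed_xml(text):
    """Return True if an XML parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The HTML parser treats both spellings as the same tag.
print(tag_names("<CITE>Titanic!</CITE>") == tag_names("<cite>Titanic!</cite>"))

# The XML parser insists that opening and closing tags match exactly.
print(is_well_formed_xml("<CITE>Titanic!</CITE>"))
print(is_well_formed_xml("<CITE>Titanic!</cite>"))
```

The mixed-case pair in the last line is harmless in HTML but is rejected outright as XML, which is the practical reason to settle on one case.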

Special Characters

Some of you who have been paying close attention may be wondering, "If angle brackets are tag delimiters, how is he getting those pointy brackets to show up in the browser?" Simple. What you see in your browser as "<" is not coded in the web page as < — rather, it is &lt; (and of course, I can get &lt; to show up because there's a special code for &).
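If you produce pages with a script rather than by hand, you do not have to type these codes yourself. As a small sketch, Python's standard html.escape function does the substitution; show_as_text is a hypothetical wrapper name I have added only to make the intent plain.

```python
import html


def show_as_text(markup):
    """Replace &, < and > with character references so the markup displays literally."""
    return html.escape(markup)


# A tag that should appear on the page instead of being interpreted.
print(show_as_text("<P>This is good.</P>"))

# Escaping an already escaped code is how "&lt;" itself gets displayed.
print(show_as_text("&lt;"))
```

Note that the ampersand is handled along with the angle brackets, which is exactly the trick described above for getting &lt; to show up.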

HTML gives you two ways to display non-standard characters and characters that normally could not be rendered in the browser: numeric character references and character entity references.

There are hundreds of character references, one for every letter, upper and lower case (for those languages that have case) in most major alphabets, living and dead. If you really wanted to, you could code all the words of your content in character references. But you don't have to.

A few character references will be quite useful to you, which I've summarized in the table below. One that takes a bit of explaining is the non-breaking space (&nbsp;). When a browser renders HTML, it will usually break a long line of text at a space between words that will allow it to display as complete a line as possible. Sometimes, however, you want to keep a pair of words together, such as "Mrs. Peel," so that if the line breaks at that point, both words will stay together either at the end of one line or the beginning of the next. The way to do this is to not put in a regular space, but the non-breaking space: Mrs.&nbsp;Peel.
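Here is a rough way to watch the non-breaking space at work outside a browser. It is only an analogy: Python's textwrap module stands in for the browser's line breaking, on the assumption (true of the standard library as I know it) that textwrap breaks lines only at ordinary whitespace. The helper name wrap_decoded is hypothetical.

```python
import html
import textwrap


def wrap_decoded(markup, width):
    """Decode character references, then wrap the text to the given width."""
    return textwrap.wrap(html.unescape(markup), width=width)


# With a regular space, the line is free to break between the two words.
print(wrap_decoded("Call Mrs. Peel", 9))

# With &nbsp;, the pair moves to the next line as a unit.
print(wrap_decoded("Call Mrs.&nbsp;Peel", 9))
```

In the second case "Mrs." and "Peel" should stay on the same line, joined by the non-breaking space character, just as they would in a browser.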

character entity reference    numeric character reference    rendered
&lt;                          &#60;                          <
&gt;                          &#62;                          >
&nbsp;                        &#160;                         (non-breaking space)
&amp;                         &#38;                          &
&ndash;                       &#8211; or &#150;              –
&mdash;                       &#8212; or &#151;              —
&bull;                        &#8226;                        •
&middot;                      &#183;                         ·
&copy;                        &#169;                         ©
&deg;                         &#176;                         °

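A few of these characters (&#8211;, &#8212; and &#8226;) will render in current versions of Microsoft Internet Explorer, but not Netscape Navigator. For the dashes, &#150; and &#151; are more reliable in practice, although strictly speaking they are Windows-1252 code points rather than the Unicode values HTML calls for, so treat them as a workaround.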
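You can confirm that the two kinds of reference in the table are interchangeable by decoding both and comparing the results. This is a sketch using Python's standard html.unescape function; PAIRS and mismatched_pairs are hypothetical names, and the pairs are copied from the table above.

```python
import html

# (character entity reference, numeric character reference) pairs from the table.
PAIRS = [
    ("&lt;", "&#60;"), ("&gt;", "&#62;"),
    ("&nbsp;", "&#160;"), ("&amp;", "&#38;"),
    ("&ndash;", "&#8211;"), ("&mdash;", "&#8212;"),
    ("&bull;", "&#8226;"), ("&middot;", "&#183;"),
    ("&copy;", "&#169;"), ("&deg;", "&#176;"),
]


def mismatched_pairs(pairs):
    """Return the pairs whose two references decode to different characters."""
    return [
        (entity, numeric)
        for entity, numeric in pairs
        if html.unescape(entity) != html.unescape(numeric)
    ]


print(mismatched_pairs(PAIRS))

# Python's decoder follows the browser habit of reading &#150; and &#151;
# as the Windows-1252 dashes.
print(html.unescape("&#150;") == html.unescape("&ndash;"))
print(html.unescape("&#151;") == html.unescape("&mdash;"))
```

An empty list from the first call means every pair in the table agrees. The last two lines show why the alternate dash codes work at all: the decoder quietly maps them onto the proper dashes.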

There are many more possible character entities, including math symbols, Greek letters and diacritical marks. For the full rundown, see the W3C HTML 4.01 Recommendation, specifically www.w3.org/TR/html4/sgml/entities.html.

 

Philadelphia Yearly Meeting
Website Copyright © 1997-2008, PYM

Last modified: Wednesday, February 18, 2004 at 08:18 AM