Web Typesetting - Part 2: Basics

by Randy Finch

As I mentioned in my last column, Web pages are created in HTML (HyperText Markup Language). HTML files are in ASCII format so as to be totally portable across computer platforms. An HTML file contains the text that will appear on the Web page along with some special tags for formatting the text, referencing graphic files, and creating hypertext links to other documents. This is exactly the way AmigaGuide help files work. However, AmigaGuide uses tags that are different from HTML tags.

State of HTML

HTML is based on SGML (Standard Generalized Markup Language), a large document processing system. SGML describes the general structure of documents, not their actual layout as a word processing file or a PostScript file does. Therefore, SGML and HTML focus on the content of the document rather than its appearance. They assume that there are common elements in all documents such as titles, headings, paragraphs, etc. The codes embedded within an HTML document identify these elements, but leave it up to the reader software (such as a Web browser) to decide how these elements should be displayed on the computer monitor. Thus, different Web browsers may display the same HTML document differently. It is a good idea to look at your HTML files with several different browsers to make sure they look acceptable.

The HTML specification is now at version 2.0. A much more robust version 3.0 is being discussed, but is not yet finalized. Some of the more advanced browsers already support features from this version despite the lack of finalization. If you want to be sure that most browsers can read your documents, use only the features of HTML 2.0. Some browsers support features that are not in any version of the HTML specifications (although they could be added at a later time). Netscape Navigator is the most notorious for this. Its additional features are known as Netscape extensions. Microsoft's Internet Explorer has upped the ante by adding its own custom extensions. Other browsers might support their own extensions. However, it takes a very popular browser for the extensions to ever be used.

State of Web Browsers

Netscape is by far the most popular browser and should be used for testing your HTML files. Microsoft's Internet Explorer is becoming popular since it is very powerful, yet free. NCSA Mosaic is the father of Web browsers and is a good one for testing. It is also free. AMosaic is the Amiga incarnation of NCSA Mosaic. There are also non-graphical browsers such as Lynx. You can usually find Lynx on your Internet Service Provider's (ISP) computer if it is Unix based. If you want to make sure your HTML page also looks good without any graphics, try Lynx or turn off graphics in your graphical browser if it supports this feature.

There are many other Web browsers on the market and most of the online services such as America Online, CompuServe, and Prodigy have browsers with their connect software. Many of these other browsers are variations of NCSA Mosaic. There is no way to test your pages on each of these because there are simply too many. So just pick two or three that you specifically want your Web pages to look good on and use them. It is a good idea to test your HTML files on a local drive before uploading them to your ISP's server.

To give you an idea of how different the same HTML file can look on different browsers, look at Figures 1, 2A, 2B, and 3. Figure 1 is my home page (http://fly.hiwaay.net/~rcfinch/) as displayed by Netscape under Windows 95 (256 colors). Figures 2A and 2B are the same file displayed by AMosaic under AmigaDOS 3.1 (16 colors). (See Rob Hayes' column in the March 1996 issue of AC for information about how AMosaic looks under different versions of AmigaDOS.) Finally, Figure 3 is my home page displayed by Lynx on my ISP's Unix server. Browsers are designed to ignore any coding that they cannot understand. This is nice in that you can include features for the more advanced browsers without worrying that you'll crash feature-challenged browsers. You should also be aware that some browsers are very lenient about improper HTML coding. If you only test your files with one of these browsers, you could be misled. It might display your page just fine, guessing correctly what you intended, whereas another stricter browser may balk. Let's now look at the HTML file that generated Figures 1-3.

HTML's Basic Tags

Listing 1 shows the complete code for my home page (it may have changed by the time you read this). It contains many of the features that you will use extensively when producing your Web pages. After perusing the listing, you will see that most, but not all, HTML tags have both a beginning and an end. Opening tags look like this: <TAG>. Closing tags look like this: </TAG>. An opening tag indicates that a particular element of the HTML file follows. The closing tag indicates that a particular element of the HTML code has ended. Opening tags can also have options, or attributes, associated with them such as <TAG attributes...>. Within any given opening and closing tags, only certain other tags are allowed.
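
The tag syntax described above can be sketched roughly as follows. This is only an illustration, not an excerpt from Listing 1; the text inside the tags is made up, and the ALIGN attribute is an HTML 3.0 proposal that some browsers will ignore.

```html
<!-- An opening tag, the enclosed text, and the matching closing tag -->
<H1>Welcome</H1>

<!-- An opening tag that carries an attribute -->
<P ALIGN=CENTER>This paragraph is centered by browsers that support ALIGN.</P>

<!-- A tag that has no closing counterpart -->
<HR>
```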

All HTML files need to have an opening <HTML> tag and a closing </HTML> tag. These tags simply identify the entire enclosed text to be an HTML document. Some browsers don't care if these tags are included or not as long as the document contains valid HTML tags. However, it is a good idea to include these tags for the persnickety browsers. It also helps people viewing the document source to easily identify the file as HTML.

An HTML document contains two major sections: the header, enclosed by <HEAD> and </HEAD> tags, and the body, enclosed by <BODY> and </BODY> tags. The header allows several tags to reside inside. The most common is the title (<TITLE> ... </TITLE>). The text between the title tags is the official title of the document. Browsers typically display the title text in the title bar of the browser's window and use it in a menu when you add the document to your list of favorites. Other tags allowed in the header are less common and will be discussed as needed in future columns.
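
Put together, the overall structure might look like the minimal skeleton below. The title and body text are placeholders chosen for illustration, not the contents of Listing 1.

```html
<HTML>
<HEAD>
<!-- The title usually shows up in the browser's title bar -->
<TITLE>A Sample Page</TITLE>
</HEAD>
<BODY>
This text appears in the document area of the browser.
</BODY>
</HTML>
```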

The body of an HTML document can contain many different tags for indicating the various elements of the text. Some of these are discussed below. The <BODY> tag also allows several different attributes. The one used in my home page specifies a background image (BACKGROUND="backgrounds/PaperRelief.gif") for the document to be displayed on. This is a proposed HTML 3.0 attribute and is not supported by some browsers. My home page uses a GIF image named PaperRelief.gif in a subdirectory (backgrounds) of the directory in which my home page document (home.html) resides. (Please note that Unix's file system is case-sensitive.) The image will appear as a tiled watermark in the document area of the browser and the document will be displayed over top of it. Be careful when using the background attribute. Certain color combinations can render the document unreadable. Other allowable attributes of the <BODY> tag are Netscape extensions and will only be usable by Netscape and other browsers that have implemented these non-standard extensions. All of these extensions deal with setting the color of the text, background, and hypertext links, within the document.
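
As a sketch of how these attributes fit on the <BODY> tag, the example below combines the background image named above with the Netscape color extensions (BGCOLOR, TEXT, LINK, VLINK, and ALINK). The color values are assumptions picked for illustration; browsers that do not know an attribute should simply skip it.

```html
<!-- BACKGROUND is the proposed HTML 3.0 attribute; the rest are Netscape extensions -->
<!-- The hexadecimal color values are illustrative, not taken from Listing 1 -->
<BODY BACKGROUND="backgrounds/PaperRelief.gif"
      BGCOLOR="#FFFFFF" TEXT="#000000"
      LINK="#0000FF" VLINK="#800080" ALINK="#FF0000">
Dark text on a light background stays readable over most light images.
</BODY>
```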

Text and Headings

Any free-floating text (text not surrounded by tags) within the body of an HTML document will appear in a font and size specific to each individual browser. Most browsers allow users to specify which font and size they want to use. Some text needs to be emphasized in some way within the document. One common method is to use larger and/or bolder text for headings. HTML supports six levels of headings. Heading text is enclosed in the tags <Hn> and </Hn> where n=1, 2, 3, 4, 5, or 6. <H1> indicates a first level heading which is typically the largest and boldest. <H6> is the smallest heading text with <H2> through <H5> falling in between <H1> and <H6>. The exact way each of these headings appears is browser, or user, specific. I use <H1>, <H3>, and <H5> near the beginning of the body section of my home page document.
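
A short sketch of three heading levels followed by ordinary text is shown below. The heading text is invented for illustration and is not the wording used in Listing 1.

```html
<H1>Largest and boldest heading</H1>
<H3>A mid-level heading</H3>
<H5>A small heading</H5>
Ordinary body text follows in whatever font the browser or user has chosen.
```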

Breaks and Paragraphs

All formatting of text in an HTML document is accomplished using tags. Embedded carriage returns or line feeds in the HTML file are ignored (except in some special cases). By ignoring these control characters in the file, a browser can expand or contract the margins of the document based on the user-adjusted size of the document window. When you need to insert a line break in a document, the <BR> tag can be used. This causes the text after the tag to appear on a new line. My home page does not use this tag, but it can be useful in many cases. The <BR> tag has one HTML 3.0 attribute, CLEAR="...", that directs the browser to move below any images next to the text before continuing. When text is a heading as described in the previous section, it is automatically separated from the surrounding text. No <BR> tag is needed.
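
The following sketch shows line breaks in practice. The image file name (photo.gif) is hypothetical, and both the ALIGN attribute on <IMG> and the CLEAR attribute on <BR> may be ignored by browsers that stick to HTML 2.0.

```html
<!-- Line ends in the source are ignored, so these two lines run together -->
This sentence is split across two lines in the file
but is displayed as one continuous line.
<BR>
<!-- Each BR forces the following text onto a new line -->
First line of an address<BR>
Second line of an address<BR>

<!-- CLEAR moves the following text below an image sitting beside it -->
<IMG SRC="photo.gif" ALIGN=LEFT ALT="Photo">
A short caption beside the image.<BR CLEAR=LEFT>
This text starts below the image in browsers that honor CLEAR.
```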

You could use the <BR> tag to separate paragraphs, but there is a paragraph tag, <P>, available for this specific purpose. Most browsers will separate paragraphs with a blank line, but this is not guaranteed. If you look through my home page, you will notice that I use the paragraph tag in several places. HTML 3.0 adds an optional </P> for blocking off paragraph text in the standard HTML way. If the </P> is not used, a <P> produces an implied </P> since a new paragraph cannot be created without ending the previous one. The <P> tag has one HTML 3.0 attribute, ALIGN=CENTER, that directs the browser to center all the paragraph text.
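
Here is a brief sketch of the paragraph tag with and without the closing tag. The paragraph text is made up for illustration, and ALIGN=CENTER only takes effect in browsers that support the proposed attribute.

```html
<P>
This is the first paragraph. Its closing tag is left out, so the next
paragraph tag implies the end of this one.
<P>
This is the second paragraph, closed explicitly.
</P>
<P ALIGN=CENTER>
This paragraph is centered where ALIGN is supported.
</P>
```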

Horizontal Rules

If you have done any Web browsing at all, you have probably seen pages that separate sections of text with a horizontal line. I use this feature in my home page. The <HR> tag is used for this purpose. Since the horizontal rule line resides on a line by itself, there is no need to insert a <BR> or <P> before or after it. The browser will automatically break the text at the <HR> tag. There are several Netscape attributes for the <HR> tag. They allow the thickness, width, alignment, and shading of the horizontal line to be specified.
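
A minimal sketch of horizontal rules follows. The second rule uses the Netscape attributes SIZE, WIDTH, ALIGN, and NOSHADE; the particular values are assumptions chosen for illustration, and other browsers should draw a plain rule instead.

```html
Text of the first section.
<HR>
Text of the second section.
<!-- Netscape extensions: a thicker, half-width, centered, unshaded rule -->
<HR SIZE=4 WIDTH="50%" ALIGN=CENTER NOSHADE>
Text of the third section.
```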

Miscellaneous Tags

There are a couple of other tags I use in my home page that I would like to discuss before ending this month's column. These are the text centering, address, and font size tags.

You will find the <CENTER>...</CENTER> tags near the middle of Listing 1. The line looks like this:

<CENTER><H3>LINKS TO MY OTHER PAGES</H3></CENTER>

This takes the enclosed heading text and centers it on the page. Without the centering tags, the heading would be left-aligned. These tags are Netscape extensions and will not work in many browsers. Note how it looks in AMosaic (Figure 2B) versus Netscape (Figure 1).

The <ADDRESS>...</ADDRESS> tags enclose text that represents the address (usually E-mail) of someone. This is usually used for a signature by the Web page designer at the end of a page (see the end of Listing 1). At this time, browsers only format the text differently, usually by italicizing, but it could be used for other purposes in the future.
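
A signature block of this kind might be sketched as below. It reuses the contact line from the foot of this column and is not necessarily what Listing 1 contains.

```html
<HR>
<!-- Most browsers show the enclosed text in italics -->
<ADDRESS>
Randy Finch at webmaster@rcfinch.com
</ADDRESS>
```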

The <FONT SIZE=...>...</FONT> tags can be used to set either the absolute size of the font or its size relative to the current font. I use these tags in several places in the last half of Listing 1. In these cases, I increase the size of the font by one size step relative to the current font size by using the <FONT SIZE=+1>...</FONT> tags. A minus sign can be used instead of the plus sign to decrease the size of the font. If no plus or minus sign is used, as in <FONT SIZE=3>...</FONT>, then the absolute size of the font is set. These tags are Netscape extensions and will be ignored by most browsers.
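
The three forms can be compared side by side in the sketch below. The enclosed text is invented for illustration; browsers without the Netscape extensions should show all three lines at the normal size.

```html
<FONT SIZE=+1>One step larger than the current size</FONT><BR>
<FONT SIZE=-1>One step smaller than the current size</FONT><BR>
<FONT SIZE=3>Absolute size 3, regardless of the current size</FONT>
```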

Closing Comments

Well, let's call it a month. Too much of this stuff will suck the brain out of your head faster than the Riddler's brain wave manipulation machine. Next month, I'll finish discussing my home page and the basics of HTML.


Last modified on June 15, 1996
Randy Finch at webmaster@rcfinch.com