HTML 2: Text processing, Characters & Fonts

  • MarkUp Languages
  • Text processing
    • Character Set
    • Character Encoding
    • ASCII
    • Extended ASCII & ISO-Latin 1
    • EBCDIC
    • Windows 1252
    • Unicode
  • Writing HTML Using ASCII
  • Fonts & Typefaces
  • References

MarkUp Languages

Word processors use formatting languages such as Rich Text Format (rtf), Microsoft Word (doc) and Lotus Word Pro (lwp) to specify the precise format or layout of text. Markup languages, on the other hand, are concerned with the structure of text-based information rather than the precise layout of individual characters.

The idea behind markup languages is to establish an information management framework that structures data so that software produced by various vendors can present and manipulate the information across a range of platforms and application packages.

HTML is a mixture of a formatting language and a markup language. HTML tags may represent structural entities such as:

  • <P> - Paragraphs
  • <UL> - Unordered Lists
  • <OL> - Ordered Lists
  • <LI> - List Elements

or specify layout characteristics such as:

  • <FONT color="red"> - Renders text in red color
  • <SUP> - Renders text superscripted
  • <I> - Renders text in italics
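
As a rough illustration of the difference, the fragment below combines structural tags with layout tags. The wording of the list and the sentence is made up for this sketch.

```html
<!-- Structural tags: describe what the content is -->
<P>Things to revise:</P>
<UL>
<LI>Character sets</LI>
<LI>Character encodings</LI>
</UL>

<!-- Layout tags: describe how the content should look -->
<P><FONT color="red">Due Friday</FONT>, see note<SUP>1</SUP> and the <I>reading list</I>.</P>
```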

Text processing

Text processing refers to the ability to manipulate words, lines, and pages. Typically, the term text refers to text stored as ASCII codes (that is, without any formatting). Objects that are not text include graphics, numbers (if they're not stored as ASCII characters), and program code. (http://www.webopedia.com/TERM/T/text.html)

Character Set

All text consists of characters. The set of legal document characters is referred to as the character set.

  • An Alphabetic character set contains only letters of the alphabet and spaces
  • An Alphanumeric character set contains letters of the alphabet, the digits 0 to 9, and spaces.

Character Encoding

The set of legal document characters, together with their representation at the binary level, is referred to as the character encoding.

ASCII

ASCII (American Standard Code for Information Interchange) has 128 characters (2^7 = 128, because each character is stored in 7 bits). This is just enough for all the English upper and lower case letters, the digits 0-9, some special characters and control characters such as line breaks.

In ASCII the English characters are represented as numbers, with each character assigned a number from 0 to 127. Most computers use ASCII codes to represent text, which makes it possible to transfer data from one computer to another. (http://www.webopedia.com/TERM/T/text.html)

Text files stored in ASCII format are sometimes called ASCII files. Text editors and word processors are usually capable of storing data in ASCII format, although ASCII format is not always the default storage format. Most data files, particularly if they contain numeric data, are not stored in ASCII format. Executable programs are never stored in ASCII format. (http://www.webopedia.com/TERM/T/text.html)

Extended ASCII & ISO-Latin 1

There are several larger character sets that use 8 bits, which gives them 128 additional characters (2^8 = 256). The extra characters are used to represent non-English characters, graphics symbols, and mathematical symbols. Several companies and organizations have proposed extensions for these 128 characters. The DOS operating system uses a superset of ASCII called extended ASCII or high ASCII. A more universal standard is the ISO Latin 1 set of characters, which is used by many operating systems, as well as Web browsers. (http://www.webopedia.com/TERM/T/text.html)

A comprehensive list, which includes the extended ASCII character set, can be found at: http://www.webopedia.com/quick_ref/asciicode.asp
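
In an HTML page, characters from the upper half of ISO Latin 1 can be written with a decimal escape sequence of the form &#NNN;, the same mechanism described under Writing HTML Using ASCII. A small sketch follows; the sample words are only illustrative, and each decimal value is the ISO Latin 1 position of the character named in the comment.

```html
<P>caf&#233;</P>        <!-- 233 = e with acute accent -->
<P>ma&#241;ana</P>      <!-- 241 = n with tilde -->
<P>&#169; Your Name</P> <!-- 169 = copyright sign -->
<P>&#189; cup</P>       <!-- 189 = one half -->
```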

EBCDIC

The Extended Binary Coded Decimal Interchange Code (EBCDIC) is used by some IBM mainframes. EBCDIC provides for 256 characters.

Windows 1252

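Windows 1252 is the 8-bit character encoding that Microsoft Windows uses by default for English and other Western European languages. It closely follows ISO Latin 1 but places extra printable characters, such as curly quotation marks, in positions 128 to 159.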

Unicode

Languages such as Chinese, Japanese and Korean cannot be adequately represented using only 256 characters, so a different character encoding was developed. Unicode includes ASCII as a subset but also caters for many thousands of additional characters. It was originally designed around 16 bits, giving 65,536 (2^16) possible characters, and has since been extended to allow over a million. There has even been a proposal to add the Klingon script from Star Trek to Unicode!
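
In HTML 4, a decimal escape sequence may carry a value above 255, in which case it refers to a Unicode character. The sketch below shows a few such values; whether each character actually displays depends on the fonts installed on the viewing machine, and the sample text is only illustrative.

```html
<P>&#8364; 10</P>       <!-- 8364 = euro sign -->
<P>&#945; &#946;</P>    <!-- 945 and 946 = Greek small letters alpha and beta -->
<P>&#20013;&#25991;</P> <!-- 20013 and 25991 = two Chinese characters -->
```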

Writing HTML Using ASCII

You can get a web browser to display any ASCII character, both HTML special characters and plain English text, by writing its decimal ASCII value as an escape sequence of the form &#NNN;. For example, the paragraph

<P>Bryan Hall!</P>

can be written, including the space and the exclamation mark, like this:

<P>&#066;&#114;&#121;&#097;&#110;&#032;&#072;&#097;&#108;&#108;&#033;</P>

Paste the code into an HTML file and try it.
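
The same technique covers the HTML special characters. As a sketch, the second line below should make the browser display the text of the tags <P> and <I> literally rather than act on them; 60, 62 and 38 are the decimal ASCII values of <, > and &. The comment on the first line spells out the values used in the Bryan Hall example.

```html
<!-- B=066 r=114 y=121 a=097 n=110 space=032 H=072 a=097 l=108 l=108 !=033 -->
<P>The &#060;P&#062; tag marks a paragraph &#038; the &#060;I&#062; tag marks italics.</P>
```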

Task ASCII

Make a copy of a blank HTML page and rename it char_fonts.htm. Put in a title and a level 1 heading:

Title and H1: HTML Characters and Fonts

followed by a level 2 heading:

H2: Using ASCII Characters

then a paragraph in which you write your name using ASCII escape sequences:

<P>Write your first name and surname here in ASCII, and use the ASCII sequence for the space as well</P>

You can use the ASCII escape sequences shown at: http://www.webopedia.com/quick_ref/asciicode.asp

Fonts & Typefaces

In recent times the term font has been used to describe a typeface, which is a prescriptive definition of how to present the characters available in a particular character set.

Characters conforming to any given typeface can be presented in a range of:

  • sizes (for example 8 pt, 10 pt, 12 pt, ...),
  • type styles (italics, underline, superscript, subscript, ...), and
  • stroke weights (normal, bold).

There are two main groups of typefaces (Computing Studies - GK Powers p. 273)

  • Serif typefaces have little tails or serifs at the end of their characters (examples include Times and Bookman).
  • Sans serif typefaces have no serifs, so the strokes of their characters end plainly (examples include Avant Garde and Helvetica).

Task Fonts Part B

Open your HTML file char_fonts.htm and after your name in ASCII put another level 2 heading:

H2: Fonts in HTML

Now make a table having 26 normal rows (plus a header row). Each row must represent one letter of the alphabet as shown.
 
Base Font | Times | Arial | Wingdings | Windings | Symbol | Bookman Old Style
a A       | a A   | a A   | a A       | a A      | a A    | a A
b B       | b B   | b B   | b B       | b B      | b B    | b B
c C       | c C   | c C   | c C       | c C      | c C    | c C
...       | ...   | ...   | ...       | ...      | ...    | ...

  1. It is important that you see the characters corresponding to each of the fonts.
  2. Do not attempt to code every row of this page by hand.
  3. Once you have made the row for A, make each following row by copying it and applying a carefully chosen case-sensitive Find and Replace.
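
As a sketch of what such a Find and Replace might look like, the fragment below shows two cells of a copied row before and after the replacement. The search strings >a< and >A< are only one possible choice, picked here because they match the cell contents without touching the letter a inside a font name such as arial.

```html
<!-- Copied row, before: -->
<TD><font face="arial">a</font></TD>
<TD><font face="arial">A</font></TD>

<!-- After replacing >a< with >b< and then >A< with >B< (case sensitive): -->
<TD><font face="arial">b</font></TD>
<TD><font face="arial">B</font></TD>
```

Without the case-sensitive option, the first replacement would also change the upper case cells.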

The body of the document should also contain the following notes:

Document Notes:

  1. On most machines the default font is Times.
  2. windings is a deliberate misspelling of wingdings. There is no such font as windings and consequently the Internet browser cannot render it since it is unable to find the font installed on the machine. In this case the Internet browser just inserts whatever its default font is. So be mindful of this fact if you use unusual fonts in your pages - they may not render on all computers and the user will not know! Automatic font substitution can also happen with word processors.
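
One way to reduce the risk described in note 2 is to give the FACE attribute a comma-separated list of font names, which HTML 4 allows; the browser searches for them in order of preference. The fallback fonts chosen below are only an illustrative assumption.

```html
<P><FONT FACE="Bookman Old Style, Times New Roman, Times">This text uses the first font in the list that is installed.</FONT></P>
```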

The following code was used to render the first two rows of the above table:

<TABLE BORDER="1" CELLPADDING="5">

<TR VALIGN="TOP">
<TH COLSPAN="2">Base Font</TH>
<TH COLSPAN="2">Times</TH>
<TH COLSPAN="2">Arial</TH>
<TH COLSPAN="2">Wingdings</TH>
<TH COLSPAN="2">Windings</TH>
<TH COLSPAN="2">Symbol</TH>
<TH COLSPAN="2">Bookman<br>Old Style</TH>
</TR>

<TR>
<TD>a</TD>
<TD>A</TD>
<TD><font face="times">a</font></TD>
<TD><font face="times">A</font></TD>
<TD><font face="arial">a</font></TD>
<TD><font face="arial">A</font></TD>
<TD><font face="wingdings">a</font></TD>
<TD><font face="wingdings">A</font></TD>
<TD><font face="windings">a</font></TD>
<TD><font face="windings">A</font></TD>
<TD><font face="symbol">a</font></TD>
<TD><font face="symbol">A</font></TD>
<TD><font face="Bookman Old Style">a</font></TD>
<TD><font face="Bookman Old Style">A</font></TD>
</TR>

</TABLE><BR CLEAR="ALL">

Once you have finished the alphabet, add 5 extra rows for the digit pairs 1 and 2, 3 and 4, 5 and 6, 7 and 8, and 9 and 0.
 
Base Font | Times | Arial | Wingdings | Windings | Symbol | Bookman Old Style
1 2       | 1 2   | 1 2   | 1 2       | 1 2      | 1 2    | 1 2

Task Fonts Part C

In the HTML file char_fonts.htm, directly underneath the above table, make a level 2 heading:

H2: Text Styles

and then enter the following text:

<P>There are over 20 tags which determine Text style. These can be classified as either Logical or Physical styles. Logical styles are concerned purely with the purpose of the style, while physical styles specify the way the text is meant to look.</P>

You must also add to the file all of the logical and physical tags shown below. Each word describing a style should itself be formatted in that style. For example, you would code the first line like this:

<LI>&lt;EM&gt; <EM>Emphasis</EM></LI>

H3: Logical Tags

  • <EM> Emphasis
  • <STRONG> Strong
  • <VAR> Variable
  • <DFN> Definition
  • <CITE> Citation
  • <ADDRESS> Address
  • <CODE> Code
  • <SAMP> Sample Computer Output
  • <KBD> Sample Keyboard Input

H3: Physical Tags

  • <I> Italics
  • <B> Bold
  • <U> Underline
  • <TT> Monospace typewriter
  • <STRIKE> Strike through text
  • <BIG> Big font
  • <SMALL> Small font
  • <SUB> Subscript font
  • <SUP> Superscript font
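
As a small sketch of the difference between the two groups, graphical browsers usually draw both lines below in italics, but only the first one records why. The sentence itself is made up for illustration.

```html
<P><EM>Do not</EM> switch off the computer.</P> <!-- logical: marks emphasis -->
<P><I>Do not</I> switch off the computer.</P>   <!-- physical: asks for italics -->
```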

References

http://www.webopedia.com/

The XML Handbook. CF Goldfarb, P Prescod. Prentice Hall (2001). ISBN 0 13 055068-X

Computing Studies and Introductory Course. GK Powers. Heinemann (1996). ISBN 0 85859 924 4