How the Internet works

Basic Networking Terms

To understand how the internet works, it will be useful to understand the following basic networking terms:

Network
A group of interconnected computers which share information and resources.
Local area network (LAN)
A network of computers connected together in close proximity, such as within a single building or across a university campus.
Internetwork
A group of LANs, and/or standalone computers, all connected together over a wide geographical area.

The Internet is the ultimate internetwork, to which any computer or LAN in the world can connect.

How Communication Occurs

The internet works using the TCP/IP protocol suite. (A protocol is simply a set of rules which govern how communication occurs over a network.) To individually identify each computer on the internet, every connected machine is given a unique IP address, which looks something like 203.45.117.14. In order to communicate with another computer on the internet, your computer needs to know its IP address.
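To make the idea of an IP address a little more concrete, here is a minimal Python sketch using the standard ipaddress module. It shows that the dotted form 203.45.117.14 is just a readable way of writing four bytes, or equivalently one 32-bit number. The helper name describe_address is made up for this illustration.

```python
import ipaddress


def describe_address(text):
    """Return the version, integer value and four octets of a dotted IPv4 address."""
    addr = ipaddress.ip_address(text)   # raises ValueError if text is not a valid address
    octets = list(addr.packed)          # the four bytes behind the dotted notation
    return addr.version, int(addr), octets


version, number, octets = describe_address("203.45.117.14")
print(version)   # 4
print(octets)    # [203, 45, 117, 14]
print(number)    # the same address written as a single 32-bit integer
```

The address used here is only the example from the text above, not one that is known to belong to any particular machine.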

When a computer wants to send some information to another, TCP breaks that information down into packets, or small chunks of data. IP then labels each packet with the IP address of the destination computer, and hands it to the network hardware to be sent across the internet. When sending to a computer on the same LAN, every computer on that network may receive the data; this is the case on older hub-based or shared-cable networks, whereas modern switched networks normally deliver the data only to the intended recipient. Usually, only the computer that is the intended recipient accepts and processes the data, but by entering "promiscuous mode" any computer that receives the data could intercept it. This can be a security risk.

When sending information to a computer not on the same LAN, hardware/software devices called routers are used to determine the physical path of the data through all the interlinked networks, between the sending and the receiving computers. Once the destination computer has received the packets, TCP reassembles them into their original form, and checks for any transmission errors.
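The splitting and reassembling described above can be sketched in a few lines of Python. This is a deliberately simplified model and not real TCP, which numbers bytes rather than whole packets and also adds checksums and acknowledgements. The function names split_into_packets and reassemble are assumptions made for this example.

```python
import random


def split_into_packets(data, size):
    """Break data (bytes) into numbered chunks of at most `size` bytes."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]


def reassemble(packets):
    """Put the chunks back into sequence order and join them together."""
    return b"".join(chunk for _, chunk in sorted(packets))


message = b"Hello from one computer to another!"
packets = split_into_packets(message, 8)
random.shuffle(packets)                   # packets may arrive in a different order
print(reassemble(packets) == message)     # True
```

Because every chunk carries its sequence number, the receiver can rebuild the original message regardless of the order in which the chunks arrive.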

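Because raw IP addresses are cryptic and difficult to remember, the Domain Name System (DNS) allows you to use hostnames (e.g. microsoft.com), which are translated into IP addresses by DNS servers. The allocation of domain names and IP addresses on the internet was formerly handled by a North American registry service called InterNIC; today it is coordinated by ICANN, with domain names registered through accredited registrars and IP addresses allocated by regional internet registries.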
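You can ask your own computer to perform this translation. The following Python sketch uses the standard socket module, which passes the question to whatever resolver the operating system is configured to use. The helper name resolve is an assumption made for this example.

```python
import socket


def resolve(hostname):
    """Return the distinct IP addresses the system resolver reports for hostname."""
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(hostname, None):
        ip = sockaddr[0]                 # the first item of the socket address is the IP
        if ip not in addresses:          # the same address is listed once per socket type
            addresses.append(ip)
    return addresses
```

Calling resolve("microsoft.com") on a machine with a working internet connection returns a list of addresses. The exact values are not shown here because they depend on when and where the lookup is made.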

What Happens When You Visit a Website?

In order to understand what happens when you visit a website, it is necessary to understand the concept of client/server communications:

Client
A client is simply a computer or piece of software which requests a service from a server. For instance, when you visited this page, your web browser requested this HTML document from the web server it is stored on.
Server
As the name suggests, a server is a computer or piece of software which responds to client requests, returning to them the HTML documents, images, or programs which they asked for.

Web sites are simply collections of HTML documents, images, CGI programs, Java applets, multimedia presentations and so on, which are stored together on a web server computer (although not all parts of a site have to be stored on the same physical machine). When you visit a website, your web browser contacts this server, and asks it to send the relevant documents.

Let's take a look at the detailed steps which take place when you visit this site:

  1. You type the URL http://www.paulgriffiths.net/ into your browser
  2. Your web browser contacts a DNS server (usually your ISP's), which translates the hostname (www.paulgriffiths.net) into an IP address, and returns that address to your browser.
  3. Your web browser uses that IP address to contact the web server that this site is stored on, and sends it the URL you initially typed in.
  4. The web server receives your browser's request, interprets the URL it received, and figures out that you want the home page of my site.
  5. The web server finds "index.html" (the filename of my homepage) on its hard drive, and sends it back to your web browser.
  6. Your web browser parses the HTML file it has just received, and realises that there are several images on the home page which it needs to display.
  7. Your web browser contacts the web server again, and asks it for each of the images it needs. The web server receives these requests, and returns each of the images to your browser.
  8. Once your web browser has all the images and other files that it needs, it assembles the pieces together on the page, and displays it on your screen.

When you click on a hyperlink on my home page, the whole process is repeated for the new page.
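Steps 1 to 3 can be sketched in Python. Strictly speaking, a browser does not send the whole URL as typed: in a typical HTTP/1.1 request it sends the path on the first line and the hostname in a separate Host header. The function name build_request is an assumption made for this example, and the request shown is a bare minimum, since real browsers send many more headers.

```python
from urllib.parse import urlsplit


def build_request(url):
    """Return (host, port, request_text) for a plain HTTP GET of url."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80            # 80 is the default port for http
    path = parts.path or "/"           # an empty path means the site's home page
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"                         # a blank line marks the end of the request
    )
    return host, port, request


host, port, request = build_request("http://www.paulgriffiths.net/")
print(host, port)                  # www.paulgriffiths.net 80
print(request.splitlines()[0])     # GET / HTTP/1.1
```

The sketch only builds the text of the request. Sending it would mean opening a TCP connection to the resolved IP address on the given port and writing these bytes to it.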

What is HTML?

HTML stands for HyperText Markup Language. Despite the name, it is not a computer programming language. It consists of markup tags which tell the web browser what the elements of your page are, which enables it to determine how to display them.

For example, let's assume that you want to display "Welcome to my Homepage!" at the top of your main page, along with some introductory information. You could start by simply typing the following into the body of the HTML document:

Welcome to my Homepage!
This is where you can find out all sorts of things about me and some of my interests, as well as find links to other related pages on the internet.
Please use the links below to access more pages:
More about me
My interests
Links to other web sites
Thanks for stopping by!

If we open this with a web browser, we will get the following (probably unexpected) result:

[Screen Shot showing HTML output with no markup]

The formatting of our original code was not exactly sophisticated, but the browser hasn't even got that right. We have just one block of text - all the lines merge together into one paragraph.

The browser does this because it formats web pages according to their structure, that is, according to what the elements of the page actually represent, such as headings, paragraphs, lists, images and so on. At this point, the browser has no way of determining what the elements in our page are, because we have not told it. (It does see the new lines, but it simply treats each one as a space character and otherwise ignores it.) Computers, whilst being powerful, are not smart, and if we don't tell the computer what to do, it will not do it right.
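The way the lines merge can be imitated in Python. This is only a rough approximation of how a browser treats white space in ordinary text, and the sample text is a shortened version of our page.

```python
source = """Welcome to my Homepage!
This is where you can find out all sorts of things about me.
Thanks for stopping by!"""

# Any run of spaces, tabs and new lines is treated as a single space,
# so the three source lines flow together into one block of text.
rendered = " ".join(source.split())
print(rendered)
```

The output is a single line of text with no line breaks, which matches what the browser showed us.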

So, how do we go about telling the computer what the elements are? As the name HTML suggests, we have to mark up the text in our document to tell the browser what the individual elements are. We do this using HTML tags. Most HTML tags come in pairs which enclose a block of text to mark it up (some tags, such as those for inserting images, are standalone tags and do not enclose text). For instance, the h1 tag marks up the page's main heading. We place <h1> in front of the text of our heading and </h1> after the text of our heading. The p tag does the same thing with paragraphs.

Let's mark up our text into a main heading and six paragraphs, like this:

<h1>Welcome to my Homepage!</h1>
<p>This is where you can find out all sorts of things about me and some of my interests, as well as find links to other related pages on the internet.</p>
<p>Please use the links below to access more pages:</p>
<p>More about me</p>
<p>My interests</p>
<p>Links to other web sites</p>
<p>Thanks for stopping by!</p>

This gives us the following result:

[Screen Shot showing results of markup with heading and paragraph tags]

which looks a bit better, but there is still improvement to be made.

We have a list of three other pages on the site, so let's mark it up as a list. To do this, we use the li tag, which means "list item", around each of the three items. Also, we need to mark up the entire list with the ul tag, meaning "unordered list". Our code now looks like this:

<h1>Welcome to my Homepage!</h1>
<p>This is where you can find out all sorts of things about me and some of my interests, as well as find links to other related pages on the internet.</p>
<p>Please use the links below to access more pages:</p>
<ul>
<li>More about me</li>
<li>My interests</li>
<li>Links to other web sites</li>
</ul>
<p>Thanks for stopping by!</p>

We have put the opening and closing ul tags on separate lines to make the code look cleaner, and easier to follow, but it is not necessary to do this. The entire HTML document can be coded as a single line, if you like (although I do not recommend this!).
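If you want to convince yourself that the line breaks between tags make no difference, the following Python sketch uses the standard html.parser module to compare a spaced-out list with the same list written on one line. The class name StructureRecorder and the function name structure are assumptions made for this example, and Python's parser is only a stand-in for what a browser does.

```python
from html.parser import HTMLParser


class StructureRecorder(HTMLParser):
    """Record tags and non-blank text in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_data(self, data):
        if data.strip():                       # ignore white space between tags
            self.events.append(("text", data.strip()))


def structure(html):
    """Return the list of recorded events for a piece of HTML."""
    recorder = StructureRecorder()
    recorder.feed(html)
    recorder.close()
    return recorder.events


spaced = """<ul>
<li>More about me</li>
<li>My interests</li>
</ul>"""
one_line = "<ul><li>More about me</li><li>My interests</li></ul>"

print(structure(spaced) == structure(one_line))   # True
```

Both versions produce exactly the same sequence of tags and text, so the choice between them is purely a matter of readability for whoever maintains the page.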

Now let's see what we have:

[Screen Shot showing the addition of marked up lists]

This is starting to look much better. However, we have a problem. We have provided references to three other pages on the web site, but we have given the user no way of viewing them! This brings us to the heart of the world wide web, the hyperlink. For reasons of convention, these are marked up using the a tag, or "anchor" tag. The a tag is the first one we encounter which uses an attribute to provide the browser with more information, in this case the URL of the file we are linking to. We use the href attribute as follows:

<h1>Welcome to my Homepage!</h1>
<p>This is where you can find out all sorts of things about me and some of my interests, as well as find links to other related pages on the internet.</p>
<p>Please use the links below to access more pages:</p>
<ul>
<li><a href="more.html">More about me</a></li>
<li><a href="int.html">My interests</a></li>
<li><a href="links.html">Links to other web sites</a></li>
</ul>
<p>Thanks for stopping by!</p>

Now we have something much more recognisable:

[Screen Shot showing the addition of marked up hyperlinks]
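To see how a program can read the href attribute, here is a small Python sketch using the standard html.parser module. It collects each link's destination together with the text the user clicks on. The class name LinkCollector is an assumption made for this example, and a real browser does a great deal more than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from a elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None      # href of the a element we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")   # attrs is a list of (name, value) pairs
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


collector = LinkCollector()
collector.feed('<li><a href="more.html">More about me</a></li>')
collector.close()
print(collector.links)   # [('more.html', 'More about me')]
```

The attribute value (more.html) tells the program where the link leads, while the enclosed text (More about me) is what appears on the page.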

Finally, some finishing touches. Although the above code works just fine, technically there is a structure to HTML documents that we should follow. The entire document should be enclosed with html tags. The body of the document - which we have been working on - should be enclosed in body tags. Finally, there is an optional header section which is enclosed with head tags. One very important piece of information contained in the header section is the page's title, which is marked up by - yes, you've got it! - title tags.

Our completed code now looks like this:

<html>

<head>
<title>My Homepage!</title>
</head>

<body>
<h1>Welcome to my Homepage!</h1>
<p>This is where you can find out all sorts of things about me and some of my interests, as well as find links to other related pages on the internet.</p>
<p>Please use the links below to access more pages:</p>
<ul>
<li><a href="more.html">More about me</a></li>
<li><a href="int.html">My interests</a></li>
<li><a href="links.html">Links to other web sites</a></li>
</ul>
<p>Thanks for stopping by!</p>
</body>

</html>

which looks like:

[Screen Shot showing complete HTML document without CSS]

See how our title "My Homepage!" now shows up in the main title bar of the browser window? This is standard behaviour on most graphical browsers. The page's title never actually shows up in the body of the document, and neither does anything else in the header section. The header section is there to provide information about the page.
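The separation between the header section and the body can also be seen from a program's point of view. The following Python sketch, again using the standard html.parser module, pulls the title out of a shortened version of our page and keeps it apart from the text in the body. The class name TitleFinder is an assumption made for this example.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Separate the text of the title element from the text in the body."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.body_text = []
        self._inside = set()       # which of "title" and "body" we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "body"):
            self._inside.add(tag)

    def handle_endtag(self, tag):
        self._inside.discard(tag)

    def handle_data(self, data):
        if "title" in self._inside:
            self.title += data
        elif "body" in self._inside and data.strip():
            self.body_text.append(data.strip())


page = """<html>
<head>
<title>My Homepage!</title>
</head>
<body>
<h1>Welcome to my Homepage!</h1>
<p>Thanks for stopping by!</p>
</body>
</html>"""

finder = TitleFinder()
finder.feed(page)
finder.close()
print(finder.title)       # My Homepage!
print(finder.body_text)   # ['Welcome to my Homepage!', 'Thanks for stopping by!']
```

The title text never appears in the list of body text, which mirrors the way the browser shows it in the title bar rather than on the page itself.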

What is CSS?

Although we left the previous section with a clean and functioning webpage, it doesn't look anything special. The formatting, whilst appropriate, is very basic, with none of the page borders, backgrounds, special fonts and colours we often see on web pages.

This is because HTML should be used only for marking up structure, and not for specifying presentation. Early implementations of HTML contained a variety of tags such as center, b (for bold text), i (for italic text), font (for setting font face, size and colour) and others to specify how the page should be presented. Using tags in this way is now strongly discouraged: center and font were deprecated (slated for removal) in HTML 4.01 and are obsolete in the current HTML standard, while b and i survive only with redefined meanings that are not purely presentational.

There are some very good reasons why HTML should specify structure, and not presentation, which include:

So how do we include presentation information? By using style sheets. A style sheet can be included within the HTML document itself (inside a style element in the header section), but it is usually far better to create a separate document for it, and to link it to the HTML document. So, we will make one final change to our HTML document, in the header section:

<html>

<head>
<title>My Homepage!</title>
<link rel="stylesheet" href="style.css" type="text/css" />
</head>

<body>
<h1>Welcome to my Homepage!</h1>
<p>This is where you can find out all sorts of things about me and some of my interests, as well as find links to other related pages on the internet.</p>
<p>Please use the links below to access more pages:</p>
<ul>
<li><a href="more.html">More about me</a></li>
<li><a href="int.html">My interests</a></li>
<li><a href="links.html">Links to other web sites</a></li>
</ul>
<p>Thanks for stopping by!</p>
</body>
</html>

We have used the link tag to specify that there is another file linked with this document, in this case, the style sheet. The link tag has three attributes, specifying the relationship of the linked file to this document, its location, and its type, respectively. Note that the link tag is a standalone tag - there is no closing </link> tag, because it does not mark up any text.

We'll create a separate file called style.css and enter the following:

body { background-color: #DDDDFF; margin-left: 10%; margin-right: 10%; border: 2px solid black; }
h1 { color: white; background-color: red; text-align: center; font-family: "Curlz MT"; border-bottom: 2px solid black; }
p { text-align: left; font-family: times; padding-left: 10px; padding-right: 10px; }
ul { background-color: white; margin-left: 10px; margin-right: 10px; border: 1px dotted black; }
li { font-family: sans-serif; list-style-type: circle; padding-top: 5px; padding-bottom: 5px; }

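This specifies how each of the five elements we used should be formatted. In brief, we specify that the page body has a pale blue background, 10% margins on the left and right, and a solid black border; that the main heading is centred white text on a red background in the "Curlz MT" font, with a black line beneath it; that paragraphs are left-aligned in a Times font with 10 pixels of padding on either side; that the list has a white background, 10 pixel side margins and a dotted black border; and that each list item uses a sans-serif font, has a circle as its bullet, and has 5 pixels of padding above and below.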

Let's see what this looks like:

[Screen Shot showing complete HTML document with CSS style]

This is, of course, a perfectly vile looking web page, but it does illustrate a little of what can be done with style sheets without having to even hint at presentation in the body of the HTML document. What's more, every page on your web site can use the same style sheet, so if you change the style sheet, the presentation changes on every page at once. This can be very helpful when maintaining a large website. In addition, even if the user switches the style sheet off altogether, because we only used our HTML to mark up structure the page will still be perfectly readable on any type of hardware and/or software that recognises standard HTML.
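The layout of a style sheet is regular enough that a few lines of Python can pick it apart, which may help to show how each rule pairs an element name with a set of property and value declarations. This is a naive sketch that ignores comments, @-rules and anything else beyond the simple rules used in style.css. The function name parse_rules is an assumption made for this example.

```python
def parse_rules(css):
    """Parse simple 'selector { property: value; ... }' rules into nested dicts."""
    rules = {}
    for block in css.split("}"):
        if "{" not in block:
            continue                                   # skip trailing white space
        selector, body = block.split("{", 1)
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip()] = value.strip()
        rules[selector.strip()] = declarations
    return rules


css = """
body { background-color: #DDDDFF; margin-left: 10%; margin-right: 10%; border: 2px solid black; }
li { font-family: sans-serif; list-style-type: circle; padding-top: 5px; padding-bottom: 5px; }
"""

rules = parse_rules(css)
print(rules["body"]["border"])          # 2px solid black
print(rules["li"]["list-style-type"])   # circle
```

Nothing in these rules refers to a particular page, which is why the same style sheet can be shared by every page on a site.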

Obviously, there is far more to both HTML and CSS than can be covered in this brief introduction. There are many good books and web sites available on the subject for those interested in exploring further.