The World Wide Web

This post is a simple introduction to the workings of the web. If you are reading this from an academic viewpoint, I would recommend first reading An Introduction to the Internet.

Packet Switching


Chop a big message up into little pieces (packets) which are sent separately, not necessarily by the same route, and put back together at the destination.
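
The idea can be sketched in a few lines of Python. This is only an illustration, not a real network protocol: the function names and the use of a sequence number to restore the original order are assumptions made for the example.

```python
import random


def split_into_packets(message, size):
    """Chop a message into (sequence_number, chunk) packets."""
    return [
        (seq, message[start:start + size])
        for seq, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets):
    """Sort packets by sequence number and join the chunks back together."""
    return "".join(chunk for _, chunk in sorted(packets))


message = "Chop a big message up into little bits"
packets = split_into_packets(message, 8)
random.shuffle(packets)  # packets may arrive in any order
print(reassemble(packets) == message)
```

Because every packet carries its own sequence number, the receiver does not care which route each one took or in what order they arrived.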

Protocols
• Supply rules governing transfer and organisation of data between computers.
• Most protocols apply to three main phases
Connection setup

  • initiate connection between computers on network
  • creates virtual communication path

Data transfer

  • protocol allows data to be transferred
  • receiving computer decides whether to accept data based on handshaking

Connection release

  • protocol allows computers to cleanly terminate the connection

Protocol Layers

Layered approach principles:

  • each layer contains related functions
  • technology can be changed without affecting entire protocol (layer independence)
  • must provide for gradual growth of underlying network
  • hides details of underlying layers to improve maintenance

OSI Model (Open Systems Interconnection)
The OSI reference model was adopted by ISO in the 1980s to standardise protocols. The seven-layer model:

  • Application
  • Presentation
  • Session
  • Transport
  • Network
  • Datalink
  • Physical

Web addresses: Internet Protocol

IP addresses are hierarchical numeric addresses:

  • makes management easier
  • makes routers simpler
  • not efficient

The current standard is IPv4, which uses a 32-bit address:

  • Gives 2^32 (about 4.3 billion) addresses
  • Not enough for the expanding web
  • Not designed for security, or current Internet size
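
As a rough illustration of what "a 32-bit address" means, the sketch below uses Python's standard ipaddress module to convert between the familiar dotted form and the underlying number. The helper names are made up for this example.

```python
import ipaddress

# A 32-bit address space has this many distinct values.
TOTAL_IPV4_ADDRESSES = 2 ** 32


def ipv4_to_int(dotted):
    """Return the 32-bit integer behind a dotted IPv4 address."""
    return int(ipaddress.IPv4Address(dotted))


def int_to_ipv4(value):
    """Return the dotted form of a 32-bit integer."""
    return str(ipaddress.IPv4Address(value))


print(TOTAL_IPV4_ADDRESSES)
print(ipv4_to_int("139.222.1.158"))
```

Each of the four dotted numbers is one byte of the address, which is why none of them can exceed 255.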

Internet Protocol v.6
The next version is IPv6, which uses a 128-bit address, written as 8 groups of 4 hexadecimal digits (2001:0db8:85a3:0000:0000:8a2e:0370:7334)

  • Gives 2^128 addresses, enough for approximately 2^95 addresses for every person on the planet
  • Needed for the “Internet of Things”
  • Good security features
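
The same standard module handles IPv6. The sketch below shows the shortened way of writing the example address and checks the "addresses per person" arithmetic. The population figure of 2^33 (about 8.6 billion) is an assumption chosen to make the sum come out evenly.

```python
import ipaddress

address = ipaddress.IPv6Address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")

# Leading zeros can be dropped and one run of zero groups replaced by "::".
short_form = address.compressed
long_form = address.exploded


def addresses_per_person(population):
    """Divide the 128-bit address space evenly between a population."""
    return 2 ** 128 // population


print(short_form)
print(addresses_per_person(2 ** 33) == 2 ** 95)
```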

Internet Names
Internet names are hierarchical, with the levels separated by full stops. At the highest level:

  • US-centred domains: mil, edu, gov, com, org
  • extended with net, biz, …
  • national domains: uk, fr, de, ca, jp, …

Within national domains there may be areas (in the UK: gov, ac, co, mil, org), then the organisation (microsoft, whitehouse, kelkoo, …), then any machine(s) or division(s) of the organisation.
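
Reading a name from right to left walks down the hierarchy. A minimal sketch (the function name is made up, and the host name is the one used in the client-server example later in this post):

```python
def domain_hierarchy(hostname):
    """List the labels of a host name from the top-level domain downwards."""
    labels = hostname.rstrip(".").split(".")
    return list(reversed(labels))


# National domain, area, organisation, division, machine.
print(domain_hierarchy("www.cmp.uea.ac.uk"))
```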

Domain Names and Name Servers

Every Internet host has a name assigned by Internet registrars (Internet Network Information Center (www.internic.net))
DNS (Domain Name System) servers perform name to IP address mapping

  • nslookup is a useful tool for exploring DNS services

Name servers hold records about services
There are 13 root name servers

  • ?.root-servers.net (where ? = A..M)
  • Lookups to the root servers are rare
  • Most lookups are handled by servers much closer to the requesting machine
  • new data regularly propagated to DNS servers

WWW Addresses
• Known as URLs – Uniform Resource Locators
• First element is the type of resource

  • http – Hypertext Transfer Protocol
  • ftp – File Transfer Protocol
  • …followed by ://

• Next, an internet address, e.g. www.cmp.uea.ac.uk
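
To see the parts of a URL separately, Python's standard urllib.parse module can split one up. The describe_url helper is a made-up name for this sketch, and the URL is the one used in the client-server example later in this post.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into resource type, internet address and path."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # the type of resource, before ://
        "host": parts.hostname,   # the internet address
        "path": parts.path,       # the resource on that server
    }


print(describe_url("http://www.cmp.uea.ac.uk/index.jsp"))
```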

Web servers

• Servers

  • Deal with requests for resources (pages, images, …)
  • Pages are organised into sites, related to organisations

• Web server software often on dedicated machine

  • a single machine may host many sites
  • a single site can be split over many servers
  • a site’s content may be mirrored in several places

• A firewall is software running on a server

  • restricts external access to organisation’s WWW resources
  • restricts access to resources outside organisation

Web server operation

Browsers
• Communicate page requests to servers
• Interpret user requests for pages (typed or selected)
• Check local store of recently accessed pages (cache)
• Request page or check page validity if page in cache
• Display web pages
• Call other applications to play movies, etc…
• Many other things…
• Browsers format and display WWW pages

  • Early leaders: NCSA Mosaic, Netscape
  • May be text-only: Lynx
  • Internet Explorer, Firefox, Safari, Opera, …

• Available for virtually any computer

  • All display web pages (almost) the same
  • Free
  • Money comes from added services

• Can provide user interface to virtually any information system

Client-server communication (simplified)

  1. URL is typed or selected: http://www.cmp.uea.ac.uk/index.jsp
  2. Browser asks DNS for IP address of the host: www.cmp.uea.ac.uk
  3. DNS replies with address: 139.222.1.158
  4. Browser makes connection
  5. Browser sends GET request: GET /index.jsp
  6. Server sends the file
  7. Browser displays text
  8. Browser then asks for each image referenced in index.jsp
  9. Browser displays images
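
The GET request sent by the browser is just a few lines of text. The sketch below builds one for a given URL without sending it anywhere. The choice of HTTP/1.1 and of the Host and Connection headers is an assumption for this example; a real browser sends many more headers.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return the text a client could send to request the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path means the site's root page
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.hostname}",
        "Connection: close",
        "",  # a blank line marks the end of the request headers
        "",
    ]
    return "\r\n".join(lines)


print(build_get_request("http://www.cmp.uea.ac.uk/index.jsp"))
```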

HTTP (Hypertext Transfer Protocol)
• Two main elements:

  • set of requests from browser to server
  • set of responses from server to browser

• GET: request a page
• HEAD: request a page’s headers (to check modification)
• POST: send data to the server (used for forms)
• PUT: request the server to store a page
• Server responds to all requests with a status

  • 200 OK
  • 304 Not modified
  • 404 Page not found
  • 403 Access forbidden
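
Python's standard library ships the official list of status codes, which makes a handy lookup table. The describe_status function is a made-up name for this sketch.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the standard reason phrase for an HTTP status code."""
    return HTTPStatus(code).phrase


for code in (200, 304, 403, 404):
    print(code, describe_status(code))
```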

Mobile Internet
• 3G mobile phone licence sale raised £22bn for the UK government in 2000
• Promise of effectively delivering Web to phones
• Need to present data for either:

  • low bandwidth devices like phones
  • high bandwidth devices like networked PCs

Major (non-technical) issues

• Who controls the protocol?

– Internet Engineering Task Force

• Who develops standards?

– W3C, MPEG, ISO

• Who issues addresses?

– private companies licensed by the Internet Corporation for Assigned Names and Numbers (ICANN)

• Who controls the content?

– mostly limited by ISP terms and conditions

– increasing government regulation

• What is acceptable content and behaviour?

– depends on moral, ethical and political perspectives

Additional reading

• Berners-Lee T., Cailliau R., Luotonen A., Nielsen H. F., Secret A. (1994). The World Wide Web, Comm. ACM, 37(8), 76-82

• Web servers: http://computer.howstuffworks.com/web-server1.htm

• Personal view of early web developments: http://wwwpdp.web.cern.ch/wwwpdp/ns/ben/TCPHIST.html

• Berners-Lee T. (1999) Weaving the Web, Orion

• Tanenbaum A. (2006) Computer Networks, Prentice Hall – first two chapters




Computing Prehistory

The following is a (very) simplified post about computing prehistory and the people who influenced it. Given the time, I hope to expand on this greatly in the future (either in this post or another). If anyone would like anything expanded upon, I will prioritise requests.

History of algorithms
The history of algorithms can be traced back to:

  • The history of mathematics (logic)
  • The history of philosophy (thought)
  • The history of machines (mechanics and electronics)

Ancient History:

Aristotle (384BC – 322BC)
Aristotle founded the Lyceum, his school in Athens (the name now denotes a category of educational institution in many countries). A tutor to Alexander the Great, he was a polymath who studied everything from ethics to poetry.

Aristotle died in 322BC having lost favour with the masses. Today it is difficult to interpret many of his texts, as nobody speaks ancient Greek as a living language and we are not always clear what he meant (e.g. the Greeks distinguished between number and length).

Pioneered deduction: X follows from Y and Z if it is impossible for X to be false when Y and Z are true
Introduced syllogisms and induction (an argument that moves from the particular to the general)
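
In modern terms, that definition of deduction can be checked mechanically for simple true/false statements by trying every combination of truth values. The sketch below does exactly that. It uses today's propositional logic rather than anything Aristotle wrote down, and the function name is made up for the example.

```python
from itertools import product


def entails(premises, conclusion, num_vars):
    """Return True if the conclusion cannot be false while all premises are true."""
    for values in product([False, True], repeat=num_vars):
        if all(premise(*values) for premise in premises) and not conclusion(*values):
            return False  # found a case where the premises hold but the conclusion fails
    return True


# "If p then q" and "p" together guarantee "q" ...
valid = entails([lambda p, q: (not p) or q, lambda p, q: p], lambda p, q: q, 2)
# ... but "if p then q" and "q" do not guarantee "p".
invalid = entails([lambda p, q: (not p) or q, lambda p, q: q], lambda p, q: p, 2)
print(valid, invalid)
```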
Important concepts from the Greeks

  • Certain types of arguments are commonplace
  • Commonplace arguments can be generalized and solved.
  • Issues not tackled/understood by the Greeks include:
    • arguments can be represented using symbols
    • there is an algebra associated with such symbols
    • not everything is true or false (probability)

William of Ockham (1288-1347)

  • Examined what later became De Morgan’s laws
  • Pre-dates the algebraisation of logic
  • More famously known for Ockham’s Razor (the more assumptions made, the less likely the explanation).
  • Fled from the papal court at Avignon to Bavaria after a dispute with the Pope


René Descartes (1596 – 1650)

  • Famous for cogito ergo sum (I think, therefore I am)
  • But is also acknowledged as the first person to use algebra to represent logical quantities
  • Trained as a lawyer in France
  • Became a mercenary in the Netherlands
  • Censored some of his work to avoid falling out with the Church

George Boole (1815 – 1864)

• English mathematician and philosopher
• Inventor of Boolean logic (based around the truth of compound statements built from primitives which may be either true or false)
• One of the founders of computer science
• Suggested, among other things, the use of symbols for logical concepts
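
Because a Boolean primitive can only be true or false, any law of Boolean logic can be verified by listing every case. As a small sketch, the function below (a made-up name) checks De Morgan's laws, mentioned earlier under William of Ockham, over all four combinations of two values.

```python
from itertools import product


def de_morgan_holds():
    """Check both of De Morgan's laws for every pair of truth values."""
    return all(
        (not (a and b)) == ((not a) or (not b))
        and (not (a or b)) == ((not a) and (not b))
        for a, b in product([False, True], repeat=2)
    )


print(de_morgan_holds())
```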

Gottlob Frege (1848 – 1925)

  • The “father of logic”
  • Rather unusual notation (which led to his work being ignored)
  • Introduced (probably) the idea of predicates

Bertrand Russell (1872 – 1970)

  • Co-authored, with Alfred North Whitehead, Principia Mathematica (an attempt to derive all mathematics from a consistent set of axioms)
  • Became disillusioned with mathematics when he discovered that it was not as consistent as he hoped
  • Campaigner for votes for women, peace and gay rights.
  • Sacked from a professorship after being accused of being “morally unfit”
  • Dismissed from Trinity College for anti-war protests

Kurt Gödel (1906 – 1978)

  • Developed the incredibly important and deep incompleteness theorems: for any system that is powerful enough to describe the natural numbers, if the system is consistent then it cannot be complete, and the consistency of the axioms cannot be proven within the system
  • A friend of Einstein, he became paranoid and starved himself to death

Technology:
Fortunately, several key people were ignorant of Gödel and Russell so they started building machines before realising that there may be problems. The key development needed for computers was electricity but before electricity there were several important developments…

Abacus
The abacus, used by the Romans, Babylonians, Japanese, Chinese and just about every other advanced civilisation, allows for rapid addition, subtraction, multiplication and division.

Pascal calculator

  • Designed by Blaise Pascal around 1642
  • Designed to ease the burdensome tasks of tax inspectors
  • Subtraction achieved using complement arithmetic
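
Complement arithmetic lets a machine that can only add perform subtraction. The sketch below shows the nine's complement method usually described for Pascal's calculator: complement the first number, add the second, then complement the result. The six-digit width and the function names are assumptions for the example.

```python
def nines_complement(value, digits):
    """Replace every digit d of a fixed-width number with 9 - d."""
    return (10 ** digits - 1) - value


def subtract_by_complement(a, b, digits=6):
    """Compute a - b using only complementing and addition (requires b <= a)."""
    if not 0 <= b <= a < 10 ** digits:
        raise ValueError("need 0 <= b <= a and both within the digit width")
    return nines_complement(nines_complement(a, digits) + b, digits)


print(nines_complement(752, 6))
print(subtract_by_complement(752, 348))
```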

Bouchon loom

  • Instructions are punched in a roll of paper tape
  • Allows intricate weaving with fewer human operators
  • Led to the Jacquard loom and, in turn, the Luddites

Konrad Zuse (1910 – 1995)
Developed the first working programmable, fully automatic digital computer, the Z3. The Z3 was electromechanical rather than electronic: it used the binary system and had 2000 relays, a clock speed of 5-10Hz and a word size of 22 bits.
Unfortunately the Z3 was destroyed during the Second World War.

Alan Turing (1912 – 1954)

• Unified logic and electronics, and introduced the new subject of Artificial Intelligence
• Developed the “Turing machine” to study the logic of universal computation.
• Also worked on computability and other fundamental things
• Educated at Sherborne and Cambridge
• Worked at Bletchley Park during the Second World War
• Troubled personal life led him to commit suicide
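
A Turing machine is simple enough to simulate in a few lines: a tape of symbols, a read/write head, and a table of rules saying what to write, which way to move and which state to enter next. The sketch below is one possible simulator, not Turing's own formulation. The rule-table format, the state names and the example machine (which inverts a string of bits) are all assumptions for illustration.

```python
def run_turing_machine(tape, rules, state="start", blank="_", max_steps=1000):
    """Run a one-tape Turing machine and return the final tape contents."""
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    return "".join(cells[i] for i in sorted(cells)).strip(blank)


# (state, symbol read) -> (symbol to write, direction to move, next state)
INVERT_BITS = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

print(run_turing_machine("1011", INVERT_BITS))
```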

Additional Reading

• Mathematical perspective:

– “Evolution of Mathematical Concepts”, R.L.Wilder, OU Press
– “The makers of mathematics”, Stuart Hollingdale, Penguin
– “The development of logic”, W.Kneale and M.Kneale, Oxford University Press
– “Language, proof and logic”, Barwise and Etchemendy, Seven Bridges Press

• Computing history:

– “Alan Turing: the enigma of intelligence”, Andrew Hodges, Vintage.
– “The Universal computer: from Leibniz to Turing”, Martin Davis, W W Norton and Co

• Other:

– “Gödel, Escher, Bach: an eternal golden braid”, Douglas R. Hofstadter, Penguin.
– “The New Turing Omnibus”, A.K.Dewdney, Computer Science Press
– “The Code Book”, Simon Singh, Fourth Estate.


An Introduction to HTML

Hypertext Markup Language (HTML)
HTML was designed to be content-oriented, but lax standards made (and still make) it difficult to specify meaning or display precisely. Because HTML uses a simple tag structure, there is no clean separation of display features from content.

HTML display:

  • Precise layout, use of fonts, … depends on browser
  • Can make a browser display text in a particular way…
  • Often looks very different on another machine or browser
  • Focus on content-based features – leave display to CSS

Tag syntax

HTML tags have:

  • name
  • optional attributes
  • enclosed in angle brackets <xxx>
  • attributes may have values (e.g. width="6", name="coffee")
  • tags may be nested
  • most need an end tag </xxx>
  • tag names are case insensitive in HTML (XHTML requires lowercase)
  • attribute values are case sensitive
  • Comments are written <!-- … -->
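
One way to see these rules in action is to feed some markup to a parser. The sketch below uses Python's standard html.parser module, which lowercases tag and attribute names but leaves attribute values untouched. The TagCollector class and collect_tags function are made-up names for this example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def collect_tags(markup):
    """Return a list of (tag name, attributes) pairs found in the markup."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


print(collect_tags('<P><IMG SRC="Coffee.gif" width="6"></P>'))
```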

Document Structure Tags

  • DOCTYPE: defines the HTML version used for the document (we’ll normally use XHTML 1.0 Strict)
  • html: defines the start and end of the HTML document
  • head: defines the non-printing section of the document (Used for embedding script, meta-tags, titles, styles, etc.)
  • title: defines what is displayed in the browser title bar
  • body: defines the printing section of the document
  • h1 indicates that the enclosed text is a first level heading
  • p indicates that the enclosed text is a paragraph

Example XHTML

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
<title>Example Page</title>
</head>
<body>
<h1>Hello World</h1>
<!-- main content here -->
</body>
</html>

Text Formatting

<h1>…<h6> heading styles
<hr> horizontal rule
<br /> line break
<p> paragraph
<em> emphasised text
<strong> strong text
<i> italic text
<b> bold text

<html>
<head>
<title>CMPC1M01 Computing Systems I</title>
</head>
<body>
<h1>Hello World!</h1>
We will focus on text formatting tags such as:
<p>
<strong>strong</strong> (or <b>bold</b>) <br />
<em>emphasised</em> (or <i>italics</i>)
</p>
Older tags tend to specify how to do the formatting, but it is better to specify the desired effect.
<hr />
</body>
</html>

A note on formatting HTML

HTML is a mark-up language. Formatting of the document is based on the tags in the HTML document or CSS stylesheet.

White space and other formatting characters (newlines, tabs, etc.) in the source are ignored: the browser collapses them into single spaces. The only way around this is to use the <pre> tag, which produces verbatim, but inelegant, results.
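
The effect on ordinary text is roughly the same as the following sketch, which squeezes every run of spaces, tabs and newlines down to a single space. It is an approximation for illustration only; real browsers follow more detailed rules.

```python
def collapse_whitespace(text):
    """Replace every run of white space with a single space."""
    return " ".join(text.split())


source = "Hello\n\n     World,\tthis   is\nHTML"
print(collapse_whitespace(source))
```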

Images
Images can be used within web documents using the <img> tag. <img> links to images which appear embedded within the document.

<img> Attributes:

  • src is a reference to the image file
  • height and width can be used to change the image dimensions
  • alt supplies alternative text, shown when the image cannot be displayed
  • border indicates whether an image border is used

An image map can be created to define “hot-spots” within an Image

<img src="shaun.gif" alt="SHAUN!" /> <br />
<img src="missing.jpg" alt="Missing Image File" /> <br />
<img src="shaun.gif" width="100" alt="SHAUN!" /> <br />
<img src="shaun.gif" width="100" height="100" alt="SHAUN!" /><br />

Common Image Formats

GIF:

  • usually for logos and artwork with up to 256 colours
  • file size is typically small
  • one colour can be set as transparent
  • can be used for animations
  • PNG is a common replacement for GIF

JPEG:

  • used for photo quality images
  • can contain millions of colours
  • file sizes are typically larger, but can be controlled by compression
  • the higher the compression, the lower the image quality

Image size

• High quality images are large, sometimes several megabytes in size. An image can be scaled to fit, but:

  • Some browsers may complain or not display it correctly
  • The image may be distorted if the resizing doesn’t respect the original proportions
  • Large images may take a long time to load (esp. over mobile connections)
  • Will a thumbnail do, with an option to see a high-resolution version?
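
To avoid distortion, the width and height need to be scaled by the same factor. A small sketch of the arithmetic (the function name and the sample dimensions are made up for the example):

```python
def scale_to_width(original_width, original_height, new_width):
    """Return (width, height) for a new width that keeps the original proportions."""
    new_height = round(original_height * new_width / original_width)
    return new_width, new_height


# A 400 x 300 image shown 100 pixels wide should be 75 pixels high.
print(scale_to_width(400, 300, 100))
```

The resulting numbers are what would go into the width and height attributes of the <img> tag.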

Hyperlinks (Anchor tags)

The hyperlink, or anchor, tag <a> is the most important tag in HTML. It allows the browser to jump to absolute or relative URLs including:

  • Pages in other sites (use absolute URLs)
  • Other web pages in the same site (use relative URLs)
  • Other locations in the same page (bookmarks)
  • Documents and files (images, PDFs, etc.)
  • Email addresses

Tag and attributes

The href attribute is the hyperlink reference. Content between <a> and </a> is the clickable part of the document. The target attribute can be used to open a link in a new window.

<h1><a name="links">Useful Links</a></h1>
<a href="http://www.google.com">Google Home Page</a><br />
<a href="http://www.w3c.org" target="_blank">W3C</a><br />
Notes from a lecture <a href="lecture2.pdf"><img border="0" src="pdficon.gif" alt="PDF icon" /></a><br />
<a href="l2_hyperlinks.htm#bookmark3">The third bookmark on this page</a>
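
A relative URL only makes sense in relation to the page that contains it. The sketch below uses Python's standard urljoin function to show how the relative links above would be resolved. The base address http://www.example.org/notes/ is a made-up location for the page, used purely for illustration.

```python
from urllib.parse import urljoin

# Hypothetical address of the page containing the links.
BASE = "http://www.example.org/notes/l2_hyperlinks.htm"


def resolve(link):
    """Turn a link found on the page into an absolute URL."""
    return urljoin(BASE, link)


print(resolve("lecture2.pdf"))
print(resolve("l2_hyperlinks.htm#bookmark3"))
print(resolve("http://www.google.com"))
```

Absolute URLs pass through unchanged, while relative ones are completed using the address of the current page.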

Lists

• There are three types of list available in HTML

  • ordered (numbered) lists, <ol>
  • unordered (bullet point) lists, <ul>
  • definition lists, <dl>

• Within these outer tags, items are declared using:

  • <li> for ordered and unordered lists,
  • <dt> and <dd> for definition list terms and definitions

Entities and Colours
Entities, sometimes called escape sequences, allow special characters to be inserted. They are needed because some characters have a special meaning in HTML and because of character set limitations.

&lt; <
&gt; >
&pound; £
&amp; &
&quot; "
&nbsp; non-breaking space
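
Python's standard html module can convert in both directions, which is a quick way to experiment with entities. The sample strings below are made up for the example.

```python
import html

# Characters with a special meaning in HTML are replaced by entities ...
escaped = html.escape("5 < 6 & 7 > 3")

# ... and entities are turned back into the characters they stand for.
unescaped = html.unescape("&pound;5 &amp; &quot;tea&quot;")

print(escaped)
print(unescaped)
```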

Colours can be given as one of a set of standard names or as hexadecimal RGB values (#RRGGBB):

white #FFFFFF
black #000000
red #FF0000
green #00FF00
blue #0000FF
dark red #550000
dark green #005500
dark blue #000055
grey50 #7F7F7F
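
An RGB value is three two-digit hexadecimal numbers, one each for red, green and blue, ranging from 00 (none) to FF (255, full intensity). A minimal sketch of decoding one (the function name is made up):

```python
def hex_to_rgb(colour):
    """Convert a colour such as '#FF0000' into a (red, green, blue) tuple."""
    digits = colour.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a colour of the form #RRGGBB")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#FF0000"))
print(hex_to_rgb("#7F7F7F"))
```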



An Introduction to the Internet

“The internet is the most disruptive technology in history, even more than something like electricity, because it replaces scarcity with abundance, so that any business built on scarcity is completely upturned when it arrives there.”

Eric Schmidt (Google CEO), 2 July 2010

The Internet is growing explosively with users doubling every four years. IPv4 has approximately 700 million hosts and is fast approaching its capacity.

[Figure: Internet users, 10-year growth. Source: ITU]

Mobile internet
Most mobile phones today have more computing power than the computers used in the Apollo moon landing programme. In fact, a Furby has more power than the moon landing. Even throughout the developing world mobile phones are popular, and with 90% of the world’s population in range of a mobile signal in 2009 (compared to 61% in 2003), it is no surprise they have become a commodity of modern-day life.

A little history

In the beginning, computers did not communicate with each other; they acted as stand-alone machines used for performing very specific tasks. In the 1960s, scientists began experimenting with acoustic couplers to connect terminals to these computers, before replacing these with modems in the 1980s. Bandwidth increased in stages as encoding and error correction techniques were improved. Soon after, higher-bandwidth connections via dedicated lines and ADSL became increasingly available to businesses (from 64 kbit/s).

Digital exchanges have allowed the widespread use of broadband in the last decade.

Protocols
Making one computer understand what another was saying used to be very difficult. Each manufacturer used its own protocols, and different implementations of the same protocol were often incompatible. This led to the general acceptance of some protocols, some of which (such as TCP/IP) have become universal. With standards becoming widely implemented, computers could be connected together into wide area networks (WANs). By using bridges to connect these networks, a worldwide network was born.

Early History of the Internet
In the 1960s and 70s, computer networking was driven by military and academic research requirements. Computers were too large to transport and too expensive to reproduce, but they still needed to be shared. Researchers used the connections between these computers to develop simple ways of sending messages through the networks (such as email).

In the initial stages of the Internet’s development, most WANs were government-funded projects, such as ARPANET by DoD ARPA in the USA and JANET by the UK Treasury.

By the mid-80s, most academic scientists were connected by email on national networks, and the National Science Foundation had just added five supercomputing centres (NSFNET), creating the high-capacity backbone that was needed to support development of the Internet. By this time, everyone was using TCP/IP.

The World Wide Web
Developed (mostly) by Tim Berners-Lee and Robert Cailliau at CERN in Geneva (1989), the World Wide Web (WWW) began as a way for physicists to exchange information. Basic elements of the WWW included:

  • An addressing system
  • content (text, images, tables, etc…)
  • ways of retrieving and/or displaying information

The World Wide Web Consortium (W3C) develops standards, reference implementations, tools, guidance, etc. relating to the Web.

Additional reading
• Cerf V.(2007) An Information Avalanche, IEEE Computer, 40(1), 104-105
• Berners-Lee T. (1999) Weaving the Web, Orion
• Tanenbaum A. (2002) Computer Networks, Prentice Hall (chapters 1-2)