Working with our design partners to deliver advanced web-based solutions

18 April 2006: XML and the alphabet soup

In this issue

Featured article
This month my topic is a tricky one to make understandable, but it is important, so here goes: XML. Stay awake at the back.
Business intelligence
Reading the web: it's F-shaped
News from the web
This month in the internet world.



XML and the alphabet soup

You can’t go anywhere in the technology world these days without coming across XML (Extensible Markup Language). If you want to pick up a news feed, publish data from your client's annual report, or upload information from a spreadsheet, the odds are you will use XML. Any problem that involves moving information about is likely to have an XML solution.

Granted, this is not the most gripping subject, but it is so important and comes up so often these days that I thought I would take a shot at explaining the concept and putting it into context. Stick with it and one day you may thank me.

XML solves a number of different problems, all to do with organising information:

  • passing information between corporate computer systems (Jargon term: middleware)
  • uploading and downloading information from your web site (Jargon term: import/export)
  • writing web pages (Jargon term: XHTML)

The last is the most familiar. The language of the web (HTML) is a looser cousin of XML that does not follow XML's strict rules, but there is a version of HTML that is proper XML (XHTML).

Integrating corporate computer systems

The size and complexity of the data processing systems in a major corporate is mind-boggling. No-one really understands the whole thing, and these systems often don’t talk to each other. When I was IT manager at a major London broker, we had almost as many different technologies as we had IT staff. None of the systems was properly integrated; it was just too difficult. Software for moving information between systems, called middleware, already existed, but what we needed was a standardised way of organising that information.

Uploading and downloading information from a web site

Any large ecommerce site has the problem of uploading the catalogue, which can often run to thousands of items, and downloading customer information and orders back to the warehousing and accounting systems. Traditionally we have used comma separated values (CSV) files. These are, as the name suggests, items of information separated by commas (or, if you are on the continent, by semicolons). This format is not very satisfactory: the columns are identified only by their position, and a stray comma inside a product name or a price can throw everything out. What we needed was a much better way of passing information.
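To make the CSV problem concrete, here is a minimal Python sketch using the standard csv module. The product rows are invented for illustration; the point is that a comma inside a field has to be quoted, a naive split on commas gets the columns wrong, and a continental file needs its semicolon delimiter spelled out.

```python
import csv
import io

# Invented catalogue rows; note the comma inside the second product name.
rows = [
    ["SKU", "Name", "Price"],
    ["A334567", "Widget", "2.34"],
    ["A334568", "Bolts, pack of 10", "1.99"],
]

# Writing: the csv module wraps any field containing a comma in quotes.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Reading back with a real CSV parser gives three columns on every row...
parsed = list(csv.reader(io.StringIO(text)))

# ...whereas naively splitting on commas cuts the quoted name in two.
naive = [line.split(",") for line in text.splitlines()]
print(len(parsed[2]), len(naive[2]))  # column counts for the "Bolts" row

# A continental file uses semicolons, so the delimiter must be stated;
# the decimal comma in the price is left as text for you to sort out.
continental = "SKU;Name;Price\nA334567;Widget;2,34\n"
euro = list(csv.reader(io.StringIO(continental), delimiter=";"))
print(euro[1])
```

Notice that even when parsed correctly, nothing in the file says which column is the price; the receiving system simply has to know.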

Enter markup languages

When you type a document on your computer you will happily put passages in bold, select italics or change the font. It probably doesn’t worry you that the computer can’t store this formatting directly: the basic codes a computer uses for text cover the characters themselves, not the hundreds of different fonts and layout variants. The program has to do all sorts of jiggery-pokery to store not only the words you have typed but also how they are to be presented. This extra information is called markup.

Some time in the 1970s some guys at IBM came up with a standardised way of storing this extra information. In the 90s this begat HTML (HyperText Markup Language), the language used to store web pages. As XML became more standardised it became necessary to bring HTML into line, which begat XHTML. For those of you not familiar with it, here is some XHTML:


<p>This is a paragraph,
<b>this is in bold</b><br />
this is on the next line,<br /> 
and the para ends here</p>

So the markup information is indicated by thingies inside angle brackets called tags.  The thingies are of two types:

  • containers or wrappers, written <CODE …>some stuff</CODE>
  • stand-alone elements, written <CODE … />
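One practical consequence of XHTML being proper XML is that any standard XML parser can read it. The sketch below uses Python's built-in xml.etree.ElementTree (our choice of tool, not something the article prescribes) to parse the fragment above and sort its elements into containers and stand-alone elements. It also shows that an old-style HTML <br> with no closing slash is rejected. In XML, <br /> and an empty <br></br> mean the same thing, so the check looks at whether an element has content rather than at how the tag was written.

```python
import xml.etree.ElementTree as ET

# The XHTML fragment from the article, unchanged.
xhtml = """<p>This is a paragraph,
<b>this is in bold</b><br />
this is on the next line,<br />
and the para ends here</p>"""

root = ET.fromstring(xhtml)

# Visit every element (the <p> itself included) in document order.
# Anything holding text or child elements is treated as a container;
# anything empty is treated as stand-alone.
kinds = []
for element in root.iter():
    has_content = bool(element.text) or len(element) > 0
    kinds.append((element.tag, "container" if has_content else "stand-alone"))
print(kinds)

# Old-style HTML lets <br> go unclosed; an XML parser refuses it.
try:
    ET.fromstring("<p>line one<br>line two</p>")
    old_html_accepted = True
except ET.ParseError:
    old_html_accepted = False
print(old_html_accepted)
```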

Enter XML

XML stands for Extensible Markup Language. The name suggests it has something to do with marking up documents, but in practice it is mostly used for something quite different: labelling data so that computers can pass it around.

Here is some XML that you might use to upload a catalogue to your web site:

<PRODUCT>
     <SKU>A334567</SKU>
     <RRP />
     <PRICE>2.34</PRICE>
</PRODUCT>

Do you notice the similarity with XHTML? There are the thingies inside angle brackets, and the container and stand-alone elements.  The only difference is that the codes here are about products and prices instead of paragraphs and formatting. 

This turns out to be what the IT industry has needed for years, a simple way of passing data around in such a way that a standardised piece of software can extract the information so that computers can use it. 
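Here is that "standardised piece of software" at work, as a minimal Python sketch using the built-in xml.etree.ElementTree. It reads the catalogue fragment above, wrapped in a hypothetical <CATALOGUE> root element so that one file can carry many products, and picks out each field by its tag name rather than by column position. Treating the empty <RRP /> as "no recommended retail price supplied" is our assumption about what that element means.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The catalogue fragment from the text, inside a hypothetical <CATALOGUE> root.
catalogue = """<CATALOGUE>
  <PRODUCT>
    <SKU>A334567</SKU>
    <RRP />
    <PRICE>2.34</PRICE>
  </PRODUCT>
</CATALOGUE>"""

root = ET.fromstring(catalogue)

products = []
for product in root.findall("PRODUCT"):
    # findtext returns "" for an element that exists but has no text,
    # which is what the stand-alone <RRP /> gives us.
    rrp = product.findtext("RRP")
    products.append({
        "sku": product.findtext("SKU"),
        # Decimal avoids floating-point rounding surprises with money.
        "price": Decimal(product.findtext("PRICE")),
        # Assumption: an empty RRP means no recommended price was supplied.
        "rrp": Decimal(rrp) if rrp else None,
    })

print(products)
```

Because each value is labelled, the receiving system does not care what order the fields arrive in, and a new field can be added without breaking existing readers.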

The alphabet soup

Various organisations have developed a large number of XML standards (schemas), many of them industry-specific. The ones you are most likely to come across are the Microsoft Office standards for storing spreadsheets and Word documents. Office 2000 introduced the concept of an XML spreadsheet and we routinely use this now to upload data to web sites. But we can also save reports as XML Word documents so they open correctly in Word.

Other XML standards you may come across:

XBRL – (eXtensible Business Reporting Language) a way of distributing the information in company annual reports. If you do investor relations work you need to be aware of it (www.xbrl.org).

RSS – (Really Simple Syndication) a standard for distributing news feeds. It is also widely used on the web for distributing blogs.

OEBPS – (Open eBook Publication Structure) a standard for distributing electronic books.

SportsML – a standard for distributing sports data.

You can find complete lists (there are hundreds of them) at www.xml.org.
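RSS shows how simple these standards can be underneath. The sketch below builds a tiny, made-up RSS 2.0 feed (the titles and example.com links are placeholders) and pulls out the headline and link of each item using Python's built-in ElementTree. Real feeds carry more elements, and some use older RSS versions or the separate Atom format, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; titles and links are placeholders.
feed = """<rss version="2.0">
  <channel>
    <title>Example newsletter</title>
    <link>http://www.example.com/</link>
    <description>A hypothetical feed for illustration</description>
    <item>
      <title>XML and the alphabet soup</title>
      <link>http://www.example.com/2006/04/xml</link>
    </item>
    <item>
      <title>Reading the web: it's F-shaped</title>
      <link>http://www.example.com/2006/04/f-shaped</link>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(feed)
channel = root.find("channel")
feed_title = channel.findtext("title")

# Each <item> is one story: collect its headline and link.
headlines = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]

print(feed_title)
for title, link in headlines:
    print(f"- {title} ({link})")
```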

Business intelligence

A new eye-tracking study has refined the 'golden triangle' model of how web pages are scanned and found an 'F'-shaped pattern.

This study recorded how over 200 users looked at a variety of web sites. For the heatmaps, go to the excellent useit web site: http://www.useit.com/alertbox/reading_pattern.html

News from the web

Google is planning to offer online storage, according to company documents that leaked onto the web by mistake. The new product will be called GDrive and I guess will be similar to Streamload (www.streamload.com).

Proving that you can prove anything with statistics, Symantec have managed to report that IE has more security holes than Firefox and vice versa (hint: it depends what you mean by a bug).

Mozilla's Firefox browser has been developed by unpaid volunteer programmers, and filthy lucre is not supposed to come into it. But whenever anyone goes via the Google search box in the browser and clicks on a sponsored link, Mozilla Corp gets a rake-off: $72 million.

Microsoft is releasing a new search engine. It sounds to me as if the revamp is about presentation gadgets (like saved searches) rather than the underlying search method, which to my mind still gives poorer results than Google.

Check out the beta version now at http://www.live.com/. You don’t page forward any more, by the way; you just keep scrolling. That is an application of a technology called AJAX, which I will describe next month.


According to a survey, the British now spend more time on the Internet than watching TV (nearly three hours per day average).  According to another survey people who read surveys will believe anything.

Google has bought Writely, a word processor that runs in a web browser. I will be interested to see how well this works in practice now that Google's resources are behind it. Will it replace Microsoft as the office software of choice? Personally I don’t think so, but we shall see.

Another web site has sued Google because it didn’t like its ranking (Yaaawwwnnn). An exercise in futility I am sure, but it may have got the site a few headlines.

Bill Gates has promised us a new version of IE every year.  Great – I love those moving targets.

In the meantime, IE7 beta 2 is out now.

The US Conference of Catholic Bishops has created a web site to refute The Da Vinci Code. The world really needed that.
http://www.jesusdecoded.com/