18 April 2006: XML and the alphabet soup
Featured article: XML and the alphabet soup

You can’t go anywhere in the technology world these days without coming across XML (Extensible Markup Language). If you want to pick up a news feed, publish data from your client's annual report, or upload information from a spreadsheet, the odds are you will use XML. Any problem that involves moving information about is likely to have an XML solution.

Granted, this is not the most gripping subject, but it is so important and comes up so often these days that I thought I would take a shot at explaining the concept and putting it into context. Stick with it and one day you may thank me.

XML solves a number of different problems, all to do with organising information:
The last is the most familiar. The language of the web (HTML) looks like XML but does not follow its rules strictly; there is, however, a version of HTML that is proper XML (XHTML).

Integrating corporate computer systems

The size and complexity of the data processing systems in a major corporate is mind-boggling. No-one really understands the whole thing, and these systems often don’t talk to each other. When I was IT manager at a major London broker, we had almost as many different technologies as we had IT staff. None of the systems were properly integrated – it was just too difficult. People have developed systems called middleware to transfer information, but what we needed was a standardised way of organising that information.

Uploading and downloading information from a web site

Any large ecommerce site has the problem of uploading the catalogue, which can often run to thousands of items, and downloading customer information and orders back to the warehousing and accounting systems. Traditionally we have used comma separated values (CSV) files. These are, as the name suggests, items of information separated by commas (or, if you are on the continent, by semi-colons). This format is not very satisfactory, and what we needed was a much better way of passing information.

Enter markup languages

When you type a document on your computer you will happily put passages in bold, or select italics, or change the font. It probably doesn’t worry you that the computer storage medium can’t store information directly in this format. The coding used by computers can’t cope with the hundreds of different fonts and layout variants. The program has to do all sorts of jiggery-pokery to store the extra information: not only the words you have typed, but how they are to be presented. This is called markup information.

Some time in the 1970s some guys at IBM came up with a standardised way of storing this extra information. In the 90s this begat HTML (Hyper Text Markup Language) – the language used to store web pages.
As the XML language became more standardised it became necessary to bring HTML into line, which begat XHTML. For those of you not familiar with it, here is some XHTML.
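As an illustration, here is a minimal sketch in Python. The XHTML fragment inside it is made up for this example rather than taken from a real page, and the function name `describe` is likewise hypothetical. Because XHTML is proper XML, Python's standard `xml.etree.ElementTree` parser can read it and list each tag, noting whether the element wraps some content or stands alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment, invented for illustration only.
XHTML = """<div>
  <h1>XML and the alphabet soup</h1>
  <p>This paragraph has some <b>bold</b> text.<br />
  This is a second line.</p>
  <hr />
</div>"""


def describe(fragment):
    """Return a (tag, kind) pair for every element, in document order."""
    root = ET.fromstring(fragment)  # fails if the fragment is not well-formed XML
    result = []
    for element in root.iter():
        # An element counts as a container if it holds child elements or text.
        has_children = len(element) > 0
        has_text = bool((element.text or "").strip())
        kind = "container" if has_children or has_text else "stand-alone"
        result.append((element.tag, kind))
    return result


for tag, kind in describe(XHTML):
    print(f"{tag}: {kind}")
```

In this fragment `br` and `hr` come out as stand-alone, and `div`, `h1`, `p` and `b` as containers.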
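So the markup information is indicated by thingies inside angle brackets called tags. The thingies are of two types: container elements, which wrap content between an opening tag and a closing tag, and stand-alone elements, which have no content of their own.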
Enter XML

XML stands for Extensible Markup Language. The title suggests it is something to do with marking up documents – but not in the least. Here is some XML that you might use to upload a catalogue to your web site:

<PRODUCT>

Do you notice the similarity with XHTML? There are the thingies inside angle brackets, and the container and stand-alone elements. The only difference is that the codes here are about products and prices instead of paragraphs and formatting.

This turns out to be what the IT industry has needed for years: a simple way of passing data around so that a standardised piece of software can extract the information and computers can use it.

The alphabet soup

Various organisations have developed a large number of standards, which are industry-specific schemas for different purposes. The ones you are most likely to come across are the Microsoft Office standards for storing spreadsheets and Word documents. Office 2000 introduced the concept of an XML spreadsheet and we routinely use this now to upload data to web sites. We can also save reports as XML Word documents so that they open correctly in Word.

Other XML standards you may come across:

XBRL (Extensible Business Reporting Language): a way to distribute the information in company annual reports. If you do investor relations work you need to be aware of this (www.xbrl.org).

RSS (Really Simple Syndication): a standard for distributing news feeds. This is widely used on the web for distributing blogs as well.

OEBPS: a standard for distributing electronic books.

SportsML: a standard for distributing sports data.

You can find complete lists (there are hundreds of them) at www.xml.org.

Business intelligence

A new eye-tracking study has refined the 'golden triangle' model of how web pages are scanned and produced an 'F' shaped pattern. This study recorded how over 200 users looked at a variety of web sites.
For the heatmaps go to the excellent useit web site: http://www.useit.com/alertbox/reading_pattern.html

News from the web

Google is planning to offer online storage, according to company documents that leaked onto the web by mistake. The new product will be called gdrive and I guess will be similar to Streamload (www.streamload.com).

Proving that you can prove anything with statistics, Symantec have managed to report that IE has more security holes than Firefox and vice versa.

The Mozilla browser has been developed by volunteer programmers, and filthy lucre is not supposed to come into it. But whenever anyone goes via the Google search box in the browser and clicks on a sponsored link, Mozilla Corp gets a rake-off: $72 million.

Check out the beta version now here
Google has bought Writely, a word processor that runs in a web browser. I will be interested to see how well this works in practice now that the resources of Google are behind it. Will it replace Microsoft as the office software of choice? Personally I don’t think so. But we shall see.

Another web site has sued Google because it didn’t like its ranking (Yaaawwwnnn). An exercise in futility, I am sure, but maybe it got the site a few headlines.

Bill Gates has promised us a new version of IE every year. Great – I love those moving targets. In the meantime, IE7 beta 2 is out now.

The US Conference of Catholic Bishops has created a web site to refute The Da Vinci Code. The world really needed that.
