'Ex Africa semper aliquid novi' said Pliny. Or, as the 100 million or so Kiswahili speakers of East Africa might put it: kitu pya sikuzote njoo Afrika. I do not open this article with Kiswahili on a whim: while there are many languages in use throughout Africa, this is the most strategic Bantu tongue. Of the 100 million Kiswahili speakers in East Africa, an estimated 10 million have access to computers. Establishing software within this language is currently a major theatre of development and contention.
There is a widespread view of Africa as the perennial beggar of the global village. But the reality is different. Africa is a continent awash with cultural richness of great diversity and complexity, containing people as keen and able to take part in the 21st century on their own terms as anyone else.
- Above: Curve fit and some GenStat output from data relating nutritional deficit to distance from a distribution centre. Below: Statistica histogram plotting the effectiveness of measures to suppress Orientia tsutsugamushi in samples from an urban study area.
Of course, problems exist and outside assistance is a vital part of dealing with them - partly from common humanity but, partly, in the hard realities of the world, for selfish reasons. Africa has massive resources - a quarter of US oil could be coming from Africa within a decade, according to many estimates. Politics and big business alike have begun to recognise Africa as a potential power and as a huge market in the future.
Signs of this awareness stretch from the Make Poverty History campaign [1], through Tony Blair's Commission on Africa report [2], and books by formerly hard-line figures, such as Jeffrey Sachs' The End of Poverty [3], to the publicised humanitarian concerns of Microsoft's Bill Gates.
Africa's peoples have a thirst for education and mental challenge. In many countries, rural children often get up before dawn, do a day's work, and then walk miles to school. Some US and European companies have made a commercial success of scouring the developing world for talent to be recruited for clients in the developed world. But, in a world of explosive technological change, there is a need for resources to feed the will: the new coming out of Africa is as much in need of input as anywhere else. In particular, there is a need for information and communication technologies.
- GenStat's all-new 8th Edition interface presents trace dietary element data (gathered and pre-processed in GSDE files) for a rural community study.
There are many conduits by which vigorous self-help in Africa is abetted by transfer of ICT hardware and software. In some ways, hardware is less of an issue than software: once obtained (and its maintenance provided for) it is flexible. Software, my main interest here, is vulnerable to compatibility issues: there is limited value in gaining a foot on the ICT ladder, if that ladder doesn't lead up to a progressing ICT community.
Despite some healthy pockets of Apple technology, the big choice is between Microsoft Windows and its open-source alternatives. Legacy considerations are among the factors that will condition that choice. What, then, is the state of play between those two choices?
- GenStat 8th Edition's revamped graphics system being used to study the pattern of induced instabilities in a pest population following artificial introduction of natural predators, using data collected and pre-processed in GSDE.
As I mentioned in a survey of free statistical software last year [5], free and open source software (FOSS) is the route favoured by the United Nations [6] and the particular focus of the International Open Source Network [7] centre of excellence. The IOSN is based in the Pacific but has implications for Africa; it is aimed at corporate, enterprise-level development, but that too has trickle-down effects that smaller users cannot ignore. Macro scale issues, not affordability at the point of use, are the main reason for this, though the UN Conference on Trade and Development (UNCTAD) notes [8] that 'it is a useful side-benefit that FOSS tends to be more affordable'.
The UN is not the only big gun backing open source; IBM has committed significant funds (for example, US$100m over three years [9] to the cause of easier interaction between Linux users of its Workplace software) and access (in particular to patents [10]). In IBM's case, it is one plank in the long-running tussle with Microsoft for the hearts and minds (and therefore platforms) of the next mass computing generations. New mass markets will open up in years ahead - one of them being, of course, Africa.
A Kiswahili version of the free-to-download OpenOffice suite (branded 'Jambo OpenOffice'), part of the Tanzania-based Open Kiswahili Localisation Project, is already out there and touting for users [11] - especially in education (from primary schools to universities) where future users are formed. Microsoft is not leaving the field uncontested; Kiswahili 'skins' for its software are in various stages of development, though some way behind, and there has been a tendency to hope that technology will sway users to English.
Powerful political and commercial currents, beyond the reach of most users, are shaping the software landscape. On a more tactical plane, though, decisions are largely made by default. Available platforms have to be used with available software. Available software for serious scientific work is a blend of legacy decisions by donors and previous decision makers.
One example of an external conduit is the provision by VSN International of free (in financial terms) licences for a 'Discovery Edition' of its heavyweight analytical language product GenStat. After a couple of successful years within the continent, VSN is now seeking to expand this programme out of Africa into other areas of the developing world [12]. GenStat Discovery Edition (GSDE) runs under any full 32-bit version of Microsoft Windows but, being very economical with system resources, is perfectly happy on what would elsewhere be regarded as low specification platforms (see section below).
Statistical analysis is a common requirement of all research work. GenStat, apart from being one of the world standards for this software type, holds the advantage of having been designed by and for researchers in agricultural work - a major concern (health care is another) for African science. GSDE is distributed freely on CD: the user needs a licence which is issued without charge to any non-commercial user within specified areas, and renewed annually. Operational support for the project is provided by the World Agroforestry Centre, International Livestock Research Institute, and University of Nairobi Biometry Unit Consulting Service (all based in Nairobi, Kenya), plus the University of Reading's Statistical Services Centre in the UK.
VSN are open about the pragmatic side of this arrangement - supplying free versions for those who cannot buy the commercial one does not deprive VSN of any customers, but spreads awareness to a generation of customers who may be able to buy the software in future. It also provides opportunities for product feedback. At the same time, though, it arises from a genuinely altruistic commitment. Though I am sceptical about altruism in the commercial world, I meet it often enough that I ought to be more generous. Individuals in the largest companies (Microsoft and StatSoft to name but two) have bent over backwards to provide me with help for humanitarian projects with no possible mercantile return. Talking to three of VSN's key figures, though, I was impressed by the extent to which such motivation informs their fundamental thinking.
GSDE comes on a CD replete with comprehensive documentation, ranging from an initial tutorial to general matter, and more detailed training content aimed at those with agricultural research experience. Like the program interface, it's all in English (apart from a French version of the introductory section), but I've encountered various home-grown translations of other sections into French, Arabic, Kiswahili and some other Bantu languages.
The free availability of such potent Windows products as GSDE points up the Windows/Linux dilemma for users. In fact, this dichotomy is becoming less hard-edged for those with Windows-capable platforms. While opinions vary within the Open Source community, the Windows compatibility option offered by WINE (see section below) has significant implications. During the background research for this piece I located several sites, and visited two, which were running Linux platforms with at least one strategic Windows application. The UN logic of FOSS development is hard to argue against for the long run; but WINE holds out the possibility of an almost indefinitely extensible transfer for those whose short-term options are Windows-centred. I'm not predicting which software giant will win this titanic struggle but, in the long run, the geosocial giant that is Africa stands to benefit.
To Africa and beyond - GenStat Discovery Edition
GenStat Discovery Edition (GSDE) is based on the 5th Edition commercial release of GenStat for Windows (the current release is 8th edition - see below). Selection of an older release is not commercial cynicism (I'm happy to say that I've found none of that at VSN) but an appropriate choice for the environment. If a machine will run Windows 95, it will run GSDE.
There is obviously a price to be paid for this in program-development terms, but it's a trade-off well judged for the intended GSDE user-base. Latest enhancements and additions to statistical procedures are not available, but there is enough analytic power for applications at the technology levels suggested by budgets that will not stretch to software purchase.
Graphics are focused on exploratory data analysis, not publication-quality prettification. The user interface is sparser and harder work than the rich world has recently become used to, but not arduously so. Manipulation of data is still openly data-centric, with none of the recent 'let's pretend we're in Excel' trends. That seems to be an advantage in many ways - and it certainly makes for better, more structured habits in users new to data handling. Size of dataset is limited by available machine memory, but in practice this seems to mean that a couple of hundred variables can always be analysed across several thousand cases. For really large datasets, subset copies can always be made containing only the variables (and records) of immediate interest.
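The subsetting idea is not specific to any one package. As a minimal sketch only (this is generic Python, not GenStat's command language, and the column names and figures are invented purely for illustration), a reduced copy holding just the variables and records of interest might be prepared like this before loading:

```python
import csv
import io


def subset_rows(rows, keep_columns, keep_row):
    """Yield reduced records: only keep_columns, only rows where keep_row(row) is true."""
    for row in rows:
        if keep_row(row):
            yield {name: row[name] for name in keep_columns}


# Hypothetical survey data; every name and value here is made up for the example.
SAMPLE = """site,year,zinc_mg,iron_mg,distance_km
A,2003,4.1,9.8,12
B,2003,3.2,7.5,40
A,2004,4.4,10.1,12
C,2004,2.9,6.7,55
"""

reader = csv.DictReader(io.StringIO(SAMPLE))

# Keep two variables, and only the records from 2004.
subset = list(subset_rows(reader, ["site", "zinc_mg"], lambda r: r["year"] == "2004"))

# Write the reduced copy as CSV text, ready to be saved as a separate, smaller file.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["site", "zinc_mg"], lineterminator="\n")
writer.writeheader()
writer.writerows(subset)
subset_csv = out.getvalue()
```

Because the rows are streamed one at a time, the full dataset never has to sit in memory at once, which is the point on a low specification machine.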
To look for what can't be done is, in any case, a habit born of surplus capacity. GSDE is aimed at a different environment, where resources are at a premium, and where what can be done is a more valuable yardstick. Eighteen months is a generation in desktop computing terms and it's worth remembering that we in the US and Europe were more than pleased with access to this level of sophistication only a few years ago.
To ask an obvious question: why GSDE and not one of the many other free statistics software options (such as those mentioned in my SCW survey last year [5])? Those do remain valuable - but GenStat is a powerful tool, more so than most of those I looked at in my survey, inhabiting the same plane as R.
Roger Payne of VSN commented in passing that S appeals to mathematicians and to those (like myself) who learnt their programming in language structures such as Algol, Pascal, etc, while GenStat attracts those who come from applied statistical analysis and then look around for software which can extend their reach. I took this with a pinch of salt at the time but, on closer investigation with a number of new and experienced users, found that he is quite right. What is true for GenStat and S is, obviously, also true for GSDE and R.
Another important aspect of GSDE is continuity within a strong tradition of GenStat usage in the British Commonwealth. There is a body of expertise out there into which users of GSDE can tap, and skills learnt on GSDE are portable - this is not just an analytical tool but potentially a developmental catalyst as well.
WINE, Windows and Song
WINE is a free (GNU Lesser General Public License terms) open source implementation of the Windows API, running on top of a Linux, FreeBSD or Solaris operating system as a Windows compatibility stratum. It does not use any Microsoft code, but can call native Microsoft DLLs and allows many unaltered native Windows applications to run as normal via a program loader. With typical open-source self-referentiality (remember that 'GNU is Not UNIX'!), the name seems to stand for 'Wine Is Not an Emulator'.
Not everything will run without initial tears, but determination pays off. A database is supplied of those applications whose teething problems have already been overcome; and each new application so tamed brings with it a new cluster of others with similar characteristics. My first attempts to run GSDE (see box opposite) under WINE on a Linux system were unsuccessful, but having checked with VSN that there was no reason in principle why it shouldn't work, I went to a colleague with more relevant experience who got things running after a while. So, while this is a Windows program, investment in it does not entail present or future commitment to Windows itself.
WINE makes it possible to run Windows programs such as GSDE regardless of the platform - and seems likely to strengthen the position of Linux - though not everyone in the open source community agrees.
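For readers who have not seen it done, the program loader is invoked simply by handing the Windows executable to the wine command. The following is a hedged sketch rather than a tested recipe for GSDE: the executable path is hypothetical, and the script assumes only that a wine binary is installed and on the search path.

```python
import shutil
import subprocess


def wine_command(exe_path, wine_binary="wine"):
    """Build the argument list that asks WINE's loader to start a Windows program."""
    return [wine_binary, exe_path]


def run_under_wine(exe_path):
    """Run a Windows executable under WINE and return its exit code."""
    if shutil.which("wine") is None:
        # WINE is not installed (or not on PATH); nothing sensible can be launched.
        raise RuntimeError("wine not found on PATH")
    return subprocess.run(wine_command(exe_path), check=False).returncode


# Hypothetical location of an installed Windows program; adjust to the real path.
EXAMPLE_EXE = "/home/user/.wine/drive_c/Program Files/Example/example.exe"
command = wine_command(EXAMPLE_EXE)
```

Calling run_under_wine(EXAMPLE_EXE) would then start the program. As noted above, a first attempt may well fail, and the application database is the place to look for known fixes.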
GenStat 8th Edition
It would make no sense to mention GenStat Discovery Edition without also looking at the latest full commercial incarnation of this analytical benchmark, which went to market in March. When writing a year ago[13] about the previous 7th edition, I commented on an uncharacteristic dearth of information from VSN. If little information was coming out, that was a side-effect of resources being focused on development. I've been using beta and pre-release copies of GenStat for Windows 8th Edition for three and a half months now, and that development time has clearly paid off.
The GUI and graphics, which had already made great moves forward, are better than ever. Cosmetically, the look and feel are the equal of anything else at the top end of this market. I ran a laptop copy of the program past some of my youngest fresher-year students, who have never known anything but Windows XP's candy-coloured bubble look, and it was judged 'cool'. More important, the program is sweet and effortless to use in exploratory mode, without any loss of its traditional batch mode power. Speed and responsiveness are noticeably enhanced, from initial load through calculations to rendering. Menus, whether drop-down or pop-up (the right mouse click offers a rich contextual route) are intuitive and welcoming; and custom environments can be built by adding new menu options. I was able to develop sample add-in menus myself with little trouble, using GenStat's own command language, and a colleague built me two successful COM add-ins - one in C++ and the other in Visual Basic.
Spreadsheet management functions are luxurious, and there is a data menu with a range of tools (including unit conversions and extensive pure matrix operations) gathered in one place. The spreadsheets themselves (and text windows too) have become discrete, emailable objects, which makes dispersed collaboration much more convenient. There is considerable development in the file menu, where data from a rich range of other file sources can be loaded and saved back (including, for example, a range of cells from within an Excel sheet). Output is (optionally) rich text formatted, and clicking on a generated graph enlarges it.
On the analytic front, there are expansions and additions too numerous to mention. They are summarised on VSN's web site (address under 'sources', below). They all work usefully and solidly, as GenStat users will expect, with greater ease and comfort than ever before, using selection and presentation methods on a par with the best of the competition. Plot types and options are expanded, too. As is to be expected in a product with GenStat's particular pedigree, additions across the board show evidence of the latest developments in life science, including microarray work.
I said of the 6th edition that GenStat had moved out onto the open plain to compete with its big competitors in the open market, and of the 7th that it was 'evolutionary rather than revolutionary'. Here in the 8th, the degree of evolution is dramatically apparent: GenStat is now not only the equal of anything else in its expanded habitat, but arguably the best around.
Sources
Though any mistakes are entirely my own, and no quotations survived into final text, preparation of this article was immeasurably enriched by helpful conversations with Matthew Revel of Linux User Group voice LugRadio.
www.lugradio.org/
Biometry Unit Consulting Service, University of Nairobi
www.uonbi.ac.ke/acad_depts/bucs/index.php3
GenStat for Africa
www.worldagroforestry.org/sites/RSU/genstatforafrica/index.asp
International Livestock Research Institute
www.cgiar.org/ilri
Statistical Services Centre, University of Reading
www.rdg.ac.uk/ssc
Open Group
www.opengroup.org
VSN International
www.vsn-intl.com
New in GenStat 8th Edition
www.vsn-intl.com/genstat/8thedition/New_Features_8thEdition.htm
WINE
www.winehq.com
World Agroforestry Centre
www.worldagroforestrycentre.org
Bibliographic notes
1. Make Poverty History. http://www.makepovertyhistory.org
2. Commission for Africa report. http://www.number-10.gov.uk/output/Page7310.asp
3. Sachs, J. The End of Poverty: Economic Possibilities for Our Time. 2005: Penguin Press.
4. Africa 05. 2005. http://www.africa05.co.uk/
5. Grant, F. Free statistics software. Scientific Computing World, 2004 (October 2004).
6. UNCTAD says open-source software could boost information technology sector in developing countries. 2003, United Nations: Geneva. http://www.un.org/News/Press/docs/2003/tad1967.doc.htm
7. International Open Source Network. UNDP-APDIP. http://www.iosn.net/
8. Open-source software could boost ICT sector in developing countries, says UNCTAD report. 2003, United Nations. http://www.unctad.org/Templates/webflyer.asp?docid=4255&intItemID=2261&lang=1
9. IBM Accelerates Network-Delivered Client Computing on Linux. 2005, IBM: Somers, NY, USA. http://www-1.ibm.com/press/PressServletForm.wss?MenuChoice=pressreleases&TemplateName=ShowPressReleaseTemplate&SelectString=t1.docunid=7532
10. Power Architecture Community Newsletter, 14 Jan 2005: Community calendar. 2005, IBM: Somers, NY, USA. http://www-128.ibm.com/developerworks/library/pa-nl1-calendar.html
11. Jambo OpenOffice Official Release. 2005, Mradi wa kuswahilisha programu huria (The Open Swahili Localization Project). http://www.kilinux.org/kiblog/archives/cat_2.html
12. VSN Newsletter January 2005. 2005, VSN International. http://www.vsn-intl.com/newsletter/Newsletter04/GenStat%20Classified.htm
13. Grant, F. From Life to General Statistics. Scientific Computing World, 2004 (March/April 2004).