Capable software, not fast hardware, is the real differentiator of scientific computing. However, even though software is so important (or maybe because it is so important), there is much disagreement on the best way to create and use software for scientific computing.
One of the most recurrent themes is that of open-source vs. proprietary code. This debate is often painted with the idealistic open-source evangelists on one side, and the business-focused proprietary software advocates on the other. This is, of course, an unfair depiction of the topic. In reality, when debating open-source vs. proprietary, several issues tend to get conflated into one argument – open-source vs. closed-source, free vs. paid-for, restrictive vs. flexible licensing, supported vs. unsupported, code quality, and so on.
Open-source really means no more and no less than making the source code readily available to anyone. Thus, open-source makes no statement as to the licensing conditions for using the software, whether there are charges for using the software, whether the software is supported, or actively developed, or any good, and so on. Closed-source means that source code is not readily available, but makes no comment on issues like licensing, costs, support, and quality.
Making the source code open means that people can look at it, help spot and fix bugs, verify that the code does what it promises and help port it to new hardware. Keeping the source code private helps protect any IP or competitive advantage believed to be in the code. If the code is subject to restrictive licensing or usage fees, then closed-source helps prevent people from using the code against the licence or without paying. If such code were open-source, then users could (and indeed, unfortunately, do) take the code and use it without paying, or under conditions it is not licensed for, such as commercial use under an academic-only licence.
Misleading synonym
I suspect that for the majority of scientific computing users, ‘open-source’ is being used as a (poor) synonym for ‘free’. No-one likes paying for something if they can get it for less or for free. Of course, no software is truly free. Someone, somewhere, is covering the costs of development, installation, testing and maintenance. In a curious twist of the ridiculous, it is often those who develop software themselves who expect other software to be free! Yes, they want the product of their daily efforts to be valued by their users, research reviewers and customers – but have a contrary reluctance to acknowledge that they should in turn value (and pay for) the software outputs of others.
However, leaving aside those researchers who seem to have perverse moral objections to paying for software, for most people it is probably just a case of avoiding the effort involved in buying software and working through the corporate/institutional buying process.
It doesn’t help that many of those selling software, whether commercial vendors or academic groups, often impose restrictive licensing conditions – for example, limiting usage to ‘research purposes’ not ‘commercial purposes’, which merely burdens a typical user with risk when many workloads are a blend of the two. Nor does it help to limit the amount of compute power the software can use through per-core licensing models.
This per-core licensing is often quoted as one of the top problems with commercial software – the rapidly scaling costs prohibit use on any computer system of scale. The counter argument goes something like this: if you double your compute power by deploying twice as much hardware, then you’re happy to pay twice as much, so surely if you double the solver capability by using more cores, then it is reasonable to pay twice as much for that software?
This is a really good argument – as long as the software vendors are willing to accept the other side of the comparison – that the cost of a given amount of compute power halves every year or so with Moore’s Law and associated advances. So, to all those vendors still pursuing wholly per-core licensing models – are you also intending to halve your per-core price each year?
Of course, the costs incurred by the software provider in developing, testing and supporting scalable code are non-trivial, so a premium for using more cores is fair. However, those costs are not linear with the number of cores – not even close – so that premium does not support the case for a wholly per-core licensing model. It does, however, support a case for users to pay more for a scalable version, a GPU version or a Phi version of the software than for the desktop version.
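To put rough numbers on the per-core argument, here is a minimal sketch in Python. Every figure in it is an assumption chosen purely for illustration (the hardware budget, the starting core count, the per-core price and the one-year halving period), and it further assumes that all extra compute power arrives as extra cores. It is not a model of any real vendor's pricing.

```python
# Illustrative sketch only: budget, core counts, prices and the halving
# period are hypothetical values, not figures from any real vendor.

def cores_for_budget(initial_cores: float, years: float,
                     halving_years: float = 1.0) -> float:
    """Cores a fixed hardware budget buys after `years`, assuming the cost
    of a given amount of compute halves every `halving_years`."""
    return initial_cores * 2 ** (years / halving_years)


def licence_cost(cores: float, price_per_core: float) -> float:
    """Wholly per-core licence: cost grows linearly with core count."""
    return cores * price_per_core


def licence_to_hardware_ratio(hardware_budget: float, initial_cores: float,
                              price_per_core: float, years: float,
                              halving_years: float = 1.0) -> float:
    """Licence bill as a multiple of a constant yearly hardware budget."""
    cores = cores_for_budget(initial_cores, years, halving_years)
    return licence_cost(cores, price_per_core) / hardware_budget


if __name__ == "__main__":
    for year in range(4):
        ratio = licence_to_hardware_ratio(
            hardware_budget=100_000,  # assumed constant spend on hardware
            initial_cores=1_000,      # assumed cores bought in year 0
            price_per_core=50,        # assumed fixed per-core licence price
            years=year,
        )
        print(f"year {year}: licence costs {ratio:.1f}x the hardware budget")
```

Under these assumptions the licence bill climbs from 0.5 times the hardware budget in year 0 to 4.0 times by year 3, while hardware spend stays flat. A longer halving period slows the divergence but does not remove it, unless the per-core price falls in step.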
Beacon of distrust
The most disgraceful of these restrictive licence conditions – and there is zero excuse for this – is the clause that bans benchmarking the software on various platforms, or against other software products. Prospective customers should take such a clause as a huge noisy beacon of distrust on the part of the software vendor in the quality and capability of their own product.
Open-source vs. closed-source is often taken to imply something about the quality of the software, and the availability of support. Rather amusingly, both sides of the debate claim that their model is the one that assures quality software and support, while the opposing model leaves the user at the mercy of cowboys.
The software vendors and other proprietary software advocates work hard to imply (or even outright assert) that open-source software is generally of poor quality and untested. They claim that support for open-source software is rare, or is reliant on a vagrant army of volunteers. Indeed, the proclamation is that any sensible commercial user will want to pay for proper software where the vendor has a commercial incentive to assure quality, continue development, and provide support contracts.
The open-source advocates fight back with the observation that open-source enables the user to see the quality for themselves (unlike just having to trust closed-source) and maybe even directly improve the quality if the licence conditions allow it. They will also argue that good open-source software enables the best of two support models – support through community and collaboration, plus specialist providers who will take a contract to support open-source software. Depending on the licence, open-source also allows the ultimate backup support model – dive into the code and fix any problems yourself.
One other aspect of the open- vs. closed-source debate is exploiting new computational architectures – such as Xeon Phi, GPUs, and ARM. Users of closed-source software are at the mercy of their software provider when it comes to using new technologies. Open-source users have the option to port the code themselves or pay a specialist to do so (assuming the licence conditions allow this).
Ideology vs commerce?
So, given that open-source vs. closed-source status doesn’t really assure the user of anything about support, quality, licence limitations or cost – might one conclude that the debate is little more than ideology vs. commercial interests?
However, ideology aside, the role of open-source software is an important topic in both academia and industry. In academia (and national labs) especially, the argument applies to both software created and software used. I’ve written about this topic before; for example, I co-authored an article in 2015 with Dan Katz, Simon Hettrick and Neil Chue Hong (www.hpcnotes.com/2015/08/the-price-of-open-source-software-joint.html).
In this article, we argued that software developed with public funds should be released as open-source by default (although this should not be mandatory). This could be implemented as an onus on any researcher in academia or national labs writing software to demonstrate either that there is a clear reason why the software should not be open-sourced, or that no public funds were used in the development of the software.
It is a delicate argument to make that one researcher should get the financial benefit of charging other users (whether academics or not) to use software that the researcher was publicly funded to create. Indeed, several people argue quite strongly that all software developed with public funds should be freely available to commercialise by anyone, so not just open-source but unrestrictive licensing too.
While entirely credible on a national scale, this starts to get a little awkward when considering international aspects. For example, should software funded by the UK public purse be available for free commercialisation by American companies? Is protecting against this more or less of an evil than hindering commercialisation by UK companies, and with it the economic benefit that would flow back to ‘UK plc’?
Case for re-use
Perhaps more importantly though, we argued loudly in this article (and elsewhere) to support a culture of software re-use within academia. There are currently only weak incentives for academics to re-use existing software, with the result that much public funding is wasted recreating software capabilities across multiple groups, each claiming to be ‘unique’ in order to justify it as a research objective.
We also very firmly called for the involvement of professional software developers where appropriate, working in an integrated manner with the researchers. It is not helpful that the academic world only incentivises ‘research excellence’ – and does even that poorly. The essential role of specialist scientific software developers in enabling a huge proportion of modern research must be recognised, rewarded, and funded properly.
Addressing this challenge in research software is one of the goals of the Research Software Engineer (RSE) movement in the UK (www.rse.ac.uk). We noted that such specialist programmers could be based in academia or industry – the key, we argued, is for research funding rules to encourage either the support of properly funded academic RSE positions or for research grants to fund non-academic sources of such expertise.
Mike Croucher, one of the UK’s newly crowned Research Software Fellows, recently wrote a wonderful blog post describing the impact delivered by RSEs and the motivations driving such specialists (www.walkingrandomly.com/?p=5997).
But open-source software is not just an issue for academia. It is a theme with surging momentum in the commercial space too. In general, commercial users of software don’t care whether the source code is locked in a vault or broadcast by a town crier. However, they do care whether the software is tested, will continue to be updated with new features and support for new platforms, is usable, and is documented. They also care how much it costs to acquire, support and use the software.
Proprietary software has traditionally dominated the commercial user space, because such software has been able to meet these needs better than open-source software. Or, perhaps more honestly, proprietary software vendors have been able to convince customers that their products meet these needs better than open-source software. After all, how many open-source options have a marketing department to compete with the software vendors?
False economies?
Historically, even what appears to be expensive proprietary software is often significantly less expensive in reality than the cost of employing someone to develop software with similar functionality, or of properly qualifying and implementing an open-source alternative. If a user’s needs are fulfilled by non-free software (whether open or closed source), then there has normally been a solid case for purchasing that software rather than seeking or developing an open-source alternative.
However, the perceived failure of software vendors to address unsustainable licence pricing models, uncooperative licence limitations (such as banning benchmarking), and slow support for new hardware technologies is driving a willingness among commercial users to re-evaluate the true costs and benefits of open-source or in-house developed alternatives.
Various scientific computing companies now sell support services for open-source software, or even for in-house software, helping to make these credible, business-ready and affordable options for industry.
Most users of software sensibly employ a mixture of software tools that span open-source, closed-source, proprietary, ‘free’ and in-house. Many modern software developers also decide to use a hybrid of open-source and proprietary models within an integrated code-base. Dogmas that advocate open-source only, or commercial only, software are both narrow-minded and unhelpful, because they deny the researcher or the business the freedom to deliver the best outcomes.
So let’s just keep ideological debates out of the users’ way and be flexible enough to invest in whichever software delivers the best science and engineering advances.