Intel has developed a new machine programming system in conjunction with the Massachusetts Institute of Technology (MIT) and the Georgia Institute of Technology. Machine Inferred Code Similarity (MISIM) is an automated engine designed to learn what a piece of software intends to do by studying the structure of the code and analysing how it differs syntactically from other code with similar behaviour.
MISIM outperforms the current state-of-the-art code similarity systems by up to 40x and shows great promise for a range of applications from code recommendation to automated bug fixing.
Justin Gottschlich, principal scientist, director and founder of Machine Programming Research at Intel, commented: ‘Intel’s ultimate goal for machine programming is to democratise the creation of software. When fully realised, machine programming will enable everyone to create software by expressing their intention in whatever fashion that’s best for them, whether that’s code, natural language or something else. That’s an audacious goal, and while there’s much more work to be done, MISIM is a solid step toward it.’
With the rise of heterogeneous computing, hardware and software systems are becoming increasingly complex. This complexity, paired with a shortage of programmers able to code at an expert level across multiple architectures, has highlighted a need for new development approaches. Machine programming, a term coined by Intel Labs and MIT in their ‘Three Pillars of Machine Programming’ paper, aims to improve development productivity by providing automated tools.
A key technology behind several of these emerging machine programming tools is code similarity, which has the potential to automate parts of the software development process accurately and efficiently, and so help meet this need.
However, building accurate code similarity systems is a complex challenge in itself. These systems attempt to determine whether two code snippets show similar characteristics or aim to achieve similar goals, which can be a daunting task when source code is the only thing available to learn from.
Developed in partnership with MIT and Georgia Institute of Technology, MISIM can accurately determine when two pieces of code perform a similar computation, even when those two pieces of code use different data structures and algorithms. As Gottschlich explains, ‘this is an important step toward the grander vision of machine programming.’
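To make the idea concrete, here is a small illustrative pair of functions. It is not taken from MISIM or its test data, and the function names are hypothetical. Both answer the same question (does a list contain a repeated value?), but one relies on a set and a single pass, while the other sorts the list and compares neighbours. This is the kind of pair that a code similarity system would ideally rate as similar despite the different data structures and algorithms.

```python
def has_duplicates_set(items):
    """Track values already seen in a set; stop at the first repeat."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


def has_duplicates_sorted(items):
    """Sort a copy of the list, then compare each value with its neighbour."""
    ordered = sorted(items)
    for i in range(1, len(ordered)):
        if ordered[i] == ordered[i - 1]:
            return True
    return False


if __name__ == "__main__":
    # Different code, same behaviour on these inputs.
    for sample in ([3, 1, 3], [1, 2, 3], []):
        print(sample, has_duplicates_set(sample), has_duplicates_sorted(sample))
```

A purely textual comparison of these two functions would find little in common, which is why judging similarity by behaviour rather than by surface form is the harder and more useful problem.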
A core differentiator between MISIM and existing code similarity systems lies in its novel context-aware semantic structure (CASS), which aims to lift out what the code actually does. Unlike other approaches that attempt this, CASS can be configured to a specific context, enabling it to capture information that describes the code at a higher level. This enables CASS to provide more specific insight into what the code does rather than how it does it. Moreover, MISIM can do all of this without using a compiler (a program that translates human-readable source code into computer-executable machine code). This has many benefits over existing systems, including the ability to run on incomplete snippets of code that a developer may still be writing, an important practical characteristic for recommendation systems or automated bug fixing.
Once the structure of the code is integrated into CASS, a number of neural network systems give similarity scores to pieces of code based on the jobs they are designed to carry out. In other words, if two pieces of code look very different in their structure but perform the same function, these neural networks would rate them as largely similar.
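The scoring step can be pictured with a minimal sketch. It assumes that each piece of code has already been turned into a vector of numbers by a trained model, and that similarity is measured as the cosine of the angle between two such vectors. The vectors below are invented for illustration and the names are hypothetical; MISIM's actual networks and scoring function are not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an all-zero vector as having no similarity.
    return dot / (norm_a * norm_b)


# Invented vectors standing in for the output of a trained model.
loop_sum_vec = [0.9, 0.1, 0.4]        # a loop that adds up a list
recursive_sum_vec = [0.8, 0.2, 0.5]   # a recursive function that does the same
string_reverse_vec = [-0.3, 0.9, -0.1]  # unrelated code that reverses a string

if __name__ == "__main__":
    print(cosine_similarity(loop_sum_vec, recursive_sum_vec))
    print(cosine_similarity(loop_sum_vec, string_reverse_vec))
```

With these made-up numbers, the two summing routines score much closer to each other than either does to the string-reversal code, which is the behaviour the article describes: similar purpose, high score, regardless of how different the source looks.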
By bringing these principles together in a unified system, Intel, MIT, and Georgia Institute of Technology researchers found that MISIM was able to identify similar pieces of code up to 40x more accurately than prior state-of-the-art systems.
While Intel is still expanding the feature set of MISIM, the company has already moved it from a research effort to a demonstration effort, with the goal of creating a code recommendation engine to assist all software developers programming across Intel’s various heterogeneous architectures. This type of system would be able to recognise the intent behind a simple algorithm entered by a developer and offer candidate code that is semantically similar but performs better.
The Machine Programming Lab is also engaging with software groups at Intel to see how MISIM can be integrated into their day-to-day development. Gottschlich, who is also an Adjunct Assistant Professor at the University of Pennsylvania, hopes to help them, and Intel at large, improve productivity and eliminate some of the mundane parts of programming, like hunting down bugs. Gottschlich speculates, ‘I imagine most developers would happily let the machine find and fix bugs for them, if it could – I know I would.’