Linguamatics, Brandwatch and the University of Sussex, UK, have announced a joint project funded by the UK's Technology Strategy Board to address challenges faced by automated language processing software in harnessing diverse data sources. The project forms part of a broader Technology Strategy Board initiative focusing on enabling technologies to harness Big Data for economic growth.
The project will improve the automatic extraction of information from scientific papers, news and social media for applications in research and development, marketing and competitive intelligence. The current generation of language processing software has had considerable success in extracting useful information from unstructured text, whether research literature or social media. However, adapting such software to a new domain is often laborious, both with respect to the type of data (e.g. newswire vs. patent literature) and to the terminology used in a given domain (e.g. medical practice vs. pharmaceutical research).
Humans can perform these adaptation tasks on small data sets, but they cannot keep pace with the massively increasing volume of electronic text. The EVOKES project (Exploitation of Diverse Data via Automatic Adaptation of Knowledge Extraction Software) will apply distributional similarity techniques developed at the University of Sussex. The project will run for 18 months.