If you have understood what you have read so far in this post, then you would definitely know the difference between the sentences ‘GDP rose in 2015’ and ‘GDP rose consistently from 2010 to 2015.’ So would a Natural Language Processing (NLP) programme. Some NLP programmes might even one-up us mere mortals by giving the ‘dependency parsing’, ‘parts of speech tags’, ‘named entities’ and other sentence attributes that only learned and esteemed linguistic practitioners like Noam Chomsky, George Orwell and Donald Trump would understand.
Well, NLP programmes (or at least the one we’re currently using) might be able to parse a claim like ‘GDP growth averaged 7.3% under the previous Labour administration’ (warning: fake news; please don’t take this statistic for truth) and flood you with a deluge of sentence attributes. But they are currently unable to understand what this claim entails or, more importantly, which data should be sought to verify it. NLP has yet to advance to the point where it can take in any sentence ever conceivable by humans and spit out all its intricacies and subtleties. And so we currently have to make do with humans to bridge the gap.
We looked through a database of claims made solely about GDP and identified the ones with the most common sentence structures. These sentences were then parsed by the NLP programme that we used, which would output the words in the sentence that corresponded to certain parts of speech or categories. For example, ‘GDP rose consistently from 2010 to 2015’ would give us ‘GDP’ as the ‘topic’, ‘rose’ as the ‘verb’ (or type of flower), ‘consistently’ as the ‘checking_modifier’ (a more glorified term for ‘adverb’) and the years 2010 and 2015 as ‘time’. We could then link certain outputs to the specific data that we had to obtain to factcheck the claims. As with any other human endeavour, we are making progress in this area. Our current idea is not to factcheck every claim about GDP made by Jeremy Corbyn, The Sun or Lord Buckethead, but rather the kinds of claims that crop up most often in the media.
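To make the slot idea a bit more concrete, here is a rough sketch in Python of how a claim could be broken into ‘topic’, ‘verb’, ‘checking_modifier’ and ‘time’, and how those slots might then point to the data needed. It is not the NLP programme we actually use: the word lists, the function names `extract_slots` and `data_needed`, and the rule for which years to fetch are all made up for illustration, and a simple keyword lookup stands in for real parsing.

```python
import re

# Hypothetical vocabularies, invented for this sketch only.
TOPICS = ("GDP",)
VERBS = ("rose", "fell", "grew", "shrank")
MODIFIERS = ("consistently", "steadily", "sharply")


def _first_match(words, vocabulary):
    """Return the first word in the claim found in the vocabulary (case-insensitive)."""
    wanted = {v.lower() for v in vocabulary}
    for word in words:
        if word.lower() in wanted:
            return word
    return None


def extract_slots(claim):
    """Split a claim into topic / verb / checking_modifier / time slots."""
    words = re.findall(r"[A-Za-z]+", claim)
    # Four-digit years starting with 19 or 20, in order of appearance.
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", claim)]
    return {
        "topic": _first_match(words, TOPICS),
        "verb": _first_match(words, VERBS),
        "checking_modifier": _first_match(words, MODIFIERS),
        "time": years,
    }


def data_needed(slots):
    """Map slots to a description of the data to fetch (an assumed rule, not a fixed one)."""
    years = slots["time"]
    if not years or slots["topic"] is None:
        return None
    start, end = min(years), max(years)
    if start == end:
        # A single year: assume we compare it with the year before.
        start = end - 1
    # 'consistently' means every year-on-year change must be checked,
    # otherwise comparing the first and last year is assumed to be enough.
    check = "every_year" if slots["checking_modifier"] == "consistently" else "endpoints"
    return {
        "series": slots["topic"],
        "years": list(range(start, end + 1)),
        "check": check,
    }


if __name__ == "__main__":
    for claim in ("GDP rose in 2015", "GDP rose consistently from 2010 to 2015"):
        slots = extract_slots(claim)
        print(claim, slots, data_needed(slots))
```

The point of the second function is that the two example sentences from earlier ask for different things: the ‘consistently’ claim needs every year in the range to be checked, while the plain one only needs a single comparison.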
Fingers crossed, we should get an initial working prototype by next week.