Healthcare, Technology, Financials | Author: Yusuf Tuna | May 07, 2019 08:35 PM (GMT+8)

Compared with top leaders such as Google, Facebook, and Baidu, and the highly innovative, groundbreaking unicorns that hold vast amounts of data, latecomers face an extremely high threshold in developing NLP systems.

Beijingers have strong local accents, especially taxi drivers, who come from many different parts of China. Although it is sometimes hard to understand what a driver is saying, they communicate surprisingly well with Baidu Maps (百度地图) when asking for directions. No matter how thick the driver's accent is, the voice recognition system almost always grasps the command.

EqualOcean cooperated with Business Insider to prepare an exclusive tech report, 2019 Technology Trends in China, which breaks down the eight technologies that will be most significant over the next three years. This prospective report investigates the applications of and developments in Quantum Computing, Edge Computing, Flexible Display, Natural Language Processing, 5G Communication, Immunotherapy, Blockchain, and Immersive Technology in China.

Natural Language Processing (NLP)

The emergence of deep learning has set off a renaissance in artificial intelligence, and Natural Language Processing (NLP) has achieved a second technical leap. Constrained by the differences between human thinking and computational thinking, the earlier mode of human-computer interaction required humans to accommodate computers, entering Boolean, structured data in a form the machine could understand; the purpose of natural language processing is to make the computer do the accommodating instead.

The goal is to enable computers to "read" human language text (i.e., natural language understanding), including text comprehension, information extraction, language translation, and text proofreading. Beyond that, the technology is also required to generate language as feedback output (i.e., natural language generation).
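To make the understanding/generation split concrete, here is a minimal, illustrative Python sketch using the open-source Hugging Face transformers library (a tool chosen purely for illustration, not one named in this report); the question, prompt, and model choices below are hypothetical.

```python
# Illustrative sketch of the two directions of NLP:
# understanding (reading text) and generation (producing text).
from transformers import pipeline

# Natural language understanding: extract an answer from a passage (text comprehension).
reader = pipeline("question-answering")
answer = reader(
    question="What does the voice assistant do?",
    context="The voice assistant converts a spoken command into a navigation request.",
)
print(answer["answer"])

# Natural language generation: produce feedback text from a prompt.
writer = pipeline("text-generation", model="gpt2")
print(writer("The driver asked the assistant to", max_length=20)[0]["generated_text"])
```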

Technology Application

Natural Language Processing (NLP) is the core technology driving the transformation of human-computer interaction, so any scenario involving text-based interaction is a point where NLP can enter and create value. By understanding unstructured text and refining it into useful data, NLP efficiently empowers vertical businesses such as advertising, search, and e-commerce to give users more accurate content recommendations. Faced with email, bullet-screen comments, social media posts, and other massive volumes of text, it helps companies and governments obtain more accurate and concise user feedback and public-opinion monitoring results. As human-machine voice and text interaction improves, applications such as virtual assistants, chatbots, machine translation, and transcription will be used by more people to support their work or enrich their daily entertainment.

The generation of repetitive, standardized text will find practical applications in areas such as corporate financial reporting, government work reports, personalized advertising push messages, news alerts, and simple data-analysis products.

According to "2018 Artificial Intelligence Investment Market Research Report", the current application rate of NLP in all AI technologies is 7%, which is lower than the five branches of computer vision, data mining, intelligent voice, machine learning and robotics. Under the broad application prospects, the importance of NLP after the breakthrough of the underlying technology is self-evident.
 

Major Progress

Google, a leading company, has built powerful text corpora and a solid foundation of NLP algorithms and applied them across its product lines to optimize user experience. In machine translation, Google published its neural machine translation system in 2016 and in 2017 finalized the Transformer, an attention-based machine translation network structure that brought a marked improvement in accuracy. The Pixel Buds, released the same year, showed Google's strength in integrating technology and product. Google has also achieved a great deal in applying knowledge graphs within its search products. It is worth mentioning that the BERT model released by Google in 2018 achieved the best results on 11 language tasks and was opened to the outside world at the end of the year, one of the major events of the year.
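For readers curious what "attention" refers to here, below is a minimal NumPy sketch of the scaled dot-product attention operation at the core of the Transformer architecture mentioned above; the matrices and shapes are toy values for illustration only.

```python
# Minimal sketch of scaled dot-product attention:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of the values

# Toy example: 3 query tokens attending over 4 key/value tokens of dimension 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 8)
```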

Microsoft's capabilities across NLP's various branch technologies have been widely applied in Windows, Office, Bing, Xiaoice, and other products. In 2018, Microsoft also made impressive achievements in NLP: in January, its machine reading comprehension system tied with Alibaba for first place on the SQuAD dataset evaluation; in March, its newly developed news translation system matched human quality and accuracy; and in December, the "Microsoft D365 AI & MSR AI" model was launched and performed well across 11 NLP tasks.

Baidu is also pushing a complete set of NLP technology solutions toward commercialization, optimizing existing products such as search and DuerOS, and resolutely implementing an open-source strategy. In 2018, Baidu Brain released its "Language and Knowledge Technology Platform", targeting verticals such as customer service and media content creation. In October, it announced an instant machine translation system with predictive capabilities and controllable latency. Other major domestic players in the field include iFlytek, Sogou, and Tencent.

Transfer learning and the application of pre-trained encoders are the main trends in the field of NLP. Breakthrough research such as ULMFiT, ELMo, and BERT has made them an important driver of technological development.
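As a rough illustration of this transfer-learning recipe, the sketch below loads a pre-trained BERT checkpoint through the Hugging Face transformers port (not Google's original release) and runs a single fine-tuning step of a small classification head; the example sentences, labels, and hyperparameters are made up for illustration.

```python
# Hedged sketch of transfer learning: reuse a pre-trained encoder and
# fine-tune only a lightweight task head on a handful of labelled examples.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["The delivery was fast and the packaging was great.",
         "The app crashes every time I open the map."]
labels = torch.tensor([1, 0])  # 1 = positive feedback, 0 = negative feedback (toy labels)

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)   # cross-entropy loss computed internally
outputs.loss.backward()                   # one illustrative fine-tuning step
optimizer.step()
print(float(outputs.loss))
```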

Future Trends and Potential Risks

The BERT model, which had received massive attention from the industry, was open-sourced at the end of 2018. EO Intelligence believes that within one to two years, the introduction of the transfer-learning research paradigm into NLP will stimulate more innovative solutions in both research and application, and the ranking race among major companies in the field will intensify.

However, the voice recognition experiences that users frequently criticize clearly show that NLP's current ability to understand semantics and generate text can only support high-frequency scenarios, yields little derivative value, and leaves the construction of a general-purpose model little more than a fairy tale. At present, landing NLP applications requires training on specific industry knowledge and specific scenarios, together with suitable human-computer interaction design.

The industry's view is that the biggest challenge for NLP is the lack of training data. Training on millions or even billions of labelled examples can significantly improve a model, but it also relies on strong computing power. Therefore, compared with top leaders such as Google, Facebook, and Baidu, and the highly innovative, groundbreaking unicorns that hold vast amounts of data, latecomers face an extremely high entry threshold.