
~Introduction~

Natural Language Processing

　Natural language processing (NLP) is a technology that uses computers to process the language (natural language) that humans use every day.
A major theme in NLP is how to handle the ambiguity and irregularities of languages such as Japanese and English.

Machine learning in NLP

　Machine learning and deep learning, which are widely used in audio and image processing, are now also used in NLP. Machine learning is expected to capture latent meanings of sentences that are difficult to handle with conventional rule-based processing. Many studies report that adopting machine learning has improved accuracy in areas such as machine translation and information retrieval.

Word Embeddings

　When applying machine learning or deep learning to NLP, converting the words in a sentence into vectors makes them easier for a computer to process. One technique for vectorizing words is “word embeddings”.
　Word2vec, one word-embedding method, learns from a large number of sentences a vector for each word that expresses the word's meaning.
　With word embeddings, the relatedness between words can also be measured by calculating the cosine similarity between their word vectors.
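　As a minimal sketch of the cosine-similarity calculation described above, the Python snippet below ranks words by their similarity to a query word. The three-dimensional vectors are made-up toy values standing in for trained word2vec vectors (which usually have hundreds of dimensions), and `cosine_similarity` and `most_similar` are hypothetical helper names, not a library API.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for zero vectors; treat as unrelated
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors standing in for learned word embeddings.
vectors = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def most_similar(word, vectors):
    """Return the other words ranked by cosine similarity to `word`."""
    target = vectors[word]
    scores = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

for other, score in most_similar("king", vectors):
    print(f"{other}: {score:.3f}")
```

　A similarity close to 1 means the vectors point in nearly the same direction; with these toy values "queen" ranks well above "apple" for "king".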

~Research themes~

Retrieval

　We are researching ways to improve the accuracy and readability of Internet search results by applying additional processing to them. Specifically, we apply morphological analysis and TF-IDF weighting to the summaries of the search results to obtain a feature vector for each Web page.
　These feature vectors are then used to train a neural network such as an LSTM (Long Short-Term Memory), which produces a fixed-length representation, and the similarity (e.g. cosine similarity) between that representation and the search query is measured. This approach achieves much higher accuracy than conventional rule-based retrieval systems. We use data such as Wikipedia articles and cooking recipes as training data.
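　A rough sketch of the TF-IDF step, assuming the search-result summaries have already been split into tokens. Plain whitespace splitting stands in for morphological analysis (which Japanese text would require), the summaries are invented, and the weighting shown is one common TF-IDF variant (term count divided by document length, times log(N / df)); the weighting actually used in this research may differ.

```python
import math
from collections import Counter

def tf_idf_vectors(documents):
    """Compute a TF-IDF weight for every term in every tokenized document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(documents)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical search-result summaries; whitespace splitting is only a
# stand-in for a morphological analyzer.
snippets = [
    "curry recipe with chicken and rice".split(),
    "chicken soup recipe".split(),
    "history of rice farming".split(),
]
vectors = tf_idf_vectors(snippets)
for term, weight in sorted(vectors[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

　Terms that appear in only one summary (such as "curry") receive higher weights than terms shared across summaries (such as "recipe"), so each page's vector emphasizes what distinguishes it.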

Machine translation

　Machine translation automatically converts sentences in one language into sentences in another language.
Since 2014, machine translation built on the sequence-to-sequence (encoder-decoder) framework has been called Neural Machine Translation (NMT). NMT is used in many translation systems, including Google Translate, and achieves high accuracy.
　We are researching neural machine translation between languages with little bilingual (parallel) data, such as minor languages.

Topic/keyword extraction

　Topic/keyword extraction is a technology that extracts topics and keywords from text. Presenting the extracted keywords alongside the content can be expected to help readers understand it. Keywords can be extracted from documents automatically using information obtained through morphological analysis and TF-IDF. We are also studying mechanisms that identify and present the topic of a text.
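　Keyword extraction with TF-IDF can be sketched as picking the highest-weighted terms of one document relative to a collection. This sketch assumes pre-tokenized input and uses a small hypothetical stop-word list in place of the part-of-speech filtering a morphological analyzer would provide; `extract_keywords` and the sample documents are illustrative only.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real pipeline would more likely keep only
# nouns selected by a morphological analyzer.
STOP_WORDS = {"a", "and", "for", "of", "the", "with"}

def extract_keywords(documents, doc_index, top_k=3):
    """Return the top_k terms of documents[doc_index], ranked by TF-IDF."""
    # Drop stop words from every document before counting.
    docs = [[t for t in doc if t not in STOP_WORDS] for doc in documents]
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    doc = docs[doc_index]
    counts = Counter(doc)
    scores = {
        term: (count / len(doc)) * math.log(n_docs / df[term])
        for term, count in counts.items()
    }
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]

docs = [
    "neural machine translation translation model for minor languages".split(),
    "machine learning model for image processing".split(),
    "search engine ranking model".split(),
]
print(extract_keywords(docs, 0))
```

　Because "model" appears in every document its IDF is zero, so it never surfaces as a keyword, while the repeated and document-specific "translation" ranks first.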

Web analysis

　In today’s information society, browsing websites to collect information is commonplace. How to select and present what users want from the vast amount of information available around the world is therefore a very important theme.
　It is also common for individuals to post information freely on social networking services (SNS) such as Instagram and Twitter. Classifying and analyzing these posts makes it possible to extract a great deal of useful information.
　We are conducting research in this field of web search and analysis.
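　The classification step for SNS posts could take many forms. The sketch below swaps in a multinomial Naive Bayes classifier with add-one smoothing, a simple standard technique, purely for illustration; it is not necessarily the method used in this research. The posts, the labels (`food`, `transport`), and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Count labels and per-label word frequencies from (tokens, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(model, tokens):
    """Return the label with the highest log posterior probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokens:
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen label/word pairs from zeroing the score.
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled posts, already tokenized.
training_posts = [
    ("great ramen tonight so delicious".split(), "food"),
    ("new sushi place delicious".split(), "food"),
    ("train delayed again this morning".split(), "transport"),
    ("bus late train crowded".split(), "transport"),
]
model = train_naive_bayes(training_posts)
print(classify(model, "delicious ramen".split()))  # food
```

　Working in log space avoids numerical underflow when many word probabilities are multiplied together.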

Interactive system/Chatbot

　Since the appearance of Siri around 2010, dialogue systems have attracted attention, and many studies on dialogue systems such as chatbots have been conducted.
Beyond conventional rule-based systems, which interact by following scenarios defined in advance, systems that generate or select responses using deep learning models such as LSTMs have become a hot topic. We are also studying these systems.
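　For contrast with learned models, a conventional rule-based chatbot can be as simple as matching keywords against scenario rules written in advance. The rules and responses below are hypothetical, and real scenario-based systems usually also track dialogue state across turns, which this sketch omits.

```python
import re

# Hypothetical scenario rules written in advance: (trigger keywords, response).
RULES = [
    ({"hello", "hi"}, "Hello! How can I help you?"),
    ({"recipe", "cook"}, "What ingredients do you have?"),
    ({"bye", "goodbye"}, "Goodbye!"),
]
FALLBACK = "Sorry, I did not understand that."

def respond(utterance):
    """Return the response of the first rule whose keywords occur in the utterance."""
    tokens = set(re.findall(r"[a-z']+", utterance.lower()))
    for keywords, response in RULES:
        if keywords & tokens:  # any trigger keyword present
            return response
    return FALLBACK

print(respond("Hi there!"))
print(respond("Can you suggest a recipe?"))
print(respond("What is the weather?"))
```

　Any input outside the predefined rules falls through to the fallback reply, which is the main limitation that generation- or selection-based models such as LSTMs aim to overcome.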