Five Minutes Technology Talk | AI Technology and "Internet Violence" Governance
Part 01
What is "Internet violence"?
"Internet violence" refers to using text, pictures, videos, and similar content to insult and slander others on the Internet, damaging their reputation and privacy rights and causing them mental pressure and trauma. It is an extension of social violence onto the Internet. The online violence we see most often appears on Weibo, video platforms, news feeds, and forums.
"Cyber violence" has three main causes. First, the anonymity of the Internet protects personal privacy but also lets infringers speak recklessly. Second, some media outlets publish one-sided reports and deliberately distort the facts in pursuit of traffic and attention. Third, once public opinion has formed, individuals tend to conform to the group's values and set aside their own rational judgment.
Part 02
Natural language processing technology (NLP) and "cyber violence"
Cyber violence on social media spreads mainly through comments and bullet comments (danmaku). For analyzing unstructured language data of this kind, the core AI technology is natural language processing (NLP). NLP builds on machine learning and deep learning methods that let machines learn language features automatically, so that they gain the ability to understand human language. The technology is now widely used in text classification, automatic summarization, question answering systems, machine translation, and sentiment analysis. Everyday voice assistants and the recently popular ChatGPT are both common applications of natural language processing. For "cyber violence" governance, the following directions are involved:
Text entity extraction:
The target of "cyber violence" is usually a specific person or event, so the first step is to filter the comments about a given cyber violence event out of the massive volume of comment data. This mainly relies on named entity recognition (NER). NER algorithms fall into three main groups: rule-based methods, statistical methods, and deep learning methods.
Figure 1 Named entity recognition method
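As a minimal sketch of the rule-based family only, the snippet below matches comments against a small dictionary of known entity names and keeps the comments that mention a target. The dictionary contents, the entity labels, and the helper names `find_entities` and `filter_comments` are all hypothetical and chosen for illustration; statistical and deep learning NER would replace the dictionary lookup with a trained model.

```python
import re

# Hypothetical gazetteer (entity name -> entity type), purely illustrative.
GAZETTEER = {
    "Alice Wang": "PERSON",
    "Example Cafe": "ORG",
}


def find_entities(comment, gazetteer=GAZETTEER):
    """Return (surface form, entity type, start offset) for each dictionary hit."""
    hits = []
    for name, label in gazetteer.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, comment, flags=re.IGNORECASE):
            hits.append((match.group(0), label, match.start()))
    # Sort by position so the result follows reading order.
    return sorted(hits, key=lambda hit: hit[2])


def filter_comments(comments, target, gazetteer=GAZETTEER):
    """Keep only the comments that mention the target entity."""
    wanted = target.lower()
    return [
        comment
        for comment in comments
        if any(surface.lower() == wanted
               for surface, _, _ in find_entities(comment, gazetteer))
    ]


comments = [
    "Alice Wang should be ashamed of herself",
    "Nice weather today",
    "I had lunch at Example Cafe",
]
related = filter_comments(comments, "Alice Wang")
```

A dictionary lookup like this is easy to audit but misses nicknames, misspellings, and new names, which is one reason statistical and deep learning methods are also used.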
Text Sentiment Analysis:
Sentiment analysis can score a comment as positive or negative, identify which finer-grained emotions its semantics contain, and extract the keywords that have the greatest impact on the overall sentiment of the text. In this way we can understand the emotional distribution of netizens behind tens of millions of comments, analyze how different groups feel about different events by time period, region, and gender, respond promptly to negative and violent sentiment around an event, and uncover more potential cyber violence.
Figure 2 Different emotion categories
The main techniques involved are text classification and polarity word mining, using machine learning (SVM, etc.) or deep learning (CNN). The overall process is shown in the figure:
Figure 3 Sentence-level sentiment analysis scheme
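To make sentence-level scoring concrete, here is a deliberately simplified sketch that uses a polarity lexicon instead of a trained SVM or CNN classifier. The lexicon entries, their weights, the one-word negation rule, and the function names are assumptions made for illustration; in practice the polarity words would be mined from labeled data and the classifier would be learned.

```python
import re

# Hypothetical polarity lexicon; real systems mine these words from labeled data.
POLARITY = {
    "great": 1.0, "love": 1.0, "kind": 0.5,
    "stupid": -1.0, "hate": -1.0, "ugly": -0.5,
}
NEGATIONS = {"not", "never", "no"}


def score_comment(comment):
    """Return (sentiment score, most influential keyword) for one comment."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    total = 0.0
    contributions = []
    for i, token in enumerate(tokens):
        if token not in POLARITY:
            continue
        weight = POLARITY[token]
        # Simplifying assumption: a negation directly before a word flips it.
        if i > 0 and tokens[i - 1] in NEGATIONS:
            weight = -weight
        total += weight
        contributions.append((token, weight))
    keyword = None
    if contributions:
        keyword = max(contributions, key=lambda item: abs(item[1]))[0]
    return total, keyword


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


score, keyword = score_comment("I hate this stupid post")
```

The returned keyword corresponds to the idea of extracting the word with the greatest impact on overall sentiment; a learned model would estimate those weights rather than read them from a fixed table.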
Text similarity analysis:
Analyzing the similarity of comments on the same event helps us discover the trend of public opinion around that event. Analyzing the similarity of comments across different events lets us find comments whose wording or expressions resemble those of "Internet violence" users, and surface recent positive or negative public opinion about an event or a person. At present there are two main deep learning paradigms for similarity analysis, as shown in the following figure:
Figure 4 Two paradigms of similarity analysis
The first paradigm extracts a representation vector for each comment through a deep neural network, then computes the similarity between two comments with a simple distance function (Euclidean distance, etc.) over those vectors. The emphasis of this approach is on extracting the representation vector. Commonly used models of this type include DSSM, CNTN, etc.
The second paradigm extracts cross-features between the two comments through a deep model, obtains a tensor of matching signals, and then aggregates it into a similarity score.
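The representation-based idea can be sketched with a toy encoder. The snippet below is an illustration under stated assumptions: a bag-of-words count vector stands in for the deep neural network encoder (it is not DSSM or CNTN), and the vocabulary and function names are hypothetical. The point is only that each comment is encoded independently and the comparison happens afterwards with a simple function.

```python
import math
import re
from collections import Counter


def encode(comment, vocabulary):
    """Toy stand-in for a neural encoder: word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z']+", comment.lower()))
    return [float(counts[word]) for word in vocabulary]


def euclidean_distance(u, v):
    """Smaller distance means the two representation vectors are closer."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vocabulary = ["you", "are", "so", "stupid", "nice", "weather"]
vec_a = encode("you are so stupid", vocabulary)
vec_b = encode("so stupid you are", vocabulary)
vec_c = encode("nice weather", vocabulary)
```

Because each comment is encoded on its own, vectors can be computed once and reused across many comparisons, which suits large comment volumes.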
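A minimal sketch of the interaction-based idea follows. It assumes exact word match as the matching signal and "best match per word, then average" as the aggregation; real models learn both steps, so the functions below are hypothetical simplifications rather than any particular published architecture.

```python
import re


def tokenize(comment):
    """Lowercase a comment and split it into word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def matching_matrix(tokens_a, tokens_b):
    """Word-by-word matching signals: 1.0 for an exact match, else 0.0."""
    return [[1.0 if a == b else 0.0 for b in tokens_b] for a in tokens_a]


def aggregate(matrix):
    """Average, over words in the first comment, of their best match score."""
    if not matrix or not matrix[0]:
        return 0.0
    row_best = [max(row) for row in matrix]
    return sum(row_best) / len(row_best)


def interaction_similarity(comment_a, comment_b):
    """Compare the two comments word by word, then aggregate to one score."""
    matrix = matching_matrix(tokenize(comment_a), tokenize(comment_b))
    return aggregate(matrix)


similarity = interaction_similarity("you are so stupid", "so stupid")
```

Unlike the representation-based approach, the two comments are compared before any pooling, so fine-grained word-level matches are kept, at the cost of recomputing the matrix for every pair.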
Syntax/Lexical Analysis:
Through syntactic and lexical analysis, we can dig out the syntactic and lexical habits shared by large numbers of "positive" comments and "Internet violence" comments. This lets us summarize the words and phrases commonly used by "Internet violence" users in the current network environment, as well as the language characteristics different users show when expressing the polarity of their opinions.
Syntactic structure analysis identifies the components of a sentence (subject, verb, object, complement) and analyzes the relationships between them. It is generally based on deep learning sequence models such as RNN and LSTM.
The task of lexical analysis is to convert the input comment string into a sequence of words and mark the part of speech of each word. It mainly uses sequence tagging techniques; specific algorithms include conditional random fields (CRF), RNN+CRF, etc.
Figure 5 Lexical analysis example
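Sequence tagging models such as CRF typically pick the best tag sequence with Viterbi decoding. The sketch below shows only that decoding step, assuming the words are already segmented; the tag set and the hand-set emission and transition scores are hypothetical, whereas a real CRF or RNN+CRF model would learn them from annotated text.

```python
TAGS = ["PRON", "VERB", "NOUN"]

# Hypothetical hand-set scores (higher is better); a trained model learns these.
EMISSION = {
    "they": {"PRON": 2.0},
    "post": {"VERB": 1.0, "NOUN": 1.0},
    "comments": {"NOUN": 2.0, "VERB": 0.5},
}
TRANSITION = {
    ("PRON", "VERB"): 1.0,
    ("VERB", "NOUN"): 1.0,
    ("NOUN", "NOUN"): 0.2,
}
DEFAULT_SCORE = -1.0  # penalty for any word/tag or tag/tag pair not listed


def emission(word, tag):
    return EMISSION.get(word, {}).get(tag, DEFAULT_SCORE)


def transition(prev_tag, tag):
    return TRANSITION.get((prev_tag, tag), DEFAULT_SCORE)


def viterbi(words):
    """Return the highest-scoring tag sequence for a list of words."""
    if not words:
        return []
    # best[i][tag] is the score of the best path ending in `tag` at position i.
    best = [{tag: emission(words[0], tag) for tag in TAGS}]
    back = []
    for word in words[1:]:
        scores, pointers = {}, {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transition(p, tag))
            scores[tag] = best[-1][prev] + transition(prev, tag) + emission(word, tag)
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return list(reversed(path))


words = ["they", "post", "comments"]
tagged = list(zip(words, viterbi(words)))
```

Here "post" is ambiguous between noun and verb on its own; the transition scores resolve it in context, which is the reason for decoding the whole sequence jointly rather than tagging each word independently.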
Part 03
Summary
"Cyber violence" not only directly endangers the rights and interests of victims, but also harms network security and social harmony. Drawing on its accumulated expertise in deep learning, image recognition, natural language processing, OCR, and related fields, the China Mobile Smart Home Operation Center has launched content security protection products. These products run security checks on images, text, video, audio, and other content across multiple dimensions, such as political content, gambling, image OCR, and face recognition.
As AI technology develops, technology-based governance of Internet violence will gradually play an important role. The China Mobile Smart Home Operation Center will continue to explore advanced technologies for this scenario, combine them with the industry's cutting-edge techniques to support the building of a healthy content ecosystem, actively respond to the "Qinglang" series of special actions of the Cyberspace Administration of China, and contribute to a clean and healthy network environment.