Abstract

Information dissemination prediction based on Weibo has been a hot topic in recent years. To study it, researchers usually extract features and apply machine learning algorithms to make predictions, but these approaches have shortcomings. To address them, we propose a new feature: the dependency between the geographical locations mentioned in Weibos and the locations of users. We use ELM to predict the behavior of users, and an information dissemination prediction model is also proposed in this paper. Experimental results show that the proposed feature is practical and effective and that our model can accurately predict the scale of information dissemination. The results also show that using ELM significantly reduces running time and yields better performance than the traditional SVM-based method.

1. Introduction

With the development of Web 2.0, social networks have become an essential part of people's lives. Large social network websites like Facebook and Twitter bring a lot of happy time to people. Sina Weibo, one of China's largest online social networks, has more than 500 million registered users. Every day these users produce a large amount of social network data by continuously releasing and forwarding microblogs. Research on these data helps enterprises and governments discover the rules of users' network behavior and take corresponding measures. Thus, the study of Weibo has been a hot issue in recent years. There are many research directions in the study of Weibo, including sentiment analysis based on Weibo [1] and Weibo personalized recommendation research [2]. One focus of high practical value is the study of users' online behavior and the corresponding information propagation. This line of study can help enterprises understand user behavior patterns, grasp user interest preferences, and recommend interesting topics, other users, and groups to the user. It can also help the government understand the scope of the spread of news, judge the direction of and reactions to public opinion, and adjust corresponding policies in time.

There is much research on user behavior and information dissemination in online social networks. One common method is to extract user behavior features and use machine learning algorithms to classify and predict user behavior [3–6]. In general, researchers adopt the support vector machine (SVM) algorithm. The widely used features are the influence of the user, the intimacy between users, the interest similarity of users, the importance of Weibo content, and so forth. In real life, people are more concerned about the information around them, and this extends to Weibo: if a Weibo mentions a geographical location, the users near that location will pay more attention to it than users in other areas. There are many social network applications that use geographical location; for example, Lingad et al. [7] studied the extraction of locations related to disasters from microblogs, and Hosseini et al. [8] studied location-oriented phrase detection in microblogs. However, in the analysis of users' online behavior and information dissemination, the dependency between the geographical locations mentioned in Weibos and the location of the user has not been considered. Therefore, building on previous studies, we take this dependency as a new feature to analyze user behavior and information dissemination. At the same time, because the extreme learning machine (ELM) algorithm runs fast and obtains the optimal solution rather than suboptimal solutions, we adopt ELM in place of SVM. The main contributions of this paper are as follows. (1) We propose a new feature, the dependency between the geographical locations mentioned in Weibos and the location of the user. We use this feature together with other established features to analyze user behavior and information dissemination.
(2) We test the performance for different values of the time window Δt used to build the ignore dataset and find that Δt = 30 minutes performs best. (3) We use ELM instead of SVM to predict user behavior and information dissemination. Our experimental results show that, with the new feature we propose, we get higher prediction accuracy than without it. They also show that ELM achieves higher accuracy than SVM on the same dataset. The rest of this paper is organized as follows. Section 2 briefly introduces the related work on online social networks and ELM. Section 3 introduces the data and features we use to predict user behavior and presents the information dissemination model. The experimental results are reported in Section 4. Finally, we present our conclusions and future work in Section 5.

2. Related Work

2.1. Online Social Network

Due to the popularity of social networks, there are many studies of them. For example, Marques and Serrão [9] proposed using rights management systems to improve the content privacy of social network users; Quang et al. [10] found clusters of actors in a social network based on the content of messages; Tseng and Chen [11] proposed an incremental SVM model to detect unwanted email; and so on. Our main work in this paper is analyzing user behavior and information dissemination, and there is a lot of related work on this aspect. Song et al. [3] proposed 4 features to predict whether a user will forward a Weibo or ignore it: the authority of the user, the activity of the user, the preference of the user, and the social relations of the user. These four features reflect user behavior to a certain extent, but they do not consider the importance of Weibo content or the dependency between the geographical locations mentioned in Weibos and the locations of users. Zaman et al. adopted a probability-based collaborative filtering model [12, 13]. They selected the user name, the number of followers, and the number of words the Weibo contains to predict the forwarding behavior of users. Although these features have some influence on user behavior and information dissemination, they are not the main factors affecting the user's behavior. Cao et al. [4] improved the prediction model by adding the Weibo content length, Weibo importance, whether the user is an authenticated user, and some other features. The added features improved the prediction accuracy of user behavior and information dissemination, but they still did not consider the relationship between the location names mentioned in Weibos and the users. Some other works also inform ours.
For example, the flow of information within the scope of blogs was analyzed and a prediction model of information transmission was built in [6]. Sina Weibo and traditional blogs have certain similarities, so we can draw lessons from the spread of blogs. Webberley et al. [14] studied the transmission delay and the depth and width of information dissemination on Twitter; their preliminary study of user behavior patterns and forwarding rules has certain reference significance. Some researchers have studied the influence of mentioned locations on information dissemination. For example, Bandari et al. [15] put forward an algorithm to predict whether a news item is popular enough on Twitter or whether it can trigger a heated discussion on social network sites. That paper puts forward four features: the article category, the degree of subjectivity, the geographical places and people's names the article mentions, and the source of the article. But the study only considers the effect of popular places on information dissemination; it does not take the dependency between the geographical names and the users into account. In conclusion, we propose a new feature: the dependency between the geographical locations mentioned in Weibos and the locations of the users.

2.2. ELM

Extreme Learning Machine (ELM) was put forward by Huang at Nanyang Technological University in 2004 [16]. It is a simpler and more efficient algorithm for single-hidden-layer feedforward networks (SLFNs). It randomly chooses the input weights and analytically determines the output weights, providing good generalization ability and very fast learning speed. Huang proved in "Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks" that, under the same classification conditions, ELM is much faster than SVM. Following Huang's earlier studies [17, 18], we summarize the ELM theory as follows.

For N distinct samples (x_i, t_i), if the SLFN has L hidden nodes and activation function g(x), then

∑_{i=1}^{L} β_i g(w_i · x_j + b_i) = o_j, j = 1, …, N,

where w_i is the weight vector connecting the i-th hidden node with the input nodes, β_i is the weight vector connecting the i-th hidden node with the output nodes, b_i is the threshold of the i-th hidden node, and w_i · x_j is the inner product of w_i and x_j. Because the SLFN can approximate these N samples with zero mean error, ∑_{j=1}^{N} ||o_j − t_j|| = 0, and therefore

∑_{i=1}^{L} β_i g(w_i · x_j + b_i) = t_j, j = 1, …, N.

These N equations can be written compactly as Hβ = T, where H is the hidden-layer output matrix with entries H_{ji} = g(w_i · x_j + b_i). Consequently, we get a solution for the output weights as

β = H†T,

where H† is the Moore-Penrose generalized inverse of the matrix H. Based on the above analysis, this machine learning algorithm needs no iterative tuning and can be divided into three steps. The specific process of ELM is summarized as follows. Step 1. Randomly assign the input weights w_i and biases b_i, i = 1, …, L. Step 2. Calculate the hidden-layer output matrix H. Step 3. Calculate the output weights β = H†T. Compared with SVM, ELM can be directly applied to many kinds of classification problems.
In Huang's "Extreme Learning Machine for Regression and Multiclass Classification," he proved that SVM obtains a suboptimal solution and needs higher computational complexity [19]. Consequently, ELM has advantages that SVM does not have and has broad application prospects.
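The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration of a basic ELM, not the authors' implementation; the sigmoid activation and the hidden-node count are our own choices.

```python
import numpy as np

def elm_train(X, T, n_hidden, rng=None):
    """Train a basic ELM: random input weights, analytic output weights."""
    rng = np.random.default_rng(rng)
    n_features = X.shape[1]
    # Step 1: randomly assign input weights W and biases b
    W = rng.uniform(-1.0, 1.0, (n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, n_hidden)
    # Step 2: compute the hidden-layer output matrix H (sigmoid activation)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Step 3: output weights beta = pinv(H) @ T (Moore-Penrose pseudoinverse)
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because the only fitted parameters are solved in closed form, training is a single pseudoinverse, which is where the speed advantage over iteratively trained SVMs comes from.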

3. User Behavior and Information Dissemination Prediction

In this paper, we analyze people's behavior and information dissemination on Weibo. First of all, we need to get data from Sina Weibo. The behaviors of users on Sina Weibo are releasing, browsing, commenting, and forwarding. Releasing and forwarding are the behaviors associated with information dissemination. However, releasing is decided by the users themselves and we cannot control it, so our main study is the forwarding behavior of users. In this section, we introduce the data and features we use and give the proposed information dissemination prediction model. First of all, we describe the dataset.

3.1. Dataset Description

To collect the Sina Weibo data, we first choose one user and get its fans list. Second, for each user in that fans list, we get that user's fans list. In this way, we finally obtain a user dataset of 96438 users. Sina Weibo users can be roughly divided into three categories: release-active users, forward-active users, and inactive users. If a user has no forwarding or releasing activity in one month, we regard it as an inactive user. Because inactive users do not contribute to user behavior and information dissemination prediction, we excluded them, leaving 89377 users in the dataset. Then, we crawled all Weibos of these users published between May 1, 2014, and May 31, 2014, obtaining 564835 Weibos, of which 114943 are related to geographical locations. Most Sina Weibos are Chinese Weibos, and the geographical locations in them are Chinese locations, so the small number of Weibos containing foreign geographical locations are treated as having nothing to do with geographical location. We select data from the whole Weibo dataset to build the forward and ignore datasets. Because we cannot observe ignoring behavior directly, we need to define the ignore dataset first, as follows. Definition 1 (ignore dataset). If a user forwarded a Weibo published at time t, then the Weibos published by the friends of the user within [t − Δt, t + Δt] and not forwarded by the user are ignore samples. All the ignore samples constitute the ignore dataset. Users ignore Weibos not only because they do not like them, but also because they are away and do not see them. So we selected 15 minutes, 30 minutes, 1 hour, 2 hours, and 12 hours as values of Δt. We also studied the influence of the different ignore datasets on the final accuracy. Algorithm 1 is used to build the ignore dataset.

Inputs: Weibo set W_f published by the friends of the user;
    Weibo set W_r that the user forwarded.
Output: Weibo set W_i that the user ignored.
(1)  For any Weibo w_r ∈ W_r, read its publish time t;
(2)  Find Weibos w ∈ W_f
(3)  while (the publish time t_w of w satisfies |t_w − t| ≤ Δt)
(4)    add w to W_tmp; // W_tmp is an intermediate variable.
(5)  while (w ∈ W_tmp, w ∉ W_r)
(6)    Add w to W_i;
(7)  Output W_i;
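Under the assumption that the window is symmetric around each forwarding time (our reading of Definition 1), Algorithm 1 can be sketched as follows; the dictionary-based data layout is illustrative.

```python
from datetime import datetime, timedelta

def build_ignore_set(friend_weibos, forward_times, delta):
    """friend_weibos: dict weibo_id -> publish time of Weibos by the user's friends.
    forward_times: dict weibo_id -> time at which the user forwarded that Weibo.
    Returns ids of friends' Weibos the user is taken to have ignored."""
    ignored = set()
    for t in forward_times.values():
        for wid, pub in friend_weibos.items():
            # published within [t - delta, t + delta] but never forwarded
            if wid not in forward_times and abs(pub - t) <= delta:
                ignored.add(wid)
    return ignored
```

Varying `delta` (15 minutes up to 12 hours) produces the different ignore datasets compared in the experiments.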

In order to facilitate our location keyword extraction, we established a province tree to identify place names. Figure 1 shows the structure of the province tree.
As we can see in Figure 1, China, according to location, is divided into east China, south China, central China, north China, northwest, southwest, and northeast. Each region contains some provinces, and each province contains a number of cities. Using the province tree, we can identify which geographical location a keyword belongs to, and we can also obtain the subordination relationships of the keyword. In the province tree, we only go down to the city level, without regard to block names. This is because, in China, different cities may contain blocks with the same name, so we cannot accurately determine which city a block belongs to. Our study is based on the above data. In the following section, we introduce the features we use and the corresponding evaluation indexes.
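A minimal sketch of such a lookup, with a tiny hypothetical excerpt of the tree (the real tree covers all regions, provinces, and their cities):

```python
# A tiny illustrative slice of the province tree: region -> province -> cities.
PROVINCE_TREE = {
    "East China": {"Jiangsu": ["Nanjing", "Suzhou"],
                   "Zhejiang": ["Hangzhou", "Ningbo"]},
    "North China": {"Beijing": ["Beijing"]},
}

def locate(keyword):
    """Resolve a place keyword to its (region, province), or None if unknown."""
    for region, provinces in PROVINCE_TREE.items():
        for province, cities in provinces.items():
            if keyword == province or keyword in cities:
                return region, province
    return None
```

For example, the keyword "Nanjing" resolves to Jiangsu province in east China, which is the resolution step used in Section 3.2.1.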

3.2. Feature Description

In this section, we introduce the features we use. First, we introduce the new feature we propose; then we introduce the other features.

3.2.1. The Dependency between the Geographical Locations Mentioned in Weibos and the Location of the User

Geographical locations mentioned in Weibos have been considered before. However, previous work only considers whether the location name is famous and does not connect it with the locations of users. As governments begin to communicate with the public on Weibo, this connection becomes more and more important. Information published by a local government is likely to be paid attention to in the local and surrounding areas, while users in more distant areas will pay less attention to it. We used the Peking University PKUVIS Weibo visual analysis tool [20] to analyze 150 Weibos; one of them is shown as follows: # 毛絮是虫子 # 【 南京满天飞的 “ 毛絮 ” 竟是长着白毛的虫子 ! 】 @ 现代快报 : 这两天, 南京一些地方飘着柳絮一样的东西, 漫天飞舞。 南京林业大学森环院的专家发现, 其实它们根本不是柳絮, 而是活物小虫子 ! ! 叫 “ 榆四脉绵蚜 ” ! 今年的气候有利于它们的繁殖, 所以数量非常多 ! 我整个人都不好了 ! From this Weibo, we can extract the location name Nanjing. According to the province tree, it belongs to Jiangsu province. We conjecture that users in Jiangsu pay high attention to this Weibo, while users far from Jiangsu pay less attention. So we count the number of users in every province who forwarded this Weibo. From the province field of the data, we obtained the provinces of these users. Sina Weibo uses codes to represent the provinces and cities. Table 1 shows the provinces and their corresponding codes. For convenience, in the following figures we use the province codes in Table 1 to represent the provinces. Figure 2 shows the number of forwarding users in every province.

Table 1: Provinces and their corresponding codes.

Provinces Beijing Tianjin Hebei Shanxi Inner Mongolia Liaoning Jilin
Code 11 12 13 14 15 21 22
Provinces Heilongjiang Shanghai Jiangsu Zhejiang Anhui Fujian Jiangxi
Code 23 31 32 33 34 35 36
Provinces Shandong Henan Hubei Hunan Guangdong Guangxi Hainan
Code 37 41 42 43 44 45 46
Provinces Chongqing Sichuan Guizhou Yunnan Tibet Shaanxi Gansu
Code 50 51 52 53 54 61 62
Provinces Qinghai Ningxia Sinkiang
Code 63 64 65

Table 2: Number of ignore samples for each value of Δt.

Δt          Quantity
15 minutes  72996
30 minutes  119392
1 hour      188376
2 hours     307483
12 hours    431645


As we can see in Figure 2, the local users in Jiangsu pay the most attention to the Weibo. Provinces near Jiangsu (such as Anhui, Shanghai, Zhejiang, and Shandong) also pay much attention to it. Probabilistically, for the other provinces and cities, the share of forwarding users should follow the same pattern as the share of registered users in each province. It is hard to obtain the share of registered users. But in Figure 2 we can see that economically developed provinces, such as Beijing and Guangdong, have higher forwarding numbers than underdeveloped areas like Sinkiang and Ningxia. We conjecture that this is because people in developed cities occupy more network resources and can access the website more easily, so there may be more users in developed cities than in underdeveloped ones. The other Weibos we examined follow the same rule. To represent a province's development, we use its per capita GDP in 2013. Forwarding numbers and per capita GDP are not of the same magnitude, so we normalized the data. Figure 3 shows the normalized forwarding numbers, and Figure 4 shows the normalized per capita GDP.



As we can see in Figures 3 and 4, apart from the geographical location mentioned in the Weibo, the forwarding quantity and the per capita GDP of the other provinces follow the same pattern. For example, in Beijing, Guangdong, and other regions, both figures have a local peak. The province mentioned in the Weibo deviates from this pattern, which further proves that the geographical location mentioned in a Weibo has a stronger influence on the users who are close to it. All the Weibos we tested support this conclusion. So we use the per capita GDP to represent the share of registered users, and the per capita GDP can then represent the probability of users forwarding. When a province is the geographical location mentioned in the Weibo, we add 0.5 to its normalized per capita GDP, which means this geographical location plays a dominant role in the forwarding behavior. The final value represents the dependency between the geographical locations mentioned in Weibos and the locations of the users. Besides the new feature we put forward, other features are widely applied to user behavior analysis and information dissemination prediction. The researchers in [3] selected 4 features to judge users' forwarding behavior: the user's authority, the user's activity, the user's preference, and the user's social relations. However, although the user's authority is relevant to the user's forwarding behavior, the correlation is weak. The researchers in [4] selected 15 features, but some of them are covered by other features. For example, when we compute PageRank, the user's fan numbers are already used; such redundant features should not be used in forwarding behavior prediction. Another study, "RT to Win! Predicting Message Propagation in Twitter" [21], divided features into two categories:
7 social features (i.e., number of followers, friends, statuses, and favorites; number of times the user was listed; whether the user is verified; and whether the user's language is English) and 7 tweet features. Summarizing the features in the above and other papers, we selected 5 features to predict users' forwarding behavior: the influence of the user; the user's release activity and forwarding activity; the intimacy between users; the interest similarity between user and content or between users; and Weibo content importance. These features are described in detail below.
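Putting the pieces of the new feature together, a per-user value could be computed as below. The min-max normalization is our own illustrative choice, with the 0.5 bonus for the mentioned province taken from the text; the province names and GDP numbers in the test are placeholders.

```python
def dependency_feature(gdp_per_capita, mentioned_province, user_province):
    """Dependency between a Weibo's mentioned location and the user's location.
    gdp_per_capita: dict province -> per capita GDP (proxy for registered users)."""
    lo, hi = min(gdp_per_capita.values()), max(gdp_per_capita.values())
    # normalize GDP to [0, 1] so it is comparable across provinces
    score = {p: (g - lo) / (hi - lo) for p, g in gdp_per_capita.items()}
    score[mentioned_province] += 0.5  # mentioned location dominates forwarding
    return score[user_province]
```

The returned value is then fed to the classifier alongside the other five features.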

3.2.2. The Influence of User

People often use PageRank to compute the influence of a user [22]. The PageRank algorithm measures the importance of a specific page relative to other pages for a search engine. Adapted to users, the PageRank formula is

PR(u_i) = (1 − d)/N + d · Σ_{u_j ∈ F(u_i)} PR(u_j)/|A(u_j)|,

where PR(u_i) is the PageRank value of user u_i, F(u_i) is the fans list of user u_i, A(u_j) is the collection of users that user u_j pays attention to, d is the damping coefficient, and N is the total number of users.
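A small power-iteration sketch of this user-level PageRank (dangling users and convergence checks are omitted for brevity, and the graph encoding is our own):

```python
def user_pagerank(follows, d=0.85, iters=50):
    """follows: dict user -> set of users that user pays attention to.
    A user's fans are the users who follow them."""
    users = set(follows) | {v for vs in follows.values() for v in vs}
    n = len(users)
    pr = {u: 1.0 / n for u in users}
    for _ in range(iters):
        nxt = {}
        for u in users:
            # fans of u: everyone whose attention set contains u
            fans = [v for v in follows if u in follows[v]]
            nxt[u] = (1 - d) / n + d * sum(pr[v] / len(follows[v]) for v in fans)
        pr = nxt
    return pr
```

A user followed by many (and by influential) fans accumulates a high PR value, which is the influence feature.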

3.2.3. User Release Activity and Forward Activity

Because of the different behaviors of the user, user activity can be divided into two aspects: the release activity and the forwarding activity. The release activity is the number of Weibos published over a period of time, computed by formula (7):

A_r = N_w / T, (7)

where N_w is the number of Weibos published over the period and T is the length of the period; in general, we set T to 1 day. The forwarding activity is the share of forwarded Weibos among all Weibos the user produced in one day, computed by formula (8):

A_f(i) = N_f(i) / (N_f(i) + N_r(i)), (8)

where N_f(i) is the number of Weibos the user forwarded on the i-th day, N_r(i) is the number of Weibos the user released on the i-th day, and A_f(i) is the forwarding activity. The higher A_f is, the more active the user is. Users with high forwarding frequency play a bigger role in information dissemination.
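Formulas (7) and (8) translate directly to code; counting the daily Weibos is assumed to be done upstream, and the denominator of (8) follows our reading of the text.

```python
def release_activity(num_published, period_days=1):
    """Formula (7): Weibos published per unit time (T defaults to 1 day)."""
    return num_published / period_days

def forward_activity(num_forwarded, num_released):
    """Formula (8): share of forwarded Weibos among all Weibos that day."""
    total = num_forwarded + num_released
    return num_forwarded / total if total else 0.0
```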

3.2.4. The Intimacy between the Users

Because forwarding behavior on Weibo reflects the interaction between users well, we compute the intimacy between users as the percentage of Weibos published by the upstream user among the Weibos forwarded by the user:

I(u_i, u_j) = N_{ij} / N_i, (9)

where N_{ij} is the number of Weibos of user u_j that appear among the forwarded Weibos of user u_i, and N_i is the total number of Weibos forwarded by user u_i.
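Formula (9) as code; the forward-log format (a list of original authors) is illustrative.

```python
def intimacy(forwarded_authors, upstream_user):
    """forwarded_authors: original author of every Weibo the user forwarded.
    Returns the share authored by upstream_user (formula (9))."""
    if not forwarded_authors:
        return 0.0
    return forwarded_authors.count(upstream_user) / len(forwarded_authors)
```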

3.2.5. The Interest Similarity between User and Content or between Users

Weibos can reflect the interests of users. The larger the interest similarity between user and content, the greater the chance the user forwards; likewise, the larger the interest similarity between a user and an upstream user, the greater the chance the user forwards. So we need to compute the interest similarity. Because a user's interests change over time, we analyze the Weibos released in the most recent few days. The interest space is extracted from these Weibos, and the comparison process is as follows. (1) Collect user interests. We select a user and collect the Weibos the user published in about the last five days; these form the user's interest space S_i = {d_1, d_2, …}, where d_k is the k-th Weibo of the user. (2) Word segmentation. For Chinese Weibos, we use the Chinese Academy of Sciences Chinese lexical analysis system ICTCLAS to do the word segmentation [22]; for English Weibos, we split on spaces. We get the word-level interest space, where each element is a word. (3) Stop-word removal. We remove the stop words and get the cleaned word-level interest space. (4) Repeating (2) and (3) for another user u_j, we get the interest space S_j. (5) For two users u_i and u_j, we calculate the similarity of S_i and S_j; for a user and a piece of content, we calculate the similarity of S_i and the content's word set. We use the Jaccard formula to calculate the similarity [23]:

J(S_i, S_j) = |S_i ∩ S_j| / |S_i ∪ S_j|. (10)
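The interest-space construction and Jaccard comparison can be sketched as follows; the whitespace tokenizer and the tiny stop list stand in for ICTCLAS and a real stop-word list.

```python
STOP_WORDS = {"the", "a", "of", "is", "in"}  # stand-in stop-word list

def interest_space(weibos):
    """Build a word-level interest space from a user's recent Weibos."""
    words = set()
    for text in weibos:
        words.update(w.lower() for w in text.split())  # stand-in for ICTCLAS
    return words - STOP_WORDS

def jaccard(s1, s2):
    """Formula (10): |intersection| / |union|."""
    return len(s1 & s2) / len(s1 | s2) if (s1 or s2) else 0.0
```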

3.2.6. Weibo Content Importance

Normally, if a Weibo contains important events or popular information, its forwarding rate will be high, so the importance of Weibo content can help us analyze information dissemination. Based on the TF-IDF (term frequency-inverse document frequency) weighting algorithm from the text classification field, we calculate the importance of a Weibo [24]. The idea of this algorithm is that the more often a word appears in a specific document, and the less often it appears in other documents, the more important the word is. We use formula (11) to calculate the importance of a word:

tfidf(w_i, d) = tf(w_i, d) × log(N / n_{w_i}), (11)

where w_i is a word in the Weibo d, tf(w_i, d) is the number of times w_i appears in d, N is the number of Weibos the Weibo set contains, and n_{w_i} is the number of Weibos in the set containing w_i. The TF-IDF of the Weibo d can be computed by adding up the TF-IDF of all the words in d:

TFIDF(d) = Σ_i tfidf(w_i, d). (12)
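Formulas (11) and (12) in a few lines; the raw-count tf and natural log are our reading, and the corpus layout (one word list per Weibo) is illustrative.

```python
import math

def weibo_importance(weibo_words, corpus):
    """Formula (12): sum of TF-IDF (formula (11)) over the Weibo's words.
    corpus: list of word lists, one per Weibo in the Weibo set."""
    n = len(corpus)
    score = 0.0
    for w in set(weibo_words):
        tf = weibo_words.count(w)                   # times w appears in this Weibo
        df = sum(1 for doc in corpus if w in doc)   # Weibos containing w
        score += tf * math.log(n / df)              # df >= 1 when the Weibo is in the set
    return score
```

A Weibo with rare, event-specific words thus scores higher than one made of everyday vocabulary.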

3.3. Information Dissemination Prediction Model

Using the features in Section 3.2, we apply ELM to predict users' forwarding behavior, and from the predicted forwarding behavior we predict the scale of information dissemination. Forwarding behavior on Weibo can be divided into 3 types: direct fans forwarding, indirect fans forwarding, and non-fans forwarding. We counted the percentage of each of the 3 forwarding types at different scales of Weibo spread. Figure 5 shows the percentages from a spread size of 100 to a size of 1500.
We can see from Figure 5 that forwarding is mainly composed of direct fans and indirect fans; the percentage of non-fan users is close to 0, so we ignore the forwarding behavior of non-fan users. When we make the prediction, we start from the Weibo publisher, traverse its fans list, and predict whether each fan will forward the Weibo. If a fan forwards it, the forwarding number increases by 1, and we then traverse that fan's own fans list. We repeat this iteration until no more users forward the Weibo. The prediction model can be represented by a tree; Figure 6 is a simple example of the prediction model tree.
The grey point in Figure 6 is the publisher of the Weibo, the black points are the users who will forward it, and the white points are the users who will not. When we make the prediction, we start from the publisher and traverse its fans list. The fans list contains 3 users, one of which is a forwarding point, so the forwarding number increases by 1 and we traverse the fans list of that forwarding user. That list contains 2 points; one is not a forwarding point, but the other is, so the forwarding number increases by 1 again and we traverse the fans list of this new forwarding user. Both of its fans are non-forwarding points, so we backtrack and handle the remaining users in the same way; in this manner, we finally obtain all the forwarding nodes. We use Algorithm 2 to build the information dissemination prediction model. In this algorithm, we assume that each user forwards the Weibo at most once and that the publisher will not forward its own Weibo.

Input: Weibo publisher u.
Output: Forwarded number n.
(1) n = 0;
(2) Read the fans list F of u (m is the number of fans, m ≥ 0)
(3) for (i = 0; i < m; i++)
(4)   if (the user F[i] is predicted as a forwarding user)
(5)     n = n + 1;
(6)     Information_forecast(F[i]);
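Algorithm 2 explores the fan graph; an iterative sketch with the five-hop cutoff discussed in Section 4.2 looks like this. The trained ELM classifier is stood in for by a `will_forward` predicate, and each user is visited at most once, as the algorithm assumes.

```python
def predict_spread(publisher, fans_of, will_forward, max_hops=5):
    """Predict the forwarding count by walking the fan graph breadth-first."""
    count = 0
    frontier = [publisher]
    seen = {publisher}          # each user forwards at most once
    for _ in range(max_hops):   # spread rarely goes past five hops
        nxt = []
        for u in frontier:
            for fan in fans_of.get(u, []):
                if fan not in seen and will_forward(fan):
                    seen.add(fan)
                    count += 1
                    nxt.append(fan)
        if not nxt:
            break
        frontier = nxt
    return count
```

In practice `will_forward` would call the ELM model with the five features of the (fan, Weibo) pair.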

4. Experiments and Results

In this section, the prediction performance of ELM is evaluated. In addition, we compare the results of ELM and SVM both with and without the new feature we proposed. We also test the proposed information propagation prediction model and report its performance.

4.1. User Behavior Prediction

From the data we crawled from Sina Weibo, we select 133190 forwarding records as the forward samples. Following Section 3.1, the sizes of the ignore sample sets are shown in Table 2. We use ELM to predict the forwarding or ignoring behavior of users. The reference code of ELM can be obtained from the website (ELM Source Codes: http://www.ntu.edu.sg/home/egbhuang/). We also compare the results of ELM and SVM; the LIBSVM tool is used in this paper, which can be obtained from the website (http://www.csie.ntu.edu.tw/~cjlin/libsvm/). To evaluate the prediction model, we choose the evaluation indexes of information retrieval: accuracy, recall, and the F1-score. Using 10-fold cross-validation, we get the user forwarding behavior prediction results shown in Tables 3 and 4. Table 3 shows the performance using ELM and Table 4 shows the performance using SVM.
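For reference, these evaluation indexes can be computed from the confusion counts of the forward class. We read the tables' "Accuracy" column as precision, since the F1-score is the harmonic mean of precision and recall; that reading is our assumption.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 for the positive (forwarding) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```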

Table 3: User behavior prediction performance using ELM.

Δt          Accuracy Recall F1-score Time (s)
15 minutes  0.861    0.865  0.863    0.0312
30 minutes  0.878    0.882  0.88     0.0312
1 hour      0.864    0.873  0.868    0.0312
2 hours     0.852    0.865  0.858    0.0574
12 hours    0.729    0.706  0.717    0.0621

Table 4: User behavior prediction performance using SVM.

Δt          Accuracy Recall F1-score Time (s)
15 minutes  0.867    0.875  0.871    0.0983
30 minutes  0.869    0.88   0.874    0.0983
1 hour      0.855    0.862  0.858    0.0983
2 hours     0.846    0.861  0.853    0.1492
12 hours    0.749    0.743  0.745    0.2094

Comparing Tables 3 and 4, we find that ELM has better performance than SVM. With either algorithm, taking Δt as 30 minutes gives the best performance. Because 15 minutes is too short, some people may not have had time to release or forward a Weibo; on the other hand, people will not spend too long browsing Weibo in one sitting, so when Δt is taken as 2 hours the performance is clearly lower than at 30 minutes. The performance at 12 hours is the lowest. This means people ignore Weibos not only because they do not like them, but also because they are not online; when Δt is too long, this absence behavior plays a dominant role. At the same time, to consider the time factor, we also measured the running time of ELM and SVM, and the time of ELM is far lower than that of SVM. To test the effectiveness of the feature we proposed, we also test the performance without the new feature. Tables 5 and 6 show the prediction results without it.

Table 5: Prediction performance using ELM without the new feature.

Δt          Accuracy Recall F1-score
15 minutes  0.854    0.86   0.857
30 minutes  0.858    0.869  0.863
1 hour      0.847    0.866  0.856
2 hours     0.836    0.858  0.847
12 hours    0.702    0.736  0.719

Table 6: Prediction performance using SVM without the new feature.

Δt          Accuracy Recall F1-score
15 minutes  0.849    0.852  0.850
30 minutes  0.856    0.859  0.857
1 hour      0.842    0.854  0.848
2 hours     0.828    0.84   0.834
12 hours    0.692    0.726  0.709

Tables 5 and 6 show the same pattern as Tables 3 and 4: 30 minutes gives the highest performance. So when we do the information dissemination prediction, we choose the dataset with Δt taken as 30 minutes. By comparing Tables 3 and 5, we find that using the new feature gives better performance than not using it; the same can be found by comparing Tables 4 and 6. To give a more intuitive description of this conclusion, we draw figures showing the details: Figures 7 and 8 depict the comparison charts for ELM and for SVM.

As can be seen from Figures 7 and 8, when the dependency between the geographical locations mentioned in Weibos and the location of the user is used as a feature, the prediction results are better than without it. To compare the two algorithms more intuitively, we also show the performance of ELM and SVM in one figure. Because 30 minutes gives the best performance for both algorithms, we only compare the performance in this case. Figure 9 shows the comparison between ELM and SVM.
In both cases, the prediction results obtained by ELM are higher than the SVM prediction results. This shows that the ELM algorithm performs better than the SVM algorithm. We can also see that the new feature brings better performance.

4.2. Information Propagation Prediction

Using the algorithm in Section 3.3 and the prediction results of ELM, we predict the scale of information propagation. We choose 30000 original Weibos from 15375 users to verify our model. We count the average proportion of forwarding users at every hop from the initial releasing user (hop: the shortest distance from a user to the initial releasing user). Figure 10 shows the average forwarding percentage of users at each hop.
It can be seen from Figure 10 that after 5 hops the percentage approaches 0, which means that in our dataset almost all forwarding behaviors happen within the first five hops. This supports the view that Weibo is a widely spread but shallow social network. Based on this observation, our propagation prediction stops at the fifth hop, which avoids excessive iteration. Figure 11 shows the prediction accuracy at every hop; the accuracy at hop 1 is the accuracy over the first forwarding layer of the 30000 Weibos, and the other hops are defined analogously.
We can see in Figure 11 that the accuracy at the first hop is the highest and that accuracy decreases as the hop count increases. This is because the prediction error accumulates hop by hop; by hop 5 the accumulated error has become considerable, so the accuracy there is very low. To judge the predicted scale of information dissemination, we divide scales into orders of magnitude 10^n. If the predicted dissemination scale falls in the same order of magnitude as the actual dissemination scale, we count the prediction as correct. We calculated the average dissemination-scale prediction accuracy over the 30000 Weibos; Figure 12 shows the average prediction accuracy for each Weibo.
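The order-of-magnitude criterion described above can be sketched as follows: a prediction counts as correct when the predicted and actual scales share the same power-of-ten order (the sample values are illustrative, not the paper's data).

```python
import math

def same_order(predicted: int, actual: int) -> bool:
    """True if both scales fall in the same 10^n order of magnitude."""
    return math.floor(math.log10(predicted)) == math.floor(math.log10(actual))

assert same_order(4200, 8900)      # both in the 10^3 range -> correct
assert not same_order(950, 1200)   # 10^2 vs 10^3 -> incorrect

# Accuracy over a toy set of (predicted, actual) scale pairs.
pairs = [(4200, 8900), (950, 1200), (70, 85)]
accuracy = sum(same_order(p, a) for p, a in pairs) / len(pairs)
```

Averaging this indicator over all 30000 Weibos gives the per-Weibo accuracies plotted in Figure 12.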
As can be seen from Figure 12, across different Weibos from different users the accuracy of our algorithm is about 70%. The main losses come from Weibos whose forwarding depth is close to or beyond 5 hops, for which the accumulated error of our model becomes considerable and lowers the accuracy. For the selected data, the prediction result is very stable, which shows that our algorithm is practical and effective.

5. Conclusions

Online behavior of Weibo users and information dissemination analysis is a hot topic nowadays. In this paper, we analyzed the features of Sina Weibo user behavior and predicted information propagation. We proposed 8 features to analyze user behavior: the dependency between the geographic locations mentioned in Weibos and the location of the user, the influence of the user, the user's release activity, the user's forwarding activity, the closeness between users, the interest similarity between user and content, the interest similarity between users, and Weibo content importance. The first of these (the dependency between the geographic locations mentioned in Weibos and the location of the user) is the new feature we proposed. We used ELM to predict whether a user will forward or ignore a Weibo. Our experimental results show that the feature we proposed is very effective and that ELM gets better results than SVM. We also tested the performance for different values of the time threshold used to build the ignore dataset and found that 30 minutes performs best, so we use the 30-minute ignore dataset to build the training set. Based on that, we proposed an information propagation prediction model and computed the scale of information propagation. The experimental results show that our model performs well. The features and model we proposed in this paper can be of help to businesses and governments: they can use our model to predict the scale of information propagation before publishing, and if the predicted scale is small, they can use our features to adjust the text of the message. The model and features therefore have high practical value.


However, there are still aspects of this work to improve. For example, when computing the information dissemination size, we do not consider users forwarding their own Weibos, and a user may forward the same Weibo multiple times. We will take these cases into consideration in future work.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was partially supported by the National Natural Science Foundation of China under Grants no. 61332006 and no. 61100022, the National Basic Research Program of China under Grant no. 2011CB302200-G, and the 863 Program under Grant no. 2012AA011004.