KDCloudApps 2017 Abstracts


Full Papers
Paper Nr: 2
Title:

Election Vote Share Prediction using a Sentiment-based Fusion of Twitter Data with Google Trends and Online Polls

Authors:

Parnian Kassraie, Alireza Modirshanechi and Hamid K. Aghajan

Abstract: It is common to use online social content for analyzing political events. Twitter-based data by itself is not necessarily a representative sample of society due to non-uniform participation. This fact should be taken into account when predicting real-world events from social media trends. Moreover, each tweet may bear a positive or negative sentiment towards the subject, which also needs to be considered. By gathering a large dataset of more than 370,000 tweets on the 2016 US Elections and carefully validating the resulting key trends against Google Trends, a legitimate dataset is created. A Gaussian process regression model is used to predict the election outcome; we bring in the novel idea of estimating candidates’ vote shares instead of directly anticipating the winner of the election, as practiced in other approaches. Applying this method to the US 2016 Elections resulted in predicting Clinton’s majority in the popular vote at the beginning of the election week with 1% error. The high variance in Trump supporters’ behavior reported elsewhere is reflected in the higher error rate of his vote share.

Paper Nr: 3
Title:

Assessment of Private Cloud Infrastructure Monitoring Tools - A Comparison of Ceilometer and Monasca

Authors:

Mario A. Gomez-Rodriguez, Víctor J. Sosa-Sosa and Jose L. Gonzalez-Compean

Abstract: Cloud monitoring tools are a popular solution for both cloud users and administrators of private cloud infrastructures. These tools provide information that can be useful for effective and efficient resource consumption management, supporting the decision-making process in scenarios where administrators must react rapidly to saturation and failure events. However, estimating the impact of these tools on host systems in a private cloud is not a trivial issue for administrators, especially when monitoring measurements are required at short time intervals. This paper presents a performance comparison between two free and well-supported cloud monitoring tools, Ceilometer and Monasca, deployed on a private cloud infrastructure. The comparison focuses mainly on evaluating these tools' ability to obtain monitoring information at short time intervals for early detection of resource constraints. The impact of both tools' resource consumption on the performance of host systems was analyzed. The evaluation revealed that Monasca performed better than Ceilometer in the evaluated scenarios and that, according to the lessons learned in this comparison, Monasca is a suitable option for integration into an adaptive cloud resource management system.

Paper Nr: 4
Title:

Protecting Data in the Cloud: An Assessment of Practical Digital Envelopes from Attribute based Encryption

Authors:

Víctor J. Sosa-Sosa, Miguel Morales-Sandoval, Oscar Telles-Hurtado and José Luis González-Compeán

Abstract: Cloud storage services provide users with an effective and inexpensive mechanism to store and manage big data with anytime and anywhere availability. However, data owners face the risk of losing control over their data, which could be accessed by unauthorized third parties, including the provider itself. Although conventional encryption could prevent data snooping, an access control problem arises and the data owner must implement the security mechanisms to store, manage and distribute the decryption keys. This paper presents a qualitative and quantitative evaluation of two Java implementations of security schemes called DET-ABE and AES4SeC. Both are based on the digital envelope technique and attribute-based encryption, a non-conventional cryptographic technique that provides confidentiality and access control security services. The experimental evaluation was performed in a private cloud infrastructure where experiments for both implementations ran using the same platform, settings and underlying libraries, thus providing a fairer comparison. The quantitative evaluation revealed that DET-ABE and AES4SeC have similar performance when applying low security levels (128-bit keys), whereas DET-ABE surpasses AES4SeC in performance when medium (192-bit keys) and high (256-bit keys) security levels are required. The qualitative evaluation shows that AES4SeC also ensures authentication and integrity services, which are not supported by DET-ABE.

Paper Nr: 5
Title:

An Ontology for Representing Information over Social Service in an Educational Institution

Authors:

Mireya Tovar, Juan Carlos Flores and José A. Reyes-Ortiz

Abstract: In this paper, we present a method for manually constructing an ontology to support the search for information about social service in a higher education institution. We use some steps from the methodology proposed by Grüninger and Fox. The ontology model will be useful for answering the questions of students interested in the social service procedures. A scenario, competency questions, classes, and relationships are part of the design process. The answers to these questions guide the evaluation of the ontology. The ontology is built in Protégé and queries are written in the SPARQL query language.

Paper Nr: 7
Title:

Entity-based Opinion Mining from Spanish Tweets

Authors:

Fabián Paniagua-Reyes, José A. Reyes-Ortiz and Maricela Bravo

Abstract: Social networking services have grown in recent years and, as a result, users generate large amounts of data in which they express opinions about entities. This paper presents an approach for entity-based opinion mining, where the entities belong to the domains of banks, musicians and automobiles. Our approach uses machine learning techniques to classify Spanish tweets into three categories: positive, negative and neutral. A Support Vector Machine (SVM) and the bag-of-words model are used to obtain the corresponding class for a given tweet. Our experiments show promising results and confirm that entity-based opinion mining is achievable.
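As an illustration of the bag-of-words representation mentioned in the abstract above, the following is a minimal sketch and not the authors' implementation. The tokenizer, the function names and the two sample tweets are assumptions made for this example; the resulting count vectors are the kind of input that would then be passed to a classifier such as an SVM.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and extract runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_vocabulary(tweets):
    """Map every distinct token in the corpus to a fixed column index."""
    words = sorted({token for tweet in tweets for token in tokenize(tweet)})
    return {word: index for index, word in enumerate(words)}


def vectorize(text, vocabulary):
    """Return the bag-of-words count vector of one tweet."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:  # words unseen at training time are ignored
            vector[vocabulary[word]] = count
    return vector


# Hypothetical sample tweets, invented for this sketch.
tweets = [
    "El banco es muy bueno",
    "El servicio del banco es malo, muy malo",
]
vocabulary = build_vocabulary(tweets)
vectors = [vectorize(tweet, vocabulary) for tweet in tweets]
```

Word order is discarded in this representation; only the count of each vocabulary word per tweet is kept.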

Paper Nr: 8
Title:

Analysis of Brain Waves in Violent Images - Are Differences in Gender?

Authors:

Juan Andrés Martínez-Escobar, Silvia B. González-Brambila and Josué Figueroa-González

Abstract: We collected information using the EmotivEpoc electroencephalograph (EEG) and the companion software of the SMI RED250mobile eye-tracking system. As a first step, the readings of each EEG sensor, recorded while participants viewed 5 violent images and 5 non-violent images, were stored in text files. The database was collected from 50 volunteers, 25 men and 25 women. It was later loaded into R to run the data mining algorithms K-means, K-medoids, Hierarchical Clustering, Naive Bayes, Support Vector Machines, Adaboost and Decision Trees. The clustering methods produced seemingly random groupings that carried little information. The Naive Bayes, SVM and Adaboost models yielded classifications with a high percentage of error, and the Decision Trees method gave one of the worst results, with the highest error rates on the test data. Based on the results obtained, no significant difference between genders was found in the reaction to viewing images with violent and non-violent content.

Paper Nr: 9
Title:

Determining a New Home Classification - A Data Mining Approach

Authors:

Fidel López-Saca, José Castro-López, Josué Figueroa-González and Silvia Beatriz González-Brambila

Abstract: This paper presents a new home classification using a data mining approach and clustering algorithms. It focuses on sociological characteristics. Data was obtained from a survey used in the research project "The Dwelling of Older Adults in the Central City", which is part of a larger research project entitled "Habitat and Centrality". This survey has 3,000 records and 294 columns. From these, we selected 30 columns that were grouped into 4 categories: gender, whether at least one child exists, whether a partner exists, and whether there are one or more elders. Elders were defined as people aged 64 or older, following Mexican guidelines. Classification was performed with 6 clustering algorithms and evaluated with the silhouette and Dunn indices. The proposed classification consists of 10 clusters that more adequately represent the types of families from a sociological point of view.
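The two validation measures named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses Euclidean distance, assumes at least two clusters and at least two distinct points, and runs on a small invented set of 2-D points rather than the survey data. The function names are hypothetical.

```python
import math
from itertools import combinations


def group_by_label(points, labels):
    """Collect the points of each cluster in a dict keyed by cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def silhouette_score(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points.

    a is the mean distance to the other members of the point's own cluster,
    b is the smallest mean distance to the members of any other cluster.
    """
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # The point's distance to itself is 0, so dividing by len(own) - 1
        # gives the mean over the *other* members.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    groups = list(group_by_label(points, labels).values())
    min_separation = min(
        math.dist(p, q)
        for g1, g2 in combinations(groups, 2)
        for p in g1
        for q in g2
    )
    max_diameter = max(math.dist(p, q) for g in groups for p in g for q in g)
    return min_separation / max_diameter


# Invented toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]
bad_labels = [0, 1, 0, 1, 0, 1]

good_silhouette = silhouette_score(points, good_labels)
bad_silhouette = silhouette_score(points, bad_labels)
good_dunn = dunn_index(points, good_labels)
```

For both measures, higher values indicate more compact and better separated clusters, which is why they can be used to compare the outputs of different clustering algorithms or different numbers of clusters.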