KomIS 2017 Abstracts


Full Papers
Paper Nr: 1
Title:

A Collaborative Environment for Web Crawling and Web Data Analysis in ENEAGRID

Authors:

Giuseppe Santomauro, Giovanni Ponti, Fiorenzo Ambrosino, Giovanni Bracco, Antonio Colavincenzo, Matteo De Rosa, Agostino Funel, Dante Giammattei, Guido Guarnieri and Silvio Migliori

Abstract: In this document we provide an overview of the development and integration into ENEAGRID of tools able to download data from the Web (Web Crawling), to manage and display large amounts of data (Big Data), and to extract relevant hidden information from them (Data Mining). We collected all these instruments in the so-called Web Crawling Project. Furthermore, the corresponding environment, called Virtual Laboratory, makes all these integrated tools available remotely through a simple graphical interface. A detailed description of the developed web application is given. Finally, some experimental results on the behaviour of the Web Crawling tool are reported.
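As a sketch of the crawling step described above, the following minimal Python example extracts and resolves the hyperlinks of a fetched page using only the standard library; the page content and URLs are illustrative, and the actual ENEAGRID tools are not shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links

# Illustrative page; a real crawler would fetch this over HTTP.
page = '<html><body><a href="/docs">Docs</a> <a href="http://example.org/x">X</a></body></html>'
print(extract_links(page, "http://example.com/"))
# → ['http://example.com/docs', 'http://example.org/x']
```

A crawler would then enqueue the extracted links, deduplicate them, and repeat the fetch-and-extract cycle.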

Paper Nr: 2
Title:

Big Data Visualization Tools: A Survey - The New Paradigms, Methodologies and Tools for Large Data Sets Visualization

Authors:

Enrico G. Caldarola and Antonio M. Rinaldi

Abstract: In the era of Big Data, the visualization of large data sets deserves great attention. Among the main phases of the data management life cycle, i.e., storage, analytics and visualization, the last is the most strategic since it is the closest to the human perspective. The huge mine of data becomes a gold mine only if clever analytics algorithms are executed over the data deluge and, at the same time, the results of the analytic process are visualized in an effective, efficient and, why not, impressive way. Not surprisingly, a plethora of tools and techniques for Big Data visualization have emerged in recent years, either as part of data management systems or as software or plugins specifically devoted to data visualization. Starting from these considerations, this paper provides a survey of the most used and widespread visualization tools and techniques for large data sets, finally presenting a synopsis of the main functional and non-functional characteristics of the surveyed tools.

Paper Nr: 3
Title:

An Open Source System for Big Data Warehousing

Authors:

Nunziato Cassavia, Elio Masciari and Domenico Saccà

Abstract: The pervasive diffusion of new data-generation devices has recently caused massive data flows containing heterogeneous information generated at different rates and in different formats. These data are referred to as Big Data and require new storage and analysis approaches to be investigated for managing them. In this paper we describe a system for dealing with massive big data stores. We defined an open source tool that exploits a NoSQL approach to data warehousing in order to offer users an intuitive way to easily query data that could otherwise be quite hard to understand.
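To illustrate the kind of schema-free querying a NoSQL warehouse enables, here is a minimal sketch in Python: documents are plain dicts, and a hypothetical `query` helper filters and aggregates them. The data and the helper are illustrative assumptions, not part of the authors' tool.

```python
# A toy document store: warehouse facts as schema-free, JSON-like dicts.
sales = [
    {"region": "EU", "product": "A", "amount": 120},
    {"region": "EU", "product": "B", "amount": 80},
    {"region": "US", "product": "A", "amount": 200},
]

def query(docs, where, group_by, agg_field):
    """Filter documents by equality predicates, then sum agg_field per group."""
    groups = {}
    for doc in docs:
        if all(doc.get(k) == v for k, v in where.items()):
            key = doc[group_by]
            groups[key] = groups.get(key, 0) + doc[agg_field]
    return groups

print(query(sales, where={"product": "A"}, group_by="region", agg_field="amount"))
# → {'EU': 120, 'US': 200}
```

Because documents carry no fixed schema, missing fields are simply skipped by the predicate, which is the intuition behind easily querying otherwise hard-to-understand data.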

Paper Nr: 4
Title:

A Novel Influence Diffusion Model based on User Generated Content in Online Social Networks

Authors:

Flora Amato, Antonio Bosco, Vincenzo Moscato, Antonio Picariello and Giancarlo Sperlí

Abstract: Social Network Analysis has been introduced to study the properties of Online Social Networks for a wide range of real-life applications. In this paper, we propose a novel methodology for solving the Influence Maximization problem, i.e. the problem of finding a small subset of actors in a social network that maximizes the spread of influence. In particular, we define a novel influence diffusion model that, by learning recurrent user behaviours from past logs, estimates the probability that a given user can influence the others, essentially exploiting user-to-content actions. A greedy maximization algorithm is then adopted to determine the final set of influencers in the network. Preliminary experimental results show the effectiveness of the proposed approach, especially in terms of efficiency, and encourage future research in this direction.
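The greedy step mentioned above can be sketched as follows, assuming a standard independent-cascade diffusion model with Monte Carlo spread estimation; the graph and the edge probabilities are toy assumptions (in the paper, influence probabilities are learned from past logs).

```python
import random

# Directed influence graph: edge (u -> v, p) means u activates v with probability p.
# Hard-coded toy values; the paper learns these from user-to-content actions.
EDGES = {
    "a": [("b", 0.9), ("c", 0.9)],
    "b": [("d", 0.9)],
    "c": [("d", 0.9)],
    "d": [],
    "e": [("a", 0.1)],
}

def simulate_cascade(seeds, rng):
    """One independent-cascade run: return the set of activated nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v, p in EDGES.get(u, []):
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active

def expected_spread(seeds, runs=500, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    return sum(len(simulate_cascade(seeds, rng)) for _ in range(runs)) / runs

def greedy_influencers(k):
    """Greedily add the node with the largest marginal gain in expected spread."""
    chosen = set()
    for _ in range(k):
        best = max((n for n in EDGES if n not in chosen),
                   key=lambda n: expected_spread(chosen | {n}))
        chosen.add(best)
    return chosen

print(greedy_influencers(1))
# → {'a'}
```

The greedy heuristic comes with the classic (1 - 1/e) approximation guarantee when the spread function is submodular, which is why it is the standard choice for this problem.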

Paper Nr: 5
Title:

Reputation Analysis towards Discovery

Authors:

Raffaele Palmieri, Vincenzo Orabona, Nadia Cinque, Stefano Tangorra and Donato Cappetta

Abstract: This work describes the development and realization of an OSINT solution for conducting supplier risk assessment, focused on evaluating suppliers’ reputation from publicly available information. The main challenge lies in the data processing phase, which exploits NLP technologies to extract facts, events, and relations from unstructured sources, building the knowledge base for the reputational analysis. Several measures have been adopted to provide a satisfactory user experience; however, further integration is still needed to increase the efficiency of the developed solution. In particular, it is necessary to deepen and improve the analysis of the huge volume of data coming from open sources, enhancing the discovery of all relevant information influencing the reputation of the targeted entity.

Paper Nr: 6
Title:

The Challenge of using Map-reduce to Query Open Data

Authors:

Mauro Pelucchi, Giuseppe Psaila and Maurizio Toccu

Abstract: For transparency and democracy reasons, a few years ago Public Administrations started publishing data sets concerning public services and territories. These data sets are called open because they are publicly available through many web sites. Due to the rapid growth of open data corpora, both in the number of corpora and in the number of open data sets available in each single corpus, the need arises for a centralized query engine able to select single data items from within a mass of heterogeneous open data sets. We gave a first answer to this need in (Pelucchi et al., 2017), where we defined a technique for blindly querying a corpus of open data. In this paper, we face the challenge of implementing this technique on top of Map-Reduce, the best-known approach for parallelizing computational tasks in the Big Data world.
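A minimal sketch of the Map-Reduce shape of such a blind keyword query, in plain Python with no actual distributed runtime; the partitions and records are toy stand-ins for heterogeneous open data sets, and the map/reduce functions are illustrative assumptions, not the paper's implementation.

```python
from functools import reduce
from itertools import chain

# Toy "open data sets" with heterogeneous schemas, split into partitions
# as a Map-Reduce runtime would distribute them across workers.
partitions = [
    [{"comune": "Bergamo", "servizio": "biblioteca"}],
    [{"city": "Milano", "service": "library"},
     {"city": "Bergamo", "budget": 1000}],
]

KEYWORD = "bergamo"

def map_phase(record):
    """Emit (keyword, record) pairs for records matching the blind query,
    regardless of the record's schema."""
    if any(KEYWORD in str(v).lower() for v in record.values()):
        return [(KEYWORD, record)]
    return []

def reduce_phase(acc, pair):
    """Group matching records under their query keyword."""
    key, record = pair
    acc.setdefault(key, []).append(record)
    return acc

mapped = chain.from_iterable(map_phase(r) for p in partitions for r in p)
result = reduce(reduce_phase, mapped, {})
print(len(result[KEYWORD]))
# → 2
```

The map phase is schema-agnostic, which is the point of blind querying: each worker scans its partition independently, and the reduce phase merges matches across data sets.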

Paper Nr: 7
Title:

A Pipeline for Multimedia Twitter Analysis through Graph Databases: Preliminary Results

Authors:

Roberto Boselli, Mirko Cesarini, Fabio Mercorio, Mario Mezzanzanica and Alessandro Vaccarino

Abstract: Twitter is a microblogging service where users post not only short messages but also images and other multimedia content. Twitter can be used for analyzing people’s public discussions, as a huge amount of messages is continuously broadcast by users. Analyses have usually focused on the textual part of messages, but the non-negligible number of images exchanged calls for specific attention. In this paper we describe how tweet multimedia content can be turned into a knowledge graph and then used for analyzing the messages sent during marketing campaigns. The information extraction and processing pipeline is built on top of off-the-shelf APIs and products, while the obtained knowledge is modelled through a graph database. The resulting knowledge graph was useful to explore and identify similarities among different marketing campaigns carried out on Twitter, providing preliminary but promising results.
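The graph modelling described above can be sketched with a plain adjacency-list structure; the tweets, campaigns, image labels, and the similarity criterion below are illustrative assumptions, not the paper's actual schema or the off-the-shelf APIs it uses.

```python
# Toy tweets: each belongs to a campaign hashtag and carries labels
# extracted from its image (in the paper, by off-the-shelf recognition APIs).
tweets = [
    {"id": "t1", "campaign": "#summer", "labels": ["beach", "sun"]},
    {"id": "t2", "campaign": "#summer", "labels": ["sun"]},
    {"id": "t3", "campaign": "#winter", "labels": ["snow", "sun"]},
]

def build_graph(tweets):
    """Adjacency-list knowledge graph: campaign -> tweets -> image labels."""
    graph = {}
    for t in tweets:
        graph.setdefault(t["campaign"], set()).add(t["id"])
        graph.setdefault(t["id"], set()).update(t["labels"])
    return graph

def campaign_labels(graph, campaign):
    """Labels reachable from a campaign node in two hops."""
    return set().union(*(graph[t] for t in graph[campaign]))

g = build_graph(tweets)
# A crude similarity signal: image labels shared by two campaigns.
shared = campaign_labels(g, "#summer") & campaign_labels(g, "#winter")
print(shared)
# → {'sun'}
```

In a real graph database the same question becomes a two-hop traversal query, which is exactly the access pattern graph stores are optimized for.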

Paper Nr: 8
Title:

BotWheels: a Petri Net based Chatbot for Recommending Tires

Authors:

Francesco Colace, Massimo De Santo, Francesco Pascale, Saverio Lemma and Marco Lombardi

Abstract: Technological progress seems unstoppable: large companies are ready to implement increasingly sophisticated solutions to improve their productivity. The near future may be represented by so-called chatbots, already present in instant messaging platforms and destined to become more and more popular. This paper presents the realization of a prototype conversational workflow for a chatbot in the tires domain. The initial purpose focused on the design of a specific model to manage the communication and propose the most suitable tires to users. To this aim, Petri nets have been used. Finally, after the implementation of the designed model, an experimental campaign was conducted in order to demonstrate its applicability and efficiency.
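A Petri net driving a conversational workflow can be sketched as follows; the places and transitions below are illustrative, not the actual BotWheels model. Tokens in places represent the dialogue state, and a transition fires only when all its input places are marked.

```python
# Minimal Petri net: places hold tokens, transitions consume tokens from
# their input places and produce tokens in their output places.
marking = {"greeted": 1, "asked_vehicle": 0, "asked_season": 0, "recommended": 0}

# transition name -> (input places, output places); names are hypothetical.
TRANSITIONS = {
    "ask_vehicle": ({"greeted"}, {"asked_vehicle"}),
    "ask_season": ({"asked_vehicle"}, {"asked_season"}),
    "recommend": ({"asked_season"}, {"recommended"}),
}

def fire(name):
    """Fire one transition if enabled; return whether it fired."""
    inputs, outputs = TRANSITIONS[name]
    if all(marking[p] > 0 for p in inputs):
        for p in inputs:
            marking[p] -= 1
        for p in outputs:
            marking[p] += 1
        return True
    return False

# Drive the dialogue to completion, one enabled transition at a time.
for step in ["ask_vehicle", "ask_season", "recommend"]:
    fire(step)
print(marking["recommended"])
# → 1
```

The enabling rule is what makes the net useful for dialogue management: the bot cannot recommend tires before the season question has been answered, because the `recommend` transition is not enabled until `asked_season` holds a token.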