KomIS 2014 Abstracts


Full Papers
Paper Nr: 1
Title:

Information Extraction from Legacy Spreadsheet-based Information System - An Experience in the Automotive Context

Authors:

Domenico Amalfitano, Anna Rita Fasolino, Porfirio Tramontana, Vincenzo De Simone, Giancarlo Di Mare and Stefano Scala

Abstract: Although spreadsheets were originally designed for computing purposes and commercial applications, they are often used in industry to implement Information Systems, thanks to the functionalities offered by integrated scripting languages and ad-hoc frameworks (e.g., Visual Basic for Applications). This technological solution allows the adoption of Rapid Application Development processes for the quick development of Spreadsheet-based Information Systems, but the resulting systems are quite difficult to maintain and very difficult to migrate to other architectures such as Database-oriented Information Systems or Web applications. In this paper we present an approach for reverse engineering the data model from an Excel spreadsheet-based information system. The approach exploits a set of heuristic rules that are automatically applied in a seven-step process. The applicability of the process has been shown in an industrial context where it was used to obtain the UML class diagrams representing the conceptual data models of three spreadsheet-based information systems.

Paper Nr: 2
Title:

Are the Methodologies for Producing Linked Open Data Feasible for Public Administrations?

Authors:

Roberto Boselli, Mirko Cesarini, Fabio Mercorio and Mario Mezzanzanica

Abstract: Linked Open Data (LOD) enable the semantic interoperability of Public Administration (PA) information. Moreover, they allow citizens to reuse public information to create new services and applications. Although there are many methodologies and guidelines for producing and publishing LOD, PAs still struggle to understand and exploit LOD to improve their activities. In this paper we show the use of a set of best practices to support an Italian PA in producing LOD. We show the case of LOD production from existing open datasets related to public services. Together with the production of LOD, we present the definition of a reference ontology, the Public Service Ontology, integrated with the datasets. Throughout the application, we highlight and discuss some critical points we found in the methodologies and technologies described in the literature, and we identify some potential improvements.

Paper Nr: 8
Title:

A Pattern-Oriented Approach for Supporting ETL Conceptual Modelling and Its YAWL-based Implementation

Authors:

Bruno Oliveira, Orlando Belo and Alfredo Cuzzocrea

Abstract: Modelling and implementing Data Warehouse populating processes (commonly known as ETL) involves complex and challenging tasks that have been highlighted by numerous researchers in the field. Several workflow modelling languages mainly used for supporting Business Process modelling, such as BPMN, BPEL or UML AD, have been adapted and (sometimes) used to model ETL processes, with the goal of modelling them conceptually and enabling their subsequent execution. However, there is still a large gap to be bridged between abstract models, which are typically used in the initial phases of projects, and their subsequent implementation in real-world practical scenarios. Furthermore, ETL processes are strongly related to business requirements, which are often affected by business evolution and business operational systems. Hence, with the goal of minimizing such gaps, in this paper we propose the use of a pattern-oriented approach for supporting ETL conceptual modelling, where common tasks are identified and formalized in order to describe their potential behaviour, thus allowing for their application and reuse in different application scenarios.

Short Papers
Paper Nr: 3
Title:

Data Preparation for Tourist Data Big Data Warehousing

Authors:

Nunziato Cassavia, Pietro Dicosta, Elio Masciari and Domenico Saccà

Abstract: The pervasive diffusion of new generation devices like smartphones and tablets, along with the widespread use of social networks, causes the generation of massive data flows containing heterogeneous information generated at different rates and in different formats. These data are referred to as Big Data, and managing them requires new storage and analysis approaches to be investigated. In this paper we describe a system for dealing with massive tourism flows that we exploited for the analysis of tourist behavior in Italy. We defined a framework that exploits a NoSQL approach for data management and MapReduce for improving the analysis of the data gathered from different sources.
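The abstract does not detail the analysis jobs, so the following is only a minimal sketch of the MapReduce pattern it mentions, written in plain Python rather than on a real cluster framework. The records, the field layout and the function names (`map_phase`, `shuffle`, `reduce_phase`, `visits_per_city`) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical input: (tourist_id, city) pairs. Not data from the paper.
RECORDS = [
    ("t1", "Rome"), ("t2", "Rome"), ("t1", "Florence"),
    ("t3", "Venice"), ("t2", "Venice"), ("t3", "Rome"),
]


def map_phase(records):
    """Emit one (city, 1) pair per visit record."""
    for _tourist, city in records:
        yield city, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the emitted counts for each city."""
    return {city: sum(values) for city, values in groups.items()}


def visits_per_city(records):
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(visits_per_city(RECORDS))
```

In a distributed setting the map and reduce functions would run in parallel on partitions of the data; the in-memory `shuffle` here only stands in for that grouping step.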

Paper Nr: 4
Title:

A Clustering-based Approach for a Finest Biological Model Generation Describing Visitor Behaviours in a Cultural Heritage Scenario

Authors:

Salvatore Cuomo, Pasquale De Michele, Giovanni Ponti and Maria Rosaria Posteraro

Abstract: We propose a biologically inspired mathematical model to simulate the personalized interactions of users with cultural heritage objects. The main idea is to measure the interest of a spectator with respect to an artwork by means of a model able to describe the behaviour dynamics. In this approach, the user is likened to a computational neuron, and the user's interests are deduced by counting potential spike trains generated by external currents. The main novelty of our approach consists in resorting to a clustering task to discover natural groups, which are used in the next step to verify the neuronal response and to tune the computational model. Preliminary experimental results, based on a phantom database and obtained from a real-world scenario, are shown. To discuss the obtained results, we report a comparison between the cluster memberships and the spike generation; our approach was found to model cluster assignment and spike emission perfectly.
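The abstract does not say which neuron model is used. As an illustration of counting spikes generated by an external current, the sketch below assumes a leaky integrate-and-fire neuron integrated with the forward-Euler method. The function name `count_spikes` and all parameter values are hypothetical and are not taken from the paper.

```python
def count_spikes(current, duration_ms=200.0, dt=0.1,
                 tau_ms=10.0, resistance=1.0,
                 v_rest=0.0, v_threshold=1.0, v_reset=0.0):
    """Count spikes of a leaky integrate-and-fire neuron under constant input.

    Assumed model (not from the paper):
        tau * dv/dt = -(v - v_rest) + R * I
    with a reset to v_reset whenever v reaches v_threshold.
    """
    v = v_rest
    spikes = 0
    steps = int(duration_ms / dt)
    for _ in range(steps):
        # Forward-Euler update of the membrane potential.
        v += dt * (-(v - v_rest) + resistance * current) / tau_ms
        if v >= v_threshold:
            spikes += 1
            v = v_reset
    return spikes


if __name__ == "__main__":
    # A stronger external current (here standing in for stronger interest)
    # produces more spikes over the same time window.
    for current in (0.5, 2.0, 4.0):
        print(current, count_spikes(current))
```

Under these assumptions a current whose steady-state potential stays below the threshold yields no spikes, while larger currents yield progressively higher spike counts.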

Paper Nr: 5
Title:

A Method of Topic Detection for Great Volume of Data

Authors:

Flora Amato, Francesco Gargiulo, Antonino Mazzeo and Carlo Sansone

Abstract: Topic extraction has become increasingly important due to its effectiveness in many tasks, including information filtering, information retrieval and the organization of document collections in digital libraries. Topic detection consists of finding the most significant topics within a document corpus. In this paper we explore the adoption of a feature reduction methodology to highlight the most significant topics within a document corpus. We use an approach based on a clustering algorithm (X-means) applied to the tf-idf matrix computed from the corpus, which describes the frequency of the terms, represented by the columns, that occur in each document, represented by a row. To extract the topics, we build n binary problems, where n is the number of clusters produced by an unsupervised clustering approach, and we perform a supervised feature selection over them, taking the top features as the topic descriptors. We show the results obtained on two different corpora. Both collections are written in Italian: the first consists of documents of the University of Naples Federico II, the second of a collection of medical records.
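The pipeline described above can be sketched roughly as follows, under several assumptions that are not from the paper: the cluster labels are taken as given (the X-means step is not implemented), tokenization is a plain whitespace split, and the "supervised feature selection" is replaced by a simple stand-in score, namely the difference between a term's mean tf-idf inside and outside the cluster. The toy documents and the names `tfidf_matrix` and `top_terms_per_cluster` are hypothetical.

```python
import math
from collections import Counter

# Toy corpus and cluster labels, both hypothetical. In the described method
# the labels would come from X-means; here they are simply assumed.
DOCS = [
    "exam course exam student",
    "course student lecture exam",
    "patient therapy patient diagnosis",
    "diagnosis therapy patient doctor",
]
LABELS = [0, 0, 1, 1]


def tfidf_matrix(docs):
    """Return (vocab, matrix): one row per document, one column per term.

    Assumes every document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([
            (counts[term] / len(tokens)) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, matrix


def top_terms_per_cluster(vocab, matrix, labels, k=2):
    """Build one binary (one-vs-rest) problem per cluster and keep k terms."""
    topics = {}
    for cluster in sorted(set(labels)):
        inside = [row for row, lab in zip(matrix, labels) if lab == cluster]
        outside = [row for row, lab in zip(matrix, labels) if lab != cluster]
        scores = []
        for j, term in enumerate(vocab):
            mean_in = sum(row[j] for row in inside) / len(inside)
            mean_out = (sum(row[j] for row in outside) / len(outside)
                        if outside else 0.0)
            # Stand-in score: how much more the term weighs in this cluster.
            scores.append((mean_in - mean_out, term))
        # Highest score first; ties broken alphabetically.
        scores.sort(key=lambda s: (-s[0], s[1]))
        topics[cluster] = [term for _, term in scores[:k]]
    return topics


if __name__ == "__main__":
    vocab, matrix = tfidf_matrix(DOCS)
    print(top_terms_per_cluster(vocab, matrix, LABELS, k=1))
```

Any other supervised ranking criterion could be plugged into the scoring loop; the one-vs-rest structure is the part that mirrors the n binary problems in the description.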

Paper Nr: 6
Title:

A Semantic Content Management System for e-Gov Applications

Authors:

Donato Cappetta, Salvatore D'Elena, Vincenzo Moscato, Vincenzo Orabona, Raffaele Palmieri and Antonio Picariello

Abstract: In this paper, we describe a novel Semantic Content Management System (SCMS) able to handle multimedia contents of different kinds (e.g. texts and images) using the related semantics and capable of supporting e-gov applications in different scenarios. All the information is described using semantic metadata semi-automatically extracted from multimedia data, which enriches the browsing experience and enables the authoring and querying of semantic contents. To this end, several Semantic Web technologies have been exploited: RDF/OWL for data modeling and representation, SPARQL as the query language, Multimedia Information Extraction techniques for content annotation, and W3C standard models, vocabularies and micro-formats for resource description. In addition, we propose the LOD approach for entity annotation. As an application scenario of the platform, we report a system customization useful for managing the semantic matching between the professional profiles required by a Public Administration and the skills available in a set of curricula vitae with respect to a given call.

Paper Nr: 7
Title:

Discovering Expected Activities in Medical Context Scientific Databases

Authors:

Daniela D'Auria and Fabio Persia

Abstract: Reasoning with temporal data has attracted the attention of many researchers from different backgrounds, including artificial intelligence, database management, computational linguistics and biomedical informatics. More specifically, activity detection is a very important problem in a wide variety of application domains, such as video surveillance, cyber security and fault detection, but also clinical research. In this paper we present a prototype architecture designed and developed for activity detection in the medical context. In more detail, we first acquire data in real time from a cricothyrotomy simulator while it is used by medical doctors, then we store the acquired data in a scientific database, and finally we use an Activity Detection Engine to find expected activities, corresponding to specific performances obtained by the medical doctors when using the simulator. Some preliminary experiments using real data show the efficiency and effectiveness of the approach. Finally, we also received positive feedback from the medical personnel who used our prototype.