Bostjan Kaluza Posts

Machine Learning in Java

My second book on data science focuses on how to implement machine learning applications in Java by leveraging the most popular libraries, such as Apache Mahout, Weka, deeplearning4j, and others. It is scheduled to be published by the end of the year.

Machine Learning in Java will provide you with the techniques and tools you need to quickly gain insight from complex data. You will start by learning how to apply machine learning methods to a variety of common tasks including classification, prediction, forecasting, market basket analysis, and clustering.

Moving on, you will discover how to detect anomalies and fraud, and ways to perform activity recognition, image recognition, and text analysis. By the end of the book, you will explore related web resources and technologies that will help you take your learning to the next level.

By applying the most effective machine learning methods to real-world problems, you will gain hands-on experience that will transform the way you think about data.


E-Turist is a mobile application that provides a personalized sightseeing program. The application aims to provide a tourist with an experience comparable to that offered by a professional tour guide, but tailored specifically to the user. The tourist enters her interests, the available time, and any special requirements she may have. Based on these, the application prepares a personalized sightseeing program using a recommender system. Afterwards, the application guides the tourist using GPS, providing a multilingual description accompanied by photos. The description is available on the mobile phone screen and via synthesized voice. The tourist may comment on and rate each sight; this feedback is then used by the recommender system and by tourism workers to improve their services. The mobile application is accompanied by a web application through which tourism workers may enter information on sights of interest and track the activity of their visitors.

Classifying animal sounds

Together with a team at the Department of Intelligent Systems, we developed a machine learning-based approach to recognize different animal species based on the sounds they produce. Currently supported groups are bumblebees, birds, and frogs.


A. Gradisek et al.: How to Recognize Animal Species Based On Sound – A Case Study On Bumblebees, Birds, And Frogs (IS 2015).

In our approach, we used Mel-Frequency Cepstrum Coefficients (MFCC) as a feature vector alongside hundreds of other audio features. Data was preprocessed using Adobe Audition software. Features were extracted using the openAUDIO feature extraction tool. Classification models were built using the WEKA open-source machine learning software. The approach was tested on three groups of animals: bumblebees, with the largest number of samples (11 species, with queens and workers both represented in most cases, 20 classes in total), Slovenian frogs (13 species), and different species of cuckoos (7 species). The recordings of bumblebees were obtained in the field, frog sounds were obtained from the CD Frogs and toads of Slovenia [9] produced by the Slovenian Wildlife Sound Archive [10], and the sounds of the cuckoos were obtained from the Chinese database 鸟类网.
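To make the MFCC feature vector mentioned above more concrete, here is a minimal sketch of the textbook computation for a single audio frame: windowing, power spectrum, triangular mel filterbank, logarithm, and a DCT. It is not the implementation used by the openAUDIO tool or in the paper. The frame length (256 samples), sampling rate (8 kHz), number of filters (20), and number of coefficients (13) are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to the mel scale (common 2595/700 variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum of one Hamming-windowed frame via a naive O(n^2) DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    filters = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising slope
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            weights[k] = (right - k) / (right - center)
        filters.append(weights)
    return filters


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=13):
    """MFCCs of one frame: log mel-band energies followed by a DCT-II."""
    spectrum = power_spectrum(frame)
    filters = mel_filterbank(num_filters, len(frame), sample_rate)
    # A small constant avoids log(0) for empty bands.
    log_energies = [math.log(sum(w * p for w, p in zip(f, spectrum)) + 1e-10)
                    for f in filters]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energies))
            for c in range(num_coeffs)]


def tone(freq_hz, sample_rate=8000, n=256):
    """Synthetic sine-wave frame, standing in for a real recording."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


features = mfcc(tone(1000.0), sample_rate=8000)
print(len(features))  # one 13-dimensional feature vector per frame
```

In a real pipeline, a recording would be split into many overlapping frames, and the per-frame vectors (or statistics over them) would be passed to the classifier together with the other audio features.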

To make the sound recognition application available to a broader audience, we have developed a web-based service where users can, in addition to using the species classification feature, upload their recordings to be added to the learning set for further improvement of the classification. The application is now available online and runs in Slovenian, English, and Chinese.

Recommender system for eco-friendly accommodations

As part of the EU project EcoDots, we built a recommender system for eco-friendly accommodations. The system leverages the Prediction.IO platform and Elasticsearch to offer recommendations as a service.

G. Slapnicar et al.: Recommender System as a Service based on the Alternating Least Squares Algorithm (IS 2015)

In this paper, we describe a production-ready recommender system as a service for recommending eco-friendly tourist accommodations. It offers two main features: (1) it returns personalized recommendations for a user by creating a latent factor model through matrix factorization (Alternating Least Squares algorithm, ALS) and (2) it returns accommodations that are similar to a given accommodation by calculating content-based similarity using the Jaccard coefficient and the Euclidean distance. The system is evaluated on the collected data using cross-validation, with Precision@k as the performance measure. It achieves 19% Precision@k for personalized recommendations to a user based on their past interactions with accommodations. This is 19 times the 1% Precision@k achieved by a random recommender.
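The content-based similarity measures and the Precision@k metric named above can be sketched in a few lines. This is an illustration under stated assumptions, not the code from the paper: the accommodation tags, numeric attributes, item identifiers, and function names are all hypothetical, and the paper's exact feature encoding is not given here.

```python
import math


def jaccard(a, b):
    """Jaccard coefficient of two sets: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that the user actually found relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


# Hypothetical accommodations described by categorical tags.
farm_stay = {"solar", "organic", "bike"}
eco_hotel = {"solar", "bike", "spa"}
print(jaccard(farm_stay, eco_hotel))  # 0.5 (2 shared tags out of 4 distinct)

# Hypothetical numeric attributes, e.g. differences in two normalized scores.
print(euclidean((3.0, 4.0), (0.0, 0.0)))  # 5.0

# Hypothetical ranked recommendations versus items the user interacted with.
ranked = ["a", "b", "c", "d", "e"]
interacted = {"b", "e", "z"}
print(precision_at_k(ranked, interacted, k=5))  # 0.4
```

Jaccard suits set-valued attributes such as amenities, while Euclidean distance suits numeric attributes; how the two are combined into a single similarity score is a design choice not specified in this summary.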

Predictive API and Decision as a Service

The key challenge that existing predictive analytics software has not been able to solve is how to extract knowledge from data quickly and put it into the hands of data owners so they can make better, more informed decisions. Separating decision-making from application process logic is often beneficial; hence, designing adaptive predictive models based on the Software as a Service (SaaS) paradigm makes such technology more accessible.

Leveraging the Prediction.IO machine learning stack, built on top of Apache Spark, HBase, Spray, and Elasticsearch, this study addresses the construction and the components of new prediction engines that are consumed over a web API. The main challenges addressed in this study are concept drift, varying data types and distributions, and data stream mining.