RAW Labs to Participate in "SmartDataLake: Sustainable Data Lakes for Extreme-Scale Analytics" EU Initiative

Lausanne, September 7, 2018.

RAW Labs SA has been awarded a contract to participate in the highly visible, EU-funded SmartDataLake project. RAW Labs will provide its NoDB Platform for the project. "This is an exciting moment for RAW Labs. Through this project we will be able to expose our unique NoDB technology to industry leaders and researchers," says Anastasia Ailamaki, Co-Founder and CEO of RAW Labs.



"SmartDataLake: Sustainable Data Lakes for Extreme-Scale Analytics", abstract:

Data lakes are raw data ecosystems, where large amounts of diverse data are retained and coexist. They facilitate self-service analytics for flexible, fast, ad hoc decision making. SmartDataLake enables extreme-scale analytics over sustainable big data lakes. It provides an adaptive, scalable and elastic data lake management system that offers: (a) data virtualization for abstracting and optimizing access and queries over heterogeneous data, (b) data synopses for approximate query answering and analytics to enable interactive response times, and (c) automated placement of data in different storage tiers based on data characteristics and access patterns to reduce costs. The data lake’s contents are modelled and organised as a heterogeneous information network, containing multiple types of entities and relations. Efficient and scalable algorithms are provided for: (a) similarity search and exploration for discovering relevant information, (b) entity resolution and ranking for identifying and selecting important and representative entities across sources, (c) link prediction and clustering for unveiling hidden associations and patterns among entities, and (d) change detection and incremental update of analysis results to enable faster analysis of new data. Finally, interactive and scalable visual analytics are provided to include and empower the data scientist in the knowledge extraction loop. This includes functionalities for: (a) visually exploring and tuning the space of features, models and parameters, and (b) enabling large-scale visualizations of spatial, temporal and network data. The results of the project are evaluated in real-world use cases from the business intelligence domain, including scenarios for portfolio recommendation, production planning and pricing, and investment decision making. SmartDataLake will foster innovation and enable European SMEs to capitalize on the value of their own data lakes.


Brief description of RAW Labs' role in the project:

RAW Labs SA has designed a software stack that permits efficient and scalable execution of analytic queries directly on raw data files (i.e., without pre-formatting and importing them into a database). As a result, it has extensive experience in the following two areas: (a) integration of heterogeneous data into one data model, and (b) optimization of query execution against these data, in particular in the context of distributed storage and computing. Accordingly, RAW Labs SA will participate in WP2, in particular leading the efforts on distribution and elasticity. RAW Labs SA will bring not only strong technical expertise but also the robust codebase of its flagship product, RAW, a distributed query execution engine for raw data. Starting from a mature query execution engine codebase will enable the project participants to focus on the innovative aspects of SmartDataLake. In addition, RAW Labs SA will participate in the pilot testing and will contribute to the dissemination and exploitation activities, particularly within its network of clients and collaborators.