Webinar: A Dutch marine data lake based on the Beacon technology
Following the marine data lake developments with IHM and the interest in the underlying technology, MARIS is organizing a technical webinar on December 10 about the Beacon technology: an open-source solution to index, query and combine large amounts of measurement data at lightning speed.
The session builds on the results we presented at the DigiShape day on 7 November, but focuses mainly on the underlying technology and its broad applicability to marine and inland water data. It is of interest to data managers, developers, system architects and researchers who work with large datasets, with many millions of datasets, or with data platforms.
Why this webinar?
Driven by the ever-growing demand for data from science and industry (from notebooks, models, AI solutions and digital twins), more and more organizations are struggling to make fragmented measurement data accessible and searchable. The data are available, but spread across millions of small files or bundled in larger collections, which means considerable effort to build workflows and slow searches across long time series or large spatial areas.
The Beacon technology makes it possible to make enormous amounts of observation data searchable at lightning speed, regardless of whether it concerns marine data, inland water data or datasets from European research programmes.
During this webinar, we will show:
- how the Beacon technology is structured;
- how Beacon indexes datasets on both cloud and physical servers, and queries them at scale;
- how you can work directly with multiple data lakes from Python notebooks and a simple user interface;
- how to bring together and analyze datasets from different sources (e.g., NL, UK, DE) as if they were one virtual source.
Program:
- Introduction and background (Peter Thijsse, MARIS)
- Insight into the Beacon technology (Robin Kooyman, MARIS)
- Examples of use in notebooks and Beacon studio (Tjerk Krijger, MARIS)
There will be ample opportunity to ask questions.
For whom?
This webinar is intended for:
- data managers;
- software developers;
- system architects;
- researchers and consultants who work with large amounts of measurement data or build data platforms.
Policymakers are of course welcome to join, but the session is mainly technical in nature.
What can you expect?
For the real techies, we give an insight into the underlying technology: how Beacon handles millions of files in text, NetCDF, ZARR or other formats, and how it supports ARCO, Apache Iceberg, SQL querying and other technologies. Using examples, we show how you can quickly filter by area, period and parameters, run quality checks and combine different data streams. We also show applications such as trend analyses (e.g. the occurrence of species in relation to temperature over 30 years) and fast data science workflows.
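To give an idea of what filtering by area, period and parameter across combined sources means in practice, here is a minimal sketch in plain Python. It does not use Beacon or its API: the `Observation` record, the `filter_observations` helper and all values are made up for illustration, and a real data lake would run this kind of selection on indexed data rather than on an in-memory list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """Hypothetical measurement record; not a Beacon data type."""
    source: str      # illustrative provider label, e.g. "NL"
    parameter: str   # e.g. "temperature"
    day: date
    lat: float
    lon: float
    value: float


def filter_observations(observations, bbox, start, end, parameter):
    """Keep observations for one parameter inside a bounding box and date range.

    bbox is (min_lon, min_lat, max_lon, max_lat); start and end are inclusive.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        obs for obs in observations
        if obs.parameter == parameter
        and start <= obs.day <= end
        and min_lat <= obs.lat <= max_lat
        and min_lon <= obs.lon <= max_lon
    ]


# Made-up records from three illustrative sources.
nl = [
    Observation("NL", "temperature", date(2020, 6, 1), 53.0, 4.5, 14.2),
    Observation("NL", "salinity", date(2020, 6, 1), 53.0, 4.5, 34.1),
]
uk = [
    Observation("UK", "temperature", date(2020, 7, 15), 52.5, 2.0, 15.8),
    Observation("UK", "temperature", date(1985, 7, 15), 52.5, 2.0, 14.9),
]
de = [
    Observation("DE", "temperature", date(2020, 8, 3), 54.2, 8.0, 17.1),
    Observation("DE", "temperature", date(2020, 8, 3), 60.0, 8.0, 12.0),
]

# Treat the three sources as one collection, then apply a single selection.
combined = nl + uk + de
selection = filter_observations(
    combined,
    bbox=(0.0, 51.0, 9.0, 56.0),
    start=date(2020, 1, 1),
    end=date(2020, 12, 31),
    parameter="temperature",
)
mean_temperature = sum(obs.value for obs in selection) / len(selection)

for obs in selection:
    print(obs.source, obs.day.isoformat(), obs.value)
print("mean temperature:", round(mean_temperature, 2))
```

The point of the sketch is the shape of the question (one selection applied to several sources at once), not the implementation; the webinar shows how this is done against real data lakes.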
We also place the technology in the broader data architecture: how does it fit into the data distribution chain, how does the data lake relate to modeling, how does good standardization help to get the most out of this type of solution, and how can Beacon even help in the standardization of data?
Connection to DigiShape
Organizing and publishing data is an essential building block for everything we do within DigiShape. The Beacon technology was developed in European programmes and has been extended within DigiShape for the Dutch context (together with IHM). It fits seamlessly with our ambition to develop open, reusable solutions that the entire community can build on.