Proof of Concept – setting up a Dutch marine data lake with Beacon software
Beacon makes it possible to query (and find) a large number of datasets, is easy to install and provides the possibility to obtain exactly the specific subsets that are needed to feed a model or perform calculations in a Jupyter (Python) notebook.
Beacon can answer a question such as the following in tenths of a second, drawing on various data sources: “Give me all temperature data in the North Sea, from 2010 to 2020, at a depth of 0-10 meters, in degrees Celsius.” The API responds with a single NetCDF file that contains exactly the requested data, harmonized by parameter and unit and generated on the fly. The file is immediately usable for visualization, in a Jupyter notebook, or for processing in a model.
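As a rough illustration, the example question could be written down as a structured query along the following lines. This is a minimal sketch only: the field names (parameter, unit, bbox, time, depth, output) and the bounding-box coordinates are assumptions made for this example and are not the actual Beacon query schema, which is described in the Beacon documentation.

```python
import json


def build_query(parameter, bbox, start, end, depth_range, unit, output_format="netcdf"):
    """Assemble a query as a plain dictionary (hypothetical schema, not Beacon's own)."""
    west, south, east, north = bbox
    min_depth, max_depth = depth_range
    return {
        "parameter": parameter,
        "unit": unit,
        "bbox": {"west": west, "south": south, "east": east, "north": north},
        "time": {"start": start, "end": end},
        "depth": {"min": min_depth, "max": max_depth},
        "output": {"format": output_format},
    }


# Rough North Sea bounding box; the coordinates are illustrative only.
query = build_query(
    parameter="temperature",
    bbox=(-4.0, 51.0, 9.0, 61.0),
    start="2010-01-01",
    end="2020-12-31",
    depth_range=(0, 10),
    unit="degC",
)

# Serialize to JSON, the form in which such a query would be sent to an API.
print(json.dumps(query, indent=2))
```

Each element of the question (parameter, region, period, depth, unit) maps onto one field, which is what makes it possible to return a single file containing exactly the requested subset.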
Suitable for different domains
Although Beacon has its origins in marine projects, the technology is not domain-specific and is suitable for any application in which large collections of files in different formats must be queryable quickly and uniformly. Think of hydrological time series, monitoring data from rivers and lakes, or climate data that are spread over large numbers of files.
Background
Informatiehuis Marien (IHM) works on making Dutch marine North Sea data from multiple parties accessible, so that the data are easy to find and can be reused. At the moment, a lot of data can already be discovered and explored through metadata and can be retrieved as entire datasets, but the data cannot yet be queried across datasets as a whole. As a result, the data are not yet easily usable in models, notebooks, and Virtual Research Environments, which researchers and engineers are increasingly using.
Proof of Concept
The technology to improve access to the data is available, however. In this context MARIS, together with IHM and with participation from Deltares, WMR and RWS, has carried out a successful Proof of Concept (PoC) to investigate whether the open-source Beacon software, developed by MARIS, is suitable for setting up a Dutch marine data lake and can thus greatly increase the reusability of the data.
Various sources with large amounts of data were analyzed, and Beacon instances were then set up for them. On top of these instances, Python notebooks were created that query the instances and show that data from the various sources can be retrieved and visualized very quickly, harmonized by parameter and unit. The success of the PoC offered opportunities to further expand the data lake, extend its functionality, and stimulate usage. The follow-up assignment consisted of:
- Development of a (simple) user interface on top of Beacon, called Beacon Studio, so that users without Python knowledge can also search the data sources.
- A feasibility study on whether and how Beacon can be used to deliver Dutch marine (project) data according to EMODnet standards.
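To make the notion of "harmonized by parameter and unit" concrete, the sketch below maps source-specific parameter names and units onto one common parameter and unit. The source names, column names and values are invented for this illustration; the mappings actually used in the IHM Beacon instance are not described here.

```python
# Hypothetical mapping: (source, source parameter) -> (common parameter, common unit, converter).
MAPPINGS = {
    ("source_a", "TEMP_K"): ("temperature", "degC", lambda v: v - 273.15),
    ("source_b", "water_temp_c"): ("temperature", "degC", lambda v: v),
}


def harmonize(records):
    """Return records renamed and converted to the common parameter and unit."""
    harmonized = []
    for record in records:
        key = (record["source"], record["parameter"])
        if key not in MAPPINGS:
            # Skip parameters for which no mapping has been defined.
            continue
        name, unit, convert = MAPPINGS[key]
        harmonized.append(
            {
                "source": record["source"],
                "parameter": name,
                "unit": unit,
                "value": round(convert(record["value"]), 2),
            }
        )
    return harmonized


# Invented sample observations from two sources using different names and units.
observations = [
    {"source": "source_a", "parameter": "TEMP_K", "value": 283.15},
    {"source": "source_b", "parameter": "water_temp_c", "value": 12.4},
    {"source": "source_b", "parameter": "unmapped", "value": 1.0},
]

for row in harmonize(observations):
    print(row)
```

After harmonization, both sources report the same parameter name in the same unit, so their values can be combined in one result or plotted together.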
Jupyter notebooks
In the previous PoC, several Dutch marine data sources were investigated and the following six sources were ultimately chosen as input for the IHM-specific Beacon instance that can be queried and visualized via a notebook and the new Beacon Studio:
- WADAR
- Aquadesk
- RWS CTD collection
- VIS monitoring
- Bird monitoring
- WOT data WMR
The IHM Beacon instance is described here: https://beacon-ihm.maris.nl/swagger/. In preparation for the development of Beacon Studio, “easy” tables have been created for the WADAR, RWS CTD and Aquadesk collections. These tables contain the most important (meta)data columns, and aggregations and mappings have been applied to make the collections as searchable as possible for the user. The IHM_beacon.ipynb notebook, which can be found on GitHub, contains an example for each of the above collections showing how it can be queried via the Beacon Python library.
The figure below shows how at the end of this notebook the data points of the different sources can be plotted together on one map.

Beacon Studio
As an extension of the PoC, a user interface called “Beacon Studio” has been developed on top of Beacon. With it, the IHM Beacon instance can easily be queried via a form, and the results can be viewed and visualized.
- Beacon Studio web interface: https://beacon-ihm.maris.nl/studio/
- Beacon Studio GitHub: https://github.com/maris-development/beacon-studio
A Beacon instance can handle a very complex data and metadata model, from which it builds a “flattened” version in tabular form so that data can be retrieved quickly and efficiently. To query the data, it is often not necessary to search on all columns: most users compose a query by location, time period, depth and parameter(s). The “easy query builder” therefore uses only these core attributes (as specified by the administrator of the Beacon instance) for composing and executing queries.
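The idea of querying a flattened table on a handful of core attributes can be sketched in plain Python as follows. The table rows, column names and the easy_query function are hypothetical and serve only to illustrate the principle; they do not reflect how Beacon or Beacon Studio implement it.

```python
from datetime import date

# A tiny "flattened" table: one row per observation (values invented for illustration).
ROWS = [
    {"lon": 3.5, "lat": 53.2, "time": date(2015, 6, 1), "depth": 5.0,
     "parameter": "temperature", "value": 14.2},
    {"lon": 3.5, "lat": 53.2, "time": date(2015, 6, 1), "depth": 25.0,
     "parameter": "temperature", "value": 11.8},
    {"lon": 4.1, "lat": 52.9, "time": date(2008, 3, 9), "depth": 2.0,
     "parameter": "temperature", "value": 6.3},
    {"lon": 3.9, "lat": 53.0, "time": date(2016, 8, 20), "depth": 3.0,
     "parameter": "salinity", "value": 34.1},
]


def easy_query(rows, bbox, period, depth_range, parameters):
    """Filter rows on the core attributes only: location, time period, depth, parameter."""
    west, south, east, north = bbox
    start, end = period
    min_depth, max_depth = depth_range
    return [
        row
        for row in rows
        if west <= row["lon"] <= east
        and south <= row["lat"] <= north
        and start <= row["time"] <= end
        and min_depth <= row["depth"] <= max_depth
        and row["parameter"] in parameters
    ]


result = easy_query(
    ROWS,
    bbox=(-4.0, 51.0, 9.0, 61.0),
    period=(date(2010, 1, 1), date(2020, 12, 31)),
    depth_range=(0, 10),
    parameters={"temperature"},
)
print(result)
```

Because every observation is a single row with the same columns, a query on four attributes reduces to a simple row filter, regardless of how complex the underlying data model is.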

After entering the query, the user can choose between the following options at the bottom of the screen:
- Download dataset.
- Copy the query in JSON format, so that it can be used in a notebook.
- Data table view.
- Map view, in which the data points are shown directly on the map with the actual parameter values.
- Chart view, which gives insight into the content of the requested data and metadata (e.g. a breakdown by category).
Sources
- Beacon is available for free under an open-source license:
- Documentation:
- GitHub:
- Beacon Studio (user interface for the IHM PoC):
- Python library:
Call to the network
Organizations that work with large observation collections and want to explore whether Beacon can help with their workflow are invited to get in touch. We are happy to collect practical examples, experiences and questions that can contribute to the further development of the technology and its application in the Dutch context.