SCANFOODLABEL - The development of a database of information on the labelling and packaging of food products on the Belgian market

Last updated on 2-9-2025 by Lieke Vervoort
Project duration: October 25, 2024 to October 25, 2025

In short

The SCANFOODLABEL project aims to build a comprehensive database of food products on the Belgian market that consolidates labelling information such as ingredients, food additives (E-numbers) and nutritional values.

We collect data on thousands of products from online retailers, creating a valuable resource for food safety and nutrition research. To achieve this, we are developing automated methods to clean and process large volumes of food data, including an AI-based approach that classifies products into the standardised categories used in food science and EU regulations.

By making this data accessible, this project helps researchers and policymakers monitor what’s on the market, identify potential health risks, and support evidence-based food policy.

 

Project description

Database of food products

The Belgian market offers a vast array of food products, and their labels contain crucial information for food safety and nutrition policy. Under European regulations, much of this information is either mandatory or provided voluntarily on product packaging. Key elements include:

  • Ingredients (incl. food additives and their functions)
  • Nutrition declaration (e.g., calories, fibre, proteins, fats, sugars)
  • Allergens
  • Health claims
  • Expiration dates
  • Voluntary labels (e.g., Nutri-Score)
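As an illustration of how these elements might be stored together, the sketch below defines a single product record in Python. The class name `ProductRecord`, its field names and the use of a barcode as identifier are hypothetical assumptions for illustration only, not the project's actual database schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProductRecord:
    """Hypothetical schema for the labelling data of one product (illustrative only)."""

    product_id: str          # assumed unique identifier, e.g. the product barcode
    name: str
    ingredients_text: str    # raw ingredient list as scraped from the retailer
    additives: list[str] = field(default_factory=list)                  # e.g. ["E330"]
    nutrition_per_100g: dict[str, float] = field(default_factory=dict)  # e.g. {"sugars_g": 4.5}
    allergens: list[str] = field(default_factory=list)
    health_claims: list[str] = field(default_factory=list)
    expiration_date: Optional[str] = None  # assumed ISO date string, if available
    nutri_score: Optional[str] = None      # "A" to "E" when shown on the pack


# Example: a freshly scraped product before any additive extraction has run.
record = ProductRecord("0000000000000", "Strawberry yoghurt", "milk, sugar, strawberries")
```

Optional fields default to empty values so that partially labelled products can still be stored.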

Despite its importance, no comprehensive database consolidating this type of information exists. With the SCANFOODLABEL project, we aim to address this data gap by building a high-quality database of labelling data for food products on the Belgian market.

Data collection & analysis

Building on prior scientific projects, we have obtained data on a large number of food products by web scraping online retail platforms. In the current phase of the project, this dataset comprises almost 100,000 unique food products.
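Scraped values are rarely uniform: the same nutrient amount may appear as "12,5 g", "12.5g" or "12500 mg". The minimal sketch below shows one cleaning step of this kind, converting such strings to grams. The function `parse_grams` and the formats it accepts are assumptions for illustration; they do not describe the project's actual cleaning rules.

```python
import re
from typing import Optional

# Optional "<", a number with a decimal point or comma, and an optional unit (mg or g).
_VALUE = re.compile(r"^\s*(<)?\s*(\d+(?:[.,]\d+)?)\s*(mg|g)?\s*$", re.IGNORECASE)


def parse_grams(raw: str) -> Optional[float]:
    """Return the amount in grams, or None if the string is not recognised.

    Strings without a unit are assumed to be in grams. Values such as "< 0,5 g"
    are returned as their upper bound (0.5); a real pipeline would probably
    keep a separate flag for such censored values.
    """
    match = _VALUE.match(raw)
    if match is None:
        return None
    _, number, unit = match.groups()
    value = float(number.replace(",", "."))  # labels on the Belgian market often use a decimal comma
    if unit is not None and unit.lower() == "mg":
        value /= 1000.0
    return value
```

Returning None instead of raising lets unrecognised values be counted and reviewed rather than silently dropped.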

However, such raw data invariably contains inconsistencies and requires extensive cleaning and processing before it can be used for analysis. To further enhance its utility, we are developing a methodology with the following objectives:

  • Generate structured data on the presence and function of food additives by inferring this information from the raw ingredient text.
  • Classify all food products according to standardised food classification systems.
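At its simplest, the first objective can be approached with pattern matching on the ingredient text. The sketch below is a hypothetical illustration rather than the project's method: it recognises E-numbers written in common variants ("E330", "E 330", "E-330", "E150a") and maps a few additive names to their E-numbers through a deliberately tiny lookup table (`NAME_TO_E`), which a real system would replace with the complete EU list of authorised additives. Inferring additive functions is not covered here.

```python
import re

# E-numbers written as "E330", "E 330", "E-330" or with a letter suffix ("E150a").
E_NUMBER = re.compile(r"\bE[\s-]?(\d{3,4})([a-j])?\b", re.IGNORECASE)

# Hypothetical, deliberately tiny name lookup for additives listed by name instead of code.
NAME_TO_E = {
    "citric acid": "E330",
    "potassium sorbate": "E202",
    "ascorbic acid": "E300",
}


def extract_additives(ingredients: str) -> list[str]:
    """Return the distinct E-numbers found in a raw ingredient list.

    Codes written explicitly come first (in order of appearance), followed by
    additives recognised by name through NAME_TO_E.
    """
    found: list[str] = []
    for digits, suffix in E_NUMBER.findall(ingredients):
        code = f"E{digits}{suffix.lower()}"  # normalise "e 150A" to "E150a"
        if code not in found:
            found.append(code)
    lowered = ingredients.lower()
    for name, code in NAME_TO_E.items():
        if name in lowered and code not in found:
            found.append(code)
    return found
```

For example, `extract_additives("Water, sugar, acid: citric acid, colour: E150a, preservative (E 202)")` yields `["E150a", "E202", "E330"]`.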

Scalability & automation

Although our initial dataset is substantial, the food market evolves continuously and sometimes rapidly, so keeping the database up to date is a significant challenge. Manual and semi-automated approaches to data cleaning and processing become infeasible as the volume of data keeps growing.
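One way to limit re-processing is to compare successive scrapes and handle only products that are new or whose label text has changed. A minimal sketch, assuming each snapshot maps a product identifier to a fingerprint of its ingredient text (both hypothetical design choices, not the project's documented approach):

```python
import hashlib
from typing import Mapping


def fingerprint(ingredients_text: str) -> str:
    """Hash the ingredient text after normalising case and whitespace."""
    normalised = " ".join(ingredients_text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def diff_snapshots(old: Mapping[str, str], new: Mapping[str, str]) -> dict[str, set[str]]:
    """Compare two snapshots of {product_id: fingerprint}.

    Only 'added' and 'changed' products need to be cleaned and classified again;
    'removed' products can be marked as no longer on the market.
    """
    old_ids, new_ids = set(old), set(new)
    return {
        "added": new_ids - old_ids,
        "removed": old_ids - new_ids,
        "changed": {pid for pid in old_ids & new_ids if old[pid] != new[pid]},
    }
```

Normalising before hashing avoids flagging products whose labels differ only in capitalisation or spacing.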

We address this challenge by developing automated processes that transform the raw data to meet the requirements of our database. To this end, we are investigating the use of artificial intelligence (AI). Machine learning techniques, including pre-trained Large Language Models (LLMs), make it practical to exploit large datasets by removing the otherwise infeasible amount of manual processing they would require.
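A common pattern for LLM-based classification is to constrain the model to a fixed list of categories and reject any answer outside that list. The sketch below assumes a caller-supplied `complete` function that sends a prompt to some language model and returns its text; the category names are placeholders, not the standardised classification systems used in the project.

```python
from typing import Callable, Optional

# Placeholder categories; a real system would use a standardised classification.
CATEGORIES = ["Dairy products", "Bread and bakery wares", "Soft drinks", "Confectionery"]


def build_prompt(product_name: str, ingredients: str) -> str:
    """Build a prompt that lists the allowed categories explicitly."""
    options = "\n".join(f"- {c}" for c in CATEGORIES)
    return (
        "Classify the food product into exactly one of the categories below.\n"
        f"{options}\n"
        f"Product: {product_name}\nIngredients: {ingredients}\n"
        "Answer with the category name only."
    )


def classify(product_name: str, ingredients: str,
             complete: Callable[[str], str]) -> Optional[str]:
    """Ask a language model (via the hypothetical `complete` function) for a category.

    Answers outside CATEGORIES return None so they can be routed to manual review.
    """
    answer = complete(build_prompt(product_name, ingredients)).strip().strip(".")
    lookup = {c.lower(): c for c in CATEGORIES}
    return lookup.get(answer.lower())
```

Validating the answer against the allowed list keeps free-text model output from leaking into the database.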

Expected outcomes

We will develop a dashboard as part of the project to enhance accessibility and facilitate data-driven decision-making. This tool will enable users to explore and analyse the data efficiently, supporting regulatory bodies, researchers, and policymakers. Key functionalities will include visualisation, automated analysis, and customisable queries, optimised for food policy and risk assessment purposes.
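As an example of the kind of customisable query such a dashboard could answer, the sketch below computes, per product category, the share of products containing a given additive. The record layout (`category` and `additives` keys) is a hypothetical assumption.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def additive_share_by_category(products: Iterable[Mapping], additive: str) -> dict[str, float]:
    """Fraction of products in each category whose additive list contains `additive`."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for product in products:
        category = product["category"]
        totals[category] += 1
        if additive in product["additives"]:
            hits[category] += 1
    return {c: hits[c] / totals[c] for c in totals}
```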

Our long-term objective is to further develop the techniques and processes investigated in this pilot project into a fully automated framework that includes systematic data acquisition, processing, and classification.

 
