Header	Data with Icon

DaFaB Summer School on Earth Observation and AI: September 22-25 2025, in Ljubljana, in the beautiful Slovenia

Applying AI to Earth Observation

4 days of lectures and hands-on sessions

Join us for a summer school dedicated to AI applied to Earth Observation data, in the beautiful city of Ljubljana, Slovenia.

This summer school offers a unique opportunity to delve into the cutting-edge intersection of Artificial Intelligence (AI) and Earth Observation (EO), combined with the essential skills for managing large-scale data and workflows in modern computing environments. Participants will gain theoretical knowledge and practical experience in applying AI techniques to analyze EO data, optimizing AI performance, managing complex workflows with Kubernetes, and handling massive datasets

DaFab is a project funded by ESA under the grant agreement 101128693 — HORIZON-EUSPA-2022-SPACE

DaFab summer School September 22-24, 2025

DaFab is proud to be supported by the AI for SCIENCE 2025 for its summer school on Earth Observation data and AI. During 4 days we will cover topic such as AI models for satellite images, Kubernetes workflow, metadata generation and management and performance analysis.

During 4 days, morning will be devoted to lectures and lessons and afternoon to hands-on session provided by the DaFab Consortium.

Important dates

Registration June 1st until September 10th, 2025
Registration has to be made through the AI for SCIENCE 2025 registration page
Summer School will be held from from September 22th to 25th
For any question please feel free to reach info@dafab-ai.eu

12 lecturers from all over Europe

Lecturers selected for the Summer School are coming from prestigious organization all over across Europe, coming from both Academia and the industry. Don't miss the opportunity to exchange with seasoned professional.

4 days with 2 sessions

The program is organized around 4 days: one dedicated to AI and EO, the second to AI and Performance, the third day to Workflow management and the last day to Earth Observation and data management.

Access to world class supercomputers

Hanson session will provide access on leadership class supercomputers in order to test, try and learn on reals systems.

Program

DaFab summer school is organised in 4 days, with a 3h lectures in the morning and a 2h session of coding and hands-on during the afternoon.

Monday Sep. 22, AI and EO

Tuesday Sep. 23, AI and Performance

Wednesday Sep. 24, Workflows

Thursday Sep. 25, EO and Data Management

9h00-12h00 AI and Earth Observation

9h00 Plenary Session

Earth Observation Remote Sensing Introduction

Michelle Aubrun, Thales Alenia Space

Michelle Aubrun received the Engineering degree in civil engineering and geomatics in 2013 and the Ph.D. degree in geography on SAR satellite image processing from the University of Montreal, Montreal, Canada, in 2019. She joined Thales Alenia Space in the beginning of 2018. Since then, she has been involved in several projects concerning image processing by deep learning approaches for remote sensing applications. Since 2020, she has also been a Researcher with the French Research Institute of Technology Saint Exupery, Toulouse, France. Her research interests include image representation learning with self-supervised approaches.

What is Earth Observation remote sensing? What are the key characteristics of EO data? What are its main fields of application? We will also introduce a concrete EO program called Copernicus.

9h45 Plenary Session

AI introduction in Remote Sensing

Alena Bakhorina, GCore

Evolution of AI technologies. Era of Foundation Models. Foundation Models in geoscience & datasets used for training. Popular Python frameworks for working with Earth Observation data. European projects in EO and AI.

10h30

30 minute break

11h00 Plenary Session

Earth Observation data access tools

Alena Bakhorina, GCore

Introduction to STAC (SpatioTemporal Asset Catalogs) specification for discovering geospatial information. Examples of usage Python libraries for efficient data processing.

11h45 Plenary Session

Thales Alenia Space AI applications

Michelle Aubrun, Thales Alenia Space

What type of applications does Thales Alenia Space develop using Earth Observation data enhanced by artificial intelligence algorithms? Discover how AI-driven EO solutions are shaping innovative applications in various fields.

13h00

lunch break

14h30 Hands-on Session

End-to-end AI pipeline for flood detection

Alena Bakhorina, GCore

Prerequisite: Notebook; Google account to log in Google Colab / or python with conda env + jupyter notebook locally

[5 min] Setup

Runtime
Installations
Mount Google Drive
Dataset download

[15 min] Sen1Floods11 Dataset

Overview
Visualizations
Dataset preparation

[20 min] Finetuning Geo Foundation Models (TerraMind / Prithvi)

Tensorboard
Configuration file
Finetuning for 5 epochs
Test metrics for trained model

[10 min] Inference

Sentinel-2 product search
Web interface
STAC API in Python

[20 min] Inference workflow on Sentinel-2 product

ONNX conversion
Inference steps explanation
Run
Visualizations outputs

[10 min] Q & A

16h00

coffee break

16h30 handson Session

GGIS: EO Data Visualization

Michelle Aubrun, Thales Alenia Space

Prerequisite: Linux / Windows computer + QGIS v3.4 installed (optional)

Installation of QGIS (if not already done)

Quick QGIS Installation Guide

Presentation of QGIS

Slides on QGIS instroduction

Manipulation of QGIS

Visualization of raster and vector data

Creation of vector data

9h00-12h00 AI and Performance

9h00 Plenary Session

Building the High-Performance Core of AI Factories

Farouk Mansouri, LuxProvide

AI factories must be designed to handle the most demanding stages of the machine learning lifecycle—data ingestion, preprocessing, and large-scale model training. These early phases place extreme demands on infrastructure, from GPU-accelerated supercomputers and high-throughput storage to low-latency interconnects and distributed data pipelines. This session explores how to architect and optimize dynamic, high-performance environments capable of processing massive datasets, orchestrating parallel training jobs, and scaling resources efficiently. Participants will gain practical insights into overcoming data bottlenecks, maximizing hardware utilization, and integrating HPC and cloud resources to deliver speed, scalability, and cost-efficiency at the start of the AI production chain.

9h45 Plenary Session

- Evolving Cloud Infrastructures for AI Workloads - High-Performance Foundations for AI Training

Clara Ulken, GCore

At GCore, Clara is leading cross-functional initiatives that align engineering, product, and business strategies to deliver scalable, high-impact technology solutions.

AI development begins with workloads that demand extreme performance. Training large language models and generative AI systems requires GPU acceleration, low-latency interconnects, and scalable storage to handle massive datasets. This session explores how Gcore’s GPU Cloud—powered by NVIDIA A100, H100, and H200 instances with InfiniBand networking—delivers HPC-grade capabilities in a flexible, cloud-native environment. We will look at distributed training strategies, mixed precision techniques, and orchestration tools that transform supercloud-class resources into elastic infrastructures for enterprise AI.

10h30

30 minute break

11h00 Plenary Session

Delivering Agility: From Trained Models to Scalable AI Services

Farouk Mansouri, LuxProvide

Once AI models are trained, the challenge shifts to delivering them as fast, reliable, and cost-effective services. Inference workloads require entirely different priorities—low latency, high availability, and seamless integration into production systems—often through containerized microservices, auto-scaling cloud platforms, or edge computing. This session examines how to bridge the performance–agility gap, from compressing and optimizing models for deployment to building resilient MLOps pipelines for monitoring, retraining, and governance. Real-world patterns and case studies will illustrate how to move from HPC-heavy training to lightweight, scalable inference while maintaining cost control, compliance, and operational excellence.

11h45 Plenary Session

Evolving Cloud Infrastructures for AI Workloads – Agility at Scale: Deploying AI Everywhere

Clara Ulken, GCore

AI is no longer confined to research or centralized datacenters, but it increasingly powers real-time, interactive experiences where every millisecond counts. This session focuses on Gcore’s Everywhere Inference platform, which brings models closer to end users, enabling ultra-low latency, reducing unnecessary backhaul traffic, and supporting regional data handling requirements. Gcore has extended its global edge infrastructure with GPU-powered Points of Presence worldwide, creating a platform designed for workloads where speed and locality make a measurable difference. With Kubernetes integration, autoscaling, and support for hybrid deployments, enterprises can deploy optimized inference services that adapt in real time to diverse workloads. From finance to conversational AI, fraud detection and live content personalization, to immersive gaming and AR/VR, inference at the edge delivers tangible improvements in performance and user experience.

13h00

lunch break

14h30 Hands-on Session

Hands-on Session. Participant will have the opportunity to access a remote super-computer to run tests and experiments

Part 1 – HPC Training/Finetuning of an LLM

Goal: Show participants how to run and scale a training job on an HPC system. (45 minutes)

1. Introduction to HPC Infrastructure

o Overview of the HPC cluster architecture (login nodes, compute nodes, GPU nodes) o SLURM basics: job submission, partitions, scheduling policies o Storage layout and data transfer tips

2. Exploration of Performance Tools

o htop, nvidia-smi, ibstat for InfiniBand, iostat for disk I/O o Brief on nvtop or similar GPU monitoring tools o How to interpret load, memory, and network usage

3. Simple Run

o Launch a minimal LLM training job (small dataset, reduced parameters) o Show SLURM job submission (sbatch), log checking (squeue, sacct) o Walk through model directory structure and outputs

4. Run at Scale

o Increase dataset/model size and GPU count o Demonstrate distributed training (PyTorch DDP, DeepSpeed, or Megatron-LM) o Monitor scaling behavior and GPU utilization in real-time

5. Monitoring & Optimization

o Detect bottlenecks (GPU idle time, I/O wait, network congestion) o Adjust batch size, precision (FP32 vs FP16), and parallelism strategies o Quick discussion: trade-offs between speed and accuracy

14h45

10 minute break

Part 2 – Kubernetes Deployment of the Trained Model

Goal: Deploy models as a scalable inference service (45 minutes)

1. Introduction to Kubernetes Infrastructure

o K8s architecture: master, worker nodes, pods, services, ingress o Overview of deployment options: cloud K8s, on-prem, hybrid o Brief on container images and registries

2. Exploration of Performance Tools

o kubectl top nodes/pods for resource monitoring o Logs (kubectl logs), kubectl describe for debugging o K8s dashboard or Lens for visual inspection

3. Simple Run

o Deploy model as a single replica pod with REST API (FastAPI/Flask) o Expose via NodePort or port-forwarding o Test with a sample query

4. Run at Scale

o Scale replicas using kubectl scale or HPA (Horizontal Pod Autoscaler) o Demonstrate load testing (e.g., hey or ab command) o Show autoscaling behavior under increasing load

5. Monitoring & Optimization

o Identify bottlenecks: CPU/GPU constraints, network latency o Optimize container startup, model loading time, batch inference o Discuss cost/performance trade-offs in scaling

15h40

10 minute break

Part 3 – End-to-End Challenge & Wrap-Up

Goal: Apply both HPC and K8s learnings to simulate a full AI factory pipeline (45 minutes)

1. End-to-End Integration

o Take the trained model from HPC output o Package it into a Docker image o Push image to registry for K8s deployment

2. Performance Challenge

o Teams run the model at scale and optimize throughput & latency o Compare scaling behavior between HPC training and K8s inference o Introduce optional constraints (cost cap, latency target)

3. Wrap-Up & Key Takeaways

o Recap performance/agility lessons from lecture o Share best practices cheat-sheet for HPC and K8s operations o Q&+A + feedback

9h00-13h00 Beyond stand-alone performance: AI Workflow

9h00 Plenary Session

Containers/Docker

Giorgos Saloustros, FORTH

Containers are now the standard for application deployment. By packaging an application's code and all its dependencies into a single, isolated unit, they ensure consistent behavior across any environment. This makes applications portable and much easier to develop and share. This session will introduce the technology and present the reason of its ubiquitous success as well as the challenges remaining to be addressed

9h45 Plenary Session

Orchestration/Kubernetes

Antony Chazapis

Orchestration is essential for managing containers at scale. Without it, manually deploying and scaling applications quickly becomes unmanageable. Kubernetes automates these tasks, handling everything from load balancing to self-healing, so your applications are always available. Orchestration is the fundamental concept to build a service architecture out of a set of individual containers

10h30

30 minute Coffee break

11h00 Plenary Session

Worflows

Antony Chazapis

A workflow is a sequence of computational steps. It structures complex processes where the output of one step becomes the input for the next. This is fundamental for data processing and machine learning, making these pipelines repeatable, transparent, and scalable. This session will present the key concept the main pitfalls and the most relevant technologies to implement a workflow.

11h45 Plenary Session

Argo

Lefteris Vasilakis and Antonis Tapanlis

Argo is a suite of tools for building and managing workflows directly on Kubernetes. Argo Workflows allows you to define complex, multi-step pipelines natively within your cluster. It's an ideal technology for running large-scale data and machine learning tasks. This session is the concluing stage of the morning session where after presentation of the individual containers, the need to organize and share resources and orchestration, a fully operating workflow can be implemented and deployed.

Slides of the handson Argo session

13h00

lunch break

14h30 Hands-on Session

Hands-on Session. Participant will have the opportunity to access a remote super-computer to run tests and experiments

Knot

Giorgos Saloustros/Antony Chazapis

prerequisite:

access to an HPC Cluster will be provided by DaFab

Outline of the sessions

The hands-on session will start with Docker containerization for packaging and managing applications in reproducible environments, followed by Kubernetes fundamentals for orchestrating and scaling workloads. The session will also introduce Knot a novel open-source orchestrator developed by FORTH harnessing both Kubernetes workflow and Slurm the HPC resource manager.

16h00

30 minute Coffee break

16h30 Hands-on Session

Hands-on Session. Participant will have the opportunity to access a remote super-computer to run tests and experiments

hands-on create your own workflow with Knot/Argo

Lefteris Vasilakis/Antonis Tapanlis

prerequisite:

access to an HPC Cluster will be provided by DaFab

Outline of the sessions

Participants will explore further workflow design principles, with a deep dive into Argo Workflows for managing complex, automated pipelines and the way Argo and knot can interoperate to provide seamless workflow deployment on different infrastructures, either cloud based on HPC.

Source code of the Argo workflows

9h00-12h00 Earth Observation and Data Management

9h00 Plenary Session

The AI Factory for Earth

Stephanie Giard, DDN

Stephanie is a Senior Product Marketing Manager at DDN. She brings extensive experience in Earth Observation from her roles at Planet, Airbus Defence & Space, Boeing, and Ubotica. Based in Fort Collins, CO, she holds a Master’s degree in Remote Sensing and GIS from Boston University.

There is an important on-going evolution of the Earth Observation data market. Originally the data market was driven by data acquisition, data acquisition cost and complexity. The Earth Observation community has been extremely successful in its effort to democratize data acquisition, as a result the community is drowning in data and starving for insight. The bottleneck is not satellites or sensors—it's data infrastructure. In this talk we will discuss this on-going evolution and the key data infrastructure technologies required to address this problem specifically in the time of AI.

9h45 Plenary Session

Data and Performance: Key factors and Parameters

Jean-Thomas Acquaviva, DDN

In this presentation we will present the traditional pitfall in data logistic, from management to exploitation. We will introduce some useful metric to assess the scalability and the relevance of a data infrastructure

Slides of Performance presentation

10h30

30 minute break

11h00 Plenary Session

Metadata Catalog: The needle and the haystack

Dimitrios Xenakis, CERN

Earth Observation today produces petabyte‑scale streams scattered across multiple mirrors and portals. The result: finding the right scene is often harder than processing it. This session introduces a practical path from needle to hits. We’ll start by mapping today’s discovery landscape and give a concise primer on STAC, identifying its strengths but also where it falls short. Then we’ll show how the DaFab AI project uses CERN’s Rucio to unify source and AI‑derived metadata, and how the Rucio Enhanced Filter (REF) is designed to improve the metadata discovery.

11h45 Plenary Session

Observability and data performance at the time of AI

Jean-Thomas Acquaviva, DDN

In this presentation we will review suitable tools and methodology to observe I/O performance and tune workload to suppress potential data bottleneck. This session is an introduction of the hands-on tutorial in the afternoon

Slides of Observability presentation

10h30

30 minute break

11h00 Plenary Session

Metadata Catalog: The needle and the haystack

Dimitrios Xenakis, CERN

Earth Observation today produces petabyte‑scale streams scattered across multiple mirrors and portals. The result: finding the right scene is often harder than processing it. This session introduces a practical path from needle to hits. We’ll start by mapping today’s discovery landscape and give a concise primer on STAC, identifying its strengths but also where it falls short. Then we’ll show how the DaFab AI project uses CERN’s Rucio to unify source and AI‑derived metadata, and how the Rucio Enhanced Filter (REF) is designed to improve the metadata discovery.

11h45 Plenary Session

13h00

lunch break

14h30 Hands-on Session

Hands-on Session. Participant will have the opportunity to access a remote super-computer to run tests and experiments

prerequisite:

Installation of the VPN software Wireguard on a laptop. Availability of ssh

Git and Docker available on the laptop.

Outline of the sessions

Performance measurements with Darshan. Darshan is data profiling https://github.com/darshan-hpc/darshan tool used to trace applications behavior and report I/O patterns. In this session we will analyse mysterious applications to discover potential I/O bottlenecks using Darshan tracing tools.

Steering Committee

DaFab is a European project sponsored by ESA with a focus on applying AI technique to Earth Observation data. As we've observed the limited offer in terms of interdisciplinary event in Europe, we've decided to set-up this summer school! The summer school has been organized by the DaFab consortium as a whole, nevertheless, five members are specifically involved in the organization.

Andrej Filipcic

Institute Jozef Stefan

General Chair

Andrej Filipčič has significantly contributed to major international projects like the ATLAS experiment at CERN and the Pierre Auger Observatory in Argentina. Andrej's expertise includes particle physics, cosmic-ray physics, distributed computing, and supercomputing. He coordinates ATLAS Distributed Computing and advises EuroHPC, playing a key role in advancing computational strategies in high-energy physics. Additionally, he teaches particle physics and computerized data acquisition at the University of Nova Gorica.

Jean-Thomas Acquaviva

DDN Storage

Co-Chair and Thematic Chair Earth Observation and Data management

Jean-Thomas successively worked for Intel, the University of Versailles and the French Atomic Commission (CEA). He participated to the creation of their joint laboratory on Exascale Research. At DDN, Jean-Thomas' role includes overseeing research collaborations in Europe as well as product management for some advanced DDN’s solutions.

Alena Bakhorina

GCore labs

Thematic Chair AI and Earth Observation

Alena is a proficient Data Scientist with over five years of experience specializing in AI and Computer Vision. She has a strong background in solving complex Computer Vision tasks, including expertise in image processing techniques, distributed deep learning training, 3D modeling, video analytics, and geospatial analysis. Since 2024, Alena is working as a Data Scientist at Gcore, contributing to European projects focused on the Earth Observation domain, specifically crop field detection and water anomaly (e.g., floods) detection on satellite imagery.

Farouk Mansouri

LuxProvide

Thematic Chair AI and Performance

Farouk Mansouri is a senior HPC and HPDA engineer at LuxProvide, Luxembourg's national HPC organization. He specializes in managing and leveraging supercomputing resources, particularly Luxembourg’s supercomputer, MeluXina. Farouk is known for his expertise in running complex AI/ML workflows and has significantly contributed to advancing computational strategies in various scientific domains, including space and Earth sciences. His work at LuxProvide highlights his commitment to innovation in supercomputing.

Giorgos Saloustros

FORTH

Thematic Chair Worflow

Giorgos Saloustros is a Research and Development Engineer at the Foundation for Research and Technology Hellas (FORTH) in Crete, Greece. With a strong background in computer architecture and VLSI systems, Giorgos has contributed to various research projects and publications in the field. His work focuses on optimizing memory-mapped I/O for fast storage devices, showcasing his expertise in computer systems and hardware optimization. Giorgos holds a Bachelor's degree in Physics from the University of Ljubljana and has been actively involved in both academic and applied research throughout his career.

Register to the Summer School on the AI4SCience registration page

The workshop will be held in the faculty of Computer Sciences, University of Ljubjana, The Discovery Science 2025 International Conference will take place at the Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia: Access map

DaFaB Summer School on Earth Observation and AI: September 22-25 2025, in Ljubljana, in the beautiful Slovenia

Applying AI to Earth Observation

4 days of lectures and hands-on sessions

Join us for a summer school dedicated to AI applied to Earth Observation data, in the beautiful city of Ljubljana, Slovenia.

DaFab summer School September 22-24, 2025

Important dates

12 lecturers from all over Europe

4 days with 2 sessions

Access to world class supercomputers

Program

9h00-12h00 AI and Earth Observation

9h00 Plenary Session

Earth Observation Remote Sensing Introduction

Michelle Aubrun, Thales Alenia Space

9h45 Plenary Session

AI introduction in Remote Sensing

Alena Bakhorina, GCore

10h30

** 30 minute break **

11h00 Plenary Session

Earth Observation data access tools

Alena Bakhorina, GCore

11h45 Plenary Session

Thales Alenia Space AI applications

Michelle Aubrun, Thales Alenia Space

13h00

** lunch break **

14h30 Hands-on Session

End-to-end AI pipeline for flood detection

Alena Bakhorina, GCore

Prerequisite: Notebook; Google account to log in Google Colab / or python with conda env + jupyter notebook locally

16h00

** coffee break **

16h30 handson Session

GGIS: EO Data Visualization

Michelle Aubrun, Thales Alenia Space

Prerequisite: Linux / Windows computer + QGIS v3.4 installed (optional)

9h00-12h00 AI and Performance

9h00 Plenary Session

Building the High-Performance Core of AI Factories

Farouk Mansouri, LuxProvide

9h45 Plenary Session

- Evolving Cloud Infrastructures for AI Workloads - High-Performance Foundations for AI Training

Clara Ulken, GCore

At GCore, Clara is leading cross-functional initiatives that align engineering, product, and business strategies to deliver scalable, high-impact technology solutions.

10h30

** 30 minute break **

11h00 Plenary Session

Delivering Agility: From Trained Models to Scalable AI Services

Farouk Mansouri, LuxProvide

11h45 Plenary Session

Evolving Cloud Infrastructures for AI Workloads – Agility at Scale: Deploying AI Everywhere

Clara Ulken, GCore

13h00

** lunch break **

14h30 Hands-on Session

Hands-on Session. Participant will have the opportunity to access a remote super-computer to run tests and experiments

Part 1 – HPC Training/Finetuning of an LLM

Goal: Show participants how to run and scale a training job on an HPC system. (45 minutes)

1. Introduction to HPC Infrastructure

2. Exploration of Performance Tools

3. Simple Run

4. Run at Scale

5. Monitoring & Optimization

14h45

** 10 minute break **

Part 2 – Kubernetes Deployment of the Trained Model

Goal: Deploy models as a scalable inference service (45 minutes)

1. Introduction to Kubernetes Infrastructure

2. Exploration of Performance Tools

3. Simple Run

4. Run at Scale

5. Monitoring & Optimization

15h40

** 10 minute break **

Part 3 – End-to-End Challenge & Wrap-Up

Goal: Apply both HPC and K8s learnings to simulate a full AI factory pipeline (45 minutes)

1. End-to-End Integration

2. Performance Challenge

3. Wrap-Up & Key Takeaways

30 minute break

lunch break

coffee break

30 minute break

lunch break

10 minute break

10 minute break

30 minute Coffee break

lunch break

30 minute Coffee break

30 minute break

30 minute break

lunch break