DaFab Summer School

DaFab Summer School on Earth Observation and AI: September 22-25, 2025, in Ljubljana, in beautiful Slovenia

Applying AI to Earth Observation

4 days of lectures and hands-on sessions

Register

Join us for a summer school dedicated to AI applied to Earth Observation data, in the beautiful city of Ljubljana, Slovenia.

This summer school offers a unique opportunity to delve into the cutting-edge intersection of Artificial Intelligence (AI) and Earth Observation (EO), combined with the essential skills for managing large-scale data and workflows in modern computing environments. Participants will gain theoretical knowledge and practical experience in applying AI techniques to analyze EO data, optimizing AI performance, managing complex workflows with Kubernetes, and handling massive datasets.

DaFab is a project funded by ESA under the grant agreement 101128693 — HORIZON-EUSPA-2022-SPACE

DaFab Summer School, September 22-25, 2025

DaFab is proud to be supported by AI for SCIENCE 2025 for its summer school on Earth Observation data and AI. Over 4 days we will cover topics such as AI models for satellite images, Kubernetes workflows, metadata generation and management, and performance analysis.

Each day, the morning will be devoted to lectures and lessons and the afternoon to hands-on sessions provided by the DaFab Consortium.


12 lecturers from all over Europe

Lecturers selected for the Summer School come from prestigious organizations across Europe, from both academia and industry. Don't miss the opportunity to exchange with seasoned professionals.

4 days with 2 sessions

The program is organized around 4 days: the first dedicated to AI and EO, the second to AI and Performance, the third to Workflow Management, and the last to Earth Observation and Data Management.

Access to world-class supercomputers

Hands-on sessions will provide access to leadership-class supercomputers in order to test, try, and learn on real systems.

Program

The DaFab summer school is organised over 4 days, with 3h of lectures in the morning and a 2h coding and hands-on session in the afternoon.

  • Monday Sep. 22, AI and EO
  • Tuesday Sep. 23, AI and Performance
  • Wednesday Sep. 24, Workflows
  • Thursday Sep. 25, EO and Data Management

9h00-12h00 AI and Earth Observation

9h00 Plenary Session

Earth Observation Remote Sensing Introduction

Michelle Aubrun, Thales Alenia Space

Michelle Aubrun received the Engineering degree in civil engineering and geomatics in 2013 and the Ph.D. degree in geography on SAR satellite image processing from the University of Montreal, Montreal, Canada, in 2019. She joined Thales Alenia Space in the beginning of 2018. Since then, she has been involved in several projects concerning image processing by deep learning approaches for remote sensing applications. Since 2020, she has also been a Researcher with the French Research Institute of Technology Saint Exupery, Toulouse, France. Her research interests include image representation learning with self-supervised approaches.

What is Earth Observation remote sensing? What are the key characteristics of EO data? What are its main fields of application? We will also introduce a concrete EO program called Copernicus.

9h45 Plenary Session

Introduction to AI in Remote Sensing

Alena Bakhorina, GCore

Evolution of AI technologies. Era of Foundation Models. Foundation Models in geoscience & datasets used for training. Popular Python frameworks for working with Earth Observation data. European projects in EO and AI.

10h30

** 30 minute break **

11h00 Plenary Session

Earth Observation data access tools

Alena Bakhorina, GCore

Introduction to the STAC (SpatioTemporal Asset Catalog) specification for discovering geospatial information. Examples of using Python libraries for efficient data processing.
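
To give a flavour of what such tooling looks like, here is a minimal sketch that uses the pystac-client library to query a public STAC API for Sentinel-2 scenes; the endpoint, bounding box, date range, and cloud-cover filter are example values chosen for illustration, not part of the official session material.

    # Minimal STAC search sketch (assumes: pip install pystac-client).
    # Endpoint, bounding box and dates are example values.
    from pystac_client import Client

    # Earth Search is a public STAC API hosting Sentinel-2 metadata (example endpoint).
    catalog = Client.open("https://earth-search.aws.element84.com/v1")

    # Search for Sentinel-2 L2A scenes over Ljubljana with low cloud cover.
    search = catalog.search(
        collections=["sentinel-2-l2a"],
        bbox=[14.4, 46.0, 14.6, 46.1],            # lon/lat bounding box (example)
        datetime="2025-06-01/2025-06-30",
        query={"eo:cloud_cover": {"lt": 20}},
    )

    for item in search.items():
        print(item.id, item.properties.get("eo:cloud_cover"))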

11h45 Plenary Session

Thales Alenia Space AI applications

Michelle Aubrun, Thales Alenia Space

What type of applications does Thales Alenia Space develop using Earth Observation data enhanced by artificial intelligence algorithms? Discover how AI-driven EO solutions are shaping innovative applications in various fields.

13h00

** lunch break **

14h30 Hands-on Session

End-to-end AI pipeline for flood detection

Alena Bakhorina, GCore

Prerequisite: a laptop; a Google account to log in to Google Colab, or Python with a conda env + Jupyter Notebook locally

  • [5 min] Setup
    • Runtime
    • Installations
    • Mount Google Drive
    • Dataset download
  • [15 min] Sen1Floods11 Dataset
    • Overview
    • Visualizations
    • Dataset preparation
  • [20 min] Finetuning Geo Foundation Models (TerraMind / Prithvi)
    • Tensorboard
    • Configuration file
    • Finetuning for 5 epochs
    • Test metrics for trained model
  • [10 min] Inference
    • Sentinel-2 product search
    • Web interface
    • STAC API in Python
  • [20 min] Inference workflow on Sentinel-2 product
    • ONNX conversion; see the sketch after this outline
    • Inference steps explanation
    • Run
    • Visualization of outputs
  • [10 min] Q & A
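
To make the ONNX conversion and inference steps above concrete, the following is a minimal, illustrative sketch of exporting a PyTorch segmentation model to ONNX and running it with onnxruntime; the stand-in model, band count, tile size, and file names are assumptions for the example rather than the exact TerraMind/Prithvi pipeline used in the session.

    # Illustrative sketch only: exports a placeholder PyTorch segmentation model to ONNX
    # and runs it with onnxruntime. Shapes and file names are assumptions for the example.
    import numpy as np
    import torch
    import onnxruntime as ort

    # Stand-in for the fine-tuned flood-segmentation model (6 input bands, 2 classes).
    model = torch.nn.Conv2d(6, 2, kernel_size=3, padding=1)
    model.eval()

    # Example input: one tile with 6 spectral bands of 512x512 pixels.
    dummy = torch.randn(1, 6, 512, 512)

    torch.onnx.export(
        model, dummy, "flood_model.onnx",
        input_names=["bands"], output_names=["mask"],
        dynamic_axes={"bands": {0: "batch"}, "mask": {0: "batch"}},
    )

    # Inference on a preprocessed tile (here random data stands in for a real Sentinel-2 product).
    session = ort.InferenceSession("flood_model.onnx")
    tile = np.random.rand(1, 6, 512, 512).astype(np.float32)
    (mask,) = session.run(["mask"], {"bands": tile})
    print("predicted mask shape:", mask.shape)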

16h00

** coffee break **

16h30 Hands-on Session

QGIS: EO Data Visualization

Michelle Aubrun, Thales Alenia Space

Prerequisite: Linux / Windows computer + QGIS v3.4 installed (optional)

  • Visualization of raster and vector data
  • Creation of vector data

9h00-12h00 AI and Performance

9h00 Plenary Session

Building the High-Performance Core of AI Factories

Farouk Mansouri, LuxProvide

AI factories must be designed to handle the most demanding stages of the machine learning lifecycle—data ingestion, preprocessing, and large-scale model training. These early phases place extreme demands on infrastructure, from GPU-accelerated supercomputers and high-throughput storage to low-latency interconnects and distributed data pipelines. This session explores how to architect and optimize dynamic, high-performance environments capable of processing massive datasets, orchestrating parallel training jobs, and scaling resources efficiently. Participants will gain practical insights into overcoming data bottlenecks, maximizing hardware utilization, and integrating HPC and cloud resources to deliver speed, scalability, and cost-efficiency at the start of the AI production chain.

9h45 Plenary Session

Evolving Cloud Infrastructures for AI Workloads – High-Performance Foundations for AI Training

Clara Ulken, GCore

At GCore, Clara is leading cross-functional initiatives that align engineering, product, and business strategies to deliver scalable, high-impact technology solutions.

AI development begins with workloads that demand extreme performance. Training large language models and generative AI systems requires GPU acceleration, low-latency interconnects, and scalable storage to handle massive datasets. This session explores how Gcore’s GPU Cloud—powered by NVIDIA A100, H100, and H200 instances with InfiniBand networking—delivers HPC-grade capabilities in a flexible, cloud-native environment. We will look at distributed training strategies, mixed precision techniques, and orchestration tools that transform supercloud-class resources into elastic infrastructures for enterprise AI.
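
As a small illustration of the mixed-precision techniques mentioned above, the sketch below shows the standard PyTorch automatic mixed precision (AMP) training loop; the toy model and random data are placeholders, and the snippet assumes a CUDA-capable GPU.

    # Mixed-precision training sketch with PyTorch AMP (toy model and random data,
    # for illustration only; requires a CUDA GPU).
    import torch

    device = "cuda"
    model = torch.nn.Linear(1024, 10).to(device)        # stand-in for a real network
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()
    loss_fn = torch.nn.CrossEntropyLoss()

    for step in range(10):
        x = torch.randn(32, 1024, device=device)
        y = torch.randint(0, 10, (32,), device=device)

        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():                  # forward pass in reduced precision
            loss = loss_fn(model(x), y)

        scaler.scale(loss).backward()                    # scale loss to avoid FP16 underflow
        scaler.step(optimizer)
        scaler.update()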

10h30

** 30 minute break **

11h00 Plenary Session

Delivering Agility: From Trained Models to Scalable AI Services

Farouk Mansouri, LuxProvide

Once AI models are trained, the challenge shifts to delivering them as fast, reliable, and cost-effective services. Inference workloads require entirely different priorities—low latency, high availability, and seamless integration into production systems—often through containerized microservices, auto-scaling cloud platforms, or edge computing. This session examines how to bridge the performance–agility gap, from compressing and optimizing models for deployment to building resilient MLOps pipelines for monitoring, retraining, and governance. Real-world patterns and case studies will illustrate how to move from HPC-heavy training to lightweight, scalable inference while maintaining cost control, compliance, and operational excellence.

11h45 Plenary Session

Evolving Cloud Infrastructures for AI Workloads – Agility at Scale: Deploying AI Everywhere

Clara Ulken, GCore

AI is no longer confined to research or centralized datacenters; it increasingly powers real-time, interactive experiences where every millisecond counts. This session focuses on Gcore’s Everywhere Inference platform, which brings models closer to end users, enabling ultra-low latency, reducing unnecessary backhaul traffic, and supporting regional data handling requirements. Gcore has extended its global edge infrastructure with GPU-powered Points of Presence worldwide, creating a platform designed for workloads where speed and locality make a measurable difference. With Kubernetes integration, autoscaling, and support for hybrid deployments, enterprises can deploy optimized inference services that adapt in real time to diverse workloads. From finance to conversational AI, fraud detection and live content personalization, to immersive gaming and AR/VR, inference at the edge delivers tangible improvements in performance and user experience.

13h00

** lunch break **

14h30 Hands-on Session

Hands-on Session. Participants will have the opportunity to access a remote supercomputer to run tests and experiments.

Part 1 – HPC Training/Finetuning of an LLM

Goal: Show participants how to run and scale a training job on an HPC system. (45 minutes)

1. Introduction to HPC Infrastructure

  • Overview of the HPC cluster architecture (login nodes, compute nodes, GPU nodes)
  • SLURM basics: job submission, partitions, scheduling policies
  • Storage layout and data transfer tips

2. Exploration of Performance Tools

  • htop, nvidia-smi, ibstat for InfiniBand, iostat for disk I/O
  • Brief on nvtop or similar GPU monitoring tools
  • How to interpret load, memory, and network usage

3. Simple Run

  • Launch a minimal LLM training job (small dataset, reduced parameters)
  • Show SLURM job submission (sbatch), log checking (squeue, sacct)
  • Walk through model directory structure and outputs

4. Run at Scale

  • Increase dataset/model size and GPU count
  • Demonstrate distributed training (PyTorch DDP, DeepSpeed, or Megatron-LM); a minimal DDP sketch follows this outline
  • Monitor scaling behavior and GPU utilization in real time

5. Monitoring & Optimization

  • Detect bottlenecks (GPU idle time, I/O wait, network congestion)
  • Adjust batch size, precision (FP32 vs FP16), and parallelism strategies
  • Quick discussion: trade-offs between speed and accuracy
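
The minimal DDP sketch referenced in step 4 could look roughly like the following; the toy model, data, and launch command are illustrative assumptions, and the actual job scripts for the training exercise will be provided during the session.

    # Minimal PyTorch DDP sketch (illustrative only; launch with e.g.
    #   torchrun --nproc_per_node=<gpus_per_node> ddp_demo.py
    # inside a SLURM allocation). Model and data are toy placeholders.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        dist.init_process_group(backend="nccl")          # torchrun provides rank/world-size env vars
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        model = torch.nn.Linear(1024, 1024).cuda()       # stand-in for a real LLM
        model = DDP(model, device_ids=[local_rank])
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

        for step in range(10):
            x = torch.randn(8, 1024, device="cuda")
            loss = model(x).pow(2).mean()                # dummy loss for illustration
            optimizer.zero_grad(set_to_none=True)
            loss.backward()                              # gradients are all-reduced across ranks
            optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()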

14h45

** 10 minute break **

Part 2 – Kubernetes Deployment of the Trained Model

Goal: Deploy models as a scalable inference service (45 minutes)

1. Introduction to Kubernetes Infrastructure

  • K8s architecture: master and worker nodes, pods, services, ingress
  • Overview of deployment options: cloud K8s, on-prem, hybrid
  • Brief on container images and registries

2. Exploration of Performance Tools

  • kubectl top nodes/pods for resource monitoring
  • Logs (kubectl logs), kubectl describe for debugging
  • K8s dashboard or Lens for visual inspection

3. Simple Run

  • Deploy the model as a single-replica pod with a REST API (FastAPI/Flask); see the sketch after this outline
  • Expose via NodePort or port-forwarding
  • Test with a sample query

4. Run at Scale

  • Scale replicas using kubectl scale or HPA (Horizontal Pod Autoscaler)
  • Demonstrate load testing (e.g., hey or ab command)
  • Show autoscaling behavior under increasing load

5. Monitoring & Optimization

  • Identify bottlenecks: CPU/GPU constraints, network latency
  • Optimize container startup, model loading time, batch inference
  • Discuss cost/performance trade-offs in scaling
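
For step 3 of this outline, a single-replica inference service built with FastAPI might look roughly like the sketch below; the file name, endpoint, and trivial stand-in model are assumptions for illustration only. Once packaged into a container image, such a service can be exposed via NodePort or port-forwarding exactly as described above.

    # Illustrative FastAPI inference service (assuming the file is app.py, run with:
    #   uvicorn app:app --host 0.0.0.0 --port 8000).
    # The "model" here is a trivial stand-in; a real deployment would load trained weights.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Query(BaseModel):
        text: str

    def model_predict(text: str) -> str:
        # Placeholder for real model inference (e.g. an ONNX or PyTorch call).
        return text.upper()

    @app.post("/predict")
    def predict(query: Query):
        return {"prediction": model_predict(query.text)}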

15h40

** 10 minute break **

Part 3 – End-to-End Challenge & Wrap-Up

Goal: Apply both HPC and K8s learnings to simulate a full AI factory pipeline (45 minutes)

1. End-to-End Integration

  • Take the trained model from the HPC output
  • Package it into a Docker image
  • Push the image to a registry for K8s deployment

2. Performance Challenge

  • Teams run the model at scale and optimize throughput & latency
  • Compare scaling behavior between HPC training and K8s inference
  • Introduce optional constraints (cost cap, latency target)

3. Wrap-Up & Key Takeaways

  • Recap performance/agility lessons from the lectures
  • Share a best-practices cheat sheet for HPC and K8s operations
  • Q&A + feedback

9h00-13h00 Beyond stand-alone performance: AI Workflows

9h00 Plenary Session

Containers/Docker

Giorgos Saloustros, FORTH

Containers are now the standard for application deployment. By packaging an application's code and all its dependencies into a single, isolated unit, they ensure consistent behavior across any environment. This makes applications portable and much easier to develop and share. This session will introduce the technology and present the reasons for its ubiquitous success, as well as the challenges that remain to be addressed.

9h45 Plenary Session

Orchestration/Kubernetes

Antony Chazapis

Orchestration is essential for managing containers at scale. Without it, manually deploying and scaling applications quickly becomes unmanageable. Kubernetes automates these tasks, handling everything from load balancing to self-healing, so your applications are always available. Orchestration is the fundamental concept for building a service architecture out of a set of individual containers.

10h30

** 30 minute Coffee break **

11h00 Plenary Session

Workflows

Antony Chazapis

A workflow is a sequence of computational steps. It structures complex processes where the output of one step becomes the input for the next. This is fundamental for data processing and machine learning, making these pipelines repeatable, transparent, and scalable. This session will present the key concepts, the main pitfalls, and the most relevant technologies for implementing a workflow.

11h45 Plenary Session

Argo

Lefteris Vasilakis and Antonis Tapanlis

Argo is a suite of tools for building and managing workflows directly on Kubernetes. Argo Workflows allows you to define complex, multi-step pipelines natively within your cluster. It's an ideal technology for running large-scale data and machine learning tasks. This session is the concluding stage of the morning: after presenting individual containers and the need to organize and share resources through orchestration, a fully operating workflow can be implemented and deployed.

13h00

** lunch break **

14h30 Hands-on Session

Hands-on Session. Participants will have the opportunity to access a remote supercomputer to run tests and experiments.

Knot

Giorgos Saloustros/Antony Chazapis

Prerequisite:

Access to an HPC cluster will be provided by DaFab.

Outline of the sessions

The hands-on session will start with Docker containerization for packaging and managing applications in reproducible environments, followed by Kubernetes fundamentals for orchestrating and scaling workloads. The session will also introduce Knot, a novel open-source orchestrator developed by FORTH that harnesses both Kubernetes workflows and Slurm, the HPC resource manager.

16h00

** 30 minute Coffee break **

16h30 Hands-on Session

Hands-on Session. Participants will have the opportunity to access a remote supercomputer to run tests and experiments.

Hands-on: create your own workflow with Knot/Argo

Lefteris Vasilakis/Antonis Tapanlis

Prerequisite:

Access to an HPC cluster will be provided by DaFab.

Outline of the sessions

Participants will explore workflow design principles further, with a deep dive into Argo Workflows for managing complex, automated pipelines and into the way Argo and Knot can interoperate to provide seamless workflow deployment on different infrastructures, either cloud-based or HPC.

9h00-12h00 Earth Observation and Data Management

9h00 Plenary Session

The AI Factory for Earth

Stephanie Giard, DDN

Stephanie is a Senior Product Marketing Manager at DDN. She brings extensive experience in Earth Observation from her roles at Planet, Airbus Defence & Space, Boeing, and Ubotica. Based in Fort Collins, CO, she holds a Master’s degree in Remote Sensing and GIS from Boston University.

The Earth Observation data market is undergoing an important evolution. Originally, the market was driven by data acquisition and its cost and complexity. The Earth Observation community has been extremely successful in its effort to democratize data acquisition; as a result, the community is drowning in data and starving for insight. The bottleneck is not satellites or sensors—it's data infrastructure. In this talk we will discuss this ongoing evolution and the key data infrastructure technologies required to address this problem, specifically in the time of AI.

9h45 Plenary Session

Data and Performance: Key Factors and Parameters

Jean-Thomas Acquaviva, DDN

In this presentation we will cover the traditional pitfalls in data logistics, from management to exploitation. We will introduce some useful metrics to assess the scalability and the relevance of a data infrastructure.

10h30

** 30 minute break **

11h00 Plenary Session

Metadata Catalog: The needle and the haystack

Dimitrios Xenakis, CERN

Earth Observation today produces petabyte‑scale streams scattered across multiple mirrors and portals. The result: finding the right scene is often harder than processing it. This session introduces a practical path from needle to hits. We’ll start by mapping today’s discovery landscape and give a concise primer on STAC, identifying its strengths but also where it falls short. Then we’ll show how the DaFab AI project uses CERN’s Rucio to unify source and AI‑derived metadata, and how the Rucio Enhanced Filter (REF) is designed to improve metadata discovery.

11h45 Plenary Session

Observability and data performance at the time of AI

Jean-Thomas Acquaviva, DDN

In this presentation we will review suitable tools and methodologies to observe I/O performance and tune workloads to remove potential data bottlenecks. This session is an introduction to the hands-on tutorial in the afternoon.

13h00

** lunch break **

14h30 Hands-on Session

Hands-on Session. Participants will have the opportunity to access a remote supercomputer to run tests and experiments.

Prerequisite:

Installation of the VPN software WireGuard on a laptop. Availability of SSH.

Git and Docker available on the laptop.

Outline of the sessions

Performance measurements with Darshan. Darshan (https://github.com/darshan-hpc/darshan) is a data profiling tool used to trace application behavior and report I/O patterns. In this session we will analyse mystery applications to discover potential I/O bottlenecks using Darshan tracing tools.
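
As a taste of the analysis, the sketch below reads a Darshan log with the project's Python bindings (the darshan package, also known as PyDarshan); the log file name is a placeholder, and the exact calls are assumptions that may vary between Darshan releases, so check them against the PyDarshan documentation.

    # Sketch of reading a Darshan log in Python (assumes: pip install darshan).
    # "app.darshan" is a placeholder for a log produced by an instrumented application.
    import darshan

    # Open the log and load all records (assumed PyDarshan API).
    report = darshan.DarshanReport("app.darshan", read_all=True)
    report.info()   # summary: job info and modules present (POSIX, MPI-IO, STDIO, ...)

    # POSIX counters as a pandas DataFrame, e.g. to spot many tiny reads and writes.
    posix = report.records["POSIX"].to_df()["counters"]
    print(posix[["POSIX_READS", "POSIX_WRITES", "POSIX_BYTES_READ", "POSIX_BYTES_WRITTEN"]].sum())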

Steering Committee

DaFab is a European project sponsored by ESA with a focus on applying AI techniques to Earth Observation data. Having observed the limited offering of interdisciplinary events in Europe, we decided to set up this summer school! The summer school has been organized by the DaFab consortium as a whole; nevertheless, five members are specifically involved in the organization.




Register for the Summer School on the AI4Science registration page


The summer school will be held at the Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia, the venue of the Discovery Science 2025 International Conference: Access map









Sponsors

DaFab would like to warmly thank its sponsors: