Geovisualization.net

GENERATIVE AI with GEMINI and NANO BANANA: The great storm in Benidorm!

Generative AI applied to maps is profoundly changing how we understand, produce and communicate geographic information. Unlike classical approaches, which rely on fixed rules or supervised classification, generative AI can create new spatial representations from patterns learned across large volumes of geospatial data.

In cartography, these models can generate synthetic maps, fill in areas with missing data, increase spatial resolution (super-resolution) or simulate future scenarios such as urban expansion or the impact of climate change. Models such as GANs, diffusion models and spatial transformers are already being used to generate realistic satellite imagery, plausible land-use maps and probabilistic distributions of territorial phenomena.

One of the great strengths of generative AI for maps is its ability to integrate multiple sources (satellite, census, mobility, climate) and produce spatially coherent outputs. This is especially useful in data-scarce regions, where generated maps can support planning and decision-making.

However, its use raises significant challenges: interpretability, validation and spatial bias. A generated map is not necessarily a true map; it is a hypothesis based on prior data. Generative AI should therefore be understood as a tool that supports territorial analysis, not as a substitute for geographic knowledge or expert judgement.

In this context, the role of the geographer and the geospatial analyst is key: to evaluate, contextualise and validate AI-generated maps so that their use is responsible and scientifically sound.

According to the MappingGIS website, generative AI is changing the rules of the game. Visualisations in urban and landscape planning used to be closely tied to expensive licences and long processing times. Today, an aerial photograph and a good prompt are enough to obtain a fascinating AI-generated «map».

This approach shows significant potential for automated cartographic visualisation. While it is not yet production-ready, the results show that AI can understand aerial imagery and translate it into recognisable map styles with reasonable accuracy… and it certainly offers a preview of what awaits us in the coming months (not years!).

David Oesch

Example 1 in Benidorm, Spain.

Figure 2 – Google Earth capture of Benidorm, Spain

Figure 3 – The same image after being processed by Gemini

Example 2 in Benidorm, Spain.

Figure 4 – Google Earth capture of Benidorm, Spain

Figure 5 – The same image after being processed by Gemini

Example 3 in Benidorm, Spain.

Figure 6 – Google Earth capture of Benidorm, Spain

Figure 7 – The same image after being processed by Gemini

Figure 8 – Now we add context: rain, emergencies, traffic jams, etc.

This is the complex prompt I used; it was written by David Oesch (cited in the sources below):

			
[COMPLETE FRESH START]
You are generating a completely new cartographic map. This is NOT an edit or modification of a previous image. 
Disregard all conversation history and prior image states.
OBJECTIVE: Transform the provided satellite image into a precise Swiss national topographic map
TRANSFORMATION REQUIREMENT: Complete replacement of all satellite imagery with vector-style cartographic symbolism. Maintain scale, proportions, and preserve the layout and spatial relationships from the source satellite image.
OUTPUT GUARANTEE: The result must contain ZERO visible satellite photograph elements
PHASE 1 - IMAGE ANALYSIS:
Identify all geographic features in the satellite image (buildings, roads, water, vegetation, terrain)
PHASE 2 - CARTOGRAPHIC TRANSFORMATION:
Replace every satellite image element with exact Swiss topographic symbology
PHASE 3 - VERIFICATION:
Confirm the output is a pure cartographic map with no satellite imagery, no photographic texture, no aerial photography
RENDERING SPECIFICATIONS:
- Orthographic projection (flat, top-down view)
- Exact RGB color values as specified
- Maintain scale
- Professional, clean cartographic presentation suitable for official swisstopo standards
## Background
- Base color: rgb(253, 253, 254) (off-white)
- Apply subtle gray hillshading for terrain relief
## Buildings and Structures
- Standard buildings: Fill with rgb(170, 172, 174) (medium gray)
- Building outlines: Dark gray rgb(154, 156, 158) with line width 0.5-2pt
- Roofs and cooling towers: Lighter fill rgb(196, 198, 200) with outline rgb(180, 182, 184)
- Construction platforms: Light gray rgb(222, 220, 220)
- Solar panels: Pale yellow rgb(245, 246, 189) with brownish outline rgb(152, 142, 132)
- Render as solid filled polygons with clean, precise edges
## Roads and Transportation Networks
**IMPORTANT: All roads rendered as clean, continuous centerlines without surface vehicles. Apply vehicle occlusion removal to show underlying road surface only.
**Unified Road Classification (Simplified Homogeneous System):**
- **Major roads** (motorways, highways, trunk roads, routes):
  - Fill: rgb(248, 207, 117) (golden yellow/beige) - unified color for all major routes
  - Casing: rgb(70, 55, 30) (dark brown)
  - Width: 3-11pt fill with 4-14pt casing (scale by importance, not by type)
  - Apply consistent styling across all motorway and trunk classifications
- **Secondary roads** (all secondary and tertiary routes):
  - Fill: rgb(250, 243, 158) (pale yellow) - single unified color
  - Casing: rgb(65, 65, 25) (olive brown)
  - Width: 2-9pt fill with 3.5-11pt casing
  - Homogeneous representation regardless of route number
- **Minor roads and local streets** (service roads, residential, tracks):
  - Fill: rgb(255, 255, 255) (white) - unified appearance
  - Casing: rgb(60, 60, 60) (dark gray)
  - Width: 1.5-8pt fill with 2.5-10pt casing
  - Consistent styling for all local access roads
- **Footpaths and trails:**
  - Color: rgb(60, 60, 60) (dark gray)
  - Style: Dashed line pattern with 16-40pt dashes, 2-4pt gaps
  - Width: 0.75-2pt
- **Mountain trails:**
  - Color: rgb(20, 20, 20) (nearly black)
  - Style: Dashed line
  - Width: 2-3.5pt
- **Railways and transit:**
  - Color: rgb(203, 77, 77) (red-brown)
  - Width: 0.75-2pt solid lines
- **Ferries:**
  - Color: rgb(77, 164, 218) (blue)
  - Style: Dashed lines (6-18pt dashes, 2-4pt gaps)
  - Width: 0.4-2.5pt
**Road Surface Rendering:**
- Remove all vehicles, cars, trucks, and moving objects from road surfaces using inpainting techniques
- Reconstruct underlying pavement/surface texture where vehicles were present
- Maintain road boundary precision and lane marking continuity
- Show only static road infrastructure (pavement, markings, surfaces)
## Water Bodies and Features
- Remove all boats, ships in harbour or anchoring and moving objects from water surfaces using inpainting techniques
- Lakes and oceans:
  - Fill: rgb(210, 238, 255) (light sky blue)
- Rivers and streams:
  - Fill: rgb(181, 225, 253) (slightly darker blue)
- Waterways (rivers, canals):
  - Line color: rgb(77, 164, 218) (medium blue)
  - Width: 0.25-10pt depending on water body size
- Intermittent streams:
  - Line color: rgb(0, 0, 0) (black)
  - Opacity: 0.4 (40%)
  - Style: Dashed pattern (0.75-4pt dashes with gaps)
- Water boundaries/shorelines:
  - Line color: rgb(77, 164, 218) (blue)
  - Width: 0.1-1.5pt
  - Blur: 0.25pt
## Vegetation and Landcover
- Forests and woodlands:
  - Primary: rgb(62, 168, 0) (vibrant green)
  - Alternative: rgb(62, 153, 10) (darker green)
  - Opacity: 0.1-0.25
- Parks and green spaces:
  - Fill: rgb(211, 235, 199) (pale green)
  - Opacity: 0.25-0.35
- Park boundaries:
  - Line color: rgb(112, 180, 70) (medium green)
  - Width: 1-4pt with 0.4pt blur
- Residential/urban green:
  - Fill: rgb(246, 219, 164) transitioning to rgba(188, 186, 185, 0.4) (beige to gray)
- Sports fields/pitches:
  - Fill: rgb(231, 243, 225) (very pale green)
- Sand/beach areas:
  - Fill: rgb(239, 176, 87) (sandy orange)
- Glaciers and ice:
  - Fill: rgb(205, 232, 244) (ice blue)
  - Opacity: 0.1-0.3
## Terrain and Elevation
- Hillshading (shadows):
  - Gray scale gradient from rgb(173, 188, 199) (darkest) to rgb(251, 252, 252) (lightest)
  - Apply based on luminosity values -15 to 0
  - Opacity: 0.5-1.0
- Sunny slopes:
  - Tint: rgb(255, 235, 5) (yellow)
  - Opacity: 0.04 (very subtle)
- Contour lines:
  - Standard: rgba(180, 110, 13, 0.35) (brown, semi-transparent)
  - Width: 0.75-3pt (major contours thicker)
  - Blur: 0.25-0.4pt
- Scree, barren and rock fields:
  - Apply stippled/textured patterns in gray tones
  - Opacity: 0.25-0.35
## Special Features
- Parking areas:
  - Fill: rgb(255, 255, 255) (white)
  - Outline: rgb(60, 60, 60) (dark gray)
  - **Remove parked vehicles** - show empty parking spaces only
- Dams and weirs:
  - Fill: rgb(196, 198, 200) (light gray)
  - Outline: rgb(154, 156, 158) (medium gray)
- Retaining walls:
  - Line: rgb(132, 132, 132) (medium gray)
  - Width: 0.5-3pt
## Overall Technical Specifications
- Line blur: 0.25-0.4pt for smooth appearance
- Line cap: Round for roads, butt for boundaries
- Line join: Round for curves, miter for sharp angles
- Projection: Top-down orthographic (flat, no perspective)
- Antialiasing: Smooth edges for all features except terrain fills
- Layering order (bottom to top): Background → Hillshading → Water → Landcover → Roads → Buildings → Boundaries 
## Color Accuracy Notes
- Use exact RGB values provided (no approximations)
- Maintain opacity/transparency as specified
- Apply subtle blurring (0.25-0.4pt) to soften hard edges
## Vehicle Removal Specifications
- Detect and mask all vehicles (cars, trucks, buses, motorcycles, trains, boats, ships, airplanes) on roads, traintracks, airports and parking areas
- Apply deep learning-based inpainting to reconstruct road surface beneath vehicles
- Preserve road markings, lane boundaries, and surface texture continuity
- Ensure seamless integration of reconstructed areas with surrounding pavement
- Maintain edge fidelity and structural features of roads during vehicle removal
**Output Requirements:** Vector-style cartographic map with Swiss precision, clean geometry, harmonious homogeneous road classification, vehicle-free representation, proper layering hierarchy, and professional presentation quality suitable for official topographic analysis.

		

Next, imagining a situation in which several layers were needed, I tried to integrate some layers such as POIs or realistic 3D models: helicopters, emergency tents, gridlocked cars, etc.

Figure 9 – Integrating layers with generative AI. Gemini with Nano Banana

What works well
Geometric precision: The AI maintains high positional accuracy, placing geographic features correctly.
Style recognition: Strong fidelity to cartographic conventions (colours, symbols, layers).
Information extraction: Good detection of roads, trees and buildings.
Automation feasibility: The workflow is reproducible and can process multiple tiles systematically.

Current limitations
Consistency challenges: The generated interpretations vary between tiles, which creates visual discontinuities.
Incomplete processing: Not all elements of the aerial imagery are interpreted consistently, notably vegetation and water bodies (small greenish ponds).
Style variations: The AI does not always apply the same cartographic style across the whole area.
Transient objects: Cars and temporary features are not always removed automatically.

There really is a lot happening at every level around AI, visualisation, GIS and data in general. This is simply an attempt to understand, from the inside, how it may be affecting my own corner of the field.

Sources
https://gemini.google.com/
https://mappinggis.com/2025/10/generar-mapas-topograficos-automaticamente-con-ia/
https://github.com/davidoesch/ai-topographic-maps/blob/main/prompt.txt

R as in running!

A constant record: more than 11 years (from August 2014 to today) and more than 1,150 documented sessions. A life project; these are not just numbers, they are the chronology of my discipline. Once again, R has shattered my habits as a geographer raised on ArcGIS, Global Mapper and QGIS: not everything has to pass through the filter of having coordinates. These insights, for example, have no coordinates, yet they can be analysed and yield conclusions that support quick decisions… Let's take a look at my runs over the last few years.

This decade of kilometres is not just a database; it is a story of self-improvement :-). To decode it, I used the statistical rigour of R, which lets me “clean” the noise of daily effort using percentiles, turning chaotic variables into performance trends.

The route I have designed follows a logical sequence of discovery:

Density: First we visualise the “shape” of my running, identifying where my usual volume is concentrated by means of density ridges.
Composition: Then we dissect each year's “DNA” with donut charts to understand the relative weight of intensity vs. volume.
Radar: Finally, we reach the crown jewel: a radial compass that works as a seasonal heat map. Here, R's precision in handling three-colour gradients meets the aesthetics of a polar projection, showing how my performance “orbits” around the months and how my effort expands year after year. It is where descriptive statistics and high-impact data visualisation meet.

Before starting, here is my latest year, 2025, which by the way will be the last of its kind. The years take their toll, and my next personal running project will have a lot to do with GIS (Python, R), but I'll tell you about that another day!

Figure 1 – The whole of 2025 running. Created with ggplot2 in R

I started thinking about what to do, and after a long time (one week, ha!) it seemed to me that combining my runs with my GIS analyses could be a good idea, especially now that I am circulating my portfolio to find new job opportunities (work, in short).

The CSV file contains these key columns: an ID, the quarter, the exact date in dd/mm/yyyy format, the total distance (in km), the pace (in min/km), the effort (distance divided by pace), the effort that accounts for elevation gain (effort_h) and the ranking from the first to the last entry.
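
As a rough illustration of the columns described above, the sketch below parses a pace into decimal minutes, derives effort as distance divided by pace, and formats a decimal pace back to m:ss. It is plain JavaScript rather than the R used for the charts. The function names (`parsePace`, `effort`, `formatPace`) are hypothetical helpers, not part of the original script, and it assumes paces arrive either as "m:ss" or as a decimal written with a comma. The formula for effort_h is not given in the post, so it is not attempted here.

```javascript
// Convert a pace such as "5:31" (min:sec per km) or "5,52" (decimal comma) to decimal minutes.
function parsePace(pace) {
  const text = String(pace).trim();
  if (text.includes(":")) {
    const [min, sec] = text.split(":").map(Number);
    return min + sec / 60;
  }
  return Number(text.replace(",", "."));
}

// Effort as described in the text: distance (km) divided by pace (min/km).
function effort(distanceKm, paceMinPerKm) {
  return distanceKm / paceMinPerKm;
}

// Format decimal minutes as "m:ss", carrying over when the seconds round up to 60.
function formatPace(decimalMinutes) {
  const totalSeconds = Math.round(decimalMinutes * 60);
  const min = Math.floor(totalSeconds / 60);
  const sec = totalSeconds % 60;
  return `${min}:${String(sec).padStart(2, "0")}`;
}

// Hypothetical session: 10 km at 5:30 min/km.
const pace = parsePace("5:30");
console.log(pace, effort(10, pace).toFixed(2), formatPace(pace));
```

Rounding the total number of seconds first, instead of rounding the fractional minute on its own, is a small design choice that avoids labels such as "4:60".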

Figure 1b – A bit messy like this, isn't it? Let's work on it!

To start getting value out of these data, the first thing to do is to stare at the CSV for at least 10 minutes and do nothing else… only once that first small step is done should you move on to the rest 🙂

Figure 2 – The mother of all running tables 🙂 the CSV.

Let's start analysing: have all the years been the same? The first thing we see is endless constancy and regularity. That is good when we talk about consistency, but today is not about consistency; it is about understanding, gaining quick perspective and making decisions. You can apply this to almost any topic, not just running. I move on to visualising all the years at once; I still can't see much, but I am beginning to understand the database better…

Figure 3 – All the trends for all the years (put together they look like spaghetti!)

Points with presence: With the point size raised to 1.5, each individual session now carries visual weight, letting me check whether the trend line really represents that year's “cloud”.

Clarity in the label: I added the text “effort” (column effort_h, which measures the relationship between time, distance and metres of positive elevation gain) at the top of each panel.

Figure 4 – I didn't know I could make such pretty charts (well, I like them!)

Vertical chronological order: You can see, year after year, how the “mountain” of paces moves. If the 2025 peak sits further to the right than the 2014 one, there is real progress.

Consistency vs. dispersion: If a mountain is tall and narrow, I ran like clockwork that year (always at the same pace). If it is low and flat, I was more irregular.

Colour is effort: Dark maroon marks the years in which I pushed myself hardest, showing whether that effort translated into being faster (mountain further to the right).

No messy overlap: With the curves stacked but slightly offset, everything is visible and nothing is hidden.

Figure 5 – How and how much I have run over the years

“Ghost” labels: Maximum distances (e.g. 21.1k or 42.2k) appear in a soft grey (gray40) with transparency. The value is there if you look for it, but it does not clutter the overall view of the mountain.

Perfect chronology: The X axis runs from 2014 to 2025 with no gaps.

Marathon reference: I added the 42.2k milestone on the distance axis. For each marathon, the “peak” or the label shows up in that zone in the corresponding year (right at the top are my three marathons, my greatest pride: Valencia in 2017, Madrid in 2019 and Nantes in 2022).

Sequence milestones: “Start” and “Current” are marked discreetly to give temporal context to the evolution.

No. of runs: Gives the context of volume. A year at 100% intensity over 10 runs is not the same as over 100.

ø (mean): Gives the exact value of the mean annual effort (effort_h).

Descriptive legend: I gave the legend a more technical title (“Nivel de Esfuerzo”, effort level) to make clear that the colours represent the effort_h variable.

Clean-up: Using theme_void() keeps all attention on the shape and colour of the data.

By using x = 0.5 in the annotate call, the year labels now act as a “zip” that separates the end and the start of each annual cycle, letting the colours of every month stand out.

Complete intensity map: Including green makes it possible to identify my “cruising” (aerobic) paces, clearly separating them from recovery days (yellow) and maximum-effort days (red).

360° view: The chart now looks like a precision instrument. It shows clearly how in some years I “conquered” the red zone and in others I stayed in the green/yellow base zone.

Here is the code for the last chart:

			
library(tidyverse)
library(lubridate)
# 1. LOAD DATA AND COMPUTE PERCENTILE LIMITS (10-90)
df_radar <- read_delim("run_r.csv", delim = ";", 
                        locale = locale(decimal_mark = ","),
                        show_col_types = FALSE) %>%
  mutate(
    date = dmy(date),
    anio = year(date),
    mes = month(date, label = TRUE, abbr = TRUE),
    pace_chr = as.character(pace),
    # Convert pace to decimal minutes, row by row. Indexing the str_split()
    # matrix with p[1] and p[2] would reuse the first rows' values for every run.
    pace_decimal = case_when(
      ID == 181 ~ 5.516,  # manual value for this record, kept from the original
      str_detect(pace_chr, ":") ~
        as.numeric(str_extract(pace_chr, "^[0-9]+")) +
        as.numeric(str_extract(pace_chr, "(?<=:)[0-9]+")) / 60,
      TRUE ~ suppressWarnings(as.numeric(str_replace(pace_chr, ",", ".")))
    )
  )
# Pace limits used later to clamp the colour scale
limites <- quantile(df_radar$pace_decimal, probs = c(0.10, 0.90), na.rm = TRUE)
# 2. MONTHLY AGGREGATION
df_radar_f <- df_radar %>%
  filter(anio >= 2014, anio <= 2025) %>%
  group_by(anio, mes) %>%
  summarise(
    ritmo_medio = mean(pace_decimal, na.rm = TRUE),
    esfuerzo_total = sum(effort_h, na.rm = TRUE),
    .groups = "drop"
  )
# 3. CHART WITH YEAR LABELS BETWEEN MONTHS AND THREE-COLOUR SCALE
ggplot(df_radar_f, aes(x = mes, y = as.factor(anio), fill = ritmo_medio)) +
  # Pace tiles (linewidth replaces the deprecated size argument, ggplot2 >= 3.4.0)
  geom_tile(color = "white", linewidth = 0.2) +
  
  # Effort circles (white, sized by total effort)
  geom_point(aes(size = esfuerzo_total), color = "white", alpha = 0.6) +
  
  coord_polar() +
  
  # THREE-COLOUR SCALE: red (fast) -> green -> yellow (slow)
  scale_fill_gradientn(
    colors = c("#C0392B", "#27AE60", "#F1C40F"), # red -> green -> yellow
    name = "Ritmo Medio",
    limits = c(limites[1], limites[2]),
    oob = scales::squish,
    # Format decimal minutes as m:ss; rounding total seconds avoids labels like "5:60"
    labels = function(x) {
      s <- round(x * 60)
      sprintf("%d:%02d", s %/% 60, s %% 60)
    }
  ) +
  
  scale_size_continuous(name = "Volumen Esfuerzo", range = c(0.5, 9)) +
  
  # YEAR LABELS: placed on the boundary between months (x = 0.5 lies between Dec and Jan)
  annotate("text", x = 0.5, y = as.factor(2014:2025), label = 2014:2025, 
           color = "gray30", size = 2.8, fontface = "bold", hjust = 0.5) +
  
  labs(
    title = "EVOLUCIÓN DE RENDIMIENTO 2014-2025",
    subtitle = "Rojo: Velocidad | Verde: Aeróbico | Amarillo: Recuperación\nLos años se indican en la línea divisoria para no tapar los datos",
    x = "", y = ""
  ) +
  theme_minimal(base_size = 14) +
  theme(
    panel.grid.major = element_line(color = "gray94", linetype = "dotted"),
    axis.text.x = element_text(face = "bold", size = 12, color = "black"),
    axis.text.y = element_blank(),
    plot.title = element_text(face = "bold", size = 20, hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5, size = 11, color = "gray40"),
    legend.position = "right"
  )
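
The percentile limits and `oob = scales::squish` in the script above clamp extreme paces to the 10th and 90th percentiles before colouring, so a single very slow or very fast month does not flatten the colour scale. The sketch below shows the same idea in plain JavaScript. The sample paces are invented for illustration, and `quantile` here is a hand-written helper that follows the linear-interpolation rule R uses by default (type 7), not a library call.

```javascript
// Quantile with linear interpolation between order statistics
// (the default "type 7" rule of R's quantile()).
function quantile(values, p) {
  const sorted = values.filter(Number.isFinite).sort((a, b) => a - b);
  if (sorted.length === 0) return NaN;
  const h = (sorted.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.ceil(h);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

// Clamp a value into [low, high], which is what scales::squish does
// with out-of-range values.
function squish(value, low, high) {
  return Math.min(Math.max(value, low), high);
}

// Hypothetical paces in decimal min/km, with one fast and one slow outlier.
const paces = [4.2, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.8, 9.0];
const low = quantile(paces, 0.10);
const high = quantile(paces, 0.90);
const clamped = paces.map((p) => squish(p, low, high));
console.log(low.toFixed(2), high.toFixed(2), clamped);
```

Note that squishing keeps every observation and only limits its colour value; filtering the rows out instead would change the monthly means.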

		

The exhaustive analysis of this decade of training shows my capacity to sustain high workloads (and I didn't know it!), which has evolved towards a greater biological intelligence, or understanding of one's own body, allowing me to identify and exploit my windows of maximum efficiency with a precision that the raw records could not show. This is what I was looking for!

I hope you liked it. More coming soon!

Alberto C.
(Geo)data Analyst

Sources
https://posit.co/download/rstudio-desktop/

R analysis for HR corporate talent management

I am a geographer by training, and my professional career has always had a predominantly geospatial focus. Having recently completed a forty-hour course in R, using RStudio and GitHub, I feel that a whole new world of analysis has opened up before me. This work represents the meeting point between my basic geographical instinct and the technical capabilities of statistical programming. It is important to emphasise that I have invented this data and model entirely, so the results have no real meaning and contain inevitable biases. Their sole purpose is to learn and demonstrate the capabilities of this language. I believe that geographical knowledge and code are interdependent, as one without the other would not function successfully. It is precisely this symbiosis that I hope will make a difference in my current job search.

In terms of visual results, the analysis integrates layers of complexity that facilitate understanding of the territory. Using logarithmic scaling, we have managed to bring cities with very different data volumes together on the map, allowing us to identify small talent hubs that would normally be hidden by large capital cities. Furthermore, we do not simply place points on the map, but use colour gradients to link average age with location, making it easier to detect groups of young talent as opposed to more senior profiles. The inclusion of a histogram allows us to validate the general distribution of the data in relation to its geographical dispersion. This exercise demonstrates that the combination of territorial analysis and programming can transform data into a clear and strategic visual narrative.
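
To make the effect of logarithmic scaling concrete, here is a minimal sketch in plain JavaScript (not the R/ggplot2 code behind the maps). The helper `logRadius`, the radius range and the city counts are all hypothetical; the point is only to compare how a small city is drawn under linear and logarithmic sizing.

```javascript
// Map a count to a symbol radius on a log scale so that small cities
// stay visible next to large ones.
function logRadius(count, maxCount, minRadius = 2, maxRadius = 20) {
  if (count <= 0) return 0;
  const t = Math.log1p(count) / Math.log1p(maxCount); // position in 0..1 on a log scale
  return minRadius + t * (maxRadius - minRadius);
}

// Hypothetical candidate counts per city (invented, like the rest of the data set).
const counts = { Madrid: 5000, Valencia: 800, Soria: 12 };
const maxCount = Math.max(...Object.values(counts));
for (const [city, n] of Object.entries(counts)) {
  const linear = 2 + (n / maxCount) * 18; // same radius range, linear sizing
  console.log(city, "linear:", linear.toFixed(1), "log:", logRadius(n, maxCount).toFixed(1));
}
```

With linear sizing the smallest city is barely larger than the minimum radius, whereas the log scale gives it a clearly visible symbol while the largest city keeps the maximum radius.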

Let us imagine an HR company that seeks to understand when its clients, with certain educational backgrounds and occupations, are most likely to decide to move to another city. In other words, is the profile of those willing to move mostly young or older? Urban or rural? With low or high levels of education? Where are they concentrated?

The database is a CSV file with only nine fields, thoroughly cleaned up so we can start playing around with the tidyverse library.

Figure 2 – Invented HR data model for decision making 🙂

Once the project has been set up, which can actually be one of the most complex parts (directories, files with appropriate separators, overall consistency of information, etc.), we start to get our hands on the code (please don’t look too closely, I’m just a humble geographer… and blond!). Below, I begin to understand the basics and create a histogram of the number of people broken down by age.

Figure 3 – Adding my first ggplot2 chart to this R project

But what is an analysis without a map to overlay it on? 🙂

Figure 4 – Adding the base map and starting to sketch the “invented” model

Inserting a map, adding labels, playing with sizes, colours, alphas, shapes… I owe a lot to my “geography visualization” teachers at the UVA in Valladolid!

Figure 5 – “SI/YES” means they are willing to move for a better job

A little detour, let’s focus on Andalucía…

Here comes the fun. Adding colours to disaggregate profiles, and size to quickly see where the largest numbers of people willing to move are…

Figure 7 – A bigger title for quicker understanding. Have we saved one second? It’s definitely worth it.

Do I need to disaggregate by larger regions (CCAA in Spain)?

Figure 8 – Disaggregating in R using facet wrap

Please don’t blame me for the incoherence… this is FULLY RANDOM!!!! 🙂 Is it easier to read if we use a colour gradient for average age?

Figure 8 – Getting closer to the final result

Should we try a violin-style diagram, to show that I don’t just make pretty maps and graphs but can also establish a KPI? Any HR recruiter can see which groups are ‘aged’ and which are the ‘youngest’ in relation to the total workforce. A Senior Geodata Analyst should know how to interpret the social reality behind the data 🙂

Or even better, a disaggregation that highlights the MAX-MIN cases per level of education.

And the final result today (so far): the main cities, the age factor, the average candidate profile. Everything has an explanation :-)

Figure 9 – OUTCOME: Those willing to relocate are mostly young, urban and highly educated, concentrated in the country’s major economic hubs

Urban Concentration: The map shows that willingness to relocate is not uniform; it is concentrated heavily in Madrid, Barcelona, Seville, and Valencia. The larger circles in these areas confirm that they are the main ‘talent engines’.

The Age Factor: The trend line and histogram reveal a clear negative correlation: the younger the age, the greater the willingness. Younger talent (light/yellow colours) is the most flexible, while from the age of 45-50 (dark colours), the intention to move falls dramatically.

Geographical Balance: Thanks to logarithmic scaling, we see that although capital cities dominate, there is a constant flow of profiles in medium-sized cities, suggesting that talent is distributed but needs incentives depending on the stage of life.

Candidate Profile: The histogram confirms that the bulk of those interested are between 25 and 38 years old. Outside this range, mobility becomes exceptional.

Borders: The visualisation by autonomous community allows us to identify that regions such as Andalusia and Catalonia have a network of secondary cities with high mobility, unlike other regions where everything is concentrated in a single point.

In conclusion: the profile of those willing to relocate is mostly young, urban and highly educated, concentrated in the country’s major economic hubs.

I hope you enjoyed it. If you can think of any other scenarios where we don’t have to make up the data, I’ll give it some thought and write another post!

Alberto C.
Geospatial analyst and someone who is looking for a job. Do you have one?

URBAN ATLAS 2018 + WORLDPOP 100m/GHSL 100m estimates over Madrid

Urban Atlas: Geospatial Precision in the Copernicus Corridor. Land use combined with population estimates for each class, one more step in combining open data sources in the cloud.

This analysis is a small, quick test developed by Alberto C (Geovisualization.net) to show the potential of using an external asset on the Google Earth Engine (GEE) platform with JavaScript. The asset is a medium-to-high resolution land use map (LULC / Clutter) with very high thematic and spatial accuracy (significantly better than Corine Land Cover), to which we add a population estimate from a transnational database, WorldPop 100m, and in parallel another from GHSL 100m. This workflow, which integrates external layers with global cloud-computing datasets, ran end to end in a few minutes, demonstrating the operational agility of current cloud tools.

Urban Atlas (UA) is the gold standard within the Copernicus Land Monitoring Service (CLMS) for analysing urban morphology in Europe. Unlike Corine Land Cover, UA offers drastically higher thematic and spatial resolution (a Minimum Mapping Unit of 0.25 ha for urban classes), making it possible to distinguish continuous from discontinuous urban fabric across density classes ranging from 10% to 80%.

Figure 1. GEE interface displaying the URBAN ATLAS asset

Cutting-Edge Use Cases: From Urban Planning to Resilience

Today, Urban Atlas has established itself as the base layer for critical models:

Urban Heat Island (UHI) modelling: Thanks to the distinction between sealed surfaces and green areas, UA is the fundamental input for correlating land surface temperature (LST) with building typology.
Runoff management and flood risk: The High/Low Imperviousness class makes it possible to calculate precise runoff coefficients for the design of hydraulic infrastructure.
Planning the “15-minute city”: It is used to analyse the fragmentation of the urban ecosystem and accessibility to services according to the residential fabric.
Ecosystem accounts: Monitoring urban sprawl and the loss of agricultural or forest land adjacent to Functional Urban Areas (FUA).

Integration in Google Earth Engine: Scaling Up the Analysis

The true power of Urban Atlas is unlocked when it is integrated into cloud computing engines such as GEE. Moving from a local, municipality-level analysis to a continental one is now a matter of code, not hardware capacity.

Advantages of automation in GEE:

Zonal statistics at scale: Using reduceRegions, land-use profiles can be extracted for thousands of cities in seconds and crossed with population data.
Multi-sensor fusion: GEE makes it possible to intersect the categorical Urban Atlas raster with Sentinel-2 (NDVI) or Sentinel-1 (backscatter) time series to validate the health of urban vegetation or the height of structures.
Dynamic remapping: As seen in previous workflows, applying an instant .remap() makes it possible to simplify the 27 original UA classes into binary indicators (grey vs. green) to generate resilience histograms in real time.

Example of the logical flow in GEE:

JavaScript

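// Aggregate classes for a green-infrastructure analysis.
// ua_image and region_interes are assumed to be defined earlier in the script
// (the Urban Atlas raster asset and the area of interest).
// Codes 14100, 14200 and 31000 become 1; every other class becomes 0.
var greenSpace = ua_image.remap([14100, 14200, 31000], [1, 1, 1], 0);
// The mean of the 0/1 mask is the fraction of the region classed as green.
var stats = greenSpace.reduceRegion({
  reducer: ee.Reducer.mean(),
  geometry: region_interes,
  scale: 10
});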
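
To see what the snippet above computes without an Earth Engine session, the same logic can be imitated on a small array. This is a plain JavaScript sketch, not Earth Engine code: `remap` and `mean` are hand-written stand-ins for the server-side operations, and the pixel values are a hypothetical list of Urban Atlas style class codes.

```javascript
// Emulate a remap: values listed in `from` become the matching `to` value,
// anything else becomes `defaultValue`.
function remap(pixels, from, to, defaultValue) {
  const lookup = new Map(from.map((code, i) => [code, to[i]]));
  return pixels.map((code) => (lookup.has(code) ? lookup.get(code) : defaultValue));
}

// Mean of a numeric array; on a 0/1 mask this is the share of pixels equal to 1.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Hypothetical pixel codes inside a region of interest.
const pixels = [11100, 14100, 12100, 31000, 14200, 11210, 12220, 14100];
const greenMask = remap(pixels, [14100, 14200, 31000], [1, 1, 1], 0);
const greenShare = mean(greenMask);
console.log(greenMask, greenShare);
```

Because the mask only holds zeros and ones, its mean reads directly as the green fraction of the region, which is why a mean reducer is enough for this indicator.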

The Future: Automation and Deep Learning

The next step we are seeing in the industry is the use of Urban Atlas as ground truth to train convolutional neural networks (CNNs) on very-high-resolution (VHR) imagery, so that land-use maps can be updated continuously without waiting for Copernicus's three-year update cycles.

Figure 2. Console histogram of % of area per class.

Population estimates were obtained by spatially intersecting the WorldPop 2020 gridded population data with the categorical land-use classes and aggregating the population counts per class.
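The aggregation step can be sketched in plain JavaScript (runnable in Node, outside GEE). This is only an illustration of the idea, not the script behind the figures: the function names, the record layout and the sample numbers are all hypothetical, and the class codes are used purely as labels.

```javascript
// Sum population per land-use class.
// cells: one record per population grid cell, tagged with the class it falls in
// (hypothetical layout, assumed to come from a prior spatial intersection).
function populationByClass(cells) {
  const totals = new Map();
  for (const { classCode, population } of cells) {
    totals.set(classCode, (totals.get(classCode) || 0) + population);
  }
  return totals;
}

// Convert the per-class totals into percentage shares of the overall total.
function sharesByClass(totals) {
  let grandTotal = 0;
  for (const value of totals.values()) grandTotal += value;
  const shares = new Map();
  for (const [classCode, value] of totals) {
    shares.set(classCode, (100 * value) / grandTotal);
  }
  return shares;
}

// Made-up sample values, for illustration only.
const sampleCells = [
  { classCode: 11100, population: 120.5 },
  { classCode: 11100, population: 80.0 },
  { classCode: 14100, population: 2.5 },
  { classCode: 12100, population: 10.0 }
];

const classTotals = populationByClass(sampleCells);
const classShares = sharesByClass(classTotals);
for (const [classCode, total] of classTotals) {
  console.log(classCode, total.toFixed(1), classShares.get(classCode).toFixed(1) + "%");
}
```

In GEE itself the same result would normally come from a grouped reducer rather than a loop; the loop is used here only because it makes the per-class accumulation explicit.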

Figure 3. Classes corrected for easier understanding.

The legend is easy to change, and we need more suitable colors.

Figure 4. A legend better suited to the urban fabric.

The land-use classes were obtained from the Copernicus Urban Atlas and grouped into thematic categories following the official Urban Atlas nomenclature. The GEE script outputs this table directly in CSV format.

The totals do not match the district's actual population. This is because, on the one hand, we have a medium-to-high-resolution vector source (Urban Atlas), while the population data come from a continuous 100 m grid (a resolution 100 times lower). This analysis flags the limitations of the study while focusing on the potential of cloud-hosted sources, which, logically, should be made to match for the sake of full consistency.

If you are interested in the topic, ask me for the Urban Atlas ASSET (you can see it in the sources at the bottom of the post) or the population ASSET for the AOI so that you can import it into your workspace. Or, if you don't want to replicate it, just tell me what you think of this approach!
Best regards!

Alberto C.
Geodata Analyst

Sources:
https://code.earthengine.google.com/afe5dbf9b75c53dda2b82e4cad6d0b4e
https://land.copernicus.eu/en/products/urban-atlas/urban-atlas-2018#download
https://developers.google.com/earth-engine/datasets/catalog/WorldPop_GP_100m_pop?hl=es-419
https://human-settlement.emergency.copernicus.eu/
https://hub.worldpop.org/project/categories?id=3
https://developers.google.com/earth-engine/datasets/catalog/JRC_GHSL_P2023A_GHS_BUILT_V

Analyzing Spatial Correlation between Purchase Power Index and Gambling Stores (2)

Theoretical framework: This GIS study applies Geographically Weighted Regression (GWR) to investigate the spatial relationship between the Purchasing Power Index (PPI) and the distribution of gambling-related retail establishments within the city of Madrid. My aim is to account for spatially varying relationships driven by local urban contexts, under the assumption that the relationship between socioeconomic conditions and the presence of gambling venues varies across urban space and its socioeconomic patterns. My hypothesis is that these socioeconomic conditions of the urban fabric can be a breeding ground for the location of betting shops/Gambling Stores (AKA bookmakers in the UK or bookies in the US); in other words, I am attempting to detect urban vulnerability to gambling harm.

Figure 1 – Highlighting Gambling Stores’ distribution over Madrid

I have chosen the Purchasing Power Index (PPI) because it is a standardized socioeconomic indicator that represents the relative capacity of households to spend and consume goods and services within a given geographic area. It typically integrates information on income levels, employment, and demographic structure, and is expressed as a relative measure rather than an absolute monetary value, allowing comparisons across spatial units such as census sections. The logic is: the lower the PPI, the higher the vulnerability, and thus the more likely it is to find Gambling Stores nearby.

A potential relationship between purchasing power and the location of Gambling Stores arises from commercial location strategies and socioeconomic vulnerability dynamics. Gambling operators may preferentially locate in areas where household purchasing power and consumption patterns maximize demand, or conversely in areas with lower purchasing power where gambling expenditure may function as a substitute consumption behavior. As a result, the spatial distribution of gambling establishments may reflect underlying socioeconomic gradients within the urban fabric.

Figure 3 – Gambling Stores (Official Local Census, 2024) over Census Sections (INE, 2025). Spatial Join in ArcGIS Pro

From a spatial analysis perspective, traditional global regression models (e.g., Ordinary Least Squares) impose a single, spatially invariant relationship across the entire study area. However, urban socioeconomic processes are inherently heterogeneous, especially in large metropolitan areas such as Madrid, where neighborhood-level dynamics, urban morphology, and socioeconomic gradients differ significantly between districts. GWR is therefore selected as the most appropriate method to capture local variations in model coefficients and to provide geographically explicit insights.

Figure 4 – Gambling Stores and Census Sections over Madrid (Zoom out)

Sources: The analysis integrates two primary datasets, both harmonized at the census section level (sección censal), which represents the finest administrative unit for socioeconomic statistics in Spain:

Figure 5 – Gambling Stores and Census Sections over Madrid (Zoom in)

Gambling Stores Local Census (2024):
An official census of commercial premises provided by the City of Madrid, including the geolocated inventory of gambling-related establishments (e.g., betting shops, gaming halls). For analytical purposes, individual point locations were spatially aggregated to census sections, generating a count of Gambling Stores per census section (see Figure 3).
Purchasing Power Index (INE, 2022, extracted from ESRI’s LIVING ATLAS):
Socioeconomic data derived from the Spanish National Statistics Institute (INE) and accessed through ESRI demographics, which provide a standardized Purchasing Power Index at the census section level. This indicator reflects relative household purchasing capacity and is widely used as a proxy for local socioeconomic status.

The census section is adopted as the spatial unit of analysis to ensure statistical consistency between datasets and to align with official demographic and economic reporting standards.

I used the ArcMap 10.6.1 Spatial Statistics toolbox (Modeling Spatial Relationships > Geographically Weighted Regression). The data were also visualized and geoprocessed in Global Mapper 26.2.

Figure 6 – ArcMap 10.6.1 – Geographically Weighted Regression (GWR)

Geographically Weighted Regression is a local regression technique that extends classical linear regression by allowing model parameters to vary spatially. Instead of estimating a single global coefficient, GWR computes a separate regression equation for each spatial unit, calibrated using nearby observations weighted by their geographic proximity.

I would like to mention that, at this point, having fed the tool (ArcMap 10.6.1) with the necessary inputs (dependent and explanatory variables), I am noting down the results and adding them to the article in order to systematize my understanding (and perhaps that of some of you), but my ability to explain them is still limited. I will carry on, though.

Formally, the GWR model can be expressed as: $y_i = \beta_0(u_i, v_i) + \beta_1(u_i, v_i) x_i + \varepsilon_i$

where:

$y_i$ represents the number of Gambling Stores in census section $i$,
$x_i$ corresponds to the Purchasing Power Index,
$(u_i, v_i)$ are the spatial coordinates of census section $i$,
$\beta_0$ and $\beta_1$ are location-specific parameters,
$\varepsilon_i$ is the local error term.

This formulation allows the strength and direction of the relationship between purchasing power and gambling store presence to vary across Madrid.

And these were the results/output

OBJECT_ID / VARNAME / VARIABLE / DEFINITION
1 / Bandwidth / 6563,230379
2 / Residual Squares / 513,203175
3 / Effective Number / 12,898945
4 / Sigma / 0,45955
5 / AICc / 3141,813059
6 / R² / 0,016602
7 / R² Adjusted / 0,011787
8 / Dependent Field / 0 / Join_Count (Amount of Gambling Stores in the same Census Section)
9 / Explanatory Field / 1 / PPIDX_CY (Purchase Power Index)

Figure 7 – Geographically Weighted Regression (GWR) map result: associational rather than causal

The geographically weighted regression model shows very low explanatory power (adjusted R² = 0.011787), indicating that purchasing power alone does not meaningfully explain the spatial distribution of betting shops, even when allowing for spatially varying relationships.

A critical component of GWR is the definition of the spatial weighting scheme, which determines how neighboring census sections influence each local regression. In this analysis:

A distance-based kernel function is used to assign higher weights to closer census sections and progressively lower weights to more distant ones.
The bandwidth—controlling the spatial extent of each local calibration—is optimized automatically using AICc minimization, balancing model fit and complexity.

This adaptive approach ensures that densely populated urban areas benefit from a more localized calibration, while peripheral areas incorporate information from a broader neighborhood when necessary.
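To make the weighting idea concrete, here is a minimal sketch in plain JavaScript (runnable in Node, outside ArcMap or GEE). It assumes a Gaussian kernel of the form exp(-0.5 (d/b)²), which is one common choice and not necessarily the exact kernel ArcMap applies. The function names, the fixed bandwidth and the toy data are hypothetical. It fits the one-predictor model from the equation above at a single location by weighted least squares.

```javascript
// Gaussian distance-decay kernel (assumed form): weight 1 at distance 0,
// decreasing smoothly as distance grows relative to the bandwidth.
function gaussianWeight(distance, bandwidth) {
  return Math.exp(-0.5 * Math.pow(distance / bandwidth, 2));
}

// Weighted least squares for y = b0 + b1 * x, calibrated at location (u, v).
// points: array of {u, v, x, y}; coordinates and bandwidth share the same units.
function localFit(points, u, v, bandwidth) {
  let sw = 0, swx = 0, swy = 0, swxx = 0, swxy = 0;
  for (const p of points) {
    const w = gaussianWeight(Math.hypot(p.u - u, p.v - v), bandwidth);
    sw += w;
    swx += w * p.x;
    swy += w * p.y;
    swxx += w * p.x * p.x;
    swxy += w * p.x * p.y;
  }
  const b1 = (sw * swxy - swx * swy) / (sw * swxx - swx * swx);
  const b0 = (swy - b1 * swx) / sw;
  return { b0, b1 };
}

// Hypothetical toy data: western sections follow y = 1 + 2x,
// eastern sections follow y = 5 - x.
const sections = [
  { u: 0, v: 0, x: 1, y: 3 }, { u: 0, v: 1, x: 2, y: 5 },
  { u: 1, v: 0, x: 3, y: 7 }, { u: 1, v: 1, x: 4, y: 9 },
  { u: 100, v: 0, x: 1, y: 4 }, { u: 100, v: 1, x: 2, y: 3 },
  { u: 101, v: 0, x: 3, y: 2 }, { u: 101, v: 1, x: 4, y: 1 }
];

const west = localFit(sections, 0.5, 0.5, 5);
const east = localFit(sections, 100.5, 0.5, 5);
console.log("west slope:", west.b1.toFixed(3), "east slope:", east.b1.toFixed(3));
```

A single global regression over these eight points would blend the two opposite slopes; the local fits recover each one because distant sections receive a near-zero weight.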

The GWR model produces several spatially explicit outputs of analytical relevance:

Local regression coefficients for the Purchasing Power Index, revealing where purchasing power is more strongly or weakly associated with the presence of Gambling Stores.
Local R² values, indicating how well the model explains variance in gambling store distribution in different parts of the city.
Residual surfaces, used to identify spatial patterns of over- or under-prediction and potential omitted variables.

Rather than seeking a single city-wide conclusion, the emphasis is placed on geographic patterns, such as clusters of census sections where lower purchasing power is more strongly associated with higher concentrations of gambling establishments, or conversely, areas where this relationship is weak or absent.

By adopting a geographically weighted approach, this analysis explicitly acknowledges that urban socioeconomic processes are spatially contingent. The resulting maps of local coefficients provide actionable insights for urban policy, public health, and regulatory frameworks, allowing stakeholders to identify areas where gambling availability may be more closely linked to socioeconomic vulnerability.

From the standpoint of a spatial analyst, GWR serves not only as a statistical tool but as an exploratory framework that integrates spatial thinking directly into the modeling process. It enables an interpretation of the urban landscape of Madrid, grounded in official data sources and aligned with best practices in spatial econometric analysis.

Despite the analytical advantages of Geographically Weighted Regression in capturing spatial heterogeneity, several assumptions and limitations must be acknowledged to ensure a transparent and rigorous interpretation of the results.

First, GWR assumes that spatial non-stationarity is present and meaningful, and that local variations in model parameters reflect real underlying processes rather than random noise. The method presumes that nearby observations are more relevant for explaining local relationships than distant ones, an assumption operationalized through the spatial weighting kernel and bandwidth selection. While this assumption is generally appropriate in urban socioeconomic analyses, it may oversimplify complex, multi-scalar processes that operate beyond immediate spatial neighborhoods.

Additionally, GWR inherits the core assumptions of linear regression, including linearity, additivity, and independence of errors at the local scale. Although local calibration mitigates some forms of spatial autocorrelation, it does not fully eliminate the risk of residual spatial dependence, particularly in densely urbanized areas with strong structural patterns.

We also need to take into account that the analysis relies on cross-sectional data from different reference years, specifically the Purchasing Power Index from INE (2022) and the Gambling Stores census from the City of Madrid (2024). This temporal mismatch assumes relative stability in the spatial distribution of purchasing power over the short term. While this assumption is reasonable for aggregated socioeconomic indicators, it may obscure short-term dynamics or recent neighborhood-level changes.

The results of this study should be interpreted as associational rather than causal. GWR identifies spatially varying relationships between purchasing power and gambling store distribution, but it does not establish causal directionality. The observed patterns may reflect a combination of regulatory frameworks, commercial location strategies, historical land-use patterns, and unobserved socioeconomic factors.

Moreover, areas exhibiting strong local relationships should not be automatically interpreted as zones of direct vulnerability without complementary qualitative, behavioral, or health-related data. Gambling harm is a multidimensional phenomenon, and the presence of gambling establishments constitutes only one potential exposure factor.

Finally, this analysis focuses exclusively on the relationship between purchasing power and the spatial distribution of Gambling Stores. Other relevant dimensions of urban vulnerability, such as age structure, unemployment, educational attainment, or proximity to schools, are not explicitly modeled. As such, the results should be viewed as a partial and exploratory assessment of urban vulnerability, intended to inform further multivariate and multi-scalar analyses rather than provide a definitive evaluation.

Anyway, my aim in writing this post is to start understanding this potential multivariable correlation, and this is only the first step. I hope you are still reading and that you come back every now and then to keep reading 🙂

Sources:
https://datos.madrid.es/portal/site/egob/menuitem.c05c1f754a33a9fbe4b2e4b284f1a5a0/?vgnextoid=66665cde99be2410VgnVCM1000000b205a0aRCRD&vgnextchannel=374512b9ace9f310VgnVCM100000171f5a0aRCRD
https://livingatlas.arcgis.com/en/home/
https://www.txalaparta.eus/es/libros/ludomorfina?srsltid=AfmBOop4ZFCax6eeJt223I94jsH4iLDEKRtJT6Xlrhip7WuS7agtOowR
https://pro.arcgis.com/en/pro-app/3.4/tool-reference/spatial-statistics/how-geographicallyweightedregression-works.htm
https://www.publichealth.columbia.edu/research/population-health-methods/geographically-weighted-regression
https://www.sciencedirect.com/topics/earth-and-planetary-sciences/geographically-weighted-regression
https://carto.com/blog/how-geographically-weighted-regression-works

Alberto C.
GIS Analyst and ex-freelancer

Testing GEMINI for 3D environments. From SketchUp to an unlikely future!

The exercise shows how a simple SketchUp 3D volume, defined solely by its basic geometry, can be transformed into a complex architectural proposal. Starting from the initial schematic model, the system interprets proportions, levels, and shapes, and converts them into a fully developed building, complete with textures, vegetation, lighting, and an urban context.

It is a good test to evaluate the extent to which AI understands spatial structure and is capable of generating a coherent and rich interpretation from a minimal geometric skeleton.

To add a bit of perspective to this post: I have always loved 3D, and I have worked with it in the past from different angles, from architectural modelling to massive QC of LOD1 buildings anywhere in the world. That’s why I find these Gemini features rather interesting and closely related to my world.

Some old 3D models sold to Telespazio back in 2008 🙂

Alberto C.
Geodata Analyst, ex 3D designer and always a curious person 🙂

Mapping Something Unthinkable: Flood Risk in Madrid using Open Data

Don't get me wrong: if you see the AI-generated background showing our handsome mayor flashing his beautiful smile at Cibeles/Correos, it is only there to grab your precious attention (only if you need it, though!). Flooding in urban environments is not a speculative hazard but something we can quantify. In the case of Madrid, the intersection of fairly mountainous terrain and urban expansion presents a scenario of significant risk, particularly when analyzed through the lens of shared high-resolution geospatial data (it might surprise you that there is a difference of about 2000 m between the highest point in Madrid province, Pico Peñalara at 2428 m, and parts of the Alberche river area at 430 m).

Mayor Almeida is surprised to learn how risky a heavy rain episode over the Madrid area might be

This study integrates the buildings from the BTN (Base Topográfica Nacional), provided by the Spanish IGN/CNIG, with the official flood hazard maps for a 100-year return period (T=100), published by the Ministry for the Ecological Transition and the Demographic Challenge (MITECO). The T=100 scenario is the most representative for evaluating long-term flood exposure, as it reflects events with a 1% annual probability: rare but not improbable, and certainly not negligible.

A preliminary count shows that only 0.999% (8097/810134) of all buildings in the latest BTN building layer provided by CNIG would be affected, but a closer look shows that this means a lot in specific spots: Aranjuez, for instance, would be badly affected by flooding of the Tajo river.

A focus on the Aranjuez area, which I find representative: many buildings happen to fall within likely flood areas (check it yourself in GEE)

The spatial overlay of flood-prone zones with demographic and land-use data reveals a concerning concentration of residential population within areas designated as ARPSIs (Áreas de Riesgo Potencial Significativo de Inundación). These zones include dense urban districts along the Manzanares River and low-lying areas in Arganzuela, Usera, and parts of Puente de Vallecas.

Histogram of those 8097 buildings (according to BTN/CNIG database of all buildings over Madrid)

While precise population figures vary depending on the granularity of census data, preliminary estimates suggest that tens of thousands of residents could be directly affected by a flood event of this magnitude. The implications extend beyond displacement and property damage, encompassing public health risks, disruption of essential services, and long-term socioeconomic instability.

Global Mapper was used to geoprocess and visualize geodata

Economic activities within these flood zones are diverse and structurally significant. Central districts host a high concentration of retail, hospitality, and cultural institutions, while peripheral zones near the M-30 corridor accommodate logistics, warehousing, and industrial operations. The exposure of these sectors to flood risk implies not only direct financial losses but also cascading effects on employment, supply chains, and urban mobility. Moreover, the presence of public infrastructure—transport nodes, administrative buildings, and emergency services—within these vulnerable areas raises questions about the resilience of the city’s operational backbone.

*Retail and hospitality in central districts (e.g., Lavapiés, La Latina)
*Logistics and warehousing near the M-30 corridor
*Cultural and tourism assets, including museums and heritage sites
*Public infrastructure, such as metro stations, bus depots, and administrative buildings

Disruption in these zones could result in multi-million euro losses, not only from direct damage but also from prolonged service interruptions.

I have uploaded a small sample over Aranjuez to my GEE interface (it is not that I am from there; I just find it representative!). You can quickly tune the JavaScript code to count the number of buildings affected, their total area, and the elevation range of the flood DTM REM (Relative Elevation Model) within your current map view.
https://code.earthengine.google.com/1d7e907283e33dc4b468ab3adf578840

Cloud computation in GEE: flood risk in Madrid. Check it yourself using the code linked above!

Of particular concern is the identification of facilities regulated under Annex I of Directive 96/61/EC, which governs integrated pollution prevention and control.

ArcGIS Pro was also used for geoprocessing and visualization

*Fuel storage and distribution centers
*Waste treatment plants
*Industrial facilities with hazardous materials

These include fuel depots, waste treatment centers, and industrial sites handling hazardous substances. In the event of flooding, such facilities pose a risk of accidental contamination, with potential impacts on protected zones defined in Annex IV of Directive 2000/60/EC. These zones include drinking water abstraction points, Natura 2000 habitats, and recreational waters. The spatial proximity of these sensitive areas to flood-prone industrial sites underscores the need for integrated risk assessment that goes beyond hydrological modeling and incorporates environmental and public health dimensions.

(i) Drinking water abstraction points
(iii) Recreational waters
(v) Habitats designated under Natura 2000 and sensitive ecosystems

The visual simulations accompanying this analysis are intentionally exaggerated. They do not represent predictive models but serve as heuristic devices to provoke reflection and debate. By depicting iconic Madrid landmarks submerged under chaotic floodwaters, these images challenge the viewer to confront the consequences of urban planning decisions that disregard hydrological constraints. They are not intended to alarm, but to illustrate the scale of disruption that could result from a statistically plausible event.

Please note that these features have been left out of the scope of this preliminary approach to analysing flood risk in Madrid province using open data.

Again, this study/first approach would not have been possible without access to open geospatial data. The availability of national datasets such as the Base Topográfica Nacional and flood hazard maps from MITECO exemplifies the transformative potential of public data infrastructures. However, the mere existence of data is insufficient. What is required is a culture of proactive use—by planners, policymakers, and civil society—where risk is treated not as an abstract probability but as a concrete design constraint.

The presence of vulnerable populations, critical infrastructure, and environmentally sensitive facilities within flood-prone zones constitutes an unacceptable risk. It is a risk that could be mitigated through better zoning, stricter regulation, and investment in adaptive infrastructure. The cost of inaction is not only economic but ethical. Avoidable losses, whether of property, livelihoods, or ecosystems, are a failure of foresight, not fate.

For full maps, methodology, and downloadable layers, visit geovisualization.net. Flood Risk Maps T=100 (MITECO): https://www.miteco.gob.es/es/cartografia-y-sig/ide/descargas/agua/riesgo-inundacion-fluvial-t100.html

Sources:
https://centrodedescargas.cnig.es/CentroDescargas/listaFicheros
https://www.miteco.gob.es/es/agua/temas/gestion-de-los-riesgos-de-inundacion/snczi.html

Software used: ArcGIS Pro 3.6 + Global Mapper 26.2 + Photoshop X + Copilot (AI cover) (+GEE)

Spatial relationship between “high schools” and “betting shops” in Madrid. A first approach (1)

It is a fact that a betting shop (AKA bookmaker in the UK or bookie in the US) should not be close to a secondary school. It is obvious that the impact on people aged roughly 12 to 18 could be higher than on other age groups. How close is too close? 100 m? 300 m? 500 m? Measured as Euclidean distance (a straight line) or along the street network? In any case, if I choose a range of 500 m, for example, almost 82% of betting shops in Madrid have a secondary school within that distance (258 out of 316). Looking at it from the secondary schools’ point of view, almost 60% of secondary schools have a betting shop within 500 m (171/291). This is undoubtedly an issue that needs to be addressed.

The spatial analysis reveals a moderately negative correlation (r = -0.39) between the distance to the nearest betting shop and the number of betting shops within a 500-meter radius of each secondary school. This means that, on average, the shorter the distance between a school and the nearest betting house, the greater the number of betting houses found within a short walking distance (for example, within 500 meters). In other words, schools that are close to one betting house are very often close to several. Conversely, schools located farther from any betting house tend to be in areas where these establishments are much less common or even absent. This pattern does not imply a perfect one-to-one relationship — some exceptions exist — but the overall trend is clear: betting houses are not randomly distributed in the city. Instead, they tend to form clusters, and those clusters often appear in the same parts of the city where many schools are located. Although the relationship is not perfectly linear, it might indicate a potential spatial association between the two types of locations.
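For readers who want to see what sits behind the r value, here is a minimal sketch of the Pearson coefficient in plain JavaScript (runnable in Node). It is not the GEE script used for this post. The function name and the sample arrays are hypothetical and only illustrate the direction of the relationship: the shorter the distance to the nearest shop, the higher the count within 500 m.

```javascript
// Pearson correlation coefficient between two equally long numeric arrays.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("pearson needs two arrays of equal length (at least 2 values)");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Made-up values for six schools, for illustration only:
// distance to the nearest betting shop (m) and number of shops within 500 m.
const distanceToNearest = [50, 120, 300, 450, 800, 1200];
const shopsWithin500m = [6, 4, 2, 1, 0, 0];

const r = pearson(distanceToNearest, shopsWithin500m);
console.log("Pearson r:", r.toFixed(2));
```

A negative r on data like these simply says that the two variables move in opposite directions; it does not say anything about why the shops are where they are.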

From the heat map, this relationship becomes more tangible. 171 out of 291 secondary schools in Madrid, approximately 60% of the total, are situated within 500 meters of at least one betting shop. The density surface highlights four clear hotspot zones: Usera, Carabanchel, Centro, and Tetuán. These districts concentrate the majority of the spatial overlap between high schools and betting shops, forming well-defined high-intensity clusters. The pattern aligns with socio-economic and demographic realities: these are traditionally dense urban districts with younger populations and, in several cases, lower average income levels, which have historically attracted a higher concentration of gambling venues.

Concentration of problem zones (500 m threshold): Usera, Carabanchel, Centro, and Tetuán

By contrast, the richest districts, such as Salamanca, Retiro or Chamberí, show a more dispersed and lower-intensity pattern. This uneven distribution reinforces the idea that the proximity of betting shops to high schools is not random, but rather follows a spatial logic influenced by the urban and social fabric of the city.

In practical terms, the combination of a negative correlation and the hotspot clustering suggests that High Schools and betting shops in Madrid exhibit a clear spatial association. The proximity of many High Schools to gambling establishments may have implications for urban planning, youth protection policies, and spatial regulation of gambling activities. The results provide a quantitative foundation for further policy discussions on how the city’s gambling landscape interacts with its educational network.

Pearson correlation. Number of betting houses vs High Schools. Pearson r: -0.39

Methodology: Open Data Madrid provided the ‘Censo de locales, sus actividades y terrazas de hostelería y restauración. Histórico’ dataset, which I cleaned and cross-referenced to obtain the venues dedicated to gambling that held an active licence (2020). Then I downloaded the Madrid high schools (those offering public and private secondary education). Based on Madrid's districts, I was able to measure distances and highlight the most affected areas. I used GEE to perform the Pearson correlation (the code is linked below). Maps were created using Global Mapper v26.2.

GEE interface. Cloud Computation calculation

What if I could get a GeoJSON just by feeding the system the required threshold? For example, this code returns the 171 secondary schools in Madrid that have a betting shop within a 500 m radius. It’s crazy, 171 out of 291 (almost 60%)! Here is the code:
https://code.earthengine.google.com/e15e0ee2c0269d8b3e4a199d444620d3

“58% Schools have bookies inside a 500m buffer”

TRY IT YOURSELF!!!

What if I could get a GeoJSON just by asking for the minimum distance between shops? For example, this code returns the 98 bookies (betting shops) that are closer than 100 m to another bookie:
https://code.earthengine.google.com/662af463a6c4a4b3eca9d9dbcb6fd5fc

31% of bookies are closer than 100m to another bookie

TRY IT YOURSELF!!!
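The two threshold queries above follow the same pattern, which can be sketched in plain JavaScript (runnable in Node) without GEE. This is an illustration under assumptions, not the linked scripts: it uses straight-line (great-circle) distance via the haversine formula with a mean Earth radius, and the function names and coordinates are hypothetical.

```javascript
const EARTH_RADIUS_M = 6371008.8; // mean Earth radius, an approximation

// Great-circle distance in meters between two {lat, lon} points (degrees).
function haversineMeters(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Keep the targets that have at least one *other* feature within thresholdM.
// Passing the same array twice answers "which shops are near another shop?".
function withinThreshold(targets, others, thresholdM) {
  return targets.filter((t) =>
    others.some((o) => o !== t && haversineMeters(t, o) <= thresholdM)
  );
}

// Made-up coordinates around Madrid, for illustration only.
const schools = [
  { name: "School A", lat: 40.4, lon: -3.7 },
  { name: "School B", lat: 40.45, lon: -3.7 }
];
const shops = [
  { name: "Shop 1", lat: 40.402, lon: -3.7 },
  { name: "Shop 2", lat: 40.4025, lon: -3.7 },
  { name: "Shop 3", lat: 40.42, lon: -3.65 }
];

const schoolsNearShops = withinThreshold(schools, shops, 500);
const clusteredShops = withinThreshold(shops, shops, 100);
console.log(schoolsNearShops.map((s) => s.name), clusteredShops.map((s) => s.name));
```

Changing the threshold argument is all it takes to rerun the question for 100 m, 300 m or 500 m; measuring along the street network instead would need a routing step that this sketch does not attempt.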

Additional considerations:

The regulations governing the distance between betting shops and educational establishments such as secondary schools vary depending on the autonomous community. Some communities have implemented or proposed minimum distances, such as a 300-metre radius in some regional laws, while others continue to apply shorter distances or are in the process of reviewing their regulations. It is essential to consult the specific legislation of each community to find out the exact regulations and how they are applied. Examples of regulations by autonomous community (CCAA):

*Community of Madrid: Established a minimum distance of 100 metres between gaming halls and schools/secondary schools in 2019. Although extensions were proposed, the current regulation remains at 100 metres, with the possibility of some existing premises being exempt, according to El Mundo.
*Galicia: Its gaming law establishes a minimum distance of 300 metres from educational centres for new gaming establishments, increasing the previous linear distance of 150 metres, according to GaliciaPress.
*Andalusia: The proposal to extend the distance to 500 metres has been taken to Parliament, although the current regulation may be different, as reported in ABC.
*Other regions: Some cities, such as Talavera de la Reina in Castilla-La Mancha, have included a minimum distance of 300 metres, backed by court rulings, according to different sources.

Distance can be measured in a straight line or along the street network, and this difference is important when applying the regulations.
Premises that were already in operation before the regulations were implemented in some regions are often exempt.
In the case of Madrid, proposals to ban betting shops from being located less than 500 metres from educational establishments have even been analysed, as reported by Telemadrid.

Sources:
https://gestiona.comunidad.madrid/wleg_pub/secure/normativas/contenidoNormativa.jsf?opcion=VerHtml&nmnorma=12618&eli=true#no-back-button
https://www.ordenacionjuego.es/
https://www.telemadrid.es/programas/cronicas-subterraneas/investigaciones/apuestas/programa-completo-2-2072812740–20181202113000.html
https://code.earthengine.google.com/3c977b75e9f966a7c0ec3a8dd96588c6?noload=true

Alberto C.
Geodata Analyst

Precision Elevation Data for Forest Giants: LiDAR vs ETH Global Canopy Height in Mata do Buçaco (Portugal)

Why precision elevation data matters

High-resolution elevation data underpins almost every spatial analysis we do in GIS, especially in forests where vertical structure defines habitat, biomass, wind exposure, fire behavior, hydrology, and the microclimates that sustain rare species. In rugged or densely vegetated environments, a coarse or biased elevation model propagates error everywhere: orthorectification drifts, hillshades mislead, slope/aspect misclassify, and canopy metrics saturate. The result is decisions made on blurred terrain that hides the very patterns we seek to manage. Precision elevation, derived from airborne LiDAR (Light Detection and Ranging), solves this by separating the ground from the vegetation and delivering both a bare-earth Digital Terrain Model (DTM) and a Digital Surface Model (DSM). Subtracting the DTM from the DSM gives a Canopy Height Model (CHM) that captures the true vertical architecture of the forest at sub-meter resolution.
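The DSM minus DTM subtraction is simple enough to sketch in plain JavaScript (runnable in Node). This is a toy illustration on small grids, not a raster-processing workflow: the function name and the elevation values are hypothetical, and clamping small negative differences to zero is a design choice assumed here, not something stated in the post.

```javascript
// Canopy height per cell = surface elevation (DSM) - ground elevation (DTM).
// dsm and dtm are 2D arrays of equal shape, in meters. Cells equal to noData
// in either grid stay noData; negative differences are clamped to 0 (assumption).
function canopyHeightModel(dsm, dtm, noData = null) {
  if (dsm.length !== dtm.length) {
    throw new Error("DSM and DTM must have the same number of rows");
  }
  return dsm.map((row, r) => {
    if (row.length !== dtm[r].length) {
      throw new Error("DSM and DTM rows must have the same length");
    }
    return row.map((surface, c) => {
      const ground = dtm[r][c];
      if (surface === noData || ground === noData) return noData;
      return Math.max(0, surface - ground);
    });
  });
}

// Made-up 2 x 2 grids, for illustration only.
const dsm = [
  [352.0, 371.5],
  [340.2, 338.9]
];
const dtm = [
  [310.0, 311.5],
  [339.0, 339.1]
];

const chm = canopyHeightModel(dsm, dtm);
console.log(chm.map((row) => row.map((h) => h.toFixed(1))));
```

The bottom-right cell shows why the clamp exists: where the DSM dips slightly below the DTM because of interpolation noise, a negative canopy height would be meaningless.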

My visit to the “premises” late August 2025 (almost yesterday!)

This post uses the Mata do Buçaco (Bussaco), near Luso in central Portugal, to illustrate why precision matters and how it compares to the widely used global product ETH_GlobalCanopyHeight_2020_10m_v1. We will look at the site, the LiDAR technology, and a practical comparison workflow for GIS users.

Surprise! The RMSE deviation turned out too high! Why?

Mata do Buçaco: a compact sanctuary of forest giants

Mata do Buçaco is a walled arboretum and national forest just above the spa town of Luso, north of Coimbra. Despite its modest footprint (~1.0–1.5 km across, ~100–110 hectares), it packs a dendrological collection of remarkable diversity curated over centuries. The topography rises from low foothills to the crest of the Serra do Buçaco, creating a humid, fog‑prone microclimate with precipitation notably higher than the surrounding region. That microclimate, plus deliberate introductions by botanists and gardeners since the 17th century, explain today’s extraordinary vertical structure: towering conifers (including giant sequoias), Mexican cypress, Atlantic and Tasmanian eucalypts, and groves of native broadleaves stitched between ornamental plantings and relic laurel‑oak patches.

Walk any of the shaded paths and the “feel” of the forest is its third dimension: deep crowns stacked in tiers, emergent stems breaking above the canopy, and abrupt transitions where the slope pitches toward gullies and water stairs like the Fonte Fria. For mapping, this means Buçaco is the perfect stress‑test for vertical data. Local reports and lidar‑based profiles identify emergent trees approaching 60–65 m in height—exceptional for continental Europe—and many stands with 40–55 m canopy tops (giant sequoias Sequoiadendron giganteum, Tasmanian mountain ash Eucalyptus regnans, and mature Eucalyptus globulus among others). Add the steep relief and stone architecture of the palace‑convent complex and you have a site where coarse models tend to smear peaks, clip crowns, and understate vertical extremes.

From a data user’s perspective, Buçaco is interesting because it’s small enough to survey with dense airborne LiDAR yet diverse enough to benchmark against global canopy products. It’s also highly visited and well‑documented, which makes it a prime candidate for open, reproducible analyses that other practitioners can repeat.

LiDAR (and why it excels in forests)

Principle of operation. Airborne LiDAR instruments emit near‑infrared laser pulses toward the Earth’s surface and record the time‑of‑flight of returned photons. Distance = (c × Δt) / 2, where c is the speed of light and Δt is the measured two‑way travel time.
Full‑waveform vs discrete return. Modern sensors either store the entire returned energy waveform (full‑waveform) or extract distinct echoes (discrete returns). In forests, multiple returns (first, intermediate, last) capture interactions with the canopy top, internal branches, understory, and finally the ground.
Point cloud. Each recorded return becomes a 3D point with XYZ, intensity, scan angle, GPS time, and often a classification label (ground, vegetation, building, water). Typical densities for national programs range from 2 to >12 points/m²; local surveys can exceed 20–30 points/m².
DTM and DSM. Ground classification filters (e.g., progressive TIN densification, cloth simulation) isolate ground returns to build a DTM. Interpolating the highest returns per cell builds a DSM that traces the top of canopy and built features.
Canopy Height Model (DHM). DHM = DSM − DTM at a chosen grid (often 0.5–2 m). Because the DTM is true bare earth, DHM measures canopy height above ground rather than above sea level—critical on steep slopes like Buçaco’s.
Vertical accuracy. With good boresight calibration and GNSS/INS trajectories, vertical RMSE for DTMs is commonly 5–15 cm in open ground; DHM accuracy depends additionally on canopy penetration and interpolation choices but still outperforms passive methods.
Structure metrics. From the point cloud or DHM we derive height percentiles (P10…P95), gap fraction, rugosity, leaf‑area proxies, and individual‑tree segmentation. These are the metrics that drive biomass, habitat, windthrow risk, ladder‑fuel detection, and view‑shed quality.
Radiometry & intensity. Intensity encodes target reflectance and range effects; after calibration, it helps distinguish materials (e.g., conifer vs broadleaf, moisture gradients) and detect powerlines or archaeological traces.
Waveform advantages. Full‑waveform data captures the vertical distribution of scattering elements; deconvolving it recovers weaker echoes in denser stands and improves ground detection under eucalyptus and conifers.
Limitations. LiDAR is weather‑ and budget‑dependent. Dense undergrowth, scan angle, and leaf‑on conditions can reduce ground hits. Interpolation choices (max vs. percentile) affect DHM peaks—important when claiming “record” trees.
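
As a rough illustration of the ranging formula and the DSM − DTM subtraction described in the list above, here is a minimal Python sketch. The function names and the toy 2 × 2 grids are made up for this example; real tiles would be read from rasters with a GIS library, and clamping negative heights to zero is one common choice, not the only one.

```python
import numpy as np

C = 299_792_458.0  # speed of light in vacuum, m/s


def range_from_time_of_flight(dt_seconds):
    """One-way sensor-to-target distance from the two-way travel time."""
    return C * dt_seconds / 2.0


def canopy_height(dsm, dtm):
    """DHM = DSM - DTM on aligned grids; small negative values are clamped to 0."""
    dhm = np.asarray(dsm, dtype=float) - np.asarray(dtm, dtype=float)
    return np.clip(dhm, 0.0, None)  # NaN cells (no data) stay NaN


# A pulse that returns after 2 microseconds travelled about 300 m each way.
print(range_from_time_of_flight(2e-6))

# Toy grids in metres above sea level (hypothetical values).
dtm = [[300.0, 302.0], [305.0, 310.0]]
dsm = [[345.0, 302.0], [360.0, 309.9]]
print(canopy_height(dsm, dtm))  # heights above ground: 45, 0 / 55, 0
```

Note that the 360 m DSM cell is not the "tallest" in absolute terms by much, yet it carries the 55 m canopy: subtracting the terrain is what makes heights comparable across a slope.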

Bottom line: when you need true heights, crown architecture, and centimeter‑scale terrain under forest, airborne LiDAR remains the gold standard.

The global benchmark: ETH_GlobalCanopyHeight_2020_10m_v1

The ETH Zurich Global Canopy Height (GCH) product provides a wall‑to‑wall canopy top height map at 10 m ground sampling distance for the year 2020. It fuses GEDI lidar footprints (spaceborne, sparse but vertically precise) with globally consistent Sentinel‑2 optical imagery using a deep learning model to predict canopy heights between footprints. The result is a globally consistent raster that is easy to stream in Earth Engine or GIS platforms and ideal for continental to global analyses where airborne LiDAR is unavailable.

Global Canopy Height in TIF format extracted from GEE cloud computation

Visualization of Global Canopy Height over the spot

Strengths

Global coverage at 10 m with a single epoch (2020), enabling cross‑region comparisons.
Trained on physically meaningful lidar targets (GEDI L2A/L2B canopy top metrics), correcting for many radiometric and terrain confounders in passive imagery.
Includes uncertainty metrics and tends to preserve macro‑patterns (ecotones, disturbance scars, plantation heights).

Known trade‑offs for sites like Buçaco

Saturation at the tall end. In stands with emergent stems >50 m, 10‑m pixels average crowns and can under‑predict peak heights; local maxima are “flattened.”
Terrain complexity. On steep slopes, small georegistration or DTM mismatches between Sentinel‑2 and GEDI training can leak terrain into predicted canopy height.
Edge effects. The palace complex, walls, and clearings introduce sharp transitions that are sub‑pixel at 10 m, broadening edges and obscuring narrow corridors.
Understory structure. The model predicts canopy top height, not vertical distribution; it cannot replace LiDAR‑derived structure metrics for habitat or fire modeling.

In short, ETH GCH is an excellent baseline and context layer, but for site‑scale management, airborne LiDAR remains the reference.

Practical comparison: LiDAR DHM vs ETH GCH at 8,818 vegetation GCPs in Buçaco

Below is a workflow you can reproduce in QGIS/ArcGIS Pro or Google Earth Engine (GEE):

Ingest data.
- Airborne LiDAR: download the point cloud (LAS/LAZ) or prebuilt DTM/DSM tiles for the Buçaco area.
- ETH GCH 2020: load the ETH_GlobalCanopyHeight_2020_10m_v1 raster.
Build the LiDAR DHM.
- Classify ground → DTM (0.5–1 m).
- Highest‑return DSM (0.5–1 m) with spike filtering over built structures.
- DHM = DSM − DTM, then smooth lightly (e.g., Gaussian σ = 0.5–1 px) while preserving peaks.
Harmonize grids. Aggregate DHM to 10 m by maximum or high percentile (P95) to compare fairly with ETH pixels while preserving tall peaks.
Sample and compare.
- Randomly sample 5,000–20,000 points within the forest wall (I created 8,818 GCPs for this) and extract DHM_10m and ETH_10m at each one.
- Compute bias (ETH − DHM) and RMSE to see where ETH under‑ or over‑estimates. My result was RMSE = 12.97 m (a bit too high!). Please try the code in GEE and you will also see a deviation map.
Tall‑tree check. Use a local maxima detector on the 1 m DHM to identify emergent crowns; intersect with ETH to quantify peak loss at 10 m.
Topographic controls. Regress residuals against slope, aspect, and curvature from the LiDAR DTM to diagnose terrain‑related biases.
Reporting. Summarize by species zones (sequoia groves, eucalyptus stands, relic laurel) if you have stand polygons or classify by crown texture.
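
The "harmonize grids" and "sample and compare" steps above could be sketched in Python with NumPy roughly as follows. This is not the GEE script linked in the sources: the function names and the block-reshape approach are my own, and the sketch assumes the 1 m DHM is already aligned to the 10 m ETH grid so that each 10 × 10 block maps onto one ETH pixel.

```python
import numpy as np


def aggregate_blocks(dhm, factor=10, stat="max", q=95):
    """Aggregate a fine grid into factor x factor blocks (max or q-th percentile)."""
    dhm = np.asarray(dhm, dtype=float)
    rows = (dhm.shape[0] // factor) * factor  # drop incomplete edge blocks
    cols = (dhm.shape[1] // factor) * factor
    blocks = dhm[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    # Bring the two within-block axes together, then flatten them.
    blocks = blocks.transpose(0, 2, 1, 3).reshape(rows // factor, cols // factor, -1)
    if stat == "max":
        return np.nanmax(blocks, axis=2)
    return np.nanpercentile(blocks, q, axis=2)


def bias_and_rmse(eth, dhm):
    """Mean bias (ETH - DHM) and RMSE over sample pairs where both are valid."""
    eth = np.asarray(eth, dtype=float).ravel()
    dhm = np.asarray(dhm, dtype=float).ravel()
    valid = ~(np.isnan(eth) | np.isnan(dhm))
    diff = eth[valid] - dhm[valid]
    return float(diff.mean()), float(np.sqrt(np.mean(diff ** 2)))


# Three hypothetical sample points: ETH matches mid heights, misses a tall crown.
bias, rmse = bias_and_rmse([30.0, 40.0, 50.0], [32.0, 40.0, 62.0])
print(round(bias, 2), round(rmse, 2))
```

A single 12 m miss on the tall crown dominates the RMSE here, which is worth keeping in mind when reading one site-wide RMSE figure: it says little about where the error sits.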

Typical outcome in Buçaco (what to expect):

Median ETH bias close to zero over mid‑height stands (20–35 m).
Increasing underestimation in the tallest groves (e.g., −5 to −12 m at local maxima).
Larger residuals near walls/buildings and along steep steps and gullies.
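
One way to check these expectations against your own samples is to group the residuals by LiDAR height class. A minimal sketch follows; the function name is hypothetical and the class edges (0, 20, 35, 50, 70 m) are chosen only to mirror the ranges mentioned above, not taken from any standard.

```python
import numpy as np


def bias_by_height_class(eth, dhm, edges=(0, 20, 35, 50, 70)):
    """Median of (ETH - DHM) per LiDAR height class; returns {(lo, hi): median}."""
    eth = np.asarray(eth, dtype=float)
    dhm = np.asarray(dhm, dtype=float)
    valid = ~(np.isnan(eth) | np.isnan(dhm))
    diff = eth[valid] - dhm[valid]
    ref = dhm[valid]  # classes are defined on the LiDAR reference height
    out = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_class = (ref >= lo) & (ref < hi)
        if in_class.any():  # skip empty classes instead of reporting NaN
            out[(lo, hi)] = float(np.median(diff[in_class]))
    return out


# Hypothetical samples: two mid-height points, two tall-grove points.
print(bias_by_height_class([25.0, 30.0, 45.0, 50.0], [25.0, 31.0, 55.0, 62.0]))
```

If the median residual stays near zero in the 20–35 m class and turns strongly negative above 50 m, the pattern matches the saturation described earlier.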

RMSE calculation (I think it needs further development, though)

Why sharing these data multiplies their value

Open elevation and canopy datasets have network effects. When agencies publish LiDAR DTMs/DSMs and derived DHMs under open licenses, practitioners can:

Validate global products locally, quantifying where models work and where they fail.
Stack analyses, from biodiversity corridors and storm‑damage assessments to micro‑hydrology, archaeology, and trail design, all anchored to the same precise terrain.
Build reproducible workflows, so results can be peer‑checked, improved, and extended.
Accelerate response, e.g., after windstorms or fires when canopy loss and debris flows must be mapped within days.
Educate and engage, by providing compelling 3D visualizations that show citizens and decision‑makers the invisible vertical dimension of their landscapes.

Portugal’s national investment in open, high‑accuracy remote sensing (airborne LiDAR and very‑high‑resolution imagery) has put the country on a par with Spain and France in accurate, openly shared data.

Key sources used

Overview and context on Mata do Buçaco’s location, flora, and microclimate: Wikipedia.
ETH Global Canopy Height 2020 (10 m) description and access: Nico Lang (nlang.users.earthengine.app), research-collection.ethz.ch, GEE Community Catalog.
Portugal DGT’s open LiDAR and elevation data catalog (for the “open data” section): Centro de Dados | DGT, EuroGeographics.
GEE code for ETH Global Canopy Height extraction: https://code.earthengine.google.com/7a933d8ea6fa99a1c71690e14afd11d2

Hope you guys like it.

Alberto C.
GIS Analyst, Open Data evangelist and Portugal lover

In the end, the whole peninsula is burning on us this 2025!

This August, Spain and Portugal are living through an exceptionally harsh fire season. In Spain, the flames have already burned almost 390,000 hectares (more than six times the recent average) and have left several people dead. In Portugal, the burned area exceeds 200,000 hectares, 2% of the country's total territory (!), far above the 2006–2024 average for this time of year. The smoke crossed borders and degraded air quality hundreds of kilometres away…

At the European scale, the cumulative burned area as of 19 August stands at 895,000 hectares, almost four times the average for this time of year. It is a peak that confirms how exceptional this summer has been. Isn't something odd going on? Is it exceptional?

Wildfire Ermida, Portugal. Early August 2025

Wildfire Verin (Ourense), Spain. Mid August 2025

Fire Danger Chart (Burnt Area Locator, EFFIS) – https://effis.jrc.ec.europa.eu/apps/effis_current_situation?p_fdf_day=20250822&p_wf_id=0&p_bl=maptiler_hybrid&p_rda_layers=modis_hs,viirs_hs&p_rda_range=2025-07-23/2025-08-22&p_opt_layers=ghsl

If you want to understand an episode like this without the media noise, EFFIS (the European Forest Fire Information System, part of the Copernicus programme) is your tool:

Current situation map: shows fire danger (FWI, the Fire Weather Index), active fires, and daily evolution across Europe and the Mediterranean. It is the public reference and is updated continuously.
Near‑real‑time (NRT) hotspot detection: integrates hotspots (MODIS/VIIRS) that arrive within hours of the satellite overpass, so you can see where it is burning right now.
Rapid Damage Assessment (RDA): maps burned areas from daily imagery to size up the impact during the season, also with a latency of a few hours.

In addition, the EU Civil Protection Mechanism has pre‑positioned aircraft and ground crews (rescEU) in Portugal and Spain this summer, which makes a difference when many fires peak at the same time.

EFFIS system. Copernicus (Burnt Area Locator)

The question is: why are there so many fires this year? There is no single factor. It is the sum of several, and this year, in my view, too many pieces have fallen into place.

*Wind and episode‑scale weather: gusts and local conditions have “triggered” the spread on several active fronts. (You can see it in the smoke trail over the whole peninsula in the satellite imagery of 15 August.)

*Prolonged heat and extreme dryness: the longest heatwave on record in Spain (16 days) left fine fuels ready to ignite and spread fast.

*Accumulated water deficit: less effective rainfall and drier soils raise the danger index (FWI) and lengthen ignition windows.

*Landscapes with a heavy “fuel load”: excess spring rainfall (in some cases several times the average for the period, as in Madrid) on the one hand, and on the other the perennial rural abandonment and the continuity of scrub and forest cover, favour large fires, especially with wind. (A conclusion consistent with the recent fires and the pattern observed in Galicia, Castilla y León and Extremadura.)

Finally, let us not forget the human origin of many ignitions: from negligence (people having barbecues in the middle of the forest, unbelievable, right?) to suspected arson under investigation by the authorities, with some people already remanded in custody.

Sanabria surroundings. Zamora, Spain. 20250816

To wrap up, if you have time today and over the coming days: check the FWI and its anomaly (weather‑driven risk), the hotspots (where it is burning), and the RDA (what has burned). If the FWI is Very High/Extreme and there are hotspots nearby, expect rapid spread. For context, compare the 2025 hectares with the 2006–2024 average by country on the statistics portal. It helps you separate perception from real magnitude. The links are below.

Ponferrada surroundings. León, Spain. 20250816

Translation for those who are not necessarily up to speed on the subject (and there is no reason they should be): this is not “bad luck”. It is exposure + vulnerability + a warmer climate. 2025 is the umpteenth stress test. The good news: we have a system (EFFIS), reinforced European cooperation, and clear lessons for prevention (landscape, wildland‑urban interface) and response (pre‑positioned resources, early detection). The bad news: without reducing structural risks and without serious adaptation, these peaks will become more and more frequent (an acquaintance of mine, for example, has lost their home and all their belongings in one of these fires; it can happen to any of us!)… So, if you have made it this far, tell me what you think. In this case, allow me to say that we all “have to take a stand”.