AWS, in collaboration with aerial imagery provider Vexcel, has unveiled a multimodal artificial intelligence solution that makes large collections of aerial imagery searchable using natural language. The development addresses a long-standing challenge for industries that rely heavily on geospatial data, including insurance, real estate, government, infrastructure, and agriculture. Users can query billions of pixels of aerial data, for example to identify swimming pools in a suburb or count solar panels across a city, without inspecting tiles manually one by one or building a custom computer vision model for each search criterion. The collaboration draws on Vexcel's aerial imagery program, which maintains one of the world's largest aerial imagery collections and gathers high-resolution data in more than 45 countries and territories.
Traditionally, extracting specific information from aerial imagery has been slow and labor-intensive. It typically requires either human analysts who review map tiles one at a time or specialized computer vision models. These bespoke models demand substantial labeled data, considerable engineering time, and retraining for every new feature or object a customer wants to identify, which makes the approach costly whenever search requirements change. The new AWS solution combines multimodal embeddings, large language model (LLM) captioning, and vector search, primarily using AWS services such as Amazon Bedrock and Amazon OpenSearch Serverless. The architecture follows an "index once, query using natural language" pattern: imagery is indexed a single time, and each new search is expressed as a text query rather than as a newly trained model. In an evaluation built on OpenStreetMap ground truth, Amazon Nova Multimodal Embeddings delivered the highest F1 scores on the benchmark queries.
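As a rough illustration of the "index once, query using natural language" pattern and of F1 scoring against ground truth, here is a minimal Python sketch. It is not the AWS or Vexcel implementation. The `toy_embed` function is a hypothetical stand-in for a multimodal embedding model, and the tile IDs, captions, similarity threshold, and ground-truth labels are all invented for the example. A production system would presumably call an embedding model and a vector store instead.

```python
import math

# Assumption: a tiny fixed vocabulary stands in for a learned embedding space.
VOCAB = ["pool", "solar", "panel", "roof", "road", "field"]


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: counts vocabulary words."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(tiles):
    """Index once: embed every tile caption a single time."""
    return {tile_id: toy_embed(caption) for tile_id, caption in tiles.items()}


def search(index, query, threshold=0.5):
    """Query many times: embed the text query and keep tiles above the threshold."""
    query_vec = toy_embed(query)
    return {tid for tid, vec in index.items() if cosine(query_vec, vec) >= threshold}


def f1_score(predicted, truth):
    """F1 is the harmonic mean of precision and recall over sets of tile IDs."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Invented captions, standing in for LLM-generated tile descriptions.
tiles = {
    "tile_a": "house with pool and roof",
    "tile_b": "roof with solar panel",
    "tile_c": "road beside field",
    "tile_d": "pool next to road",
}
index = build_index(tiles)

pool_hits = search(index, "pool")
# Invented ground truth: tile_c also contains a pool that its caption missed.
pool_truth = {"tile_a", "tile_c", "tile_d"}
print(sorted(pool_hits), round(f1_score(pool_hits, pool_truth), 3))
```

The index is built once, and each new question ("pool", "solar panel") only requires embedding the query text. The F1 calculation shows how a missed tile lowers recall while precision stays perfect.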
The multimodal search capability could change how enterprises across many sectors work with geospatial data and derive insights from it. For developers, it offers a framework for building applications that query visual data semantically at scale, without the previous burden of per-feature model training. Insurers can assess property damage more quickly after natural disasters, real estate firms can identify specific features in development zones, and agricultural businesses can monitor crop health or infrastructure more efficiently. The shift from bespoke model training to natural-language querying broadens access to geospatial analysis, supports faster and better-informed decisions, and opens new use cases for aerial imagery data. The work has already been put into practice: it has evolved into Vexcel Intelligence, a commercial searchable imagery product.