MS Thesis Project

Vision-and-Language Navigation for Autonomous Drone Search-and-Return in Urban Environments

Kunyi Yu

Department of Computer Science and Engineering, University of California, Riverside

Abstract

This study explores a search-and-return extension of aerial vision-and-language navigation (VLN), in which a drone first follows a language instruction to search for a target and then returns to its starting point. The main challenge is that the return phase depends on information collected during the search phase. This work builds a framework on top of OpenFly-Agent with a return trigger, a landmark memory module, and a return policy with an optional LoRA adapter. It also constructs SAR-Drone-VLN-3K, a generated dataset of search-and-return trajectories for training and analysis. Experiments show that the base OpenFly-Agent has limited return ability even with landmark prompts, while the current LoRA adapter often brings the drone near the start point but does not stop there reliably. These results suggest that search-and-return navigation is harder than one-way aerial VLN and requires better memory design and decision-making.

Search-and-Return Replays

OpenFly-Agent + Augmented Return Prompt.

Overview

The project extends one-way aerial VLN into a two-phase task. The agent searches for the target, records a compact landmark memory along the way, triggers the return phase, and then navigates back to the original start position.

Search-and-return framework overview
Runtime framework for search-and-return aerial navigation.
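
The two-phase flow can be summarized in a short sketch. This is a minimal illustration only, assuming hypothetical agent/environment interfaces: the names LandmarkMemory, run_episode, agent.step, and env.execute are placeholders, not OpenFly-Agent's actual API.

    from dataclasses import dataclass, field

    @dataclass
    class LandmarkMemory:
        # Stores (step, caption) pairs for selected search-phase observations.
        entries: list = field(default_factory=list)

        def record(self, step, caption):
            self.entries.append((step, caption))

        def to_return_prompt(self):
            # Replay landmarks in reverse so the prompt reads as directions
            # from the target back toward the start point.
            landmarks = ", then ".join(c for _, c in reversed(self.entries))
            return f"Return to the start point, passing: {landmarks}."

    def run_episode(agent, env, instruction, max_steps=200):
        memory = LandmarkMemory()
        obs = env.reset()
        phase = "search"
        for t in range(max_steps):
            prompt = instruction if phase == "search" else memory.to_return_prompt()
            action, caption = agent.step(obs, prompt)  # hypothetical interface
            if phase == "search":
                if caption:                      # keep a compact landmark trace
                    memory.record(t, caption)
                if action == "found_target":     # return trigger fires
                    phase = "return"
                    continue
            elif action == "stop":               # episode ends, ideally at the start
                break
            obs = env.execute(action)
        return phase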

Key Figures

Landmark memory module
Landmark memory records selected search observations and converts them into return prompts.
Ground-truth search-and-return replay
Example frames from a ground-truth search-and-return trajectory.
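
As a concrete illustration of "selected search observations", one simple selection rule is to keep an observation whenever the agent changes heading, so the return prompt can describe the reverse route turn by turn. Both the action names and the turn-based rule below are assumptions for illustration, not the selection criterion used in the thesis.

    # Hedged sketch: select landmarks at turn points during the search phase.
    # TURN_ACTIONS and the turn-based rule are assumptions, not the actual
    # selection criterion.
    TURN_ACTIONS = {"turn_left", "turn_right"}

    def select_landmarks(steps):
        # steps: list of (action, caption) pairs recorded during search.
        landmarks = []
        for i, (action, caption) in enumerate(steps):
            if action in TURN_ACTIONS and caption:
                landmarks.append((i, caption))  # what was visible at the turn
        return landmarks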

Dataset Analysis

Height distribution
Trajectory length distribution
Action distribution
Word cloud of prompt word candidates
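
A minimal sketch of how such distributions can be computed, assuming each trajectory is stored as a JSON file with a list of [x, y, z] waypoints and discrete action labels; this schema is illustrative, not the actual SAR-Drone-VLN-3K format.

    # Sketch of computing the dataset statistics above. The trajectory schema
    # (JSON with "path" waypoints and "actions") is an assumption, not the
    # actual SAR-Drone-VLN-3K format.
    import json
    import math
    from collections import Counter
    from pathlib import Path

    def trajectory_stats(data_dir):
        heights, lengths, actions = [], [], Counter()
        for f in Path(data_dir).glob("*.json"):
            traj = json.loads(f.read_text())
            pts = traj["path"]                  # list of [x, y, z] waypoints
            heights.extend(p[2] for p in pts)   # flight height per step
            lengths.append(sum(                 # total path length
                math.dist(a, b) for a, b in zip(pts, pts[1:])))
            actions.update(traj["actions"])     # discrete action labels
        return heights, lengths, actions

    # Example usage (hypothetical directory name):
    # heights, lengths, actions = trajectory_stats("sar_drone_vln_3k/")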