This presentation covers content-based indexing and retrieval in large collections of images and videos, with applications to object recognition for visual surveillance and to content-based video copy detection. We will begin with an introduction to the most popular content-based image and video descriptors, focusing on local descriptors, which are particularly well suited to object recognition and copy detection. We will then describe two recent and orthogonal approaches to improving these descriptors. The first enriches local features with semi-local or global visual context in order to mitigate some of the problems inherent to these descriptors. We will show that such context leads to more robust object recognition and is better suited to video surveillance. The second approach characterizes the dynamic behavior of local descriptors in video sequences. We will demonstrate that tracking local features and indexing their behavior allows more efficient detection of video copies while reducing the volume of visual features to compute. The targeted applications require efficient access to high-dimensional data, which is known to suffer from the so-called “curse of dimensionality”. The presentation will also address this issue.