Computer vision is transforming the collection and processing of digital imagery for ecology and conservation. In aquatic environments, computer vision tools for automatic fish identification are heavily sought after, but robust, open-access fish datasets are hard to find. Here, I share some of the most widely used, open-access and up-to-date fish datasets for automatic fish detection and classification.
Computer vision
Computer vision is the scientific field that develops and trains computers to understand and interpret objects in digital images or videos. Computer vision can automatically detect, count and even track objects in digital photos and video footage.
In ecology and conservation, computer vision is transforming information processing by quickly and accurately analysing the vast amount of digital imagery that researchers have collected over previous decades.
Computer vision in aquatic ecosystems
In aquatic environments, one of the most sought-after computer vision tools is a platform that can automatically detect and identify fish species in underwater footage. As cameras are increasingly used to monitor and study fish populations, researchers are developing tools that can reduce the tedious and time-consuming task of manually analysing the footage.
An automatic tool that streamlines the process and produces results more quickly than humans would be an important achievement for the monitoring of aquatic ecosystems.
Computer vision tools require data
Developing automatic tools for fish detection requires large amounts of data. Many aquatic researchers own enormous fish datasets; however, most of these datasets have one or both of the following limitations:
restricted availability
the fish have not been manually annotated and labelled
What makes an excellent fish computer vision dataset?
A great dataset for a fish computer vision tool requires that
1) there are sufficient images of every fish species to be identified
2) the species identity of every fish is already known
3) each fish has been outlined (mask or bounding box) and paired with its label
4) the dataset covers a variety of environmental conditions (e.g. high and low visibility footage)
An example image of masks outlining the fish. Image provided by the Moreton Bay Environmental Education Centre
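As a rough illustration of the four requirements above, here is a minimal sketch of how an annotation file could be checked before training. The record layout (the "image", "species", "bbox" and "visibility" fields), the placeholder species names and the audit_dataset helper are all assumptions made for this example; real datasets will use their own formats.

```python
from collections import defaultdict

# Hypothetical annotation records, one per fish. The field names and the
# placeholder species names are assumptions for illustration only.
annotations = [
    {"image": "img_001.jpg", "species": "species_a", "bbox": (34, 50, 120, 80), "visibility": "high"},
    {"image": "img_002.jpg", "species": "species_a", "bbox": (10, 22, 90, 60), "visibility": "low"},
    {"image": "img_003.jpg", "species": "species_b", "bbox": None, "visibility": "high"},
    {"image": "img_004.jpg", "species": None, "bbox": (5, 5, 40, 30), "visibility": "low"},
]


def audit_dataset(records, min_images_per_species=2):
    """Summarise how well a set of annotation records meets the four requirements."""
    images_per_species = defaultdict(set)
    missing_label = 0
    missing_outline = 0
    conditions = set()

    for record in records:
        species = record.get("species")
        if species:
            # Requirement 1: count distinct images per species.
            images_per_species[species].add(record["image"])
        else:
            # Requirement 2: every fish needs a known identity.
            missing_label += 1
        if record.get("bbox") is None:
            # Requirement 3: every fish needs an outline (here, a bounding box).
            missing_outline += 1
        if record.get("visibility"):
            # Requirement 4: track which environmental conditions are covered.
            conditions.add(record["visibility"])

    under_represented = sorted(
        name
        for name, images in images_per_species.items()
        if len(images) < min_images_per_species
    )
    return {
        "under_represented": under_represented,
        "missing_label": missing_label,
        "missing_outline": missing_outline,
        "visibility_conditions": sorted(conditions),
    }


report = audit_dataset(annotations)
for key, value in report.items():
    print(f"{key}: {value}")
```

The threshold of two images per species is only there to keep the toy example small; a useful minimum for a real model is likely to be far higher and depends on the task.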
While it is difficult to find datasets that meet all of these requirements, here are some publicly available datasets of fish imagery for computer vision tasks. These datasets have been used in several peer-reviewed computer vision and fish classification studies and can help with the development of a fish computer vision tool.
Datasets
Extensive datasets from 4 NOAA programs. They include still or video imagery of benthic fish and invertebrates across different locations, depths and backgrounds.
The dataset includes 3,960 images collected from 468 species across different backgrounds and illuminations.
Extensive collection of ~80k fish crops and ~45k bounding box annotations for a wide variety of fish species.
Fish4Knowledge (Fish Detection)
A large and well-known ground-truth dataset with 1700 minutes of fish footage.
Fish4Knowledge (Fish Species Recognition)
Dataset used in the LifeCLEF 2015 competition.
20 manually annotated videos covering 15 fish species, intended to support the training of fish recognition models.
Dual-Frequency Identification Sonar (DIDSON) fishery acoustic observation data for 8 fish species from the USA.
Fish-Pak: an image dataset of 6 different fish species, captured by a single camera at different pools near Head Qadirabad, Chenab River in Punjab, Pakistan.
Released as a preprint in September 2020. ~40k images for fish classification across ~15 habitats in tropical Australia. Preprint here.
Leave a comment if you know of any other computer vision fish datasets.
Follow the progress of FishID, an automatic platform for fish species identification and abundance quantification, through this blog or @seabassphd.
Sebastian Lopez (Seabass) is a PhD Candidate at the Australian Rivers Institute, where he is developing and applying artificial intelligence tools to monitor fish populations in marine ecosystems.