About The Research

Aeriform in-action is a multiview dataset for recognizing human actions in aerial videos. It consists of 32 high-resolution videos covering 13 action classes, with 55,477 frames (without augmentation) and 400,000 annotations, captured at 30 fps and a resolution of 3840 × 2160 pixels. The dataset addresses several challenges, including camera motion, illumination changes, diversity of actions, and dynamic transitions between actions. The action classes fall into three categories: atomic actions, human-human interactions, and human-object interactions. The 13 actions are carrying, drinking, handshaking, hugging, kicking, lying, punching, reading, running, sitting, standing, walking, and waving. The dataset provides a baseline for recognizing human actions in aerial videos and is intended to encourage researchers to advance the field.
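As an illustration only, the class list could be encoded as below; the grouping of the 13 actions into the three categories is an assumption inferred from the category definitions, since the exact assignment is not stated here.

# Illustrative sketch (Python): the 13 action classes grouped by category.
# The category assignment below is an inferred assumption, not taken from
# the dataset description.
ACTION_CATEGORIES = {
    "atomic":       ["lying", "running", "sitting", "standing", "walking", "waving"],
    "human_human":  ["handshaking", "hugging", "kicking", "punching"],
    "human_object": ["carrying", "drinking", "reading"],
}

ACTION_CLASSES = sorted(a for group in ACTION_CATEGORIES.values() for a in group)
assert len(ACTION_CLASSES) == 13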

Datasets

The dataset was captured using a multi-rotor hexacopter-X at UIET, Panjab University, Chandigarh, India. Videos were recorded with an 8 MP Sony IMX078 camera mounted on a 3-axis Solo gimbal with pitch, roll, and yaw angles of 50°, 0°, and 0°, respectively. The camera has a 110° field of view (FOV); the pitch angle of approximately 50° was chosen after experimentation. The dataset encompasses a variety of human interactions and includes individuals of varying genders, attires, heights, and body shapes. Of the 32 videos, half were recorded on sunny days and the remainder on cloudy ones, simulating natural scenarios with random actions and partial occlusions. Annotated frames contain bounding boxes around humans, each captioned with an object name, an action label, and an ID for tracking. Scenarios captured under different weather conditions introduce challenges for action recognition, such as shadows and occlusions. The dataset covers a range of action categories, with walking and standing being the most common, followed by handshaking and carrying objects.
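Because the annotation file format is not described here, the following is a minimal sketch of how a per-frame record of this kind (bounding box, object name, action label, and tracking ID) might be represented and loaded; the CSV layout, column names, and file name are all hypothetical.

import csv
from dataclasses import dataclass

@dataclass
class Annotation:
    # One annotated human in one frame; fields mirror the caption contents
    # described above. Field names are hypothetical.
    frame: int
    track_id: int     # ID used to follow the same person across frames
    object_name: str  # e.g. "human"
    action: str       # one of the 13 action classes
    x: int
    y: int
    w: int
    h: int            # bounding box in pixel coordinates

def load_annotations(path):
    # Parse a hypothetical CSV export of the annotations.
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield Annotation(
                frame=int(row["frame"]), track_id=int(row["id"]),
                object_name=row["object"], action=row["action"],
                x=int(row["x"]), y=int(row["y"]),
                w=int(row["w"]), h=int(row["h"]),
            )

# Example usage: collect every bounding box for one tracked person.
# boxes = [a for a in load_annotations("annotations.csv") if a.track_id == 7]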

Sample Dataset