Equitable Data Challenges

We strive for perfectly equitable data accessibility in every situation, but there are real challenges.

Big Data

Big data should not be curated by hand. Even datasets with only thousands of files can be prohibitive to curate manually. How do we automate and validate accessibility for millions or billions of files, especially with respect to alt text, captions, and transcripts?
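
One pragmatic starting point is to make accessibility coverage checkable by machine. The sketch below is a minimal audit, assuming a hypothetical layout in which every image has a <name>.alt.txt sidecar and every audio file a <name>.transcript.txt sidecar; the naming convention is an assumption, not a standard:

```python
import sys
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
AUDIO_EXTS = {".wav", ".mp3", ".flac"}

def audit(root):
    """Report media files that lack an accessibility sidecar file."""
    missing = []
    for path in Path(root).rglob("*"):
        ext = path.suffix.lower()
        if ext in IMAGE_EXTS and not path.with_name(path.stem + ".alt.txt").exists():
            missing.append((path, "alt text"))
        elif ext in AUDIO_EXTS and not path.with_name(path.stem + ".transcript.txt").exists():
            missing.append((path, "transcript"))
    return missing

if __name__ == "__main__":
    for path, kind in audit(sys.argv[1]):
        print(f"MISSING {kind}: {path}")
```

A scan like this can run automatically on every data release, so missing alt text or transcripts surface as test failures rather than manual review findings.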

Automated Alt Text

There is plenty of research into this topic, particularly in comparing market solutions such as Azure Computer Vision Engine and CloudSight. We recommend exploring the following resources:

  • LAVIS’s BLIP software: https://github.com/salesforce/LAVIS (see the usage sketch after this list)
  • “Evaluating the effectiveness of automatic image captioning for web accessibility” by Leotta, Mori, and Ribaudo
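
As a rough illustration of the first resource above, here is a minimal captioning sketch following the usage shown in the LAVIS README. The checkpoint names (blip_caption, base_coco) come from that README and may change between releases, the image path is a placeholder, and any generated caption should be human-reviewed before it ships as alt text:

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a BLIP captioning model and its matching image preprocessor.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

raw_image = Image.open("example_image.jpg").convert("RGB")  # placeholder path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Generate a candidate alt-text caption for human review.
print(model.generate({"image": image}))
```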

Audio Captioning

As with automated alt text, researchers are exploring the feasibility of automated audio captioning. We recommend reading “Automated audio captioning: An overview of recent progress and new challenges” by Mei, Liu, Plumbley, and Wang.
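
Fully automated audio captioning is still maturing, but the neighboring transcript problem can be prototyped today. The sketch below uses OpenAI's open-source whisper package for speech transcription; to be clear, this is a stand-in rather than one of the audio-captioning systems surveyed by Mei et al., though those follow a similar load-and-generate pattern. The file name is a placeholder:

```python
import whisper

# Load a small speech-recognition model and draft a transcript.
model = whisper.load_model("base")
result = model.transcribe("interview_recording.wav")  # placeholder path
print(result["text"])  # a draft only; human review remains essential
```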

Interpretation vs. Captioning

Presentations, papers, and websites typically present a researcher’s interpretation of their work. The goal of data release is to allow researchers to interpret the data themselves.

We note, however, that expert captions are themselves a form of ground truth, particularly for machine learning applications. In that sense, bias in ground truth labeling is by no means a new problem, and research into mitigating that bias can also be applied to accessible captions.

An exciting area of research that sidesteps this issue is alternate data probing. We include a few examples of this below, followed by a small sonification sketch:

  • Audio visualization
    • Truskinger, A., Brereton, M., & Roe, P. (2018, October). Visualizing five decades of environmental acoustic data. In 2018 IEEE 14th International Conference on e-Science (e-Science) (pp. 1-10). IEEE.
    • Cottingham, M. D., & Erickson, R. J. (2020). Capturing emotion with audio diaries. Qualitative Research, 20(5), 549-564.
  • Visual sonification
    • Ali, S., Muralidharan, L., Alfieri, F., Agrawal, M., & Jorgensen, J. (2020). Sonify: making visual graphs accessible. In Human Interaction and Emerging Technologies: Proceedings of the 1st International Conference on Human Interaction and Emerging Technologies (IHIET 2019), August 22-24, 2019, Nice, France (pp. 454-459). Springer International Publishing.
    • Sawe, N., Chafe, C., & Treviño, J. (2020). Using data sonification to overcome science literacy, numeracy, and visualization barriers in science communication. Frontiers in Communication, 5, 46.
  • Tactile vibration responses
    • Yoshioka, T., Bensmaia, S. J., Craig, J. C., & Hsiao, S. S. (2007). Texture perception through direct and indirect touch: An analysis of perceptual space for tactile textures in two modes of exploration. Somatosensory & motor research, 24(1-2), 53-70.
    • Otake, K., Okamoto, S., Akiyama, Y., & Yamada, Y. (2022). Tactile texture display combining vibrotactile and electrostatic-friction stimuli: Substantial effects on realism and moderate effects on behavioral responses. ACM Transactions on Applied Perception, 19(4), 1-18.
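
As promised above, here is a small sonification sketch. It maps a numeric series onto pitch, one of the simplest forms of data sonification, using only NumPy and the standard library; all tuning parameters are illustrative:

```python
import wave

import numpy as np

def sonify(values, path="sonified.wav", rate=44100, note_s=0.25,
           f_lo=220.0, f_hi=880.0):
    """Render a numeric series as sine tones: higher value, higher pitch."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    v = (v - v.min()) / (span if span else 1.0)   # normalize to [0, 1]
    t = np.linspace(0.0, note_s, int(rate * note_s), endpoint=False)
    tones = [0.5 * np.sin(2 * np.pi * (f_lo + x * (f_hi - f_lo)) * t) for x in v]
    samples = np.int16(np.concatenate(tones) * 32767)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)
        w.writeframes(samples.tobytes())

sonify([3, 1, 4, 1, 5, 9, 2, 6])   # the pitch contour traces the data
```

Rising data produce rising pitch, so a listener can trace the shape of the series without seeing a plot.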

Raw/Original Data Components

The goal of releasing a dataset is to connect cutting-edge research to real application pipelines: algorithms developed on raw data have a clear path back into those pipelines. Raw data processing is itself a field of research.

Releasing raw data is critically important, but it comes with inherent challenges. Raw data could have:

Integral Color

The data could have integral color information (e.g., red/green distinctions, large blocks of harsh colors) that is essential to interpreting it. An example is the NABirds dataset.

Three birds from the NABirds dataset

In the image above, the three birds are distinguished almost entirely by their coloration. This makes the dataset inherently inaccessible to researchers with color vision deficiency, visual hypersensitivities, or migraines.
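
A cheap way to flag this risk is to compare luminance rather than hue for the colors that carry meaning: red/green distinctions often collapse under color vision deficiency precisely because the hues differ while the brightness does not. The sketch below uses the WCAG relative-luminance and contrast-ratio formulas; the two RGB triples are hypothetical stand-ins for red and green plumage:

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB color with 0-255 channels."""
    def linear(c):
        c /= 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(c1, c2):
    """WCAG contrast ratio; values near 1.0 mean similar brightness."""
    hi, lo = sorted((relative_luminance(c1), relative_luminance(c2)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)

# Hypothetical red vs. green plumage: strongly different hues,
# but nearly identical luminance, which is the red/green CVD failure mode.
print(contrast_ratio((180, 30, 30), (30, 140, 30)))  # ~1.5, well below WCAG's 3:1
```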

Flashing Light Sequences

The data could have flashing light sequences at more than 3 flashes per second. An example is AudioSet, a large-scale dataset of manually annotated audio events.

A snapshot from the AudioSet dataset

Datasets such as these intrinsically include data that may be difficult for researchers with photosensitive epilepsy, visual hypersensitivities, and migraines.
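
Dataset publishers can screen for this risk before release. The sketch below is a coarse heuristic using OpenCV: it counts large frame-to-frame brightness swings and compares the implied flash rate to the 3-per-second threshold. The brightness delta is an illustrative assumption, and a dedicated tool such as PEAT (the Photosensitive Epilepsy Analysis Tool) is more rigorous:

```python
import cv2

def flags_flash_risk(video_path, max_flashes_per_sec=3.0, delta=20.0):
    """Coarse screen: count large mean-brightness swings between frames.

    A 'transition' is a mean-luminance jump larger than `delta` on the
    0-255 scale (an illustrative threshold); two opposing transitions
    form one flash cycle.
    """
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    prev_mean, transitions, frames = None, 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames += 1
        mean = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).mean()
        if prev_mean is not None and abs(mean - prev_mean) > delta:
            transitions += 1
        prev_mean = mean
    cap.release()
    seconds = frames / fps if frames else 1.0
    return (transitions / 2) / seconds > max_flashes_per_sec
```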

Harsh, Sudden, or Incessant Sounds

Raw data could contain harsh, sudden, incessant, or buzzing sounds. An example is the TUT Rare sound events development dataset.

Haulikko, kaikuvia laukauksia ulkona / A shotgun, echoing single shots, exterior by YleArkisto -- https://freesound.org/s/255267/ -- License: Attribution 4.0

This data can be difficult for researchers with auditory hypersensitivities or post-traumatic stress disorder (PTSD).
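
A similar screen works for audio. The sketch below flags WAV files whose short-term level jumps sharply between adjacent windows, a rough proxy for gunshot-like onsets; the window length and decibel threshold are illustrative assumptions:

```python
import numpy as np
from scipy.io import wavfile

def has_sudden_loud_onsets(path, window_s=0.05, jump_db=20.0):
    """Flag WAV audio whose short-term level jumps by more than
    `jump_db` between adjacent windows (both parameters illustrative)."""
    rate, data = wavfile.read(path)
    if data.ndim > 1:
        data = data.mean(axis=1)          # mix multichannel down to mono
    data = data.astype(float)
    win = max(1, int(rate * window_s))
    n = len(data) // win
    rms = np.sqrt((data[: n * win].reshape(n, win) ** 2).mean(axis=1)) + 1e-9
    levels_db = 20 * np.log10(rms)        # per-window level in decibels
    return bool(np.any(np.diff(levels_db) > jump_db))
```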

Sensitive Topics

Particularly where humans are involved, datasets can include sensitive topics. These can range from abuse to suicide to violence and beyond. An example is the REDDIT C-SSRS Suicide Dataset.

Suicide hotlines across different countries

These can be particularly difficult for researchers with a history of trauma, PTSD, or an anxiety disorder.
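
Text datasets can be screened in the same spirit. The sketch below tags records against a deliberately tiny, hypothetical keyword lexicon; a production screen should rely on a vetted taxonomy or a trained classifier rather than a list like this:

```python
import re

# Hypothetical, deliberately coarse lexicon for illustration only.
SENSITIVE_PATTERNS = {
    "self-harm": re.compile(r"\b(suicid\w*|self[- ]harm)\b", re.I),
    "violence": re.compile(r"\b(assault\w*|abus\w*|violen\w*)\b", re.I),
}

def content_warnings(text):
    """Return the set of sensitive-topic labels that match a text record."""
    return {label for label, pat in SENSITIVE_PATTERNS.items() if pat.search(text)}

record = "Post discussing suicidal ideation on a support forum."
print(content_warnings(record))  # {'self-harm'}
```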

What can be done?

The goal of equitably accessible data is to enable data users to set the terms of their own engagement.

At a minimum, you can include a content warning/note on the data access page and in your README. Even better would be to incorporate warning flags into data access APIs.

Better still, create a helper code file or Jupyter notebook that demonstrates how to use accessibility tools or filters on your dataset.
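
As one sketch of what such a helper could look like, the loader below reads warning flags from per-record metadata and lets users exclude categories entirely. The JSON layout, the warnings field, and its labels are all a hypothetical schema, not an existing standard:

```python
import json
from pathlib import Path

def load_records(root, exclude_warnings=frozenset()):
    """Yield dataset records, skipping any whose metadata carries an
    excluded content warning. Assumes each record is a JSON file with a
    'warnings' list, e.g. ["flashing_lights", "gunshots"]."""
    for path in sorted(Path(root).glob("*.json")):
        record = json.loads(path.read_text())
        if exclude_warnings & set(record.get("warnings", [])):
            continue
        yield record

# A researcher with photosensitive epilepsy opts out of flashing content:
for record in load_records("dataset/", exclude_warnings={"flashing_lights"}):
    pass  # process records on the user's own terms
```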


Resources and References
  • Big Data
    • Li, J., Li, D., Savarese, S., & Hoi, S. (2023, July). Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International conference on machine learning (pp. 19730-19742). PMLR.
    • Leotta, M., Mori, F., & Ribaudo, M. (2023). Evaluating the effectiveness of automatic image captioning for web accessibility. Universal access in the information society, 22(4), 1293-1313.
    • Mei, X., Liu, X., Plumbley, M. D., & Wang, W. (2022). Automated audio captioning: An overview of recent progress and new challenges. EURASIP Journal on Audio, Speech, and Music Processing, 2022(1), 26.
  • Audio visualization
    • Truskinger, A., Brereton, M., & Roe, P. (2018, October). Visualizing five decades of environmental acoustic data. In 2018 IEEE 14th International Conference on e-Science (e-Science) (pp. 1-10). IEEE.
    • Cottingham, M. D., & Erickson, R. J. (2020). Capturing emotion with audio diaries. Qualitative Research, 20(5), 549-564.
  • Visual sonification
    • Ali, S., Muralidharan, L., Alfieri, F., Agrawal, M., & Jorgensen, J. (2020). Sonify: making visual graphs accessible. In Human Interaction and Emerging Technologies: Proceedings of the 1st International Conference on Human Interaction and Emerging Technologies (IHIET 2019), August 22-24, 2019, Nice, France (pp. 454-459). Springer International Publishing.
    • Sawe, N., Chafe, C., & Treviño, J. (2020). Using data sonification to overcome science literacy, numeracy, and visualization barriers in science communication. Frontiers in Communication, 5, 46.
  • Tactile vibration responses
    • Yoshioka, T., Bensmaia, S. J., Craig, J. C., & Hsiao, S. S. (2007). Texture perception through direct and indirect touch: An analysis of perceptual space for tactile textures in two modes of exploration. Somatosensory & motor research, 24(1-2), 53-70.
    • Otake, K., Okamoto, S., Akiyama, Y., & Yamada, Y. (2022). Tactile texture display combining vibrotactile and electrostatic-friction stimuli: Substantial effects on realism and moderate effects on behavioral responses. ACM Transactions on Applied Perception, 19(4), 1-18.
  • Raw/Original Data Components
    • Van Horn, G., Branson, S., Farrell, R., Haber, S., Barry, J., Ipeirotis, P., … & Belongie, S. (2015). Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 595-604).
    • Gemmeke, J. F., Ellis, D. P., Freedman, D., Jansen, A., Lawrence, W., Moore, R. C., … & Ritter, M. (2017, March). Audio set: An ontology and human-labeled dataset for audio events. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 776-780). IEEE.
    • “[HSD] Practise Toniic, ChrizZ, Feybi (Melbourne Shuffle Hamburg).” YouTube, uploaded by Feybi11, 16 July 2010, https://www.youtube.com/watch?v=4X2aUZFZlzc&t=31s
    • Diment, A., Mesaros, A., Heittola, T., & Virtanen, T. (2017). TUT Rare sound events, Development dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.401395
    • Gaur, M., Alambo, A., Sain, J. P., Kursuncu, U., Thirunarayan, K., Kavuluru, R., Sheth, A., Welton, R., & Pathak, J. (2019, May 4). Reddit C-SSRS Suicide Dataset. The World Wide Web Conference. https://doi.org/10.5281/zenodo.2667859
    • Ptaszynski, M., Pieciukiewicz, A., Dybala, P., Skrzek, P., Soliwoda, K., Fortuna, M., Leliwa, G., & Wroczynski, M. (2022). Expert-annotated dataset to study cyberbullying in Polish language [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7188178