Glossary Term

Data Bias 

A class imbalance or other distortion in the data relative to what we know to be true from meteorological and other relevant knowledge.

 

Related Terms: Computational/Model Bias

Data can itself contain biases, which affect ML model training, evaluation, and deployment. These biases can stem from underlying human biases (intentional or unintentional) or from how the data are sampled and selected. For example, completely unintentional biases arise when some populations are served by poorer coverage of monitoring equipment such as weather radars (see top figure), or when hail reports come mainly from along highways and in cities, with few reports from rural areas (see bottom figure).
Top figure: from Jack Sillin (@JackSillin): https://twitter.com/JackSillin/status/1372957704138981378?s=20
From Figure 11 in Allen and Tippett 2015: Point reports of hail >0.75 in (1.9 cm), over the Texas Panhandle and surroundings, with population choropleth of intercensal estimated population segregated by Jenks Natural Breaks: a) all reports 1955–2014, shown with mean population 1979–2012; b) hail reports 1955–1979 with 1979 population; c) as for (b) except hail reports 1955–1995 and 1995 population from the CIESIN gridded global population data; d) as for (c) except hail reports 1955–2005 and 2000 population from the CIESIN data. Primary interstates and highways are shown in red.
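
The hail-report example can be illustrated with a small simulation. The sketch below is purely hypothetical: the function names, the two area types, and the reporting probabilities are assumptions chosen for illustration, not values from Allen and Tippett 2015. It assumes hail is equally likely in urban and rural areas but is far more likely to be reported in urban ones, and compares the rural share of true events with the rural share of reports.

```python
import random


def simulate_reports(n_events=10_000, seed=0):
    """Simulate hail events and the subset that end up being reported.

    Assumption (illustrative only): events are equally likely in urban
    and rural areas, but reporting probability depends on the area type.
    """
    rng = random.Random(seed)
    # Hypothetical reporting probabilities, not taken from any dataset.
    report_prob = {"urban": 0.8, "rural": 0.1}
    events = {"urban": 0, "rural": 0}
    reports = {"urban": 0, "rural": 0}
    for _ in range(n_events):
        area = rng.choice(["urban", "rural"])
        events[area] += 1
        # An event enters the dataset only if someone reports it.
        if rng.random() < report_prob[area]:
            reports[area] += 1
    return events, reports


def share(counts, key):
    """Fraction of the total count that falls in category `key`."""
    total = sum(counts.values())
    return counts[key] / total if total else 0.0


events, reports = simulate_reports()
print(f"rural share of true events: {share(events, 'rural'):.2f}")
print(f"rural share of reports:     {share(reports, 'rural'):.2f}")
```

Under these assumptions, roughly half of the simulated events are rural, yet rural reports make up only a small fraction of the reported dataset. An ML model trained on the reports alone would see the distorted distribution, not the true one.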
