| Datasets & Tools | Description |
| --- | --- |
| AIST++ | 3D keypoints with corresponding images for dance motions covering 10 dance genres |
| AutoFlow | 40k image pairs with ground truth optical flow |
| C4_200M | A 200 million sentence synthetic dataset for grammatical error correction |
| CIFAR-5M | Dataset of ~6 million synthetic CIFAR-10–like images (RGB, 32 x 32 pixels) |
| Crisscrossed Captions | Set of semantic similarity ratings for the MS-COCO dataset |
| Disfl-QA | Dataset of contextual disfluencies for information seeking |
| Distilled Datasets | Distilled datasets from CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST, and SVHN |
| EvolvingRL | 1000 top performing RL algorithms discovered through algorithm evolution |
| GoEmotions | A human-annotated dataset of 58k Reddit comments labeled with 27 emotion categories |
| H01 Dataset | 1.4 petabyte browsable reconstruction of the human cortex |
| Know Your Data | Tool for understanding biases in a dataset |
| Lens Flare | 5000 high-quality RGB images of typical lens flare |
| More Inclusive Annotations for People (MIAP) | Improved bounding box annotations for a subset of the person class in the Open Images dataset |
| Mostly Basic Python Problems | 1000 Python programming problems, incl. task description, code solution & test cases |
| NIH ChestX-ray14 dataset labels | Expert labels for a subset of the NIH ChestX-ray14 dataset |
| Open Buildings | Locations and footprints of 516 million buildings with coverage across most of Africa |
| Optical Polarization from Curie | 5GB of optical polarization data from the Curie submarine cable |
| Readability Scroll | Scroll interactions of ~600 participants reading texts from the OneStopEnglish corpus |
| RLDS | Tools to store, retrieve & manipulate episodic data for reinforcement learning |
| Room-Across-Room (RxR) | Multilingual dataset for vision-and-language navigation in English, Hindi and Telugu |
| Soft Attributes | ~6k sets of movie titles annotated with single English soft attributes |
| TimeDial | Dataset of multiple choice span-filling tasks for temporal commonsense reasoning in dialog |
| ToTTo | English table-to-text generation dataset with a controlled text generation task |
| Translated Wikipedia Biographies | Dataset for analysis of common gender errors in NMT for English, Spanish and German |
| UI Understanding Data for UIBert | Datasets for two UI understanding tasks, AppSim & RefExp |
| WikiFact | Wikipedia & WikiData–based dataset to train relationship classifiers and fact extraction models |
| WIT | Wikipedia-based Image Text dataset for multimodal multilingual ML |

Reference: https://ai.googleblog.com/2022/01/google-research-themes-from-2021-and.html