Datasets released by Google

2022-01-17 by kongxincai

Datasets & Tools	Description
AIST++	3D keypoints with corresponding images for dance motions covering 10 dance genres
AutoFlow	40k image pairs with ground truth optical flow
C4_200M	A 200 million sentence synthetic dataset for grammatical error correction
CIFAR-5M	Dataset of ~6 million synthetic CIFAR-10–like images (RGB, 32 × 32 pixels)
Crisscrossed Captions	Set of semantic similarity ratings for the MS-COCO dataset
Disfl-QA	Dataset of contextual disfluencies for information seeking
Distilled Datasets	Distilled datasets from CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST, and SVHN
EvolvingRL	1000 top-performing RL algorithms discovered through algorithm evolution
GoEmotions	A human-annotated dataset of 58k Reddit comments labeled with 27 emotion categories
H01 Dataset	1.4 petabyte browsable reconstruction of the human cortex
Know Your Data	Tool for understanding biases in a dataset
Lens Flare	5000 high-quality RGB images of typical lens flare
More Inclusive Annotations for People (MIAP)	Improved bounding box annotations for a subset of the person class in the Open Images dataset
Mostly Basic Python Problems	1000 Python programming problems, incl. task description, code solution & test cases
NIH ChestX-ray14 dataset labels	Expert labels for a subset of the NIH ChestX-ray14 dataset
Open Buildings	Locations and footprints of 516 million buildings with coverage across most of Africa
Optical Polarization from Curie	5 GB of optical polarization data from the Curie submarine cable
Readability Scroll	Scroll interactions of ~600 participants reading texts from the OneStopEnglish corpus
RLDS	Tools to store, retrieve & manipulate episodic data for reinforcement learning
Room-Across-Room (RxR)	Multilingual dataset for vision-and-language navigation in English, Hindi and Telugu
Soft Attributes	~6k sets of movie titles annotated with single English soft attributes
TimeDial	Dataset of multiple choice span-filling tasks for temporal commonsense reasoning in dialog
ToTTo	English table-to-text generation dataset with a controlled text generation task
Translated Wikipedia Biographies	Dataset for analysis of common gender errors in neural machine translation (NMT) for English, Spanish and German
UI Understanding Data for UIBert	Datasets for two UI understanding tasks, AppSim & RefExp
WikiFact	Wikipedia & WikiData–based dataset to train relationship classifiers and fact extraction models
WIT	Wikipedia-based Image Text dataset for multimodal multilingual ML

Reference: https://ai.googleblog.com/2022/01/google-research-themes-from-2021-and.html
