Multimodal learning with graphs

In our perspective, we observe that artificial intelligence for graphs (graph AI) has achieved remarkable success in modeling complex systems, ranging from dynamic networks in biology to interacting particle systems in physics. However, the increasingly heterogeneous graph datasets call for multimodal methods that can combine different inductive biases—the set of assumptions that algorithms use to make predictions for inputs they have not encountered during training. Learning on multimodal graph datasets presents fundamental challenges because the inductive biases can vary by data modality and graphs might not be explicitly given in the input.

To address these challenges, multimodal graph AI methods combine different modalities while leveraging cross-modal dependencies. Here, we survey 145 studies in graph AI and realize that diverse datasets are increasingly combined using graphs and fed into sophisticated multimodal methods, specified as image-intensive, knowledge-grounded and language-intensive models. Using this categorization, we introduce a blueprint for multimodal graph AI to study existing methods and guide the design of future methods.

align="center" Shown on the left are the different data modalities covered in our multimodal graph learning perspective. Shown on the right are key machine learning tasks for which multimodal graph learning (MGL) has been used successfully.

Below are studies from our perspective and the community on multimodal graph learning (MGL) and how they fall under the MGL blueprint. This website is meant as (1) a resource to those looking to use MGL for their application but unsure how each component is realized in practice and (2) a map of MGL as an emerging field.

This table is live meaning anyone can submit a study to be considered for this table and we will update the table every week. Entries are grouped by application area.

Use this link to submit a study to be added to this table.

Method	MGL Component 1: Identifying Entities	MGL Component 2: Uncovering Topology	MGL Component 3: Propagating Information	MGL Component 4: Mixing Representations	Application	Domain
FuNet	Hyperspectral pixels	Radial basis function similarity	miniGCN (GCN mini-batching)	Concatenation/sum/product	Hyperspectral image quad classification	Image
GraphFCN	Pixels	Edge weights based on Gaussian kernel matrix	GCN with weighted adjacency matrix	Graph loss added with fully connected network	Image semantic segmentation	Image
GSMN	Images/relations/attributes	Visual graph for images combined with textual graph	Node-level and graph-level matching	Similarity function for positive and negative pairs	Image-text matching	Image
RAG GAT	Super-pixels	Region adjacency graph	Graph attention network	Sum pooling combined with an MLP	Superpixel image classification	Image
TextGCN	Words/documents	Occurrence edges in text and corpus	GCN	No mixing/single-channel model	Text classification	Language
CoGAN	Sentences/aspects	Sentences and aspects as nodes	Cooperative graph attention	Softmax decoding block	Aspect sentiment classification	Language
MCN	Sentences/mentions/entities	Document-level graph	Relation-aware GCN	Sigmoid function for probability of entity pairs	Document-level relation extraction	Language
GPGNN	Word and position encodings	Generated adjacency matrix	Message passing	Pair-wise representation product	Relation extraction	Language
QMGNN	Atoms	Chemical bonds	Weisfeiler-Lehman Network and global attention	Concat with quantum mechanical descriptors	Regio-selectivity prediction	Natural science
GNS	Particles	Radial particle connectivity	General graph neural network	No mixing/single-channel model	Particle-based simulation	Natural science
MaSIF	Discretized protein surface mesh	Overlapping geodesic radial features	System of Gaussian kernels on a local geodesic system	No mixing/single-channel model	Ligand site prediction and classification	Natural science
MMGL	Patients	Modality aware latent graph	Adaptive graph learning based on a GCN	Sub-branch prediction neural network	Disease prediction	Natural science