Empirical Risk Minimization and Stochastic Gradient Descent for Relational Data

Speaker: Victor Veitch (Columbia University, USA)

Abstract:

This paper introduces a new method for learning from data with relational structure, such as a social network or linked documents. There are many natural ways to (sub)sample such data, which raises important practical questions: How should the dataset be randomly split into training and testing sets? How should minibatches be subsampled for stochastic gradient descent? The answer is implicitly determined by the choice of model, but making the right sampling scheme explicit is very difficult. We sidestep this problem by promoting the sampling scheme to an explicit component of model design. The high-level idea is to define a notion of empirical risk for relational data and to learn by minimizing it. The relational empirical risk we define is the average loss incurred on random subsamples selected according to the sampling scheme. We show that this minimization problem can be generically (approximately) solved by a simple stochastic gradient algorithm. The new approach significantly simplifies model design, and we give several illustrative examples applying the method to semi-supervised node classification.
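To make the training loop concrete, below is a minimal sketch of the idea in Python. It is not the paper's implementation: the toy graph, the vertex-subsampling scheme (`p_sample`), the logistic node classifier, and the edge smoothness penalty are all illustrative stand-ins. What it shows is the structure of the algorithm: each SGD minibatch is one draw from the sampling scheme, so the iterates target the average loss over subsamples, i.e. the relational empirical risk.

```python
# Sketch of relational ERM via SGD under illustrative assumptions:
# random vertex subsampling ("p-sampling") as the sampling scheme and
# a logistic node classifier. All names here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Toy relational data: n nodes with features, labels, and an edge list.
n, d = 100, 5
X = rng.normal(size=(n, d))                              # node features
w_true = rng.normal(size=d)
y = (X @ w_true + rng.normal(size=n) > 0).astype(float)  # node labels
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if rng.random() < 0.05]                          # sparse random graph

def p_sample(p):
    """Sampling scheme: keep each vertex independently with probability p
    and return the induced subgraph (kept nodes plus internal edges)."""
    keep = rng.random(n) < p
    nodes = np.flatnonzero(keep)
    sub_edges = [(i, j) for (i, j) in edges if keep[i] and keep[j]]
    return nodes, sub_edges

def subsample_grad(w, nodes, sub_edges):
    """Gradient of the loss on one subsample: logistic loss on sampled
    nodes plus a smoothness penalty on sampled edges (a stand-in for
    whatever relational term the model uses)."""
    z = X[nodes] @ w
    p_hat = 1.0 / (1.0 + np.exp(-z))
    grad = X[nodes].T @ (p_hat - y[nodes]) / max(len(nodes), 1)
    for i, j in sub_edges:          # encourage similar scores across edges
        diff = (X[i] - X[j]) @ w
        grad += 0.1 * diff * (X[i] - X[j]) / len(sub_edges)
    return grad

# SGD: each minibatch IS one draw from the sampling scheme, so the
# iterates minimize the average loss over subsamples (the relational
# empirical risk) rather than a loss defined on the whole graph.
w = np.zeros(d)
for step in range(500):
    nodes, sub_edges = p_sample(p=0.2)
    if len(nodes) == 0:
        continue
    w -= 0.5 * subsample_grad(w, nodes, sub_edges)

print(f"full-graph accuracy: {((X @ w > 0) == y).mean():.2f}")
```

Note that the sampling scheme enters only through `p_sample`; swapping in a different scheme (e.g., random-walk or edge sampling) changes the risk being minimized without changing the training loop.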