Highlights from ICML 2018
ICML (International Conference on Machine Learning) is one of the two top-impact conferences for Machine Learning – the other being NIPS (the Annual Conference on Neural Information Processing Systems). This year the 35th ICML conference took place in Stockholm. With thousands of academics and companies attending, this is the place to be if you work at the cutting edge of Machine Learning. As in many recent years, the event sold out months beforehand. However, the conference makes all of its papers freely available through the Proceedings of Machine Learning Research. Due to the scale of the conference, each published work gets a short oral presentation slot – either ten or twenty minutes – along with a poster at the end of the day. With ten presentations taking place simultaneously there is a lot going on, and no way to cover all of it. Here I present some of the highlights from my experience at ICML this year.
Fairness in Machine Learning
Fairness has become a big concern for Machine Learning (see figure 1 below) due to articles such as Machine Bias, which reported that Machine Learning software used to predict who would go on to commit crimes was biased against black people. This has sparked much debate and work. However, as yet there doesn’t seem to be a clear solution, apart from being aware of the problem and trying to ensure that bias in an existing system is not replicated in a Machine Learning system.
A tutorial on fairness was run in which the whole concept of fairness was discussed. This included definitions of the different types of discrimination: statistical discrimination, which is considered ethical and acceptable to practice (e.g. charging men more for car insurance), as opposed to taste-based discrimination, where the discriminator sacrifices utility (e.g. selecting male candidates over female ones irrespective of ability). Sources of bias, such as biased labels or biased features, were also discussed.
One of the best papers this year was also on the subject of fairness – “Delayed Impact of Fair Machine Learning”. In this work the authors look at how processes intended to provide fairness can, over time, have the opposite effect.
From the Turing test to Rosie
Two of the keynote speakers this year focused on how we can make computers, and eventually robots, that can think. Ever since we have had computers, people have sought to identify what intelligence is and how we can reproduce it in robots – in her presentation, one of the speakers asked how we could get to Rosie, the robot in the cartoon series The Jetsons. Alan Turing was one of the first to publish in this area, with his famous paper defining the Turing test.
Joyce Chai talked about the problems with teaching robots how to perform tasks such as making a smoothie. These included common background knowledge (we all know how to cut a strawberry), what items are (what is a knife), and what knowledge we can presume (a recipe for making a smoothie wouldn’t tell you to put the lid on the blender). We therefore need to train robots to know these things, but also give them the ability to ask questions as they are learning – a bit like a child. Joyce has had good results using Reinforcement Learning to learn when to ask questions.
Josh Tenenbaum argued that we don’t yet really have AI: systems don’t have common sense or the ability to make abstract models of the world. He argued that we should look at approaches closer to how children learn – one common school of thought is that children use a scientific approach of hypothesis and proof. He highlighted the need for probabilistic programming, along with better approaches for bringing the different types of machine learning together. He also argued that we need physics engines – the sort found in games – integrated into these systems to allow them to build models of the world and predict how it will change over time.
Generative Adversarial Networks (GANs) were big at ICML – having their own track. Yoon et al. tackled the problem of not having enough data by using GANs. They took other datasets similar to the one they wanted and trained a GAN to translate that data, extending the data they did have. In other work, Yoon et al. showed that, where some values are missing from the data, a mask indicating which items are real and which have been faked can be used to improve the accuracy of the faked data.
Ganin et al. got computers painting pictures (in painting programs), using Reinforcement Learning along with a GAN to produce art. Lucas et al. proposed a solution to the problem that most GANs will only reproduce one type of image. Pu et al. proposed a method for producing joint distributions using GANs.
Another hot topic in machine learning is susceptibility to adversarial attacks. Weng et al. argued that a definition of robustness is required in order to improve the resilience of models, and proposed such a metric. Karmon et al. demonstrated how an image can be misclassified by adding a small sub-picture to the main picture, and described what form this sub-picture should take.
Producing Better Networks
The problem of finding methods to produce better Neural Networks was widely covered. Here “better” can be quite ambiguous and depends on the application: it can mean a model with improved accuracy, faster computation or lower energy consumption. Huo et al. proposed a new approach to computing backpropagation in parallel. Pham et al. proposed an efficient way of identifying the Neural Network (NN) architecture that yields the highest accuracy, by creating a super-graph containing all the networks that you want to test. When you take a NN out of this super-graph you can reuse the trained weights from the super-graph, which reduces the time needed to train each network you wish to test. Bender et al. showed how a one-shot model could be used to identify the best network more efficiently. Jin et al. proposed sampling from a compact subset of parameters to identify more compact networks, while Cai et al. proposed a path-level approach which allows results from previously trained networks to be reused.
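The weight-sharing idea behind the super-graph can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the authors' implementation: the operation names, the `build_supergraph` and `extract_subnetwork` helpers, and the single-number "weights" are all hypothetical stand-ins for real layers and a real training loop.

```python
import random

# Hypothetical search space: every layer picks one of these candidate operations.
CANDIDATE_OPS = ["conv3x3", "conv5x5", "maxpool"]
NUM_LAYERS = 3

def build_supergraph(num_layers, ops, seed=0):
    """Create one shared weight entry per (layer, operation) pair."""
    rng = random.Random(seed)
    return {
        (layer, op): [rng.uniform(-1.0, 1.0)]  # a one-element list stands in for a weight tensor
        for layer in range(num_layers)
        for op in ops
    }

def extract_subnetwork(supergraph, architecture):
    """Return the child's weights as references to the shared entries, not copies."""
    return [supergraph[(layer, op)] for layer, op in enumerate(architecture)]

supergraph = build_supergraph(NUM_LAYERS, CANDIDATE_OPS)
child_a = extract_subnetwork(supergraph, ["conv3x3", "maxpool", "conv5x5"])
child_b = extract_subnetwork(supergraph, ["conv3x3", "conv5x5", "conv5x5"])

before = child_b[0][0]
child_a[0][0] += 0.1  # stand-in for a training step on child_a
shared = child_a[0] is child_b[0]  # both children use conv3x3 at layer 0
print(shared, child_b[0][0] - before)
```

Because both children pick the same operation at layer 0, the update made while training the first child is already present when the second child is evaluated, which is where the time saving comes from in this simplified picture.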
Max Welling in his keynote argued that we should be concerned about the intelligence per kWh. He demonstrated how a reparameterization trick can be used to drive most of the weights in a NN to zero, giving a much simpler network that consumes less energy. These ideas were also addressed by Dai et al., while Kalchbrenner et al. showed how reducing the computation and memory requirements of a network allowed text-to-speech to be done on a mobile phone.
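To illustrate why zero weights translate into less work, the following sketch compares a dense matrix-vector product with one that skips zero weights. It does not implement the reparameterization trick itself; the weight matrix is an assumed, already-sparse example, and the multiplication count is used here as a rough proxy for compute (and hence energy) cost.

```python
def dense_matvec(weights, x):
    """Multiply every weight, zero or not; returns (output, multiplication count)."""
    out, mults = [], 0
    for row in weights:
        total = 0.0
        for w, xi in zip(row, x):
            total += w * xi
            mults += 1
        out.append(total)
    return out, mults

def to_sparse(weights):
    """Keep only the non-zero weights of each row as (column, value) pairs."""
    return [[(j, w) for j, w in enumerate(row) if w != 0.0] for row in weights]

def sparse_matvec(sparse_rows, x):
    """Multiply only the stored non-zero weights; returns (output, multiplication count)."""
    out, mults = [], 0
    for row in sparse_rows:
        total = 0.0
        for j, w in row:
            total += w * x[j]
            mults += 1
        out.append(total)
    return out, mults

# Assumed example: a 3x4 layer in which most weights have been driven to zero.
weights = [
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, -1.2],
    [0.3, 0.0, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]

dense_out, dense_mults = dense_matvec(weights, x)
sparse_out, sparse_mults = sparse_matvec(to_sparse(weights), x)
print(dense_mults, sparse_mults)
```

Both versions produce the same output vector, but the sparse version performs one multiplication per non-zero weight rather than one per matrix entry.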
A related topic is the challenge of objectively comparing the performance of neural networks. Bajgar et al. discussed the issue of ‘sloppy’ reporting of results in papers. Many papers report the ‘best’ result obtained after training their network several times, which leads to over-selling of the work. They propose a metric which everyone can use when reporting results, to allow fair comparison between approaches.
While lots of fantastic research was presented at ICML 2018, it also became increasingly apparent that a mere demonstration of technical capability is not sufficient for developing and deploying a mature product. Machine learning still faces major challenges around fairness, security and cost, and the field still has a long way to go.
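A small simulation shows why reporting only the best run flatters a model. This is a sketch under assumed numbers (a hypothetical model whose test accuracy varies from run to run around 0.80 with a standard deviation of 0.02); it is not the metric proposed by Bajgar et al., only an illustration of the selection effect they describe.

```python
import random
import statistics

def simulate_runs(n_runs, mean_acc=0.80, std_acc=0.02, seed=0):
    """Draw hypothetical test accuracies for n_runs independent training runs."""
    rng = random.Random(seed)
    return [rng.gauss(mean_acc, std_acc) for _ in range(n_runs)]

def summarize(accuracies):
    """Return the figures a paper might report for the same set of runs."""
    return {
        "best": max(accuracies),
        "mean": statistics.mean(accuracies),
        "stdev": statistics.stdev(accuracies),
    }

runs = simulate_runs(n_runs=10)
summary = summarize(runs)
print(f"best of {len(runs)} runs: {summary['best']:.3f}")
print(f"mean +/- stdev: {summary['mean']:.3f} +/- {summary['stdev']:.3f}")
```

The best run can never be below the mean, and the gap tends to widen as more runs are tried, so two papers that train a different number of times are not comparable on "best" alone. Reporting the mean and spread, or stating how many runs the best was picked from, avoids this.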
 Moritz Hardt, UC Berkeley, Fall 2017
 Material available at: https://policylab.stanford.edu/projects/defining-and-designing-fair-algorithms.html
 Although this is now illegal under EU law, the speaker was from the USA, where it is not.
 Keynote: Language to Action: towards Interactive Task Learning with Physical Agents
 Keynote: Building Machines that Learn and Think Like People