Lexical Normalization for Neural Network Parsing

While parser performance on news text keeps getting closer to human performance, parsers still perform drastically worse out-of-domain. Multiple approaches to domain adaptation have been explored; up-training, weighting of training data, and integrating disfluency detection are common ones. In this work we focus on another approach: normalization. More concretely, we examine different ways of integrating normalization into a neural network dependency parser. We focus on Twitter data; for evaluation purposes we annotated a small treebank in the Universal Dependencies format.

Our baseline parser is the UUParser (de Lhoneux et al., 2017), an arc-hybrid BiLSTM parser. This parser exploits character embeddings and includes an option to initialize with pre-trained word embeddings. We use word2vec to train word embeddings on a large Twitter corpus. Both the character embeddings and the pre-trained embeddings increase the performance of the parser. In a domain adaptation setup, where we train on the Wall Street Journal and test on Twitter, the improvement is even larger. This is probably mainly an effect of mitigating the unknown-word problem, which is also addressed by a normalization-based approach. This leads to our main research question: can normalization increase performance beyond what character-level models and pre-trained word embeddings already provide?

We use an existing normalization model, which operates on the word level. It generates candidates using the Aspell spell checker, word embeddings, and a lookup list. Features are taken directly from the generation step and supplemented with n-gram probabilities. A random forest classifier is then trained to predict the probability that a candidate belongs to the 'correct' class; this enables the system to output a top-n list of candidates together with their probabilities.

When using normalization as a straightforward pre-processing component, we observe a small increase in LAS. However, the normalization component makes mistakes, and these propagate directly to the parser. Moreover, even with access to a perfect normalization sequence, it might still be informative to take the original token into account during parsing. To fully exploit the potential of the normalization model, we combine the vectors of the top-n normalization candidates into one vector: we weight the vector of each candidate by the probability assigned by the normalization model and then sum the vectors of all candidates. This approach can also be generalized to other neural network parsers, and even to other tasks.

In our initial experiments, normalization improved parser performance even when character embeddings and pre-trained word embeddings were used. Integrating multiple normalization candidates improved performance further, indicating that the top-n candidates are also informative. A more in-depth evaluation will be included in the presentation.
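To make the embedding pre-training step concrete, the following is a minimal sketch of training word2vec embeddings on a tokenized Twitter corpus, assuming gensim (version 4 or later); the corpus path and hyperparameters are illustrative and not those used in our experiments.

```python
# Minimal sketch: training Twitter word embeddings with gensim's word2vec.
# Corpus path and hyperparameters are illustrative placeholders.
from gensim.models import Word2Vec

class TweetCorpus:
    """Streams one pre-tokenized tweet per line from a plain-text file."""
    def __init__(self, path):
        self.path = path
    def __iter__(self):
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n").split()

model = Word2Vec(sentences=TweetCorpus("tweets.tokenized.txt"),
                 vector_size=100, window=5, min_count=5, workers=4)
# Save in the textual word2vec format so the parser can load it at initialization.
model.wv.save_word2vec_format("twitter.vec")
```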
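The candidate-ranking step of the normalization model can be illustrated as follows. This is a sketch rather than the model's actual implementation: it assumes scikit-learn's RandomForestClassifier, and the feature vectors and candidate lists are hypothetical stand-ins for the features taken from the generation step and the n-gram probabilities.

```python
# Sketch of ranking normalization candidates with a random forest
# (scikit-learn assumed; training data and features are hypothetical).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data: one feature vector per (token, candidate) pair,
# label 1 if the candidate is the correct normalization, 0 otherwise.
X_train = np.random.rand(1000, 8)
y_train = np.array([0, 1] * 500)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

def top_n_candidates(candidates, features, n=3):
    """Rank candidates by P('correct') and return the n best with their scores."""
    probs = clf.predict_proba(features)[:, 1]  # probability of the 'correct' class
    ranked = sorted(zip(candidates, probs), key=lambda x: x[1], reverse=True)
    return ranked[:n]

# Example: candidates for the noisy token "u", with hypothetical features.
cands = ["you", "u", "university"]
feats = np.random.rand(len(cands), 8)
print(top_n_candidates(cands, feats))
```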
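Finally, the weighted combination of the top-n candidate vectors can be sketched as below. The embedding lookup and candidate list here are our own illustration, not the parser's internals; in the actual system the resulting vector presumably takes the place of the ordinary word-embedding lookup in the parser's input representation.

```python
# Sketch of collapsing the top-n normalization candidates into one vector:
# each candidate's embedding is weighted by the normalizer's probability and summed.
import numpy as np

def combined_embedding(candidates, embed, dim=100):
    """candidates: list of (word, probability) pairs from the normalizer.
    embed: mapping from word to embedding vector.
    Returns the probability-weighted sum of the candidate embeddings."""
    vec = np.zeros(dim)
    for word, prob in candidates:
        vec += prob * embed.get(word, np.zeros(dim))  # unknown words map to a zero vector
    return vec

# Toy example with random embeddings for three candidates of the token "u".
rng = np.random.default_rng(0)
embeddings = {w: rng.standard_normal(100) for w in ["you", "u", "university"]}
top_n = [("you", 0.80), ("u", 0.15), ("university", 0.05)]
token_vector = combined_embedding(top_n, embeddings)
print(token_vector.shape)  # (100,)
```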