25 Works

Documents, Destiny and Returning to Pakistan

Usman Mahar
In this article, I inquire how rejected asylum seekers and returning migrants make sense of their (im)mobility. More specifically, I focus on the idea of taqdeer (destiny) to explore how irregular Pakistani migrants experience and interpret restrictive mobility regimes. I outline the complex and equivocal attitudes of these migrants or more appropriately mobile bodies upon the rejection of their asylum –– the only means of acquiring kagaz (papers) available to them in Germany. By contextualising...

Automatic Domain Adaptation Outperforms Manual Domain Adaptation for Predicting Financial Outcomes

Marina Sedinkina, Nikolas Breitkopf & Hinrich Schütze
In this paper, we automatically create senti- ment dictionaries for predicting financial out- comes. We compare three approaches: (i) manual adaptation of the domain-general dic- tionary H4N, (ii) automatic adaptation of H4N and (iii) a combination consisting of first man- ual, then automatic adaptation. In our experi- ments, we demonstrate that the automatically adapted sentiment dictionary outperforms the previous state of the art in predicting the finan- cial outcomes excess return and volatility. In particular,...

Sentence Meta-Embeddings for Unsupervised Semantic Textual Similarity

Nina Poerner, Ulli Waltinger & Hinrich Schütze
We address the task of unsupervised Seman- tic Textual Similarity (STS) by ensembling di- verse pre-trained sentence encoders into sen- tence meta-embeddings. We apply, extend and evaluate different meta-embedding meth- ods from the word embedding literature at the sentence level, including dimensionality re- duction (Yin and Schu ̈tze, 2016), generalized Canonical Correlation Analysis (Rastogi et al., 2015) and cross-view auto-encoders (Bolle- gala and Bao, 2018). Our sentence meta- embeddings set a new unsupervised State of...

Language in mind and brain

Christina Sanchez-Stockhammer (Ed.), Franziska Günther (Ed.) & Hans-Jörg Schmid (Ed.)
The question of how human language works is investigated by neuroscientists, psycholinguists and linguists, but there are important differences in their approaches. The aim of the workshop “Language in Mind and Brain” was to bridge the gap between the different research traditions and to explore and expand their common ground in order to contribute to an interdisciplinary, more integrated investigation of human language. Researchers from the fields of neuroscience, cognitive science and linguistics discussed the...

15th Annual Meeting of the European Association of Vertebrate Palaeontologists, Munich, Germany

Proceedings of the 20th International Conference on Multimedia in Physics Teaching and Learning

Predicting the Growth of Morphological Families from Social and Linguistic Factors

Valentin Hofmann, Janet Pierrehumbert & Hinrich Schütze
We present the first study that examines the evolution of morphological families, i.e., sets of morphologically related words such as “trump”, “antitrumpism”, and “detrumpify”, in social media. We introduce the novel task of Morphological Family Expansion Prediction (MFEP) as predicting the increase in the size of a morphological family. We create a ten-year Reddit corpus as a benchmark for MFEP and evaluate a number of baselines on this benchmark. Our experiments demonstrate very good...

Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings

Yadollah Yaghoobzadeh, Katharina Kann, Timothy Hazen, Eneko Agirre & Hinrich Schütze
Word embeddings typically represent different meanings of a word in a single conflated vector. Empirical analysis of embeddings of ambiguous words is currently limited by the small size of manually annotated resources and by the fact that word senses are treated as unrelated individual concepts. We present a large dataset based on manual Wikipedia annotations and word senses, where word senses from different words are related by semantic classes. This is the basis...

A Multilingual BPE Embedding Space for Universal Sentiment Lexicon Induction

Mengjie Zhao & Hinrich Schütze
We present a new method for sentiment lexicon induction that is designed to be applicable to the entire range of typological diversity of the world’s languages. We evaluate our method on Parallel Bible Corpus+ (PBC+), a parallel corpus of 1593 languages. The key idea is to use Byte Pair Encodings (BPEs) as basic units for multilingual embeddings. Through zero-shot transfer from English sentiment, we learn a seed lexicon for each...

Understanding “Voluntary” Returns to Pakistan

Usman Mahar

A Graph Auto-encoder Model of Derivational Morphology

Valentin Hofmann, Hinrich Schütze & Janet Pierrehumbert
There has been little work on modeling the morphological well-formedness (MWF) of derivatives, a problem judged to be complex and difficult in linguistics (Bauer, 2019). We present a graph auto-encoder that learns embeddings capturing information about the compatibility of affixes and stems in derivation. The auto-encoder models MWF in English surprisingly well by combining syntactic and semantic information with associative information from the mental lexicon.

Analytical Methods for Interpretable Ultradense Word Embeddings

Philipp Dufter & Hinrich Schütze
Word embeddings are useful for a wide variety of tasks, but they lack interpretability. By rotating word spaces, interpretable dimensions can be identified while preserving the information contained in the embeddings without any loss. In this work, we investigate three methods for making word spaces interpretable by rotation: Densifier (Rothe et al., 2016), linear SVMs and DensRay, a new method we propose. In contrast to Densifier, DensRay can be computed in...

Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly

Nora Kassner & Hinrich Schütze
Building on Petroni et al. (2019), we propose two new probing tasks analyzing factual knowledge stored in Pretrained Language Models (PLMs). (1) Negation. We find that PLMs do not distinguish between negated (“Birds cannot [MASK]”) and non-negated (“Birds can [MASK]”) cloze questions. (2) Mispriming. Inspired by priming methods in human psychology, we add “misprimes” to cloze questions (“Talk? Birds can [MASK]”). We find that PLMs are easily distracted by misprimes. These results suggest...

Kulturen des Wirtschaftens

Program and Book of Abstracts

13th International Conference on Cochlear Implants and Other Implantable Auditory Technologies. Munich, Germany, June 18-21, 2014

17th Annual Meeting of the Gesellschaft für Biologische Systematik

Deep Metazoan Phylogeny 2011 - new data, new challenges

Selected Papers from the 20th International Conference on Multimedia in Physics Teaching and Learning

Tagungsband der 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum

Tagungsband der 53. Studentischen Tagung Sprachwissenschaft (StuTS), 9. - 12. Mai 2013, LMU

This proceedings volume collects ten talks given at the 53rd Studentische Tagung Sprachwissenschaft (StuTS), held 9-12 May 2013 at LMU Munich. The authors are linguistics students from various universities in Germany and elsewhere in Europe. Their papers address topics the students have worked on in seminars and in their theses. The papers were peer-reviewed by other participants of the conference.

2. Forum für literaturwissenschaftliche Japanforschung

BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance

Timo Schick & Hinrich Schütze
Pretraining deep language models has led to large performance gains in NLP. Despite this success, Schick and Schütze (2020) recently showed that these models struggle to understand rare words. For static word embeddings, this problem has been addressed by separately learning representations for rare words. In this work, we transfer this idea to pretrained language models: We introduce BERTRAM, a powerful architecture based on BERT that is capable of inferring high-quality embeddings for...

Embedding Learning Through Multilingual Concept Induction

Philipp Dufter, Mengjie Zhao, Martin Schmitt, Alexander Fraser & Hinrich Schütze
We present a new method for estimating vector space representations of words: embedding learning by concept induction. We test this method on a highly parallel corpus and learn semantic representations of words in 1259 different languages in a single common space. An extensive experimental evaluation on crosslingual word similarity and sentiment analysis indicates that concept-based multilingual embedding learning performs better than previous approaches.
