Master 2 internship (MLIA, ISIR, Sorbonne Université)
Context
Ad-hoc Information Retrieval (IR) is at the core of Information Access and deals with the problem of retrieving the documents that best satisfy a user's information need (i.e. the basic component of a search engine). With the introduction of Transformers, and specifically BERT [11], models have shifted from lexical matching to semantic and contextualized matching, leading to substantial improvements in effectiveness. Since 2019, several architectures have been proposed to address the effectiveness/efficiency trade-off, including dense [5] and sparse [6] approaches that rely on an index, as well as cross-interaction models like monoBERT [7] that are more effective but limited to document re-ranking. Recently, a new type of model has appeared that views IR as a generative process, unlocking the possibility of better adaptation and generalization [8].
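To make the effectiveness/efficiency trade-off concrete, here is a minimal sketch contrasting index-friendly bi-encoder scoring with cross-interaction re-ranking. It assumes PyTorch and HuggingFace Transformers, with bert-base-uncased as a placeholder checkpoint rather than any of the cited systems; the models are not fine-tuned for ranking, so the scores are only illustrative.

```python
# Minimal sketch (not the cited systems): index-friendly bi-encoder scoring vs
# monoBERT-style cross-interaction re-ranking, using HuggingFace Transformers.
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")            # bi-encoder backbone
reranker = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1)                              # cross-encoder with a (randomly initialized) scoring head

def embed(texts):
    """CLS-pooled dense representations; documents can be encoded offline into an index."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch).last_hidden_state[:, 0]              # [CLS] pooling (one common choice)
    return torch.nn.functional.normalize(out, dim=-1)

query = "what is dense retrieval?"
docs = ["Dense retrieval uses embeddings.", "Cooking pasta at home."]

# Bi-encoder: query and documents encoded independently, scored by dot product (index-compatible).
dense_scores = embed([query]) @ embed(docs).T

# Cross-encoder: query and document encoded jointly, more effective but one forward
# pass per (query, document) pair, hence only usable for re-ranking a candidate list.
pairs = tok([query] * len(docs), docs, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    rerank_scores = reranker(**pairs).logits.squeeze(-1)

print(dense_scores, rerank_scores)
```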
Domain adaptation and Zero-Shot Learning are important problems for any application of machine learning, and hence for Information Retrieval: they impact all types of neural models, although dense and sparse approaches in IR are the most affected. This limitation is concerning for two reasons. First and foremost, training data might not be available in sufficient quantities in the target domain (e.g. bio-medical documents, or documents in French). Most works in IR evaluate systems trained on another dataset to assess their generalization capabilities, but designing models able to generalize is an open issue that only a few works have tried to address, for instance with the parameter-efficient adapters of Houlsby et al. [4]. Second, datasets evolve over time in IR [1], which relates to the more general question of continual learning [3]. In IR, robust models able to cope with new documents without a decrease in performance do not yet exist [2].
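As an illustration of the parameter-efficient direction mentioned above, the following is a minimal sketch of a bottleneck adapter in the spirit of Houlsby et al. [4]: a small trainable module inserted into an otherwise frozen Transformer, so that only a few parameters are updated when adapting to a new domain or task. The hidden size, bottleneck size, and insertion point are illustrative assumptions, not the original recipe.

```python
# Sketch of a bottleneck adapter (Houlsby et al. [4]-style); sizes are illustrative.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)   # project down to a small bottleneck
        self.up = nn.Linear(bottleneck, hidden_size)     # project back up to the model dimension
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the frozen backbone's behaviour is the starting point,
        # and the adapter only learns a small domain-specific correction.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Illustrative usage: apply the adapter to (batch, sequence, hidden) activations.
# In practice the backbone is frozen and adapters are inserted inside each layer;
# where exactly they are inserted is a design choice.
hidden = torch.randn(2, 16, 768)
adapter = BottleneckAdapter(hidden_size=768)
print(adapter(hidden).shape)   # torch.Size([2, 16, 768])
```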
In the context of the ANR GUIDANCE project, which deals with Dialogue-Based Information Access, the problem of domain adaptation is even more concerning as the models are expected to be more versatile in their outputs.
Internship
The goal of the internship is to gather insights and/or to develop new models to cope with domain adaptation. One of the following two research directions can be followed:
- building new methods to understand where task and domain knowledge are located in Information Retrieval models, or in models for other NLP tasks. A very recent and related research area is to understand where knowledge is located in Transformers, in particular by focusing on the role of feed-forward networks [9] (see the sketch after this list). This is a new direction of research for LLMs and might have an impact on domain adaptation, continual learning, and explainability, since locating knowledge might make it possible to manipulate it;
- designing prospective Transformer architectures that aim at building models which (somehow) separate knowledge from the task, a direction already explored in NLP [10] but which remains under-explored.
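As a starting point for the first direction, here is a minimal sketch, simplified with respect to Dai et al. [9], that records the intermediate feed-forward activations of a pre-trained BERT with forward hooks. The real method relies on gradient-based attribution to identify "knowledge neurons"; this sketch only inspects raw activations, and the checkpoint and example sentence are illustrative assumptions.

```python
# Simplified sketch: hook the FFN intermediate sub-layers of BERT to inspect
# which units fire for a given input (a first step towards locating knowledge;
# Dai et al. [9] use gradient-based attribution rather than raw activations).
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

activations = {}

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # output: (batch, seq_len, intermediate_size) activations of the FFN expansion.
        activations[layer_idx] = output.detach()
    return hook

# Register a hook on every FFN intermediate sub-layer of the encoder.
handles = [layer.intermediate.register_forward_hook(make_hook(i))
           for i, layer in enumerate(model.encoder.layer)]

batch = tok("Paris is the capital of France.", return_tensors="pt")
with torch.no_grad():
    model(**batch)

for h in handles:
    h.remove()

# Example analysis: the most strongly activated FFN units in the last layer.
last = activations[len(model.encoder.layer) - 1].abs().mean(dim=(0, 1))
print(last.topk(5).indices)   # indices of candidate units (illustrative only)
```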
Other information
This internship might be followed by a PhD thesis (funded by the ANR GUIDANCE project). The internship will be located at the ISIR lab at Sorbonne Université and supervised by Josiane Mothe (IRIT, Toulouse) and Benjamin Piwowarski (ISIR).
An ideal candidate should:
- Master a deep learning framework (e.g. PyTorch, TensorFlow) and Python;
- Have a good background in applied mathematics (probability theory and linear algebra);
- Ideally, have some knowledge of the Natural Language Processing and Information Retrieval research fields.
Please contact benjamin.piwowarski@cnrs.fr and mothe@irit.fr if you are interested.
References
1. Lovón-Melgarejo, J., Soulier, L., Pinel-Sauvagnat, K., & Tamine, L. (2021). Studying catastrophic forgetting in neural ranking models. In Proceedings of ECIR 2021 (pp. 375–390). Springer. https://doi.org/10.1007/978-3-030-72113-8_25
2. Gerald, T., & Soulier, L. (2022). Continual Learning of Long Topic Sequences in Neural Information Retrieval. In Proceedings of ECIR 2022 (pp. 244–259). Springer. https://doi.org/10.1007/978-3-030-99736-6_17
3. Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., Hassabis, D., Clopath, C., Kumaran, D., & Hadsell, R. (2016). Overcoming catastrophic forgetting in neural networks. arXiv:1612.00796 [cs, stat].
4. Houlsby, N., Giurgiu, A., Jastrzebski, S., & Morrone, B. (2019). Parameter-Efficient Transfer Learning for NLP.
5. Xiong, L., Xiong, C., Li, Y., Tang, K.-F., Liu, J., Bennett, P., Ahmed, J., & Overwijk, A. (2020). Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. arXiv:2007.00808 [cs]. http://arxiv.org/abs/2007.00808
6. Formal, T., Lassance, C., Piwowarski, B., & Clinchant, S. (2022). From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2353–2359). https://doi.org/10.1145/3477495.3531857
7. Lin, J. (2019). The Neural Hype, Justified! A Recantation. ACM SIGIR Forum, 53(2), 6.
8. Tay, Y., Tran, V. Q., Dehghani, M., Ni, J., Bahri, D., Mehta, H., Qin, Z., Hui, K., Zhao, Z., Gupta, J., Schuster, T., Cohen, W. W., & Metzler, D. (2022). Transformer Memory as a Differentiable Search Index. arXiv:2202.06991 [cs]. http://arxiv.org/abs/2202.06991
9. Dai, D., Dong, L., Hao, Y., Sui, Z., Chang, B., & Wei, F. (2022). Knowledge Neurons in Pretrained Transformers. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 8493–8502). https://doi.org/10.18653/v1/2022.acl-long.581
10. Felhi, G., Roux, J. L., & Seddah, D. (2022). Exploiting Inductive Bias in Transformers for Unsupervised Disentanglement of Syntax and Semantics with VAEs. http://arxiv.org/abs/2205.05943
11. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs]. http://arxiv.org/abs/1810.04805