cv
Basics
| Name | Abdelrahman Abdallah |
| Label | Ph.D. Candidate · Machine Learning & NLP |
| Email | abdelrahman.abdallah@uibk.ac.at |
| Phone | +20 11 1137 1734 |
| Url | https://abdoelsayed2016.github.io/ |
| Summary | Machine learning engineer and final-year Ph.D. candidate with experience in natural language processing, information retrieval, and computer vision. Passionate about building LLM-based systems, OCR and QA pipelines, and tools that support both research and real-world applications. |
Work
-
2022.10 - Present Research Assistant
Digital Science Center (DiSC), University of Innsbruck
Research assistant working on NLP and information retrieval within the Digital Science Center.
- Knowledge extraction and information retrieval from unstructured text documents.
- Methods for natural language processing and information retrieval.
- Application of text mining methods to the field of digital history.
-
2022.01 - 2022.10 Machine Learning Researcher
Università Ca' Foscari
Researcher working on ML models for climate change and risk assessment.
- Comparative analysis of graph neural networks and random forests for climate-related tasks.
- Contributed to a review paper on ML/AI models for risk assessments.
- Surveyed graph neural networks for spatio-temporal data.
-
2021.08 - 2022.06 Machine Learning Engineer
KMG Engineering
Machine learning engineer focusing on vision and NLP applications.
- Developed GAN-based models for image inpainting.
- Built English grammar correction models using deep learning.
- Worked on curve detection and tracking tasks.
-
2021.08 - 2025.05 Machine Learning Engineer (part-time)
DISCO App
Machine learning engineer working on OCR and NLP for digital receipts.
- Worked on receipt extraction, OCR systems, and NLP components.
- Built and improved OCR for receipts, increasing accuracy for downstream information extraction and classification.
- Contributed to an application deployed on the Google Play Store.
-
2019.11 - 2021.06 Machine Learning Researcher
National Open Research Laboratory for Information and Space Technologies, Satbayev University
Researcher focusing on handwriting recognition, table detection, and document analysis.
- Built handwritten Kazakh and Russian databases for handwriting recognition research.
- Reviewed recent approaches to handwriting recognition for Cyrillic characters.
- Created a table detection dataset and developed models for table detection and classification.
-
2019.06 - 2019.11 Software Developer
CCC at Limkokwing University
Software developer working on web applications with database backends.
- Developed and implemented a scanning component using PHP and MySQL.
- Designed databases and table structures for web applications built on an n-tier architecture.
- Built web applications using the ExpressionEngine PHP framework.
-
2016.07 - 2019.06 Research and Teaching Assistant
Assiut University, Faculty of Computers and Information
Research and teaching assistant in Computer Science.
- Taught classes, supervised laboratory sessions, and graded assignments and projects.
- Represented teams in meetings with executives to discuss project goals and milestones.
- Kept up with emerging technologies and applied them to teaching and research projects.
-
2016.01 - 2017.08 Web Developer
FastKood Company
Web developer building data-driven websites and internal tools.
- Converted UI mockups into working pages using HTML, JavaScript, AJAX, and JSON.
- Worked with UNIX and Apache servers.
- Developed data architecture designs for targeted customer analysis.
- Created workflow charts and diagrams to support production teams and meet client deadlines.
-
2015.06 - 2016.01 Software Developer
Overcoffeesolutions
Software developer focusing on object-oriented applications.
- Developed object-oriented software and intuitive graphical user interfaces.
- Implemented scanning components using MySQL and solid database design.
- Built multiple web applications following n-tier architecture.
Volunteer
-
2023.01 - Present Remote
Reviewer
Conference and Journal Reviewer
Reviewer for major conferences and journals in NLP, IR, and computer vision.
- Conference reviewer: ACL, SIGIR, COLING, EMNLP, LREC-COLING, WACV.
- Journal reviewer: Pattern Recognition Letters, IET NBT, Heliyon, IET Signal Processing.
Education
-
2022.10 - Present Innsbruck, Austria
-
2019.09 - 2021.06 Almaty, Kazakhstan
MSc
Satbayev University, Faculty of Information and Telecommunication Technologies
Data Science and Machine Learning
-
2016.09 - 2017.06 Assiut, Egypt
-
2011.09 - 2015.06 Assiut, Egypt
Awards
- 2019.09.01
Scholarship for Master's Studies
Satbayev University
Scholarship to study for a Master's degree in Data Science and Machine Learning at Satbayev University.
Certificates
| NAACL 2025 Certificate | NAACL | 2025-01-01 |
Publications
-
2025 Evaluating Temporal Robustness of Large Language Models
ACL Findings 2025
Evaluation of how LLM performance changes under temporal shifts in data.
-
2025 TempDPR: Temporal Dense Passage Retrieval for Explicit Temporal Questions
Preprint
Temporal dense passage retrieval model for time-sensitive questions.
-
2025 From Retrieval to Generation: Evaluating the Best Approach
Preprint
Study comparing retrieval-based and generation-based approaches for QA and related tasks.
-
2025 CascadePLS-ViT: Cascade With Patch-Level Self-Supervised Vision Transformers for Breast Cancer Classification in Mammography
ISBI 2025
Patch-level self-supervised ViT cascade for mammographic breast cancer classification.
-
2025 ASRank: Zero-Shot Re-Ranking with Answer Scent for Document Retrieval
NAACL 2025
Zero-shot LLM-based reranking approach that uses answer scent signals.
-
2025 DynRank: Improving Passage Retrieval with Dynamic Zero-Shot Prompting Based on Question Classification
COLING 2025
Dynamic prompting strategy for better zero-shot passage retrieval using LLMs.
-
2025 Wrong Answers Can Also Be Useful: PlausibleQA - A Large-Scale QA Dataset with Answer Plausibility Scores
SIGIR 2025
QA dataset with plausibility scores for both correct and incorrect answers.
-
2025 MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts
CIKM 2025
Dataset designed to test robustness of QA models on noisy multilingual OCR outputs.
-
2025 RankArena: A Unified Platform for Evaluating Retrieval, Reranking and RAG
CIKM 2025
Unified platform for benchmarking retrieval, reranking, and RAG pipelines with human and LLM feedback.
-
2025 ComplexTempQA: A Large-Scale Dataset for Complex Temporal Question Answering
EMNLP 2025
Large-scale dataset for temporally complex question answering over dynamic collections.
-
2025 How Good are LLM-based Rerankers? An Empirical Analysis of State-of-the-Art Reranking Models
EMNLP Findings 2025
Empirical analysis of modern LLM-based rerankers across multiple datasets and settings.
-
2025 DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation
EMNLP Findings 2025
Dual-stage LLM-based document reranking framework using reasoning agents and distillation.
-
2024 ArabicaQA: A Comprehensive Dataset for Arabic Question Answering
SIGIR 2024
Large-scale dataset for Arabic open-domain question answering.
-
2024 IHRRB-DINO: Identifying High-Risk Regions of Breast Masses in Mammogram Images Using Data-Driven Instance Noise (DINO)
MICCAI 2024
DINO-based framework to highlight high-risk regions in mammogram images.
-
2024 Detecting Temporal Ambiguity in Questions
EMNLP Findings 2024
Detection of temporally ambiguous questions in open-domain QA datasets.
-
2024 HiGenQA: Exploring Hint Generation Approaches for Open Domain Question Answering
EMNLP Findings 2024
Study of automatic hint generation strategies for improving open-domain QA.
-
2024 A Survey of Recent Approaches to Form Understanding in Scanned Documents
Artificial Intelligence Review
Survey of transformers and language models for form understanding and scanned document analysis.
-
2023 Exploring the State of the Art in Legal QA Systems
Journal of Big Data
Comprehensive review of legal question answering systems and datasets.
Skills
| Machine Learning & Data Science | Machine Learning, Deep Learning, Natural Language Processing, Information Retrieval, Open-Domain Question Answering, Large Language Models |
| Computer Vision | Handwritten Text Recognition, OCR, Object Detection, Generative Adversarial Networks, Image Retrieval, Image Processing, Image Segmentation |
| Programming Languages | Python, PHP, Java |
| ML & DL Frameworks | PyTorch, TensorFlow, Keras, scikit-learn |
| Web Development | Laravel, HTML, CSS, JavaScript, jQuery |
| Tools | PyCharm, Anaconda, Jupyter Notebook, Git, Linux |
Languages
| Arabic | Native |
| English | Fluent (Duolingo English Test: 140) |
Interests
| Natural Language Processing |
| Large Language Models |
| Information Retrieval |
| Open-Domain Question Answering |
| Keyword Information Extraction |
| Text Generation |
| Computer Vision |
| Handwriting Recognition |
| OCR |
| Medical Imaging |
| Image Retrieval |
| Segmentation |
References
| References available upon request | Academic and professional references can be provided upon request. |
Projects
- 2024.01 - Present
Rankify
Creator and maintainer of Rankify, a Python toolkit for retrieval, reranking, and RAG evaluation.
- Comprehensive Python package for information retrieval and reranking evaluation.
- Supports evaluation of retrieval, reranking, and RAG systems with automated metrics and human feedback.
- Integrated with the RankArena platform and has 500+ GitHub stars.
- 2024.01 - Present
RankArena
Lead developer of RankArena, a unified web platform for evaluating retrieval, reranking, and RAG systems.
- Provides standardized protocols for benchmarking IR models against strong baselines.
- Supports both human and LLM-based feedback for evaluation.
- Accepted at CIKM 2025.