Alejandro Molina Villegas

PhD Computer Sciences

Temporary Lecturer and Researcher, University of Avignon, France

I am an enthusiastic researcher in Data Mining and Statistical Methods applied to Natural Language Processing and linguistic engineering. My fields of expertise include Text Simplification, Discourse Analysis, Summarization, Web Mining, Citizen Science and Turing-like evaluation. I am currently working on a Sentiment Analysis project (with Xerox as a partner), and I am applying for funding from the National Institutes of Health to work on a project about NLP methods applied to Genomics.


2013 Ph.D. Natural Language Processing (exam in September). Université d'Avignon et des Pays de Vaucluse (UAPV). Dissertation: A study on sentence compression for the automatic summarization. Avignon, France.

2009 M.S. Computer Science. Universidad Nacional Autónoma de México (UNAM). Dissertation: Semantic clustering of definitional contexts. Mexico City.

2006 Bachelor in Computer Science. Universidad Autónoma Metropolitana -- Iztapalapa (UAM-I). Dissertation: NetTalk in Spanish: Learning to read through artificial neural networks. Mexico City.

2012-2013 Attaché temporaire d'enseignement et de recherche (teaching and research assistant). Centre d'Enseignement et de Recherche en Informatique -- UAPV.

2009 Lecturer. B.S. Engineering Faculty -- UNAM.

2009 Web development trainer. 3CT Training Center and Technology Consulting.

2005-2007 Lecturer. High school. CETIS-37.

2005 e-mail systems manager. UAM-I.

2002–2004 Web services manager. Aleph Media.

Research Projects

2012-2016 Automatic Detection and Measurement of Textual Similarity. Universidad Nacional Autónoma de México (Mexico), University Pompeu Fabra (Spain), Laboratoire Informatique d'Avignon (LIA) (France).

2012-2015 Image on the Web: Analysis of the image life cycle through the Web 2.0. XEROX, AMI Software, Université de Lyon, Électricité de France, Laboratoire Informatique d'Avignon, Centre d'Études Politiques de l'Europe Latine (France).

2008-2011 Lexical relations extraction for restricted domains from definitional contexts in Spanish. Universidad Nacional Autónoma de México (Mexico).

2008-2009 Language development in Mexican children. Universidad Nacional Autónoma de México (Mexico).

2008-2009 Analysis of definitions in Spanish for automatic lexical extraction. Universidad Nacional Autónoma de México (Mexico).

2008 Statistical audit of an email server for fraud detection in a legal case (Expertise). Universidad Nacional Autónoma de México, confidential Bank (Mexico).

2005 Analysis of non-governmental organizations in Mexico. Universidad Autónoma Metropolitana (Mexico).

2012 A system for sentiment analysis in Twitter (corpus annotation).

2012 A Sentence Compression system for discourse analysis in Spanish.

2009 Describe. A search engine that extracts definitions of a term using its original context on the Web.

2008 Canary Died. An intelligent cross-language email system for inspecting thousands of emails in many languages quickly and efficiently.

Programming HTML/XML (10 years), PHP (8 years), MySQL (8 years), C (3 years), MatLab (4 years), R (3 years), Perl (3 years), Java (2 years), Bash (3 years). Advanced client/server environment Unix, Linux, Mac OS X.


Spanish: Mother Tongue (excellent writing).

English: TOEFL 2009. Score: 537.

French: DELF B2 session 2009 + four-year stay in France.

Distinctions & Affiliations

2012 Member of Association pour le Traitement Automatique des Langues

2011 Member of the Mexican Society of Artificial Intelligence. SMIA.

2009 Member of the Natural Language Processing Group. LIA--UAPV.

2007 Member of the Linguistic Engineering Group. UNAM.

2005 Student representative council of basic sciences and engineering. UAM-I.

2006 Best score of the 2006 graduating class in the General Exam. CENEVAL -- UAM-I.

1995 Third prize of scientific projects. CETIS-37.


2009-2013 Ph.D. Grant. National Council on Science and Technology of Mexico (CONACYT).

2006-2009 M.S. Grant. National Council on Science and Technology of Mexico (CONACYT).

Peer-reviewed Publications


Molina A., Sanjuan E., Torres-Moreno J.M.: A Turing test to evaluate a complex summarization task. Submitted: CLEF 2013 Conference and Labs of the Evaluation Forum.

Molina A., Torres-Moreno J.M., da Cunha I., SanJuan E., Sierra G.: Discursive Sentence Compression. In press: Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing’13). Lecture Notes in Computer Science. Springer.

Molina A.: Sistemas Web colaborativos para la recopilación de datos bajo el paradigma de ciencia ciudadana. In Revista Komputer Sapiens, Vol. 1, Year. 5, pp.13–27. 2010. ISSN: 2007‐0691. Sociedad Mexicana de Inteligencia Artificial, Mexico. 2013.


Molina A., Torres-Moreno J.M., da Cunha I., Sanjuan E., Sierra G.: Sentence compression in Spanish driven by discourse segmentation and language models. In: Cornell University ArXiv: 1212.3493, Computation and Language (cs.CL), Information Retrieval (cs.IR), Vol.1212. 2012.


Molina A., Torres-Moreno J.M., da Cunha I., Sanjuan E., Sierra G., Velázquez-Morales P.: Discourse Segmentation for Sentence Compression. In: Advances in Artificial Intelligence LNCS (Lecture Notes in Computer Science), Vol.7094, pp. 316-327. ISSN: 0302-9743.

Cabrera-Diego L.A., Molina, A., Sierra, G.: A Dynamic Indexing Summarizer at the QA@INEX 2011 track. In: Initiative for the Evaluation of XML Retrieval Working Notes Series (INEX Question Answering Track’11). Springer-Verlag New York Inc. Vol.7424, pp. 154-159. ISBN 978-90-814485-8-1.


Sierra G., Torres-Moreno J.M., Molina A.: Regroupement sémantique de définitions en espagnol. In Proceedings of the 11th Conférence Internationale Francophone sur l'Extraction et la Gestion des Connaissances/Atelier d’Evaluation des méthodes d'Extraction de Connaissances dans les Données (EGC/EVALECD’10), pp. 41–50. Hammamet, Tunisia. 2010.

Linhares A., Molina A., Torres-Moreno J.M., Peinl P.: Usando Grafos e Algoritmo de Karp na Sumarização Automática de Documentos. In: Proceedings of the 42nd Simpósio Brasileiro de Pesquisa Operacional (SBPO'10). Rio de Janeiro. 2010.

Molina A., Sierra G., Torres-Moreno J.M.: La energía textual como medida de distancia en agrupamiento de definiciones. In: Proceedings of the 10th Journées Internationales d'Analyse statistique des Données Textuelles (JADT'10). Vol.3, pp. 215-226. Rome. 2010. ISBN: 978-88-7916-450-9

Molina, A., da Cunha, I., Torres-Moreno, J.M., Velázquez-Morales P.: La compresión de frases: un recurso para la optimización de resumen automático de documentos. In: Journal Linguamática, Vol. 2, num. 3, pp.13–27. 2010. ISSN: 1647-0818.

da Cunha I., Molina A., Velázquez-Morales P., Torres-Moreno J.M.: Optimización de resumen automático mediante compresión de frases. In: Proceedings of the 28th Congreso Internacional de la Asociación Española de Lingüística Aplicada (AESLA’10). Vol.28, pp. 73-83. Vigo, Spain. 2010. ISBN: 978-84-8158-479-0.

da Cunha I., Molina A., Sierra G., Torres-Moreno J.M., Velázquez-Morales P.: Compresión automática de frases en español : un abordaje desde el análisis del discurso. In Proceedings of the 11th Encuentro Internacional de Lingüística del Noreste (EILN’10). Sonora, Mexico. 2010.


Sierra G., Alarcón R., Molina A., Aldana E.: Web Exploitation for Definition Extraction. In: Proceedings of the IEEE Latin American Web Congress (LA-WEB’09). pp. 217-223. Mérida, Mexico. 2009. ISBN: 978-0-7695-3856-3.

