Digital Humanities at the Sourasky Central Library

The field of Digital Humanities has developed, and in some cases even redefined, traditional research on a wide range of subjects, resources, and methodologies by applying scientific techniques and tools. It combines the research questions and methods of the traditional humanities and social sciences with digital tools.


Our Mission
At the Sourasky Central Library we offer support and resources for Tel Aviv University researchers and students who wish to use computer-based technologies to answer research questions related to the humanities and arts.


Guiding Rules

  • Providing guidance, tutorials, and seminars to develop DH skills for conducting research and teaching.
  • Preferring open-source software tools, making research products easily accessible, and preserving the data after the research has been completed (or financial resources have run out).
  • Providing support and advice in writing research proposals that include the use of digital tools, from the stage of writing the proposal to the implementation stage.
  • Focusing on four major tools: converting images to text (OCR), spatial analysis (GIS), distant reading (DR), and content management systems (CMS).


OCR process


What is Optical Character Recognition (OCR)?

When a page from a book, a newspaper, or any other textual source is scanned, the output is an image of the page, much like a photo taken with a mobile phone. The computer does not identify the textual characters in that image, so searching for words or phrases is not possible.

Optical Character Recognition (OCR) is a process in which dedicated software enables the computer to identify printed or handwritten text in a scanned image. The software recognizes the shape of each letter in the scanned text and converts it into the corresponding machine-readable character.



Optical Character Recognition and DH

Today, Optical Character Recognition is the starting point for computational or quantitative analysis of textual sources. Converting many scanned sources into machine-readable text is a mandatory stage in analyzing a large quantity of textual research objects with computational methods. Once sources have been OCRed, simplified text images can be generated from them, text strings (with one meaning or another) can be tagged, and research objects can be analyzed statistically.
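As a small illustration of the statistical analysis that becomes possible once a source is machine-readable, the sketch below counts word frequencies in a piece of text. The function name `word_frequencies`, the `min_length` parameter, and the sample sentence are hypothetical and chosen only for this example; real OCR output would usually be read from a file and cleaned further before analysis.

```python
import re
from collections import Counter


def word_frequencies(text, min_length=3):
    """Count how often each word appears in a piece of OCR output text."""
    # \w+ matches runs of Unicode letters and digits, not only ASCII ones.
    words = re.findall(r"\w+", text.lower())
    # Very short tokens are skipped; min_length is an illustrative choice.
    return Counter(word for word in words if len(word) >= min_length)


# Hypothetical sample standing in for text produced by an OCR tool.
sample = "The library scans the newspaper. The newspaper becomes searchable text."
top = word_frequencies(sample).most_common(2)
print(top)
```

The same counting approach can be applied to many OCRed documents at once, which is where it becomes useful for research.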



Optical Character Recognition tools

  • Adobe Acrobat Pro: The commercial version of the popular PDF editor easily and efficiently converts PDF files into searchable files. The program supports 42 languages. Once the OCR process is complete, the original document can be edited and saved in other formats. In our DH Lab, a fully licensed copy of Adobe Acrobat Pro 2017 is installed on two workstations.
  • Tesseract: An open-source OCR engine whose development was long sponsored by Google. The engine supports 165 languages, including Hebrew and Arabic (the full list of languages is available in the Tesseract documentation). Tesseract has no graphical user interface, and using it effectively requires some technical expertise. Students and researchers who need guidance and assistance should contact the Reference and Guidance Department. In our DH Lab there are two workstations with a full installation of Tesseract version 5.
  • ABBYY FineReader: Leading commercial software for optical character recognition. The software supports 201 languages, including Hebrew and Arabic. It has advanced image processing capabilities and also lets the user train character recognition and create new language patterns. In our DH Lab there is one workstation with a full license for ABBYY FineReader 16.
  • OCR on Demand: Send us a PDF file and we will send you back a searchable PDF > OCR Service Request.
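Because Tesseract is run from the command line, a short script can help when many pages need to be processed. The following is a minimal sketch, not the Lab's own workflow: it assumes Tesseract is installed with the `heb` and `eng` language data, and the helper names `build_tesseract_command` and `run_ocr`, as well as the file name `page_001.png`, are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base,
                            languages=("heb", "eng"), searchable_pdf=True):
    """Build the argument list for a Tesseract command-line call."""
    # Several languages are combined with "+", for example "heb+eng".
    command = ["tesseract", image_path, output_base, "-l", "+".join(languages)]
    if searchable_pdf:
        # The "pdf" config asks Tesseract for a searchable PDF
        # instead of the default plain-text output.
        command.append("pdf")
    return command


def run_ocr(image_path, output_base, **options):
    """Run Tesseract if it is installed; return True on success."""
    if shutil.which("tesseract") is None:
        return False  # Tesseract was not found on this machine.
    command = build_tesseract_command(image_path, output_base, **options)
    return subprocess.run(command, check=False).returncode == 0


# Show the command that would be run for a hypothetical scanned page.
print(build_tesseract_command("page_001.png", "page_001"))
```

Calling `run_ocr("page_001.png", "page_001")` on a workstation with Tesseract installed would be expected to write `page_001.pdf` next to the script; on a machine without Tesseract it simply returns `False`.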

For more information: Main Entrance Hall | 03-6404823
