Technology Development for Indian Languages



Communication among human beings is inherently multi-modal, with the visual and aural modes being primary. Currently, the principal means of human-machine communication are heavily biased towards the convenience of the machine rather than that of the user. The mouse and keyboard are the primary input devices, and the visual display unit is the primary output device. Using such interfaces requires special skills and a particular mental attitude, which many people do not have. This machine-centric mode of communication needs to change in favour of human-centric interfaces so that all people can share the benefits of the power of computers. While the visual mode is most effective for capturing information, speech remains the preferred and most convenient means of conveying it. The advantage of verbal communication has become even stronger today with the convergence of computers and telecommunication systems, which allows people to access information on remotely located computers.

Verbal communication involves natural language, which brings to the fore the role of linguistics in information technology. Human beings are endowed with a unique faculty, language, through which they share information, thoughts and ideas effortlessly among themselves. A human-centric interface to the computer is therefore the need of the day. Facilitating human-machine interaction using natural language involves several facets of human language technology: speech compression, recognition and understanding of speech and script, machine translation, text generation, and synthesis of speech and cursive script. Both forms of language, spoken and written, are useful for interaction with machines.

Work on Indian languages on computers has been carried out for the last two decades, accomplishing a variety of tasks, including data processing, word processing and desktop publishing, on different platforms and operating systems.

The Department of Information Technology initiated the TDIL (Technology Development for Indian Languages) programme with the objective of developing information processing tools and techniques to facilitate human-machine interaction without language barriers; creating and accessing multilingual knowledge resources; and integrating them to develop innovative user products and services. Projects were funded to develop linguistic resources such as corpora and dictionaries, and basic information processing tools such as fonts, text editors, spell checkers, OCR and text-to-speech. Standards were also evolved.

Efforts to develop Indian language technology products and services have been made across the country at the private, public and government levels. Language technologies and tools developed by the Resource Centres and Coilnet Centres need to be put to use quickly to generate feedback, and made available for productization. Research and development efforts should have an impact on society at large. This means that development should not be confined to the lab environment; the results need to be deployed quickly, so that user feedback and experience can drive further refinement.

The Government is set to make the following products and solutions available in the public domain over the next year:

  • Free fonts (TTF and OTF) and word processors in all Indian languages. As a first step, popular TrueType fonts (TTF) for the classical language Tamil have been selected, through interaction with the publishing industry, for release in the public domain free of cost. TrueType fonts are widely used on systems running Windows 95/Windows 98/Windows NT. For systems running Windows 2000/XP/2003 and Linux, OpenType fonts (OTF) are being released. This effort is the first of its kind, and most word processors (existing and new) will be able to use these fonts. This in turn will make more fonts available to users entering data with Tamil Net/Tamil 99 and Typewriter keyboards.
  • Optical Character Recognition (OCR) in all Indian languages for information extraction, retrieval and digitization. OCR converts a scanned image (a printed page captured with a scanner) into editable text, which can then be modified and used in other applications. The publishing industry is one of the major beneficiaries, as it can use this software to reproduce existing works and bring out new editions.
  • Speech interfaces for systems such as Railway Information, Healthcare, Agriculture, Disaster Management and other public utility services. These will help people at large benefit from advanced technologies in Indian languages. Such an interface can use a speech recognition engine to recognize the human voice and convert it to text for information extraction. Text-to-speech can be used to read text aloud for visually impaired persons, in applications such as fetching information from the Internet.
  • Internet access tools for Indian languages, such as browsers, search engines and email. With these, it will be possible to send email in Indian languages, and the search engine will let users enter a query in one Indian language and search for information in other Indian languages.
  • Online translation services and tools among Indian languages and English. These will help people translate content from English or any Indian language into the target Indian language of their choice.

The products and services are being made available through the TDIL Data Centre, with an online helpdesk.

The Government has a mission and a time-bound activity plan for TDIL-DC (Language Technology Utilities Distribution Channel) to boost research, productization, deployment and support through:

  • Opening developed technologies to the market
  • Upgrading or refining technologies to formulate products
  • Developing need based new technologies

To achieve these, the following steps are being taken:

  1. Distribution of Indian language technologies and tools through the TDIL Data Centre in a phased manner
  2. Acquisition of developed tools, technologies, products and services
  3. Spreading awareness and publicizing the Government's efforts in Language Technology areas through available tools and technologies
  4. Enabling users to download tools, utilities, products etc. free of charge
  5. Promoting Public-Private partnership in Language Technology
  6. Launching missions in specific application areas

Constituents of Framework

© Dept. of Information Technology, MoCIT, Govt. of India || Site and Data Centre maintained by C-DAC Noida