|
abstract |
[noun] An abstract is a concise
description of a document which covers the full scope of its contents. |
|
ambiguity |
[noun] Ambiguity is a state whereby a
word or sentence can be understood in different ways. A word is
ambiguous when it has more than one meaning; a sentence is ambiguous
either because the ambiguity of one of its words is not resolved by the
context of the sentence, or because the structure of the sentence can be
analysed in more than one way, each conveying a different meaning.
Ambiguity can only be resolved by understanding context: the word within
its sentence, and the sentence within the discourse. |
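Structural ambiguity can be made concrete by counting how many distinct analyses a grammar allows for one sentence. The sketch below is illustrative only: it assumes a hypothetical toy grammar in Chomsky normal form (the rules and the lexicon are invented for the example) and uses the CYK chart-parsing algorithm to count parse trees. The classic sentence ‘I saw the man with the telescope’ receives two analyses, depending on whether ‘with the telescope’ attaches to the seeing or to the man.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attached to the verb phrase
    ("NP", "NP", "PP"),   # PP attached to the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
# Hypothetical lexicon: word -> possible categories.
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def count_parses(sentence: str, start: str = "S") -> int:
    """Count distinct parse trees for `sentence` with the CYK algorithm."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j][X] = number of ways category X can derive words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[i][i + 1][category] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # every split point of the span
                for parent, left, right in BINARY_RULES:
                    ways = chart[i][k][left] * chart[k][j][right]
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n][start]

print(count_parses("I saw the man with the telescope"))  # 2
print(count_parses("I saw the man"))                     # 1
```

Choosing between the two analyses needs the context the entry describes; the grammar alone cannot decide.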
|
anaphora |
[noun] Anaphora is a linguistic element
which makes reference back to another element, as in a relative clause
like ‘He picked up the packet of sweets, which [the packet] was on the
table’. The occurrence of anaphora is characteristic of human language,
both written and spoken. The most common occurrence of anaphora
is in the use of pronouns as in, for example, ‘David went to see a play
at the theatre. He enjoyed it [the play] very much’. There are
other forms of anaphora which are more difficult to identify and
interpret, such as ‘Peter wanted to go out as well as John [wanted to
go out]; Petra also [wanted to go out]’. The frequent natural
occurrence of anaphora is a good illustration, among many, of the
difficulty of programming a computer to analyse text and gain a proper
understanding. |
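To show why anaphora is hard to handle automatically, here is a deliberately naive resolver, assumed for illustration rather than taken from any particular system: it links a pronoun to the most recent preceding mention whose agreement feature (a hypothetical, hand-annotated masculine/feminine/neuter label) matches. On the entry's own example it resolves ‘He’ correctly but links ‘it’ to ‘the theatre’ rather than to ‘a play’.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mention:
    text: str
    agreement: str  # hypothetical hand-annotated feature: "masc", "fem" or "neut"

# Agreement feature required by each pronoun (a small illustrative table).
PRONOUN_AGREEMENT = {"he": "masc", "him": "masc", "she": "fem", "her": "fem", "it": "neut"}

def resolve(pronoun: str, preceding: list[Mention]) -> Optional[Mention]:
    """Return the most recent preceding mention that agrees with the pronoun."""
    wanted = PRONOUN_AGREEMENT.get(pronoun.lower())
    for mention in reversed(preceding):  # most recent mention first
        if mention.agreement == wanted:
            return mention
    return None

# Mentions from 'David went to see a play at the theatre.', in textual order.
mentions = [Mention("David", "masc"), Mention("a play", "neut"), Mention("the theatre", "neut")]
print(resolve("He", mentions).text)  # David (correct)
print(resolve("it", mentions).text)  # the theatre (wrong: the intended antecedent is 'a play')
```

Getting ‘it’ right requires knowing that one enjoys a play rather than a building, which is the kind of understanding the entry says is difficult to program.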
|
authoring tools |
[phrase] Authoring tools help in the
preparation of texts. Generally, they are facilities provided in
association with word processing, desktop publishing, and document
management systems to aid the author of documents. They typically
include an on-line dictionary and thesaurus, spell-checking,
grammar-checking, and style-checking, and facilities for structuring,
integrating and linking documents. Authoring tools can also enable
users to write better-quality documents in languages other than their
own, which they understand to a degree but in which they could not
normally compose a document. |
|
Avatar |
[noun] An Avatar is the virtual representation
of a real participant in an activity in a Virtual Reality environment.
For example, a person could be represented in a meeting held in a
virtual environment or, in a distance learning situation, the tutor
could be represented by his Avatar in his dealings with students.
An effective Avatar will need to have human characteristics,
including speech and language understanding. |
|
CALL |
[abbreviation] Computer Aided Language Learning. |
|
character recognition |
[phrase] Character recognition of written
or printed language requires that a symbolic representation of the
language is derived from its spatial form of graphical marks.
For most languages this means recognising and transforming characters.
There are two cases of character recognition: recognition of printed
images, referred to as Optical Character Recognition (OCR), and
recognising handwriting, usually known as Intelligent Character
Recognition (ICR). OCR from a single printed font family can achieve a
very high degree of accuracy. Problems arise when the font is
unknown or very decorative or when the quality of the print is poor.
In these difficult cases, and in the case of handwriting, good results
can only be achieved by using linguistic intelligence. This involves
word recognition techniques which use language models, such as lexicons
or statistical information about word sequences. |
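The lexicon-based word recognition mentioned above can be sketched as post-correction: each recognised token is replaced by the closest word in a lexicon, measured by Levenshtein edit distance, provided the distance is small. This is a minimal sketch under stated assumptions: the four-word lexicon and the threshold of 2 are hypothetical, and the statistical information about word sequences that the entry also mentions is not modelled.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

# Hypothetical lexicon; a real system would use one for the whole language.
LEXICON = ["language", "engineering", "recognition", "character"]

def correct(token: str, lexicon=LEXICON, max_distance: int = 2) -> str:
    """Replace an OCR token by its nearest lexicon word if it is close enough."""
    best = min(lexicon, key=lambda word: edit_distance(token, word))
    return best if edit_distance(token, best) <= max_distance else token

print(correct("1anguage"))     # language  (digit 1 misread for letter l)
print(correct("recogn1tion"))  # recognition
print(correct("xyz"))          # xyz  (too far from every entry: left alone)
```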
|
computational linguistics |
[phrase] Computational linguistics is a
field concerned with the processing of natural language by computers.
The term is more often used in an academic context. It is closely
related to Natural Language Processing and Language Engineering. |
|
computer aided language learning |
[phrase] Computer Aided Language
Learning is a form of self-tutoring using computer learning packages to
teach language; the use of this technique has been increasing for a
number of years and the rate of increase has accelerated dramatically as
a result of the introduction of the CD-ROM. Generally, there are
many ways in which language technologies can help to improve the
effectiveness of learning in this way, particularly by being more
sensitive to individual needs through a better understanding of each
student’s interaction with the course material. In language
learning, the availability of on-line dictionaries, thesauri, and
grammars also greatly enhances the quality of information available to
the student as well as enabling the package to refine its presentation
to the student. When speech recognition and generation are added
to the package then it also becomes possible to practise pronunciation,
with the package able to evaluate performance and tutor the student
accordingly. |
|
computer-aided translation |
[phrase] Computer-aided translation is
the process of assisting a human translator in translating from one
language to another using computer software tools. |
|
concept search |
[phrase] Concept search is a term used
in the context of information retrieval to mean that the search is made
using a semantic analysis of the search filter matched against a
semantic analysis of the database. This technique can be contrasted with
simple keyword searches where indices of each database are constructed
as inverted files of single words which are then used to fulfil the
search criteria. The result from a search which is linguistically based
is potentially much more effective in terms of precision (i.e. selecting
information which is relevant) and recall (i.e. not missing much of the
relevant information available). |
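For contrast, the simple keyword search that the entry describes can be sketched with an inverted file that maps each word to the documents containing it. The document collection and the Boolean AND matching below are illustrative assumptions. Note that the document about an ‘automobile’ is not found by the query ‘car’, which is the kind of miss a concept search aims to avoid.

```python
from collections import defaultdict

def build_inverted_file(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?")].add(doc_id)
    return index

def keyword_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query word (Boolean AND)."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()

# Hypothetical document collection.
docs = {
    "d1": "The car would not start.",
    "d2": "My automobile broke down on the motorway.",
    "d3": "Car insurance prices are rising.",
}
index = build_inverted_file(docs)
print(sorted(keyword_search(index, "car")))  # ['d1', 'd3']  (d2 is missed)
```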
|
continuous speech |
[phrase] Continuous speech is speech in which the
speaker makes no allowance for the listener (e.g. a speech recognition
device) by pausing between words. |
|
controlled language |
[phrase] Controlled language is language
which has been designed to restrict the size of the vocabulary and/or
the structure of language used, in order to make recognition and
processing easier. This is an approach which is particularly valid in
certain environments; typical uses of controlled language are in areas
where precision of language and speed of response are critical, such as
the police and emergency services, aircraft pilots, air traffic control,
etc. |
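A controlled language is usually enforced by checking text against its rules before it is used. The sketch below assumes a hypothetical approved vocabulary and a hypothetical maximum sentence length of eight words; it is not drawn from any real air traffic control phraseology.

```python
import re

# Hypothetical approved vocabulary for a small controlled language.
APPROVED = {"climb", "descend", "to", "flight", "level", "maintain",
            "turn", "left", "right", "heading"}

def check(sentence: str, approved=APPROVED, max_words: int = 8) -> list[str]:
    """Return a list of rule violations; an empty list means the sentence conforms."""
    words = re.findall(r"[a-z]+|\d+", sentence.lower())
    # Numbers are always allowed; every other word must be in the vocabulary.
    problems = [f"unapproved word: {w}" for w in words
                if not w.isdigit() and w not in approved]
    if len(words) > max_words:
        problems.append(f"too long: {len(words)} words (max {max_words})")
    return problems

print(check("Climb to flight level 350"))      # []
print(check("Could you go up a bit please"))   # lists each unapproved word
```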
|
corpus |
[noun] A corpus is a body of language,
either text or speech, which has been collected and annotated for uses
such as: analysis of language to establish its characteristics; analysis
of human behaviour (in terms of their use of language) in certain
situations; training a system, usually to adapt its behaviour to
particular linguistic circumstances; verifying empirically a theory
concerning language; and providing a test set for a language engineering
technique or application to establish how well it works in practice.
There are national corpora of hundreds of millions of words but there
are also corpora which are constructed for particular purposes.
For example, a corpus could comprise recordings of car drivers speaking
to a simulation of a voice operated control system which recognises
spoken commands. Such a corpus is then used to help establish the
user requirements for a voice operated control system for the market. |
|
dialogue |
[noun] A dialogue is an interactive, two-way,
alternating flow of language between two individuals, between an
individual and a machine, or between two machines. Most frequently the
term is used in the context of speech but this need not be so. Dialogues
could equally take place through the exchange of text or through a
mixture of text and speech; for example, a deaf person may speak to a
system which then responds by displaying text. Speech dialogue can be
established by combining speech recognition with simple generation,
either by concatenating stored human speech components or by
synthesising speech using rules. |
|
dialogue design |
[phrase] Dialogue design is an activity
needed to enable a machine to converse effectively with a human being.
It is particularly important in the application of speech processing to
real-life systems and plays a crucial role in determining the tolerance
of a human being to a conversation with a machine and hence to the
successful outcome of the transaction. Good dialogue design
probably owes more to psychology and a knowledge of the intended
function of the system than to language engineering. The technique
known as ‘wizard of Oz’ testing is frequently used to evaluate the
effectiveness of a dialogue. Providing a library of speech recognisers
and generators together with a graphical tool for structuring their
application allows someone who is neither a speech expert nor a
computer programmer to design a dialogue. Facilities are being
developed to place control of the design and implementation of dialogue
in the hands of the users of a system e.g. market researchers,
telemarketing managers, bankers, etc. |
|
dictionary |
[noun] A dictionary is a list of the
words or a selection of words of a language, arranged in alphabetical
order, with a definition of each, possibly also giving its pronunciation,
part of speech, etymology, etc. |
|
discourse |
[noun] Discourse is a contiguous stretch
of language comprising more than one sentence (text) or utterance (speech). |
|
discourse analysis |
[phrase] Discourse analysis identifies
the linguistic dependencies which exist between sentences or utterances.
Successful analysis depends upon the discourse comprising properly
formed sentences within a rational context. |
|
document image recognition |
[phrase] Document image recognition, or analysis, is
closely associated with character recognition but involves the analysis
of the document to determine firstly its make-up in terms of graphics,
photographs, separating lines, and text and then the structure of the
text to identify headings, sub-headings, captions etc. in order to be
able to process the text effectively. |
|
domain |
[noun] Domain is a term usually applied
to the area of application of language-enabled software, e.g.
banking, insurance, travel, etc. The significance for language
engineering is that the vocabulary of an application is often restricted,
so limiting the domain of application effectively limits the language
resource requirements. |
|
evaluation |
[noun] Evaluation is regarded as a
critical part of system development in the world of language
technologies. Because one of the most important goals is to develop
systems which are easy, natural, and comfortable to use, end-users are
frequently involved in the evaluation process. In common with IT in
general, where evaluation has generally been referred to as ‘testing’,
there are three types of evaluation: diagnostic
evaluation (system testing) during which the system is tested to ensure
that it is free from errors and meets the design specification; adequacy
evaluation (user acceptance testing) in which the system is evaluated in
terms of its fitness for the purpose i.e. functionality, usability and
cost-effectiveness; performance evaluation (performance testing) during
which the performance of the system is evaluated in terms of the design
parameters established at the outset, such as response times in an
interactive system, network transit times in communications systems, and
precision and recall in information retrieval. |
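Precision and recall, the information retrieval parameters mentioned here (and described under concept search), can be computed directly from the set of items a system retrieved and the set that is actually relevant. The sketch below uses hypothetical document ids and returns 0.0 when a set is empty, a convention chosen for the example.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run: 4 documents returned, 3 of them relevant, 6 relevant in total.
p, r = precision_recall({"d1", "d2", "d3", "d7"}, {"d1", "d2", "d3", "d4", "d5", "d6"})
print(p, r)  # 0.75 0.5
```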
|
finite state machine (FSM) |
[phrase] A finite state machine
comprises a number of states, which are represented in a data structure,
and functions which determine changes of state resulting from input and
trigger consequent output. FSMs are rather like dynamic decision
tables. In language processing they are used for applications
where an approximation of proper grammatical description is sufficient
to provide the results required. |
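As an illustration, a finite state machine whose transitions are held in a table (the data structure) can accept simple noun phrases over part-of-speech tags: an optional determiner, any number of adjectives, then one or more nouns. The states, tag names and pattern are assumptions chosen for the example, and they only approximate English noun phrase grammar, which is the trade-off the entry describes. This sketch is an acceptor, so its only output is accept or reject.

```python
# Transition table: (current state, input tag) -> next state (illustrative).
TRANSITIONS = {
    ("start", "DET"): "det",
    ("start", "ADJ"): "adj",
    ("start", "NOUN"): "noun",
    ("det", "ADJ"): "adj",
    ("det", "NOUN"): "noun",
    ("adj", "ADJ"): "adj",
    ("adj", "NOUN"): "noun",
    ("noun", "NOUN"): "noun",  # noun compounds, e.g. 'railway station'
}
ACCEPTING = {"noun"}

def accepts(tags: list[str]) -> bool:
    """Run the machine over a tag sequence; accept if it ends in an accepting state."""
    state = "start"
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:  # no transition defined for this input: reject
            return False
    return state in ACCEPTING

print(accepts(["DET", "ADJ", "ADJ", "NOUN"]))  # True  ('the big red bus')
print(accepts(["DET", "ADJ"]))                 # False (no head noun)
```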
|
formalism |
[noun] A formalism is a means to
represent the rules used in the establishment of models of linguistic
knowledge. |
|
generate |
[verb] To generate, in the context of
language technologies, is to produce language in one form from another
form of language or information; see natural language generation. |
|
globalisation |
[noun] Globalisation is the process of
preparing software for use in any language and cultural environment
either by designing it to be usable in this way or by adding facilities
to existing software to facilitate subsequent localisation. It is
synonymous with Internationalisation. |
|
grammar |
[noun] Grammar is used to refer to a
number of areas of knowledge: traditionally, the morphological and
syntactic properties of a human language; a system of structural rules
which are the basis of linguistic generation and understanding; a
language theory or a model of linguistic competence. A grammar can be a
systematic description of the regularities of a human language but the
features of such grammars vary according to their intended application.
In language engineering a grammar (recorded electronically) most
commonly describes the structure of a language at different levels: word
(morphological grammar), phrase, sentence, etc. A grammar can deal with
structure both in terms of surface (syntax) and meaning (semantics and
discourse). |
|
grammar checker |
[phrase] A grammar checker is a software
facility which checks text for the correctness of its grammar. It
is usually embedded in a word processor or desktop publishing package. |
|
human aided machine translation |
[phrase] Human aided machine translation
is the process of machine translation which is improved by the
assistance of a human being. |
|
human language technologies |
[phrase] Human Language Technologies are
technologies which are concerned with different aspects of language
engineering. At the broadest level these technologies cover:
applying language knowledge to human-machine interaction; providing
automated multi-linguality in systems; and managing information recorded
as human language. These technologies include: speech recognition,
spoken language understanding, and speech generation; speaker
identification and verification; dialogue design and analysis;
controlled language design and processing; document image analysis,
optical character recognition, and handwriting recognition; recognition
and understanding of multi-modal human communication; computer-assisted
text creation and editing; language analysis and understanding;
information extraction and summarisation; language generation;
(synthetic) speech generation; language identification; machine
translation and computer-aided translation; production of language
resources and the tools to support it; and evaluation. |
|
hidden Markov model |
[phrase] A hidden Markov model (HMM) is
like a finite state machine in which both the transitions and the
outputs are probabilistic; only the outputs are observed, while the
sequence of states remains hidden. HMMs are commonly used in speech
recognition systems to help to determine the words represented by the
sound wave forms captured. In this case, an HMM describes the
realisation of a concatenation of elementary processes which represents
the sequence of acoustic parameters extracted from a human utterance. |
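Recognition with an HMM typically means finding the most probable sequence of hidden states given the observed outputs, which the Viterbi algorithm does by dynamic programming. The sketch below uses an invented two-state model (‘C’ for consonant-like and ‘V’ for vowel-like sounds, with two coarse acoustic classes as outputs); all probabilities are hypothetical, and real recognisers work with log probabilities over far larger models to avoid numerical underflow.

```python
# Hypothetical toy model: hidden states, start, transition and output probabilities.
STATES = ("C", "V")
START = {"C": 0.6, "V": 0.4}
TRANS = {"C": {"C": 0.3, "V": 0.7}, "V": {"C": 0.6, "V": 0.4}}
EMIT = {"C": {"noise": 0.7, "voiced": 0.3}, "V": {"noise": 0.1, "voiced": 0.9}}

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable hidden state sequence, its probability)."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return list(reversed(path)), best[-1][last]

path, prob = viterbi(["noise", "voiced", "noise"], STATES, START, TRANS, EMIT)
print(path)            # ['C', 'V', 'C']
print(round(prob, 6))  # 0.111132
```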
|
hypertext |
[noun] Hypertext is a method commonly
used for help files and in the World Wide Web whereby highlighted text
is used to provide a link (rather like an index) to related text (often
a more detailed explanation of the highlighted item). |
|
index |
[verb] To index is to build a concise
means of reference to information within a database which, for textual
information, can be based on keywords or concepts. |
|
information extraction |
[phrase] Information extraction is the
process of selecting information from a database using linguistic analysis. It is
distinguished from conventional information retrieval in that the
information is selected and delivered to tight specifications, using
templates, and is often delivered in the form of fragments of documents. |
|
information retrieval |
[phrase] Information retrieval is a
generic term covering the access to and delivery of information from
natural language databases by whatever method. Usually the information is
delivered in the form of complete documents. |
|
interlingua |
[noun] An Interlingua is an invented
language which can be used as a common, formal representation into which
source natural language may be translated and from which target natural
language can be generated. |
|
internationalisation |
[noun] Internationalisation is the
process of preparing software for use in any language and cultural
environment either by designing it to be usable in this way or by adding
facilities to existing software to facilitate subsequent localisation.
Internationalisation is synonymous with the term Globalisation.
However, the latter term is becoming less common in the language context
because in recent years it has come to be used in a much wider sense. |
|
interpret |
[verb] To interpret is, generally, to
attribute meaning to language; but also, to translate from one language
to another, usually orally, in real-time. |
|
language enabled |
[phrase] Language enabled describes a
computer application which has been improved in functionality,
performance, and/or presentation by the use of language engineering. |
|
language engineering |
[phrase] Language engineering is the
application of knowledge of language to the development of computer
systems which can recognise, understand, interpret and generate human
language in all its forms. |
|
language resources |
[phrase] Language resources are
essential components of language engineering. They are one of the
main ways of representing the knowledge of language which is used for
the analytical work leading to recognition and understanding. The work
of producing and maintaining language resources is a huge task.
Resources may be produced according to standard formats and protocols
to enable access by research laboratories and public institutions, in
many EU languages. Many of these resources are being made
available through the European Language Resources Association (ELRA).
Lexicons, dictionaries of proper names, terminology databases, grammars,
wordnets, and corpora are all repositories of language knowledge. |
|
lemmatise |
[verb] To lemmatise is to break an
inflected word into its root (base form) and ending components. |
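A minimal rule-based sketch: strip a known ending and accept the result only if the remaining base form is in a lexicon, with a small table for irregular forms that have no separable ending. The lexicon, irregular table and ending list are hypothetical and cover only a few English inflections; a practical lemmatiser needs a full morphological lexicon for each language.

```python
# Hypothetical resources: base forms, irregular forms, and regular English
# endings tried in order (longer endings first).
LEXICON = {"walk", "box", "cat", "go", "horse", "mouse"}
IRREGULAR = {"went": "go", "mice": "mouse"}
ENDINGS = ("ing", "ed", "es", "s")

def lemmatise(word: str) -> tuple[str, str]:
    """Split an inflected word into (base form, ending)."""
    word = word.lower()
    if word in LEXICON:
        return word, ""              # already a base form
    if word in IRREGULAR:
        return IRREGULAR[word], ""   # irregular: no separable ending
    for ending in ENDINGS:
        if word.endswith(ending):
            base = word[: -len(ending)]
            if base in LEXICON:      # accept only base forms the lexicon knows
                return base, ending
    return word, ""                  # unknown word: left unanalysed

for w in ["walked", "boxes", "horses", "went"]:
    print(w, lemmatise(w))
# walked ('walk', 'ed')
# boxes ('box', 'es')
# horses ('horse', 's')
# went ('go', '')
```

Checking against the lexicon is what stops ‘horses’ from being split as ‘hors’ plus ‘es’.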
|
lexicon |
[noun] A lexicon is a repository of
words and knowledge about those words. This knowledge may include
details of the grammatical structure of each word (morphology), the
sound structure (phonology), its part of speech, and the meaning of the
word in different textual contexts, e.g. depending on the word or
punctuation mark before or after it. Lexicons may be ordered
either alphabetically or semantically. A useful lexicon may have
hundreds of thousands of entries. Lexicons are needed for every
language of application. There are a number of special cases which are
usually researched and produced separately from general purpose lexicons:
dictionaries of proper names, terminology databases, and wordnets. |
|
localise |
[verb] To localise is to adapt software
to the local requirements in terms of language and culture (including
legal practice and business conventions, for example).
Localisation is more likely to be efficient and cost-effective if
systems are designed taking localisation into account. |
|
machine translation |
[phrase] Machine translation is the
process of automatically translating from one language to another by
computer. |
|
machine aided translation |
[phrase] Machine aided translation is
synonymous with computer-aided translation. |
|
machine readable dictionary |
[phrase] A machine readable dictionary is
one which can be read by computer software. |
|
mark up |
[verb] To mark up is to annotate
language in order to have a record of certain of its properties. For
example, a document can be marked up in such a way that its structure
and presentation are described so that it can be reproduced by software
other than that used for its creation. Language can also be marked
up to record its syntactic and semantic properties in preparation for
future use in developing an application, or for linguistic research. |
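For example, a sentence can be marked up in XML so that both its structure and the part of speech of each word are recorded, and any XML-aware software can read the annotation back. This sketch uses Python's standard `xml.etree.ElementTree`; the element and attribute names are invented for illustration and do not follow any particular mark-up standard.

```python
import xml.etree.ElementTree as ET

# Element and attribute names (doc, title, s, w, pos) are illustrative assumptions.
doc = ET.Element("doc")
ET.SubElement(doc, "title").text = "Glossary"
sentence = ET.SubElement(doc, "s")  # one sentence
for word, pos in [("Peter", "PROPN"), ("left", "VERB")]:
    # each word element carries its part of speech as an attribute
    ET.SubElement(sentence, "w", pos=pos).text = word

xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
# <doc><title>Glossary</title><s><w pos="PROPN">Peter</w><w pos="VERB">left</w></s></doc>

# Other software can read the annotation back without knowing how it was made.
parsed = ET.fromstring(xml_text)
print([(w.text, w.get("pos")) for w in parsed.iter("w")])
# [('Peter', 'PROPN'), ('left', 'VERB')]
```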
|
morpheme |
[noun] A morpheme is the smallest
meaningful element of language, i.e. a unit of meaning which cannot be
divided into smaller meaningful elements. |
|
morphology |
[noun] Morphology is the science of the
structure of words. |
|
multi-lingual |
[adjective] Multi-lingual is properly
used to mean that something exists in a form that can handle several
languages but is, in practice, often used to describe the characteristic
that versions exist for several languages. |
|
natural language generation |
[phrase] A structured representation of a
text can be used as the basis for generating natural language. An
interpretation of structured data or the underlying meaning of a
sentence or phrase can be mapped into a surface string in a selected
fashion: either in a chosen language or according to stylistic
specifications by a text planning system. |
|
natural language processing |
[phrase] Natural language processing is a
term in use since the 1980s to define a class of software systems which
handle text intelligently. |
|
OCR |
[abbreviation] Optical Character Recognition. |
|
Optical Character Recognition |
[phrase] Recognition of written or
printed language requires that a symbolic representation of the language
is derived from its spatial form of graphical marks. For most
languages this means recognising and transforming characters.
There are two cases of character recognition: recognition of printed
images, referred to as Optical Character Recognition (OCR), and
recognising handwriting, usually known as Intelligent Character
Recognition (ICR). OCR from a single printed font family can achieve a
very high degree of accuracy. Problems arise when the font is
unknown or very decorative or when the quality of the print is poor.
In these difficult cases, and in the case of handwriting, good results
can only be achieved by using ICR. This involves word recognition
techniques which use language models, such as lexicons or statistical
information about word sequences. |
|
onomastics |
[noun] Onomastics is the scientific
investigation of proper names. |
|
parse |
[verb] To parse is to analyse language in
order to establish its structure and relationships at the levels of
syntax and/or semantics. |
|
part of speech |
[phrase] A part of speech is a class in a classification
of words according to form and meaning. The classification currently in
use in Europe is based on the work of Dionysius Thrax (a grammarian of
the first century BC), and comprises nouns, verbs, adjectives, adverbs,
articles, pronouns, prepositions, and conjunctions. |
|
phoneme |
[noun] A phoneme is the smallest unit of
sound (analogous to a morpheme) which can be identified from an acoustic
flow of speech and which is semantically distinct. |
|
proper names |
[phrase] A proper name is the name of a
place, person, animal, or thing. Dictionaries of proper names are
essential to effective understanding of language, at least so that they
can be recognised within their context as places, objects, persons, or
perhaps animals. They take on a special significance in many
applications, however, where the name is key to the application such as
in a voice operated navigation system or in a holiday reservations
system or railway timetable information system based on automated call
handling. |
|
semantics |
[noun] Semantics is the analysis of
language to determine meaning. |
|
shallow parser |
[phrase] A shallow parser is computer
software which parses language to a point where a rudimentary level of
grammatical structure and meaning can be realised; this is often used in
order to identify passages of text which can then be analysed in further
depth to fulfil the particular objective. |
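One common way to build a shallow parser is to match patterns over part-of-speech tags rather than build a full parse tree. The sketch below, an assumption rather than a description of any specific product, finds noun phrase chunks (optional determiner, adjectives, nouns) with a regular expression over a hand-tagged sentence; the tags are supplied by hand here, where a tagger would normally assign them.

```python
import re

# A hand-tagged sentence (tags assumed; normally a tagger would supply them).
tagged = [("the", "DET"), ("old", "ADJ"), ("railway", "NOUN"), ("station", "NOUN"),
          ("closed", "VERB"), ("last", "ADJ"), ("year", "NOUN")]

# Noun phrase chunk: optional determiner, any adjectives, one or more nouns.
NP_PATTERN = re.compile(r"\b(?:DET )?(?:ADJ )*(?:NOUN )+")

def chunk_noun_phrases(tagged_words):
    """Return the word sequences whose tags match NP_PATTERN."""
    # One space-terminated tag per token, e.g. "DET ADJ NOUN NOUN VERB ADJ NOUN "
    tag_string = "".join(tag + " " for _, tag in tagged_words)
    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        start = tag_string[: match.start()].count(" ")  # index of first token
        length = match.group().count(" ")               # number of tokens matched
        words = [word for word, _ in tagged_words[start : start + length]]
        chunks.append(" ".join(words))
    return chunks

print(chunk_noun_phrases(tagged))  # ['the old railway station', 'last year']
```

The chunks found this way are the passages that could then be analysed in further depth.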
|
speaker identification |
[phrase] A human voice is as unique to an
individual as a fingerprint. This makes it possible to identify a
speaker and to record the characteristics of his or her voice for use as
the basis for future verification. |
|
speaker independent |
[phrase] Speaker independent is a term
applied to a speech recognition system which is capable of recognising
speech regardless of the speaker, i.e. it does not need to be trained to
recognise individual speakers. |
|
speaker verification |
[phrase] Once a human voice has been
identified, its uniqueness makes it possible to use this
identification as the basis for verifying that an individual is entitled
to access a service or a resource. The types of problems which have to
be overcome are, for example, recognising that the speech is not
recorded, selecting the voice through noise (either in the environment
or the transfer medium), and identifying reliably despite temporary
changes (such as those caused by illness). |
|
speech recognition |
[phrase] The sound of speech is received
by a computer in analogue wave forms which are analysed to identify the
units of sound (phonemes) which make up words. Statistical models of
phonemes and words are used to recognise either discrete or continuous
speech input. The production of quality statistical models requires
extensive training samples (corpora) and vast quantities of speech have
been collected and continue to be collected for this purpose. There are
a number of significant problems to be overcome if speech is to become a
commonly used medium for dealing with a computer. The first of these is
the ability to recognise continuous, or spontaneous, speech rather than
speech which is deliberately delivered by the speaker as a series of
discrete words separated by a pause. The next is to recognise any
speaker, avoiding the need to train the system to recognise the speech
of a particular individual. There is also the serious problem of the
noise which can interfere with recognition, either from the environment
in which the speaker uses the system or through noise introduced by the
transmission medium, the telephone line, for example. Noise reduction,
signal enhancement and key word spotting can be used to allow accurate
and robust recognition in noisy environments or over telecommunications
networks. Finally, there are the problems of dealing with regional
accents, dialects, language spoken by a foreigner, and language which is
spoken ungrammatically, which is probably most of it. |
|
speech synthesis |
[phrase] Speech is synthesised from
filled templates, by playing ‘canned’ recordings or concatenating units
of speech (phonemes, words) together. Speech generated has to account
for aspects such as intensity, duration and stress in order to produce a
continuous and natural response. |
|
speech to text |
[phrase] Speech to text is the process of
analysing speech and producing its textual equivalent; a typical example
of a speech to text application is in dictation systems. |
|
spell checker |
[phrase] A spell checker is software
which checks the spelling of words, usually embedded in another program
such as a word processor, desktop publishing package, spreadsheet,
presentation package, etc. |
|
spontaneous speech |
[phrase] Spontaneous speech is often
used synonymously with continuous speech but more explicitly recognises
that there are other characteristics of speech which make it difficult
to understand, such as the tendency for people not to speak
grammatically correctly or to speak in ways which make it difficult to
maintain consistent context. |
|
style checker |
[phrase] A style checker is software
which checks a document to ensure that it conforms to a template
defining the structure of the text and the document containing it; it may
also check that phrases or sentences are used in a predefined way. |
|
summarise |
[verb] To summarise is to produce a
concise description of a document, which covers the full scope of its
contents. |
|
syllable |
[noun] A syllable is a unit of
pronunciation which is more than a single sound (see phoneme above) and
smaller than a word. |
|
syntax |
[noun] Syntax is the system of rules
which describe how sentences can be formed from basic elements of
language, i.e. morphemes, words and parts of speech. |
|
tag |
[verb] To tag is to annotate a corpus by
attaching information to the words, which describes, for example, the
grammatical context of the words and/or associations with other words. |
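A very simple way to attach part-of-speech information is a unigram tagger: learn the most frequent tag of each word from an already tagged corpus, then apply it to new text. The tiny training corpus, the tag names and the fallback tag NOUN for unseen words are all assumptions for illustration; the example also shows the method's limits, since ‘play’ is always tagged as a noun and the unseen ‘good’ is mis-tagged.

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_corpus):
    """Learn each word's most frequent tag from an annotated corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(words, model, default="NOUN"):
    """Attach a tag to every word; unseen words get a default tag (an assumption)."""
    return [(w, model.get(w.lower(), default)) for w in words]

# Hypothetical hand-annotated training corpus.
corpus = [("the", "DET"), ("play", "NOUN"), ("was", "VERB"), ("long", "ADJ"),
          ("children", "NOUN"), ("play", "VERB"), ("the", "DET"), ("play", "NOUN")]
model = train_unigram_tagger(corpus)
print(tag(["The", "play", "was", "good"], model))
# [('The', 'DET'), ('play', 'NOUN'), ('was', 'VERB'), ('good', 'NOUN')]
```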
|
terminology |
[noun] Terminology is increasingly
important in today’s complex technological environment where there is a
host of terminologies which need to be recorded, structured, and made
available for language enhanced applications. Many of the most
cost-effective applications of language engineering, such as
multi-lingual technical document management and machine translation,
depend on the availability of the appropriate terminology banks. |
|
text |
[noun] The term text is used frequently
to distinguish written, printed, or symbolically recorded (using
character encoding) language from speech. |
|
text alignment |
[phrase] Text alignment is the process of
organising different language versions of a text in order to be able to
identify equivalent terms, phrases, or expressions. |
|
text to speech |
[phrase] Text to speech is the process of
producing the speech equivalent of text; a typical example of a text to
speech application is an automatic announcement system at an airport or
railway station. |
|
thesaurus |
[noun] A thesaurus is a dictionary of
synonyms. |
|
translate |
[verb] To translate is to transform a
text from one language to another in a way which preserves the original
meaning. |
|
translation memory |
[phrase] A translation memory is a system
which builds knowledge about translating from one language to another by
remembering and re-using previous translations. |
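The core of a translation memory is a lookup that finds the most similar previously translated segment. This sketch stores an invented English-French memory and, as a swapped-in similarity measure, uses character-level similarity from Python's `difflib.SequenceMatcher`; the 0.75 threshold is a hypothetical setting, and working systems use their own fuzzy-matching measures.

```python
from difflib import SequenceMatcher
from typing import Optional

# A toy memory of earlier English-French translations (invented examples).
MEMORY = {
    "The printer is out of paper.": "L'imprimante n'a plus de papier.",
    "Close the door.": "Fermez la porte.",
}

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib (a stand-in measure)."""
    return SequenceMatcher(None, a, b).ratio()

def lookup(sentence: str, memory: dict[str, str] = MEMORY,
           threshold: float = 0.75) -> Optional[tuple[str, str, float]]:
    """Return (stored source, stored translation, score) for the closest
    previous segment, or None if nothing reaches the threshold."""
    best = max(memory, key=lambda source: similarity(sentence, source))
    score = similarity(sentence, best)
    return (best, memory[best], score) if score >= threshold else None

match = lookup("The printer is out of toner.")
print(match[1])  # L'imprimante n'a plus de papier.  (a fuzzy match to be edited)
```

A fuzzy match is offered to the translator for editing rather than used as it stands, since the stored segment is not an exact match.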
|
translator’s workbench |
[phrase] A translator’s workbench is a
software system providing a working environment for a human translator,
which offers a range of aids such as on-line dictionaries, thesauri,
translation memories, etc. |
|
user modelling |
[phrase] User modelling is a term used
most often in dialogue based speech recognition to describe a component
which attempts to be sensitive to the various sorts of
users that the system may encounter. |
|
utterance |
[noun] An utterance is the string of
sounds produced by a speaker between two pauses. |
|
version |
[noun] A version is an edition of a
document which is recorded as different from the previous edition.
It is used in configuration control and in Document Management systems. |
|
version control |
[phrase] Version control is the
management of the production, recording, and issue of documents as in
configuration control. |
|
voice authentication |
[phrase] Voice authentication is synonymous
with speaker verification. |
|
voice recognition |
[phrase] Voice recognition is synonymous
with speech recognition. |
|
wizard of Oz testing |
[phrase] Wizard of Oz testing is testing
in which the automated machine component is substituted by some form of
human intervention but in such a way that the user participating in the
test is unaware of the substitution. It is frequently used in the
process of evaluation, particularly to verify the effectiveness of a
dialogue based system. |
|
wordnet |
[noun] A wordnet is a network which
models the relationships between words, for example, synonyms, antonyms,
hyponyms, and so on. Such networks can be invaluable in
applications like information retrieval, translator workbenches, and
intelligent office automation facilities for authoring. |
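A wordnet can be sketched as a set of labelled links between words. The tiny network below is invented for illustration; it shows two typical uses: following hypernym (‘is-a’) links upwards, and expanding a search term with synonyms, as an information retrieval system might.

```python
# A tiny hand-built network; words and links are illustrative assumptions.
RELATIONS = {
    ("car", "synonym"): ["automobile"],
    ("car", "hypernym"): ["motor vehicle"],
    ("motor vehicle", "hypernym"): ["vehicle"],
    ("vehicle", "hypernym"): ["artefact"],
    ("hot", "antonym"): ["cold"],
}

def related(word: str, relation: str) -> list[str]:
    """Return the words linked to `word` by `relation` (empty if none)."""
    return RELATIONS.get((word, relation), [])

def hypernym_chain(word: str) -> list[str]:
    """Follow 'is-a' links upwards, taking the first parent at each step."""
    chain: list[str] = []
    seen = {word}
    while True:
        parents = related(word, "hypernym")
        if not parents or parents[0] in seen:  # stop at the top or on a cycle
            return chain
        word = parents[0]
        seen.add(word)
        chain.append(word)

def expand_query(word: str) -> list[str]:
    """Add synonyms to a search term, as an information retrieval system might."""
    return [word] + related(word, "synonym")

print(hypernym_chain("car"))  # ['motor vehicle', 'vehicle', 'artefact']
print(expand_query("car"))    # ['car', 'automobile']
```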