The Multimedia Systems Department (MSD) conducts research on a wide range of topics related to sound, vision, multimedia technologies, multimodal interfaces, and many others. Solutions developed by MSD have been presented at a number of exhibitions, and many of them have received awards. Most of the developed solutions are also patented. The results of the research are published in scientific journals and presented at scientific conferences. The publications, conference papers and patents are listed in the Database of references.

The most important topics of the current research are listed below.

  • Studio technology:
    • audio and video recording
    • signal processing
    • editing and mastering
    • post-production of audio and video
    • virtual reality
    • multichannel sound
  • Sound analysis and processing:
    • restoration of audio recordings
    • speech recognition
    • sound synthesis
    • algorithms for enhancing speech intelligibility and quality
    • spatial filtration and sound source localization
    • solutions assisting persons with hearing and/or speech disorders
  • Image and video analysis and processing:
    • object recognition in video
    • detection and tracking of moving objects
    • visual speech recognition
    • analysis of images from ToF, thermal and infrared cameras
    • computer character animation
    • motion capture systems
  • Multimodal human-machine interfaces:
    • interfaces for disabled people
    • eye-gaze interfaces for controlling a computer
    • gesture-driven interfaces
    • voice-driven interfaces
    • analysis of brain waves
  • Security technologies:
    • biometric recognition of persons
    • signature validation
    • detection of security threats in camera images
    • detection of audio events
  • Information technology systems:
    • environmental monitoring, noise maps
    • multimedia telemedicine systems – hearing and eyesight diagnosis
    • road monitoring and traffic control systems
  • Implementations of algorithms for multimedia processing:
    • digital signal processors (DSP)
    • development boards and embedded systems (Intel Galileo, Raspberry Pi, etc.)
    • parallel processing platforms (e.g. GPU)
    • data processing on supercomputers
  • Multimedia applications of machine learning:
    • cognitive systems
    • recognition and classification of sound and images
  • Musical acoustics:
    • recognition of musical sound and phrases
    • subjective assessment processing
    • listening tests
    • singing voice quality assessment
  • Sound reinforcement:
    • acoustical design of rooms
    • reinforcement systems design
    • acoustic adaptation of rooms
    • acoustic measurements in rooms
  • Mobile technologies:
    • diagnostics and health monitoring
    • novel methods of human communication

Research projects

The Multimedia Systems Department participates in many research projects, both European ones and those funded by the Polish Ministry of Science and Education and other Polish institutions.

Polish projects - ongoing

ADMEDVOICE - Adaptive intelligent speech processing system of medical personnel with the structuring of test results and support of therapeutic process (INFOSTRATEG4/0003/2022).
The project aims to develop and implement a solution that lets physicians retrieve available diagnostic test results and clinical parameters of patients by voice, fill in disease charts during medical history taking, create radiological descriptions, and prescribe treatment as required. The system will automatically generate templates to be completed, including medical histories and radiology image descriptions, which will be editable by voice, and will also enable dictation of referrals for additional tests, prescriptions, and sick leave. The cloud-based speech recognition system will be built on a purpose-developed corpus of Polish speech, extended with a dictionary of medical terms and trade names of medicines. It is intended primarily for doctors of various specializations, including radiologists, surgeons, doctors working in hospital emergency wards, and specialists giving medical advice. A variant of the solution extended with two-way communication, based on both speech recognition and speech synthesis, will be used when a doctor cannot operate a text editor manually. The corpus is assumed to comprise recordings of articulated speech made in the typical acoustic conditions of doctors' offices, as well as speech recorded through surgical masks, in operating theatres, and in conditions that make effective speech recognition difficult, i.e. in the presence of reverberation, interference and simultaneous speech. In particular, a purpose-built hardware solution for speech signal acquisition will be used in operating theatres and clinical emergency departments. An additional application of the system will be searching medical databases and repositories. To that end, the repositories will be supplemented with the names of medical conditions and some drug components, as well as trade names of popular medical products.
Special attention will be paid to training the system for recognition of associated sets of words, elimination of repetitive erroneous terms, dictation of dimensions of radiological lesions, voice correction of entries in referenced templates, ability to read entries using synthetic speech, starting from highlighted cursor positions and replacement of existing fragments with new ones dictated by voice.

European projects - completed

COPCAMS (COgnitive & Perceptive CAMeraS) - ARTEMIS project funded under grant agreement No. 332913 (2013-2016). The project consortium consists of 21 partners from seven European countries. COPCAMS leverages recent advances in embedded computing platforms to develop large-scale, integrated vision systems. It aims to exploit new programmable accelerators, particularly many-cores, to power a new generation of greener, low-power smart cameras and gateways. This will be possible owing to a paradigm change: whereas the previous generation of systems had simple cameras connected to powerful centralised computing servers through high-bandwidth networking, the COPCAMS vision is to push low-power, high-performance computing to the edge of the system and into the distributed aggregators. These “smart cameras” and “smart aggregators” will process video streams, extract significant semantic information and decide locally whether or not the streams’ content is of interest and worth propagating. The decentralised, distributed decision-making will save both energy and bandwidth, while opening up opportunities for new distributed applications.
Project page.

ADDPRIV (Automatic Data relevancy Discrimination for a PRIVacy-sensitive video surveillance) – the project seeks to improve public safety while safeguarding individuals' right to privacy, enriching current video surveillance systems with automatic discrimination of the relevant recorded data. The project addresses the challenge of determining, in an automatic, accurate and reliable manner, which information obtained from a distributed system of surveillance cameras is relevant from the security perspective and which is not and can be safely deleted. This will limit unnecessary data storage and protect citizens' right to privacy. ADDPRIV proposes novel knowledge and developments to limit the storage of unnecessary data throughout existing multicamera networks so that they better comply with citizens' privacy rights. ADDPRIV addresses the challenge of determining, in a precise and reliable manner, which private data captured by video surveillance systems are not relevant from a security perspective. It also proposes solutions for automatic discrimination of relevant data recorded on a multicamera network and related to an individual whose suspicious behavior triggered an alert. Relevant data comprise not only video scenes capturing the individual's suspicious behavior (smart video surveillance), but also automatically extracted images of that individual recorded before and after the suspicious event and across the surveillance network.
Project page.

PERFORM - an integrated, high-budget European project in the domain of telemedicine, coordinated by Siemens. The Multimedia Systems Department is responsible for developing teleinformatic equipment for the remote monitoring of patients who suffer from neurodegenerative diseases (mainly Parkinson's disease).

INDECT - continuation and extension of the SECURITY project. The project is a Europe-wide venture, to be carried out in cooperation with Polish, German and European police forces and many prominent Polish and European technical universities. The Gdansk University of Technology is the initiator and main contributor to the project. The project was approved in September 2007 by the European Commission, which funds it, and has a budget of several million euros. SECURITY is the first integrated European project from the domain of security technologies prepared and coordinated in Poland.

PRESTOSPACE - Preservation towards storage and access, Standardised Practices for Audiovisual Contents in Europe (FP6-IST-707336)
An integrated project of the 6th Framework Programme, carried out in cooperation with corporations such as the BBC and RAI. The Gdansk University of Technology was responsible for developing tools for the reconstruction of archive materials (old recordings and films). European archive repositories store nearly 200 million audio-visual materials, part of which could be protected from further deterioration through the use of these tools.

DESYME - Development System for Mobile Services, European CELTIC project.
An international project, completed in 2007, that enables independent design and programming of various mobile phone services (formerly reserved exclusively for mobile network operators).

Polish projects - completed

INFOLIGHT - Cloud-based lighting system for smart cities.
Project funded by the National Centre for Research and Development, POIR.04.01.04-00-0075/19, carried out in 2020-2023.
The aim of the project is to define a novel multimodal system for intelligent lamps that provides Internet of Things functions and is integrated with a cloud computing service. The system will be proposed, built and evaluated. Optimal methods of acquiring environmental data from sensors and monitoring cameras will be developed and evaluated. Cloud and fog computing technologies will be applied for efficient data post-processing, fusion and decision making. As a result, the lighting conditions will be continuously monitored, and the light-emitting elements (LEDs, mirrors, lenses) will be optimally adjusted to provide the required light temperature, intensity and directional characteristics. The project partners will build and examine the hardware and communication layers, while Politechnika Gdanska (PG) will focus on the data analysis layer: acquisition, pre-processing, synchronization, computations in cloud and fog architectures, and decision making. PG will also conduct research on the influence of light parameters on humans, including a driver's ability to recognise objects on the road, reflexes and reactions, well-being, and circadian rhythm. Algorithms for data processing and inference will form a set of basic services, combined into complex services by defining decision rules. All functions will be provided to the end user as a service: predefined applications and building blocks that facilitate the creation of custom smart city solutions. Integration with external data sources is planned, including smart city infrastructure and connected cars via a V2X (vehicle-to-everything) protocol. As a side result, the PG team will increase its competence in key technologies such as the Internet of Things, programming of distributed applications, data analysis, fusion, and decision making.

BIOPUAP - A cloud-based biometric authentication system. Project No. POIR.01.01.01-0092/19, realized 2020-01-10 - 2022-06-30.
The main objective of the project is to enable PKO BP customers to use multi-modal biometric authentication in the Bank's branches, as well as in mobile channels. As a result of the project, solutions will be created that allow customers to leave their biometric samples at one of the Bank's stationary branches or through a mobile channel. This makes it possible to verify a customer's identity without the customer carrying any documents. The implementation of the project will directly strengthen the identity authentication process for operations carried out by the Bank's customers and employees. As a consequence, the security of banking services, as well as of social services provided via the Trusted Profile, will be improved. The project will be implemented in four research stages: two stages are planned as industrial research and the next two as development work.

INZNAK - Intelligent Road Signs with V2X Interface for Adaptive Traffic Controlling.
The objective of the project is to develop a conceptual design for, and carry out research tests of, a new kind of intelligent road sign that will help prevent the most common collisions on highways, which result from the rapid bunching of vehicles after sudden heavy braking. A range of products will be developed, including standing, hanging and mobile intelligent road signs that display a dynamically updated speed limit. The limit is determined automatically by an embedded electronic module that enables multimodal measurement of traffic conditions (video, sound, and analysis of meteorological conditions). The intelligent road sign will display a speed calculated in relation to information received from a row of similar signs placed along a stretch of highway; the signs will communicate with each other via a wireless network and can optionally be adjusted remotely. The development requires addressing a number of research and technology issues, such as: effective traffic estimation that is independent of weather conditions and based on the simultaneous analysis of several types of data representation; a method of calculating the velocity gradient for various traffic situations that takes the road topology into account; a platform for self-organizing and reliable wireless connections; and prototype tests performed on an adequate scale. The planned implementation will lead to products that increase road safety, for which there is worldwide market demand. The solution also fits, in an original way, into the rapidly growing trend of communication between cars and the road infrastructure, giving all drivers access to digital road infrastructure.

IDENT – Multimodal, biometric system for verification of bank client identity. The aim of the project is to develop a technology for automatic identity verification, resulting in high accuracy of the verification and increased efficiency of client verification systems. The project assumes developing a multimodal system, consisting of the hardware layer and a specialized software for data acquisition from various sensors, data processing and fusion, leading to a reliable verification of a bank client. The developed technology will be validated on a group of 10 000 persons.

HCIBRAIN – methods of human-machine interaction for diagnosis and stimulation of patients with severe brain damage. The project aims to develop an integrated multimodal system for stimulation of patients with brain damage, and for recording ABR, EEG and ERP signals, as well as gaze tracking. A validated procedure for diagnosis and polysensorial cognitive therapy will be developed, constituting an efficient and widely available approach to diagnosis and rehabilitation of patients who are unable to communicate, mainly those in a coma. Six specialized medical centers will take part in the evaluation of the developed prototype.

INPREDO – the main aim of the project is to develop an intelligent system for determining the optimal traffic speed limits. A set of tools will be developed for providing guidelines on the allowed traffic speed depending on various conditions. Detailed recommendations concerning criteria and procedures of setting traffic speed limits will also be created. An additional aim is to create Dynamic Maps showing the current state of roads, including the traffic density and the determined speed limits.

ALOFON – the project aims to develop a methodology for automatic phonetic transcription of English speech, based on the analysis of audio and video. Relationships between allophonic differentiation in speech and objective signal parameters will be researched. The project assumes that a method for detecting small differences in allophones and accent will be developed. The automated phonetic transcription method will be used in many solutions, including English language learning (especially remote learning), phonetic and phonological research (language corpora processing), automatic accent recognition, etc.

MODALITY – a project realized by the Audio Acoustics Laboratory, Multimedia Systems Department and Intel Technologies Poland. The project aims to enhance the audio and audiovisual communication with mobile computers. The experiments are aimed at improving the parameters of the audio system on mobile computers and human-machine interfaces. The two main topics are: Smart Sound technology and audiovisual speech recognition.

INNOTECH – a system for spatial recognition of gestures with feedback. A project carried out by the Multimedia Systems Department and Samsung Electronics Poland, co-funded by the National Center for Research and Development (NCBR) as a part of the In-Tech program INNOTECH (INNOTECH-K1_IN1_41_159382_NCBR_12).

MULTIMODAL - A new range of computer multimodal interfaces and their implementation in education and medicine (6 ZR9 2007 C/06828). The aim of the project is to develop and implement novel methods of human-computer communication that interact with the user through means other than a traditional mouse and keyboard. The computer will be able to communicate with the user in several ways: by tracking eye movement and visual attention, through an "intelligent ball pen" in cases where dyslexia therapy is necessary, or by tracking lip movement as an aid for people with paralyzed hands.

MAYDAY EURO 2012 - Supercomputing Contextual Analysis Platform Multimedia Data Stream for Identification of Objects, or Hazardous Specified Events. A structural type of project under the Operational Programme Innovative Economy 2007-2013 Priority 2, Infrastructure R & D, Measure 2.3. Investments related to the development of infrastructure of science, Sub-measure 2.3.3. Projects in the development of advanced communication services and applications. Main tasks: construction of the platform CASCADE (Streams Thread for analysis of data from cameras for Applications-defining alarms) with fiber-optic network connections from specific locations in the Tri-city; development of algorithms and analysis services for multimedia streams necessary for the development of three pilot applications: (1) protection of intellectual property, (2) supporting medical research, (3) identification of persons and events; development of repository services for the construction of further applications, and the assessment of those, both qualitative and utilitarian.
Project page.
KASKADA platform home page.

SYNAT (System for Science and Technics) – a research task carried out in 2010-2013 by a network of 16 Polish scientific institutions. The aim of the project was to develop a hosting and communication platform for digitized knowledge, to be used by researchers, scientific institutions, students, etc. The project was funded by the National Center for Research and Development (NCBR), SP/I/1/77065/10. The Multimedia Systems Department carried out three tasks: Semantic methods of data searching in large collections of text documents; Methodology of integration of heterogeneous knowledge sources; Subsystems for the analysis of multimedia repositories, archiving and searching for multimedia data.

SECURITY - Multimedia system assisting in identification and prevention of delinquency (including violence in schools) and terrorism (R00-O0005/3).
The project is supported by the Polish Internal Security Platform. Its results will make it possible to monitor the level of security in stadiums, schools and other places threatened by acts of terror. The idea of the project is to design and develop teleinformatic tools that supplement the functions of existing audio and video monitoring systems. The extension will provide automatic image and sound interpretation, which will let computer systems automatically discover potential threats and send alerts to the services responsible for public order and security.

NOISE - Methods for monitoring of urban agglomerations using modern information technology solutions and geoinformation technologies (R02 010 01)
The project's aim is to develop teleinformatic tools capable of monitoring noise and road traffic in agglomerations. The concept of this project was used by the City Council of Gdansk. Independently, the Gdansk University of Technology signed a license agreement with the DGT company concerning the implementation of intelligent wireless monitoring stations in other cities.

APARATY_SŁUCHOWE (HEARING_AIDS) - New Methods of Signal Processing for Hearing Aids Applications (3 T11E02829)
A project dedicated to special non-invasive hearing aids, especially for newborn babies.

LARYNX - New Electronic Device for Patients after Laryngectomy.
An original concept of an artificial larynx for persons after laryngectomy, i.e. after amputation of the larynx. A digital larynx and a miniaturized synthesizer were designed and produced in cooperation with "Intech", a company from Gdansk. The project was subsidized by a dedicated grant from the Chief Technical Organization Federation of Scientific and Technical Associations. "Intech" is now starting serial production of the larynx prosthesis under license from the Gdansk University of Technology.

CEMET - Centre of Medical Technologies (FP5 Excellence Center)

International Center of Hearing and Speech, PROKSIM, Warsaw-Gdańsk (Excellence Center)

Dithering Strategy Applied to Tinnitus Masking (project co-funded by the Institute of Physiology of Hearing)

VoIP - Hybrid Speech Codec for VoIP Telephony Employing Combined Source and Perceptual Coding (No. 3 T11D 004 28).
A project dedicated to the invention and development of more effective speech coders for Internet telephony.

SDSA - Engineering and introduction to clinical tests of a prototype series of a digital speech prosthesis based on spectral modification of signals in the auditory feedback loop - a project whose aim was to miniaturize the speech prosthesis (previously invented and developed at the Gdansk University of Technology) intended for persons who stutter.

INFOPILOT - Air force digital system for the recording and the restoration of speech (148346/C-T00/2002).
A system that records speech, improves its quality and handles transmission between ground stations and military aircraft pilots. Implemented in 2005 at a Polish military pilot training school in Deblin.

Expert System for Automatic Classification of Singing Voices (3 T11F 023 30)

New Methods for Forming and Ranking Musical Rhythm Hypotheses in Musical Excerpts (3 T11F02729)

Development and Implementation of the Universal System for the Diagnosis of Environmental Noise (internal University grant)

Development of concepts and applications of intelligent multimedia techniques - within the Subsidy for Scholars of the Foundation for Polish Science (Fundacja na rzecz Nauki Polskiej)

4T11D01422 - New methods for searching and discovering multimedia content in telecommunication networks

7T11E05220 - Method for the assessment of cochlear implants efficiency

8T11D00218 - Methods of sound processing for the purpose of multichannel multimedia transmission

8T11D02819 - Perceptual coding of audio employing intelligent decision algorithms

8T11E03415 - New algorithms of digital hearing aids and methods for hearing aid fitting

8T11D02112 - New methods of intelligent filtration and coding of audio

8T11E03310 - New methods for the diagnosis and therapy of hearing impairments employing digital signal processing technology

4PO5D01609 - Correction of speech impairments based on signal modification in the auditory feedback loop

8T11C02808 - Applications of artificial intelligence methods to data analysis and processing in acoustics

7TO7B02009 - New methods of digital sound synthesis

8S50302106 - Rough sets applications

8T11D00208 - Development of methods for digital restoration and processing of audio signals

8S50401005 - Digital restoration and processing of audio signals

883169203 - Computer speech recognition system