Terms of Reference: CREATION OF DATASETS AND DEVELOPMENT OF 2 OPEN SOURCE USE CASES FOR MACHINE TRANSLATION (ENGLISH-KINYARWANDA) at Translators without Borders

1. Background

CLEAR Global seeks a Rwanda-based organization to create data sets and build machine translation (MT) solutions for 2 pre-defined use cases. The selected partner will join our global network of partners that use advanced language technology to support people. These users will benefit from ICT solutions with an automated Kinyarwanda-English machine translation component, which helps the solutions reach more users and bridge communication barriers.

CLEAR Global is the managing contractor for the GIZ Machine Translation initiative under the “Digital Solutions for Sustainable Development” (DSSD) program in Rwanda. This project aims to “improve the preconditions for the use of machine translation in the public sector and in the digital ecosystem” (see below for more about CLEAR Global and GIZ DSSD). CLEAR Global will have overall responsibility for managing the project and for ensuring that the selected partner develops the technical skills to deliver an MT solution (including data set creation) in a content-agnostic way, as well as an MVP version of the use case application, by the end of the contract.

The overall objectives for this contract are:

  • Create data sets, engage the community and develop 2 open source use cases for MT solutions (Kinyarwanda-English), with support from CLEAR Global in a coaching/mentoring role
  • Manage the project to the agreed project plan
  • Ensure the sustainability of the solution and the organization after the term of the contract has ended

The MT project shall display the following broad features.

  • Scalable to other topics: While this project will focus on data set creation and development based on an agreed use case(s), the MT solution, and overall approach, shall have the potential to be scaled to other relevant topics and domains in the future to enable and enhance user interaction.
  • Knowledge transfer and knowledge sharing: This project also aims to ensure that skills transfer occurs in the process of the product development. A specific focus will be knowledge sharing between open-source communities. Therefore, the MT solution will be available open-source and training and documentation shall be provided to the wider community, in addition to the selected partner.

Two use cases, identified in a preceding human-centered design (HCD) phase, will be developed further under this contract. The 2 use cases are:

  1. Tourism: Help tourists and locals to communicate effectively
  2. Education: Enable eLearning

In addition to CLEAR Global and the partner, the use case product owners will play a significant role in developing sustainable solutions.

The Rwandan team will do the data set creation and technical development work, with CLEAR Global responsible for coaching and guiding the team as well as overall project management.

This division of labor between CLEAR Global and the Rwandan organization follows from GIZ’s mandate to strengthen the local ecosystem and aims to place the data set creation and technical development with the Rwandan organization. It is hoped that this project will enable the Rwandan organization to grow their capacity in data set creation and machine translation use case development as a catalyst to further projects of this kind.

A critical factor will be working with the managers of the Rwandan organization, ensuring that they take proactive leadership and have the skills and tools to build their team and work toward organizational sustainability. CLEAR Global will support the partner to develop their own network and future use cases, and build a sustainability plan to support their organization independently and as a part of CLEAR Global’s global network.

The Rwandan team shall have the following qualifications. Please note that these are maximum requirements, and organizations that do not have all of these qualifications are also eligible to apply. CLEAR Global’s mentoring and coaching aim to support capacity development where it is needed (for the full personnel concept see point 4 below):

  • Experience in machine learning, artificial intelligence and natural language processing
  • Experience in full stack software development
  • Experience in mobile app development
  • Experience in UI/UX development and user-centered design
  • Soft skills including communication, project management and design
  • Community engagement
  • Existing sustainability model and plan to continue work in this space, e.g. a Rwandan startup with a business plan to be active in the AI space

2. Tasks to be performed by the Rwandan organization

Initially, CLEAR Global will work closely with the partner to develop and agree on detailed timelines (based on defined milestones), activities and deadlines based on the work packages described below. We will use an iterative and agile approach where possible and continuously assess project priorities. Through the process, the tasks will be gradually handed to the Rwandan organization.

CLEAR Global will convene a kick-off workshop bringing all stakeholders together in order to understand the roles of each partner, set expectations, and finalize the workplan. At the mid-point, CLEAR Global will introduce the Rwandan team to partner organizations with an interest in supporting the development of use cases. This will take place in a jointly planned workshop to demonstrate the prototype, start building interest in the solution and develop potential use cases. This workshop will help the Rwandan partner to start finding other sources of funding and sustaining the solution. Finally, the Rwandan organization will lead the final “end of project” workshop, which will include the wider ecosystem of machine learning experts and demonstrate what the project was able to achieve.

Specifically, the Rwandan organization is required to deliver the following services:

Work package 1: Actively engage with CLEAR Global in a coaching and mentoring arrangement for the data set creation and technical development of two machine translation use cases

  • Working with CLEAR Global’s team lead, develop a skills audit and agree a coaching and mentoring plan based on the requirements of the project
  • Identify appropriate team members to participate in the coaching/mentoring and ensure that they are available and have the tools needed to make best use of the relationship. CLEAR Global currently plans on weekly individual coaching sessions with all members of the Rwandan team on:
    • Technical development including artificial intelligence, machine learning, natural language processing, software development and deployment
    • Agile project management
    • Product Management
    • Monitoring and evaluation and metrics
    • Community engagement and language related issues
  • Actively participate in regular workshops for the developer team on key topics (e.g. neural machine translation (NMT), functional and non-functional requirements)
  • Manage the team to ensure adequate knowledge transfer

Work package 2: Dataset creation (2 domain-specific parallel text corpora English-Kinyarwanda) and publication

The objective of this work package is to create two (2) domain-specific parallel text corpora in English-Kinyarwanda (one for each identified use case). Each dataset shall ideally contain 100,000 domain-specific sentence pairs as a basis for developing translation models for the two identified use cases. However, please note that the exact number of sentences required will need to be determined by the contractor based on the performance of the deployed NMT model (see below).

For this work package, the data collection and curation efforts shall be conducted by a community of contributors (volunteers). This community can include, among others, local talent in translation and interpretation, university students in relevant disciplines, and other members of the Rwandan community.

The datasets shall be made openly available to the local AI community in Rwanda to foster local innovation.

Specifically, the tasks of the service provider include the following:

  • Compile existing open-source Kinyarwanda-English parallel text corpora
  • Create and implement marketing and community engagement campaign to mobilize community and recruit volunteers
  • Develop web-based platform for crowdsourcing sentence pair collection. The platform should have a simple user interface and allow contributors to add original sentences in Kinyarwanda or English and to provide translations for the sentences stored in the database.
  • Develop and implement incentive scheme to mobilize community and maximize contributions
  • Create a sign up and login system and database for user and community management. This will be ideally connected to CLEAR Global’s Translators without Borders community.
  • Populate web-based platform with source phrases and implement test run
  • Organize at least two (2) hackathons or other community engagement events jointly with the local tech partner and other stakeholders to strengthen the community-based data collection effort
  • Implement quality control mechanism to validate English-Kinyarwanda sentence pairs
  • Package two (2) domain-specific datasets and make them publicly available for the Rwandan AI community on a platform to be selected in close coordination with the DSSD team
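The quality control task above could begin with simple automated checks before any human review. The following is a minimal sketch only, not a prescribed mechanism: the `SentencePair` class, the function names and the length-ratio threshold of 3.0 are all illustrative assumptions, and a real pipeline would add human validation by Kinyarwanda speakers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One crowdsourced English-Kinyarwanda pair (hypothetical structure)."""
    english: str
    kinyarwanda: str


def find_issues(pair, seen, max_length_ratio=3.0):
    """Return a list of issues for one pair; an empty list means it passed.

    `seen` is a set of already accepted or inspected pairs, used to spot
    duplicates. The ratio threshold is an assumed value, not a standard.
    """
    en = pair.english.strip()
    rw = pair.kinyarwanda.strip()
    if not en or not rw:
        return ["empty side"]

    issues = []
    if en.lower() == rw.lower():
        issues.append("source copied as translation")

    # Very different lengths often indicate a truncated or wrong translation.
    longer, shorter = max(len(en), len(rw)), min(len(en), len(rw))
    if longer / shorter > max_length_ratio:
        issues.append("suspicious length ratio")

    key = (en.lower(), rw.lower())
    if key in seen:
        issues.append("duplicate pair")
    seen.add(key)
    return issues


def filter_pairs(pairs):
    """Split pairs into accepted pairs and (pair, issues) rejections."""
    seen = set()
    accepted, rejected = [], []
    for pair in pairs:
        issues = find_issues(pair, seen)
        if issues:
            rejected.append((pair, issues))
        else:
            accepted.append(pair)
    return accepted, rejected
```

Rejected pairs would typically be routed back to contributors or reviewers with the listed issues rather than silently discarded.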

Work package 3: Technical development of two open-source machine translation use cases including regular user testing and feedback

The objective of this work package is to develop two (2) machine translation use cases in partnership with the respective product owners which have been identified previously through a user-centered design process led by DSSD and a human centered design (HCD) firm (see above). The product owners will assist in the design of the machine translation engine and advise on model deployment for practical use cases as best suited to the production environment of their business services. A continuous user testing and feedback approach shall be implemented along with the development process.

Additionally, the deployed translation engines for each use case shall be evaluated using automatic metrics (ROUGE, BLEU, chrF) and human evaluation. Evaluation scores shall be compared to industry standard NMT models to determine how ready the system(s) are for a production environment.
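To illustrate what an automatic metric of this kind measures, the sketch below computes a simplified, sentence-level character n-gram F-score in the spirit of chrF. It is an assumption-laden teaching example (function names are hypothetical, whitespace is simply removed, and n-gram orders are averaged uniformly); scores reported for the project should come from an established evaluation tool rather than from this code.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of length n, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF-style score in [0, 1] for one sentence pair.

    Precision and recall are averaged over n-gram orders 1..max_n, then
    combined into an F-score that weights recall beta times as much as
    precision. This is a sketch, not a reference implementation.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))

    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Character-level metrics of this type tend to be useful for morphologically rich languages such as Kinyarwanda, because partially correct word forms still earn partial credit.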

Specifically, the tasks of the contractor include the following:

1. Develop a software development roadmap for the machine translation use cases

  • Develop roadmap detailing approach for agile development of the machine translation use cases. This roadmap shall include development iterations, timelines, and responsibilities.

2. Design requirements and architecture framework for machine translation use cases

  • Requirements engineering for the AI-enabled system; identify and define functional and non-functional requirements for the machine translation use cases based on specifications provided by the HCD firm collaboratively with the product owners.
  • Evaluate and report on toolkits and software to be used for components of the machine translation use cases (e.g., deep learning frameworks, NMT toolkits such as OpenNMT or JoeyNMT, and pretrained models for transfer learning). The components should be open source where possible. Evaluate and recommend technology for integration of solution components.
  • Regularly re-assess architecture and recommend adaptations where necessary
  • All assessments should consider target business processes of the product owners as well as the “Principles for Digital Development”. These principles include inter alia the requirement to design with the user, the use of Open Standards, Open Data, Open Source, and Open Innovation wherever possible.
  • Apply security and privacy by design to ensure compliance with GDPR and to address data privacy concerns.

3. Develop open-source NMT models and integrate with appropriate front-end and back-end

  • Develop two (2) use case translation models with appropriate software based on the product architecture – the software development process should entail a release of code under an appropriate open-source license (or contribution to existing open-source projects in this field), use of open standards and APIs as well as a documentation of code allowing third parties to re-use the software, as far as possible.
  • Develop and execute a concept for continuous technical testing of the translation use cases based on defined functional and non-functional requirements
  • Develop and/or adopt Natural Language Understanding models for Kinyarwanda and English including semantic representation for word sense disambiguation. The models should also be released as open-source models. Quantitative metrics and qualitative performance measurements will be shared with DSSD and the product owners for the two use cases.
  • Front-end integration with end-user platforms according to the identified use case(s). The choice of channels will depend on the two use cases as well as on user research and testing to be conducted by the contractor.
  • Integration with any backend such as CMS, or other backend systems in use by the product owner’s IT infrastructure.
  • Support quality assurance including a focus on potential biases, e.g., gender bias, in close cooperation with DSSD and product owners where appropriate.
  • Continuously deploy the system for the duration of the contract, in cooperation with the implementation partner where necessary.
  • Continuously update and maintain the translation engine during the term of the contract.
  • Prepare and execute hand-over of use case(s) and related open AI training data, open models as well as open-source software to relevant stakeholders and release on respective repositories.

4. Design an automated testing strategy for the machine translation use case pipeline

  • Develop and execute a concept including methodology for end-to-end testing to ensure that the deployed use cases meet expectations of the product owners. Accordingly, gather feedback from test users, which should include the HCD firm and other target groups as identified by the research, DSSD, and product owners to adapt the translation engine based on collected feedback.
  • Continuously perform integration tests to ensure optimal synchronization between modules/components of the deployed use case translation application.
  • Continuously perform unit testing to check model performance after feature engineering and/or new input data.
  • Additionally, perform production monitoring to measure the readiness of the translation engine for deployment. This includes conducting checks on training reproducibility in the use case service environment, computational performance (useful system usage metrics for cloud costs estimation), and dependency changes through the pipeline (to notify of upgrades or other version changes).
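One way to make the performance checks above automatic is a regression gate that compares a retrained model's evaluation scores against a stored baseline. The sketch below assumes scores are kept as plain dictionaries; the function name, the metric names and the tolerance of 0.01 are hypothetical choices, not requirements of this ToR.

```python
def check_no_regression(baseline, candidate, tolerance=0.01):
    """Return the metrics on which the candidate model regressed.

    `baseline` and `candidate` map metric names to scores where higher is
    better. A metric counts as regressed if it is missing from `candidate`
    or has dropped by more than `tolerance` (an assumed threshold).
    The result maps each regressed metric to (baseline_score, new_score).
    """
    regressions = {}
    for metric, base_score in baseline.items():
        new_score = candidate.get(metric)
        if new_score is None or new_score < base_score - tolerance:
            regressions[metric] = (base_score, new_score)
    return regressions


# Made-up scores for illustration only; they are not project results.
baseline_scores = {"bleu": 0.25, "chrf": 0.50}
candidate_scores = {"bleu": 0.26, "chrf": 0.45}
regressed = check_no_regression(baseline_scores, candidate_scores)
```

Run in a continuous integration job after each retraining, a non-empty result could block deployment until the drop is investigated.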

Work package 4: Project management, coordinating use case development and process implementation, including technical documentation and handover and community outreach

The objective of this work package is to effectively manage the project based on state-of-the-art project management standards, including assuming a coordination role within the consortium of stakeholders.

Specifically, this includes:

1. Project management

  • Set up project management based on scrum framework (e.g., including iterations, sprints, retros, daily stand-ups)
  • Implement and regularly update appropriate project management tool and ensure adoption of tool within the team
  • Develop detailed project roadmap based on solution architecture, work packages, and pre-defined timeframes in cooperation with the team
  • Regularly review and adapt the iteration cycle and iterations with the team
  • Define sprints, allocate tasks to team members and ensure timely completion of tasks
  • Use effective strategies to synchronize, control, and integrate efforts of team members for overall attainment of the project goals

2. Stakeholder coordination and engagement

  • Effectively coordinate and communicate with key project stakeholders, including the DSSD team, the product owner, and other contractors (i.e., HCD firm)
  • Ensure smooth collaboration of members of project consortium and other stakeholders
  • Maintain efficient communication with all stakeholders according to project management standards

3. Create project documentation, guides on how to develop and implement machine translation use cases

  • Document all steps during development and implementation of the two use cases, including dataset creation, model development and software engineering
  • Develop playbook on how to develop and implement a machine translation project, including lessons learned and technical documentation
  • Develop training manuals (covering development, integration, deployment and operations) to support proper handover and training to the Rwandan tech partner team and wider tech community

Work package 5: Business model development with use case clients (product owners)

As a last step, the contractor will work with the two product owners to design a business model for each client’s machine translation use case.

1. Develop and implement sustainability plan

  • Develop a concept for ongoing and future operations (sustainability plans/scenarios) of the solutions in close cooperation with the Rwandan team and identify a product owner for production and hand over in close coordination with DSSD.
  • Draft an operational plan in close cooperation with product owners that proposes how the use case translation engine can be operated, maintained and updated after the end of this contract.

2. Support product owners with developing a business model

  • Organize at least one (1) workshop with each product owner to support them with the development of an open-source business model for their use case. This should include:
    • Identifying potential revenue streams and financing options
    • Preparing a pitch for securing future investment
    • Marketing

Upon completion of this last work package, the contractor will deliver a framework that clearly describes the newly created business value of the translation use cases’ open datasets and models to the product owner, and how to sustain access to these resources for Rwanda’s AI community.

Certain milestones, as laid out in the table below, are to be achieved by certain dates during the contract term, and at particular locations:

Milestones & Deadlines

  • Work package 1 – Skill assessment and corresponding training plans: 0.5 months after conclusion of contract
  • Work package 2 – Compilation of existing open-source corpora and engagement of translator community: 1 month after conclusion of contract
  • Work package 2 – Development of web-based platform for sentence pair collection: 1 month after conclusion of contract
  • Work package 2 – Collect source and target phrases using the web-based platform: 3.5 months after conclusion of contract
  • Work package 2 – Packaging 2 domain-specific datasets and making them publicly open: 4 months after conclusion of contract
  • Work package 3 – Deliver MVP of first use case: 6.5 months after conclusion of contract
  • Work package 3 – Build pipeline and test MT use case. Deploy & Test (use case 1): 6.5 months after conclusion of contract
  • Work package 3 – Unit, Integration, and E2E testing successfully completed (use case 1): 7 months after conclusion of contract
  • Work package 3 – Deliver MVP of second use case: 8 months after conclusion of contract
  • Work package 3 – Build pipeline and test MT engine. Deploy & Test (use case 2): 8 months after conclusion of contract
  • Work package 3 – Unit, Integration, and E2E testing successfully completed (use case 2): 8.5 months after conclusion of contract
  • Work package 3 – Go-live for two use cases: 9 months after conclusion of contract
  • Work package 4 – Project management, Product management & Use case development: continuous until handover
  • Work package 4 – Handover of playbooks for tech and programs, lessons learned and technical documentation to Rwanda tech partner team: 10 months after conclusion of contract
  • Work package 5 – Business model development with use case clients (sustainability strategy): 10 months after conclusion of contract
  • Final report (overall project implementation report): 11 months after conclusion of contract

3. Concept and project management of the contractor

In the bid, the bidder is required to show how the objectives defined in Chapter 2 are to be achieved, if applicable under consideration of further specific method-related requirements (technical-methodological concept).

Technical-methodological concept

Strategy: Bidders should consider the tasks to be performed in light of the objectives of the services. Bidders should present a strategy, which includes an overview for each of the work packages.

The bidder should describe the work steps needed and take account of the milestones and the contributions of other actors accordingly.

The bidder should also describe its potential plan and strategy to use the MT model after the end of the contract and to contribute to related open AI resources.

Project management

The Rwandan organization is responsible for selecting, preparing, training (for everything needed beyond the coaching/mentoring provided by CLEAR Global) and steering their experts assigned to perform the tasks. They will also manage their own costs and expenditures, accounting processes and invoicing in line with the requirements of CLEAR Global and GIZ. The bidder is required to draw up a personnel assignment plan with explanatory notes that lists all the experts proposed in the bid; the plan includes information on assignment dates (duration and expert days) and locations of the individual members of the team complete with the allocation of work steps as set out in the schedule.

4. Personnel concept

The bidder is required to provide personnel who are suited to filling the positions described, on the basis of their CVs, the range of tasks involved and the required qualifications.

Not all of the qualifications specified are required; CLEAR Global does not expect that any bidder will have all of the requirements. Therefore, organizations that do not meet all of the criteria will also be considered. The personnel concept can include individual persons for multiple roles, but the team has to include all profiles listed below.

  1. NLP Engineer: Expert with basic experience in Natural Language Processing for under-resourced languages, especially for Kinyarwanda.
  2. Project/Team Lead: Expert with experience in leading a small team as well as IT project management.
  3. Tech Lead: Expert with professional experience in software engineering (server side) and experience in mobile app development (iOS & Android).
  4. QA Engineer
  5. Language lead (Kinyarwanda)/community engagement lead: responsible for linguistic guidance and community engagement (crowdsourcing and mobilising community of volunteers)
  6. Product Manager: First experience with product management, ideally for AI products. Experience with defining personas and user stories, use case scoping.
  7. Data Scientist: Expert with experience in data collection and analysis for language corpora in Kinyarwanda.

5. Costing requirements

Assignment of personnel

The Rwandan developer team should plan for 244 expert days.

The bidder should provide a personnel assignment plan, detailing the number of days foreseen for each of the specified 5 expert roles.

The total budget assigned to the Rwandan team is 51.020,00 EUR
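As a quick consistency check for a personnel assignment plan, the figures above can be combined into simple arithmetic. The sketch below is illustrative only: the split of days across roles is entirely made up, and the average day rate it derives (total budget divided by total expert days) is an implied figure, not a rate stated in this ToR.

```python
TOTAL_EXPERT_DAYS = 244        # from the costing requirements
TOTAL_BUDGET_EUR = 51020.00    # written as 51.020,00 EUR in the ToR


def validate_plan(plan):
    """Return (is_valid, planned_days) for a role -> expert days mapping."""
    planned_days = sum(plan.values())
    return planned_days == TOTAL_EXPERT_DAYS, planned_days


def average_day_rate():
    """Implied average cost per expert day in EUR, rounded to cents."""
    return round(TOTAL_BUDGET_EUR / TOTAL_EXPERT_DAYS, 2)


# Hypothetical allocation for illustration; a bidder would propose its own.
example_plan = {
    "NLP Engineer": 60,
    "Project/Team Lead": 30,
    "Tech Lead": 50,
    "QA Engineer": 24,
    "Language/community engagement lead": 30,
    "Product Manager": 20,
    "Data Scientist": 30,
}
is_valid, planned_days = validate_plan(example_plan)
```

Under these figures the implied average is roughly 209 EUR per expert day, which a bidder may find useful when balancing senior and junior roles.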

Equipment & hosting

If required, a fixed budget for procurement of equipment and hosting incurred during the project is available (e.g., computer, data collection tools). Reimbursement is against evidence and with prior approval from CLEAR Global.

6. Inputs of other actors

GIZ is expected to make the following available: workshop logistics

CLEAR Global will:

  • Lead on the overall project implementation
  • Provide regular, structured support and guidance to the Rwandan organization
  • Make a playbook for the development of the machine translation solutions openly available
  • Ensure that there is good communication and dialogue between the partners in the project.

7. Requirements on the format of the bid

The structure of the bid must correspond to the structure of the ToRs. It must be legible (font size 11 or larger) and clearly formulated. The bid is drawn up in English.

The complete bid shall not exceed 10 pages (excluding CVs).

The CVs of the personnel proposed in accordance with the ToRs must be submitted. The CVs shall not exceed 4 pages. The CVs must clearly show the position and job the proposed person held in the reference project and for how long. The CVs should be submitted in English.

If one of the maximum page lengths is exceeded, the content appearing after the cut-off point will not be included in the assessment.

Please calculate your price bid based exactly on the aforementioned costing requirements.

Requirements:

The bids will include the organization’s:

  • Commitment to working with marginalized communities (leaving no one behind)
  • Technical expertise (full stack software development, machine learning, NLP)
  • Experience of project management and design
  • Articulated 3-5 sustainability model
  • CVs of relevant staff
  • References
  • Registration, confirmation of bank account and other due diligence requirements

The selection process will be in line with the requirements of German procurement law and CLEAR Global’s internal procedures.

CLEAR Global

CLEAR Global seeks to build real partnerships with local and global partners to ensure that people who speak marginalized languages have access to the information they need and want and can make informed decisions to survive and thrive.

CLEAR Global is building a network of local language technology and language services organizations that can both work independently and as part of a global partnership with CLEAR Global and our global partners. The strategy is to identify qualified technologists and language providers whose objectives and mission are similar to our own – to use technology and language technology to support people to get information and be heard. CLEAR Global will work with them on specific use cases with the partner organization benefiting from CLEAR Global’s global team of experts. The partner organization will first work closely with CLEAR Global’s team to get experience with CLEAR Global’s content agnostic playbooks, learning how to adapt them to the context and building agile management skills and experience. CLEAR Global will work with the partner organization to develop sustainability plans and to ensure that they have networks and contacts in their context. As projects and use cases expand in each context, the partner organization will need less support from the global team and move toward greater independence and into supporting other organizations in our network.

Key to our approach is to ensure that the partner organizations learn from each other; support CLEAR Global and our global partners to scale language-enabled technology solutions to as many contexts as possible; can work independently when necessary; and have the knowledge and understanding of both the technology and the context to add real value and help marginalized language speakers get the information they need and want in the language and format they are most comfortable with.

About GIZ, the DSSD program and DigiCenter

The Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) GmbH is a federally owned international cooperation enterprise for sustainable development with worldwide operations. GIZ has worked in Rwanda for over 30 years. The primary objectives between the Government of Rwanda and the Federal Republic of Germany are poverty reduction and promotion of sustainable development. To achieve these objectives, GIZ Rwanda is active in the sectors of Decentralization and Good Governance, Economic Development and Employment Promotion, Energy, and ICT (Information and Communications Technology).

The program “Digital Solutions for Sustainable Development” (DSSD) aims to promote the development of digital solutions, digital inclusion and professional ICT skills and capacities. In 2019, DSSD opened the Digital Transformation Center as a hub for innovation and collaboration among public and private sector, academia, and civil society. Over time, other projects such as FAIR Forward, Make-IT in Africa, Africa Cloud, and the Support Program to the Smart Africa Secretariat joined the Digital Transformation Center.

One of the focus areas of the Digital Transformation Center is Artificial Intelligence. The DigiCenter contributes to sustainable and open approaches to AI in Rwanda and at the regional level. This includes skills and capacity building, creating and opening up AI training data, and developing policy frameworks to promote AI adoption in Rwanda. A specific objective is to make AI solutions available for the Rwandan population in Kinyarwanda.

Therefore, the DigiCenter and particularly the DSSD project contribute to creating AI training data sets and piloting AI-based solutions in the areas of natural language processing (e.g., voice assistants) and machine translation.

Bids should be sent to bids@clearglobal.org by 23:59 CET 2 December, 2022

(Note – if you experience any issues with the above email please try milena.haykowska@clearglobal.org or andrew.bredenkamp@clearglobal.org )
