AI Competition Life Cycle

Data Science Competition Life Cycle

Data science competitions are at the heart of our mission to democratise AI. These competitions are designed to bridge the gap between companies seeking innovative solutions to data problems and talented data scientists eager to tackle those challenges. This section outlines how data science competitions are envisioned to work within the bitgrit ecosystem and Network.

Problem Definition and Preparation

There are two ways in which challenges can be sourced for data science competitions. In the first, a Competition Provider already has a problem it wants to solve and sets up a competition through an Infrastructure Provider. In the second, an Infrastructure Provider independently sets challenges that are currently needed, or likely to be needed, in society in order to stoke innovation.

When a Competition Provider already has a problem it wants to solve, the process is straightforward: the challenge is used in the competition as is, or with modifications (if the company does not want to publicly disclose its own challenges), according to the needs of the sponsor. When an Infrastructure Provider independently sets challenges, on the other hand, they are based on discussions among members with diverse backgrounds and societal stakeholders. The goal is to set challenges that are likely to become popular topics or will be needed in society in the future, with the data prepared through unique channels.

For small business and enterprise competitions, the Infrastructure Provider is expected to work with the entity to establish the problem statement and data set. Thereafter, the Infrastructure Provider takes on the responsibility of preparing the necessary data (gathering, cleaning, and structuring it for analysis). This ensures a level playing field for all participants and saves valuable time that would otherwise be spent on data preprocessing.

Formulating Mathematical and Machine Learning Problems

With the problem and data in hand, a mathematical and machine learning problem is formulated. This step gives participants a clear understanding of the task at hand. A balance between complexity and feasibility is sought so that both beginners and experts can participate.

Hosting the Competition

The competition phase typically spans roughly two months, during which participants, including data scientists and machine learning enthusiasts (collectively, Data Scientists), compete to develop the most effective solutions. We offer continuous support, maintaining an open line of communication with all participants, and deploy marketing efforts to engage our community of experts, ensuring a diverse range of perspectives and ideas.

Competition Judgment and Reward

Upon the conclusion of the competition, the smart contract set up by the Infrastructure Provider ensures that the top contenders receive their well-deserved recognition and rewards. The Infrastructure Provider must validate that the submitted algorithms comply with all rules and guidelines, including any restrictions added beyond those encoded in the smart contract.

Each competition culminates with a leaderboard, showcasing the performance of all submissions. The top-performing individuals or teams (usually top three) are crowned the winners, with the prize contingent upon:

  • Delivery of winning solutions accompanied by comprehensive documentation; and

  • Granting a global, perpetual, non-exclusive license for the client's commercial use of the solution.
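The eligibility gate described above can be sketched as a simple check: rewards are released only if a winning submission satisfies every additional condition beyond the smart contract. The field names and rule functions below are hypothetical placeholders, not part of any actual smart contract or platform API.

```python
# Hypothetical sketch of the validation step that gates reward release.
# The Infrastructure Provider checks each winning submission against
# conditions such as delivered documentation and a granted license.

def passes_extra_rules(submission, rules):
    """Return True only if the submission satisfies every extra rule."""
    return all(rule(submission) for rule in rules)

# Illustrative rules mirroring the prize conditions listed above.
rules = [
    lambda s: s["documented"],        # comprehensive documentation delivered
    lambda s: s["license_granted"],   # commercial-use license granted
]

submission = {"team": "alpha", "documented": True, "license_granted": True}
eligible = passes_extra_rules(submission, rules)
```

A submission failing any single condition (for example, missing documentation) would be ruled ineligible before any payout is triggered.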

The top Data Scientists are awarded a share of the net prize pool based on their ranking in the competition, fostering a competitive yet collaborative spirit within the community. The Competition Provider provides the prize money for the competitions in USD or stablecoins (in some instances together with BGR).

The Infrastructure Provider may charge a commission on each competition (with the percentage set on a case-by-case basis) as a fee for hosting and facilitating the competition.
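The ranking-based split and commission described above can be sketched as follows. The 50/30/20 shares and the 10% commission are illustrative assumptions only; actual percentages are set per competition.

```python
# Illustrative sketch of a ranking-based prize split from a gross pool.
# Shares and commission are hypothetical examples, not platform figures.

def split_prize_pool(gross_pool, ranked_winners,
                     shares=(0.50, 0.30, 0.20), commission=0.10):
    """Return each winner's payout from the net prize pool."""
    # Commission retained by the Infrastructure Provider for hosting.
    net_pool = gross_pool * (1 - commission)
    return {winner: round(net_pool * share, 2)
            for winner, share in zip(ranked_winners, shares)}

payouts = split_prize_pool(10_000, ["alice", "bob", "carol"])
```

With a 10,000 USD gross pool, the net pool after commission is 9,000, split among the top three by rank.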

Leaderboard

The "leaderboard" in a data science competition is a ranking system that publicly displays the performance of participants' models based on specified evaluation metrics such as accuracy or F1 score, showing how well each model performs relative to those of other participants. The benefits of using a leaderboard in a competition are as follows:

1. Motivation: The leaderboard stimulates competitiveness and provides participants with the motivation to strive for better results. Seeing their own rank and its changes on the leaderboard serves as a driving force for participants to invest time and effort into improving their position.

2. Visibility: High ranks on the leaderboard enhance visibility and recognition within the data science community. Participants' names may gain attention, and they may be recognized as exceptional data scientists by companies and potential employers, energizing both individual participants and the community as a whole.

3. Eligibility for rewards: Participants who achieve top rankings are eligible for rewards. The leaderboard plays a critical role in determining the winners (as well as detecting fraudulent activities) and deciding the eligibility for rewards. This aspect is important for participants to compete either for recognition or monetary rewards.
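As a minimal sketch of the mechanics, the ranking described above can be computed by sorting submissions on the evaluation metric. The submission fields and the tie-break on earliest entry are assumptions for illustration, not the platform's actual scoring rules.

```python
# Minimal leaderboard sketch: rank submissions by a single evaluation
# metric where higher is better (e.g. accuracy or F1 score).
# Ties are broken in favour of the earlier submission (an assumption).

def build_leaderboard(submissions):
    """Rank submissions by score (descending); earlier entries win ties."""
    ordered = sorted(submissions, key=lambda s: (-s["score"], s["order"]))
    return [(rank, s["team"], s["score"])
            for rank, s in enumerate(ordered, start=1)]

board = build_leaderboard([
    {"team": "alpha", "score": 0.91, "order": 2},
    {"team": "beta",  "score": 0.87, "order": 1},
    {"team": "gamma", "score": 0.91, "order": 3},
])
# alpha and gamma tie on score; alpha submitted earlier, so it ranks first
```

The resulting ordered list is what would be displayed publicly, and the top entries (usually three) determine reward eligibility.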

As a new feature to further enhance data science competitions, a platform for participants to exchange information is also under consideration. On such a platform, participants who have achieved top rankings could engage in discussions with one another, learning approaches and techniques they may not have been aware of. This exchange platform would provide a cooperative environment where participants can discuss strategies, share ideas, and learn from each other.
