Description

Running applications in a cloud environment requires a container registry. Users also commonly run machine learning workloads in such environments, and those workloads usually need dedicated infrastructure of their own, such as a model repository. We can combine these two facts: by storing machine learning models in the OCI registry that is already in place, similar to how we handle some WebAssembly modules, users no longer need to spend resources on a separate model repository for the applications they run.

Goals

Allow PyTorch users to save and load machine learning models in OCI registries.
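
To make the goal more concrete, here is a minimal sketch of how a serialized model could be described as an OCI artifact. It only builds the manifest and verifies a blob against it; it does not talk to a registry. The two "vnd.example" media types and the helper names (descriptor, model_manifest, verify_blob) are assumptions made for illustration, not something the project defines. In real use the bytes would likely come from torch.save writing a state_dict into an in-memory buffer, and an OCI client such as ORAS would push and pull the blob and manifest.

```python
import hashlib
import json

# Hypothetical media types, chosen for illustration only.
MODEL_MEDIA_TYPE = "application/vnd.example.pytorch.state-dict.v1"
ARTIFACT_TYPE = "application/vnd.example.pytorch.model.v1"


def descriptor(media_type: str, blob: bytes) -> dict:
    """Build an OCI content descriptor (media type, digest, size) for a blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def model_manifest(model_bytes: bytes, filename: str = "model.pt") -> dict:
    """Build an OCI image manifest that carries the model as its only layer."""
    layer = descriptor(MODEL_MEDIA_TYPE, model_bytes)
    layer["annotations"] = {"org.opencontainers.image.title": filename}
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "artifactType": ARTIFACT_TYPE,
        # Artifacts without a real config use the empty JSON object as config.
        "config": descriptor("application/vnd.oci.empty.v1+json", b"{}"),
        "layers": [layer],
    }


def verify_blob(manifest: dict, blob: bytes) -> bool:
    """Check a pulled blob against the digest and size stored in the manifest."""
    layer = manifest["layers"][0]
    actual = descriptor(layer["mediaType"], blob)
    return actual["digest"] == layer["digest"] and actual["size"] == layer["size"]


if __name__ == "__main__":
    # Stand-in bytes; a real caller would serialize a model here instead.
    fake_model = b"serialized-model-bytes"
    print(json.dumps(model_manifest(fake_model), indent=2))
```

Loading would reverse the steps: pull the manifest, pull the layer blob, check it with verify_blob, and only then hand the bytes to the deserializer.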

Resources

Looking for hackers with the skills:

ai mlops pytorch oci cloud

This project is part of:

Hack Week 24

Activity

  • 11 months ago: jguilhermevanz started this project.
  • 11 months ago: jguilhermevanz originated this project.

    Similar Projects

    Flaky Tests AI Finder for Uyuni and MLM Test Suites by oscar-barrios

    Description

    Our current Grafana dashboards provide a great overview of test suite health, including a panel for "Top failed tests." However, identifying which of these failures are due to legitimate bugs versus intermittent "flaky tests" is a manual, time-consuming process. These flaky tests erode trust in our test suites and slow down development.

    This project aims to build a simple but powerful Python script that automates flaky test detection. The script will directly query our Prometheus instance for the historical data of each failed test, using the jenkins_build_test_case_failure_age metric. It will then format this data and send it to the Gemini API with a carefully crafted prompt, asking it to identify which tests show a flaky pattern.

    The final output will be a clean JSON list of the most probable flaky tests, which can then be used to populate a new "Top Flaky Tests" panel in our existing Grafana test suite dashboard.

    Goals

    By the end of Hack Week, we aim to have a single, working Python script that:

    1. Connects to Prometheus and executes a query to fetch detailed test failure history.
    2. Processes the raw data into a format suitable for the Gemini API.
    3. Successfully calls the Gemini API with the data and a clear prompt.
    4. Parses the AI's response to extract a simple list of flaky tests.
    5. Saves the list to a JSON file that can be displayed in Grafana.
    6. Supplies the data for a new panel in our dashboard that lists the flaky tests.
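
The first steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's script: it builds a range query URL for the Prometheus HTTP API, turns a matrix response into a per-test pass/fail history, and formats a prompt. The label name "case", the rule that a value above zero means "failing", and the prompt wording are all assumptions. The flip count is a plain heuristic added as a hint for the model and does not replace it, and the call to the Gemini API is deliberately left out.

```python
import json
from urllib.parse import urlencode

METRIC = "jenkins_build_test_case_failure_age"


def range_query_url(base_url: str, start: int, end: int, step: str = "1h") -> str:
    """Build a Prometheus /api/v1/query_range URL for the failure-age metric."""
    params = urlencode({"query": METRIC, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def failure_history(response: dict, label: str = "case") -> dict:
    """Map each test name to booleans, True meaning failing at that sample.

    Assumes a matrix result and a hypothetical label that holds the test name.
    """
    history = {}
    for series in response["data"]["result"]:
        name = series["metric"].get(label, "unknown")
        history[name] = [float(value) > 0 for _ts, value in series["values"]]
    return history


def count_flips(samples: list) -> int:
    """Count pass/fail transitions; many flips hint at a flaky test."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


def build_prompt(history: dict) -> str:
    """Format the history as compact text to send to a language model."""
    lines = []
    for name, samples in sorted(history.items()):
        pattern = "".join("F" if failing else "P" for failing in samples)
        lines.append(f"{name}: {pattern} (flips={count_flips(samples)})")
    return (
        "Each line shows a test and its pass (P) / fail (F) history.\n"
        "Return a JSON list with the names of tests that look flaky.\n"
        + "\n".join(lines)
    )


if __name__ == "__main__":
    sample = {"data": {"resultType": "matrix", "result": [
        {"metric": {"case": "test_login"},
         "values": [[0, "0"], [1, "1"], [2, "0"], [3, "2"]]},
    ]}}
    print(range_query_url("http://localhost:9090", 0, 3600))
    print(build_prompt(failure_history(sample)))
    print(json.dumps(["test_login"]))  # shape of the expected final output
```

Keeping the query, parsing, and prompt building in separate functions makes each one testable without a live Prometheus instance or API key.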

    Resources