Project here: https://confluence.suse.com/display/AAI/HackWeek19 Will keep working out of HackWeek as "best effort" personal project to make it evolve and keep learning.

What this project is about?

Data Scientist ofter starts working on their laptop before moving into company resources. As in many other cases they have to solve many challenges by themselves before actually start working on "their stuff". The idea is to build a prototype we will eventually try to evolve in a product that answers the following pre-requisites:

  • Rapid Time to work: I, as Data Scientist or Data Engineer, need to install the playground quickly and be ready to work
  • Everything at the right place: I as Data Scientist or Data Engineer want an easy way to find things and use them
  • No time to waste: I as Data Scientist or Data Engineer want to be able to replicate the model synchronizing it with another infrastructure through a "click and done" model
  • No complexity rule: I as Data Scientist or Data Engineer want to avoid waste time in complex configurations or debug things. Complexity needs to hided to me

Project Team requirements

Because this is a first attempt to prototype I have to ask for some "not official" rules to be applied:

  • Max 7/9 people in the team with a max of 3 Engineers
  • If you apply you have to make yourself available from 10 am to 5 pm CET (if you're on a different time zone you have to consider we'll have a lot of team discussion so could be challenging)
  • This is a 5 days sprint approach where everyone needs to be open, collaborative, bold, creative.

FAQ

  • I'm not an engineer or an expert: Great this project require (possibly) at least 1 person from marketing, sales-engineering, services, support
  • Am I required to code?: No, but you're required to share your ideas and views, while the end goal is to build a prototype (that's why we need a couple of engineers) the scope is to have something to show and demonstrate we may build something useful for the Data Scientist community
  • Woah this seems to be a super serious project: Nah it's a fun experiment to learn how much we may push our limit through rapid prototyping and "be different"
  • So how do I signup?: easy just join the team here on hackweek and/or contact me alessandro.festa@suse.com for further details.

This project is part of:

Hack Week 19

Activity

  • almost 5 years ago: jordimassaguerpla liked this project.
  • almost 5 years ago: rsblendido joined this project.
  • almost 5 years ago: jeffpr joined this project.
  • almost 5 years ago: FSzekely liked this project.
  • almost 5 years ago: bfilho left this project.
  • almost 5 years ago: bfilho joined this project.
  • almost 5 years ago: bfromme liked this project.
  • almost 5 years ago: bfromme joined this project.
  • almost 5 years ago: rsblendido liked this project.
  • almost 5 years ago: gboiko liked this project.
  • almost 5 years ago: afesta added keyword "innovation" to this project.
  • almost 5 years ago: afesta added keyword "projectmanagement" to this project.
  • almost 5 years ago: afesta added keyword "ai" to this project.
  • almost 5 years ago: afesta added keyword "artificial-intelligence" to this project.
  • almost 5 years ago: afesta added keyword "machinelearning" to this project.
  • almost 5 years ago: afesta added keyword "prototype" to this project.
  • almost 5 years ago: afesta added keyword "agile" to this project.
  • almost 5 years ago: afesta liked this project.
  • almost 5 years ago: afesta started this project.
  • almost 5 years ago: afesta originated this project.

  • Comments

    • hennevogel
      almost 5 years ago by hennevogel | Reply

      Can you explain what kind of output you would expect? Like an application? A set of packages? Some IaC description?

      • afesta
        almost 5 years ago by afesta | Reply

        This is something we have to decide during the hack week, usually a prototype based on a target of the challenge decided by the team. If this will be simple artifacts made of a sum of existing items, an application or a set of packages has to be decided. The scope is to foster innovation under a very fast cycle (5 days) and get a result that allows us to learn if: is doable, what we need to address to make it a real product and how long could take. Don't expect huge development or impossible challenges, this is about pure innovation and ideas.. and build a way to demonstrate our idea.

    • afesta
      almost 5 years ago by afesta | Reply

      This is something we have to decide during the hack week, usually a prototype based on a target of the challenge decided by the team. If this will be simple artifacts made of a sum of existing items, an application or a set of packages has to be decided. The scope is to foster innovation under a very fast cycle (5 days) and get a result that allows us to learn if: is doable, what we need to address to make it a real product and how long could take. Don't expect huge development or impossible challenges, this is about pure innovation and ideas.. and build a way to demonstrate our idea.

    • bmwiedemann
      almost 5 years ago by bmwiedemann | Reply

      If you have a need for this project for 2x NVIDIA Tesla T4, 16GB - ping me.

      • afesta
        almost 5 years ago by afesta | Reply

        So cool! To be honest I'll like more to use your brain for the project...willing to give me a chance and have fun for a week with this crazy PM?

    • rsblendido
      almost 5 years ago by rsblendido | Reply

      Is this about Kubeflow?

      • afesta
        almost 5 years ago by afesta | Reply

        Could be. I mean the only "constraint" is that ideally should work on a laptop and Kubeflow works on K8's but if you use something like MLRun you may overcome many challenges. The ultimate goal of the project is to provide Data scientists a playground so that they do not need to learn and install and configure everything but it's easy enough to start from your laptop (and eventually) move it to a server/cloud environment.

    • jeffpr
      almost 5 years ago by jeffpr | Reply

      @afesta : I will be working with you for the SUSEcon demos - just thought I would hop in here when I can.

      • afesta
        almost 5 years ago by afesta | Reply

        Cool!

    Similar Projects

    Save pytorch models in OCI registries by jguilhermevanz

    Description

    A prerequisite for running applications in a cloud environment is the presence of a container registry. Another common scenario is users performing machine learning workloads in such environments. However, these types of workloads require dedicated infrastructure to run properly. We can leverage these two facts to help users save resources by storing their machine learning models in OCI registries, similar to how we handle some WebAssembly modules. This approach will save users the resources typically required for a machine learning model repository for the applications they need to run.

    Goals

    Allow PyTorch users to save and load machine learning models in OCI registries.

    Resources


    Automated Test Report reviewer by oscar-barrios

    Description

    In SUMA/Uyuni team we spend a lot of time reviewing test reports, analyzing each of the test cases failing, checking if the test is a flaky test, checking logs, etc.

    Goals

    Speed up the review by automating some parts through AI, in a way that we can consume some summary of that report that could be meaningful for the reviewer.

    Resources

    No idea about the resources yet, but we will make use of:

    • HTML/JSON Report (text + screenshots)
    • The Test Suite Status GithHub board (via API)
    • The environment tested (via SSH)
    • The test framework code (via files)


    COOTWbot by ngetahun

    Project Description

    At SCC, we have a rotating task of COOTW (Commanding Office of the Week). This task involves responding to customer requests from jira and slack help channels, monitoring production systems and doing small chores. Usually, we have documentation to help the COOTW answer questions and quickly find fixes. Most of these are distributed across github, trello and SUSE Support documentation. The aim of this project is to explore the magic of LLMs and create a conversational bot.

    Goal for this Hackweek

    • Build data ingestion Data source:
      • SUSE KB docs
      • scc github docs
      • scc trello knowledge board
    • Test out new RAG architecture

    • https://gitlab.suse.de/ngetahun/cootwbot


    ghostwrAIter - a local AI assisted tool for helping with support cases by paolodepa

    Description

    This project is meant to fight the loneliness of the support team members, providing them an AI assistant (hopefully) capable of scraping supportconfigs in a RAG fashion, trying to answer specific questions.

    Goals

    • Setup an Ollama backend, spinning one (or more??) code-focused LLMs selected by license, performance and quality of the results between:
    • Setup a Web UI for it, choosing an easily extensible and customizable option between:
    • Extend the solution in order to be able to:
      • Add ZIU/Concord shared folders to its RAG context
      • Add BZ cases, splitted in comments to its RAG context
        • A plus would be to login using the IDP portal to ghostwrAIter itself and use the same credentials to query BZ
      • Add specific packages picking them from IBS repos
        • A plus would be to login using the IDP portal to ghostwrAIter itself and use the same credentials to query IBS
        • A plus would be to desume the packages of interest and the right channel and version to be picked from the added BZ cases


    Research how LLMs could help to Linux developers and/or users by anicka

    Description

    Large language models like ChatGPT have demonstrated remarkable capabilities across a variety of applications. However, their potential for enhancing the Linux development and user ecosystem remains largely unexplored. This project seeks to bridge that gap by researching practical applications of LLMs to improve workflows in areas such as backporting, packaging, log analysis, system migration, and more. By identifying patterns that LLMs can leverage, we aim to uncover new efficiencies and automation strategies that can benefit developers, maintainers, and end users alike.

    Goals

    • Evaluate Existing LLM Capabilities: Research and document the current state of LLM usage in open-source and Linux development projects, noting successes and limitations.
    • Prototype Tools and Scripts: Develop proof-of-concept scripts or tools that leverage LLMs to perform specific tasks like automated log analysis, assisting with backporting patches, or generating packaging metadata.
    • Assess Performance and Reliability: Test the tools' effectiveness on real-world Linux data and analyze their accuracy, speed, and reliability.
    • Identify Best Use Cases: Pinpoint which tasks are most suitable for LLM support, distinguishing between high-impact and impractical applications.
    • Document Findings and Recommendations: Summarize results with clear documentation and suggest next steps for potential integration or further development.

    Resources

    • Local LLM Implementations: Access to locally hosted LLMs such as LLaMA, GPT-J, or similar open-source models that can be run and fine-tuned on local hardware.
    • Computing Resources: Workstations or servers capable of running LLMs locally, equipped with sufficient GPU power for training and inference.
    • Sample Data: Logs, source code, patches, and packaging data from openSUSE or SUSE repositories for model training and testing.
    • Public LLMs for Benchmarking: Access to APIs from platforms like OpenAI or Hugging Face for comparative testing and performance assessment.
    • Existing NLP Tools: Libraries such as spaCy, Hugging Face Transformers, and PyTorch for building and interacting with local LLMs.
    • Technical Documentation: Tutorials and resources focused on setting up and optimizing local LLMs for tasks relevant to Linux development.
    • Collaboration: Engagement with community experts and teams experienced in AI and Linux for feedback and joint exploration.


    FamilyTrip Planner: A Personalized Travel Planning Platform for Families by pherranz

    Description

    FamilyTrip Planner is an innovative travel planning application designed to optimize travel experiences for families with children. By integrating APIs for flights, accommodations, and local activities, the app generates complete itineraries tailored to each family’s unique interests and needs. Recommendations are based on customizable parameters such as destination, trip duration, children’s ages, and personal preferences. FamilyTrip Planner not only simplifies the travel planning process but also offers a comprehensive, personalized experience for families.

    Goals

    This project aims to: - Create a user-friendly platform that assists families in planning complete trips, from flight and accommodation options to recommended family-friendly activities. - Provide intelligent, personalized travel itineraries using artificial intelligence to enhance travel enjoyment and minimize time and cost. - Serve as an educational project for exploring Go programming and artificial intelligence, with the goal of building proficiency in both.

    Resources

    To develop FamilyTrip Planner, the project will leverage: - APIs such as Skyscanner, Google Places, and TripAdvisor to source real-time information on flights, accommodations, and activities. - Go programming language to manage data integration, API connections, and backend development. - Basic machine learning libraries to implement AI-driven itinerary suggestions tailored to family needs and preferences.


    Explore simple and distro indipendent declarative Linux starting on Tumbleweed or Arch Linux by janvhs

    Description

    Inspired by mkosi the idea is to experiment with a declarative approach of defining Linux systems. A lot of tools already make it possible to manage the systems infrastructure by using description files, rather than manual invocation. An example for this are systemd presets for managing enabled services or the /etc/fstab file for describing how partitions should be mounted.

    If we would take inspiration from openSUSE MicroOS and their handling of the /etc/ directory, we could theoretically use systemd-sysupdate to swap out the /usr/ partition and create an A/B boot scheme, where the /usr/ partition is always freshly built according to a central system description. In the best case it would be possible to still utilise snapshots, but an A/B root scheme would be sufficient for the beginning. This way you could get the benefit of NixOS's declarative system definition, but still use the distros package repositories and don't have to deal with the overhead of Flakes or the Nix language.

    Goals

    • A simple and understandable system
    • Check fitness of mkosi or write a simple extensible image builder tool for it
    • Create a declarative system specification
    • Create a system with swappable /usr/ partition
    • Create an A/B root scheme
    • Swap to the new system without reboot (kexec?)

    Resources

    • Ideas that have been floating around in my head for a while
    • https://0pointer.net/blog/fitting-everything-together.html
    • GNOME OS
    • MicroOS
    • systemd mkosi
    • Vanilla OS