Based on the little-used HTML5 outline spec, investigate&implement an in-browser tool (currently a chrome extension or browser user script) to easily, interactively scrap a documentation web page into an 'index-content' map for (offline) searching.

Motivated by the fact that most scrappers today are command line tools, too tech-savvy.

Currently only target documentation web pages, which are much better structured and so easier to scrap. Also I think these pages benefit most from indexing&scraping.

Looking for hackers with the skills:

web scrapper indexer documentation

This project is part of:

Hack Week 13

Activity

  • about 9 years ago: cxiong added keyword "web" to this project.
  • about 9 years ago: cxiong added keyword "scrapper" to this project.
  • about 9 years ago: cxiong added keyword "indexer" to this project.
  • about 9 years ago: cxiong added keyword "documentation" to this project.
  • about 9 years ago: cxiong started this project.
  • about 9 years ago: cxiong originated this project.

  • Comments

    Be the first to comment!

    Similar Projects

    WebUI for your data by avicenzi

    A single place to view every bit of data you have.

    Problem

    You have too much data and you are a data hoarder.

    • Family photos and videos.
    • Lots of eBooks, TV Shows, Movies, and else.
    • Boxes full of papers (taxes, invoices, IDs, certificates, exams, and else).
    • Bank account statements (multiple currencies, countries, and people).

    Maybe you have some data on S3, some on your NAS, and some on your local PC.

    • How do you get it all together?
    • How do you link a bank transaction to a product invoice?
    • How to tag any object type and create a collection out of it (mix videos, photos, PDFs, transactions)?
    • How to store this? file/folder structure does not work, everything is linked together

    Project Description

    The idea is a place where you can throw all your data, photos, videos, documents, binaries, and else.

    Create photo albums, document collections, add tags across multiple file-formats, link content, and else.

    The UI should be easy to use, where the data is not important for now (could be all S3 or local drive).

    Similar proposals

    The closest I found so far is https://perkeep.org/, but this is not what I'm looking for.

    Goal for this Hackweek

    Create a web UI, in Svelte ideally, perhaps React.

    It should be able to show photos and videos at least.

    Resources

    None so far, this is just an idea.


    Uyuni developer-centric documentation by deneb_alpha

    Description

    While we currently have extensive documentation on user-oriented tasks such as adding minions, patching, fine-tuning, etc, there is a notable gap when it comes to centralizing and documenting core functionalities for developers.

    The number of functionalities and side tools we have in Uyuni can be overwhelming. It would be nice to have a centralized place with descriptive list of main/core functionalities.

    Goals

    Create, aggregate and review on the Uyuni wiki a set of resources, focused on developers, that include also some known common problems/troubleshooting.

    The documentation will be helpful not only for everyone who is trying to learn the functionalities with all their inner processes like newcomer developers or community enthusiasts, but also for anyone who need a refresh.

    Resources

    The resources are currently aggregated here: https://github.com/uyuni-project/uyuni/wiki


    Testing and adding GNU/Linux distributions on Uyuni by juliogonzalezgil

    Join the Gitter channel! https://gitter.im/uyuni-project/hackweek

    Uyuni is a configuration and infrastructure management tool that saves you time and headaches when you have to manage and update tens, hundreds or even thousands of machines. It also manages configuration, can run audits, build image containers, monitor and much more!

    Currently there are a few distributions that are completely untested on Uyuni or SUSE Manager (AFAIK) or just not tested since a long time, and could be interesting knowing how hard would be working with them and, if possible, fix whatever is broken.

    For newcomers, the easiest distributions are those based on DEB or RPM packages. Distributions with other package formats are doable, but will require adapting the Python and Java code to be able to sync and analyze such packages (and if salt does not support those packages, it will need changes as well). So if you want a distribution with other packages, make sure you are comfortable handling such changes.

    No developer experience? No worries! We had non-developers contributors in the past, and we are ready to help as long as you are willing to learn. If you don't want to code at all, you can also help us preparing the documentation after someone else has the initial code ready, or you could also help with testing :-)

    The idea is testing Salt and Salt-ssh clients, but NOT traditional clients, which are deprecated.

    To consider that a distribution has basic support, we should cover at least (points 3-6 are to be tested for both salt minions and salt ssh minions):

    1. Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file)
    2. Onboarding (salt minion from UI, salt minion from bootstrap scritp, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator)
    3. Package management (install, remove, update...)
    4. Patching
    5. Applying any basic salt state (including a formula)
    6. Salt remote commands
    7. Bonus point: Java part for product identification, and monitoring enablement
    8. Bonus point: sumaform enablement (https://github.com/uyuni-project/sumaform)
    9. Bonus point: Documentation (https://github.com/uyuni-project/uyuni-docs)
    10. Bonus point: testsuite enablement (https://github.com/uyuni-project/uyuni/tree/master/testsuite)

    If something is breaking: we can try to fix it, but the main idea is research how supported it is right now. Beyond that it's up to each project member how much to hack :-)

    • If you don't have knowledge about some of the steps: ask the team
    • If you still don't know what to do: switch to another distribution and keep testing.

    This card is for EVERYONE, not just developers. Seriously! We had people from other teams helping that were not developers, and added support for Debian and new SUSE Linux Enterprise and openSUSE Leap versions :-)

    Pending

    FUSS

    FUSS is a complete GNU/Linux solution (server, client and desktop/standalone) based on Debian for managing an educational network.

    https://fuss.bz.it/

    Seems to be a Debian 12 derivative, so adding it could be quite easy.

    • [W] Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file)
    • [W] Onboarding (salt minion from UI, salt minion from bootstrap script, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator) --> Working for all 3 options (salt minion UI, salt minion bootstrap script and salt-ssh minion from the UI).
    • [W] Package management (install, remove, update...) --> Installing a new package works, needs to test the rest.
    • [I] Patching (if patch information is available, could require writing some code to parse it, but IIRC we have support for Ubuntu already). No patches detected. Do we support patches for Debian at all?
    • [W] Applying any basic salt state (including a formula)
    • [W] Salt remote commands
    • [ ] Bonus point: Java part for product identification, and monitoring enablement


    ddflare: (Dynamic)DNS management via Cloudflare API in Kubernetes by fgiudici

    Description

    ddflare is a project started a couple of weeks ago to provide DDNS management using v4 Cloudflare APIs: Cloudflare offers management via APIs and access tokens, so it is possible to register a domain and implement a DynDNS client without any other external service but their API.

    Since ddflare allows to set any IP to any domain name, one could manage multiple A and ALIAS domain records. Wouldn't be cool to allow full DNS control from the project and integrate it with your Kubernetes cluster?

    Goals

    Main goals are:

    1. add containerized image for ddflare
    2. extend ddflare to be able to add and remove DNS records (and not just update existing ones)
    3. add documentation, covering also a sample pod deployment for Kubernetes
    4. write a ddflare Kubernetes operator to enable domain management via Kubernetes resources (using kubebuilder)

    Available tasks and improvements tracked on ddflare github.

    Resources

    • https://github.com/fgiudici/ddflare
    • https://developers.cloudflare.com/api/
    • https://book.kubebuilder.io