The SMBus standard specifies an address resolution protocol (SMBus ARP). It has two key features:

  • Handle I2C slave address collisions. If two SMBus slaves would use the same I2C address, ARP lets one of them pick a different address to avoid the address collision.
  • Automatically and reliably identify SMBus slaves. Each SMBus slave supporting ARP has a unique device ID, which can be used to automatically instantiate the right I2C device and subsequently let the needed driver be loaded.
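
To illustrate the second point, here is a minimal sketch of how a 16-byte device ID (UDID) could be decoded before matching it against a driver. The field layout follows my reading of the SMBus 2.0 specification and should be checked against it; the names `Udid` and `parse_udid` are hypothetical and do not come from any existing kernel or user-space code.

```python
import struct
from typing import NamedTuple


class Udid(NamedTuple):
    """Decoded SMBus ARP device ID (layout assumed from SMBus 2.0)."""
    capabilities: int          # 8 bits: address type and PEC flag
    version_revision: int      # 8 bits: UDID version and silicon revision
    vendor_id: int             # 16 bits
    device_id: int             # 16 bits
    interface: int             # 16 bits
    subsystem_vendor_id: int   # 16 bits
    subsystem_device_id: int   # 16 bits
    vendor_specific_id: int    # 32 bits

    @property
    def pec_supported(self) -> bool:
        # Assumption: bit 0 of the capabilities byte advertises PEC support.
        return bool(self.capabilities & 0x01)

    @property
    def address_type(self) -> int:
        # Assumption: bits 7..6 of the capabilities byte encode the address type.
        return (self.capabilities >> 6) & 0x03


def parse_udid(raw: bytes) -> Udid:
    """Split a 16-byte UDID, most significant byte first, into its fields."""
    if len(raw) != 16:
        raise ValueError(f"UDID must be 16 bytes, got {len(raw)}")
    return Udid(*struct.unpack(">BBHHHHHI", raw))


# Made-up sample value, for illustration only.
sample = parse_udid(bytes.fromhex("c10880861234000400000000deadbeef"))
print(hex(sample.vendor_id), hex(sample.device_id), sample.pec_supported)
```

A driver-matching layer could then key on `vendor_id` and `device_id`, much like PCI IDs are used today.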

Plan

If implemented properly, sensors-detect would no longer be needed on a number of systems, as all required drivers would get loaded automatically.

There has been some work on SMBus ARP in the past, but nothing good enough to be integrated into the upstream kernel. Mark D. Studebaker wrote proof-of-concept code in 2002, for kernel 2.2. The code was updated for kernel 2.4, but development stopped in 2005.

More recently, Corentin Labbe has been working on proof-of-concept code for kernel 2.6/3. As I recall, the code did not actually work, but maybe it can be used as a base.

Results

Miserable failure. Proper hardware support is rare, and the prerequisites (first of all, working SMBus PEC support in the host controller drivers) aren't met.

Looking for hackers with the skills:

kernel hardware

This project is part of:

Hack Week 10

Activity

  • about 9 years ago: zhigangg joined this project.
  • over 11 years ago: jdelvare started this project.
  • over 11 years ago: duwe liked this project.
  • over 11 years ago: wpreston2 liked this project.
  • over 11 years ago: ptesarik liked this project.
  • over 11 years ago: jdelvare added keyword "hardware" to this project.
  • over 11 years ago: jdelvare added keyword "kernel" to this project.
  • over 11 years ago: jdelvare originated this project.

Comments

    • jdelvare
      over 11 years ago by jdelvare

      Things didn't go as well as I had hoped. First, my own machine no longer replies to the SMBus ARP address, while I'm almost certain it used to. I can't explain it.

      Then I tried to find a suitable machine on the SUSE network, but that kind of information doesn't show in orthos, so finding the right machine was difficult. I finally found "knorr", which does reply to the SMBus ARP address, but it uses an HT-1000 south bridge for which we don't support SMBus PEC, which is a prerequisite for SMBus ARP. I don't even know if the chipset supports it, and the datasheet is not publicly available.

      So I gave up on "knorr" and have now found "fux", which does reply to the SMBus ARP address and uses the i2c-i801 SMBus driver, which implements SMBus PEC support. I can start my tests now.

    • jdelvare
      over 11 years ago by jdelvare

      The curse goes on. "fux" has an ICH10 south bridge, and testing revealed that PEC (CRC) errors during SMBus block transactions (at least) lock up the SMBus controller. I suspect an undocumented erratum. I managed to find a software workaround, which I must discuss with upstream.

      When testing that software workaround on another chip (ICH5) supported by the i2c-i801 driver, I hit another bug related to PEC error handling. I'm still puzzled by that one: I don't understand what is going on and have no idea how to work around it.

      It should be clear by now that I'm not going to complete my hack week project. It turns out that the world is not yet ready for SMBus ARP. First of all, we need proper SMBus PEC support on a wide range of machines. Only when this is available will it make sense to look into SMBus ARP support again.
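
For readers unfamiliar with PEC: it is a CRC-8 checksum (polynomial x^8 + x^2 + x + 1, initial value 0) appended to an SMBus transaction and computed over every byte on the wire, including the address bytes. The following is a minimal software sketch of that calculation, not the code used by i2c-i801 or any other driver; the function name `smbus_pec` and the sample frame are made up for illustration.

```python
def smbus_pec(data: bytes, crc: int = 0) -> int:
    """Return the CRC-8 (polynomial 0x07, MSB first, init 0) of data."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                # Top bit set: shift it out and fold in the polynomial.
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


# Illustrative frame: the write address byte for 7-bit address 0x61
# (the SMBus default address used by ARP), followed by an arbitrary
# command byte chosen only for this example.
frame = bytes([0x61 << 1, 0x01])
pec = smbus_pec(frame)

# A receiver can verify a transfer by running the CRC over the frame
# plus the received PEC byte; a result of 0 means no error was detected.
print(smbus_pec(frame + bytes([pec])) == 0)
```

A PEC error is simply the case where that final check is non-zero, which is the situation that triggered the controller lock-ups described above.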

    Similar Projects

    early stage kdump support by mbrugger

    Project Description

    When we experience an early boot crash, we are not able to analyze the kernel dump, as user-space wasn't able to load the crash system. The idea is to compile the crash system into the host kernel (think of initramfs) so that we can create a kernel dump really early in the boot process.

    Goal for the Hackweeks

    1. Investigate if this is possible and the implications it would have (done in HW21)
    2. Hack up a PoC (done in HW22 and HW23)
    3. Prepare RFC series (given it's only one week, we are entering wishful thinking territory here).

    update HW23

    • I was able to include the crash kernel into the kernel Image.
    • I'll need to find a way to load that from init/main.c:start_kernel() probably after kcsan_init()
    • A workaround for a smoke test was to hack the kexec_file_load() system call, which has two problems:
      1. My initramfs for the production kernel does not have a new enough kexec version; that's not a blocker, but it is where the week ended
      2. As the crash kernel is part of init.data, it will already be stale once I can call kexec_file_load() from user-space.

    The solution is probably to rewrite the PoC so that the invocation can be done from init.text (that's my theory), but I'm not sure whether I can reuse the in-kernel kexec infrastructure from there, which I rely on heavily.

    update HW24

    • Day 1
      • rebased on v6.12 with no problems other than me breaking the config
      • setting up a new compilation and qemu/virtme env
      • getting desperate as nothing works that used to work
    • Day 2
      • managed to invoke the loading of the early kernel from __init code after kcsan_init()
    • Day 3

      • fix problem of memdup not being able to alloc so much memory... use 64K page sizes for now
      • code refactoring
      • I'm now able to load the crash kernel
      • When using virtme I can boot into the crash kernel, although it doesn't boot completely (major milestone!); it crashes in elfcorehdr_read_notes()
    • Day 4

      • crash system crashes (no pun intended) in copy_oldmem_page(); will need to understand elfcorehdr...
      • call path vmcore_init() -> parse_crash_elf_headers() -> elfcorehdr_read() -> read_from_oldmem() -> copy_oldmem_page() -> copy_to_iter()
    • Day 5

      • hacking arch/arm64/kernel/crash_dump.c:copy_oldmem_page() to see if the crash system really starts. It does.
      • fun fact: retested with more reserved memory and with UEFI FW, host kernel crashes in init but directly starts the crash kernel, so it works (somehow) \o/
    • TODOs

      • fix elfcorehdr so that we actually can make use of all this...
      • test where in the boot __init() chain we can/should call kexec_early_dump()