SUSE Hack Week: Automatically guess changelog entries for Haskell packages from upstream

All our packages have a *.changes file, and ideally that file would mention the relevant upstream changes whenever a version is updated. Currently, we gather that information manually when packaging Haskell software. This is unsatisfactory because of the sheer size of the package set and the massive number of updates we receive from Hackage. The packaging itself is completely automatic, but editing the *.changes files is not, and this issue prevents us from shipping more packages in Tumbleweed and elsewhere.

During this Hackweek, I'd like to write code that, given two versions of a Haskell package, old and new, determines the upstream changelog entries between the two and creates a pleasantly formatted entry for the *.changes file that won't need any manual editing to pass the review team. Then I'd like to integrate this into cabal2obs, or maybe even cabal2spec.

Looking for hackers with the skills:

haskell

This project is part of:

Hack Week 17

Activity

almost 7 years ago: psimons started this project.

about 7 years ago: calmeidadeoliveira liked this project.

about 7 years ago: mimi_vx liked this project.

about 7 years ago: psimons added keyword "haskell" to this project.

about 7 years ago: psimons originated this project.

Comments

almost 7 years ago by psimons | Reply

My Hackweek project was to automate the generation of *.changes files for package updates, especially the part where reviewers expect us to copy the relevant bits from upstream's ChangeLog file into the record. That worked out nicely. Let's assume that you want to update ghc-aeson from version 1.2.4.0 to 1.3.0.0. Then you can simply extract the respective release tarballs and run this command:
```
$ guess-changelog aeson-1.2.4.0/ aeson-1.3.0.0/
### 1.3.0.0

Breaking changes:
* `GKeyValue` has been renamed to `KeyValuePair`, thanks to Xia Li-yao
* Removed unused `FromJSON` constraint in `withEmbeddedJson`, thanks to Tristan Seligmann

Other improvements:
* Optimizations of TH toEncoding, thanks to Xia Li-yao
* Optimizations of hex decoding when using the default/pure unescape implementation, thanks to Xia Li-yao
* Improved error message on `Day` parse failures, thanks to Gershom Bazerman
* Add `encodeFile` as well as `decodeFile*` variants, thanks to Markus Hauck
* Documentation fixes, thanks to Lennart Spitzner
* CPP cleanup, thanks to Ryan Scott
```
The function
```
guessChangelog :: FilePath -> FilePath -> IO (Either GuessedChangelog Text)
```
implements this detection using the following algorithm:

1) Scan both directories for files that look like they might be change logs.

2) If both directories contain the same candidate file, e.g. "ChangeLog", then use that.

3) "diff" the change log file and check that all modifications are additions at the top of the file.

4) Return those additions as Text. :-)
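
Steps 3 and 4 can be illustrated with a small pure function. This is only a sketch under stated assumptions, not the code from cabal2obs: the names `TopDiff`, `classifyDiff`, and `maxHeaderLines` are hypothetical, the function compares lists of lines instead of running a real diff, and the order in which the checks are made is a guess.

```haskell
import Data.List (isSuffixOf)

-- Hypothetical result type for this sketch (not the tool's own type).
data TopDiff
  = Unchanged               -- both files are identical
  | TopAdditions [String]   -- the lines that were added near the top
  | HeaderTooLarge Int      -- too many unmodified lines above the additions
  | OtherEdits              -- something other than top additions changed
  deriving (Eq, Show)

-- Assumed limit for an unmodified introduction at the top of the file.
maxHeaderLines :: Int
maxHeaderLines = 10

-- Number of leading elements that both lists have in common.
commonPrefixLength :: Eq a => [a] -> [a] -> Int
commonPrefixLength xs ys = length (takeWhile id (zipWith (==) xs ys))

-- Compare the old and new change log line by line. The new file must be
-- "header ++ additions ++ rest", where the old file is "header ++ rest".
classifyDiff :: [String] -> [String] -> TopDiff
classifyDiff old new
  | old == new                         = Unchanged
  | not (oldRest `isSuffixOf` newRest) = OtherEdits
  | header > maxHeaderLines            = HeaderTooLarge header
  | otherwise                          = TopAdditions added
  where
    header  = commonPrefixLength old new
    oldRest = drop header old
    newRest = drop header new
    added   = take (length newRest - length oldRest) newRest
```

In GHCi, `classifyDiff (lines oldText) (lines newText)` would classify two change logs that have already been read into strings; `unlines` turns a `TopAdditions` payload back into a block of text.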

Of course, things don't always work out this nicely in practice and guessChangelog might fail with any of the following GuessedChangelog constructors:
- NoChangelogFiles: Neither release tarball contains a change log file.
- UndocumentedUpdate FilePath: A change log file exists (and its name is returned), but it's the same in both tarballs. In other words, upstream probably forgot to document the release.
- NoCommonChangelogFiles (Set FilePath) (Set FilePath): Both tarballs contain a set of files that look like they might be a change log, but their intersection is empty! This happens when upstream has renamed the file, for example.
- MoreThanOneChangelogFile (Set FilePath): Multiple change log files exist in both directories. It would probably work out okay if we just looked at the diffs of each of them, but it felt like a good idea to err on the side of caution. This case is rare anyway.
- UnmodifiedTopIsTooLarge FilePath Int: guessChangelog accepts up to 10 lines of unmodified text at the top of the upstream change log file because some people like to have a short introductory text there. If that header becomes too large, however, then the tool returns this error because we expect upstream to add text at the top, not in the middle of the file.
- NotJustTopAdditions FilePath: This happens when upstream edits the file in ways other than just adding at the top. Sometimes people re-format old entries or rewrite URLs or fix typos, and in such a case it feels too risky to trust the diff.

The actual code currently lives in the cabal2obs utility, but I intend to package it up properly so that other people can use it, too, since there's nothing inherently Haskell-specific about it.
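
The file-selection part of the algorithm (steps 1 and 2) and the three failure cases tied to it can be sketched as follows. Everything here is an assumption made for illustration: the type `ChangelogChoice`, its success constructor `UseChangelog`, the function `pickChangelog`, and above all the `looksLikeChangelog` heuristic are hypothetical, and the function takes plain lists of file names instead of scanning directories. Only the three failure constructor names are borrowed from the list above. The sketch needs the `containers` package for `Data.Set`.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf)
import Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical result type: three failure cases named after the
-- constructors described above, plus a success case for this sketch.
data ChangelogChoice
  = NoChangelogFiles
  | NoCommonChangelogFiles (Set FilePath) (Set FilePath)
  | MoreThanOneChangelogFile (Set FilePath)
  | UseChangelog FilePath
  deriving (Eq, Show)

-- Assumed heuristic: a file is a candidate if its lower-cased name
-- starts with one of a few well-known prefixes.
looksLikeChangelog :: FilePath -> Bool
looksLikeChangelog name = any (`isPrefixOf` lower) ["changelog", "changes", "news"]
  where
    lower = map toLower name

-- Pick the single candidate file that exists in both file listings.
pickChangelog :: [FilePath] -> [FilePath] -> ChangelogChoice
pickChangelog oldFiles newFiles
  | Set.null oldCands && Set.null newCands = NoChangelogFiles
  | otherwise =
      case Set.toList common of
        []     -> NoCommonChangelogFiles oldCands newCands
        [file] -> UseChangelog file
        _      -> MoreThanOneChangelogFile common
  where
    candidates = Set.fromList . filter looksLikeChangelog
    oldCands   = candidates oldFiles
    newCands   = candidates newFiles
    common     = Set.intersection oldCands newCands
```

Keeping the selection pure makes the corner cases easy to try out in GHCi, e.g. `pickChangelog ["CHANGES"] ["ChangeLog.md"]` for a renamed file.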
