Parsing HTML with lxml

I had to retrieve a list of repositories from my mirror. It was just the job for the python-lxml library!

from __future__ import print_function # only with python2
from lxml import html as lhtml
from urllib import urlopen  # python2; on python3 use: from urllib.request import urlopen
baseurl = 'http://my.mirror/path/'

html = lhtml.parse(urlopen(baseurl))
# get something like  ...
folders = html.findall('//td/a')
header = folders.pop(0)  # python 3 supports:  header, *folders = html.findall('//td/a')

for f in folders:
    # strip the trailing slash so that sep="/" does not produce "path//href"
    print(baseurl.rstrip('/'), f.attrib['href'], sep="/")
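
Gluing the base URL and the href together by hand is fragile when the base may or may not end with a slash. A minimal sketch of an alternative, using urljoin from the standard library; the helper name absolute_links and the sample hrefs are made up for illustration and are not part of the original script:

```python
from urllib.parse import urljoin  # python3; on python2 it lives in the urlparse module


def absolute_links(baseurl, hrefs):
    """Resolve each href against baseurl, with or without a trailing slash."""
    # urljoin treats the last path segment as a file unless the base ends
    # with "/", so make sure the base looks like a directory first.
    if not baseurl.endswith('/'):
        baseurl += '/'
    return [urljoin(baseurl, href) for href in hrefs]


# hypothetical hrefs, standing in for f.attrib['href'] in the loop above
for link in absolute_links('http://my.mirror/path/', ['centos/', 'epel/']):
    print(link)
```

The same helper also copes with absolute hrefs, since urljoin returns them unchanged.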

One git to bring them all, and in a repo bind them.

I had to merge several git repos into a new one. To do this without losing their logs, I followed a Stack Overflow hint that worked for me.

# add and get the old repo data
git remote add old_repo git@git.example.com:/foo/
git fetch old_repo

# merge into my master without committing…
# (git 2.9+ also needs --allow-unrelated-histories here)
git merge -s ours --no-commit old_repo/master

# …first we need to relocate the old tree into the foo/ subdirectory
git read-tree --prefix=foo/ -u old_repo/master

# now… commit!
git commit -m "Imported foo as a subtree."
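
With several repos to import, the same steps can be generated per repo instead of typed each time. This is only a sketch: the function names subtree_import_commands and run_all are made up here, the remote name is used consistently in every step, and the branch is assumed to be master as in the commands above.

```python
import subprocess


def subtree_import_commands(remote, url, prefix, branch='master'):
    """Build the git commands that import one old repo under prefix/."""
    ref = '{}/{}'.format(remote, branch)
    name = prefix.rstrip('/')
    return [
        ['git', 'remote', 'add', remote, url],
        ['git', 'fetch', remote],
        # record the merge without touching the working tree yet
        ['git', 'merge', '-s', 'ours', '--no-commit', ref],
        # place the old tree under the subdirectory
        ['git', 'read-tree', '--prefix=' + name + '/', '-u', ref],
        ['git', 'commit', '-m', 'Imported {} as a subtree.'.format(name)],
    ]


def run_all(commands, cwd='.'):
    """Run each command in order inside cwd, stopping at the first failure."""
    for cmd in commands:
        subprocess.check_call(cmd, cwd=cwd)


if __name__ == '__main__':
    # dry run: only print what would be executed
    for cmd in subtree_import_commands('old_repo', 'git@git.example.com:/foo/', 'foo'):
        print(' '.join(cmd))
```

Printing the commands first makes it easy to check them before handing the list to run_all inside the new repository.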

git log shows the imported files at their old paths, so git log foo/ doesn’t work. Instead, we can
diff between releases simply with

git diff rev1 rev2 --