Trace HTTP calls with python-requests

Today python-requests is the de facto standard library for REST calls.

As everything goes over TLS (so you cannot just sniff the wire), you can trace API calls from within the application with the following:


import logging
import httplib as http_client  # python3: import http.client as http_client

http_client.HTTPConnection.debuglevel = 1

logging.basicConfig(level=logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
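Putting it all together on Python 3 (a sketch: the final request is commented out so the snippet can be pasted without network access; older requests releases vendored urllib3 under the name "requests.packages.urllib3"):

```python
import logging
import http.client as http_client  # stdlib successor of python2's httplib

# dump the raw request/response headers exchanged on the socket
http_client.HTTPConnection.debuglevel = 1

# route urllib3's DEBUG records (connections, retries, ...) to stderr
logging.basicConfig(level=logging.DEBUG)
requests_log = logging.getLogger("urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

# import requests
# requests.get("https://api.github.com")  # headers are now traced
```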

MySQL JSON fields on the ground!

Having to add a series of custom fields to an otherwise quite relational application, I decided to try the new MySQL JSON fields.

As of now you can:

– create JSON fields
– manipulate them with json_extract and json_unquote
– create generated columns from JSON entries

You cannot:

– index JSON fields directly: you have to create a generated column and index that
– retain the original JSON datatype (e.g. string, int), as json_extract always returns strings
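For instance, to make the `$.now` attribute searchable you can materialize it in a typed generated column and index that (a sketch: the table and column names `my_json`, `json` and `json_now` are assumptions matching the model used later in this post; requires MySQL 5.7+):

```sql
-- materialize $.now into a typed generated column, then index the column
ALTER TABLE my_json
  ADD COLUMN json_now INT
    GENERATED ALWAYS AS (json_extract(json, '$.now')) STORED,
  ADD INDEX idx_json_now (json_now);
```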

Let’s start with a simple flask app:

# requirements.txt
mysql-connector-python
Flask-SQLAlchemy==2.0
SQLAlchemy>=1.1.3

Now create the app and connect it to a db.

import flask
import flask_sqlalchemy
from sqlalchemy.dialects.mysql import JSON

# A simple flask app connected to a db
app = flask.Flask('app')
app.config['SQLALCHEMY_DATABASE_URI']='mysql+mysqlconnector://root:secret@localhost:3306/test'
db = flask_sqlalchemy.SQLAlchemy(app)

Add a model class to the playground and create its table on the db. We need SQLAlchemy>=1.1 to support the JSON type!

# The model
class MyJson(db.Model):
    name = db.Column(db.String(16), primary_key=True)
    json = db.Column(JSON, nullable=True)

    def __init__(self, name, json=None):
        self.name = name
        self.json = json

# Create table
db.create_all()

Thanks to flask-sqlalchemy we can just use db.session ;)

# Add an entry
entry = MyJson('jon', {'do': 'it', 'now': 1})
db.session.add(entry)
db.session.commit()

We can now verify, with a raw select, that the entry is serialized on the db:

# Get the entry in standard SQL
entries = db.engine.execute(db.select(columns=['*'], from_obj=MyJson)).fetchall()
(name, json_as_string), = entries  # unpack the result (it's just one!)
assert isinstance(json_as_string, basestring)  # the JSON is stored serialized

Now a raw select extracting a JSON attribute:

entries = db.engine.execute(db.select(columns=['name', 'json_extract(json, "$.now")'], from_obj=MyJson)).fetchall()

(name, json_now), = entries  # unpack the result (it's just one!)
assert isinstance(json_now, basestring)
assert json_now != entry.json['now']  # '1' != 1

EuroPython 2015: unusual encounters.

The live coverage from EP continues, with some unusual encounters:

Guido van Rossum, creator of Python and BDFL, made some important statements: he much prefers Dracula to Frankenstein, and he will preserve compatibility between Python 3 and Python 4.
With Guido van Rossum

Armin Rigo is working on a new version of PyPy that improves multi-threading using a Software Transactional Memory approach: concurrent memory access is handled with a transaction log (e.g. as in a database) instead of locks.

The coffee breaks are fruitful too: one of the promoters of the Barcelona Dojo – who attended my training last year – helped me set up a log-analysis platform based on Logstash (parser) -> Elasticsearch (database) -> Kibana (visualization app).

There was no shortage of trainings either: I attended the ones on MongoDB + the Flask Web Framework and on Data Visualization.

The most interesting and instructive talk, though, was the one on the implementation of a geo-distributable configuration management system based on Consul.

And the conference goes on: today is the last day of talks; tomorrow and the day after there will be the sprints – hacking sessions promoted by the maintainers of the various projects.

Slides with code, highlighting with LaTeX

While preparing the slides for EuroPython, I decided to come back to LaTeX more than 10 years after my thesis. The reason is that LaTeX packages provide code syntax highlighting – something unmanageable with LibreOffice.

The `minted` package provides all that with just:

\begin{minted}{python}
def your_python(code):
    return here
\end{minted}

You can shorten all that with a macro:
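One way to define it (an assumption – the post does not show the definition – using minted's `\newminted`, whose optional argument sets the environment name and which also generates a starred `pycode*` variant accepting extra options):

```latex
% in the preamble: defines \begin{pycode} ... \end{pycode}
% and \begin{pycode*}{options} ... \end{pycode*}
\newminted[pycode]{python}{}
```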

\begin{pycode}
# and you can even use
# math formulas $\sum_{x=0}^{n}x$
in_comments = sum(range(n+1))
\end{pycode}

The hard issue I faced was marking code (e.g. like a highlighter pen), because minted allows formatting only inside comments. The following didn't work:

# we wanted the append method to be emphasized, 
a = [1, 2]
a.\emph{append}(3)
# but the output was verbatim!

After some wandering I started from the ground up. The minted package uses the Python tool Pygments (pygmentize) to convert code into a syntax-highlighted LaTeX page:

# pygmentize -o out.tex -f latex -l python -O full=true out.py

So it was Pygments refusing to interpret LaTeX formatting inside the code. Luckily, I found in the Pygments development branch a patch allowing escape sequences in the code.

# setting pipe (|) as an escapeinside character
def now_this(works):
    works.|\emph{append}|(1)
# run with
# pygmentize -o out.tex -f latex -l python -O full=true,escapeinside="||" out.py

After installing the latest pygmentize from Bitbucket, I just patched minted and enjoyed the new syntax directly from LaTeX:

% set the pygmentize option in the 
%  latex page!
\begin{pycode*}{escapeinside=||}
def now_this(works):
    works.|\colorbox{yellow}{append}|(1)
\end{pycode*}

Reverse engineering included

With IPython you can write a function like:

prompt [1]# def parse(line):
    ip, host = line.split()
    return "{host} IN PTR {ip}".format(host=host,ip=ip)

To edit our function, just use %edit and reference the input line:

prompt [2]# %edit 1

Once you modify the function, you cannot reference the newer code with %edit, as

prompt [3]# %edit 2

just references “%edit 1” and not the newer code.

In this case we can simply recover the latest code of our function with:

prompt [4]# from inspect import getsourcelines
prompt [5]# getsourcelines(parse)
(['def parse(line):\n',
'    ip, host = line.split()[:2]\n',
'    return "{host} IN PTR {ip}".format(host=host,ip=ip)\n'],
1)

cx_oracle segfault by default

I was playing with the Twisted adbapi connecting to Oracle. Stressing the application a bit (an SFTP server authenticating against Oracle), I found that when the connection pool was exhausted the application segfaulted.

The (simple) solution is to instantiate the pool with the threaded keyword:

dbpool = adbapi.ConnectionPool("cx_Oracle",
                               uri,
                               threaded=True)

Twisted is an event-based framework: all blocking calls should be done outside the listening thread. The a(synchronous)dbapi provides separate threads for db connections. Using cx_Oracle in a non-thread-safe way is a likely cause of the segfault.

Checking the cx_Oracle docs, we found that thread safety is off by default to gain some performance.

Parsing html with lxml

I had to retrieve a list of repositories from my mirror. A job for the python-lxml library!

from __future__ import print_function  # only needed on python2
from lxml import html as lhtml
from urllib import urlopen  # python3: from urllib.request import urlopen
baseurl = 'http://my.mirror/path'

html = lhtml.parse(urlopen(baseurl))
# get something like  ...
folders = html.findall('.//td/a')
header = folders.pop(0)  # python 3 supports:  header, *folders = html.findall('.//td/a')

for f in folders:
    print(baseurl, f.attrib['href'], sep="/")
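The same extraction can be tried without a live mirror by parsing an inline snippet of the autoindex page (a sketch, Python 3: the HTML below and the URLs are made up):

```python
from lxml import html as lhtml

# a made-up stand-in for the mirror's autoindex page
PAGE = """
<table>
  <tr><td><a href="../">Parent Directory</a></td></tr>
  <tr><td><a href="centos/">centos/</a></td></tr>
  <tr><td><a href="fedora/">fedora/</a></td></tr>
</table>
"""

baseurl = 'http://my.mirror/path'
tree = lhtml.fromstring(PAGE)
# the first link is the parent-directory header, the rest are the repos
header, *folders = tree.findall('.//td/a')
urls = ['{}/{}'.format(baseurl, a.attrib['href']) for a in folders]
print(urls)
```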