I am looking for a C/Python developer position

I am looking for a home-based developer position, if possible working on free software. If you are looking for a C and Python expert (over 5 years of experience), I’m your guy! I have been working at EdenWall Technologies for 5 years, in network security, first as a developer, then as a project manager (team of six developers). I worked on the NuFW firewall (based on Netfilter) and many firewall configuration tools (managing the firewall rules, networks, VPN, web proxy, etc.).

I have contributed to many open source projects, but lately I have been spending most of my hacking time on Python. For example, I fixed a lot of bugs related to Unicode in Python 3.2. Contributing to Python teaches me to work with a team of one hundred developers around the world (explaining a solution, reviewing patches, following commits, etc.). Furthermore, it improves my coding skills in C and Python, and my skills in software quality (writing tests, following buildbots, etc.).

I am also the author of several open source projects. Fusil is a fuzzing library including twenty fuzzers. Using it, I found bugs and vulnerabilities in many free software projects (libc, FreeType, ClamAV, libexif, binutils, etc.). Faulthandler is a Python module that displays the traceback when a crash occurs, after a timeout, or when a signal is received (SIGUSR1). It runs on all operating systems, from Python 2.5 to 3.3. I recently integrated this module directly into Python 3.3. See also my github and bitbucket profiles.

If you are interested in my profile, contact me by email: victor.stinner@haypocalc.com. I will be available starting July 11th. Have a look at my CV for further details of my skills.


Birth of the faulthandler project

In 2008, I wrote a proof-of-concept to test that it is possible to continue the execution of a program after a segmentation fault: stack.c. The operating system sends a synchronous SIGSEGV signal to the process which can handle it. Using a long jump, it is possible to continue the execution after an invalid memory read or write. With an alternate stack (see the sigaltstack() function), it is even possible to continue after a stack overflow. The alternate stack is required to be able to execute the signal handler.

Because my proof-of-concept worked as expected, in September 2008 I proposed a patch for Python (issue3999) to convert an evil segmentation fault (usually called a “crash” or “fatal error”) into a classic and safe Python exception. I proposed it to improve the security (availability) of Python programs: it becomes possible to save all documents and display a nice error message before exiting, or even better, just log the error and continue the execution.

But the patch was rejected, because in some cases the long jump may leave some objects in an inconsistent state. In this case, doing anything more than exiting is dangerous and can make things worse. But someone proposed to just display the Python backtrace before exiting, which was already possible indirectly with my patch if the exception was not caught by the program. I was angry because I had spent a lot of time on this patch and I was still convinced that my patch was safe. I didn’t write the patch displaying the backtrace because I was unable to dump the Python backtrace in C, especially in a signal handler.

In 2009, a very annoying bug in Xorg was driving me crazy: after a period of between 2 and 10 days, I lost my keyboard (“my keyboard is blo…”). I hit the bug for 8 months without being able to get any useful information to isolate or understand it. I used two USB keyboards: I disconnected the second keyboard, nothing changed. I tried to keep a list of active applications: I didn’t find any useful information. But I had another problem: I didn’t know whether Xorg logged something or not. Xorg doesn’t log messages with a timestamp, and because I never read Xorg logs, I was unable to see if there were new messages or not. So I wrote a simple patch to log messages with a timestamp. After posting it to the freedesktop bugtracker, I found another, older patch. I used it to improve mine.

But my Xorg patch was rejected (like the older patch) because it used some functions which are not “signal-safe”. I learnt that a signal handler should only use “signal-safe” functions, which are reentrant functions guaranteed to be safe in a signal handler. And the list of these functions is short! The main problem was the localtime_r() and strftime() functions. In the GNU libc, these functions are clearly not signal-safe: they temporarily change the timezone and use a lock for this.

Thanks to this experience in the signal handler world and my experience in CPython internals, I was able to write a signal handler displaying the Python backtrace: issue8863 (created in May 2010). The first version was naïve, buggy and unsafe:

  • if there was a loop in the frame linked list, the signal handler filled stderr and never finished
  • it used Python high-level functions such as _PyUnicode_AsString() (encode a unicode string to UTF-8) which may allocate memory on the heap using the Python memory allocator (pymalloc)
  • it used buffered functions to write into stderr (e.g. fputs)
  • it displayed the wrong backtrace if the thread causing the fault didn’t hold the GIL (global interpreter lock)
  • it didn’t call the previous signal handler (e.g. Apport on Ubuntu)
  • it only caught the SIGSEGV fault

It took me 11 versions to write a safe handler:

  • Limit the backtrace to 100 frames (to avoid an infinite loop)
  • Only use signal-safe functions (e.g. write())
  • Only allocate (a little) memory on the stack, not on the heap
  • Use PyGILState_GetThisThreadState() to get the backtrace of the thread that causes the fault, instead of getting the “current” thread
  • Call the previous signal handler. The fault handler restores the previous signal handler and gives the control flow back to the program. The program then triggers the same fault again on the same instruction, so the previous signal handler is called too.
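The first two rules can be illustrated at the Python level. The following is only a sketch of the idea, not the C handler from the patch: the names dump_frames and MAX_FRAMES are hypothetical, and the string formatting used here allocates memory, which the real signal handler must avoid. It walks the frame linked list with a hard limit and writes with the unbuffered os.write() call instead of a buffered file object.

```python
import os
import sys

MAX_FRAMES = 100  # hypothetical constant mirroring the 100-frame limit


def dump_frames(fd, frame=None, limit=MAX_FRAMES):
    """Write one line per frame to the file descriptor fd, most recent first.

    Returns the number of frames written. Illustration only: the real
    handler is written in C and cannot format strings like this.
    """
    if frame is None:
        frame = sys._getframe(1)  # CPython-specific: frame of the caller
    count = 0
    while frame is not None:
        if count >= limit:
            # Stop here even if the linked list is longer (or loops).
            os.write(fd, b"  ...\n")
            break
        code = frame.f_code
        line = '  File "%s", line %d in %s\n' % (
            code.co_filename, frame.f_lineno, code.co_name)
        # os.write() is unbuffered, unlike sys.stderr.write() or fputs().
        os.write(fd, line.encode("utf-8", "backslashreplace"))
        frame = frame.f_back
        count += 1
    return count
```

Calling dump_frames(2) from any function writes the current stack to file descriptor 2, which is usually stderr.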

But the patch was rejected because it was not considered safe. The fault handler writes into file descriptor 2, which is supposed to be stderr, but it may be a network socket in a server. The fault handler may also cause trouble if Python is embedded in an application. The API to disable the fault handler had not been decided. It was also too late for Python 3.2: beta 2 was already released, and new features are not permitted after this stage (as explained in the Python 3.2 schedule). For all these reasons, my patch could not be included in Python 3.2. I took my last chance for Python 3.2 by proposing a new patch with the fault handler disabled by default. But it was rejected too. Again, I was angry because I had spent a lot of time on this new patch.

So I converted the patch to a third-party module. I posted it to the Python Package Index (PyPI) and announced version 1.0 directly on the python-announce mailing list for Christmas.

A dedicated module is more practical than a signal handler alone: I easily added functions to enable and disable the fault handler, and then functions to dump the current backtrace (of the current thread or of all threads). The project is still under development and can be found on github.com: faulthandler.
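As a minimal sketch of these functions, the example below uses the faulthandler module as shipped in the standard library since Python 3.3. The function names in the early third-party releases may differ, so treat the exact calls as an assumption for those versions. The traceback is written to a temporary file rather than to stderr.

```python
import faulthandler
import tempfile

with tempfile.TemporaryFile() as log:
    # Install the fault handler; it keeps the file descriptor of "log".
    faulthandler.enable(file=log)
    enabled = faulthandler.is_enabled()

    # Dump the backtrace of the current thread only, on demand.
    faulthandler.dump_traceback(file=log, all_threads=False)

    # Uninstall the handler before the file is closed.
    faulthandler.disable()
    still_enabled = faulthandler.is_enabled()

    log.seek(0)
    report = log.read().decode("utf-8", "replace")

print(report)
```

The handler must be disabled before the file is closed, because it writes directly to the stored file descriptor.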


I am writing a book: Programming with Unicode

In past years, I tried twice to write a book. The first time, 5 years ago, a publisher contacted me in a hurry, wanting a book explaining how to learn Python by example, a 300-page book with many Windows screenshots. I was not available enough, which means not really interested, to be involved in this project. Especially because the book had to be written in less than 4 months! The second time, I received a contract from a publisher to write a book about good programming practices, with a focus on Python and free software. But I never signed the contract and I never wrote more than the table of contents…

So I decided to write a book without any publisher, without stress. I am writing it when I have one or two hours of free time, e.g. on the train when I come back from Paris. I chose a subject which is important to me, Programming with Unicode. I also chose this subject because I know that it will help other developers. It is hard to find good and recent information about Unicode for developers. I try to sum up useful information and good practices in the same document.

I started the project in August 2010 with a private Mercurial project on Bitbucket. But I realized that I would be more motivated if I could share it, even if it is incomplete, and so I moved it to a public repository (still on Bitbucket). Recently, I moved to github.com because I now feel more comfortable with git than with Mercurial, especially to modify local commits (git rebase -i) before pushing them.

It is written in reStructuredText (reST), a plain text format. I prefer plain text to a binary format (e.g. OpenDocument) because it can be used in a revision control system, like Mercurial or git. I also prefer reST to LaTeX, because its syntax is readable and simpler. The reST format is more powerful than many other wiki-like syntaxes: it supports references in the document, footnotes and other nice features. Recently, I moved to Sphinx to have a better HTML output and the ePub output format (added to Sphinx 1.0, released in July 2010). Thanks to Sphinx, I split the single long file (unicode.rst) into one file per chapter. It is easier for me to work on smaller files, but I had to convert the internal references from the `link`_ format to :ref:`link` or :ref:`link <label>`. Apart from this minor nit, I am very satisfied with Sphinx. It looks like the compilation is also faster because it only recompiles modified files, not all files.
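Such a conversion of internal references can be scripted. The following is a rough sketch and not the method actually used for the book: the function name convert_internal_refs is hypothetical, and it assumes that every internal link name is also a valid Sphinx label, which has to be checked by hand. External links of the form `text <url>`_ and anonymous links ending with two underscores are left untouched.

```python
import re

# Matches `name`_ but neither `text <url>`_ nor the anonymous form `name`__.
INTERNAL_REF = re.compile(r"(?<![:`])`([^`<>]+)`_(?!_)")


def convert_internal_refs(source):
    """Rewrite reST internal references `name`_ as Sphinx :ref:`name` roles."""
    return INTERNAL_REF.sub(r":ref:`\1`", source)


sample = "See `Unicode`_ and the `project site <http://example.com/>`_."
print(convert_internal_refs(sample))
```

Only the first reference in the sample is rewritten, because the second one contains a URL in angle brackets.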

I am writing mainly on two computers: my desktop computer and a laptop. Mercurial and git helped me to work offline (on the train) and resolve conflicts (because I always forget to resynchronize the repository on both computers).

Today, the PDF is around 29 pages (including the front page and two pages for the table of contents) and the book is distributed under a non-free license: CC BY-NC-SA (NC as in noncommercial). You can download the book on github. I may use a free license later; it depends on whether I choose to sell it or not.


Share my configuration files with Mercurial

For two years, I have regularly gotten angry because my computers do not share the same configuration, especially for bash, vim and some development tools. An (ex)colleague showed me his central repository, which he uses to share his configuration files on all of his computers. I decided to do the same. I am ashamed, because it was really easy to do and it is soooo practical!

Browse my configuration files to see my vimrc, gitconfig, etc. I wrote a short Python script (install.py) to install shared configuration files with symbolic links.
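The idea of installing shared files with symbolic links can be sketched as follows. This is not the actual install.py: the install() function, its parameters and the rule of prefixing each name with a dot are assumptions made for the illustration.

```python
import os


def install(repo_dir, home_dir, names):
    """Link each file of repo_dir into home_dir as a dotfile.

    Returns the list of links created. Hypothetical helper: existing
    regular files are never overwritten, only old links are replaced.
    """
    created = []
    for name in names:
        source = os.path.join(repo_dir, name)
        link = os.path.join(home_dir, "." + name)
        if os.path.islink(link):
            os.unlink(link)   # replace a link left by a previous run
        elif os.path.exists(link):
            continue          # keep a real file untouched
        os.symlink(source, link)
        created.append(link)
    return created
```

For example, install("/path/to/repo", os.path.expanduser("~"), ["vimrc", "gitconfig"]) would create ~/.vimrc and ~/.gitconfig as links into the repository, unless regular files with these names already exist.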

I only share my home configuration files (not the /etc directory), but it is enough for me. If a computer needs a specific change, it is not a problem: I edit the configuration file but I don’t commit the change.

I took the opportunity to clean up my configuration files and improve some of them:

  • vim now highlights nonbreaking spaces and tabs in all modes (in light grey), not only in C and Python modes
  • vim displays a light red line at column 80: it helps me to make sure that my source code fits in 79 columns. It is a new feature of Vim 7.3 (colorcolumn).
  • mercurial uses colors and a pager for hg diff and hg log. I also enabled the rebase extension to update a local repository without having to do a “useless” merge commit, and the record extension to be able to commit only some changes of a file.

I recently added a 2-line status line in vim, an idea taken from a friend’s configuration (Hobbestigrou’s vimrc), but I am still testing it. I am not sure that it helps more than the default (single line) status bar.

The most important changes are that vim now removes trailing spaces on all of my computers, and that my name and email are correctly configured for Mercurial and git.


Presentation of the pysandbox project

Created in February 2010, pysandbox is an experimental project trying to create a sandbox for Python. It is based on an earlier project called safelite.py, written by Tav in February 2009. pysandbox uses different security models to get the best security with a small speed overhead. It uses a whitelist for Python builtins and imports, and a blacklist for object attributes (e.g. it hides function.func_closure). Read the README file for more information.
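The whitelist idea for builtins can be illustrated with a few lines of Python 3. This is not pysandbox code and it is not a secure sandbox: without the attribute blacklist and the other protections, untrusted code can still escape through object introspection. The names SAFE_BUILTINS and run_untrusted are hypothetical.

```python
import builtins

# Hypothetical whitelist: only these builtins are visible to the expression.
SAFE_BUILTINS = ("abs", "len", "max", "min", "range", "sum")


def run_untrusted(expression):
    """Evaluate an expression with a whitelisted __builtins__ (NOT secure)."""
    allowed = {name: getattr(builtins, name) for name in SAFE_BUILTINS}
    namespace = {"__builtins__": allowed}
    return eval(expression, namespace)


print(run_untrusted("sum(range(4))"))

try:
    run_untrusted("open('/etc/passwd')")
except NameError as exc:
    # open() is not in the whitelist, so the name lookup fails.
    print("blocked:", exc)
```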

It is the first time that I have written so many tests since the beginning of a project. I am very proud of that and happy about it. I wrote the tests to prove the security model. The test suite helped me many times during development to detect regressions. It was also my first Python module written in C. I realized that writing one is simple and well documented.

pysandbox has a major limitation: it hides most methods to modify a dictionary, e.g. dict.update({1: 2}) is forbidden. This limitation comes from a limitation of CPython: the function executing bytecode assumes that the __builtins__ variable is a standard dictionary (otherwise CPython segfaults). pysandbox has to replace __builtins__ with a read-only object, because being able to modify __builtins__ makes it possible to escape from the untrusted namespace. Because CPython expects a dictionary, pysandbox has to remove the methods that modify a dictionary to create the read-only __builtins__ variable. I wrote patches to “fix” CPython 2 and 3, to accept a __builtins__ of a type other than dictionary, but I haven’t proposed the patches upstream yet. It is still possible to modify a dictionary using the dict[key]=value and del dict[key] syntaxes.
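To see why a read-only dictionary is hard to get in pure Python, consider this naive sketch (the ReadOnlyDict class is hypothetical and is not how pysandbox works). Overriding the mutating methods in a subclass blocks the ordinary calls, but the methods of the dict base type still work on the instance, so the protection is easily bypassed.

```python
class ReadOnlyDict(dict):
    """Naive read-only dictionary: overrides the mutating methods."""

    def _readonly(self, *args, **kwargs):
        raise TypeError("this dictionary is read-only")

    __setitem__ = __delitem__ = _readonly
    clear = pop = popitem = setdefault = update = _readonly


safe = ReadOnlyDict(len=len)

try:
    safe.update({"open": open})
except TypeError as exc:
    print("blocked:", exc)

# The base type ignores the overrides, so the object is still writable.
dict.__setitem__(safe, "open", open)
print("open" in safe)
```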

pysandbox is used by an IRC bot called fschfsch, written by Tila, on the Freenode server. I use it to show examples on the #python-fr channel. pysandbox works with Python 2.5, 2.6 and 2.7. With some hacks, it is possible to use it on Python 3.1 and 3.2. I didn’t commit the hacks because I am not sure yet that they are safe.


The IPy Python module moved to github.com

In October 2010, I moved the IPy Python module (Python class and tools for handling IPv4 and IPv6 addresses and networks) from my (old) company website to github.com. I did that to accept new contributions more easily (github is open to anyone, and git allows forks), and to get a working bug tracker (the old bug tracker has been broken for some months).

If I remember correctly, I used git-svn to convert the whole history from Subversion to git, following this tutorial: How to convert from Subversion to Git. Now I use git directly.

I published IPy 0.71 at the same time. Since October, I have been getting more bug reports and contributions. I suppose that it is related to the migration to github, and so it is positive. I will try to release a new version this month or next month to fix the last reported bug (IPv4-mapped IPv6 address reverseName() failure).


Development of the Hachoir project in 2010

On January 20th, 2010, one year and four months after the previous major version, I published version 1.3 of the Hachoir project (core, parser and metadata). Later, I also published some minor bugfix releases (of the different modules), and a minor release (of the parser and metadata modules) to support WebM (Google video format).

Main changes in Hachoir in 2010:

  • New Blizzard image (BLP), Palm resource (PRC) and WebM (format based on Matroska) parsers
  • JPEG recognizes ICC profile chunks, Java gets a bytecode parser and supports JDK 1.6, and other minor improvements in different parsers
  • Add support for Windows codepages to parse Microsoft Office files
  • Add hachoir-metadata-qt: graphical interface (based on PyQt) to display metadata
  • hachoir-metadata program hides warnings by default
  • hachoir-core doesn’t replace sys.stdout and sys.stderr with a Unicode-friendly object if the readline module is loaded, to avoid issues with ipython.
  • Use distutils correctly (create MANIFEST.in files) instead of using setuptools.

I am no longer involved in or interested in the project, except to fix some minor bugs. But it looks like nneonneo is still active on the Mercurial repository.
