Dropbox, Python, and type checking

Programming has its share of eternal debates, and type checking is one of them. Some languages are stricter and (we are told) therefore much safer and more robust; others are far more permissive and (we are told) therefore much more productive and effective.

Our journey to type checking 4 million lines of Python describes Dropbox's effort to improve type checking in its Python code.

Dropbox is a big user of Python. It’s our most widely used language both for backend services and the desktop client app (we are also heavy users of Go, TypeScript, and Rust). At our scale—millions of lines of Python—the dynamic typing in Python made code needlessly hard to understand and started to seriously impact productivity.

Python is a dynamically typed language: a variable can hold a value of one type at one moment and a value of another type later. That is convenient, because it spares us some worries while we write code, but it can become a drawback as the codebase grows and we start losing track of what is going on.
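A minimal sketch of what dynamic typing looks like in practice. The names `value` and `double` are made up for illustration and do not come from the Dropbox article.

```python
# A name can be rebound to values of different types at runtime.
value = 42
print(type(value).__name__)  # int
value = "forty-two"
print(type(value).__name__)  # str


def double(x):
    # Nothing here says what x must be: any type that supports `* 2`
    # is accepted, whether that is a number, a string, or a list.
    return x * 2


print(double(21))
print(double("ab"))
```

Convenient for a short script, but the signature of `double` tells a reader nothing about what callers are expected to pass.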

Once your project is tens of thousands of lines of code, and several engineers work on it, our experience tells us that understanding code becomes the key to maintaining developer productivity. Without type annotations, basic reasoning such as figuring out the valid arguments to a function, or the possible return value types, becomes a hard problem.

What can this function return? What should this argument look like? What might this name mean?

There may be documentation, but it can be ambiguous or imprecise.

Even if there is a docstring, it’s often ambiguous or imprecise, leaving a lot of room for misunderstandings.

If we declare types and can check them, we gain some advantages: for example, we avoid putting data where it will not be welcome. It also becomes easier to change the code (refactor):

Refactoring is much easier, as the type checker will often tell exactly what code needs to be changed.
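To make the idea concrete, here is a small sketch with type annotations. The functions `find_user_email` and `email_domain` and the sample data are hypothetical, not taken from Dropbox's code; the point is only that annotations answer the "what does this return?" question and let a static checker such as mypy catch mismatches before the code runs.

```python
from typing import Dict, Optional


def find_user_email(users: Dict[int, str], user_id: int) -> Optional[str]:
    """Return the email for user_id, or None if the user is unknown."""
    return users.get(user_id)


def email_domain(email: str) -> str:
    """Return the part of the address after the '@'."""
    return email.rsplit("@", 1)[-1]


users = {1: "ada@example.com"}
email = find_user_email(users, 1)

# Calling email_domain(email) directly would be flagged by a type checker,
# because email may be None. The explicit check narrows the type to str.
if email is not None:
    print(email_domain(email))
```

If the return type of `find_user_email` changed during a refactor, the checker would point at each caller that needs updating, which is the benefit the quote describes.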

The work covered some 4 million lines of code and has improved the situation:

It has been a long journey from the early prototypes to type checking 4 million lines in production. Along the way we’ve standardized type hinting in Python,…

The organization has made this work its own and already takes it for granted, but there is still a long way to go:

Even though type checking is already taken for granted at Dropbox, I believe that we are still in early days of Python type checking in the community, and things will continue to grow and get better

Interesting.

Threat analysis of an end-to-end encryption system for videoconferencing

Zoom has been one of the winners of the pandemic. It is a videoconferencing platform that is simple to use, works quite well (as nearly all of them do these days, to be fair), and has been a real discovery for many people.

Trusting Zoom? offers some reflections that interest me less for the platform itself than for the analysis of some of the concerns raised at the start of the pandemic. Some things will have changed since then (and the platform still draws controversy at the highest level: Zoom lied to users about end-to-end encryption for years, FTC says).

The first point the author makes is that, despite its weak spots, the platform provided a service whose benefits at the time were large enough to outweigh the drawbacks, unless they seriously harmed someone:

In other words, the benefit of using Zoom is considerable, and I have an ethical obligation to do it unless the risks to me, to my students, or to the university are greater.

He prefers the app because he does not trust browsers:

My reasoning for not using the browser option is a bit different: I don’t trust browsers enough to want one to have the ability to get at my camera or microphone.

Above all because a flaw in the app would put him at risk only while he is using it, whereas a flaw in the browser, which he uses far more, would expose him almost all the time:

But apart from my serious privacy reservations, flaws in the Zoom app put me at risk while using Zoom, while flaws in a browser put me at risk more or less continuously.

His classes are, in the end, public (for his audience, and he even posts materials publicly).

He then asks whether Zoom's weaknesses are serious enough to avoid using it:

Are Zoom’s weaknesses sufficiently serious that my university—and I—should avoid it?

As for the cryptographic methods used, they are not the best, but they seem sufficient for what is being done:

That’s already a substantial part of the answer: I’m not worried about the Andromedan cryptanalysts trying to learn about my students’ personal tragedies. Yes, I suppose in theory I could have as a student someone who is a person of interest to some foreign intelligence agency and this person has a problem that they would tell to me and that agency would be interested enough in blackmailing this student that they’d go to the trouble of cryptanalyzing just the right Zoom conversation—but I don’t believe it’s at all likely and I doubt that you do.

Besides, such an attacker would somehow need access to the network traffic:

There’s another part of the puzzle for a would-be attacker who wants to exploit this flaw: they need access to the target’s traffic. […] Routing attacks don’t require a government-grade attacker, but they’re also well up there on the scale of abilities.

So attacking the platform by exploiting its lack of true end-to-end encryption would be fairly hard:

What it boils down to is this: exploiting the lack of true end-to-end encryption in Zoom is quite difficult …

Finally, I would highlight that the author tries to make all his academic work public, so he is not worried about theft either. There may be exceptions:

Nothing that I personally do would seem to meet that first criterion—I try to make all of my academic work public as soon as I can—but there are some plausible university activities, e.g., development of advanced biotechnology, where there could be such governmental interest.

As I said, I found it an interesting read.

Layout of the file system on a Linux system

Not long ago I discovered dev.to, a community of developers who share knowledge and information, often almost in the form of recipes. This time I bring Navigating files in Linux, which covers the Linux file system and how some things are organized around it.

On Linux everything is a file:

On Linux, everything on the system is represented with a file—Keyboards, disk drives, robotic arms, running programs and the rest: All files. Naturally, a Linux system needs a lot of files—And a sensible way to organize them.

It then goes on to detail the structure of the file system and what we can expect to find in each place. Interesting.
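As a rough illustration of the "everything is a file" idea, the following sketch uses Python's standard `os` and `stat` modules to report what kind of file a path is. The helper `file_kind` and the sample paths are my own choices, not something from the dev.to article, and the paths assume a typical Linux system.

```python
import os
import stat


def file_kind(path: str) -> str:
    """Describe what kind of file `path` is, without following symlinks."""
    mode = os.lstat(path).st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "regular file"


# Devices and process information show up as paths alongside ordinary
# directories; paths missing on this machine are simply skipped.
for path in ("/", "/etc", "/dev/null", "/proc/self"):
    if os.path.lexists(path):
        print(f"{path}: {file_kind(path)}")
```

On a typical Linux machine `/dev/null` is reported as a character device, which is the kind of non-obvious "file" the quote refers to.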

If you want the right information on the plane, turn it off and on again

The old computing trick, especially with home or non-critical systems, is to turn the system off and on again before trying anything else. That is not always possible, and one would expect more professional systems not to need it. However, filed under things one would not expect to happen, the curious article Boeing 787s must be turned off and on every 51 days to prevent ‘misleading data’ being shown to pilots reports that the computer systems on these aircraft need to be power-cycled to make sure they show the correct information to the pilots.

As a personal anecdote, on one occasion my in-flight entertainment system (the little seat-back screen) was not working, and the flight attendant fixed it by turning my little unit off and on again, though only after I insisted a bit, because she did not seem to think it mattered.

This requirement comes as an order from the US Federal Aviation Administration:

The US Federal Aviation Administration has ordered Boeing 787 operators to switch their aircraft off and on every 51 days to prevent what it called “several potentially catastrophic failure scenarios” – including the crashing of onboard network switches.

The problem is that pilots could be shown incorrect information:

According to the directive itself, if the aircraft is powered on for more than 51 days this can lead to “display of misleading data” to the pilots, with that data including airspeed, attitude, altitude and engine operating indications. On top of all that, the stall warning horn and overspeed horn also stop working.

To restore some faith in these matters, in the Boeing case we can always blame old systems. The other day we could read U.S. Air Force Performs First Ever Code Change On A Flying U-2 Spyplane Running Kubernetes, which describes a U-2 spy plane having its code updated in flight through this container system. That said, the scope of the update is not clear.

We don’t have the whole details here, so the extent of the “update” is not clear (actually, the deployment of a new container in Kubernetes can be seen more like a configuration update than a code change). Anyway, the achievement of the latest milestone proves that the U.S. Air Force is continuing to advance in its program to give its weapons system the ability to leverage the power of containerization.

Interesting.

Update (2020-11-03): On other models from the same manufacturer, we learn that Boeing 747s still get critical updates via floppy disks.

Pen Test Partners discovered a 3.5-inch floppy disk drive in the cockpit, which is used to load important navigation databases. It’s a database that has to be updated every 28 days, and an engineer visits each month with the latest updates.

Diff algorithms and Git

Version control systems rely on being able to compute the difference between two files efficiently. How different are different diff algorithms in Git? is about exactly that.
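For readers who have not looked at what a diff actually is, here is a small sketch using Python's standard `difflib` module. Note that `difflib` uses its own matching approach (`SequenceMatcher`), not Git's Myers or Histogram algorithms, so this only illustrates the idea of a line-based diff. The helper `line_diff`, the file names, and the sample contents are invented for the example.

```python
import difflib
from typing import List


def line_diff(old: List[str], new: List[str]) -> List[str]:
    """Return a unified diff between two lists of lines."""
    return list(
        difflib.unified_diff(
            old, new, fromfile="a/greet.py", tofile="b/greet.py", lineterm=""
        )
    )


old = ["def greet(name):", "    print('Hello ' + name)"]
new = ["def greet(name):", "    print(f'Hello {name}')", "    return name"]

diff = line_diff(old, new)
print("\n".join(diff))

# A simple metric of the kind often collected from diffs: lines added
# and removed, ignoring the "+++" and "---" file header lines.
added = sum(1 for l in diff if l.startswith("+") and not l.startswith("+++"))
removed = sum(1 for l in diff if l.startswith("-") and not l.startswith("---"))
print(f"added={added} removed={removed}")
```

Different algorithms can produce different but equally valid diffs for the same pair of files, so counts like these may depend on which algorithm is used.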

They identify three use cases for these diffs and compare how the algorithms perform on them.

From our systematic mapping, we identified three popular applications of diff in recent studies.

Essentially: collecting metrics, identifying where bugs were introduced, and getting patches. They compare two algorithms, Myers and Histogram.

In our empirical analyses, we conduct three comparisons based on the most popular usages of git diff found in our mapping study: collecting metrics, identifying bug introduction, and getting patches. We investigate the disagreement between two diff algorithms: Myers and Histogram, and take a manual measurement of their quality in generating the diff lists.
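Git lets you choose the algorithm with the `--diff-algorithm` option of `git diff`. The following is a rough sketch of how one might check whether the two algorithms disagree for a pair of revisions. It is not the method used in the paper, and the helper names `git_diff_command` and `algorithms_disagree` are hypothetical. It assumes `git` is installed and that `repo_dir` is a Git repository.

```python
import subprocess
from typing import List

# Git accepts other algorithms too; only the two studied are allowed here.
ALGORITHMS = ("myers", "histogram")


def git_diff_command(algorithm: str, old_rev: str, new_rev: str) -> List[str]:
    """Build the `git diff` command line for the given algorithm."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return ["git", "diff", f"--diff-algorithm={algorithm}", old_rev, new_rev]


def algorithms_disagree(old_rev: str, new_rev: str, repo_dir: str = ".") -> bool:
    """Return True if Myers and Histogram produce different diff output."""
    outputs = []
    for algorithm in ALGORITHMS:
        result = subprocess.run(
            git_diff_command(algorithm, old_rev, new_rev),
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
        outputs.append(result.stdout)
    return outputs[0] != outputs[1]
```

For example, `algorithms_disagree("HEAD~1", "HEAD")` run inside a repository would report whether the last commit is displayed differently by the two algorithms.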

An interesting read.