VM Modifications for SecureSqueak
This page describes the changes that need to be made to Squeak to make it secure.
Porting to other platforms.
Idea: It is possible to sacrifice speed for easier porting to other VMs.
The compiler could be modified to use substitute classes in place of special classes such as Array, SmallInteger, and so forth. These substitutes would receive no special treatment from the VM, so the VM would not need modifying. Instances of these special classes normally appear as literals in CompiledMethods.
SmallInteger could be replaced with a wrapper class of the same name.
Associations and Points could be replaced with normal classes, and the bytecodes that deal with them could be disallowed.
String, Symbol and so forth could have their own implementations.
Class, ClassDescription and Behavior are more difficult. These will need to be investigated on a per-Smalltalk basis. The >>class method could be overridden completely by the compiler to return a wrapper of a class, but the actual class would be inaccessible to everything but developer tools. This class wrapper could define >>new to return a new instance by forwarding the request to the actual class. A reference to the class wrapper would be stored in... er... the object? Hmm. This would mean that every object, including very basic ones, would have at least instance variables for a dominion and a class wrapper.
The compiler would need to generate an AST or other intermediate format. This would be the same format that could be sent over a network. This AST would be quickly compiled at the destination. Each Smalltalk dialect would have its own AST compiler.
I'm looking at the NewCompiler - it has an "IR" (intermediate representation) which closely resembles bytecodes. This looks like an ideal format to send code over the network. The IRTranslator transforms an IRMethod into a CompiledMethod. The IRMethod can be augmented to contain useful meta-information. A package could contain IRMethods which are converted into classes.
So:
- A package contains IRMethods (or does something else contain them?).
- The ability to convert IRMethods into executable code should be managed by a capability. The developer tools and the remote code loader in DPON would have access to this capability.
Platform requirements
In order for SecureSqueak to run on a particular platform, the platform needs to provide:
- Basic objects with garbage collection: integers, floats, arrays (variable size?).
- Method dispatch by Symbol and superclass lookup. VMs such as Java VMs would have difficulty with this.
- An exceptions mechanism?
- The ability to capture messages, override >>class and >>==, and override BlockContext methods.
- Code definition API, to convert IRMethods to executable classes and methods. These are not necessarily Class and CompiledMethod.
- Subcanvas API.
- Networking API.
- Perhaps sound, movie decoding, OpenGL APIs?
- Persistence mechanism (commit, save image, ...).
The Smalltalk-80 Bytecodes
- Make sure the interpreter doesn't loop forever while walking up the superclass chain when there is a cycle in the inheritance hierarchy.
- General hardening: always ensure that an instvar exists before accessing it, that a methodDict is of class MethodDictionary, that a compiled method is of class CompiledMethod, etc.
- When no doesNotUnderstand: handler can be found, throw an exception to the current dominion rather than crashing.
- An exception that nothing handles should crash the VM; in practice this should never happen, because there should always be a top-level exception handler in the image.
- Move dangerous primitives such as #asOop out of Object and into a capability.
- Better support for message capture (from Spoon perhaps?).
- Support for fair scheduling per dominion.
- Try not to die when a serious failure occurs.
- Perhaps... adding an instance variable to every object for the dominion?
- Perhaps... better VM<->image interaction: GC info/callbacks available, logging info sent from VM to image, etc?
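The first and third points above (a superclass walk that cannot loop forever, and a missing doesNotUnderstand: handler that raises an exception instead of crashing) could look roughly like the following. This is a Python model rather than Squeak interpreter code; `Behavior` here is a toy stand-in, and `DominionError`, `lookup` and `find_method` are hypothetical names.

```python
from typing import Callable, Dict, Optional


class DominionError(Exception):
    """Hypothetical error delivered to the current dominion instead of crashing the VM."""


class Behavior:
    """Toy stand-in for a class: a name, a superclass link and a method dictionary."""

    def __init__(self, name: str, superclass: Optional["Behavior"] = None) -> None:
        self.name = name
        self.superclass = superclass
        self.method_dict: Dict[str, Callable[[], object]] = {}


def lookup(cls: Optional[Behavior], selector: str) -> Callable[[], object]:
    """Walk up the superclass chain, refusing to loop on a cyclic hierarchy."""
    visited = set()
    while cls is not None:
        if cls in visited:
            raise DominionError(f"cycle in the superclass chain at {cls.name}")
        visited.add(cls)
        method = cls.method_dict.get(selector)
        if method is not None:
            return method
        cls = cls.superclass
    raise LookupError(selector)


def find_method(receiver_class: Behavior, selector: str) -> Callable[[], object]:
    """Normal lookup, then doesNotUnderstand:, then an exception (never a VM crash)."""
    try:
        return lookup(receiver_class, selector)
    except LookupError:
        pass
    try:
        return lookup(receiver_class, "doesNotUnderstand:")
    except LookupError:
        # No DNU handler anywhere: report to the dominion rather than halting the VM.
        raise DominionError(
            f"{receiver_class.name} does not understand #{selector}"
        ) from None
```

A real VM would more likely cap the length of the superclass chain than allocate a set on every lookup; the visited set only makes the cycle check explicit.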
I will need to modify and harden the Squeak VM to run untrusted bytecodes.
The changes are:
- Make sure that stack operations don't go past the bottom of the stack (for the currently executing method).
- Make sure that there isn't leftover rubbish on the stack at the end of execution.
- Make sure jumps don't jump out of the current method.
- Make sure jumps can't jump into the middle of a multi-byte bytecode.
- Make sure that the method is terminated properly so that execution doesn't continue past the end.
- Make sure the active context is not visible to the method (bytecode 137).
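The jump, termination and bytecode 137 checks in this list can all be done statically, before the method runs. Below is a minimal Python sketch of such a verifier. It assumes the instruction lengths and jump encodings from the Smalltalk-80 bytecode table further down this page; Squeak's encoding may differ for some bytecodes, so `instruction_length` is an assumption. The stack-related items in the list need a separate data-flow pass and are not covered here.

```python
from typing import List, Optional

RETURNS = set(range(120, 126))                 # 120-125: return bytecodes
UNUSED = {126, 127} | set(range(138, 144))     # unused in the Smalltalk-80 table
PUSH_ACTIVE_CONTEXT = 137


def instruction_length(op: int) -> int:
    """Length in bytes, following the Smalltalk-80 table (an assumption for Squeak)."""
    if op in (128, 129, 130, 131, 133) or 160 <= op <= 175:
        return 2
    if op in (132, 134):
        return 3
    return 1


def jump_target(code: bytes, pc: int) -> Optional[int]:
    """Absolute target of the jump at pc, or None if it is not a jump."""
    op = code[pc]
    after = pc + instruction_length(op)        # offsets are relative to the next instruction
    if 144 <= op <= 159:                       # short jumps: iii + 1
        return after + (op & 0b111) + 1
    if 160 <= op <= 167:                       # long jump: (iii - 4) * 256 + jjjjjjjj
        return after + ((op & 0b111) - 4) * 256 + code[pc + 1]
    if 168 <= op <= 175:                       # long conditional jump: ii * 256 + jjjjjjjj
        return after + (op & 0b11) * 256 + code[pc + 1]
    return None


def verify(code: bytes) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    errors: List[str] = []
    starts: List[int] = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        starts.append(pc)
        if op in UNUSED:
            errors.append(f"pc {pc}: unused bytecode {op}")
        if op == PUSH_ACTIVE_CONTEXT:
            errors.append(f"pc {pc}: push active context is not allowed")
        if pc + instruction_length(op) > len(code):
            errors.append(f"pc {pc}: truncated instruction")
            return errors
        pc += instruction_length(op)
    if not starts:
        return ["empty method"]
    valid = set(starts)
    for start in starts:
        # A jump must land on the first byte of an instruction inside this method.
        target = jump_target(code, start)
        if target is not None and target not in valid:
            errors.append(f"pc {start}: jump target {target} is not an instruction start")
    last = code[starts[-1]]
    ends_with_jump = 144 <= last <= 151 or 160 <= last <= 167   # unconditional jumps
    if last not in RETURNS and not ends_with_jump:
        errors.append("execution can run past the end of the method")
    return errors
```

Because every jump offset is encoded in the instruction itself, one linear pass to find instruction boundaries plus one pass over the jumps is enough; no execution is needed.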
Open question: can the state of the stack be predicted by a verifier? The verifier would need to trace through the code.
The following situation could leave an unknown number of items on the stack:
1 push something.
2 send a message to compare two things
3 pop and jump if true to 5.
4 jump to 1
5 ...continue
A code tracer would need to analyse instructions 2 and 3 to determine how many times the loop would iterate. This bears a resemblance to the halting problem.
Perhaps the above code would never be generated by a compiler? TODO: are pushes and pops able to be generated in a loop by the compiler? Does the compiler guarantee that a loop will do as many pops as it does pushes?
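One known way around counting iterations is the approach taken by the JVM class-file verifier: require every instruction to be reached with the same stack depth along every path, and reject the method otherwise. A loop that does not pop as many items as it pushes is then rejected without knowing how many times it runs, because its first instruction is reached along two paths with different depths. The Python sketch below uses a hypothetical, simplified instruction model in which each instruction states how many items it pops and pushes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class VerifyError(Exception):
    pass


@dataclass(frozen=True)
class Instr:
    """Simplified instruction: stack effect plus control flow (hypothetical model)."""
    pops: int
    pushes: int
    target: Optional[int] = None   # index of the jump destination, if any
    falls_through: bool = True     # False for unconditional jumps and returns


PUSH = Instr(0, 1)
SEND_BINARY = Instr(2, 1)          # pops receiver and argument, pushes the result
RETURN_TOP = Instr(1, 0, falls_through=False)


def pop_jump_if_true(target: int) -> Instr:
    return Instr(1, 0, target=target)


def jump(target: int) -> Instr:
    return Instr(0, 0, target=target, falls_through=False)


def stack_depths(code: List[Instr]) -> Dict[int, int]:
    """Depth on entry to each reachable instruction; raises if paths disagree."""
    depth_at = {0: 0}
    worklist = [0]
    while worklist:
        pc = worklist.pop()
        instr = code[pc]
        depth = depth_at[pc]
        if depth < instr.pops:
            raise VerifyError(f"instruction {pc}: stack underflow")
        after = depth - instr.pops + instr.pushes
        successors = [pc + 1] if instr.falls_through else []
        if instr.target is not None:
            successors.append(instr.target)
        for succ in successors:
            if not 0 <= succ < len(code):
                raise VerifyError(f"instruction {pc}: control leaves the method")
            if succ not in depth_at:
                depth_at[succ] = after
                worklist.append(succ)
            elif depth_at[succ] != after:
                raise VerifyError(
                    f"instruction {succ} reached with depths {depth_at[succ]} and {after}"
                )
    return depth_at


# A balanced loop (push, push, compare, pop-and-jump) and one that leaves an item behind.
balanced = [PUSH, PUSH, SEND_BINARY, pop_jump_if_true(5), jump(0), PUSH, RETURN_TOP]
growing = [PUSH, PUSH, PUSH, SEND_BINARY, pop_jump_if_true(6), jump(0), RETURN_TOP]
```

Each instruction is visited at most once, so the check always terminates. In this model the largest recorded depth also bounds the frame's operand stack, which is the number an upper bound check for resource control would need.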
0-15 0000iiii Push Receiver Variable #iiii
16-31 0001iiii Push Temporary Location #iiii
32-63 001iiiii Push Literal Constant #iiiii
64-95 010iiiii Push Literal Variable #iiiii
96-103 01100iii Pop and Store Receiver Variable #iii
104-111 01101iii Pop and Store Temporary Location #iii
These will all need checking to make sure that they read or write existing locations. This can be checked by a code verifier.
The size of the stack needs to be checked after the method completes to make sure no entries are left. This may be achievable by a code verifier doing a... data flow analysis? Alternatively, the execution stack could consist of frames, and each frame could have its own stack which these instructions manipulate (?). A lower bounds check would need to be implemented in either case. An upper bounds check would be needed for resource control.
The VM must check to make sure the stack isn't empty. I suspect it doesn't check for this.
...unless the compiler never generates a stack pop or push in a loop?
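The index checks for these bytecodes (and the literal selector bounds for the send bytecodes further down) could be done with a sketch like the one below. `MethodShape` and `operand_error` are hypothetical; the sketch assumes the receiver's number of named instance variables is known when the method is verified, which is only true if the method is verified against the class it is installed in. It does not check that a literal variable really is an Association.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MethodShape:
    """Hypothetical summary of what a method may legally index."""
    receiver_vars: int   # named instance variables of the receiver's class
    temps: int           # arguments plus temporaries
    literals: int        # size of the literal frame


def operand_error(op: int, ext: Optional[int], shape: MethodShape) -> Optional[str]:
    """Describe an out-of-range operand of bytecode op (ext = extension byte), else None."""
    if op <= 15:
        kind, index = "receiver", op & 0x0F          # 0-15 push receiver variable
    elif op <= 31:
        kind, index = "temp", op & 0x0F              # 16-31 push temporary
    elif op <= 95:
        kind, index = "literal", op & 0x1F           # 32-95 push literal constant/variable
    elif op <= 103:
        kind, index = "receiver", op & 0x07          # 96-103 pop and store receiver variable
    elif op <= 111:
        kind, index = "temp", op & 0x07              # 104-111 pop and store temporary
    elif op in (128, 129, 130):
        if ext is None:
            return "missing extension byte"
        jj, index = ext >> 6, ext & 0x3F
        if jj == 2 and op != 128:
            return "illegal store into a literal constant"
        kind = ("receiver", "temp", "literal", "literal")[jj]
    elif op in (131, 133):
        if ext is None:
            return "missing extension byte"
        kind, index = "literal", ext & 0x1F          # selector index kkkkk
    elif op >= 208:
        kind, index = "literal", op & 0x0F           # 208-255 send literal selector #iiii
    else:
        return None                                  # no indexed operand checked here
    limit = {"receiver": shape.receiver_vars, "temp": shape.temps,
             "literal": shape.literals}[kind]
    if index >= limit:
        return f"{kind} index {index} out of range (limit {limit})"
    return None
```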
112-119 01110iii Push (receiver, true, false, nil, -1, 0, 1, 2) [iii]
120-123 011110ii Return (receiver, true, false, nil) [ii] From Message
124-125 0111110i Return Stack Top From (Message, Block) [i]
These are benign except for items left on the stack at completion.
The return bytecodes need investigating. How does a return know where to return to; is the calling context also pushed onto the stack? If that is the case, then it must not be accessible to other stack operations.
126-127 0111111i unused
Knowing Squeak, these won't be unused. Invalid bytecodes will need to be looked into.
128 10000000 jjkkkkkk Push (Receiver Variable, Temporary Location, Literal Constant, Literal Variable) [jj] #kkkkkk
129 10000001 jjkkkkkk Store (Receiver Variable, Temporary Location, Illegal, Literal Variable) [jj] #kkkkkk
Stores the top of the stack without popping it.
130 10000010 jjkkkkkk Pop and Store (Receiver Variable, Temporary Location, Illegal, Literal Variable) [jj] #kkkkkk
131 10000011 jjjkkkkk Send Literal Selector #kkkkk With jjj Arguments
132 10000100 jjjjjjjj kkkkkkkk Send Literal Selector #kkkkkkkk With jjjjjjjj Arguments
133 10000101 jjjkkkkk Send Literal Selector #kkkkk To Superclass With jjj Arguments
Should be okay if superclass is valid. If a DNU occurs, then don't halt the VM!
134 10000110 jjjjjjjj kkkkkkkk Send Literal Selector #kkkkkkkk To Superclass With jjjjjjjj Arguments
135 10000111 Pop Stack Top
We need to make sure that a method doesn't pop stuff off the stack that doesn't belong to it.
136 10001000 Duplicate Stack Top
Ditto - when returning, we need to make sure only the return value is sitting on the stack. Perhaps we could make a special stack just for a method's invocation?
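A special stack per invocation could look like the Python sketch below. `Frame` and `StackError` are hypothetical; each invocation owns a bounded operand stack plus a link to its sender, so a callee can never pop its caller's items, and a return hands exactly one value back. (Smalltalk-80 contexts similarly record their sender in a field rather than on the operand stack.) Compiled code can legitimately return with extra items still on its stack, for example a ^ inside an inlined ifTrue: in the middle of an expression, so this sketch discards leftovers with the frame instead of rejecting them.

```python
from typing import Any, List, Optional


class StackError(Exception):
    pass


class Frame:
    """Hypothetical per-invocation frame with its own bounded operand stack."""

    def __init__(self, sender: Optional["Frame"], max_depth: int) -> None:
        self.sender = sender          # where a return goes; never reachable by push/pop
        self.max_depth = max_depth    # upper bound, for resource control
        self._stack: List[Any] = []

    def push(self, value: Any) -> None:
        if len(self._stack) >= self.max_depth:
            raise StackError("operand stack overflow")
        self._stack.append(value)

    def pop(self) -> Any:
        if not self._stack:
            # Lower bound: a callee can never pop items that belong to its sender.
            raise StackError("operand stack underflow")
        return self._stack.pop()

    def dup(self) -> None:
        value = self.pop()
        self.push(value)
        self.push(value)

    def return_top(self) -> Optional["Frame"]:
        value = self.pop()
        self._stack.clear()           # leftovers die with the frame instead of leaking
        if self.sender is not None:
            self.sender.push(value)
        return self.sender
```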
137 10001001 Push Active Context
We don't want untrusted code seeing the active context, right? Better to push a proxy to it, or have the active context check who the sender is.
138-143 unused
These are probably used in Squeak.
144-151 10010iii Jump iii + 1 (i.e., 1 through 8)
152-159 10011iii Pop and Jump On False iii +1 (i.e., 1 through 8)
These two are only a problem if they occur near the beginning or the end of a method. They can be checked for by a verifier.
All jump instructions have the destination hard-coded, meaning that they can be verified easily by a verifier before the method is executed.
160-167 10100iii jjjjjjjj Jump(iii - 4) *256+jjjjjjjj
168-171 101010ii jjjjjjjj Pop and Jump On True ii *256+jjjjjjjj
172-175 101011ii jjjjjjjj Pop and Jump On False ii *256+jjjjjjjj
This really needs bounds checking, but only if it occurs near the end (or the beginning?) of the method. This case can be checked for by a code sanity checker.
The last two are also stack operations which need to be checked to make sure the stack is sane.
176-191 1011iiii Send Arithmetic Message #iiii
192-207 1100iiii Send Special Message #iiii
208-223 1101iiii Send Literal Selector #iiii With No Arguments
224-239 1110iiii Send Literal Selector #iiii With 1 Argument
240-255 1111iiii Send Literal Selector #iiii With 2 Arguments
If these use the stack, then the stack has to be left in a sane state. Also, objects popped from the stack must be valid object references and not SmallIntegers.
The literal selectors must have their bounds checked.
Primitive methods
Most of these operations affect the stack, so stack sanity will need checking. I assume these would all be invoked using the send bytecodes.
1 SmallInteger +
2 SmallInteger -
3 SmallInteger <
4 SmallInteger >
5* SmallInteger <=
6* SmallInteger >=
7 SmallInteger =
8* SmallInteger ~=
9 SmallInteger *
10* SmallInteger /
11* SmallInteger \\
12* SmallInteger //
13 SmallInteger quo:
14 SmallInteger bitAnd:
15 SmallInteger bitOr:
16 SmallInteger bitXor:
17 SmallInteger bitShift:
18* Number @
19
20
21* Integer +, LargePositiveInteger +
22* Integer -, LargePositiveInteger -
23* Integer <, LargePositiveInteger <
24* Integer >, LargePositiveInteger >
25* Integer <=, LargePositiveInteger <=
26 Integer >=, LargePositiveInteger >=
27* Integer =, LargePositiveInteger =
28* Integer ~=, LargePositiveInteger ~=
29* Integer *, LargePositiveInteger *
30* Integer /, LargePositiveInteger /
31* Integer \\, LargePositiveInteger \\
32* Integer //, LargePositiveInteger //
33* Integer quo:, LargePositiveInteger quo:
34* Integer bitAnd:, LargePositiveInteger bitAnd:
35* Integer bitOr:, LargePositiveInteger bitOr:
36* Integer bitXor:, LargePositiveInteger bitXor:
37* Integer bitShift:, LargePositiveInteger bitShift:
The operations above all pop two elements off the stack and push the result. Stack sanity would need checking.
38
39
40 SmallInteger asFloat
This pops a SmallInteger and pushes a float; stack bounds need checking.
41 Float +
42 Float -
43 Float <
44 Float >
45* Float <=
46* Float >=
47 Float =
48* Float ~=
49 Float *
50 Float /
51 Float truncated
52* Float fractionPart
53* Float exponent
54* Float timesTwoPower:
Ditto for stack manipulations.
55
56
57
58
59
60 LargeNegativeInteger digitAt:, LargePositiveInteger digitAt:, Object at:, Object basicAt:
61 LargeNegativeInteger digitAt:put:, LargePositiveInteger digitAt:put:, Object basicAt:put:, Object at:put:
Obviously these should be moved out and into a separate class which keeps these primitives safely locked away as capabilities.
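Moving these primitives into a capability might look like the Python sketch below. `RawSlotAccess` and `RevokedError` are hypothetical names; Python lists stand in for indexable objects. The point is only the shape: the dangerous operation is reachable solely through an object explicitly handed to trusted code (such as the developer tools), and that object can be revoked. Python cannot actually hide attributes, so in the image the guarantee would come from the object-capability discipline itself.

```python
from typing import Any, List


class RevokedError(Exception):
    pass


class RawSlotAccess:
    """Hypothetical capability wrapping primitives 60/61 (basicAt:, basicAt:put:)."""

    def __init__(self) -> None:
        self._revoked = False

    def revoke(self) -> None:
        self._revoked = True

    def _check(self, obj: List[Any], index: int) -> None:
        if self._revoked:
            raise RevokedError("capability has been revoked")
        if not 1 <= index <= len(obj):     # Smalltalk indices are 1-based
            raise IndexError(index)

    def basic_at(self, obj: List[Any], index: int) -> Any:
        self._check(obj, index)
        return obj[index - 1]

    def basic_at_put(self, obj: List[Any], index: int, value: Any) -> Any:
        self._check(obj, index)
        obj[index - 1] = value
        return value                       # at:put: answers the stored value
```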
62 ArrayedCollection size, LargeNegativeInteger digitLength, LargePositiveInteger digitLength, Object basicSize, Object size, String size
63 String at:, String basicAt:
64 String basicAt:put:, String at:put:
This assumes that Strings are mutable. These primitives should not be applicable to Symbols.
65* ReadStream next, ReadWriteStream next
66* WriteStream nextPut:
67* PositionableStream atEnd
68 CompiledMethod objectAt:
69 CompiledMethod objectAt:put:
These two should be safe provided that untrusted code does not have access to compiled methods. It would be good to be consistent with other at:put: implementations and put this functionality in a capability.
70 Behavior basicNew, Behavior new, Interval class new
71 Behavior new:, Behavior basicNew:
These will be subject to dominion constraints.
72 Object become:
73 Object instVarAt:
74 Object instVarAt:put:
Move these to a capability.
75 Object asOop, Object hash, Symbol hash
76 SmallInteger asObject, SmallInteger asObjectNoFail
Object asOop could become a security concern if there is an unnoticed hole elsewhere: combined with such a hole, asOop could be used to forge a reference.
77 Behavior someInstance
78 Object nextInstance
Move these to a capability.
79 CompiledMethod class newMethod:header:
See the comments on the CompiledMethod primitives (68 and 69) above.
80* ContextPart blockCopy:
81 BlockContext value:value:value:, BlockContext value, BlockContext value:, BlockContext value:value:
82 BlockContext valueWithArguments:
Benign? More thought needed.
83* Object perform:with:with:with:, Object perform:with:, Object perform:with:with:, Object perform:
84 Object perform:withArguments:
These are required and equivalent to normal message sends.
85 Semaphore signal
86 Semaphore wait
87 Process resume
88 Process suspend
These are okay.
89 Behavior flushCache
Could this perhaps be used for denial of service?
90* InputSensor primMousePt, InputState primMousePt
91 InputState primCursorLocPut: ,InputState primCursorLocPutAgain:
92 Cursor class cursorLink:
93 InputState primInputSemaphore:
94 InputState primSampleInterval:
95 InputState primInputWord
TODO?
96 BitBlt copyBitsAgain, BitBlt copyBits
Subject to dominion constraints.
97 SystemDictionary snapshotPrimitive
This should be a capability, or access to SystemDictionary should be denied.
98 Time class secondClockInto:
99 Time class millisecondClockInto:
100 ProcessorScheduler signal:atMilliseconds:
101 Cursor beCursor
102 DisplayScreen beDisplay
103* CharacterScanner scanCharactersFrom:to:in:rightX:stopConditions:displaying:
104* BitBlt drawLoopX:Y:
105* ByteArray primReplaceFrom:to:with:startingAt:, ByteArray replaceFrom:to:withString:startingAt:, String replaceFrom:to:withByteArray:startingAt:, String primReplaceFrom:to:with:startingAt:
This assumes that ByteArrays and Strings are mutable.
106
107
108
109
110 Character =, Object ==
111 Object class
"Object class" has proven problematic when implementing message proxies.
112 SystemDictionary coreLeft
113 SystemDictionary quitPrimitive
114 SystemDictionary exitToDebugger
115 SystemDictionary oopsLeft
116 SystemDictionary signal:atOopsLeft:wordsLeft:
These are okay provided that access to SystemDictionary is denied.
Scheduler modifications.
The code which chooses which processes to schedule could be written in Smalltalk. When the VM receives an interrupt, it can transfer control to the scheduler object, which then decides which process to run next. Once the behaviour of the scheduler is ideal, it can be hard-coded into the VM for speed.
Effectively, the VM would invoke a method on a scheduler object which returns the next process to run.
Scheduler objects would be collections of processes. Each CPU would have one scheduler object (which maintains process-CPU affinity). Each CPU can ask its own scheduler object for the next process to run. If desired, these scheduler objects could be swapped in at runtime to experiment with different scheduling algorithms (and crash the system!).
When a new process is created, all schedulers could be searched and the scheduler that has the least number of active processes would be assigned that process.
A CPU is thought of as a worker, and processes are thought of as tasks. If a CPU becomes idle, it can steal waiting work from another CPU. Otherwise, it will cycle through its own processes or go idle.
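The per-CPU scheduler objects, least-loaded assignment and work stealing described above could be modelled as in the Python sketch below. `CpuScheduler` and `Machine` are hypothetical names and processes are plain strings; `next_for` plays the role of the method the VM would invoke to get the next process. Leaving every victim CPU at least one process when stealing is an assumption of this sketch, not something the design above specifies.

```python
from collections import deque
from typing import Deque, List, Optional


class CpuScheduler:
    """Hypothetical per-CPU scheduler object that the VM asks for the next process."""

    def __init__(self, cpu_id: int) -> None:
        self.cpu_id = cpu_id
        self.runnable: Deque[str] = deque()   # processes with affinity to this CPU

    def next_process(self) -> Optional[str]:
        if not self.runnable:
            return None
        process = self.runnable.popleft()
        self.runnable.append(process)         # round-robin through this CPU's processes
        return process


class Machine:
    def __init__(self, cpus: int) -> None:
        self.schedulers: List[CpuScheduler] = [CpuScheduler(i) for i in range(cpus)]

    def spawn(self, process: str) -> int:
        """Assign a new process to the scheduler with the fewest processes."""
        target = min(self.schedulers, key=lambda s: len(s.runnable))
        target.runnable.append(process)
        return target.cpu_id

    def next_for(self, cpu_id: int) -> Optional[str]:
        own = self.schedulers[cpu_id]
        if own.runnable:
            return own.next_process()
        # Idle CPU: steal a waiting process (and its affinity) from the busiest CPU.
        victim = max(self.schedulers, key=lambda s: len(s.runnable))
        if len(victim.runnable) < 2:
            return None                       # nothing to steal: go idle
        own.runnable.append(victim.runnable.pop())
        return own.next_process()
```

Because a stolen process is moved into the thief's own queue, it keeps running on the new CPU afterwards, which preserves the process-CPU affinity the scheduler objects are meant to maintain.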
The collection of processes on a CPU would be some sort of sorted collection:
- Resources are allocated to dominions. A dominion with one process will be allocated as much CPU time as a dominion with 1000 processes.
- Dominions (or processes in dominions) have an upper and lower bound on CPU speed. The lower bound is a guarantee of CPU resources; if this cannot be met then some error will occur (e.g. an exception?). The upper bound is used when the dominion wants to be polite to other users, or when a programmer is doing slow speed simulations on a fast computer.
- Processes are never starved of CPU - all processes still get some CPU time.
- Interactivity is important. If some interrupt occurs, the handler of that interrupt will be scheduled next. However, this must not starve other processes.
- Dominions may have a CPU quota that will eventually cause that dominion to be halted or purged if its allocated CPU resource is consumed.
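Several of these requirements (equal share per dominion regardless of process count, an upper bound on CPU use, a quota, and no starved processes within a dominion) can be combined in a simple selection rule, sketched below in Python. `Dominion` and `pick` are hypothetical, and the rule itself (run the eligible dominion that has used the least CPU, then round-robin inside it) is only one possible choice. The lower-bound guarantee and interrupt priority from the list are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Dominion:
    """Hypothetical dominion record used only for CPU accounting."""
    name: str
    processes: List[str]
    used: float = 0.0               # CPU seconds consumed so far
    max_share: float = 1.0          # upper bound as a fraction of elapsed time
    quota: Optional[float] = None   # total CPU budget; None means unlimited
    turn: int = 0                   # round-robin position among its processes


def pick(dominions: List[Dominion], elapsed: float) -> Optional[Tuple[Dominion, str]]:
    """Choose the least-served eligible dominion, then the next process inside it."""
    eligible = []
    for d in dominions:
        if not d.processes:
            continue
        if d.quota is not None and d.used >= d.quota:
            continue                # quota consumed: candidate for halting or purging
        if elapsed > 0 and d.used / elapsed > d.max_share:
            continue                # over its polite upper bound for now
        eligible.append(d)
    if not eligible:
        return None
    # Fair share per dominion: process count does not affect who goes next.
    chosen = min(eligible, key=lambda d: d.used)
    process = chosen.processes[chosen.turn % len(chosen.processes)]
    chosen.turn += 1                # every process in the dominion eventually runs
    return chosen, process
```

With one dominion running a single process and another running 1000, this rule alternates between the two dominions, so each receives about half the CPU.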
Reading material
http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#88597