
 

VM Modifications

Page history last edited by Michael van der Gulik 11 mos ago

VM Modifications for SecureSqueak

 

This page describes the changes that need to be made to Squeak to make it secure.

 

Porting to other platforms.

 

Idea: It is possible to sacrifice speed for easier porting to other VMs.

 

The compiler could be modified to use different classes for special classes such as Array, SmallInteger, and so forth. These special classes would not have any special treatment by the VM, so the VM would not need modifying. These classes are normally literals of CompiledMethods.

 

SmallInteger could be replaced with a wrapper class of the same name.

Associations and Points could be replaced with normal classes, and the bytecodes that deal with them could be disallowed.

String, Symbol and so forth could have their own implementations.

 

Class, ClassDescription and Behavior are more difficult. These will need to be investigated on a per-Smalltalk basis. The >>class method could be overridden completely by the compiler to return a wrapper of a class, but the actual class would be inaccessible to everything but developer tools. This class wrapper could define >>new to return a new class instance by forwarding it to the actual class. A reference to the class wrapper would be stored in... er... the object? Hmm. This would mean that every object, including very basic ones, would have at least instance variables for a dominion and a class wrapper.

 

The compiler would need to generate an AST or other intermediate format. This would be the same format that could be sent over a network. This AST would be quickly compiled at the destination. Each Smalltalk dialect would have its own AST compiler.

 

I'm looking at the NewCompiler - it has an "IR" (intermediate representation) which closely resembles bytecodes. This looks like an ideal format to send code over the network. The IRTranslator transforms an IRMethod into a CompiledMethod. The IRMethod can be augmented to contain useful meta-information. A package could contain IRMethods which are converted into classes.

 

So:

Package contains IRMethods (or something else contains them??)

  

The ability to convert IRMethods to be executable should be managed by a capability. The developer tools and the remote code loader in DPON would have access to this capability.
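
As a rough sketch of the capability idea above (in Python rather than Smalltalk, and with hypothetical names throughout: `IRMethod`, `CodeLoadingCapability`, `Package` and `install_with` are illustrations, not NewCompiler or DPON APIs), a package would carry only inert IRMethods, and making them executable would require being handed the capability object:

```python
class IRMethod:
    """Hypothetical stand-in for the NewCompiler's IRMethod."""

    def __init__(self, selector, instructions):
        self.selector = selector
        self.instructions = tuple(instructions)


class CodeLoadingCapability:
    """Holding a reference to this object is the permission to load code."""

    def make_executable(self, ir_method):
        # Placeholder for the real translation step (IRTranslator in Squeak).
        return {"selector": ir_method.selector, "code": ir_method.instructions}


class Package:
    """A package carries inert IRMethods and holds no capability itself."""

    def __init__(self, ir_methods):
        self.ir_methods = list(ir_methods)

    def install_with(self, capability):
        # The caller must supply the capability. Python does not enforce this
        # discipline; a capability-secure system would.
        return [capability.make_executable(m) for m in self.ir_methods]
```

Only the developer tools and the remote code loader would ever be given a `CodeLoadingCapability` instance; untrusted code would receive packages but never the capability.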

 

Platform requirements

 

In order for SecureSqueak to run on a particular platform, the platform needs to provide:

  • Basic objects with garbage collection: integers, floats, arrays (variable size?).
  • Method dispatch by Symbol and superclass lookup. VMs such as Java VMs would have difficulty with this.
  • An exceptions mechanism?
  • The ability to capture messages, override >>class and >>==, and override BlockContext methods.
  • Code definition API, to convert IRMethods to executable classes and methods. These are not necessarily Class and CompiledMethod.
  • Subcanvas API.
  • Networking API.
  • Perhaps sound, movie decoding, OpenGL APIs?
  • Persistence mechanism (commit, save image, ...).

 

The Smalltalk-80 Bytecodes

 

  • Make sure the interpreter doesn't go into a loop when searching for a superclass when there's a cycle in the inheritance hierarchy.
  • General hardening: always ensure that an instvar exists before accessing it, ensure that a methodDict is of class MethodDictionary, a compiled method is of class CompiledMethod etc.
  • When no doesNotUnderstand: message handler can be found, throw an exception to the current dominion rather than crash.
  • Unhandled exceptions should not crash the VM; there should always be a top-level exception handler in the image.
  • Move dangerous prims such as #asOop out of Object and into a capability.
  • Better support for message capture (from Spoon perhaps?).
  • Support for fair scheduling per dominion.
  • Try not to die when a serious failure occurs.
  • Perhaps... adding an instance variable to every object for the dominion?
  • Perhaps... better VM<->image interaction: GC info/callbacks available, logging info sent from VM to image, etc?
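
The first two items in the list above (cycle detection during superclass lookup, and checking the type of the method dictionary) could look roughly like this. It is a Python model, not VM code; `Behavior`, `lookup`, `InheritanceCycle` and `CorruptClass` are hypothetical names chosen for the sketch.

```python
class InheritanceCycle(Exception):
    pass


class CorruptClass(Exception):
    pass


class Behavior:
    """Minimal model of a class: a method dictionary and a superclass link."""

    def __init__(self, name, superclass=None, methods=None):
        self.name = name
        self.superclass = superclass
        self.methods = {} if methods is None else methods


def lookup(cls, selector):
    """Walk the superclass chain; never loop forever on a cyclic hierarchy."""
    seen = set()
    current = cls
    while current is not None:
        if id(current) in seen:
            raise InheritanceCycle("cycle detected at " + current.name)
        seen.add(id(current))
        # Hardening: refuse to use a method dictionary of the wrong type.
        if not isinstance(current.methods, dict):
            raise CorruptClass(current.name + " has a bad method dictionary")
        method = current.methods.get(selector)
        if method is not None:
            return method
        current = current.superclass
    return None  # the caller would now send doesNotUnderstand:
```

Remembering every visited class costs memory on each lookup; a real VM might instead bound the chain length or validate the hierarchy once when a class is installed.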

 

I will need to modify and harden the Squeak VM to run untrusted bytecodes.

 

The changes are:

  • Make sure that stack operations don't go past the bottom of the stack (for the currently executing method).
  • Make sure that there isn't leftover rubbish on the stack at the end of execution.
  • Make sure jumps don't jump out of the current method.
  • Make sure jumps can't jump into the middle of a multi-byte bytecode.
  • Make sure that the method is terminated properly so that execution doesn't continue past the end.
  • Make sure the active context is not visible to the method (bytecode 137).

 

 

Open question: can the state of the stack be predicted by a verifier? The verifier would need to trace through the code.

The following situation could leave an unknown number of items on the stack:

1 push something.

2 send a message to compare two things

3 pop and jump if true to 5.

4 jump to 1

5 ...continue

 

A code tracer would need to analyse instructions 2 and 3 to determine how many times the loop would iterate. This bears a resemblance to the halting problem.

 

Perhaps the above code would never be generated by a compiler? TODO: are pushes and pops able to be generated in a loop by the compiler? Does the compiler guarantee that a loop will do as many pops as it does pushes?
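
One possible answer, sketched below, is to avoid predicting iteration counts altogether: require that every instruction is always reached with the same stack depth, and reject any method where two paths disagree. As far as I understand it, the JVM verifier in the reading material works along these lines. The instruction set here is a simplified, hypothetical one (tuples such as `("push",)` and `("jump", target)` with targets given as instruction indexes), not real Squeak bytecodes.

```python
class VerifyError(Exception):
    pass


def verify_stack(code, max_depth=64):
    """Return the stack depth before each reachable instruction, or raise."""
    if not code:
        raise VerifyError("empty method")
    depth_at = {0: 0}
    work = [0]
    while work:
        pc = work.pop()
        depth = depth_at[pc]
        op, *args = code[pc]
        if op == "push":
            pops, pushes = 0, 1
        elif op in ("pop", "pop_jump_true", "return"):
            pops, pushes = 1, 0
        elif op == "send":
            pops, pushes = args[0] + 1, 1  # receiver plus arguments
        elif op == "jump":
            pops, pushes = 0, 0
        else:
            raise VerifyError("unknown instruction %r" % (op,))
        if depth < pops:
            raise VerifyError("stack underflow at %d" % pc)
        after = depth - pops + pushes
        if after > max_depth:
            raise VerifyError("stack too deep at %d" % pc)
        if op == "return":
            successors = []
        elif op == "jump":
            successors = [args[0]]
        elif op == "pop_jump_true":
            successors = [args[0], pc + 1]
        else:
            successors = [pc + 1]
        for nxt in successors:
            if not 0 <= nxt < len(code):
                raise VerifyError("control leaves the method at %d" % pc)
            if nxt not in depth_at:
                depth_at[nxt] = after
                work.append(nxt)
            elif depth_at[nxt] != after:
                raise VerifyError("inconsistent stack depth at %d" % nxt)
    return depth_at
```

Each instruction is visited at most once, so the check always terminates. The loop in the example above is rejected because the backward jump arrives at the loop head with one more item on the stack than the first entry did. The cost is that some harmless methods would also be rejected, so whether the existing compiler always satisfies this rule still needs checking.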

 

 

0-15     0000iiii     Push Receiver Variable #iiii

16-31     0001iiii     Push Temporary Location #iiii

32-63     001iiiii     Push Literal Constant #iiiii

64-95     010iiiii     Push Literal Variable #iiiii

96-103     01100iii     Pop and Store Receiver Variable #iii

104-111     01101iii     Pop and Store Temporary Location #iii

 

These will all need checking to make sure that they read or write existing locations. This can be checked by a code verifier.
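
A verifier check for the six bytecode ranges above might look like the following sketch. The function name and its parameters are hypothetical; it assumes the verifier knows how many instance variables the receiver's class defines and how many temporaries and literals the method header declares, and that literal constants and literal variables both index the method's literal frame.

```python
def operand_in_bounds(bytecode, num_inst_vars, num_temps, num_literals):
    """Check the index encoded in single-byte bytecodes 0-111."""
    if 0 <= bytecode <= 15:        # Push Receiver Variable #iiii
        return (bytecode & 0x0F) < num_inst_vars
    if 16 <= bytecode <= 31:       # Push Temporary Location #iiii
        return (bytecode & 0x0F) < num_temps
    if 32 <= bytecode <= 95:       # Push Literal Constant / Variable #iiiii
        return (bytecode & 0x1F) < num_literals
    if 96 <= bytecode <= 103:      # Pop and Store Receiver Variable #iii
        return (bytecode & 0x07) < num_inst_vars
    if 104 <= bytecode <= 111:     # Pop and Store Temporary Location #iii
        return (bytecode & 0x07) < num_temps
    raise ValueError("bytecode %d is outside the range 0-111" % bytecode)
```

For example, `operand_in_bounds(5, 6, 0, 0)` accepts a push of receiver variable 5 in an object with six instance variables, while the same bytecode is rejected for an object with only five.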

 

The size of the stack needs to be checked after the method completes to make sure no entries are left. This may be achievable by a code verifier doing a... data flow analysis? Alternatively, the execution stack could consist of frames, and each frame could have its own stack which these instructions manipulate (?). A lower bounds check would need to be implemented in either case. An upper bounds check would be needed for resource control.

 

The VM must check to make sure the stack isn't empty. I suspect it doesn't check for this.

 

...unless the compiler never generates a stack pop or push in a loop?
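
The per-frame alternative mentioned above can be sketched as follows, assuming each method invocation gets its own small stack. `Frame`, `StackUnderflow` and `StackOverflow` are hypothetical names; this is a Python model of the checks, not Squeak VM code.

```python
class StackUnderflow(Exception):
    pass


class StackOverflow(Exception):
    pass


class Frame:
    """One method invocation with a private, bounds-checked stack."""

    def __init__(self, max_depth=32):
        self._slots = []
        self._max_depth = max_depth

    def push(self, value):
        # Upper bound: resource control.
        if len(self._slots) >= self._max_depth:
            raise StackOverflow("frame stack limit reached")
        self._slots.append(value)

    def pop(self):
        # Lower bound: a method can never pop its caller's values.
        if not self._slots:
            raise StackUnderflow("pop on an empty frame stack")
        return self._slots.pop()

    def return_top(self):
        # Leftover entries are discarded with the frame, so they cannot
        # leak into the caller.
        result = self.pop()
        self._slots.clear()
        return result
```

With this layout, rubbish left on the stack at the end of a method stops being a safety problem, at the price of a bounds check on every push and pop.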

 

112-119     01110iii     Push (receiver, true, false, nil, -1, 0, 1, 2) [iii]

120-123     011110ii     Return (receiver, true, false, nil) [ii] From Message

124-125     0111110i     Return Stack Top From (Message, Block) [i]

 

These are benign except for items left on the stack at completion.

 

The return message needs investigating. How does it know where to return to; is the calling context also pushed onto the stack? If that is the case, then it must not be accessible by other stack operations.

 

126-127     0111111i     unused

 

Knowing Squeak, these won't be unused. Invalid bytecodes will need to be looked into.

 

128     10000000 jjkkkkkk     Push (Receiver Variable, Temporary Location, Literal Constant, Literal Variable) [jj] #kkkkkk

129     10000001 jjkkkkkk     Store (Receiver Variable, Temporary Location, Illegal, Literal Variable) [jj] #kkkkkk

 

Refers to the top of the stack.

 

130     10000010 jjkkkkkk     Pop and Store (Receiver Variable, Temporary Location, Illegal, Literal Variable) [jj] #kkkkkk

131     10000011 jjjkkkkk     Send Literal Selector #kkkkk With jjj Arguments

132     10000100 jjjjjjjj kkkkkkkk     Send Literal Selector #kkkkkkkk With jjjjjjjj Arguments

133     10000101 jjjkkkkk     Send Literal Selector #kkkkk To Superclass With jjj Arguments

 

Should be okay if superclass is valid. If a DNU occurs, then don't halt the VM!

 

134     10000110 jjjjjjjj kkkkkkkk     Send Literal Selector #kkkkkkkk To Superclass With jjjjjjjj Arguments

 

135     10000111     Pop Stack Top

 

We need to make sure that a method doesn't pop stuff off the stack that doesn't belong to it.

 

136     10001000     Duplicate Stack Top

 

Ditto - when returning, we need to make sure only the return value is sitting on the stack. Perhaps we could make a special stack just for a method's invocation?

 

137     10001001     Push Active Context

 

We don't want untrusted code seeing the active context, right? Better to push a proxy to it, or have the active context check who the sender is.

 

138-143         unused

 

These are probably used in Squeak.

 

144-151     10010iii     Jump iii + 1 (i.e., 1 through 8)

152-159     10011iii     Pop and Jump On False iii + 1 (i.e., 1 through 8)

 

These two are only a problem if they occur near the beginning or the end of a method. They can be checked for by a verifier.

 

All jump instructions have the destination hard-coded, meaning that they can be verified easily by a verifier before the method is executed.

 

160-167     10100iii jjjjjjjj     Jump (iii - 4) * 256 + jjjjjjjj

168-171     101010ii jjjjjjjj     Pop and Jump On True ii * 256 + jjjjjjjj

172-175     101011ii jjjjjjjj     Pop and Jump On False ii * 256 + jjjjjjjj

 

This really needs bounds checking, but only if it occurs near the end (or the beginning?) of the method. This case can be checked for by a code sanity checker.
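
Because the jump offsets are encoded in the instruction itself, a sanity checker for them can be quite small. The sketch below uses the instruction lengths from the Smalltalk-80 table on this page and assumes offsets are relative to the instruction following the jump; both need confirming against the actual Squeak bytecode set, which also gives meaning to some of the "unused" codes. The function names are hypothetical.

```python
class VerifyError(Exception):
    pass


def instruction_length(op):
    """Length in bytes of an instruction, per the Smalltalk-80 table."""
    if op in (132, 134):
        return 3
    if 128 <= op <= 131 or op == 133 or 160 <= op <= 175:
        return 2
    return 1


def jump_target(code, pc):
    """Absolute target of the jump at pc, or None if it is not a jump."""
    op = code[pc]
    following = pc + instruction_length(op)
    if 144 <= op <= 159:    # short jump, short pop and jump on false
        return following + (op & 7) + 1
    if 160 <= op <= 167:    # long unconditional jump, may go backwards
        return following + ((op & 7) - 4) * 256 + code[pc + 1]
    if 168 <= op <= 175:    # long pop and jump on true / on false
        return following + (op & 3) * 256 + code[pc + 1]
    return None


def verify_jumps(code):
    """Check that every jump lands on the first byte of an instruction."""
    starts = set()
    pc = 0
    while pc < len(code):
        length = instruction_length(code[pc])
        if pc + length > len(code):
            raise VerifyError("truncated instruction at %d" % pc)
        starts.add(pc)
        pc += length
    for pc in sorted(starts):
        target = jump_target(code, pc)
        if target is not None and target not in starts:
            raise VerifyError("bad jump at %d to %d" % (pc, target))
    return starts
```

A target equal to the method length is rejected as well, since execution would then continue past the end of the method.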

 

The last two are also stack operations which need to be checked to make sure the stack is sane.

 

176-191     1011iiii     Send Arithmetic Message #iiii

192-207     1100iiii     Send Special Message #iiii

208-223     1101iiii     Send Literal Selector #iiii With No Arguments

224-239     1110iiii     Send Literal Selector #iiii With 1 Argument

240-255     1111iiii     Send Literal Selector #iiii With 2 Arguments

 

 

If these use the stack, then the stack has to be left in a sane state. Also, objects popped from the stack must be valid object references and not SmallIntegers.

 

The literal selectors must have their bounds checked.

 

 

Primitive methods

 

Most of the operations affect the stack; sanity will need checking. I assume these would all be invoked using the send bytecodes.

 

1 SmallInteger +

2   SmallInteger -

3   SmallInteger <

4   SmallInteger >

5*  SmallInteger <=

6*  SmallInteger >=

7   SmallInteger =

8*  SmallInteger ~=

9   SmallInteger *

10*     SmallInteger /

11*     SmallInteger \\

12*     SmallInteger //

13  SmallInteger quo:

14  SmallInteger bitAnd:

15  SmallInteger bitOr:

16  SmallInteger bitXor:

17  SmallInteger bitShift:

18*     Number @

19  

20  

21*     Integer +, LargePositiveInteger +

22*     Integer - , LargePositiveInteger -

23*     Integer < , LargePositiveInteger <

24*     Integer > , LargePositiveInteger >

25*     Integer <= , LargePositiveInteger <=

26      Integer >= , LargePositiveInteger >=

27*     Integer = , LargePositiveInteger =

28*     Integer ~= , LargePositiveInteger ~=

29*     Integer * , LargePositiveInteger *

30*     Integer / , LargePositiveInteger /

31*     Integer \\ , LargePositiveInteger \\

32*     Integer // , LargePositiveInteger //

33*     Integer quo: , LargePositiveInteger quo:

34*     Integer bitAnd:, LargePositiveInteger bitAnd:

35*     Integer bitOr: , LargePositiveInteger bitOr:

36*     Integer bitXor: , LargePositiveInteger bitXor:

37*     Integer bitShift: , LargePositiveInteger bitShift:

 

The operations above all pop two elements off the stack and push the result. Stack sanity would need checking.

 

38  

39  

40  SmallInteger asFloat

 

This pops a SmallInteger and pushes a float; stack bounds need checking.

 

41  Float +

42  Float -

43  Float <

44  Float >

45*     Float <=

46*     Float >=

47  Float =

48*     Float ~=

49  Float *

50  Float /

51  Float truncated

52*     Float fractionPart

53*     Float exponent

54*     Float timesTwoPower:

 

Ditto for stack manipulations.

 

55  

56  

57  

58  

59  

60  LargeNegativeInteger digitAt:, LargePositiveInteger digitAt:, Object at:, Object basicAt:

61  LargeNegativeInteger digitAt:put:, LargePositiveInteger digitAt:put:, Object basicAt:put:, Object at:put:

 

Obviously these should be moved out and into a separate class which keeps these primitives safely locked away as capabilities.

 

62  ArrayedCollection size, LargeNegativeInteger digitLength, LargePositiveInteger digitLength, Object basicSize, Object size, String size

63  String at:, String basicAt:

64  String basicAt:put:, String at:put:

 

Assuming that Strings are mutable. These should not be applicable to Symbols.

 

65*     ReadStream next, ReadWriteStream next

66*     WriteStream nextPut:

67*     PositionableStream atEnd

 

68  CompiledMethod objectAt:

69  CompiledMethod objectAt:put:

 

These two should be safe provided that untrusted code does not have access to compiled methods. It would be good to be consistent with other at:put: implementations and put this functionality in a capability.

 

70  Behavior basicNew, Behavior new, Interval class new

71  Behavior new:, Behavior basicNew:

 

These will be subject to dominion constraints.

 

72  Object become:

73  Object instVarAt:

74  Object instVarAt:put:

 

Move these to a capability.

 

75  Object asOop, Object hash, Symbol hash

76  SmallInteger asObject, SmallInteger asObjectNoFail

 

Object asOop is only a concern in combination with another, unnoticed security hole: if such a hole exists, asOop could be used to forge a reference.

 

77  Behavior someInstance

78  Object nextInstance

 

Move these to a capability.

 

79  CompiledMethod class newMethod:header:

 

See CompiledMethod methods further above.

 

80*     ContextPart blockCopy:

81  BlockContext value:value:value:, BlockContext value, BlockContext value:, BlockContext value:value:

82  BlockContext valueWithArguments:

 

Benign? More thought needed.

 

83*     Object perform:with:with:with:, Object perform:with:, Object perform:with:with:, Object perform:

84  Object perform:withArguments:

 

These are required and equivalent to normal message sends.

 

85  Semaphore signal

86  Semaphore wait

87  Process resume

88  Process suspend

 

These are okay.

 

89  Behavior flushCache

 

Could this be perhaps used to deny service?

 

90*     InputSensor primMousePt, InputState primMousePt

91  InputState primCursorLocPut:, InputState primCursorLocPutAgain:

92  Cursor class cursorLink:

93  InputState primInputSemaphore:

94  InputState primSampleInterval:

95  InputState primInputWord

 

TODO?

 

96  BitBlt copyBitsAgain, BitBlt copyBits

 

Subject to dominion constraints.

 

97  SystemDictionary snapshotPrimitive

 

Should be a capability, or access to SystemDictionary should be denied.

 

98  Time class secondClockInto:

99  Time class millisecondClockInto:

100     ProcessorScheduler signal:atMilliseconds:

101     Cursor beCursor

102     DisplayScreen beDisplay

103*    CharacterScanner scanCharactersFrom:to:in:rightX:stopConditions:displaying:

104*    BitBlt drawLoopX:Y:

105*    ByteArray primReplaceFrom:to:with:startingAt:, ByteArray replaceFrom:to:withString:startingAt:, String replaceFrom:to:withByteArray:startingAt:, String primReplaceFrom:to:with:startingAt:

 

Assuming ByteArrays and Strings are mutable.

 

106     

107     

108     

109     

110     Character =, Object ==

111     Object class

 

"Object class" has proven problematic when implementing message proxies.

 

112     SystemDictionary coreLeft

113     SystemDictionary quitPrimitive

114     SystemDictionary exitToDebugger

115     SystemDictionary oopsLeft

116     SystemDictionary signal:atOopsLeft:wordsLeft: 

 

These are okay provided that access to SystemDictionary is denied.

 

 

Scheduler modifications.

 

The code which chooses which processes to schedule could be written in Smalltalk. When the VM has an interrupt, it can transfer control to the scheduler object which then decides which other process to run. Once the behaviour of the scheduler is ideal, it can be hard-coded into the VM for speed.

 

Effectively, the VM would invoke a method on a scheduler object which returns the next process to run.

 

Scheduler objects would be collections of processes. Each CPU would have one scheduler object (which maintains process-CPU affinity). Each CPU can ask its own scheduler object for the next process to run. If desired, these scheduler objects could be swapped in at runtime to experiment with different scheduling algorithms (and crash the system!).

 

When a new process is created, all schedulers could be searched and the scheduler that has the least number of active processes would be assigned that process.

 

A CPU is thought of as a worker, and processes are thought of as tasks. If a CPU becomes idle, it can steal idle work from another CPU. Otherwise, it will cycle through its own processes or go idle.
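
The per-CPU scheduler objects described in this section could be modelled like this. It is a Python sketch with hypothetical names (`Scheduler`, `assign`, `next_for_cpu`); the round-robin policy is only a placeholder for whatever ordering the scheduler objects end up using.

```python
from collections import deque


class Scheduler:
    """One per CPU; the VM asks it for the next process to run."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.run_queue = deque()

    def add(self, process):
        self.run_queue.append(process)

    def next_process(self):
        # Placeholder policy: plain round-robin over this CPU's processes.
        if not self.run_queue:
            return None
        process = self.run_queue.popleft()
        self.run_queue.append(process)
        return process

    def steal_from(self, victim):
        # Only steal if the victim keeps at least one process for itself.
        if len(victim.run_queue) > 1:
            self.run_queue.append(victim.run_queue.pop())
            return True
        return False


def assign(schedulers, process):
    """Give a new process to the scheduler with the fewest processes."""
    target = min(schedulers, key=lambda s: len(s.run_queue))
    target.add(process)
    return target


def next_for_cpu(schedulers, cpu_id):
    """What the VM would call for a CPU; an idle CPU tries to steal first."""
    own = schedulers[cpu_id]
    others = [s for s in schedulers if s is not own]
    if not own.run_queue and others:
        busiest = max(others, key=lambda s: len(s.run_queue))
        own.steal_from(busiest)
    return own.next_process()
```

Swapping the scheduling algorithm at runtime then amounts to replacing a `Scheduler` instance with an object that answers the same messages.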

 

The collection of processes on a CPU would be some sort of sorted collection:

 

  • Resources are allocated to dominions. A dominion with one process will be allocated as much CPU time as a dominion with 1000 processes.
  • Dominions (or processes in dominions) have an upper and lower bound on CPU speed. The lower bound is a guarantee of CPU resources; if this cannot be met then some error will occur (e.g. an exception?). The upper bound is used when the dominion wants to be polite to other users, or when a programmer is doing slow speed simulations on a fast computer.
  • Processes are never starved of CPU - all processes still get some CPU time.
  • Interactivity is important. If some interrupt occurs, the handler of that interrupt will be scheduled next. However, this must not starve other processes.
  • Dominions may have a CPU quota that will eventually cause that dominion to be halted or purged if its allocated CPU resource is consumed.
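
The first and last requirements in the list above (CPU time shared per dominion rather than per process, and a quota that halts a dominion) can be sketched as below. `Dominion`, `pick_next` and `charge` are hypothetical names, time is counted in abstract ticks, and the sketch leaves out the upper and lower speed bounds and interrupt handling.

```python
from collections import deque


class Dominion:
    def __init__(self, name, quota=None):
        self.name = name
        self.quota = quota          # total ticks allowed, or None for no limit
        self.used = 0               # ticks consumed so far
        self.processes = deque()

    @property
    def halted(self):
        return self.quota is not None and self.used >= self.quota


def pick_next(dominions):
    """Pick the runnable dominion that has used the least CPU so far."""
    runnable = [d for d in dominions if d.processes and not d.halted]
    if not runnable:
        return None
    dominion = min(runnable, key=lambda d: d.used)
    # Round-robin inside the dominion so none of its processes starves.
    process = dominion.processes.popleft()
    dominion.processes.append(process)
    return dominion, process


def charge(dominion, ticks):
    """Account the CPU time a process just used to its dominion."""
    dominion.used += ticks
```

With one dominion holding a single process and another holding 1000, repeatedly calling `pick_next` and charging one tick per slice alternates between the two dominions, so each receives half of the CPU regardless of its process count.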

 

Reading material

 

http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#88597
