Operational Code Authentication

This document was originally written and prepared by Jeff Lawson. The last major update occurred in November 1999. Please submit comments and suggestions regarding this document's content directly to him.

Background

Although distributed.net has long published nearly complete source code for its client applications, clients compiled from the public source cannot interact with the networked keyservers and can only be used to benchmark and test the contest cores. distributed.net is frequently criticized for this practice, but the people making these criticisms rarely understand the full technical implications behind our policy. In reality, distributed.net fully understands the importance and advantages of being open source and is committed to working toward that eventual goal. The following are some of the major reasons that prevent distributed.net from publishing full source code for a fully functional client:
Quite truthfully, releasing binary-only clients still does not completely eliminate the possibility of sabotage, since it is relatively easy for any knowledgeable person to disassemble or patch binaries. This is actually quite a trivial task, so we urge you not to try. Indeed, security through obscurity is not secure at all, and we do not claim it to be such. It would be easy for distributed.net to rewrite the simple data obfuscation that is currently applied to all buffer files and network traffic and replace it with something more complex (such as public-key encryption or a real block cipher for data packets). However, doing so would be effectively useless and provide no additional benefit, since it adds no actual mathematical security in our usage model (see the sections below for explanations of why). Any significant architectural change that distributed.net adopts in the future must provide improvements in this area, or else there is no gain in using it. Until distributed.net can identify a protocol that provides a secure implementation (one that allows the source code to be fully public without endangering the ability to detect and discard results from fraudulent clients), at least a portion of distributed.net's client source code must remain closed to distribution.

The possibilities listed above are not merely unlikely scenarios; in fact, they have all already occurred in some limited form or another. Our original "v1" clients were readily available in full source code form, but there were many instances of individuals making intentional modifications to the client code to attempt to nullify the entire contest or to gain astronomical stats placement for themselves. Intentional spamming does exist with the current "v2" clients, but it is typically restricted to resubmitting blocks that were checked once but submitted many times, which we can automatically detect and ignore.

There are currently a few technical issues that prevent the existence of a fully open-source distributed.net while still allowing us to verify and ensure the operational integrity of all clients. Some of the major issues are highlighted here, along with our evaluations of them. Many of the remaining issues are classifiable as subsets of these problems. We strongly suspect that these problems cannot all be solved in a fully secure manner.

OPERATIONAL CODE AUTHENTICATION

We have programs that perform a known behavior (testing all possible keys within a sequence) and produce output that depends on that behavior (out-buffer files). It would be desirable to be able to determine whether the output of a given instance of our program running on an "untrusted host" was in fact generated by an instance of our program with the same behavior. When running on an untrusted host, it cannot be guaranteed that a given program is running without modifications that could alter its expected behavior in a way that produces valid-but-inaccurate output. For distributed.net's clients, valid-but-inaccurate output means that the client claims to have completed the work assigned to it when it has in fact not done so at all. It would be desirable to tolerate a limited scope of modifications to the client, such as cosmetic or functional changes that do not alter the work-completion expectation.
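To make the problem concrete, here is a deliberately simplified sketch in C. The key test is a toy stand-in (not the real RC5 core) and the function names are hypothetical; the point is only that an honest client and a gutted client can return identical output for almost every block, since "no solution found" is the expected answer nearly all of the time.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy stand-in for the real cipher core: "test" one key and report
       whether it produces the known plaintext.  Hypothetical, not RC5. */
    static int key_matches(uint32_t key)
    {
        return (key ^ 0xDEADBEEFu) == 0x12345678u;   /* arbitrary toy check */
    }

    /* Honest client: actually tests every key in the assigned block. */
    static int scan_block_honest(uint32_t first, uint32_t count, uint32_t *hit)
    {
        for (uint32_t i = 0; i < count; i++) {
            if (key_matches(first + i)) { *hit = first + i; return 1; }
        }
        return 0;                        /* block completed, no solution */
    }

    /* Tampered client: does no work at all, yet returns the same
       "completed, no solution" answer -- valid-but-inaccurate output. */
    static int scan_block_tampered(uint32_t first, uint32_t count, uint32_t *hit)
    {
        (void)first; (void)count; (void)hit;
        return 0;
    }

    int main(void)
    {
        uint32_t hit = 0;
        /* For nearly every block both calls print 0, so the returned results
           alone cannot distinguish the client that did the work from the one
           that did not. */
        printf("honest:   %d\n", scan_block_honest(0x40000000u, 0x10000u, &hit));
        printf("tampered: %d\n", scan_block_tampered(0x40000000u, 0x10000u, &hit));
        return 0;
    }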
Unfortunately, since the code is running on an untrusted host, you cannot rely upon safety measures that are implemented entirely on the client side, since such checks can be altered or completely bypassed by modifying the code. Examples of the types of checks this includes are:
The open-source networked game "Netrek" allows for trusted binary-only distribution of "RSA-blessed" clients so that cheating behavior can be eliminated or reduced. Although anyone can compile a fully operational client from the publicly distributed source, it will not be RSA-blessed, and some public Netrek servers may be configured to disallow unblessed clients. What differentiates a blessed client is a compile-time directive and the inclusion of a private RSA (or sometimes PGP) key within the compiled binary. The corresponding public half of the key must then be given to each Netrek server operator so that it can be added to his server's list of explicitly allowed clients. Server operators accept public keys based on their perceived trustworthiness of the person compiling the client to keep the private key confidential. The generated public keys have an embedded expiration date (typically six months or so) that is verified by the server at client connect, after which point expired clients are rejected. Each new client revision that an author compiles and releases uses an independent private/public key pair. Because each revision uses independent keys, when the compromise of a specific client version is detected, that version's public key can be manually removed from the explicitly-allowed list on all servers. Note that because private keys must be embedded within the blessed client binaries, it is a simple task to extract a key with a debugger/disassembler so that it can be reused to compile a cheating client. Also, nothing prevents tampering with a blessed binary to make it perform cheating actions, which likewise requires only a debugger/disassembler. The short-duration expiration limits the significance of a single compromised private key and the practicality of repeating the disassembly work for every expiration period.

CODE AUTHENTICATION VIA "DUPLICATED VERIFICATION"

A possible method to work around the Operational Code Authentication problem, usable for secure detection of tampered clients, involves duplicated work verification. In this proposal, instead of distributing distinct work assignments to each client, some amount of duplicated assignment is performed. The returned results from the multiple clients are then compared for equality and rejected if there is a discrepancy. This method makes a number of assumptions:
Currently, distributed.net's clients return only a Boolean indicator (whether a possible solution was found, with True results being extremely rare), which breaks assumption #2. This lack of distinctiveness in typical client output makes it impossible to use independent result verification as a method of checking. For Duplicated Verification to be viable for distributed.net, our existing clients must be made to produce additional distinctiveness beyond the Boolean result. distributed.net has evaluated several possible ways to introduce such distinctiveness into our RC5-64/DES/CSC cores by collecting "residual" data; however, all of the methods we have considered introduce significant overhead (an estimated 15% slowdown in client key rate has been predicted). The most viable method we have considered so far is to keep a running CRC32 value of the least-significant byte of each resulting plaintext word (a rough sketch appears below). Using only the least-significant byte is probably sufficient for simple Duplicated Verification, since the crypto algorithms of RC5/DES/CSC work on a minimum of 32-bit words (which is all that d.net clients verify). Typical table-lookup CRC32 implementations require one pass for each 8-bit piece, so utilizing the entire 32-bit plaintext output would require four iterations to process each of the 8-bit parts. For OGR, a good "residual" value for comparison is the number of nodes searched within the requested stub. This value is already passed back up to the master and logged in the master log files. The only problem is that this value is highly dependent on the core implementation: better algorithms search fewer nodes through better pruning. Other points to keep in mind when considering Duplicated Verification specifically for distributed.net include:
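The sketch referred to above: the loop performs one CRC pass per key over the least-significant byte of a toy per-key plaintext word. The xorshift mixer is only a stand-in for the real core output, so it is the shape of the computation that is meant, not the values.

    #include <stdint.h>
    #include <stdio.h>

    /* Standard CRC-32 (reflected, polynomial 0xEDB88320), one byte per call. */
    static uint32_t crc32_byte(uint32_t crc, uint8_t b)
    {
        crc ^= b;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        return crc;
    }

    /* Toy stand-in for the 32-bit plaintext word the core produces per key. */
    static uint32_t toy_plaintext(uint32_t key)
    {
        key ^= key << 13; key ^= key >> 17; key ^= key << 5;   /* xorshift mix */
        return key;
    }

    int main(void)
    {
        uint32_t crc = 0xFFFFFFFFu;                 /* conventional CRC-32 seed */
        for (uint32_t key = 0x40000000u; key < 0x40010000u; key++) {
            uint32_t pt = toy_plaintext(key);
            /* Least-significant byte only: one CRC pass per key instead of four. */
            crc = crc32_byte(crc, (uint8_t)(pt & 0xFFu));
        }
        printf("residual = %08lX\n", (unsigned long)(crc ^ 0xFFFFFFFFu));
        return 0;
    }

Two untampered clients assigned the same block compute identical residuals, while a client that skipped keys will almost certainly disagree, which is exactly what a Duplicated Verification comparison would catch.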
Note that it is not necessary to distribute all work units multiple times. Duplicated distribution can occur relatively infrequently if delayed detection of illegitimate clients is acceptable, since only a single failed Duplicated Verification is necessary to suspect a client of faulty work. Once an illegitimate client is suspected, all work units previously submitted by that host/email could be re-verified and discarded, if necessary.

The SETI@Home project supposedly uses a type of Duplicated Verification to detect forged work; however, I don't have any further information about their implementation. GIMPS also uses a method of Duplicated Verification based upon the result of the LL sequence (a huge binary number), the lower 64 bits of which are called the "residue" and are used as the verification data. To prevent submission of bogus first-time results, the official clients are compiled from the public source with the addition of a bit of checksum code that validates the result. If someone were to reverse-engineer the checksum code, his or her bogus results would show up sometime later in double-checking. If someone were to submit an abnormally large number of results, there would probably be some spot double-checking going on at the same time (ahead of the normal double-checking location). In fact, the entire residue is not kept secret: 56 bits of it are public and the lowest 8 bits are kept secret. So it is still possible to do things like running your own independent double checks against the public 56 bits. If you submit a double-check residue and all 64 bits match, then everything is made public. If the double-check residue doesn't match, a triple check is performed. Sometimes none of the three residues of a triple check match; you just keep going until you find two that match. The nice thing about the LL test is that the final residue is independent of the method used to get there.

CODE AUTHENTICATION VIA "TRUSTED-USER SIGNATURES"

Under this proposal, it is not the trust of the code that is being authenticated, but rather the trust of the user. Before a user starts using a client, he must obtain a digital private key (for example, by completing an agreement on a web page that generates one). All clients deployed by a person must be configured with a valid private key or else they refuse to operate (failing to start if the client-side test of general private key validity fails, and failing to transmit results to the server if the private key is unregistered or unknown). Under this system, trust and authentication are effectively delegated to the user, and the user becomes responsible for not violating the trust of the client. In this environment, if client forgery is ever detected, reported blocks that were signed by that user can immediately be dismissed (or selectively re-verified through some secondary mechanism, if desired). Note that in this mechanism there must still be some alternate method of detecting forged results; this scheme only provides a way of easily revoking all results from a user who has been determined to be untrusted. Schemes to define trust inheritance from sub-derived user signatures could potentially be devised, which would allow sub-authorities and reduce the overhead of having a single authority generate private keys.
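A minimal sketch of the signing flow this proposal describes, under assumptions of our own: the keyed FNV-1a hash below is only a placeholder for the real public-key signature primitive, and the key string, record layout, and function names are hypothetical. The flow, not the cryptography, is the point: the client signs each reported record with the user's issued key, the keyserver verifies it against the key registered to that user, and revoking the key lets every record carrying that user's signature be discarded.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy keyed hash (FNV-1a over key || record), standing in for a real
       public-key signature. */
    static uint64_t toy_sign(const char *user_key, const char *record)
    {
        uint64_t h = 14695981039346656037ULL;
        for (const char *p = user_key; *p; p++) { h ^= (uint8_t)*p; h *= 1099511628211ULL; }
        for (const char *p = record;   *p; p++) { h ^= (uint8_t)*p; h *= 1099511628211ULL; }
        return h;
    }

    int main(void)
    {
        /* Client side: sign the result record with the key issued to the user. */
        const char *user_key = "user-1234-PRIVATE";   /* hypothetical issued key */
        const char *record   = "user=1234 block=68:00000000 result=no-solution";
        uint64_t sig = toy_sign(user_key, record);

        /* Keyserver side: verify against the key registered for user 1234.
           If this user is later caught forging work, the key is revoked and
           every record carrying that user's signature can be discarded. */
        uint64_t expected = toy_sign("user-1234-PRIVATE", record);
        printf("record %s\n", sig == expected ? "accepted" : "rejected");
        return 0;
    }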
CODE AUTHENTICATION VIA "SIMULTANEOUS PROBLEM MINIMIZATION" (INLINE TESTING)

A solution that Dan Oetting proposed (he calls it "inline testing") involves having the client simultaneously solve two problems, using the decrypted results of each key pass not only for the solution equality comparison but also for a constraint minimization. The results his clients communicate back to the server would include not only the Boolean result (and the solution, if True) but also the result of a separate constraint minimization problem. Since it is a minimization problem, all keys in the block must be evaluated to ensure that all possibilities are considered; skipping any may invalidate the result of the minimization, which the server can detect once the same evaluation is re-performed (see the sketch below). These are some points that are significant:
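The sketch referred to above, again with a toy stand-in for the per-key decrypted plaintext (the xorshift mixer and constants are illustrative, not Dan Oetting's actual formulation): alongside the usual Boolean test, the loop tracks the minimum plaintext value and the key that produced it, a quantity that cannot be known without visiting every key.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy stand-in for the per-key decrypted plaintext word. */
    static uint32_t toy_plaintext(uint32_t key)
    {
        key ^= key << 13; key ^= key >> 17; key ^= key << 5;
        return key;
    }

    int main(void)
    {
        uint32_t first = 0x40000000u, count = 0x10000u;
        int found = 0;
        uint32_t min_pt = 0xFFFFFFFFu, min_key = 0;

        for (uint32_t i = 0; i < count; i++) {
            uint32_t key = first + i;
            uint32_t pt  = toy_plaintext(key);
            if (pt == 0x12345678u) found = 1;                 /* usual Boolean result */
            if (pt < min_pt) { min_pt = pt; min_key = key; }  /* minimization result */
        }

        /* The client reports (found, min_pt, min_key).  Because the minimum can
           fall anywhere in the block, skipping keys risks reporting the wrong
           minimum, which the server detects when the block is re-evaluated. */
        printf("found=%d min=%08lX at key %08lX\n",
               found, (unsigned long)min_pt, (unsigned long)min_key);
        return 0;
    }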
PREVENTION AND DETECTION OF SOLUTION TRANSMISSION SUPPRESSION ("Solution Stealing")

It is possible for a dishonest user to modify the client so that when a solution is detected, it is withheld instead of being transmitted to distributed.net for final verification, allowing the user to deprive distributed.net (and potentially the winner's team) of the cooperative solution attribution and award. There are several possible ways that this issue can be addressed:
PREVENTION OF LOCAL STORAGE FILE TAMPERING

It would be desirable for our programs to be able to read and write data in flat binary files, but to write them in such a way that it can be ensured that no modifications have been made to the file by the user or by an untrusted program. It will always be possible for modifications to be made to the file, but it should always be possible to detect such changes and ignore or discard them. Don't forget that the program must be able to read and modify its own files; therefore, the algorithms for making sense of them and changing them properly are embedded in the program. A determined attacker is always able to reverse-engineer these algorithms and implement them on his own to achieve the desired results. Adding cryptographic signing to the file for the purpose of detecting tampering is not sufficient, since the program needs to be able to perform legitimate modifications to its files, and an illegitimate client could replicate any file operation that a legitimate client can perform. Private keys embedded within the client binary can be extracted through manual techniques and reused in illegitimate programs. Note that if it were possible to solve the Operational Code Authentication problem and convince a central keyserver of a client's authenticity, it might be possible to use the central keyserver as a notarization authority, allowing it to cryptographically sign data records as they are added to a data file. The public key of the notarization authority could then be used to re-verify the legitimacy of data records within the file. Of course, this technique would require a more permanent level of network connectivity, which is exactly what the use of local storage files attempts to eliminate.
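The notarization idea can be sketched as follows; the keyed FNV-1a hash is again only a placeholder for a real signature (in an actual design the keyserver would sign with a private key and anyone holding its public key could verify), and the record format and names are hypothetical. The keyserver signs each record before it enters the local buffer, and when the buffer is later flushed back, each record is re-verified against its stored signature, so a record edited on the untrusted host no longer matches.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Toy keyed hash (FNV-1a) standing in for the keyserver's real signature;
       only the notarize/re-verify flow is the point here. */
    static uint64_t notarize(const char *server_key, const char *record)
    {
        uint64_t h = 14695981039346656037ULL;
        for (const char *p = server_key; *p; p++) { h ^= (uint8_t)*p; h *= 1099511628211ULL; }
        for (const char *p = record;     *p; p++) { h ^= (uint8_t)*p; h *= 1099511628211ULL; }
        return h;
    }

    int main(void)
    {
        const char *server_key = "KEYSERVER-SECRET";          /* hypothetical */

        /* The keyserver signs the record before it is written to the client's
           local buffer file, and the signature is stored alongside it. */
        char record[64] = "block=68:00000100 count=2^28 assigned";
        uint64_t sig = notarize(server_key, record);

        /* When the buffer is later flushed back, the record is re-verified
           against its stored signature. */
        record[strlen(record) - 1] = 'X';                     /* simulate tampering */
        printf("%s\n", notarize(server_key, record) == sig ? "record intact"
                                                           : "tampering detected");
        return 0;
    }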