As we have seen, during the construction of the PathClassLoader
to be used as an application’s ClassLoader the function dvmJarFileOpen
is used to open an application’s APK.
dvmJarFileOpen
in turn calls dexZipOpenArchive
which calls parseZipArchive
.
The function parseZipArchive
is the equivalent of the java.util.ZipFile
readCentralDirectory
method.
It is responsible for reading the Central Directory and populating a hashtable which maps file names to File Headers.
1.0 parseZipArchive Again
Source: $(ANDROID_SRC)/dalvik/libdex/ZipArchive.cpp
static int parseZipArchive(ZipArchive* pArchive)
{
int result = -1;
const u1* cdPtr = (const u1*)pArchive->mDirectoryMap.addr;
size_t cdLength = pArchive->mDirectoryMap.length;
int numEntries = pArchive->mNumEntries;
/*
* Create hash table. We have a minimum 75% load factor, possibly as
* low as 50% after we round off to a power of 2. There must be at
* least one unused entry to avoid an infinite loop during creation.
*/
pArchive->mHashTableSize = dexRoundUpPower2(1 + (numEntries * 4) / 3);
pArchive->mHashTable = (ZipHashEntry*)
calloc(pArchive->mHashTableSize, sizeof(ZipHashEntry));
/*
* Walk through the central directory, adding entries to the hash
* table and verifying values.
*/
const u1* ptr = cdPtr;
int i;
for (i = 0; i < numEntries; i++) {
if (get4LE(ptr) != kCDESignature) {
ALOGW("Zip: missed a central dir sig (at %d)", i);
goto bail;
}
if (ptr + kCDELen > cdPtr + cdLength) {
ALOGW("Zip: ran off the end (at %d)", i);
goto bail;
}
long localHdrOffset = (long) get4LE(ptr + kCDELocalOffset);
if (localHdrOffset >= pArchive->mDirectoryOffset) {
ALOGW("Zip: bad LFH offset %ld at entry %d", localHdrOffset, i);
goto bail;
}
unsigned int fileNameLen, extraLen, commentLen, hash;
fileNameLen = get2LE(ptr + kCDENameLen);
extraLen = get2LE(ptr + kCDEExtraLen);
commentLen = get2LE(ptr + kCDECommentLen);
/* add the CDE filename to the hash table */
hash = computeHash((const char*)ptr + kCDELen, fileNameLen);
addToHash(pArchive, (const char*)ptr + kCDELen, fileNameLen, hash);
ptr += kCDELen + fileNameLen + extraLen + commentLen;
if ((size_t)(ptr - cdPtr) > cdLength) {
ALOGW("Zip: bad CD advance (%d vs %zd) at entry %d",
(int) (ptr - cdPtr), cdLength, i);
goto bail;
}
}
ALOGV("+++ zip good scan %d entries", numEntries);
result = 0;
bail:
return result;
}
The function is passed a pointer to a ZipArchive
struct which represents the ZIP file.
struct ZipArchive {
/* open Zip archive */
int mFd;
/* mapped central directory area */
off_t mDirectoryOffset;
MemMapping mDirectoryMap;
/* number of entries in the Zip archive */
int mNumEntries;
/*
* We know how many entries are in the Zip archive, so we can have a
* fixed-size hash table. We probe on collisions.
*/
int mHashTableSize;
ZipHashEntry* mHashTable;
};
At this point the Central Directory of the ZIP file has been mapped into memory.
The function starts by allocating the memory for the hashtable which maps file names to File Headers.
The hashtable is implemented as a fixed size linear hashtable which maps file names to ZipHashEntry
structs.
The struct ZipHashEntry
is defined like this
struct ZipHashEntry {
const char* name;
unsigned short nameLen;
};
and the allocated hashtable is simply an array of them.
The size of the hashtable is a power of two so the computation x mod hashtable size
can be ‘optimized’.
Having allocated the hashtable it iterates over the File Headers in the Central Directory. For each one it does some basic sanity checking and then adds a pointer to the file name to the hashtable using the addToHash
function.
Note that like the readCentralDirectory
method it does not check to see whether there is more than one file with the same name.
1.1 computeHash
Source: $(ANDROID_SRC)/dalvik/libdex/ZipArchive.cpp
static unsigned int computeHash(const char* str, int len)
{
unsigned int hash = 0;
while (len--)
hash = hash * 31 + *str++;
return hash;
}
The computeHash
employs the canonical java.lang.String
hashCode
algorithm.
1.2 addToHash Again
Source: $(ANDROID_SRC)/dalvik/libdex/ZipArchive.cpp
static void addToHash(ZipArchive* pArchive, const char* str, int strLen, unsigned int hash)
{
const int hashTableSize = pArchive->mHashTableSize;
int ent = hash & (hashTableSize - 1);
/*
* We over-allocated the table, so we're guaranteed to find an empty slot.
*/
while (pArchive->mHashTable[ent].name != NULL)
ent = (ent + 1) & (hashTableSize-1);
pArchive->mHashTable[ent].name = str;
pArchive->mHashTable[ent].nameLen = strLen;
}
The function addToHash
starts by computing ent
, the index at which to add the entry for the name.
The index is simply the hash value mod the hashtable size.
It then examines the entry at the index ent
. If the entry is empty it adds the name there, otherwise it proceeds to look for the next empty entry.
Since the size of the hashtable is larger than the number of entries it is guaranteed to find one.
1.3 Example
Given the Central Directory of the APK shown in this example then the addition of the names to the hashtable by the parseZipArchive
function goes like this.
Note that there are 16 files in the example APK so the size of the hashtable is 32.
The first seven names are all added without collisions.
Name | Hash | Index Of Entry |
---|---|---|
res/layout/activity_main.xml |
3289686140 | 28 |
res/menu/activity_main.xml |
3702837041 | 17 |
AndroidManifest.xml |
4186266567 | 7 |
resources.arsc |
810769514 | 10 |
res/drawable-hdpi/ic_action_search.png |
2475784161 | 1 |
res/drawable-hdpi/ic_launcher.png |
1773930598 | 6 |
res/drawable-ldpi/ic_launcher.png |
138633698 | 2 |
The name res/drawable-mdpi/ic_action_search.png
hashes to 679672038, which mod 32 is 6, so it collides with res/drawable-hdpi/ic_launcher.png
.
The next index is 7 which gives another collision, this time with AndroidManifest.xml
.
The entry at is 8 is empty, so it is added here.
The name res/drawable-mdpi/ic_launcher.png
hashes to 4024776769 which mod 32 is 1 so it collides with res/drawable-hdpi/ic_action_search.png
.
It collides again at 2 with res/drawable-ldpi/ic_launcher.png
, and is added at 3.
The next three names are added without collisions
Name | Hash | Index Of Entry |
---|---|---|
res/drawable-xhdpi/ic_action_search.png |
2409726889 | 9 |
res/drawable-xhdpi/ic_launcher.png |
3887447966 | 30 |
classes.dex |
4055048783 | 15 |
Then classes.dex
is added for the second time.
This necessarily collides with the existing entry for the first classes.dex
at 15.
The entry at 16 is empty so it is added there.
The name META-INF/MANIFEST.MF
hashes to 1539143842
which mod 32 is 2 so it collides with
res/drawable-ldpi/ic_launcher.png
.
It collides again with res/drawable-mdpi/ic_launcher.png
at 3 before being added at 4.
The name META-INF/CERT.SF
hashes to 3935803975 which mod 32 is 7 so it collides with AndroidManifest.xml
It then collides three more times at 8, 9, and 10 with
res/drawable-mdpi/ic_action_search.png
res/drawable-xhdpi/ic_action_search.png
resources.arsc
respectively before being added at 11.
The name META-INF/CERT.RSA
hashes to 1750838444 which mod 32 is 12. The entry is empty so it is added at 12.
After all that the hashtable looks like this
Index | Name |
---|---|
0: | <empty> |
1: | res/drawable-hdpi/ic_action_search.png |
2: | res/drawable-ldpi/ic_launcher.png |
3: | res/drawable-mdpi/ic_launcher.png |
4: | META-INF/MANIFEST.MF |
5: | <empty> |
6: | res/drawable-hdpi/ic_launcher.png |
7: | AndroidManifest.xml |
8: | res/drawable-mdpi/ic_action_search.png |
9: | res/drawable-xhdpi/ic_action_search.png |
10: | resources.arsc |
11: | META-INF/CERT.SF |
12: | META-INF/CERT.RSA |
13: | <empty> |
14: | <empty> |
15: | classes.dex |
16: | classes.dex |
17: | res/menu/activity_main.xml |
18: | <empty> |
19: | <empty> |
20: | <empty> |
21: | <empty> |
22: | <empty> |
23: | <empty> |
24: | <empty> |
25: | <empty> |
26: | <empty> |
27: | <empty> |
28: | res/layout/activity_main.xml |
29: | <empty> |
30: | res/drawable-xhdpi/ic_launcher.png |
31: | <empty> |
1.4 dexZipFindEntry Again
ZipEntry dexZipFindEntry(const ZipArchive* pArchive, const char* entryName)
{
int nameLen = strlen(entryName);
unsigned int hash = computeHash(entryName, nameLen);
const int hashTableSize = pArchive->mHashTableSize;
int ent = hash & (hashTableSize-1);
while (pArchive->mHashTable[ent].name != NULL) {
if (pArchive->mHashTable[ent].nameLen == nameLen &&
memcmp(pArchive->mHashTable[ent].name, entryName, nameLen) == 0)
{
/* match */
return (ZipEntry)(long)(ent + kZipEntryAdj);
}
ent = (ent + 1) & (hashTableSize-1);
}
return NULL;
}
As you would expect the dexZipFindEntry
function uses exactly the same algorithm to find an entry matching the given name as addToHash
does to find where to add a given name.
The return value is a ZipEntry
which is defined like this
typedef void* ZipEntry;
Because NULL is returned when an entry is not found, the constant kZipEntryAdj
has to be added to the value to be returned when an entry is found in case it is at index 0.
1.5 Example Continued
Having used the function dexZipOpenArchive
to open the application’s APK, dvmJarFileOpen
then calls the function dexZipFindEntry
to find the application’s classes.dex
file.
Continuing the example above, when dexZipFindEntry
is passed the string
"classes.dex"
it will compute the hash which gives 4055048783, which mod 32 is 15 as we know.
The entry at 15 will match and dexZipFindEntry
will return.
We know that this is the entry for the first classes.dex
in the APK.
We also know that this is the classes.dex
file which did not get verified during installation.
Copyleft (c) 2013 By Simon Lewis. All Rights Reserved.
Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.
Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.