Monday, August 16, 2010

How to make a simple virus/malware scanner

Making a working antivirus is very hard for several reasons:
  1. You need virus/malware samples, which are not easy to get.
  2. Most viruses infect executable files, and some use polymorphism, which makes them harder to detect.
  3. Most antivirus programs use heuristic scanning, which analyzes a file, flags it if it looks suspicious, and asks the user what to do. This is hard to implement, and a poor heuristic scanner can produce many false positives.
  4. You need a driver that monitors system activity such as file reads and writes.
  5. etc...

Today, I'm going to show you how to make a simple malware scanner (not an antivirus) in MSVC 2008 C/C++ that uses hashes to compare files against a database. This method only works on some kinds of malware, e.g. worms, trojans, or any file that doesn't modify itself, because we hash the whole file content.
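To see why hashing the whole file only catches files that never change, here is a minimal sketch. It uses 64-bit FNV-1a instead of MD5 purely to stay short and dependency-free; FNV-1a is not cryptographic, and the function name Fnv1a64 is hypothetical, not part of the scanner. Identical bytes always give the same value, but changing even one byte gives a different value, so a polymorphic or self-modifying file will never match a stored hash.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a, used here only as a stand-in for MD5 (NOT what the scanner uses).
std::uint64_t Fnv1a64(const std::string& data)
{
    std::uint64_t hash = 14695981039346656037ULL;   // FNV offset basis
    for (unsigned char c : data) {
        hash ^= c;                                   // mix in one byte
        hash *= 1099511628211ULL;                    // multiply by the FNV prime
    }
    return hash;
}
```

Hashing "worm body v1" twice gives the same value, which is exactly what lets a hash scanner recognise a known worm; flip a single bit in the sixth character and the match is lost.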



First, let's draft how our malware scanner will work:
  • The scanner hashes each file with MD5 and compares the hash against the hash list in the database.
  • The scanner only scans files no larger than 50MB and skips some file types such as .txt/.rtf.
  • When scanning starts, it first scans all processes and their modules (DLLs) and terminates any process found to be malware.
  • Then the scanner checks the startup folder and the registry entries in all possible startup locations; if malware is found, it deletes the registry entry and the file. For example:
    • C:\Users\Username\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup
    • HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run
  • Finally, it searches the local hard drive and deletes any malware it finds.
This tutorial covers all of these steps except the startup folder and registry scan.

Now, let's start. Get eicar.com from here: http://www.eicar.org/anti_virus_test_file.htm (EICAR is an antivirus test file).
Open Microsoft Visual Studio and create a new project; name your scanner anything you like (I named mine ScannerTutorial).
For this tutorial, we will create a console project and use the Multi-Byte character set.


Now you have an empty console project. Get the MD5 library from here: http://sourceforge.net/projects/libmd5-rfc/
Add md5.c and md5.h to your project.


Then right-click your project name (ScannerTutorial) in Solution Explorer, click Add > Class... and select C++ Class.
Enter CFileScanner as the class name and click Finish.

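The CFileScanner class will have the following methods:
  • BOOL ScanFile(LPCSTR lpFileName, BOOL bDelete);
  • void ScanFolder(LPCSTR lpFolderName);
  • void ScanProcess();
and two members:
  • vector <const char*> m_vDatabase; // Hash database
  • vector <const char*> m_vExcludedExt; // Excluded extensions
The class header needs #include <windows.h>, <vector> (with using std::vector;) and "md5.h"; the .cpp file also uses <stdio.h> and <string.h>, and ScanProcess() needs <psapi.h> plus linking with Psapi.lib.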

/*
    Scan for a single file
        lpFileName        Filename to scan (full path)
        bDelete            Delete file if found infected
    Return Value
        TRUE            File is infected
        FALSE            File is clean
*/
BOOL CFileScanner::ScanFile(LPCSTR lpFileName, BOOL bDelete)
{
    // Get file extension (text after the last '.' in the file name part of the path)
    const char *lpExt = "";
    const char *lpName = strrchr(lpFileName, '\\');
    const char *lpDot = strrchr(lpName ? lpName : lpFileName, '.');
    if (lpDot != NULL)
        lpExt = lpDot + 1;

    // Exclude excluded file extension
    for (size_t i=0; i<m_vExcludedExt.size(); i++) {
        if (_stricmp(lpExt, m_vExcludedExt[i]) == 0)
            return FALSE;
    }

    HANDLE hFile = CreateFile(lpFileName, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return FALSE;                            // Error, cannot open file. Return FALSE

    // Get file size and proceed only if the file is not larger than 50MB
    LARGE_INTEGER liFileSize;
    if (!GetFileSizeEx(hFile, &liFileSize) ||
        liFileSize.QuadPart > 50 * 1024 * 1024)  // 50MB = 52428800 bytes
    {
        CloseHandle(hFile);                      // Don't leak the file handle
        return FALSE;                            // Error, or more than 50MB
    }
    DWORD dwFileSize = (DWORD)liFileSize.QuadPart;

    // Start hash
    md5_state_t state;
    md5_byte_t digest[16];
    char buffer[1024];
    char szHash[16*2 + 1];
    DWORD dwRead, dwTotal = 0;

    md5_init(&state);
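    do {
        // Stop on a read error or unexpected end of file, otherwise this could loop forever
        if (!ReadFile(hFile, buffer, sizeof(buffer), &dwRead, NULL) || dwRead == 0)
            break;
        md5_append(&state, (const md5_byte_t *)buffer, dwRead);

        dwTotal += dwRead;
    } while (dwTotal < dwFileSize);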
    md5_finish(&state, digest);

    // Convert hash to hex
    for (int di = 0; di < 16; ++di)
        sprintf(szHash + di * 2, "%02x", digest[di]);

    CloseHandle(hFile);                          // Close file handle
    // End hash

    // Compare md5 with database
    for (size_t i=0; i<m_vDatabase.size(); i++)
    {
        if (strcmp(szHash, m_vDatabase[i]) == 0)
        {
            // Write output to console
            printf("Found: %s\n", lpFileName);

            // Delete file
            if (bDelete) DeleteFile(lpFileName);

            return TRUE;                         // We found matched hash with database
        }
    }

    // Default return value
    return FALSE;
}

As you can see, the function above first checks the file extension, then hashes the file content and compares the hash with the database. If the hash is found and bDelete is TRUE, it deletes the file. :)
Now let's look at the next function.
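The pre-hash checks and the hex conversion in ScanFile() can be tried out without any Windows API. The sketch below is a portable approximation, assuming std::string paths; GetExtension, EqualsIgnoreCase, ShouldScan and DigestToHex are hypothetical helper names, not functions from the tutorial project. It ignores dots in folder names and returns an empty extension when there is none.

```cpp
#include <cctype>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: extension after the last '.' in the file name part of the path.
std::string GetExtension(const std::string& path)
{
    std::size_t slash = path.find_last_of("\\/");
    std::size_t dot = path.find_last_of('.');
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
        return "";                                   // no extension in the file name
    return path.substr(dot + 1);
}

// Case-insensitive equality, like _stricmp(a, b) == 0.
bool EqualsIgnoreCase(const std::string& a, const std::string& b)
{
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::tolower((unsigned char)a[i]) != std::tolower((unsigned char)b[i]))
            return false;
    return true;
}

// Hypothetical helper: should this file be hashed at all?
bool ShouldScan(const std::string& path, std::uint64_t size,
                const std::vector<std::string>& excluded)
{
    const std::uint64_t kMaxSize = 50ULL * 1024 * 1024;   // 50MB = 52428800 bytes
    if (size > kMaxSize) return false;                    // too big, skip
    std::string ext = GetExtension(path);
    for (const std::string& e : excluded)
        if (EqualsIgnoreCase(ext, e)) return false;       // excluded file type
    return true;
}

// Convert a 16-byte MD5 digest to 32 lowercase hex characters.
std::string DigestToHex(const unsigned char digest[16])
{
    std::string hex;
    char byteHex[3];
    for (int i = 0; i < 16; ++i) {
        std::snprintf(byteHex, sizeof(byteHex), "%02x", digest[i]);
        hex += byteHex;
    }
    return hex;
}
```

Because "%02x" always produces lowercase digits, the database entries must be stored in lowercase too, which is why the constructor comment says "use lower case".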

/*
    Scan drive/folder and its subfolder
        lpFolderName    Folder to scan (full path)
    Return Value
        None
*/
void CFileScanner::ScanFolder(LPCSTR lpFolderName)
{
    WIN32_FIND_DATA tFindData;
    HANDLE hFind;

    char szFolder[MAX_PATH];               // Folder with trailing backslash
    char szFind[MAX_PATH];                 // Folder name with wildcard
    vector <char*> vFolder;                // Store subfolder. Used to scan subfolder

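    // INVALID_FILE_ATTRIBUTES has the directory bit set, so check for errors first
    DWORD dwAttr = GetFileAttributes(lpFolderName);
    if (dwAttr == INVALID_FILE_ATTRIBUTES)
        return;                            // Path does not exist or cannot be accessed

    // If file, just scan
    if (!(dwAttr & FILE_ATTRIBUTE_DIRECTORY)) {
        ScanFile(lpFolderName, TRUE);
        return;
    }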

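    // Make sure the path, a trailing backslash, "*" and the terminating null fit in MAX_PATH
    size_t nFolderLen = strlen(lpFolderName);
    if (nFolderLen == 0 || nFolderLen + 3 > MAX_PATH)
        return;

    // Copy folder name to szFolder and add trailing backslash if necessary
    strcpy(szFolder, lpFolderName);        // Copy string to szFolder
    if (szFolder[nFolderLen - 1] != '\\')
        strcat(szFolder, "\\");            // Add trailing backslash

    // Add wildcard
    strcpy(szFind, szFolder);              // Copy szFolder
    strcat(szFind, "*");                   // Add wildcard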

    hFind = FindFirstFile(szFind, &tFindData);
    if (hFind == INVALID_HANDLE_VALUE)
        return;

    do {
        // Directory, copy to vFolder
        if (tFindData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
        {
            // File name is not . or .., and not a junction/symlink (these can cause endless loops)
            if (strcmp(tFindData.cFileName, ".") != 0 &&
                strcmp(tFindData.cFileName, "..") != 0 &&
                !(tFindData.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT))
            {
                // Find maximum length with null string
                unsigned int nLen = strlen(szFolder) + strlen(tFindData.cFileName) + 1;

                // Create a new string (new throws std::bad_alloc on failure; it never returns NULL)
                char *lpFolder = new char[nLen];

                // Construct path
                strcpy(lpFolder, szFolder);
                strcat(lpFolder, tFindData.cFileName);

                // Add to vector array for later processing
                vFolder.push_back(lpFolder);
            }
        }
        else
        {
            // Find maximum length with null string
            unsigned int nLen = strlen(szFolder) + strlen(tFindData.cFileName) + 1;

            // Create a new string (new throws std::bad_alloc on failure; it never returns NULL)
            char *lpFile = new char[nLen];

            // Construct path
            strcpy(lpFile, szFolder);
            strcat(lpFile, tFindData.cFileName);

            // Scan this file
            ScanFile(lpFile, TRUE);

            // Free memory
            delete []lpFile;
        }
    } while (FindNextFile(hFind, &tFindData) != 0);

    // We are done scanning this folder
    FindClose(hFind);

    // Now, let's scan subfolder
    for (size_t i=0; i<vFolder.size(); i++)
    {
        if (vFolder[i] != NULL) {
            ScanFolder(vFolder[i]);          // Recursively scan the subfolder
            delete []vFolder[i];             // Free memory
        }
    }
}

First, the function gets the target's attributes; if the target is a file, it calls ScanFile() and returns.
Then it calls FindFirstFile() to start listing the files and folders in the directory and continues with FindNextFile(). Files are scanned immediately, while each subfolder's full path (folder path + subfolder name) is added to a vector and scanned recursively after the listing is finished. You may have noticed that I excluded "." and ".." from the vector. In case you didn't know, a single dot "." means the current directory, while a double dot ".." means the parent directory. You can open Command Prompt and try cd . and cd .. to see this.
If we included . and .., we would end up in an infinite loop.
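For comparison, here is a portable sketch of the same two-phase walk using C++17 std::filesystem instead of FindFirstFile()/FindNextFile(). It is an illustration under that assumption, not the tutorial's code: WalkFolder is a hypothetical name, and it only collects file paths where the real function would call ScanFile(). directory_iterator never returns "." or "..", and skipping symlinked folders plays roughly the role that skipping FILE_ATTRIBUTE_REPARSE_POINT entries plays on Windows.

```cpp
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Scan the files in this folder first, then recurse into the subfolders
// collected along the way (same order as ScanFolder()).
void WalkFolder(const fs::path& folder, std::vector<fs::path>& filesSeen)
{
    std::error_code ec;
    if (!fs::is_directory(folder, ec)) {            // a single file: just "scan" it
        if (fs::is_regular_file(folder, ec))
            filesSeen.push_back(folder);
        return;
    }

    std::vector<fs::path> subfolders;               // like vFolder in ScanFolder()
    std::error_code iterEc;
    for (fs::directory_iterator it(folder, iterEc), end;
         !iterEc && it != end; it.increment(iterEc)) {
        std::error_code entryEc;
        if (it->is_symlink(entryEc))
            continue;                               // don't follow links, avoids loops
        if (it->is_directory(entryEc))
            subfolders.push_back(it->path());       // remember for later
        else if (it->is_regular_file(entryEc))
            filesSeen.push_back(it->path());        // stand-in for ScanFile()
    }

    for (const fs::path& sub : subfolders)          // now recurse into subfolders
        WalkFolder(sub, filesSeen);
}
```

Error codes are used instead of exceptions so that one unreadable folder (common on a system drive) does not abort the whole scan.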


void CFileScanner::ScanProcess()
{
    DWORD dwPIDs[1024], cbNeeded, cProcesses;

    // Enumerate running processes (EnumProcesses and friends need <psapi.h> and Psapi.lib)
    if (!EnumProcesses(dwPIDs, sizeof(dwPIDs), &cbNeeded))
        return;

    // Calculate how many process identifiers were returned.
    cProcesses = cbNeeded / sizeof(DWORD);

    for (unsigned int i=0; i<cProcesses; i++)
    {
        HMODULE hMods[1024];
        DWORD cbModsNeeded;
        HANDLE hProcess;

        // Open the process; SYNCHRONIZE lets us wait for it to exit after terminating it
        hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ | PROCESS_TERMINATE | SYNCHRONIZE,
                               FALSE, dwPIDs[i]);
        if (NULL == hProcess)
            continue;                            // Access denied or the process has already exited

        // Get a list of all the modules in this process.
        if (EnumProcessModules(hProcess, hMods, sizeof(hMods), &cbModsNeeded))
        {
            // cbModsNeeded can exceed our buffer if the process has more than 1024 modules
            DWORD nMods = cbModsNeeded / sizeof(HMODULE);
            if (nMods > sizeof(hMods) / sizeof(HMODULE))
                nMods = sizeof(hMods) / sizeof(HMODULE);

            for (DWORD j = 0; j < nMods; j++)
            {
                char szModName[MAX_PATH];

                // Get the full path to the module's file.
                if (!GetModuleFileNameEx(hProcess, hMods[j], szModName, MAX_PATH))
                    continue;

                // Scan file and if found, don't delete it yet because the file is in use
                if (ScanFile(szModName, FALSE))
                {
                    // Terminate the process and wait (up to 5 seconds) for it to exit, so we can delete the file
                    TerminateProcess(hProcess, 0);
                    WaitForSingleObject(hProcess, 5000);

                    // Delete the file (this can still fail if another process has the DLL loaded)
                    DeleteFile(szModName);

                    // Continue to next process
                    break;
                }
            }
        }

        // Close process handle
        CloseHandle(hProcess);
    }
}

The function above simply enumerates all processes running in Windows and gets all the modules (DLLs) loaded by each process. The first module is always the process's executable, so there is no need to call the GetModuleBaseName() API function to get the process name. Note that it has a problem on 64-bit Windows: a 32-bit build can only enumerate the modules of 32-bit processes, and EnumProcessModules fails for 64-bit processes.
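The Windows calls themselves can't be demonstrated portably, but the per-process decision logic can. The sketch below models it under stated assumptions: ProcessInfo and FindInfected are hypothetical names, the module lists stand in for what EnumProcesses() and EnumProcessModules() would return (with modules[0] being the executable), and isInfected stands in for ScanFile(). Scanning a process stops at its first infected module and moves on to the next process.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one enumerated process.
struct ProcessInfo {
    unsigned long pid;                     // process identifier
    std::vector<std::string> modules;      // modules[0] is the executable itself
};

// Return (pid, infected module) pairs for the processes ScanProcess() would terminate.
std::vector<std::pair<unsigned long, std::string>> FindInfected(
    const std::vector<ProcessInfo>& processes,
    const std::function<bool(const std::string&)>& isInfected)
{
    std::vector<std::pair<unsigned long, std::string>> hits;
    for (const ProcessInfo& proc : processes) {
        for (const std::string& module : proc.modules) {
            if (isInfected(module)) {
                hits.push_back({proc.pid, module});
                break;                     // first hit is enough, go to the next process
            }
        }
    }
    return hits;
}
```

A consequence worth noticing: a clean program that merely loaded an infected DLL is flagged too, so the real ScanProcess() terminates such host processes as well.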

In the constructor of this class, you can initialize these two member variables:

CFileScanner::CFileScanner(void)
{
    // Fill database (use lower case)
    m_vDatabase.push_back("44d88612fea8a8f36de82e1278abb02f");    // eicar.com hash
    m_vDatabase.push_back("7e28c727e6f5c43179254e2ccb6ffd3a");    // Some new folder.exe worm

    m_vExcludedExt.push_back("txt");
    m_vExcludedExt.push_back("ini");
    m_vExcludedExt.push_back("inf");
    m_vExcludedExt.push_back("doc");
    m_vExcludedExt.push_back("rtf");
    m_vExcludedExt.push_back("cfg");

    m_vExcludedExt.push_back("zip");
    m_vExcludedExt.push_back("rar");
    m_vExcludedExt.push_back("tar");
    m_vExcludedExt.push_back("gz");
    m_vExcludedExt.push_back("bz2");

    m_vExcludedExt.push_back("jpg");
    m_vExcludedExt.push_back("jpeg");
    m_vExcludedExt.push_back("png");
    m_vExcludedExt.push_back("bmp");
    m_vExcludedExt.push_back("gif");
}
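With only two hashes, the linear loop over m_vDatabase in ScanFile() is fine, but a real signature list can hold many thousands of entries and every scanned file walks the whole list. One option, sketched here as an assumption rather than the tutorial's design, is a hash set keyed by the lowercase hex string; HashDatabase is a hypothetical class name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical replacement for vector<const char*> m_vDatabase:
// unordered_set gives average O(1) lookups instead of one strcmp per entry.
class HashDatabase {
public:
    void Add(std::string hexHash) { m_hashes.insert(ToLower(std::move(hexHash))); }

    bool Contains(std::string hexHash) const {
        return m_hashes.count(ToLower(std::move(hexHash))) != 0;
    }

    std::size_t Size() const { return m_hashes.size(); }

private:
    // Store and compare in lower case so "44D886..." and "44d886..." match.
    static std::string ToLower(std::string s) {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return s;
    }

    std::unordered_set<std::string> m_hashes;
};
```

Normalising case inside the class removes the need for the "use lower case" rule when filling the database.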

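Now your CFileScanner class is complete, and you can use it in your main function like this:
#include "FileScanner.h"

int main()
{
    CFileScanner oScan;
    oScan.ScanProcess();            // Scan running processes first
    oScan.ScanFolder("C:\\");       // Then scan the whole C: drive
    return 0;
}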

Here is the complete MSVC 2008 tutorial project: Download
Happy coding :)

16 comments :

  1. can u tell me how to use this scanner tutorial (scanner) in visual c++ basic!!!

  2. you mean visual basic .net right?

  3. There is download link at the end of tutorial which include how to run the code.. Download it and compile it first, then run it.

    The example is located on ScannerTutorial.cpp

  4. for 64 bit, http://msdn.microsoft.com/en-us/library/windows/desktop/ms682633(v=vs.85).aspx

  7. What kind of header files other than md5.h shud i use to access these inbuilt functions. plzz reply asap:)

  8. This blog awesome and i learn a lot about programming from here.The best thing about this blog is that you doing from beginning to experts level.

    Love from

  9. Mr/Mrs Syahmi

    great tutorial ! how to write malware scanner to detect polymorphic malware (microsost visual c++) ?

  10. Mr Syahmi
    can u help me with this code? How it work?

  11. How to quarantine the virus? Can you help me?

  13. Hello ...mate can u help me with the erroe its showing during compilation...
    Its Showing undefined referenced to CFileScanner()
