Tuesday, December 13, 2016

How to extend Python on Windows (Deep Learning Course on Udacity related)

I have recently been learning Deep Learning from Udacity.

For 1_notmnist.ipynb, Problem 1: Let's take a peek at some of the data to make sure it looks sensible. Each exemplar should be an image of a character A through J rendered in a different font. Display a sample of the images that we just downloaded. Hint: you can use the package IPython.display.

One alternative way might be the following code:

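import os, fnmatch
from IPython.display import Image, display

img_files = []

def all_img_files(img_files, search_path, pattern = '*.png'):
    # Append the first file of the first walked directory whose first file matches the pattern.
    for path, subdirs, files in os.walk(search_path):
        if files and fnmatch.fnmatch(files[0], pattern):
            img_files.append(os.path.join(path, files[0]))
            break

# train_folders and test_folders are defined earlier in the notebook.
for folder in train_folders:
    all_img_files(img_files, folder)

for folder in test_folders:
    all_img_files(img_files, folder)

for img in img_files:
    # A bare Image(...) expression inside a loop is not rendered, so call display() explicitly.
    display(Image(filename = img))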

However, I found it extremely slow, probably because os.walk collects the names of all files and subdirectories in a directory before it yields, so the break statement saves very little of the overall processing time.
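The behaviour described above can be checked on a small scale. The following is only a sketch: the helper name first_walk_result and the temporary directory are made up for illustration. It shows that the first tuple os.walk yields already contains the complete file list of the top directory, so breaking out of the loop only avoids descending into subdirectories.

```python
import os
import tempfile

def first_walk_result(search_path):
    """Return the first (path, subdirs, files) tuple that os.walk yields."""
    return next(os.walk(search_path))

# Build a throwaway directory with three empty files (illustrative data only).
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.png", "b.png", "c.png"):
        open(os.path.join(demo_dir, name), "w").close()
    path, subdirs, files = first_walk_result(demo_dir)
    # Every file name is already present, even though only files[0] would be used.
    print(sorted(files))
```

In a directory holding many images, building that full list is likely where the time goes.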

So I decided to write some code that genuinely fetches only the first png file in each of the A-J directories. Readers can follow the link below for reference on how this can be done with VC++:

There's plenty of material on how to write an extension for Python, so the following is just the source code without much explanation. I built it with Anaconda Python 3.5 and Visual C++ 2015. Other platforms will probably need some adjustment:

#include <Python.h>
#include <tchar.h> 
#include <stdio.h>
#include <strsafe.h>

#include <Windows.h>
#include <Shlwapi.h>


#include "deelx.h"

#pragma comment(lib, "python35.lib")
#pragma comment(lib, "User32.lib")
#pragma comment(lib, "Shlwapi.lib")

static PyObject *get_first_matched_file_error;

static PyObject* get_first_matched_file(PyObject* self, PyObject* args)
{
    WIN32_FIND_DATA ffd;
    TCHAR szDir[MAX_PATH];
    HANDLE hFind = INVALID_HANDLE_VALUE;

    int wchars_num;
    char* directoryA;
    wchar_t* directoryW;
    char* patternA;
    wchar_t* patternW;

    // "ss": both arguments must be str (the original "sz" let the pattern be
    // None, which would pass a NULL pointer to the conversion below).
    if (!PyArg_ParseTuple(args, "ss", &directoryA, &patternA))
        return NULL;

    // Convert the UTF-8 directory name to UTF-16. The TCHAR-based calls
    // below assume the project is built with UNICODE defined.
    wchars_num = MultiByteToWideChar(CP_UTF8, 0, directoryA, -1, NULL, 0);
    directoryW = new wchar_t[wchars_num];
    MultiByteToWideChar(CP_UTF8, 0, directoryA, -1, directoryW, wchars_num);

    if (!PathFileExists(directoryW))
    {
        PyErr_SetString(get_first_matched_file_error, "Non-existing directory");
        delete[] directoryW;
        return NULL;
    }

    // Prepare string for use with FindFile functions.  First, copy the
    // string to a buffer, then append '\*' to the directory name.
    StringCchCopy(szDir, MAX_PATH, directoryW);
    delete[] directoryW;
    StringCchCat(szDir, MAX_PATH, TEXT("\\*"));

    // Convert the UTF-8 pattern to UTF-16 and compile it.
    wchars_num = MultiByteToWideChar(CP_UTF8, 0, patternA, -1, NULL, 0);
    patternW = new wchar_t[wchars_num];
    MultiByteToWideChar(CP_UTF8, 0, patternA, -1, patternW, wchars_num);

    CRegexpT <wchar_t> regexp(patternW);

    // Find the first entry in the directory.
    hFind = FindFirstFile(szDir, &ffd);

    if (INVALID_HANDLE_VALUE == hFind)
    {
        delete[] patternW;
        PyErr_SetString(get_first_matched_file_error, "Cannot open directory");
        return NULL;
    }

    PyObject* pyFileName = NULL;
    // Walk the directory entries and stop at the first file whose name matches.
    do
    {
        // Skip subdirectories; only regular files are candidates.
        if (ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
            continue;

        MatchResult result = regexp.Match(ffd.cFileName);
        if (result.IsMatched())
        {
            char* cFileName;
            int chars_num;

            // Convert the matched UTF-16 file name back to UTF-8 for Python.
            chars_num = WideCharToMultiByte(CP_UTF8, 0, ffd.cFileName, -1, NULL, 0, NULL, NULL);
            cFileName = new char[chars_num];
            WideCharToMultiByte(CP_UTF8, 0, ffd.cFileName, -1, cFileName, chars_num, NULL, NULL);

            pyFileName = Py_BuildValue("s", cFileName);
            delete[] cFileName;

            break;
        }
    } while (FindNextFile(hFind, &ffd) != 0);

    FindClose(hFind);
    delete[] patternW;

    // Nothing matched: return an empty string. Checking pyFileName rather than
    // GetLastError() avoids overwriting a match because of a stale error code.
    if (pyFileName == NULL && !PyErr_Occurred())
        pyFileName = Py_BuildValue("s", "");

    return pyFileName;
}

static PyMethodDef get_first_matched_file_method[] = {
    {
        "get_first_matched_file", get_first_matched_file,
        METH_VARARGS, "Get the first file given directory and pattern"
    },

    {NULL, NULL, 0, NULL}        /* Sentinel */
};

static struct PyModuleDef get_first_matched_file_module =
{
    PyModuleDef_HEAD_INIT,
    "get_first_matched_file", /* name of module */
    "Get the first file given directory and pattern", /* module documentation, may be NULL */
    -1, /* size of per-interpreter state of the module, or -1 if the module keeps state in global variables. */
    get_first_matched_file_method
};

PyMODINIT_FUNC PyInit_get_first_matched_file(void)
{
    PyObject *m = PyModule_Create(&get_first_matched_file_module);
    if (m == NULL)
        return NULL;

    get_first_matched_file_error = PyErr_NewException("get_first_matched_file.error", NULL, NULL);
    Py_INCREF(get_first_matched_file_error);
    PyModule_AddObject(m, "error", get_first_matched_file_error);

    return m;
}


There's only one dependency, namely deelx.h; refer to the websites below:

A testing script is as follows:

import sys
sys.path.append("C:\\Users\\MS User\\Documents\\Visual Studio 2015\\Projects\\PythonExtensions\\x64\\Release")

import get_first_matched_file

directory = "C:\\tensorflow\\tensorflow\\examples\\udacity\\notMNIST_large\\B"
pattern = "\\.png$"

file = get_first_matched_file.get_first_matched_file(directory, pattern)
print(file)
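
For comparison, here is a rough pure-Python sketch of the same idea, not a replacement for the extension above. It assumes Python 3.6 or later (for using os.scandir as a context manager), and the function name first_matching_file is made up for illustration. os.scandir returns an iterator over directory entries, so the loop can stop at the first match instead of building a full file list first.

```python
import os
import re

def first_matching_file(directory, pattern):
    """Return the name of the first regular file in `directory` whose name
    matches the regular expression `pattern`, or "" if nothing matches."""
    regexp = re.compile(pattern)
    # The context manager closes the underlying directory handle on early return.
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip subdirectories, mirroring the FILE_ATTRIBUTE_DIRECTORY check.
            if entry.is_file() and regexp.search(entry.name):
                return entry.name
    return ""
```

It would be called the same way as the extension function, for example first_matching_file(directory, pattern) with the directory and pattern from the testing script. Like the extension, it returns only the file name, so the caller still has to join it with the directory.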

Enjoy Python, enjoy learning from Udacity.

Project setting:

How to change serving directory of Jupyter on Windows

Sometimes it's convenient to change the default directory that Jupyter serves from. For example, I prefer it to serve from C:\tensorflow\tensorflow\examples\udacity, since I git clone everything there.

First, run the following command to generate the configuration file named jupyter_notebook_config.py, which usually resides in the .jupyter folder in your home directory:
jupyter notebook --generate-config


Now open the file and search for the following line:
#c.NotebookApp.notebook_dir = ''

Uncomment it and put the target directory between the quotes. Since this is a Python string holding a Windows path, we need to escape each backslash character:
c.NotebookApp.notebook_dir = 'C:\\tensorflow\\tensorflow\\examples\\udacity'
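
Because the configuration file is ordinary Python, the escaping rule can be checked in any interpreter. The sketch below uses made-up variable names; it shows that a raw string is an equivalent way to write the same path, and what goes wrong when a backslash is left unescaped.

```python
# The escaped form used in the configuration line above.
escaped = 'C:\\tensorflow\\tensorflow\\examples\\udacity'

# A raw string spells the same path without doubling the backslashes.
raw = r'C:\tensorflow\tensorflow\examples\udacity'
print(escaped == raw)

# Without escaping, "\t" is read as a tab character, so the path is silently
# wrong. (Writing the full path this way would not even parse in Python 3,
# because "\u" starts a Unicode escape.)
broken = 'C:\tensorflow'
print('\t' in broken)
```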

Final result:




Tuesday, December 6, 2016

How to use Anaconda

Anaconda is a one-stop distribution of Python-related scientific computing components. Its convenience is probably more obvious on Windows than on Linux. The following is a summary of how to use conda on Windows.
To avoid surprises, I suggest starting Anaconda by launching the Anaconda Prompt from the Start menu.

1. Show all virtual environments created:
conda info --envs

2. Activate specific environment, like root:
activate root

3. Deactivate the current environment, like root:
deactivate
There’s no need to append the option value since the command is aware of which environment it's currently in.
However, try not to deactivate the default root environment: on *nix platforms, doing so tries to remove the Anaconda path from the current shell's PATH variable. Simply switch to another virtual environment instead; conda is clever enough to deactivate the previous one.

4. Create an environment (here named “default”) with a library installed initially (here “matplotlib”):
conda create -n default matplotlib 

5. Create an environment (for instance “default”) by cloning another one (here root):
conda create -n default --clone root

6. List the packages installed into specified environment (for instance “default”):
conda list -n default

7. Install package (here with option value “tensorflow”) into the current environment:
conda install tensorflow

8. Install package (as an example, “tensorflow”) into the specified environment (here with name root):
conda install -n root tensorflow

9. Search for an uncommon package on the Anaconda website:
anaconda search -t conda package-name 

10. Show details about a package that was found:
anaconda show user/package-name

11. Install specific package from specified channel:
conda install --channel https://conda.anaconda.org/user package-name

12. Install a specific package (like tensorflow) with pip, which resolves dependencies automatically (it’s recommended to do this inside a virtual environment):
pip install tensorflow
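
After activating or switching environments as in the steps above, it can help to confirm which environment a Python session is actually running in. The following is a small sketch with a made-up helper name. It assumes conda sets the CONDA_DEFAULT_ENV environment variable on activation; if it is not set, the value is simply None.

```python
import os
import sys

def current_environment():
    """Report the interpreter's install prefix and the conda environment name."""
    return {
        # Directory of the running Python installation or environment.
        "prefix": sys.prefix,
        # Assumed to be set by conda when an environment is activated.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }

print(current_environment())
```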