“We are drowning in information and starving for knowledge.”
In his book Megatrends, published in 1982, John Naisbitt presents ten new directions that would change people’s lives. In particular, he outlines where sophisticated technology is taking people and sums it up in the sentence above.
He could not have been more right. It is not easy to make decisions when faced with massive amounts of information. On entering Hangzhou Dianzi University, I studied New Media Communication in my freshman year. However, I found that I had little interest in the communication courses. Ranking first in both calculus and Visual Basic programming, I discovered a strong enthusiasm for mathematics and computer science. After careful reflection, I switched my major to Statistics.
In Professor Yujie Cui’s regression analysis class, I learned methods for selecting models and variables in cases where the standard regression assumptions are violated or the set of variables to be included in the model is not predetermined. During the second semester of my first graduate year, I applied two supervised learning algorithms, one based on random forest and the other on LASSO-logistic regression, to predict whether residents of Beijing will use public bicycles in the future, with the assistance of my advisor, Professor Yun Chen, and Jiong Chen from Columbia University. First-hand data were collected by distributing questionnaires to residents in the six districts of Beijing. During the data analysis phase, fifteen characteristics of the residents, such as gender, age, degree, profession, monthly income, district, commuting distance, commuting cost, and commuting method, were used as predictors, while their choice of whether or not to use public bicycles was treated as a dummy dependent variable. LASSO was applied to reduce the fifteen predictors to a final four. Compared with random forest, the LASSO-logistic regression was found to be a more appropriate classifier for predicting the future usage of public bicycles in Beijing, since it has a …
Together with a master’s student and a Ph.D. student in economics, I was in charge of the quality control procedure for the Chinese General Social Survey (CGSS), a major project conducted by the center. Since this was the first year that the questionnaires were administered through LimeSurvey (an online survey tool), there were more logic errors in the data to be checked and cleaned than before. We selected the major questions in the survey for scrutiny and designed the checking algorithms in Stata. As the only team member majoring in statistics, I took responsibility for writing the daily statistical report on the quality of the newly uploaded data. I introduced statistical models to better analyze and visualize the data. Since the survey population might not be normally distributed, kernel density estimation was used to describe the distribution in different age groups, and various indexes, such as absence rates in different sections of the survey, were presented in the report. As an analyst and researcher who frequently uses the data collected by such large-scale surveys, I was fully aware of the importance of the quality control procedure. It was a great honor for me to be involved in the CGSS.