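　　Z is a senior majoring in statistics at a key normal university in mainland China who dreams of entering a top U.S. university to study statistics or business analytics. To that end, she joined a research program at a top U.S. university to strengthen her academic background, broaden her horizons, and gain real knowledge. With guidance from the program's instructors, she won a research opportunity in biostatistics at Harvard University.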

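　　Because interdisciplinary study is so common, biostatistics master's and doctoral programs admit many students each year with backgrounds in mathematics, statistics, computer science, and related fields.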

　　Summer Research Statement Report

　　I am honored to have had the opportunity to take part in the summer research program at Harvard Medical School, and this is my report on the project. In recent years machine learning has become very popular in the field of artificial intelligence, and it is also a new tool for improving prediction accuracy. My major is closely related to machine learning, so before I came to the United States I had already decided to learn some machine learning algorithms as quickly as I could and apply them to a medical field, although I did not yet know what my own research topic would be. Before leaving, I also thought about the difficulties the project might bring. The first was data acquisition and preprocessing: data is one of the key factors in successful machine learning. As Professor N, a machine learning professor and member of an Amazon AI team, once said, no matter how good an algorithm is, the best way to drive progress in machine learning is to obtain large amounts of data. The second was improving the algorithm. A month of research bore out both expectations.

　　After the first meeting, I understood that the project gave me a great deal of freedom: topic selection, hypotheses, data, and even the way the problem was solved were all up to me, and my mentor's rich background meant I could get help in every respect. My mentor even said I could choose a topic from the financial sector, which I took as a small joke. Because I had no medical background, I spent some time early in the research catching up on tumors and genes so that I could choose a better topic. In the end I settled on Tumor Gene Identification, with the research done mainly on the MATLAB platform.

　　After settling on the research content, I analyzed the existing data. The gene data had few samples but very high dimensionality, and the samples were already labeled. Given these characteristics, and based on my reading of the literature, I decided to use an SVM as the prediction model for training and classification. (A note on the data: the first set was sent to me by another classmate and had some problems at the beginning, so I added a second set. All data in the project come from the TCGA database.) Then I met the first difficulty, data preprocessing. The quality of the data affects the later SVM classification, so I spent a lot of time on it. The processing had three steps. First, the data were normalized so that all values were on the same scale, which removes differences in magnitude as far as possible. Second, extraneous and redundant genes were removed, so that the remaining genes were either mutated or mutating and were not duplicated. To remove extraneous genes I chose the information index to classification method, which builds on the common signal-to-noise ratio method and takes into account the effect of variance on the classification result. To remove redundant genes I chose a correlation-coefficient redundancy elimination method, which decides whether a gene should be eliminated based on its similarity to the other genes. The final classification results show that this feature extraction was very effective. Third, I applied the principal component method to the genes. After these three steps only 134 genes were left, which greatly reduced the dimensionality and gave the expected result.

　　After preprocessing, the data set was randomly divided into a training set and a test set. The training set was first fed into the SVM model to determine the sample type, with an accuracy as high as 98.8889%, a good result. The model could therefore go on to prediction, and its accuracy in classifying the test set was 99.2063%. At this point the study was complete, and the classification performance was what I had hoped for.
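
　　The normalization and gene-filtering steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the MATLAB code used in the project: the function names, the toy data, and the thresholds `top_k` and `max_corr` are hypothetical, and a plain signal-to-noise score stands in for the information index to classification, whose exact formula the report does not give.

```python
import math


def zscore_columns(rows):
    """Scale each gene (column) to mean 0 and unit standard deviation."""
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scaled_cols.append([(v - mean) / std if std > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_cols)]


def snr_score(col, labels):
    """Plain signal-to-noise score: |mean1 - mean0| / (std1 + std0)."""
    def mean_std(vals):
        m = sum(vals) / len(vals)
        return m, math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))

    m1, s1 = mean_std([v for v, y in zip(col, labels) if y == 1])
    m0, s0 = mean_std([v for v, y in zip(col, labels) if y == 0])
    diff = abs(m1 - m0)
    if s1 + s0 == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / (s1 + s0)


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def select_genes(rows, labels, top_k, max_corr):
    """Keep the top_k genes by SNR, then drop genes too correlated with a kept one."""
    cols = list(zip(*rows))
    ranked = sorted(range(len(cols)),
                    key=lambda j: snr_score(cols[j], labels),
                    reverse=True)[:top_k]
    kept = []
    for j in ranked:
        if all(abs(pearson(cols[j], cols[k])) < max_corr for k in kept):
            kept.append(j)
    return sorted(kept)


# Toy data (hypothetical): 6 samples x 4 genes, labels 0 = normal, 1 = tumor.
rows = [
    [1.0, 2.0, 5.0, 0.0],
    [1.2, 2.4, 4.0, -0.4],
    [0.8, 1.6, 6.0, 0.4],
    [3.0, 6.0, 5.0, 1.0],
    [3.2, 6.4, 4.0, 0.6],
    [2.8, 5.6, 6.0, 1.4],
]
labels = [0, 0, 0, 1, 1, 1]

scaled = zscore_columns(rows)
kept = select_genes(scaled, labels, top_k=3, max_corr=0.9)
print(kept)  # prints [0, 3]
```

　　In this toy example gene 2 has the same mean in both classes and is dropped as extraneous, while gene 1 is an exact multiple of gene 0 and is dropped as redundant.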

　　In fact, before settling on the SVM algorithm I also tried the lasso algorithm and a neural network. Lasso and principal component analysis serve the same purpose here, dimensionality reduction, and both work well for feature extraction and selection. The BP neural network is a commonly used prediction algorithm, but data that had only been processed with lasso were still noisy and high-dimensional, so training the BP neural network did not give a good result. In the end I found that the SVM algorithm suits the characteristics of genetic data, and that a large amount of effective data preprocessing before applying the classifier leads to better results.
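
　　As a rough illustration of why lasso works as a feature selector, the sketch below runs coordinate descent on a tiny made-up data set and shows the coefficient of an irrelevant feature being set exactly to zero. It assumes centered features and a centered response with no intercept; the function names, the data, and the penalty `lam` are hypothetical and do not come from the project.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(beta[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta


# Toy data (hypothetical): y depends only on the first feature (y = 2 * x1);
# the second feature is orthogonal to the first and carries no signal.
X = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
y = [-3.0, -1.0, 1.0, 3.0]

beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(selected)  # prints [0]
```

　　The penalty also shrinks the surviving coefficient (here from 2 to about 1.6), which is one reason lasso is often used only to choose features before a separate classifier is trained.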

　　The main improvement this project made to the SVM approach lies in data processing: feature selection and extraction worked well, and classifier training then achieved good results as well. The project still needs further study. First, although the classification result is good, the procedure is very time consuming; the gene elimination step in particular takes up to an hour, and I hope to speed it up in the future. Second, the data used here come from an open database. If the model were applied in hospitals, with real databases that are larger and more complex, it is unclear whether this processing method would still achieve good results. There is therefore still a lot of work to do on support vector machines (SVM) for gene expression data analysis.

　　China and the United States differ a lot in teaching, even at the university level. Although I had taken part in some projects with my teachers in China, there I mostly followed my teacher's lead step by step. This project gave me a great deal of freedom, and because there were so many algorithms I wanted to apply, I sometimes could not find a direction. I overturned my earlier ideas many times while looking for a new and more suitable approach, which led to some problems with time management. In addition, when communicating with my mentor, I sometimes felt that I did not yet have fully formed ideas to discuss. These are habits I should change in my future studies. Boston is a very attractive city where science and technology have become a pillar industry, a direction in which many cities now want to develop. In my leisure time I always met interesting people and things in the library or on campus, which makes me really look forward to my future student life. Besides what I learned, my spoken English also improved, thanks not only to my mentor but also to my host family. Finally, the project was completed with the help of my dear mentor L, L, teacher Z and teacher L. Thank you all very much!

　　Appendix: the student's weekly summaries during the research period:

　　week 3

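　　Following on from the previous weekly report, and with the focus on statistical methods, I decided to use machine learning algorithms to do classification prediction for the ALK gene in lung cancer. The first task is feature extraction, which is usually done with methods such as principal component analysis. A paper I once read on building a fiscal evaluation system used the lasso algorithm to group features very well, so I wanted to use this algorithm to extract features better suited to analysis. After extraction, the feature set was much smaller than before. For the classification prediction that followed I chose the widely applicable neural network algorithm, but the output did not give the expected result: prediction accuracy did not improve much. Possible reasons are, first, that the data set is too one-sided and may be too small, and second, that I chose the wrong algorithm. Next week I will switch to another machine learning method for classification prediction and hope to improve the prediction accuracy.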

　　week 4

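　　Because last week's lasso and neural network models did not improve prediction accuracy much, this week I switched to a new machine learning method, SVM, for classification prediction. I had not studied this model before, so I first spent some time learning it. So far I have obtained preliminary classification results, but I am still not clear about the setting and use of some parameters, so I am asking for a little more time to get more accurate classification results. In an experiment, failing to get the ideal classification result is in fact one of the normal outcomes; if several methods give the same result, that suggests the data really do fall into one class. It is still a little early to draw this conclusion, however, and I need some more time to check whether something has gone wrong.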

A normal-university student: my biostatistics research at Harvard
