Lecture 1: Introduction

About human language

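Language is symbols: human language is essentially a symbol system. Chinese characters and English letters alike are symbols used to carry and convey the meaning we want to express.
Carriers of language: sound, vision (writing), and gesture. Whichever carrier is used, the communication channel is continuous.
The brain as a symbolic processor: we can view the brain's processing of language as a continuous pattern of activation.
This suggests an approach: explore a continuous encoding pattern of thought. Many NLP algorithms are built on this idea, and it also addresses the sparsity problem.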
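To make the contrast between symbolic and continuous encodings concrete, here is a minimal sketch. The three-word vocabulary and the dense vectors are hypothetical values picked by hand for illustration; they do not come from the lecture or from any trained model.

```python
import math

# Hypothetical toy vocabulary, for illustration only.
vocab = ["hotel", "motel", "banana"]

def one_hot(word, vocab):
    """Sparse symbolic encoding: a single 1 at the word's index."""
    return [1.0 if w == word else 0.0 for w in vocab]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hand-picked dense vectors (an assumption, not learned embeddings).
dense = {
    "hotel": [0.9, 0.1],
    "motel": [0.8, 0.2],
    "banana": [0.1, 0.9],
}

# Distinct one-hot vectors never overlap, so their similarity is 0.
sim_one_hot = cosine(one_hot("hotel", vocab), one_hot("motel", vocab))

# Dense vectors can place related words close together.
sim_dense_close = cosine(dense["hotel"], dense["motel"])
sim_dense_far = cosine(dense["hotel"], dense["banana"])

print(sim_one_hot, sim_dense_close, sim_dense_far)
```

With one-hot vectors every pair of different words looks equally unrelated, which is one way to read the sparsity problem. A continuous encoding leaves room for graded similarity.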

Two sources of input: speech or text. Speech: phonetic or phonological analysis. Text: OCR (Optical Character Recognition) or tokenization. These steps produce the input to NLP.
Morphological analysis: analyze the form of words, such as prefixes and suffixes.
Syntactic analysis: analyze the structure of sentences, i.e. their grammatical structure.
Semantic interpretation: work out the meaning of sentences.
Discourse processing: the meaning of most sentences has to be inferred from context, so analyzing the current sentence alone is not enough. This is what the field of discourse processing covers.
Note: cs224n focuses mainly on syntactic and semantic analysis, plus some speech signal analysis.
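The tokenization and morphological analysis steps listed above can be sketched very roughly as follows. The affix lists and the function names `tokenize` and `split_affixes` are hypothetical and deliberately naive; a real morphological analyzer needs a lexicon and handles many cases this one gets wrong.

```python
import re

# Tiny hypothetical affix lists, for illustration only.
PREFIXES = ("un", "re", "pre")
SUFFIXES = ("ness", "ing", "ly", "ed", "s")

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[A-Za-z]+", text.lower())

def split_affixes(word):
    """Naively split a word into (prefix, stem, suffix).

    An affix is only stripped if at least three characters remain,
    which avoids splitting very short words such as "is".
    """
    prefix = next(
        (p for p in PREFIXES if word.startswith(p) and len(word) > len(p) + 2),
        "",
    )
    rest = word[len(prefix):]
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s) + 2),
        "",
    )
    stem = rest[: len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix

for token in tokenize("Unhappiness is replaying."):
    print(token, split_affixes(token))
```

Pure string matching like this over-splits words such as "under", which hints at why the later levels (syntax, semantics, discourse) cannot be handled by surface rules alone.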

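Lower level: spell checking, keyword search, finding synonyms.
Intermediate level: extracting information. This is the direction I find most interesting: having a computer read text and understand what it is about, or at least which topic it covers; identifying and extracting particular content from text; working out the reading level of school texts; identifying the intended audience of a document; sentiment analysis (positive or negative).
Advanced level: machine translation, chatbots, question answering, machine writing (exploiting knowledge of the world).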

Language itself is hard: language is ambiguous, and moreover, humans often do not say everything (to communicate efficiently, a lot is left out in everyday language use).
Representing language is hard: the complexity of representing and using linguistic, situational, and world knowledge.
Interpreting language is hard: the real meaning of language depends on the real world, common sense, and contextual knowledge.

Most traditional machine learning algorithms work well because of human-designed representations and input features.
“Machines” are only used to optimize the weights that best make a final prediction.
Moreover, manually designed features are often over-specified (they generalize poorly) and incomplete, and they take a long time to design and validate.
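The division of labor described here, with humans designing the features and the machine only tuning weights, can be illustrated with a toy sentiment classifier. The word lists, the training sentences, and the choice of a perceptron are assumptions made for this sketch; the lecture does not prescribe any of them.

```python
# Hand-designed lexicons: this is the human-engineered part (hypothetical).
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}

def features(text):
    """Map text to [positive word count, negative word count, bias]."""
    tokens = text.lower().split()
    return [
        float(sum(t in POSITIVE for t in tokens)),
        float(sum(t in NEGATIVE for t in tokens)),
        1.0,
    ]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Tiny made-up training set: label +1 is positive, -1 is negative.
data = [
    ("good great movie", 1),
    ("love it", 1),
    ("bad awful plot", -1),
    ("hate it", -1),
]

def train(data, epochs=10):
    """Perceptron rule: the machine only adjusts the three weights."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for text, label in data:
            x = features(text)
            if label * dot(w, x) <= 0:  # misclassified, so update
                w = [wi + label * xi for wi, xi in zip(w, x)]
    return w

weights = train(data)

def predict(text):
    return 1 if dot(weights, features(text)) > 0 else -1

print(weights, predict("a great film"))
```

Any word outside the two lexicons contributes nothing, which is a small-scale version of the "incomplete" and "over-specified" problems of hand-built features.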

Deep learning is a subfield of machine learning and part of representation learning.
Deep learning algorithms attempt to learn (multiple levels of) representations and the output themselves.
We only input the raw data.
In most cases, deep learning means neural networks (the dominant model family).

Core idea: represent language with vectors, and use neural networks to organize and compute over those vectors.
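As a rough illustration of this idea, the sketch below turns words into vectors, averages them into a sentence vector, and passes that through one small hidden layer. All weights are random and untrained, and the dimensions, vocabulary, and averaging step are assumptions chosen for brevity, so the output score carries no meaning beyond showing the data flow.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained weights are reproducible

EMBED_DIM, HIDDEN_DIM = 4, 3
vocab = ["i", "like", "deep", "learning"]

# One random vector per word (a trained model would learn these).
embeddings = {
    w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in vocab
}
# Random weights for one hidden layer and one output unit.
W_hidden = [
    [random.uniform(-1, 1) for _ in range(EMBED_DIM)]
    for _ in range(HIDDEN_DIM)
]
w_out = [random.uniform(-1, 1) for _ in range(HIDDEN_DIM)]

def sentence_vector(tokens):
    """Average the word vectors component by component."""
    vectors = [embeddings[t] for t in tokens]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def forward(tokens):
    """Sentence vector -> tanh hidden layer -> sigmoid score in (0, 1)."""
    x = sentence_vector(tokens)
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W_hidden
    ]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))

score = forward(["i", "like", "deep", "learning"])
print(score)
```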

Post Date: 2018-11-02