How to Use AI to Do Anything | MIT How to AI (Almost) Anything, Spring 2025

https://www.bilibili.com/video/BV1agH8zCE1V/?spm_id_from=333.1387.upload.video_card.click&vd_source=0645a76390602d5640c372c2f44d99e1
Lecturer: https://pliang279.github.io/

https://mit-mi.github.io/how2ai-course/spring2025/schedule/

Research Projects

Research Projects on New Modalities

Motivation: Many tasks of real-world impact go beyond image and text.
Challenges:

AI for modalities where deep learning is not yet the most effective approach (e.g., tabular, time-series)
Multimodal deep learning + time-series analysis + tabular models
AI for physiological sensing, IoT sensing in cities, climate and environment sensing
Smell, taste, art, music, tangible and embodied systems

Potential models and datasets to start with

Brain EEG Signal: https://arxiv.org/abs/2306.16934
Speech: https://arxiv.org/pdf/2310.02050.pdf
Facial Motion: https://arxiv.org/abs/2308.10897
Tactile: https://arxiv.org/pdf/2204.00117.pdf

Research Projects on AI Reasoning

Motivation: Robust, reliable, interpretable reasoning in (multimodal) LLMs.
Challenges:

Fine-grained and compositional reasoning
Neuro-symbolic reasoning
Emergent reasoning in foundation models

Potential models and datasets to start with

Can LLMs actually reason and plan?
Code for VQA:
CodeVQA: https://arxiv.org/pdf/2306.05392.pdf
VisProg: https://prior.allenai.org/projects/visprog
Viper: https://viper.cs.columbia.edu/
Cola: https://openreview.net/pdf?id=kdHpWogtX6Y
NLVR2: https://arxiv.org/abs/1811.00491
Reference games: https://mcgill-nlp.github.io/imagecode/, https://github.com/Alab-NII/onecommon, https://dmg-photobook.github.io/

Research Projects on Interactive Agents

Motivation: Grounding AI models in the web, computer, or other virtual worlds to help humans with digital tasks.
Challenges:

Web visual understanding is quite different from natural image understanding
Instructions and language grounded in web images, tools, APIs
Asking for human clarification, human-in-the-loop
Search over environment and planning

Potential models and datasets to start with

WebArena: https://arxiv.org/pdf/2307.13854.pdf
AgentBench: https://arxiv.org/pdf/2308.03688.pdf
Toolformer: https://arxiv.org/abs/2302.04761
SeeAct: https://osu-nlp-group.github.io/SeeAct/

Research Projects on Embodied and Tangible AI

Motivation: Building tangible and embodied AI systems that help humans in physical tasks.
Challenges:

Perception, reasoning, and interaction
Connecting sensing and actuation
Efficient models that can run on hardware
Understanding influence of actions on the world (world model)

Potential models and datasets to start with

VirtualHome: http://virtual-home.org/paper/virtualhome.pdf
Habitat 3.0: https://ai.meta.com/static-resource/habitat3
RoboTHOR: https://ai2thor.allenai.org/robothor
LangSuite-E: https://github.com/bigai-nlco/langsuite
Language models and world models: https://arxiv.org/pdf/2305.10626.pdf

Research Projects on Socially Intelligent AI

Motivation: Building AI that can understand and interact with humans in social situations.
Challenges:

Social interaction, reasoning, and commonsense.
Building social relationships over months and years.
Theory-of-Mind and multi-party social interactions.

Potential models and datasets to start with

Multimodal WereWolf: https://persuasion-deductiongame.socialai-data.org/
Ego4D: https://arxiv.org/abs/2110.07058
MMToM-QA: https://openreview.net/pdf?id=ibLM1yvxaL
CMU 11-866 Artificial Social Intelligence: https://cmu-multicomp-lab.github.io/asi-course/spring2023/

Research Projects on Human-AI Interaction

Motivation: What is the right medium for human-AI interaction? How can we really trust AI? How do we enable collaboration and synergy?
Challenges:

Modeling and conveying model uncertainty: text input uncertainty, visual uncertainty, multimodal uncertainty, cross-modal interaction uncertainty.
Asking for human clarification, human-in-the-loop, types of human feedback, and ways to learn from human feedback across all modalities.
New mediums to interact with AI. New tasks that go beyond imitating humans and lead to collaboration.

Potential models and datasets to start with

MMHal-Bench (aligning multimodal LLMs): https://arxiv.org/pdf/2309.14525.pdf
HACL (hallucination in multimodal LLMs): https://arxiv.org/pdf/2312.06968.pdf

Research Projects on Ethics and Safety

Motivation: Large AI models can emit unsafe text content and generate or retrieve biased images.
Challenges:

Taxonomizing types of biases: text, vision, audio, generation, etc.
Tracing biases to pretraining data and examining how bias can be amplified during training and fine-tuning.
New ways of mitigating biases and aligning to human preferences.

Potential models and datasets to start with

Many works study fairness in LLMs: how can they be extended to multimodal models?
Mitigating bias in text generation, image-captioning, image generation

How Do We Get Research Ideas?

1 Bottom-up

Turn a concrete understanding of existing research's failings into a higher-level experimental question.
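That is:

First, understand in depth the shortcomings, gaps, or failures of existing research,
then take those concrete problems
and abstract them into a higher-level experimental question.
In other words: start from "where others fell short" → distill a "more fundamental scientific question".
This is typical problem-driven research generation.
Bottom-up discovery
"Move from details toward theory":
start from concrete experimental results, local problems, and technical bottlenecks,
abstract upward step by step,
and arrive at a new research direction.
Great tool for incremental progress, but may preclude larger conceptual leaps:
it keeps "patching the existing framework" instead of redefining how the problem is framed.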

2 Top-down

Move from a higher-level question to a lower-level concrete testing of that question.

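First pose a broad, high-level research question,
then break it down into concrete, actionable experiments or tests.
Theory-driven or theme-driven.
Favors bigger ideas, but can be disconnected from reality
Tends to produce big questions
Can lead to breakthrough research
More strategic and systematic

But it can lose touch with reality, lack feasibility, and sometimes never leave the conceptual level.
Bottom-up = problem-solving thinking
Top-down = question-asking thinking
Good researchers usually need to switch between the two modes.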

posted @ 2026-02-26 14:46 asandstar