Products
VideoChat with MOSS https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat_with_MOSS
VideoChat is a multifunctional video question answering tool that combines the functions of Action Recognition, Visual Captioning and StableLM. Our solution generates dense, descriptive captions for any object and action in a video, offering a range of language styles to suit different user preferences. It supports conversations of varying length, emotional tone, and degree of language authenticity.
Lets AI understand video; well suited to long-video scenarios, e.g. deciding whether a video is worth the time to watch.
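A minimal sketch of the caption-then-chat idea behind tools like VideoChat, assuming opencv-python and Hugging Face transformers are available. The captioning model, the sampling interval, and the helper functions are illustrative choices, not the project's actual code.

```python
# Caption-then-chat sketch: sample frames, caption each one, hand the joined
# captions to any chat LLM as context. Model name and helpers are illustrative.
import cv2
from PIL import Image
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def sample_frames(video_path, every_n_seconds=5):
    """Grab one frame every N seconds from the video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    step = int(fps * every_n_seconds)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        idx += 1
    cap.release()
    return frames

def describe_video(video_path, every_n_seconds=5):
    """Caption the sampled frames and join them into a rough timestamped description."""
    captions = []
    for i, frame in enumerate(sample_frames(video_path, every_n_seconds)):
        text = captioner(frame)[0]["generated_text"]
        captions.append(f"[~{i * every_n_seconds}s] {text}")
    return "\n".join(captions)

# The joined captions then become context for a chat model, e.g.:
# prompt = f"Video description:\n{describe_video('demo.mp4')}\n\nIs this video worth watching?"
```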
ChatGLM-6B https://github.com/THUDM/ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
There is also a 130B version. Its strongest point is actually the low-level ability to run across GPU platforms (emotionally I personally still lean toward Western GPUs, and of course "Fuck Nvidia" is a stance worth sticking to).
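For reference, the basic usage pattern below roughly follows the ChatGLM-6B README; exact arguments, precision settings, and memory requirements may differ between releases.

```python
# Loading ChatGLM-6B for bilingual multi-turn chat (requires a GPU for .half().cuda()).
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

# model.chat carries a running history, so multi-turn dialogue is just a loop over this call.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
response, history = model.chat(tokenizer, "Introduce Tsinghua University in one sentence.", history=history)
print(response)
```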
GLM-130B: An Open Bilingual Pre-trained Model https://keg.cs.tsinghua.edu.cn/glm-130b/zh/posts/glm-130b/
Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University
Running on consumer-grade PCs is an inevitable trend; looking forward to it.
MiniGPT-4 https://github.com/Vision-CAIR/MiniGPT-4
Enhancing Vision-language Understanding with Advanced Large Language Models
Enhances visual understanding; one official demo takes frames from a video and produces the corresponding textual description.
MiniGPT-4 can generate accurate image descriptions, write text based on an image, propose solutions to problems shown in a picture, and even teach users how to do something from a photo (i.e., the capabilities GPT-4 demonstrated in its demo).
gpt4free https://github.com/xtekky/gpt4free
A project that received a cease-and-desist letter from OpenAI's lawyers must have something worth looking at.
GPT-3 Demo https://gpt3demo.com/map
Real-time Market Map
Apps experimenting at the application layer on top of GPT capabilities.
Advancing AGI for humanity https://thegenerality.com/agi/blog.html
The papers collected there are worth reading.
ControlNet https://github.com/lllyasviel/ControlNet
ControlNet is a neural network structure to control diffusion models by adding extra conditions.
This technique can be understood as the equivalent of anchor points in Illustrator: the extra condition pins down the structure while the model fills in the rest.
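A minimal sketch of the "extra condition" idea using the diffusers library's ControlNet integration, with a Canny edge map as the condition. The model IDs are the commonly published ones; the exact call signature may differ across diffusers versions.

```python
# Conditioning Stable Diffusion on a Canny edge map via ControlNet (diffusers).
# The edge map acts like the "anchor points" that fix the composition.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Turn a reference image into the extra condition: a Canny edge map.
image = np.array(Image.open("reference.png").convert("RGB"))
edges = cv2.Canny(image, 100, 200)
condition = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The prompt decides content and style; the edge map constrains structure.
result = pipe("a watercolor painting of a cottage", image=condition, num_inference_steps=30)
result.images[0].save("controlled.png")
```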
TaskMatrix https://github.com/microsoft/TaskMatrix
TaskMatrix connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.
This is fundamentally different from a multimodal LLM: the language model itself never processes pixels, it only orchestrates external vision models.
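A toy sketch of that pattern, not the project's code: the chat model only ever sees text (including image file names) and delegates all pixel-level work to separate tools. Every function here is a hypothetical placeholder.

```python
# Toy sketch of the TaskMatrix-style pattern: the LLM handles text and file names,
# vision work is delegated to tools. All functions are hypothetical placeholders.

def caption_tool(image_path):
    return "a dog playing in the snow"        # would call an image-captioning model

def edit_tool(image_path, instruction):
    return "edited_" + image_path             # would call an image-editing model

TOOLS = {"caption": caption_tool, "edit": edit_tool}

def llm_decide(user_message):
    """Stand-in for the chat LLM choosing a tool and its text arguments."""
    if "describe" in user_message:
        return "caption", ["photo.png"]
    return "edit", ["photo.png", user_message]

def handle(user_message):
    tool_name, args = llm_decide(user_message)
    observation = TOOLS[tool_name](*args)      # only a text handle flows back to the LLM
    return f"(via {tool_name}) {observation}"

print(handle("describe photo.png"))
```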
Ideas
Compression of intelligent information: the path to emergent model intelligence https://mp.weixin.qq.com/s/hQmvltuMlClBonM6UJmtLg
OpenAI does not need langchain:
From a traditional industry-chain perspective, OpenAI provides the base model capability and langchain provides developer tooling, so there should be no major conflict. But from the perspective of acquiring more data that contains human intelligence, langchain's existence takes "how developers build a particular kind of application", an extremely valuable piece of "effective data", out of OpenAI's reach.
—— This is a very insightful point and worth following.
"For OpenAI, whose goal is AGI, every application looks like nothing more than a free provider of 'effective data' for the next stage of AGI capability."
At the global level, this logic will drive model concentration and oligopoly; in China, too, serious LLM efforts will probably consolidate around one or two players.
Three questions to think through at the fundamental level https://zhuanlan.zhihu.com/p/618902095
Rote surface-level knowledge and facts will be the first to be obsoleted and replaced; the deepest layers of thinking and cognition are what remain critical and irreplaceable.
Compression for AGI https://mp.weixin.qq.com/s/G613tUo4TzjddaysGs26AQ
Find the minimum description length that solves the perception problem.
This line of thinking is very consistent with how Zhang Xiaolong built WeChat: find the atomic components and let them flow and circulate. The core idea is likewise to find the most minimal, most elegant solution to the problem.
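A tiny worked example of the two-part-code intuition behind minimum description length: the total cost is the bits needed to write the model down plus the bits needed to encode the data given that model. The candidate "models" and their bit accounting are illustrative, not a real codec.

```python
# Two-part-code view of MDL: cost = L(model) + L(data | model).
# The shorter total description is the better "understanding" of the data.
import math

data = "ababababababababababab"          # 22 characters over the alphabet {a, b}

def raw_cost(s):
    # No model: 1 bit per character for a 2-symbol alphabet.
    return len(s) * 1.0

def repeat_model_cost(s, unit="ab"):
    # Model: "repeat this unit"; pay for the unit plus the repeat count,
    # and (if the model fits perfectly) nothing more for the data itself.
    model_bits = len(unit) * 1.0 + math.log2(len(s))
    residual_bits = 0.0 if unit * (len(s) // len(unit)) == s else raw_cost(s)
    return model_bits + residual_bits

print(f"raw encoding:      {raw_cost(data):.1f} bits")
print(f"'repeat ab' model: {repeat_model_cost(data):.1f} bits")
# The large gap is the same intuition behind treating compression as a path to intelligence.
```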
Decoding the key techniques behind ChatGPT: RLHF, IFT, CoT, and red teaming https://zhuanlan.zhihu.com/p/602458131
Toolformer: Language Models Can Teach Themselves to Use Tools https://arxiv.org/abs/2302.04761
A milestone capability.
This article, https://kikaben.com/toolformer-2023/, read together with Yann LeCun's Twitter thread, gives a deeper understanding of Toolformer.
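A simplified sketch of the paper's core pattern: the model learns to emit inline API calls such as `[Calculator(400 / 1400)]` inside its text, which are then executed and the results spliced back in. The parsing and the tool set below are toy stand-ins, not the paper's implementation.

```python
# Simplified Toolformer-style post-processing: find inline [Tool(args)] calls in the
# model's output, execute them, and splice the results back into the text.
import re
from datetime import date

def calculator(expr):
    # Restricted arithmetic evaluator (digits, operators, parentheses, spaces only).
    if not re.fullmatch(r"[\d\s+\-*/().]+", expr):
        raise ValueError("unsupported expression")
    return f"{eval(expr):.2f}"

def calendar(_):
    return date.today().isoformat()

TOOLS = {"Calculator": calculator, "Calendar": calendar}
CALL = re.compile(r"\[(\w+)\(([^)]*)\)\]")

def execute_tool_calls(text):
    """Replace every [Tool(args)] span with '[Tool(args) -> result]'."""
    def run(match):
        name, args = match.group(1), match.group(2)
        return f"[{name}({args}) -> {TOOLS[name](args)}]"
    return CALL.sub(run, text)

generated = "Out of 1400 participants, 400 [Calculator(400 / 1400 * 100)] percent passed."
print(execute_tool_calls(generated))
# Out of 1400 participants, 400 [Calculator(400 / 1400 * 100) -> 28.57] percent passed.
```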
“The limits of my language mean the limits of my world.” —— Ludwig Wittgenstein
Despite this, one can still approximate the truth ever more closely through probabilistic means. The representations of language are inherently wrong, yet they are very useful: they enable humans to convey propositions about the world and to gain a deeper understanding of, and interaction with, reality. This ability lets us gain insight into the nature of consciousness and the human condition, an experience that is uniquely human.
I suddenly understand why Silas pushed us so hard to write good Leads back then... because the process of writing those Descriptions is precisely the process of building a deeper understanding of the world.
OpenAI officially launches multimodal GPT-4 https://mp.weixin.qq.com/s/iw0wESsyP8nkPuFkj_EkOg
When task complexity crosses a sufficient threshold, the difference becomes clear: GPT-4 is more reliable and more creative than GPT-3.5, and can handle more nuanced instructions.
OpenAI also open-sourced OpenAI Evals, for creating and running benchmarks that evaluate models such as GPT-4 while inspecting their performance sample by sample.
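Not the OpenAI Evals API itself, just a sketch of the sample-by-sample idea it packages: run every sample through the model, grade each one individually, and keep the per-sample records for inspection rather than only an aggregate score. The `ask_model` function is a placeholder for a real API call.

```python
# Sample-by-sample evaluation sketch (not the OpenAI Evals API): grade every sample
# individually and keep per-sample records, not just an average.

SAMPLES = [
    {"input": "What is the capital of France?", "ideal": "Paris"},
    {"input": "2 + 2 = ?", "ideal": "4"},
]

def ask_model(prompt):
    """Placeholder for a call to GPT-4 or any other model under evaluation."""
    return {"What is the capital of France?": "Paris", "2 + 2 = ?": "5"}.get(prompt, "")

def run_eval(samples):
    records = []
    for sample in samples:
        completion = ask_model(sample["input"])
        records.append({
            "input": sample["input"],
            "ideal": sample["ideal"],
            "completion": completion,
            "correct": sample["ideal"].lower() in completion.lower(),
        })
    accuracy = sum(r["correct"] for r in records) / len(records)
    return accuracy, records

accuracy, records = run_eval(SAMPLES)
print(f"accuracy: {accuracy:.2f}")
for r in records:          # per-sample inspection, as described above
    print(r)
```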