[TechWeb] March 25: OpenAI co-founder and CEO Sam Altman announced GPT-4o image generation in a livestream, filling in image generation as an important missing piece for the multimodal GPT-4o model. GPT-4o image generation follows instructions to produce more accurate images. OpenAI has also connected it to its own knowledge base, so it can generate and edit images for users based on that knowledge or on the conversation context.

Starting today, GPT-4o image generation is rolling out as the default image generator in ChatGPT to Plus, Pro, Team, and free users. Users can open ChatGPT and try these capabilities now, although ordinary users get only 3 tries per day. Access for developers to generate images with GPT-4o through the API will roll out over the coming weeks.

Judging from the examples OpenAI has shown and demonstrated, GPT-4o image generation handles text very well: it can reproduce text content 100% and place it where specified, and, like a TV series, it can keep rendering text accurately while changing a character's actions from one image to the next. GPT-4o images can follow detailed prompts, handling as many as 10 to 20 different objects. GPT-4o also performs well at generating photorealistic images.

OpenAI also volunteered: "Our model is not perfect. We are aware of multiple limitations at the moment, which we will address through model improvements after the initial launch." At present, GPT-4o image generation still hallucinates; it sometimes crops images badly; it struggles to render non-Latin languages, and characters may be inaccurate; requests to edit a specific part of a generated image (such as a spelling error) do not always work, and may change other parts of the image in ways that were not requested or introduce more errors. In addition, the GPT-4o model has difficulty keeping edits to user-uploaded faces consistent, though this is expected to be fixed within a week.

How do China's current text-to-image apps compare with GPT-4o when given the same prompts? First, here are a few of the GPT-4o image generation examples.

Example 1: Handling text in an image

Enter the following text in ChatGPT:

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(Left)
"Transfer between Modalities:
Suppose we directly model p(text, pixels, sound) [equation] with one big autoregressive transformer.
Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack
Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens - [transformer] - [diffusion] - pixels"

In the image GPT-4o generated, the text on the whiteboard is completely accurate.

It can also, like a TV series, keep rendering text accurately while changing a character's actions. Enter the following instruction in ChatGPT:

selfie view of the photographer, as she turns around to high five him

In the image GPT-4o generated, the man reflected in the whiteboard in the first image matches the man in the second image.

Example 2: Generating a menu

Besides the dishes, prices, and descriptions to include, the prompt also requires the generated image to reflect the restaurant's name, its main selling points, and the style of the menu.

Enter the following instruction in ChatGPT:

I'm opening a traditional concept restaurant in Marin called Haein. It focuses on Korean food cooked with organic, farm-fresh ingredients, with a rotating menu based on what's seasonal. I want you to design an image - a menu incorporating the following menu items - lean into the traditional/rustic style while keeping it feeling upscale and sleek. Please also include illustrations of each dish in an elegant, peter rabbit style. Make sure all the text is rendered correctly, with a white background.

(Top)
Doenjang Jjigae (Fermented Soybean Stew) – $18
House-made doenjang with local mushrooms, tofu, and seasonal vegetables served with rice.
Galbi Jjim (Braised Short Ribs) – $34
Slow-braised local grass-fed beef ribs with pear and black garlic glaze, seasonal root vegetables, and jujube.
Grilled Seasonal Fish – Market Price ($22-$30)
Whole or fillet of local, sustainable fish grilled over charcoal, served with perilla leaf ssam and house-made sauces.
Bibimbap – $19
Heirloom rice with a rotating selection of farm-fresh vegetables, house-fermented gochujang, and pasture-raised egg.
Bossam (Heritage Pork Wraps) – $28
Slow-cooked pork belly with napa cabbage wraps, oyster kimchi, perilla, and seasonal condiments.

(Bottom) Dessert & Drinks
Seasonal Makgeolli (Rice Wine) – $12/glass
Rotating flavors based on seasonal fruits and flowers (persimmon, citrus, elderflower, etc.).
Hoddeok (Korean Sweet Pancake) – $9
Pan-fried cinnamon-stuffed pancake with black sesame ice cream.

The menu GPT-4o generated:

Example 3: Following a detailed prompt with 10 to 20 different objects

Enter the following instruction in ChatGPT:

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

The image GPT-4o generated:

Finally, how do China's current text-to-image apps perform when given these prompts? We tested the Example 3 prompt on ERNIE Bot (文心一言, running the ERNIE 4.5 model) and the Doubao (豆包) app.

One of the 4 images generated by ERNIE Bot (ERNIE 4.5)

One of the 4 images generated by Doubao

For now, there still appears to be a gap.