Unveiling the Mystery of GPT-2!


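Natural language tasks have produced some dazzling results in recent years. This year's coursework happens to be a set of reading notes, so I'm taking the chance to give everyone a quick hands-on look at the famous GPT model.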


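Anyone who knows a bit about GPT knows it is a giant. It earned that name because the model had a staggering 117M parameters, and the largest GPT-2 reached 1.5B parameters. By the third generation the parameter count hit 175B, enough for Wang Jianlin to reach his famous "small target" of 100 million 1,750 times over, which is plainly astonishing. To this day OpenAI has not open-sourced GPT-3, which is understandable: with a model that large, most people could not run it anyway. After a long wait, though, someone released a lightweight replication in March this year (1.3B and 2.7B parameters), a dream come true for the many "alchemists" who tune models for a living. If you're interested, go try it out.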


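This post covers downloading and using the GPT-2 model. I searched Baidu all morning and found nobody who had walked through getting the code running, so I spent a good while on it myself, cleared a few bugs, and finally got the model working.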


Open-source code:

https://github.com/openai/gpt-2

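I set up the environment with Anaconda, which is simple, convenient, and keeps things clean. Note that you need TensorFlow==1.14.0 under Python 3.6. At first I installed the latest TensorFlow and, sure enough, it did not work: the repo's code targets the TensorFlow 1.x API (it uses tf.contrib, which TensorFlow 2 removed). After switching back to the tried-and-true version, everything went according to plan. Setup guides for this are easy to find online.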
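If you want to confirm the environment before going further, here is a minimal sketch of a version check. The expected versions are simply the ones named above; the function names are my own and not part of the gpt-2 repo, and other version combinations may also work.

```python
import sys

# Versions named in this post; treated as a known-good combination,
# not as the only one that can work.
EXPECTED_PYTHON = (3, 6)
EXPECTED_TF = "1.14.0"


def check_environment(python_version, tf_version):
    """Return a list of human-readable problems (empty if everything matches)."""
    problems = []
    major, minor = tuple(python_version[:2])
    if (major, minor) != EXPECTED_PYTHON:
        problems.append(
            f"Python {major}.{minor} found, "
            f"expected {EXPECTED_PYTHON[0]}.{EXPECTED_PYTHON[1]}"
        )
    if tf_version is None:
        problems.append("TensorFlow is not installed")
    elif tf_version != EXPECTED_TF:
        problems.append(f"TensorFlow {tf_version} found, expected {EXPECTED_TF}")
    return problems


def installed_tf_version():
    """Return the installed TensorFlow version string, or None if it is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__


# Typical use inside the conda environment:
#   for problem in check_environment(sys.version_info, installed_tf_version()):
#       print("WARNING:", problem)
```

Running the commented-out loop inside the activated environment prints one warning per mismatch and nothing when the versions match.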

My IDE is PyCharm. I like its interface, and installing libraries is easy: just install the packages listed in the repo's requirements file.

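The next step is to download the model version you need. Type the following command into the terminal at the bottom of PyCharm and press Enter:

python download_model.py 124M

To download a larger version, replace 124M with 355M, 774M, or 1558M (the 1.5B model). I downloaded the smallest one, and even that takes up about 500 MB.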
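To get a feel for what the download step fetches, here is a rough sketch that only builds the list of files and target paths; it downloads nothing. The base URL and file names are my recollection of the repo's download_model.py and should be treated as assumptions, so check them against your own checkout. `download_plan` is a hypothetical helper, not something the repo provides.

```python
import os

# Assumptions: base URL, model names, and file list as recalled from the
# repo's download_model.py. Verify against your checkout before relying on them.
BASE_URL = "https://openaipublic.blob.core.windows.net/gpt-2"
MODEL_NAMES = ("124M", "355M", "774M", "1558M")
MODEL_FILES = (
    "checkpoint",
    "encoder.json",
    "hparams.json",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "vocab.bpe",
)


def download_plan(model_name, models_dir="models"):
    """Return (url, local_path) pairs for one model size without downloading."""
    if model_name not in MODEL_NAMES:
        raise ValueError(
            f"unknown model {model_name!r}; expected one of {', '.join(MODEL_NAMES)}"
        )
    return [
        (
            f"{BASE_URL}/models/{model_name}/{file_name}",
            os.path.join(models_dir, model_name, file_name),
        )
        for file_name in MODEL_FILES
    ]
```

Validating the name up front is deliberate: a typo such as 744M fails immediately instead of producing a string of failed requests.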

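Once the download finishes, a new models folder appears in your project folder, holding the data for every model you downloaded. After that, drag the files out of src into the project root and they are ready to use.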
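One way to sanity-check the download is to read the model's hparams.json and estimate the parameter count from it. The sketch below assumes the file uses the keys n_vocab, n_ctx, n_embd, and n_layer, and that each block has the usual GPT-2 layout (fused query/key/value projection, output projection, a 4x-wide MLP, two layer norms, and an output layer that shares the token embedding). Both function names are my own.

```python
import json


def load_hparams(path):
    """Read a GPT-2 hparams.json file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_parameters(hparams):
    """Estimate the trainable parameter count from the hyperparameters."""
    n_vocab = hparams["n_vocab"]
    n_ctx = hparams["n_ctx"]
    n_embd = hparams["n_embd"]
    n_layer = hparams["n_layer"]

    # Token embeddings plus learned position embeddings.
    embeddings = n_vocab * n_embd + n_ctx * n_embd
    # Fused q/k/v projection and the attention output projection (with biases).
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Two-layer MLP with a hidden width of 4 * n_embd (with biases).
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    # Two layer norms per block, each with a gain and a bias vector.
    layer_norms = 2 * (2 * n_embd)
    per_layer = attention + mlp + layer_norms
    final_layer_norm = 2 * n_embd
    return embeddings + n_layer * per_layer + final_layer_norm
```

With the values I remember for the smallest model (n_vocab=50257, n_ctx=1024, n_embd=768, n_layer=12), `count_parameters` returns 124,439,808, which lines up with the 124M in the model's name. For a real check, call `count_parameters(load_hparams("models/124M/hparams.json"))`.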

In the command line, run:

python generate_unconditional_samples.py

and the model starts printing a stream of odd stories or news items, basically rambling away on its own.
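The sampling scripts take command-line flags, so you can control things like how many samples come out. Below is a small sketch that assembles such a command from Python. The flag names used in the example (nsamples, top_k, temperature) are the ones I understand the script to accept; confirm them in your copy of the script. `build_command` and `run_sampler` are hypothetical helpers of my own.

```python
import subprocess
import sys


def build_command(script, **flags):
    """Build an argv list such as [python, script, '--top_k', '40']."""
    command = [sys.executable, script]
    for name, value in flags.items():
        command += [f"--{name}", str(value)]
    return command


def run_sampler(script, **flags):
    """Run a sampling script with the current interpreter and wait for it."""
    return subprocess.run(build_command(script, **flags), check=True)


# Example (not executed here): six samples with top-k sampling.
#   run_sampler("generate_unconditional_samples.py",
#               nsamples=6, top_k=40, temperature=0.7)
```

Using `sys.executable` keeps the child process inside the same conda environment as the caller, which matters when several Python installations coexist.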

Run:

python interactive_conditional_samples.py

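and wait for the program to finish loading. Once you type in a topic, the program makes up a story or short essay based on your input. It tends to wander off topic as it goes, but I still find it quite fun. The drifting may also be down to the small model I used. Below I run the two programs in turn to see what they produce.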

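I generated six samples with the first program and picked an entertaining one to show here. I first tried machine translation, but it made too many mistakes, so I polished the result by hand. This piece of Western-flavored news manages to stack every trope at once; I was laughing the whole time I translated it!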

The article and my Chinese translation follow:


About 1,000 protesters gathered in front of Interstate 35 in River North Wednesday night, sending bark to police over $30 million in student loans.

周三晚上,约有 1,000 名抗议者聚集在北溪的 35 号州际公路前,针对 3,000 万美元的学生贷款问题向警方怒吼。

Katelyn Farenthold, 26, cancelled Monday's protests and sat in a restaurant on I-35 that had so far got only customer service after she disrupted the protest.

26 岁的 Katelyn Farenthold 取消了周一的抗议活动,并坐在 I-35 州际公路上的一家餐馆里,在她扰乱抗议活动后,该餐馆迄今为止只有客户服务。

"This is about, like, a friendship business... I don't think my friends do much for the university if they go this way," said Farenthold, a second-year. But Dr. Ruth Penske, president of the College of American Physicians-Head and College of Physicians of ANZAC, said women who boycott federal healthcare, fire yardsticks at people and smoke into taxis and vans?

“这是关于,就像,友谊生意......如果我的朋友这样做,我认为他们不会为大学做多少,”二年级学生 Farenthold 说。 但是,ANZAC 美国医师学院院长兼医师学院院长 Ruth Penske 博士说,抵制联邦医疗保健、对人开火以及向出租车和货车吸烟的女性? (Note: the generated sentence here is simply ungrammatical.)

How could they lose to oncoming protesters? Not by all the efforts that men make to see government and investor banks and components of NATO as effective. According to the Center for Public Integrity, the backlash is stronger than protests have induced. "I would say the vengeance movement I feared was going over in this whole country started in Theresa May's care," said Postdoctoral Ph.D. candidate Dr. Aileen Helps.

他们怎么可能输给迎面而来的抗议者?并非所有人都努力将政府和投资银行以及北约的组成部分视为有效。 根据公共诚信中心的说法,强烈反对比抗议活动所引起的强烈。“我会说我害怕在整个国家蔓延的复仇运动是在特蕾莎·梅的照顾下开始的,”博士后候选人艾琳博士如是说。

Mementos of the auspicious days she had for her foundation? Countercultural poetry. A postcard she wanted to put a firearm on; a happy girl bag.

她为她的基金会所拥有的吉祥日子的纪念品?反文化诗歌。一张她想放枪的明信片;一个快乐的女孩包。

The protesters, who wore black-and-white caps and wreaths with communion swine to honor them and share the day of reckoning, said their support of what happened raised questions, grounded hopes but stern ones is enough to make them a faster national forward if the culture is emphasized enough and some stations leave their footprints.


抗议者戴着黑白帽子和带有圣餐猪的花圈来向他们表示敬意并分享清算日,他们说他们对所发生的事情的支持提出了问题,让希望落空,但严厉的希望足以使他们成为更快的国家前锋,如果文化被足够强调并在一些地方留下他们的足迹。

Progressives also highlight the law schools as socio-intellectual intellectual wings similar to the USA Freedom, the Bursary School for Law and Economics. Western universities are also bastions of education reform devoting innate gods like better math skills and more experience to young students. Education scholar Catherine Wong, a Bursary University of Puget Sound, thought of the invisible classes as "transformative learning structures" that provide graduates considerable academic independence by shaping their creativity and realizing medication and mental health.


进步人士还强调法学院是类似于美国自由法学院和经济学院的社会智力知识翼。 西方大学也是教育改革的堡垒,致力于提高年轻学生的数学技能和经验等天赋。普吉特海湾助学金大学教育学者 Catherine Wong 将隐形课程视为“变革性学习结构”,通过塑造他们的创造力并实现药物治疗和心理健康,为毕业生提供相当大的学术独立性。

Quinz said the eruption of protests her organization sees isn't simply a sign that people are coming out to protest. However, taking all the marching orders, creating or ordering different forms of resistance takes time, Preparatory OSAs can know what they want when they identify and tailor things to fit their context. Leaders and more importantly their students have to have absolute control over what their activists walk again.


奎兹说,她的组织看到的抗议活动的爆发不仅仅是人们出来抗议的迹象。然而,接受所有命令、组织或命令不同形式的抵抗需要时间,预防性OSA(安全事务处) 在识别和定制适合他们的东西时可以知道他们想要什么。领导者,更重要的是他们的学生必须对他们的激进分子再次行动的行为拥有绝对的控制权。

It took many years for the members of Indian Colonial youths in Raining Gorge and mills around Seattle to realize the power of the vanguard music to shape young minds through submitting critical statements and rallying newer students and members of their tribe to direct the chants.A century with the end of racial segregation shackles our collective legacy through our fine arts.

西雅图附近的雷宁峡谷和磨坊的印度殖民青年成员花了很多年才意识到先锋音乐的力量,通过提交批评性陈述和召集新学生和他们的部落成员来指挥歌谣来塑造年轻人的思想。 一个以种族隔离为末尾的世纪通过我们的美术束缚了我们的集体遗产。


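For the second program I wanted to have a bit of fun with a made-up prompt, but the generated article turned out rather distasteful. That fits where the training set came from, and it also reflects how foreign media smear China. You would almost think the likes of the BBC write their articles with a generator!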

Model prompt >>> CNN reported that the university of science and technology of China had excavated several tons of gold under the teaching building.

模型输入>>> CNN报道称,中国科学技术大学在教学楼下挖掘了数吨黄金。

"It is likely some gold was used here illegally," Yun Yun Zhu, the construction reconvener, told the media outlet. "It will be treated as having been given away illegally. There will be an accounting again on how much is being returned, and the university managers will be showing new rooms for that."

“这里很可能有一些黄金被非法使用,”建筑责任人朱云云告诉媒体。 “它将被视为非法赠送。将再次核算返还多少,大学管理人员将为此展示新房间。”

Yun said the deposit of rests according to tagnets was the result of a notorious informal venture between the super-rich and Chinese officials. Other details show that a building was also constructed at the top of several large storage malls in Jihong Autonomous Land (JAR). But in December last year based on first supporting evidence from industry experts it was ruled that no deposits were brought out of the warehouse.

云说,根据标签存放余额(这里指黄金)是超级富豪和中国官员之间臭名昭著的不正当勾当的结果。其他细节显示,在冀红自治区(JAR)的几个大型仓储商场的顶部还建造了一种建筑物。但在去年 12 月,根据行业专家的初步支持的证据,裁定没有存款被带出仓库。

"Imagine some amount of penal funds of some kind was brought into this. If this news gets out, people will wake up," he said. Wang-Hyun Lei, the building inspector of Georgia's CCTV state cement employees' collective, a headquarter for industrial and domestic quality, said, "I'm extremely disappointed about the mystery here. The information will be submitted to the video standard board. It is a very sophisticated quality control, but less efficient ... There will be pressure for tampering with perimeter accents slightly from precautions to seal the place. No other companies are involved," said Wang.

“想象一下,某种数额的罚款被带入其中。如果这个消息传出去,人们就会醒悟,”他说。佐治亚州的CCTV水泥职工集体(工业和国家质量总部)的建筑检查员雷王铉说:“我对这里的神秘感到非常失望。信息将提交给视频标准委员会。它是一个非常复杂的品质控制,但效率较低......从预防措施到密封地方会有轻微篡改周边口音的压力 (Note: I have no idea what this part means)。没有其他公司参与,”王说。

China is fascinated by the gold trappping at JAR and that supporters of a tomb rescue are unabashed in their support of the project. This week, the Assistant Secretary-General of the Zhaoquan Province appears to hold a senior position that no official books can fill. The buildings were designated TIGED 4 abandoned house, refused in 2011, while the TIGED 5 building was handed up in 2006.

中国对 JAR 的黄金陷阱很着迷,而且坟墓救援的支持者毫不掩饰地支持该项目。本周,肇泉省助理秘书长似乎担任了官方无法填补的高级职位。这些建筑物被指定为 TIGED 4 废弃的房子,2011 年被拒绝,而 TIGED 5 建筑于 2006 年交付。

Stay tuned to Scribner on Sina News.com for more news and photos.

STEAM: Fair Loops arrived over the site in Puttner Park on Thursday and is expected to both be completed shortly afterwards, the Eastern Hartford-based firm said.

敬请关注新浪新闻网站上的 Scribner 以获取更多新闻和照片。 STEAM:Fair Loops 周四抵达普特纳公园的现场,预计将在不久之后完成,这家总部位于东哈特福德的公司表示.

Contact

  • 850466469
  • wtd2018@mail.ustc.edu.cn
