Tencent News
2 days ago
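ICLR 2026 | Beihang University open-sources Code2Bench: dual-scaling dynamic evaluation, so code LLMs can no longer coast on score-farming
In the race to measure the code-generation ability of large language models (LLMs), an increasingly serious problem is surfacing: when models post near-saturated scores on classic benchmarks such as HumanEval and MBPP, are we evaluating their genuine ability to generalize and reason, or merely testing their "memory" of the training corpus?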