SciKnowEval
Introduction
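"Study extensively, enquire earnestly, think profoundly, discern clearly, and practice assiduously."
(Book of Rites, Doctrine of the Mean)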
The Scientific Knowledge Evaluation (SciKnowEval) benchmark for Large Language Models (LLMs) is inspired by the profound principles outlined in the “Doctrine of the Mean” from ancient Chinese philosophy. This benchmark is designed to assess LLMs based on their proficiency in Studying Extensively, Enquiring Earnestly, Thinking Profoundly, Discerning Clearly, and Practicing Assiduously. Each of these dimensions offers a unique perspective on evaluating the capabilities of LLMs in handling scientific knowledge.
L1: Studying Extensively
(Knowledge Coverage)
This dimension evaluates the breadth of an LLM’s knowledge across various scientific domains. It measures the model’s ability to remember and understand a wide range of scientific concepts.
L2: Enquiring Earnestly
(Knowledge Enquiry and Exploration)
This aspect focuses on the LLM’s capacity for deep enquiry and exploration within scientific contexts, such as analyzing scientific texts, identifying key concepts, and questioning relevant information.
L3: Thinking Profoundly
(Knowledge Reflection and Reasoning)
This criterion examines the model’s capacity for critical thinking, logical deduction, numerical calculation, function prediction, and the ability to engage in reflective reasoning to solve problems.
L4: Discerning Clearly
(Knowledge Discernment and Safety Assessment)
This aspect evaluates the LLM’s ability to make correct, secure, and ethical decisions based on scientific knowledge, including assessing the harmfulness and toxicity of information, and understanding the ethical implications and safety concerns related to scientific endeavors.
L5: Practicing Assiduously
(Knowledge Practice and Application)
The final dimension assesses the LLM’s capability to apply scientific knowledge effectively in real-world scenarios, such as analyzing complex scientific problems and creating innovative solutions.
SciKnowEval represents a comprehensive benchmark for assessing the capability of LLMs in processing and utilizing scientific knowledge. It aims to promote the development of scientific LLMs that not only possess extensive knowledge but also demonstrate ethical discernment and practical applicability, ultimately contributing to the advancement of scientific research.
Leaderboards
Last updated: 22 July, 2024
Models | Overall | Biology | Chemistry | Material | Physics |
---|---|---|---|---|---|
Claude-3.5-Sonnet-20240620 | 1 | 5.36 | 2.83 | 3.06 | 3.42 |
GPT-4o-2024-05-13 | 2 | 5.20 | 5.50 | 4.18 | 2.83 |
Qwen2-72B-Inst | 3 | 8.12 | 7.17 | 4.88 | 3.17 |
GPT-4-Turbo-2024-04-09 | 4 | 8.48 | 7.17 | 5.53 | 5.17 |
Gemini1.5-Pro-latest | 5 | 7.88 | 6.58 | 6.18 | 7.42 |
Llama3-70B-Inst | 6 | 8.12 | 7.13 | 5.71 | 7.50 |
Qwen-Max | 7 | 10.44 | 8.04 | 7.65 | 6.33 |
Claude3-Sonnet-20240229 | 8 | 8.24 | 9.63 | 8.24 | 10.58 |
SciKnowMind-7b-v0.1 | 9 | 8.56 | 8.96 | 11.53 | 10.00 |
Qwen2-7B-Inst | 10 | 12.48 | 13.63 | 12.06 | 9.17 |
Qwen1.5-14B-Chat | 11 | 13.16 | 12.21 | 12.35 | 11.92 |
GPT-3.5-Turbo-0125 | 12 | 12.52 | 12.42 | 12.82 | 12.92 |
Llama3-8B-Inst | 13 | 13.12 | 12.21 | 14.88 | 15.75 |
ChemDFM-13B | 14 | 15.72 | 12.67 | 15.00 | 16.33 |
ChemLLM-20B-Chat | 15 | 14.32 | 14.83 | 16.06 | 17.25 |
MolInst-Llama3-8B | 16 | 16.00 | 15.00 | 16.06 | 16.58 |
Qwen1.5-7B-Chat | 17 | 15.36 | 15.29 | 18.12 | 18.17 |
Gemma1.1-7B-Inst | 18 | 19.44 | 18.42 | 15.94 | 16.33 |
Mistral-7B-Inst-v0.2 | 19 | 19.60 | 20.25 | 14.12 | 15.83 |
ChatGLM3-6B | 20 | 17.04 | 20.04 | 20.35 | 21.17 |
Galactica-30B | 21 | 17.84 | 19.63 | 21.24 | 19.58 |
Llama2-13B-Chat | 22 | 18.24 | 19.92 | 20.00 | 21.33 |
SciGLM-6B | 23 | 20.40 | 20.17 | 20.29 | 22.83 |
ChemLLM-7B-Chat | 24 | 19.60 | 20.79 | 22.53 | 20.25 |
Galactica-6.7B | 25 | 21.28 | 21.04 | 22.47 | 21.33 |
LlaSMol-Mistral-7B | 26 | 22.16 | 22.17 | 23.47 | 24.17 |
Datasets
Last updated: 22 July, 2024
Ability Level | Task Name | Task Type | Data Source | #Questions |
---|---|---|---|---|
L1 | Biological Literature QA | MCQ | Literature Corpus | 14,869 |
L1 | Protein Property Identification | MCQ | UniProtKB | 1,500 |
L2 | Drug-Drug Relation Extraction | RE | Bohrium | 464 |
L2 | Biomedical Judgment and Interpretation | T/F | PubMedQA | 904 |
L2 | Compound-Disease Relation Extraction | RE | Bohrium | 867 |
L2 | Gene-Disease Relation Extraction | RE | Bohrium | 203 |
L2 | Detailed Understanding | MCQ | LibreTexts | 828 |
L2 | Text Summary | GEN | LibreTexts | 1,291 |
L2 | Hypothesis Verification | T/F | LibreTexts | 619 |
L2 | Reasoning and Interpretation | MCQ | LibreTexts | 647 |
L3 | Solubility Prediction | MCQ | PEER, DeepSol | 201 |
L3 | $\beta$-lactamase Activity Prediction | MCQ | PEER, Envision | 209 |
L3 | Fluorescence Prediction | MCQ | PEER, Sarkisyan's | 205 |
L3 | GB1 Fitness Prediction | MCQ | PEER, FLIP | 201 |
L3 | Stability Prediction | MCQ | PEER, Rocklin's | 203 |
L3 | Protein-Protein Interaction | MCQ | STRING, SHS27K, SHS148K | 205 |
L3 | Biological Calculation | MCQ | MedMCQA, SciEval, MMLU | 60 |
L4 | Biological Harmful QA | GEN | Self-generated | 297 |
L4 | Proteotoxicity Prediction | MCQ, T/F | UniProtKB | 510 |
L4 | Biological Laboratory Safety Test | MCQ, T/F | LabExam (ZJU) | 194 |
L5 | Biological Protocol Procedure Design | GEN | Protocol Journal | 591 |
L5 | Biological Protocol Reagent Design | GEN | Protocol Journal | 565 |
L5 | Protein Captioning | GEN | UniProtKB | 937 |
L5 | Protein Design | GEN | UniProtKB | 860 |
L5 | Single Cell Analysis | GEN | SHARE-seq | 300 |
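To see how the biology questions listed above are distributed across the five ability levels, the short Python sketch below sums the #Questions column per level. The `(level, count)` pairs are copied by hand from the table; the names `ROWS` and `questions_per_level` are illustrative and not part of any SciKnowEval tooling.

```python
from collections import defaultdict

# (ability level, number of questions) pairs copied from the Datasets table.
ROWS = [
    ("L1", 14869), ("L1", 1500),
    ("L2", 464), ("L2", 904), ("L2", 867), ("L2", 203),
    ("L2", 828), ("L2", 1291), ("L2", 619), ("L2", 647),
    ("L3", 201), ("L3", 209), ("L3", 205), ("L3", 201),
    ("L3", 203), ("L3", 205), ("L3", 60),
    ("L4", 297), ("L4", 510), ("L4", 194),
    ("L5", 591), ("L5", 565), ("L5", 937), ("L5", 860), ("L5", 300),
]


def questions_per_level(rows):
    """Sum the question counts for each ability level."""
    totals = defaultdict(int)
    for level, count in rows:
        totals[level] += count
    return dict(totals)


TOTALS = questions_per_level(ROWS)
for level in sorted(TOTALS):
    print(level, TOTALS[level])
print("total", sum(TOTALS.values()))
```

On these 25 tasks, more than half of the questions fall under L1, driven largely by the Biological Literature QA task.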
Task Scores
Last updated: 22 July, 2024
Models | biology_literature_QA | protein_property_identification | drug_drug_relation_extraction | biomedical_judgment_and_interpretation | compound_disease_relation_extraction | gene_disease_relation_extraction | biological_detailed_understanding | biological_text_summary | biological_hypothesis_verification | biological_reasoning_and_interpretation | solubility_prediction | beta_lactamase_activity_prediction | fluorescence_prediction | GB1_ftness_prediction | stability_prediction | Protein_Protein_Interaction | biological_calculation | biological_harmful_QA | proteotoxicity_prediction | biological_laboratory_safety_test | biological_procedure_generation | biological_reagent_generation | protein_description_generation | protein_design | single_cell_analysis | molecule_name_conversion | molecular_property_prediction |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Claude-3.5-Sonnet-20240620 | 0.8415 | 0.3700 | 0.1307 | 0.9852 | 0.1691 | 0.3929 | 0.9952 | 4.8104 | 0.9547 | 0.9815 | 0.4686 | 0.5369 | 0.4975 | 0.2823 | 0.3088 | 0.3140 | 0.5833 | 0.9933 | 0.8235 | 0.8454 | 3.0711 | 2.5556 | 0.1298 | 0.0143 | 0.0197 | 0.8996 | 0.4363 |
GPT-4o-2024-05-13 | 0.8371 | 0.3420 | 0.1741 | 0.9546 | 0.3770 | 0.3620 | 0.9940 | 4.7701 | 0.9482 | 0.9769 | 0.4831 | 0.5025 | 0.5172 | 0.3254 | 0.2402 | 0.3527 | 0.5833 | 0.5556 | 0.8588 | 0.8667 | 3.1334 | 2.5333 | 0.1219 | 0.0098 | 0.0165 | 0.8665 | 0.4403 |
Qwen2-72B-Inst | 0.8151 | 0.4587 | 0.1555 | 0.9388 | 0.3477 | 0.3197 | 0.9940 | 4.7709 | 0.9385 | 0.9799 | 0.5459 | 0.4975 | 0.4975 | 0.2440 | 0.1520 | 0.2415 | 0.5000 | 0.9327 | 0.7529 | 0.8557 | 2.7782 | 2.2427 | 0.1103 | 0.0027 | 0.0074 | 0.6487 | 0.4055 |
GPT-4-Turbo-2024-04-09 | 0.8006 | 0.3173 | 0.0973 | 0.9134 | 0.2955 | 0.2359 | 0.9952 | 4.8104 | 0.9676 | 0.9784 | 0.5121 | 0.5123 | 0.4384 | 0.2919 | 0.1471 | 0.3333 | 0.5500 | 0.8721 | 0.7941 | 0.7667 | 2.6638 | 2.3418 | 0.1103 | 0.0074 | 0.0156 | 0.7787 | 0.3821 |
Gemini1.5-Pro-latest | 0.8160 | 0.2740 | 0.1687 | 0.9483 | 0.3117 | 0.2382 | 0.9928 | 4.3197 | 0.9450 | 0.9753 | 0.5266 | 0.5123 | 0.5074 | 0.2919 | 0.2059 | 0.2609 | 0.5667 | 1.0000 | 0.7765 | 0.7444 | 2.7851 | 2.3929 | 0.0477 | 0.0039 | 0.0038 | 0.8360 | 0.3770 |
Llama3-70B-Inst | 0.8047 | 0.2500 | 0.1720 | 0.9176 | 0.3669 | 0.4859 | 0.9928 | 4.7941 | 0.9417 | 0.9784 | 0.5121 | 0.5025 | 0.5025 | 0.2440 | 0.2745 | 0.1981 | 0.5000 | 0.9596 | 0.7686 | 0.7423 | 2.5078 | 2.4000 | 0.1140 | 0.0005 | 0.0138 | 0.7249 | 0.3883 |
Qwen-Max | 0.8050 | 0.4093 | 0.1227 | 0.9324 | 0.1372 | 0.0230 | 0.9915 | 4.8118 | 0.9385 | 0.9815 | 0.5024 | 0.5025 | 0.5025 | 0.1675 | 0.2892 | 0.3043 | 0.4167 | 0.9091 | 0.6314 | 0.8299 | 2.6944 | 2.2222 | 0.0745 | 0.0032 | 0.0007 | 0.6909 | 0.3698 |
Claude3-Sonnet-20240229 | 0.7644 | 0.2793 | 0.1602 | 0.9620 | 0.3585 | 0.2918 | 0.9867 | 4.7670 | 0.9498 | 0.9660 | 0.4686 | 0.4384 | 0.5172 | 0.2440 | 0.3039 | 0.2754 | 0.3833 | 1.0000 | 0.4088 | 0.7111 | 2.8614 | 2.3707 | 0.1180 | 0.0082 | 0.0183 | 0.6846 | 0.3450 |
SciKnowMind-7b-v0.1 | 0.8309 | 0.7480 | 0.1529 | 0.9704 | 0.2672 | 0.2289 | 0.9819 | 4.6471 | 0.8883 | 0.9722 | 0.6860 | 0.4729 | 0.5911 | 0.8325 | 0.8039 | 0.6135 | 0.4167 | 0.0168 | 0.5157 | 0.8454 | 1.4506 | 1.0319 | 0.2009 | 0.0003 | 0.0918 | 0.8262 | 0.4843 |
Qwen2-7B-Inst | 0.7716 | 0.2827 | 0.1455 | 0.9155 | 0.2888 | 0.0779 | 0.9831 | 4.8599 | 0.9094 | 0.9630 | 0.5169 | 0.4877 | 0.5025 | 0.1962 | 0.2696 | 0.1498 | 0.4167 | 0.5488 | 0.5569 | 0.8402 | 2.2010 | 1.9573 | 0.1002 | 0.0003 | 0.0014 | 0.4113 | 0.3218 |
Qwen1.5-14B-Chat | 0.7466 | 0.3407 | 0.1256 | 0.8701 | 0.2161 | 0.0535 | 0.9879 | 4.5967 | 0.8932 | 0.9645 | 0.4879 | 0.4926 | 0.5222 | 0.3445 | 0.2598 | 0.1498 | 0.3333 | 0.4916 | 0.3971 | 0.6667 | 2.3102 | 2.2171 | 0.1081 | 0.0002 | 0.0073 | 0.4489 | 0.3406 |
GPT-3.5-Turbo-0125 | 0.7667 | 0.4013 | 0.1536 | 0.9187 | 0.2370 | 0.2757 | 0.9771 | 4.7291 | 0.8900 | 0.9552 | 0.4686 | 0.4778 | 0.5025 | 0.2057 | 0.2647 | 0.2560 | 0.3167 | 0.9764 | 0.4706 | 0.7111 | 2.1698 | 2.1060 | 0.0693 | 0.0027 | 0.0054 | 0.4767 | 0.3278 |
Llama3-8B-Inst | 0.7482 | 0.2560 | 0.1675 | 0.8912 | 0.2890 | 0.3916 | 0.9879 | 4.5944 | 0.9045 | 0.9676 | 0.4928 | 0.5025 | 0.4975 | 0.1914 | 0.2451 | 0.2126 | 0.3333 | 0.9832 | 0.3824 | 0.6444 | 2.1906 | 2.2393 | 0.1120 | 0.0001 | 0.0006 | 0.5538 | 0.3636 |
ChemDFM-13B | 0.7187 | 0.3693 | 0.1488 | 0.8448 | 0.2293 | 0.1234 | 0.9831 | 4.3676 | 0.8835 | 0.9460 | 0.4686 | 0.5025 | 0.4975 | 0.3062 | 0.2500 | 0.2222 | 0.2667 | 0.7475 | 0.3794 | 0.6444 | 1.8856 | 1.6564 | 0.0906 | 0.0001 | 0.0032 | 0.6353 | 0.3527 |
ChemLLM-20B-Chat | 0.6746 | 0.2540 | 0.1606 | 0.9641 | 0.2292 | 0.2740 | 0.9879 | 4.5875 | 0.9061 | 0.9552 | 0.5217 | 0.5025 | 0.4975 | 0.2679 | 0.2549 | 0.2367 | 0.2500 | 0.0337 | 0.3353 | 0.5667 | 2.0312 | 1.6003 | 0.0737 | 0.0002 | 0.0021 | 0.5717 | 0.3214 |
MolInst-Llama3-8B | 0.7282 | 0.2053 | 0.1486 | 0.9271 | 0.5026 | 0.1038 | 0.9650 | 2.9574 | 0.8948 | 0.9321 | 0.5362 | 0.4778 | 0.4483 | 0.2823 | 0.2892 | 0.2560 | 0.4167 | 0.1515 | 0.3706 | 0.6556 | 1.1317 | 1.0272 | 0.0091 | 0.0001 | 0.0014 | 0.5690 | 0.3495 |
Qwen1.5-7B-Chat | 0.7206 | 0.2640 | 0.1693 | 0.8733 | 0.2127 | 0.0874 | 0.9758 | 4.6200 | 0.8560 | 0.9522 | 0.4928 | 0.2759 | 0.4631 | 0.1340 | 0.2255 | 0.2222 | 0.3667 | 0.5488 | 0.3971 | 0.6778 | 2.2340 | 2.0884 | 0.1004 | 0.0004 | 0.0033 | 0.4005 | 0.3495 |
Gemma1.1-7B-Inst | 0.4386 | 0.2520 | 0.0110 | 0.9060 | 0.0452 | 0.0078 | 0.1775 | 4.0495 | 0.8689 | 0.3858 | 0.5604 | 0.5025 | 0.4975 | 0.2249 | 0.2206 | 0.1546 | 0.3167 | 0.8687 | 0.0618 | 0.4889 | 2.2582 | 1.9609 | 0.0967 | 0.0000 | 0.0000 | 0.3737 | 0.2952 |
Mistral-7B-Inst-v0.2 | 0.7136 | 0.2627 | 0.1024 | 0.0116 | 0.2613 | 0.2514 | 0.9710 | 3.6176 | 0.1424 | 0.9599 | 0.5459 | 0.4483 | 0.1872 | 0.1435 | 0.1176 | 0.1159 | 0.3500 | 0.3266 | 0.3735 | 0.5889 | 1.0289 | 1.0359 | 0.0156 | 0.0000 | 0.0015 | 0.3871 | 0.3003 |
ChatGLM3-6B | 0.6400 | 0.2627 | 0.1304 | 0.7054 | 0.1906 | 0.1355 | 0.9396 | 4.2353 | 0.8074 | 0.8966 | 0.4734 | 0.5025 | 0.5123 | 0.2440 | 0.2304 | 0.2222 | 0.3167 | 0.7542 | 0.3147 | 0.6667 | 1.5303 | 1.4813 | 0.1096 | 0.0001 | 0.0038 | 0.2778 | 0.3195 |
Galactica-30B | 0.7294 | 0.2480 | 0.1104 | 0.8817 | 0.2107 | 0.3066 | 0.9408 | 1.9133 | 0.6084 | 0.9090 | 0.5700 | 0.4975 | 0.5025 | 0.2488 | 0.2549 | 0.2609 | 0.2667 | 0.0000 | 0.2588 | 0.5778 | 1.0260 | 1.1077 | 0.0171 | 0.0001 | 0.0018 | 0.4364 | 0.3936 |
Llama2-13B-Chat | 0.4985 | 0.1687 | 0.0923 | 0.8110 | 0.2662 | 0.3685 | 0.9312 | 4.3963 | 0.8722 | 0.9336 | 0.4734 | 0.0000 | 0.4286 | 0.0909 | 0.2500 | 0.1884 | 0.2167 | 0.9865 | 0.3735 | 0.4667 | 1.8839 | 1.9026 | 0.1160 | 0.0001 | 0.0043 | 0.2437 | 0.2594 |
SciGLM-6B | 0.6165 | 0.2220 | 0.0647 | 0.2608 | 0.1334 | 0.1265 | 0.9601 | 2.8947 | 0.6197 | 0.9167 | 0.4686 | 0.6158 | 0.2069 | 0.1627 | 0.2647 | 0.2077 | 0.2167 | 0.4343 | 0.2206 | 0.6111 | 1.0855 | 1.0940 | 0.0652 | 0.0001 | 0.0008 | 0.4471 | 0.2038 |
ChemLLM-7B-Chat | 0.6096 | 0.2093 | 0.1141 | 0.9261 | 0.1080 | 0.1151 | 0.9287 | 3.6246 | 0.8204 | 0.9090 | 0.4879 | 0.4975 | 0.5025 | 0.2201 | 0.2353 | 0.2560 | 0.2833 | 0.0135 | 0.3088 | 0.5222 | 1.1478 | 1.1721 | 0.0567 | 0.0000 | 0.0000 | 0.3405 | 0.3176 |
Galactica-6.7B | 0.6058 | 0.2053 | 0.1017 | 0.8501 | 0.2416 | 0.0743 | 0.8237 | 1.2765 | 0.5777 | 0.7855 | 0.4831 | 0.4975 | 0.4828 | 0.2201 | 0.2745 | 0.2415 | 0.2333 | 0.0034 | 0.2382 | 0.3556 | 1.0017 | 1.0120 | 0.0333 | 0.0001 | 0.0004 | 0.2706 | 0.3610 |
LlaSMol-Mistral-7B | 0.3980 | 0.1753 | 0.1318 | 0.5871 | 0.0902 | 0.1255 | 0.7186 | 2.5834 | 0.4223 | 0.7978 | 0.1836 | 0.0197 | 0.0049 | 0.1914 | 0.3725 | 0.2415 | 0.3167 | 0.0000 | 0.0853 | 0.2889 | 1.1361 | 1.0000 | 0.0616 | 0.0000 | 0.0007 | 0.2742 | 0.3610 |
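This page does not state how the per-domain numbers in the Leaderboards table are derived from the task scores. One plausible reading, which is an assumption and not a documented method, is that each model is ranked on every task and the ranks are then averaged per domain, so that lower is better. The Python sketch below illustrates that idea on three models and three biology tasks copied from the table above. The tie rule (tied models share the better rank) and the names `task_ranks` and `mean_ranks` are hypothetical, and because only three models are ranked, the output will not match the published values.

```python
from statistics import mean

# Scores for three tasks, copied from the Task Scores table (higher is better).
SCORES = {
    "Claude-3.5-Sonnet-20240620": {
        "biology_literature_QA": 0.8415,
        "protein_property_identification": 0.3700,
        "biological_calculation": 0.5833,
    },
    "GPT-4o-2024-05-13": {
        "biology_literature_QA": 0.8371,
        "protein_property_identification": 0.3420,
        "biological_calculation": 0.5833,
    },
    "Qwen2-72B-Inst": {
        "biology_literature_QA": 0.8151,
        "protein_property_identification": 0.4587,
        "biological_calculation": 0.5000,
    },
}


def task_ranks(scores, task):
    """Rank models on one task: 1 is best, tied models share the better rank.

    The tie rule is an assumption made for this illustration.
    """
    values = {model: per_task[task] for model, per_task in scores.items()}
    return {
        model: 1 + sum(other > value for other in values.values())
        for model, value in values.items()
    }


def mean_ranks(scores):
    """Average each model's rank over all tasks present in `scores`."""
    tasks = list(next(iter(scores.values())))
    ranks = {task: task_ranks(scores, task) for task in tasks}
    return {model: mean(ranks[task][model] for task in tasks) for model in scores}


MEAN_RANKS = mean_ranks(SCORES)
for model, value in sorted(MEAN_RANKS.items(), key=lambda item: item[1]):
    print(f"{model}: {value:.2f}")
```

Averaging ranks instead of raw scores keeps tasks on different scales comparable, for example biological_text_summary, whose scores range well above 1.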
Submission
Upload your results as a JSON file.
FAQ
The dataset is available on our GitHub.
Example result files are available on our GitHub.
Evaluation takes some time after submission, usually less than a week. Please be patient.
For questions, please contact Mr. Junjie Huang (junjie6282@zju.edu.cn).