mirror of
https://github.com/Fishwaldo/linux-bl808.git
synced 2025-06-17 20:25:19 +00:00
It's a somewhat calmer cycle for docs this time, as the churn of the mass
RST conversion is happily mostly behind us.
- A new document on reproducible builds.
- We finally got around to zapping the documentation for hardware support
that was removed in 2004; one doesn't want to rush these things.
- The usual assortment of fixes, typo corrections, etc.
You'll still find a handful of annoying conflicts against other trees,
mostly tied to the last RST conversions; resolutions are straightforward
and the linux-next ones are good.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl1/J4IACgkQF0NaE2wM
flhYogf9EgYozCe8RocSq+JjJpZOSFjIGDQv+GwTjOBIdqgO9tSIaY/p0wSkYKil
jYXyMDF+Xwr8podsUep2F7akBM7j9XJ+XBGJcfOna0ypC9xoejMgWt9fU3YvaWge
dQJxIQ/iwkDlKNx6uOYgKysLUWFS0EP/nzPhqBo4bZZzhugvrR46D/nQqFNmGihd
l9yLalJtP5mC0XRUv3hpdAFFFKxdC0R3BGOel2V+slSClp0LEgpdMAuMaKydEDI3
Ch9ZpIp8fB8kqONCs9/X6083WRsDOMe28KgeGrGHo4Jla6u51QBLQjSVKttFv7xk
051yNJwDWMxgl+A4gyNLDPXM7Gd7HQ==
=v4dp
-----END PGP SIGNATURE-----
Merge tag 'docs-5.4' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It's a somewhat calmer cycle for docs this time, as the churn of the
mass RST conversion is happily mostly behind us.
- A new document on reproducible builds.
- We finally got around to zapping the documentation for hardware
support that was removed in 2004; one doesn't want to rush these
things.
- The usual assortment of fixes, typo corrections, etc"
* tag 'docs-5.4' of git://git.lwn.net/linux: (67 commits)
Documentation: kbuild: Add document about reproducible builds
docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]
Documentation: Add "earlycon=sbi" to the admin guide
doc:lock: remove reference to clever use of read-write lock
devices.txt: improve entry for comedi (char major 98)
docs: mtd: Update spi nor reference driver
doc: arm64: fix grammar dtb placed in no attributes region
Documentation: sysrq: don't recommend 'S' 'U' before 'B'
mailmap: Update email address for Quentin Perret
docs: ftrace: clarify when tracing is disabled by the trace file
docs: process: fix broken link
Documentation/arm/samsung-s3c24xx: Remove stray U+FEFF character to fix title
Documentation/arm/sa1100/assabet: Fix 'make assabet_defconfig' command
Documentation/arm/sa1100: Remove some obsolete documentation
docs/zh_CN: update Chinese howto.rst for latexdocs making
Documentation: virt: Fix broken reference to virt tree's index
docs: Fix typo on pull requests guide
kernel-doc: Allow anonymous enum
Documentation: sphinx: Don't parse socket() as identifier reference
Documentation: sphinx: Add missing comma to list of strings
...
commit
7c672abc12
249 changed files with 5159 additions and 3953 deletions
|
@ -47,7 +47,7 @@ This book adds some notes about PXA DMA
|
|||
|
||||
pxa_dma
|
||||
|
||||
.. only:: subproject
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
|
|
@ -65,6 +65,7 @@ available subsections can be seen below.
|
|||
dmaengine/index
|
||||
slimbus
|
||||
soundwire/index
|
||||
thermal/index
|
||||
fpga/index
|
||||
acpi/index
|
||||
backlight/lp855x-driver.rst
|
||||
|
@ -75,6 +76,7 @@ available subsections can be seen below.
|
|||
dell_rbu
|
||||
edid
|
||||
eisa
|
||||
ipmb
|
||||
isa
|
||||
isapnp
|
||||
generic-counter
|
||||
|
|
|
@ -83,7 +83,7 @@ Instantiate the device
|
|||
----------------------
|
||||
|
||||
After loading the driver, you can instantiate the device as
|
||||
described in 'Documentation/i2c/instantiating-devices'.
|
||||
described in 'Documentation/i2c/instantiating-devices.rst'.
|
||||
If you have multiple BMCs, each connected to your Satellite MC via
|
||||
a different I2C bus, you can instantiate a device for each of
|
||||
those BMCs.
|
||||
|
|
|
@ -59,7 +59,7 @@ Part III - How can drivers use the framework?
|
|||
|
||||
The main API is spi_nor_scan(). Before you call the hook, a driver should
|
||||
initialize the necessary fields for spi_nor{}. Please see
|
||||
drivers/mtd/spi-nor/spi-nor.c for detail. Please also refer to fsl-quadspi.c
|
||||
drivers/mtd/spi-nor/spi-nor.c for detail. Please also refer to spi-fsl-qspi.c
|
||||
when you want to write a new driver for a SPI NOR controller.
|
||||
Another API is spi_nor_restore(), this is used to restore the status of SPI
|
||||
flash chip such as addressing mode. Call it whenever detach the driver from
|
||||
|
|
|
@ -10,7 +10,7 @@ SoundWire Documentation
|
|||
error_handling
|
||||
locking
|
||||
|
||||
.. only:: subproject
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
|
107
Documentation/driver-api/thermal/cpu-cooling-api.rst
Normal file
|
@ -0,0 +1,107 @@
|
|||
=======================
|
||||
CPU cooling APIs How To
|
||||
=======================
|
||||
|
||||
Written by Amit Daniel Kachhap <amit.kachhap@linaro.org>
|
||||
|
||||
Updated: 6 Jan 2015
|
||||
|
||||
Copyright (c) 2012 Samsung Electronics Co., Ltd(http://www.samsung.com)
|
||||
|
||||
0. Introduction
|
||||
===============
|
||||
|
||||
The generic cpu cooling(freq clipping) provides registration/unregistration APIs
|
||||
to the caller. The binding of the cooling devices to the trip point is left for
|
||||
the user. The registration APIs return the cooling device pointer.
|
||||
|
||||
1. cpu cooling APIs
|
||||
===================
|
||||
|
||||
1.1 cpufreq registration/unregistration APIs
|
||||
--------------------------------------------
|
||||
|
||||
::
|
||||
|
||||
struct thermal_cooling_device
|
||||
*cpufreq_cooling_register(struct cpumask *clip_cpus)
|
||||
|
||||
This interface function registers the cpufreq cooling device with the name
|
||||
"thermal-cpufreq-%x". This api can support multiple instances of cpufreq
|
||||
cooling devices.
|
||||
|
||||
clip_cpus:
|
||||
cpumask of cpus where the frequency constraints will happen.
|
||||
|
||||
::
|
||||
|
||||
struct thermal_cooling_device
|
||||
*of_cpufreq_cooling_register(struct cpufreq_policy *policy)
|
||||
|
||||
This interface function registers the cpufreq cooling device with
|
||||
the name "thermal-cpufreq-%x" linking it with a device tree node, in
|
||||
order to bind it via the thermal DT code. This api can support multiple
|
||||
instances of cpufreq cooling devices.
|
||||
|
||||
policy:
|
||||
CPUFreq policy.
|
||||
|
||||
|
||||
::
|
||||
|
||||
void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
|
||||
|
||||
This interface function unregisters the "thermal-cpufreq-%x" cooling device.
|
||||
|
||||
cdev: Cooling device pointer which has to be unregistered.
|
||||
|
||||
2. Power models
|
||||
===============
|
||||
|
||||
The power API registration functions provide a simple power model for
|
||||
CPUs. The current power is calculated as dynamic power (static power isn't
|
||||
supported currently). This power model requires that the operating-points of
|
||||
the CPUs are registered using the kernel's opp library and the
|
||||
`cpufreq_frequency_table` is assigned to the `struct device` of the
|
||||
cpu. If you are using CONFIG_CPUFREQ_DT then the
|
||||
`cpufreq_frequency_table` should already be assigned to the cpu
|
||||
device.
|
||||
|
||||
The dynamic power consumption of a processor depends on many factors.
|
||||
For a given processor implementation the primary factors are:
|
||||
|
||||
- The time the processor spends running, consuming dynamic power, as
|
||||
compared to the time in idle states where dynamic consumption is
|
||||
negligible. Herein we refer to this as 'utilisation'.
|
||||
- The voltage and frequency levels as a result of DVFS. The DVFS
|
||||
level is a dominant factor governing power consumption.
|
||||
- In running time the 'execution' behaviour (instruction types, memory
|
||||
access patterns and so forth) causes, in most cases, a second order
|
||||
variation. In pathological cases this variation can be significant,
|
||||
but typically it is of a much lesser impact than the factors above.
|
||||
|
||||
A high level dynamic power consumption model may then be represented as::
|
||||
|
||||
Pdyn = f(run) * Voltage^2 * Frequency * Utilisation
|
||||
|
||||
f(run) here represents the described execution behaviour and its
|
||||
result has units of Watts/Hz/Volt^2 (this is often expressed in
|
||||
mW/MHz/uVolt^2)
|
||||
|
||||
The detailed behaviour for f(run) could be modelled on-line. However,
|
||||
in practice, such an on-line model has dependencies on a number of
|
||||
implementation specific processor support and characterisation
|
||||
factors. Therefore, in initial implementation that contribution is
|
||||
represented as a constant coefficient. This is a simplification
|
||||
consistent with the relative contribution to overall power variation.
|
||||
|
||||
In this simplified representation our model becomes::
|
||||
|
||||
Pdyn = Capacitance * Voltage^2 * Frequency * Utilisation
|
||||
|
||||
Where `capacitance` is a constant that represents an indicative
|
||||
running time dynamic power coefficient in fundamental units of
|
||||
mW/MHz/uVolt^2. Typical values for mobile CPUs might lie in range
|
||||
from 100 to 500. For reference, the approximate values for the SoC in
|
||||
ARM's Juno Development Platform are 530 for the Cortex-A57 cluster and
|
||||
140 for the Cortex-A53 cluster.
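
The simplified model above can be sketched numerically. This is a minimal illustration, not the kernel's implementation: the function name and the 1100 MHz / 900 mV operating point are hypothetical, and the scaling (frequency in MHz, voltage in mV, product divided by 1e9 to give milliwatts) is an assumption about how the coefficient is meant to be applied.

```python
def dynamic_power_mw(capacitance, freq_mhz, voltage_mv, utilisation=1.0):
    """Pdyn = Capacitance * Voltage^2 * Frequency * Utilisation.

    Assumption: frequency is given in MHz and voltage in mV, and the
    product is divided by 1e9 so that the result is in milliwatts.
    """
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    return capacitance * freq_mhz * voltage_mv ** 2 * utilisation / 1e9


# Hypothetical operating point (1100 MHz at 900 mV) with the Juno
# coefficients quoted in the text.
a57_mw = dynamic_power_mw(530, 1100, 900)
a53_mw = dynamic_power_mw(140, 1100, 900)
half_loaded_a57_mw = dynamic_power_mw(530, 1100, 900, utilisation=0.5)
```

Because voltage enters squared, lowering the DVFS level reduces power faster than lowering utilisation at a fixed operating point, which is why the text calls the DVFS level a dominant factor.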
|
90
Documentation/driver-api/thermal/exynos_thermal.rst
Normal file
|
@ -0,0 +1,90 @@
|
|||
========================
|
||||
Kernel driver exynos_tmu
|
||||
========================
|
||||
|
||||
Supported chips:
|
||||
|
||||
* ARM SAMSUNG EXYNOS4, EXYNOS5 series of SoC
|
||||
|
||||
Datasheet: Not publicly available
|
||||
|
||||
Authors: Donggeun Kim <dg77.kim@samsung.com>
|
||||
Authors: Amit Daniel <amit.daniel@samsung.com>
|
||||
|
||||
TMU controller Description:
|
||||
---------------------------
|
||||
|
||||
This driver allows reading the temperature inside SAMSUNG EXYNOS4/5 series SoCs.
|
||||
|
||||
The chip only exposes the measured 8-bit temperature code value
|
||||
through a register.
|
||||
Temperature can be taken from the temperature code.
|
||||
There are three equations converting from temperature to temperature code.
|
||||
|
||||
The three equations are:
|
||||
1. Two point trimming::
|
||||
|
||||
Tc = (T - 25) * (TI2 - TI1) / (85 - 25) + TI1
|
||||
|
||||
2. One point trimming::
|
||||
|
||||
Tc = T + TI1 - 25
|
||||
|
||||
3. No trimming::
|
||||
|
||||
Tc = T + 50
|
||||
|
||||
Tc:
|
||||
Temperature code, T: Temperature,
|
||||
TI1:
|
||||
Trimming info for 25 degree Celsius (stored at TRIMINFO register)
|
||||
Temperature code measured at 25 degree Celsius which is unchanged
|
||||
TI2:
|
||||
Trimming info for 85 degree Celsius (stored at TRIMINFO register)
|
||||
Temperature code measured at 85 degree Celsius which is unchanged
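
The three conversions can be written out as a small sketch. The function names and mode strings are hypothetical, chosen only for illustration; the arithmetic follows the equations above, and the inverse is derived from them by solving for T.

```python
def temp_to_code(temp, mode, ti1=None, ti2=None):
    """Convert a temperature (degrees Celsius) to a temperature code."""
    if mode == "two_point":
        return (temp - 25) * (ti2 - ti1) / (85 - 25) + ti1
    if mode == "one_point":
        return temp + ti1 - 25
    if mode == "none":
        return temp + 50
    raise ValueError(f"unknown trimming mode: {mode}")


def code_to_temp(code, mode, ti1=None, ti2=None):
    """Inverse of temp_to_code, obtained by solving each equation for T."""
    if mode == "two_point":
        return (code - ti1) * (85 - 25) / (ti2 - ti1) + 25
    if mode == "one_point":
        return code - ti1 + 25
    if mode == "none":
        return code - 50
    raise ValueError(f"unknown trimming mode: {mode}")
```

With two-point trimming, the calibration temperatures map onto the trimming values themselves: 25 degrees yields TI1 and 85 degrees yields TI2.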
|
||||
|
||||
TMU(Thermal Management Unit) in EXYNOS4/5 generates interrupt
|
||||
when temperature exceeds pre-defined levels.
|
||||
The maximum number of configurable thresholds is five.
|
||||
The threshold levels are defined as follows::
|
||||
|
||||
Level_0: current temperature > trigger_level_0 + threshold
|
||||
Level_1: current temperature > trigger_level_1 + threshold
|
||||
Level_2: current temperature > trigger_level_2 + threshold
|
||||
Level_3: current temperature > trigger_level_3 + threshold
|
||||
|
||||
The threshold and each trigger_level are set
|
||||
through the corresponding registers.
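
As an illustration of the level conditions, the sketch below evaluates which levels a given temperature exceeds. It models only the comparison described above; the function name and the sample values are hypothetical, and it does not touch any register.

```python
def triggered_levels(current_temp, trigger_levels, threshold):
    """Return the indices n for which
    current_temp > trigger_levels[n] + threshold holds."""
    return [
        n
        for n, level in enumerate(trigger_levels)
        if current_temp > level + threshold
    ]


# Hypothetical configuration: four trigger levels and a common threshold.
levels = triggered_levels(90, [70, 80, 95, 100], threshold=5)
```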
|
||||
|
||||
When an interrupt occurs, this driver notifies the kernel thermal framework
|
||||
with the function exynos_report_trigger.
|
||||
Although an interrupt condition for level_0 can be set,
|
||||
it can be used to synchronize the cooling action.
|
||||
|
||||
TMU driver description:
|
||||
-----------------------
|
||||
|
||||
The exynos thermal driver is structured as::
|
||||
|
||||
Kernel Core thermal framework
|
||||
(thermal_core.c, step_wise.c, cpu_cooling.c)
|
||||
^
|
||||
|
|
||||
|
|
||||
TMU configuration data -----> TMU Driver <----> Exynos Core thermal wrapper
|
||||
(exynos_tmu_data.c) (exynos_tmu.c) (exynos_thermal_common.c)
|
||||
(exynos_tmu_data.h) (exynos_tmu.h) (exynos_thermal_common.h)
|
||||
|
||||
a) TMU configuration data:
|
||||
This consists of TMU register offsets/bitfields
|
||||
described through structure exynos_tmu_registers. Also several
|
||||
other platform data (struct exynos_tmu_platform_data) members
|
||||
are used to configure the TMU.
|
||||
b) TMU driver:
|
||||
This component initialises the TMU controller and sets different
|
||||
thresholds. It invokes core thermal implementation with the call
|
||||
exynos_report_trigger.
|
||||
c) Exynos Core thermal wrapper:
|
||||
This provides 3 wrapper functions to use the
|
||||
Kernel core thermal framework. They are exynos_unregister_thermal,
|
||||
exynos_register_thermal and exynos_report_trigger.
|
|
@ -0,0 +1,61 @@
|
|||
=====================
|
||||
Exynos Emulation Mode
|
||||
=====================
|
||||
|
||||
Copyright (C) 2012 Samsung Electronics
|
||||
|
||||
Written by Jonghwa Lee <jonghwa3.lee@samsung.com>
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal
|
||||
management unit. Thermal emulation mode supports software debug for
|
||||
TMU's operation. User can set temperature manually with software code
|
||||
and TMU will read current temperature from user value not from sensor's
|
||||
value.
|
||||
|
||||
Enabling CONFIG_THERMAL_EMULATION option will make this support
|
||||
available. When it's enabled, sysfs node will be created as
|
||||
/sys/devices/virtual/thermal/thermal_zone'zone id'/emul_temp.
|
||||
|
||||
The sysfs node, 'emul_node', will contain value 0 for the initial state.
|
||||
When you input any temperature you want to update to sysfs node, it
|
||||
automatically enables emulation mode and current temperature will be
|
||||
changed into it.
|
||||
|
||||
(Exynos also supports user changeable delay time which would be used to
|
||||
delay of changing temperature. However, this node only uses same delay
|
||||
of real sensing time, 938us.)
|
||||
|
||||
Exynos emulation mode requires synchronous of value changing and
|
||||
enabling. It means when you want to update the any value of delay or
|
||||
next temperature, then you have to enable emulation mode at the same
|
||||
time. (Or you have to keep the mode enabling.) If you don't, it fails to
|
||||
change the value to updated one and just use last successful value
|
||||
repeatedly. That's why this node gives users the right to change
|
||||
temperature only. Just one interface makes it simpler to use.
|
||||
|
||||
Disabling emulation mode only requires writing value 0 to sysfs node.
|
||||
|
||||
::
|
||||
|
||||
|
||||
TEMP 120 |
|
||||
|
|
||||
100 |
|
||||
|
|
||||
80 |
|
||||
| +-----------
|
||||
60 | | |
|
||||
| +-------------| |
|
||||
40 | | | |
|
||||
| | | |
|
||||
20 | | | +----------
|
||||
| | | | |
|
||||
0 |______________|_____________|__________|__________|_________
|
||||
A A A A TIME
|
||||
|<----->| |<----->| |<----->| |
|
||||
| 938us | | | | | |
|
||||
emulation : 0 50 | 70 | 20 | 0
|
||||
current temp: sensor 50 70 20 sensor
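
Driving the sysfs node from a script might look like the following sketch. The helper names are hypothetical. It assumes the node takes the value in millidegrees Celsius, as the generic thermal sysfs interface does for emul_temp; the text above does not state the unit. Writing to the real node requires root.

```python
from pathlib import Path

THERMAL_ROOT = Path("/sys/devices/virtual/thermal")


def emul_temp_path(zone_id, root=THERMAL_ROOT):
    """Build the path of the emul_temp node for one thermal zone."""
    return Path(root) / f"thermal_zone{zone_id}" / "emul_temp"


def set_emulated_temp(zone_id, millicelsius, root=THERMAL_ROOT):
    """Write an emulated temperature (assumed millidegrees Celsius).

    Writing 0 disables emulation again, as described above.
    """
    emul_temp_path(zone_id, root).write_text(f"{int(millicelsius)}\n")
```

A sequence such as `set_emulated_temp(0, 50000)`, `set_emulated_temp(0, 70000)`, `set_emulated_temp(0, 0)` would correspond to the first steps of the timeline in the diagram, under the unit assumption stated above.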
|
18
Documentation/driver-api/thermal/index.rst
Normal file
|
@ -0,0 +1,18 @@
|
|||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======
|
||||
Thermal
|
||||
=======
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
cpu-cooling-api
|
||||
sysfs-api
|
||||
power_allocator
|
||||
|
||||
exynos_thermal
|
||||
exynos_thermal_emulation
|
||||
intel_powerclamp
|
||||
nouveau_thermal
|
||||
x86_pkg_temperature_thermal
|
320
Documentation/driver-api/thermal/intel_powerclamp.rst
Normal file
|
@ -0,0 +1,320 @@
|
|||
=======================
|
||||
Intel Powerclamp Driver
|
||||
=======================
|
||||
|
||||
By:
|
||||
- Arjan van de Ven <arjan@linux.intel.com>
|
||||
- Jacob Pan <jacob.jun.pan@linux.intel.com>
|
||||
|
||||
.. Contents:
|
||||
|
||||
(*) Introduction
|
||||
- Goals and Objectives
|
||||
|
||||
(*) Theory of Operation
|
||||
- Idle Injection
|
||||
- Calibration
|
||||
|
||||
(*) Performance Analysis
|
||||
- Effectiveness and Limitations
|
||||
- Power vs Performance
|
||||
- Scalability
|
||||
- Calibration
|
||||
- Comparison with Alternative Techniques
|
||||
|
||||
(*) Usage and Interfaces
|
||||
- Generic Thermal Layer (sysfs)
|
||||
- Kernel APIs (TBD)
|
||||
|
||||
INTRODUCTION
|
||||
============
|
||||
|
||||
Consider the situation where a system’s power consumption must be
|
||||
reduced at runtime, due to power budget, thermal constraint, or noise
|
||||
level, and where active cooling is not preferred. Software managed
|
||||
passive power reduction must be performed to prevent the hardware
|
||||
actions that are designed for catastrophic scenarios.
|
||||
|
||||
Currently, P-states, T-states (clock modulation), and CPU offlining
|
||||
are used for CPU throttling.
|
||||
|
||||
On Intel CPUs, C-states provide effective power reduction, but so far
|
||||
they’re only used opportunistically, based on workload. With the
|
||||
development of intel_powerclamp driver, the method of synchronizing
|
||||
idle injection across all online CPU threads was introduced. The goal
|
||||
is to achieve forced and controllable C-state residency.
|
||||
|
||||
Test/Analysis has been made in the areas of power, performance,
|
||||
scalability, and user experience. In many cases, clear advantage is
|
||||
shown over taking the CPU offline or modulating the CPU clock.
|
||||
|
||||
|
||||
THEORY OF OPERATION
|
||||
===================
|
||||
|
||||
Idle Injection
|
||||
--------------
|
||||
|
||||
On modern Intel processors (Nehalem or later), package level C-state
|
||||
residency is available in MSRs, thus also available to the kernel.
|
||||
|
||||
These MSRs are::
|
||||
|
||||
#define MSR_PKG_C2_RESIDENCY 0x60D
|
||||
#define MSR_PKG_C3_RESIDENCY 0x3F8
|
||||
#define MSR_PKG_C6_RESIDENCY 0x3F9
|
||||
#define MSR_PKG_C7_RESIDENCY 0x3FA
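
For experimentation from userspace, the counters can be sampled through the msr character device. This is a rough sketch, not what the driver does internally: it assumes the msr module is loaded, that the caller is root, and that the residency counters advance at the TSC rate, so that a delta divided by the TSC delta gives a ratio. The helper names are hypothetical.

```python
import os
import struct

MSR_PKG_C2_RESIDENCY = 0x60D
MSR_PKG_C3_RESIDENCY = 0x3F8
MSR_PKG_C6_RESIDENCY = 0x3F9
MSR_PKG_C7_RESIDENCY = 0x3FA


def read_msr(cpu, reg):
    """Read one 64-bit MSR via /dev/cpu/<cpu>/msr (offset = MSR address)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def residency_pct(msr_before, msr_after, tsc_before, tsc_after):
    """Percentage of the sampling window spent in the C-state.

    Assumption: the residency counter ticks at the TSC frequency.
    """
    tsc_delta = tsc_after - tsc_before
    if tsc_delta <= 0:
        return 0.0
    return 100.0 * (msr_after - msr_before) / tsc_delta
```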
|
||||
|
||||
If the kernel can also inject idle time to the system, then a
|
||||
closed-loop control system can be established that manages package
|
||||
level C-state. The intel_powerclamp driver is conceived as such a
|
||||
control system, where the target set point is a user-selected idle
|
||||
ratio (based on power reduction), and the error is the difference
|
||||
between the actual package level C-state residency ratio and the target idle
|
||||
ratio.
|
||||
|
||||
Injection is controlled by high priority kernel threads, spawned for
|
||||
each online CPU.
|
||||
|
||||
These kernel threads, with SCHED_FIFO class, are created to perform
|
||||
clamping actions of controlled duty ratio and duration. Each per-CPU
|
||||
thread synchronizes its idle time and duration, based on the rounding
|
||||
of jiffies, so accumulated errors can be prevented to avoid a jittery
|
||||
effect. Threads are also bound to the CPU such that they cannot be
|
||||
migrated, unless the CPU is taken offline. In this case, threads
|
||||
belonging to the offlined CPUs will be terminated immediately.
|
||||
|
||||
Running as SCHED_FIFO and relatively high priority, also allows such
|
||||
scheme to work for both preemptable and non-preemptable kernels.
|
||||
Alignment of idle time around jiffies ensures scalability for HZ
|
||||
values. This effect can be better visualized using a Perf timechart.
|
||||
The following diagram shows the behavior of kernel thread
|
||||
kidle_inject/cpu. During idle injection, it runs monitor/mwait idle
|
||||
for a given "duration", then relinquishes the CPU to other tasks,
|
||||
until the next time interval.
|
||||
|
||||
The NOHZ schedule tick is disabled during idle time, but interrupts
|
||||
are not masked. Tests show that the extra wakeups from scheduler tick
|
||||
have a dramatic impact on the effectiveness of the powerclamp driver
|
||||
on large scale systems (Westmere system with 80 processors).
|
||||
|
||||
::
|
||||
|
||||
CPU0
|
||||
____________ ____________
|
||||
kidle_inject/0 | sleep | mwait | sleep |
|
||||
_________| |________| |_______
|
||||
duration
|
||||
CPU1
|
||||
____________ ____________
|
||||
kidle_inject/1 | sleep | mwait | sleep |
|
||||
_________| |________| |_______
|
||||
^
|
||||
|
|
||||
|
|
||||
roundup(jiffies, interval)
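
The alignment shown in the diagram can be sketched as follows. The point is that each per-CPU thread can compute its next start time independently and still land on the same jiffy as the others. The function names and the sample jiffies values are hypothetical; `roundup` here is the usual integer round-up-to-multiple operation.

```python
def roundup(x, y):
    """Round x up to the next multiple of y (integers, y > 0)."""
    return ((x + y - 1) // y) * y


def next_injection_start(jiffies, interval):
    """Jiffy at which the next idle injection period should begin."""
    return roundup(jiffies, interval)


# Threads on different CPUs sample jiffies at slightly different times
# but agree on the same aligned start.
starts = [next_injection_start(j, 6) for j in (1003, 1005, 1008)]
```

Since the target depends only on the shared jiffies counter and the interval, no per-thread error accumulates from one period to the next.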
|
||||
|
||||
Only one CPU is allowed to collect statistics and update global
|
||||
control parameters. This CPU is referred to as the controlling CPU in
|
||||
this document. The controlling CPU is elected at runtime, with a
|
||||
policy that favors BSP, taking into account the possibility of a CPU
|
||||
hot-plug.
|
||||
|
||||
In terms of dynamics of the idle control system, package level idle
|
||||
time is considered largely as a non-causal system where its behavior
|
||||
cannot be based on the past or current input. Therefore, the
|
||||
intel_powerclamp driver attempts to enforce the desired idle time
|
||||
instantly as given input (target idle ratio). After injection,
|
||||
powerclamp monitors the actual idle for a given time window and adjusts
|
||||
the next injection accordingly to avoid over/under correction.
|
||||
|
||||
When used in a causal control system, such as a temperature control,
|
||||
it is up to the user of this driver to implement algorithms where
|
||||
past samples and outputs are included in the feedback. For example, a
|
||||
PID-based thermal controller can use the powerclamp driver to
|
||||
maintain a desired target temperature, based on integral and
|
||||
derivative gains of the past samples.
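
A user of the driver could structure such a controller roughly as below. This is a minimal sketch under stated assumptions: the class name, gains and the way the output is mapped to an idle percentage are all hypothetical, and it includes no anti-windup or filtering, which a real controller would likely need.

```python
class PidIdleController:
    """Map a temperature error to an idle-injection percentage."""

    def __init__(self, target_temp, kp, ki, kd, max_idle=50):
        self.target_temp = target_temp
        self.kp, self.ki, self.kd = kp, ki, kd
        self.max_idle = max_idle
        self.integral = 0.0
        self.prev_error = None

    def update(self, temp, dt):
        """Return the idle percentage to request for the next period."""
        error = temp - self.target_temp  # positive when too hot
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        output = (self.kp * error
                  + self.ki * self.integral
                  + self.kd * derivative)
        # Clamp to the range the cooling device accepts.
        return max(0, min(self.max_idle, round(output)))
```

The returned value is what such a controller would write to the cooling device; the feedback from past samples lives entirely in the controller, not in the powerclamp driver.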
|
||||
|
||||
|
||||
|
||||
Calibration
|
||||
-----------
|
||||
During scalability testing, it is observed that synchronized actions
|
||||
among CPUs become challenging as the number of cores grows. This is
|
||||
also true for the ability of a system to enter package level C-states.
|
||||
|
||||
To make sure the intel_powerclamp driver scales well, online
|
||||
calibration is implemented. The goals for doing such a calibration
|
||||
are:
|
||||
|
||||
a) determine the effective range of idle injection ratio
|
||||
b) determine the amount of compensation needed at each target ratio
|
||||
|
||||
Compensation to each target ratio consists of two parts:
|
||||
|
||||
a) steady state error compensation
|
||||
This is to offset the error occurring when the system can
|
||||
enter idle without extra wakeups (such as external interrupts).
|
||||
|
||||
b) dynamic error compensation
|
||||
When an excessive amount of wakeups occurs during idle, an
|
||||
additional idle ratio can be added to quiet interrupts, by
|
||||
slowing down CPU activities.
|
||||
|
||||
A debugfs file is provided for the user to examine compensation
|
||||
progress and results, such as on a Westmere system::
|
||||
|
||||
[jacob@nex01 ~]$ cat
|
||||
/sys/kernel/debug/intel_powerclamp/powerclamp_calib
|
||||
controlling cpu: 0
|
||||
pct confidence steady dynamic (compensation)
|
||||
0 0 0 0
|
||||
1 1 0 0
|
||||
2 1 1 0
|
||||
3 3 1 0
|
||||
4 3 1 0
|
||||
5 3 1 0
|
||||
6 3 1 0
|
||||
7 3 1 0
|
||||
8 3 1 0
|
||||
...
|
||||
30 3 2 0
|
||||
31 3 2 0
|
||||
32 3 1 0
|
||||
33 3 2 0
|
||||
34 3 1 0
|
||||
35 3 2 0
|
||||
36 3 1 0
|
||||
37 3 2 0
|
||||
38 3 1 0
|
||||
39 3 2 0
|
||||
40 3 3 0
|
||||
41 3 1 0
|
||||
42 3 2 0
|
||||
43 3 1 0
|
||||
44 3 1 0
|
||||
45 3 2 0
|
||||
46 3 3 0
|
||||
47 3 0 0
|
||||
48 3 2 0
|
||||
49 3 3 0
|
||||
|
||||
Calibration occurs during runtime. No offline method is available.
|
||||
Steady state compensation is used only when confidence levels of all
|
||||
adjacent ratios have reached satisfactory level. A confidence level
|
||||
is accumulated based on clean data collected at runtime. Data
|
||||
collected during a period without extra interrupts is considered
|
||||
clean.
|
||||
|
||||
To compensate for excessive amounts of wakeup during idle, additional
|
||||
idle time is injected when such a condition is detected. Currently,
|
||||
we have a simple algorithm to double the injection ratio. A possible
|
||||
enhancement might be to throttle the offending IRQ, such as delaying
|
||||
EOI for level triggered interrupts. But it is a challenge to be
|
||||
non-intrusive to the scheduler or the IRQ core code.
|
||||
|
||||
|
||||
CPU Online/Offline
|
||||
------------------
|
||||
Per-CPU kernel threads are started/stopped upon receiving
|
||||
notifications of CPU hotplug activities. The intel_powerclamp driver
|
||||
keeps track of clamping kernel threads, even after they are migrated
|
||||
to other CPUs, after a CPU offline event.
|
||||
|
||||
|
||||
Performance Analysis
|
||||
====================
|
||||
This section describes the general performance data collected on
|
||||
multiple systems, including Westmere (80P) and Ivy Bridge (4P, 8P).
|
||||
|
||||
Effectiveness and Limitations
|
||||
-----------------------------
|
||||
The maximum range that idle injection is allowed is capped at 50
|
||||
percent. As mentioned earlier, since interrupts are allowed during
|
||||
forced idle time, excessive interrupts could result in less
|
||||
effectiveness. The extreme case would be doing a ping -f to generate
|
||||
flooded network interrupts without much CPU acknowledgement. In this
|
||||
case, little can be done from the idle injection threads. In most
|
||||
normal cases, such as scp a large file, applications can be throttled
|
||||
by the powerclamp driver, since slowing down the CPU also slows down
|
||||
network protocol processing, which in turn reduces interrupts.
|
||||
|
||||
When control parameters change at runtime by the controlling CPU, it
|
||||
may take an additional period for the rest of the CPUs to catch up
|
||||
with the changes. During this time, idle injection is out of sync,
|
||||
thus not able to enter package C-states at the expected ratio. But
|
||||
this effect is minor, in that in most cases change to the target
|
||||
ratio is updated much less frequently than the idle injection
|
||||
frequency.
|
||||
|
||||
Scalability
|
||||
-----------
|
||||
Tests also show a minor, but measurable, difference between the 4P/8P
|
||||
Ivy Bridge system and the 80P Westmere server under 50% idle ratio.
|
||||
More compensation is needed on Westmere for the same amount of
|
||||
target idle ratio. The compensation also increases as the idle ratio
|
||||
gets larger. The above reason constitutes the need for the
|
||||
calibration code.
|
||||
|
||||
On the IVB 8P system, compared to an offline CPU, powerclamp can
|
||||
achieve up to 40% better performance per watt. (measured by a spin
|
||||
counter summed over per CPU counting threads spawned for all running
|
||||
CPUs).
|
||||
|
||||
Usage and Interfaces
|
||||
====================
|
||||
The powerclamp driver is registered to the generic thermal layer as a
|
||||
cooling device. Currently, it’s not bound to any thermal zones::
|
||||
|
||||
jacob@chromoly:/sys/class/thermal/cooling_device14$ grep . *
|
||||
cur_state:0
|
||||
max_state:50
|
||||
type:intel_powerclamp
|
||||
|
||||
cur_state allows user to set the desired idle percentage. Writing 0 to
|
||||
cur_state will stop idle injection. Writing a value between 1 and
|
||||
max_state will start the idle injection. Reading cur_state returns the
|
||||
actual and current idle percentage. This may not be the same value
|
||||
set by the user in that current idle percentage depends on workload
|
||||
and includes natural idle. When idle injection is disabled, reading
|
||||
cur_state returns value -1 instead of 0 which is to avoid confusing
|
||||
100% busy state with the disabled state.
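
Since the cooling device number differs between systems, a script may want to locate the device by its type before using cur_state. The sketch below assumes only the sysfs files shown above (type, max_state, cur_state); the function names are hypothetical and writing requires root.

```python
from pathlib import Path


def find_powerclamp(root="/sys/class/thermal"):
    """Return the cooling device directory of type intel_powerclamp."""
    for dev in sorted(Path(root).glob("cooling_device*")):
        type_file = dev / "type"
        if (type_file.is_file()
                and type_file.read_text().strip() == "intel_powerclamp"):
            return dev
    return None


def set_idle_percent(dev, percent):
    """Request an idle percentage; 0 stops idle injection."""
    max_state = int((dev / "max_state").read_text())
    if not 0 <= percent <= max_state:
        raise ValueError(f"percent must be between 0 and {max_state}")
    (dev / "cur_state").write_text(f"{percent}\n")


def get_idle_percent(dev):
    """Return the current idle percentage, or None when disabled (-1)."""
    value = int((dev / "cur_state").read_text())
    return None if value < 0 else value
```

Returning None for -1 keeps the disabled state distinct from a 100% busy system reporting 0, mirroring the distinction the driver makes.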
|
||||
|
||||
Example usage:
|
||||
- To inject 25% idle time::
|
||||
|
||||
$ sudo sh -c "echo 25 > /sys/class/thermal/cooling_device80/cur_state"
|
||||
|
||||
If the system is not busy and has more than 25% idle time already,
|
||||
then the powerclamp driver will not start idle injection. Using Top
|
||||
will not show idle injection kernel threads.
|
||||
|
||||
If the system is busy (spin test below) and has less than 25% natural
|
||||
idle time, powerclamp kernel threads will do idle injection. Forced
|
||||
idle time is accounted as normal idle in that common code path is
|
||||
taken as the idle task.
|
||||
|
||||
In this example, 24.1% idle is shown. This helps the system admin or
|
||||
user determine the cause of slowdown, when a powerclamp driver is in action::
|
||||
|
||||
|
||||
Tasks: 197 total, 1 running, 196 sleeping, 0 stopped, 0 zombie
|
||||
Cpu(s): 71.2%us, 4.7%sy, 0.0%ni, 24.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
|
||||
Mem: 3943228k total, 1689632k used, 2253596k free, 74960k buffers
|
||||
Swap: 4087804k total, 0k used, 4087804k free, 945336k cached
|
||||
|
||||
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
|
||||
3352 jacob 20 0 262m 644 428 S 286 0.0 0:17.16 spin
|
||||
3341 root -51 0 0 0 0 D 25 0.0 0:01.62 kidle_inject/0
|
||||
3344 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/3
|
||||
3342 root -51 0 0 0 0 D 25 0.0 0:01.61 kidle_inject/1
|
||||
3343 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/2
|
||||
2935 jacob 20 0 696m 125m 35m S 5 3.3 0:31.11 firefox
|
||||
1546 root 20 0 158m 20m 6640 S 3 0.5 0:26.97 Xorg
|
||||
2100 jacob 20 0 1223m 88m 30m S 3 2.3 0:23.68 compiz
|
||||
|
||||
Tests have shown that by using the powerclamp driver as a cooling
|
||||
device, a PID based userspace thermal controller can manage to
|
||||
control CPU temperature effectively, when no other thermal influence
|
||||
is added. For example, an UltraBook user can compile the kernel under
|
||||
certain temperature (below most active trip points).
|
96
Documentation/driver-api/thermal/nouveau_thermal.rst
Normal file
|
@ -0,0 +1,96 @@
|
|||
=====================
|
||||
Kernel driver nouveau
|
||||
=====================
|
||||
|
||||
Supported chips:
|
||||
|
||||
* NV43+
|
||||
|
||||
Authors: Martin Peres (mupuf) <martin.peres@free.fr>
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
This driver lets you read the GPU core temperature, drive the GPU fan and
|
||||
set temperature alarms.
|
||||
|
||||
Currently, due to the absence of in-kernel API to access HWMON drivers, Nouveau
|
||||
cannot access any of the i2c external monitoring chips it may find. If you
|
||||
have one of those, temperature and/or fan management through Nouveau's HWMON
|
||||
interface is likely not to work. This document may then not cover your situation
|
||||
entirely.
|
||||
|
||||
Temperature management
|
||||
----------------------
|
||||
|
||||
Temperature is exposed as a read-only HWMON attribute, temp1_input.
|
||||
|
||||
In order to protect the GPU from overheating, Nouveau supports 4 configurable
|
||||
temperature thresholds:
|
||||
|
||||
* Fan_boost:
|
||||
Fan speed is set to 100% when reaching this temperature;
|
||||
* Downclock:
|
||||
The GPU will be downclocked to reduce its power dissipation;
|
||||
* Critical:
|
||||
The GPU is put on hold to further lower power dissipation;
|
||||
* Shutdown:
|
||||
Shut the computer down to protect your GPU.
|
||||
|
||||
WARNING:
|
||||
Some of these thresholds may not be used by Nouveau depending
|
||||
on your chipset.
|
||||
|
||||
The default value for these thresholds comes from the GPU's vbios. These
|
||||
thresholds can be configured thanks to the following HWMON attributes:
|
||||
|
||||
* Fan_boost: temp1_auto_point1_temp and temp1_auto_point1_temp_hyst;
|
||||
* Downclock: temp1_max and temp1_max_hyst;
|
||||
* Critical: temp1_crit and temp1_crit_hyst;
|
||||
* Shutdown: temp1_emergency and temp1_emergency_hyst.
|
||||
|
||||
NOTE: Remember that the values are stored in millidegrees Celsius. Don't forget
to multiply by 1000!
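As a rough illustration of the unit conversion, the C sketch below turns whole degrees Celsius into the millidegree text one would write to an attribute such as temp1_max. Both helper names are hypothetical and are not part of Nouveau or HWMON.

```c
#include <stdio.h>

/* Hypothetical helper: whole degrees Celsius -> millidegrees Celsius. */
long celsius_to_millidegrees(long celsius)
{
    return celsius * 1000;
}

/*
 * Hypothetical helper: format the threshold as the decimal text that
 * would be written to a HWMON attribute. Returns snprintf()'s result.
 */
int format_threshold(char *buf, size_t len, long celsius)
{
    return snprintf(buf, len, "%ld\n", celsius_to_millidegrees(celsius));
}
```

For example, a 95 degree threshold is written as the text "95000".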
|
||||
|
||||
Fan management
|
||||
--------------
|
||||
|
||||
Not all cards have a drivable fan. If yours does, then the following HWMON
|
||||
attributes should be available:
|
||||
|
||||
* pwm1_enable:
|
||||
Current fan management mode (NONE, MANUAL or AUTO);
|
||||
* pwm1:
|
||||
Current PWM value (power percentage);
|
||||
* pwm1_min:
|
||||
The minimum PWM speed allowed;
|
||||
* pwm1_max:
|
||||
The maximum PWM speed allowed (bypassed when hitting Fan_boost);
|
||||
|
||||
You may also have the following attribute:
|
||||
|
||||
* fan1_input:
|
||||
Speed in RPM of your fan.
|
||||
|
||||
Your fan can be driven in different modes:
|
||||
|
||||
* 0: The fan is left untouched;
|
||||
* 1: The fan can be driven manually (use pwm1 to change the speed);
* 2: The fan is driven automatically depending on the temperature.
|
||||
|
||||
NOTE:
|
||||
Be sure to use the manual mode if you want to drive the fan speed manually.
|
||||
|
||||
NOTE2:
|
||||
When operating in manual mode outside the vbios-defined
|
||||
[PWM_min, PWM_max] range, the reported fan speed (RPM) may not be accurate
|
||||
depending on your hardware.
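The three pwm1_enable values listed above can be modelled in user space as follows. This is only a sketch; the enum and function names are made up for illustration and do not come from the driver.

```c
#include <stddef.h>

/* pwm1_enable values as listed in this document. */
enum fan_mode {
    FAN_MODE_NONE = 0,   /* the fan is left untouched */
    FAN_MODE_MANUAL = 1, /* speed is set through pwm1 */
    FAN_MODE_AUTO = 2    /* speed follows the temperature */
};

/* Hypothetical helper: name a pwm1_enable value, NULL if unknown. */
const char *fan_mode_name(int mode)
{
    switch (mode) {
    case FAN_MODE_NONE:
        return "NONE";
    case FAN_MODE_MANUAL:
        return "MANUAL";
    case FAN_MODE_AUTO:
        return "AUTO";
    default:
        return NULL;
    }
}

/* Driving the speed through pwm1 is only meaningful in manual mode. */
int pwm1_write_allowed(int mode)
{
    return mode == FAN_MODE_MANUAL;
}
```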
|
||||
|
||||
Bug reports
|
||||
-----------
|
||||
|
||||
Thermal management on Nouveau is new and may not work on all cards. If you have
|
||||
inquiries, please ping mupuf on IRC (#nouveau, freenode).
|
||||
|
||||
Bug reports should be filed on Freedesktop's bug tracker. Please follow
|
||||
http://nouveau.freedesktop.org/wiki/Bugs
|
271
Documentation/driver-api/thermal/power_allocator.rst
Normal file
|
@ -0,0 +1,271 @@
|
|||
=================================
|
||||
Power allocator governor tunables
|
||||
=================================
|
||||
|
||||
Trip points
|
||||
-----------
|
||||
|
||||
The governor works optimally with the following two passive trip points:
|
||||
|
||||
1. "switch on" trip point: temperature above which the governor
|
||||
control loop starts operating. This is the first passive trip
|
||||
point of the thermal zone.
|
||||
|
||||
2. "desired temperature" trip point: it should be higher than the
|
||||
"switch on" trip point. This is the target temperature the governor
|
||||
is controlling for. This is the last passive trip point of the
|
||||
thermal zone.
|
||||
|
||||
PID Controller
|
||||
--------------
|
||||
|
||||
The power allocator governor implements a
|
||||
Proportional-Integral-Derivative controller (PID controller) with
|
||||
temperature as the control input and power as the controlled output:
|
||||
|
||||
P_max = k_p * e + k_i * err_integral + k_d * diff_err + sustainable_power
|
||||
|
||||
where
|
||||
- e = desired_temperature - current_temperature
|
||||
- err_integral is the sum of previous errors
|
||||
- diff_err = e - previous_error
|
||||
|
||||
It is similar to the one depicted below::
|
||||
|
||||
                                          k_d
                                           |
      current_temp                         |
           |                               v
           |              +----------+   +---+
           |       +----->| diff_err |-->| X |------+
           |       |      +----------+   +---+      |
           |       |                                |      tdp        actor
           |       |                      k_i       |       |  get_requested_power()
           |       |                       |        |       |        |     |
           |       |                       |        |       |        |     | ...
           v       |                       v        v       v        v     v
         +---+     |      +-------+      +---+    +---+   +---+   +----------+
         | S |-----+----->| sum e |----->| X |--->| S |-->| S |-->|power     |
         +---+     |      +-------+      +---+    +---+   +---+   |allocation|
           ^       |                                ^             +----------+
           |       |                                |                |     |
           |       |        +---+                   |                |     |
           |       +------->| X |-------------------+                v     v
           |                +---+                                  granted performance
      desired_temperature     ^
                              |
                              |
                          k_po/k_pu
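The controller equation above can be sketched in plain C as shown here. This is a user-space illustration that uses `double` for readability; the struct and function names are hypothetical, and it makes no claim about how the in-kernel governor performs the arithmetic.

```c
/* Constants for one evaluation of the controller equation. */
struct pid_inputs {
    double k_p;               /* proportional constant (k_po or k_pu) */
    double k_i;               /* integral constant */
    double k_d;               /* derivative constant */
    double sustainable_power; /* feed-forward term, in mW */
};

/*
 * P_max = k_p * e + k_i * err_integral + k_d * diff_err + sustainable_power
 * where e = desired - current and diff_err = e - previous_error.
 */
double pid_p_max(const struct pid_inputs *in, double desired_temp,
                 double current_temp, double err_integral,
                 double previous_error)
{
    double e = desired_temp - current_temp;
    double diff_err = e - previous_error;

    return in->k_p * e + in->k_i * err_integral + in->k_d * diff_err +
           in->sustainable_power;
}
```

With `k_i` and `k_d` set to 0 and the current temperature equal to the desired one, the result reduces to `sustainable_power`.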
|
||||
|
||||
Sustainable power
|
||||
-----------------
|
||||
|
||||
An estimate of the sustainable dissipatable power (in mW) should be
|
||||
provided while registering the thermal zone. This estimates the
|
||||
sustained power that can be dissipated at the desired control
|
||||
temperature. This is the maximum sustained power for allocation at
|
||||
the desired maximum temperature. The actual sustained power can vary
|
||||
for a number of reasons. The closed loop controller will take care of
|
||||
variations such as environmental conditions, and some factors related
|
||||
to the speed-grade of the silicon. `sustainable_power` is therefore
|
||||
simply an estimate, and may be tuned to affect the aggressiveness of
|
||||
the thermal ramp. For reference, the sustainable power of a 4" phone
is typically 2000mW, while that of a 10" tablet is around 4500mW (may vary
depending on screen size).
|
||||
|
||||
If you are using device tree, do add it as a property of the
|
||||
thermal-zone. For example::
|
||||
|
||||
    thermal-zones {
        soc_thermal {
            polling-delay = <1000>;
            polling-delay-passive = <100>;
            sustainable-power = <2500>;
            ...
|
||||
|
||||
Instead, if the thermal zone is registered from the platform code, pass a
`thermal_zone_params` that has a `sustainable_power`. If no
`thermal_zone_params` was being passed before, then something like below
will suffice::
|
||||
|
||||
    static const struct thermal_zone_params tz_params = {
        .sustainable_power = 3500,
    };
|
||||
|
||||
and then pass `tz_params` as the 6th parameter to
`thermal_zone_device_register()`.
|
||||
|
||||
k_po and k_pu
|
||||
-------------
|
||||
|
||||
The implementation of the PID controller in the power allocator
|
||||
thermal governor allows the configuration of two proportional term
|
||||
constants: `k_po` and `k_pu`. `k_po` is the proportional term
|
||||
constant during temperature overshoot periods (current temperature is
|
||||
above "desired temperature" trip point). Conversely, `k_pu` is the
|
||||
proportional term constant during temperature undershoot periods
|
||||
(current temperature below "desired temperature" trip point).
|
||||
|
||||
These controls are intended as the primary mechanism for configuring
|
||||
the permitted thermal "ramp" of the system. For instance, a lower
|
||||
`k_pu` value will provide a slower ramp, at the cost of capping
|
||||
available capacity at a low temperature. On the other hand, a high
|
||||
value of `k_pu` will result in the governor granting very high power
|
||||
while temperature is low, and may lead to temperature overshooting.
|
||||
|
||||
The default value for `k_pu` is::
|
||||
|
||||
2 * sustainable_power / (desired_temperature - switch_on_temp)
|
||||
|
||||
This means that at `switch_on_temp` the output of the controller's
|
||||
proportional term will be 2 * `sustainable_power`. The default value
|
||||
for `k_po` is::
|
||||
|
||||
sustainable_power / (desired_temperature - switch_on_temp)
|
||||
|
||||
Focusing on the proportional and feed forward values of the PID
|
||||
controller equation we have::
|
||||
|
||||
P_max = k_p * e + sustainable_power
|
||||
|
||||
The proportional term is proportional to the difference between the
|
||||
desired temperature and the current one. When the current temperature
|
||||
is the desired one, then the proportional component is zero and
|
||||
`P_max` = `sustainable_power`. That is, the system should operate in
|
||||
thermal equilibrium under constant load. `sustainable_power` is only
|
||||
an estimate, which is the reason for closed-loop control such as this.
|
||||
|
||||
Expanding `k_pu` we get::
|
||||
|
||||
P_max = 2 * sustainable_power * (T_set - T) / (T_set - T_on) +
|
||||
sustainable_power
|
||||
|
||||
where:
|
||||
|
||||
- T_set is the desired temperature
|
||||
- T is the current temperature
|
||||
- T_on is the switch on temperature
|
||||
|
||||
When the current temperature is the switch_on temperature, the above
|
||||
formula becomes::
|
||||
|
||||
P_max = 2 * sustainable_power * (T_set - T_on) / (T_set - T_on) +
|
||||
sustainable_power = 2 * sustainable_power + sustainable_power =
|
||||
3 * sustainable_power
|
||||
|
||||
Therefore, the proportional term alone linearly decreases power from
|
||||
3 * `sustainable_power` to `sustainable_power` as the temperature
|
||||
rises from the switch on temperature to the desired temperature.
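The default constants and the resulting linear ramp can be checked numerically with a small sketch. It assumes `double` arithmetic and temperatures in any consistent unit; the function names are illustrative only, and `t_set` must differ from `t_on`.

```c
/* Default k_pu quoted above: 2 * sustainable_power / (T_set - T_on). */
double default_k_pu(double sustainable_power, double t_set, double t_on)
{
    return 2.0 * sustainable_power / (t_set - t_on);
}

/* Default k_po quoted above: sustainable_power / (T_set - T_on). */
double default_k_po(double sustainable_power, double t_set, double t_on)
{
    return sustainable_power / (t_set - t_on);
}

/*
 * Proportional and feed-forward terms only:
 * P_max = k_p * e + sustainable_power, with e = T_set - T.
 * k_pu applies while undershooting (e >= 0), k_po while overshooting.
 */
double p_max_proportional(double sustainable_power, double t_set,
                          double t_on, double t)
{
    double e = t_set - t;
    double k_p = (e >= 0.0)
                     ? default_k_pu(sustainable_power, t_set, t_on)
                     : default_k_po(sustainable_power, t_set, t_on);

    return k_p * e + sustainable_power;
}
```

With `sustainable_power` = 2500, `T_on` = 70 and `T_set` = 90, the sketch yields 7500 at the switch on temperature and 2500 at the desired temperature, matching the 3 * `sustainable_power` to `sustainable_power` ramp described above.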
|
||||
|
||||
k_i and integral_cutoff
|
||||
-----------------------
|
||||
|
||||
`k_i` configures the PID loop's integral term constant. This term
|
||||
allows the PID controller to compensate for long term drift and for
|
||||
the quantized nature of the output control: cooling devices can't set
|
||||
the exact power that the governor requests. When the temperature
|
||||
error is below `integral_cutoff`, errors are accumulated in the
|
||||
integral term. This term is then multiplied by `k_i` and the result
|
||||
added to the output of the controller. Typically `k_i` is set low (1
|
||||
or 2) and `integral_cutoff` is 0.
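The accumulation rule described above can be sketched as follows. This is a simplified user-space model of the prose, not the governor's code; the names are hypothetical and any extra clamping the real implementation may apply is left out.

```c
/* Running state of the integral term. */
struct integral_state {
    double err_integral; /* sum of the errors accepted so far */
};

/*
 * Accumulate @err only while it is below @integral_cutoff, then return
 * the integral term's contribution to the output, k_i * err_integral.
 */
double integral_term(struct integral_state *st, double err, double k_i,
                     double integral_cutoff)
{
    if (err < integral_cutoff)
        st->err_integral += err;

    return k_i * st->err_integral;
}
```

With `integral_cutoff` at 0, only negative errors (temperature above the desired one) are accumulated in this model.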
|
||||
|
||||
k_d
|
||||
---
|
||||
|
||||
`k_d` configures the PID loop's derivative term constant. It's
|
||||
recommended to leave it as the default: 0.
|
||||
|
||||
Cooling device power API
|
||||
========================
|
||||
|
||||
Cooling devices controlled by this governor must supply the additional
|
||||
"power" API in their `cooling_device_ops`. It consists of three ops:
|
||||
|
||||
1. ::
|
||||
|
||||
int get_requested_power(struct thermal_cooling_device *cdev,
|
||||
struct thermal_zone_device *tz, u32 *power);
|
||||
|
||||
|
||||
@cdev:
|
||||
The `struct thermal_cooling_device` pointer
|
||||
@tz:
|
||||
thermal zone in which we are currently operating
|
||||
@power:
|
||||
pointer in which to store the calculated power
|
||||
|
||||
`get_requested_power()` calculates the power requested by the device
|
||||
in milliwatts and stores it in @power. It should return 0 on
|
||||
success, -E* on failure. This is currently used by the power
|
||||
allocator governor to calculate how much power to give to each cooling
|
||||
device.
|
||||
|
||||
2. ::
|
||||
|
||||
int state2power(struct thermal_cooling_device *cdev, struct
|
||||
thermal_zone_device *tz, unsigned long state,
|
||||
u32 *power);
|
||||
|
||||
@cdev:
|
||||
The `struct thermal_cooling_device` pointer
|
||||
@tz:
|
||||
thermal zone in which we are currently operating
|
||||
@state:
|
||||
A cooling device state
|
||||
@power:
|
||||
pointer in which to store the equivalent power
|
||||
|
||||
Convert cooling device state @state into power consumption in
|
||||
milliwatts and store it in @power. It should return 0 on success, -E*
|
||||
on failure. This is currently used by thermal core to calculate the
|
||||
maximum power that an actor can consume.
|
||||
|
||||
3. ::
|
||||
|
||||
int power2state(struct thermal_cooling_device *cdev, u32 power,
|
||||
unsigned long *state);
|
||||
|
||||
@cdev:
|
||||
The `struct thermal_cooling_device` pointer
|
||||
@power:
|
||||
power in milliwatts
|
||||
@state:
|
||||
pointer in which to store the resulting state
|
||||
|
||||
Calculate a cooling device state that would make the device consume at
|
||||
most @power mW and store it in @state. It should return 0 on success,
|
||||
-E* on failure. This is currently used by the thermal core to convert
|
||||
a given power set by the power allocator governor to a state that the
|
||||
cooling device can set. It is a function because this conversion may
depend on external factors that may change, so this function should return the
best conversion given "current circumstances".
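To make the state/power conversions concrete, here is a user-space model of a cooling device backed by a fixed table. The table values and the `model_` function names are invented for illustration; a real driver implements the ops with the kernel signatures shown above.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical device: state 0 is unthrottled and higher states consume
 * less power. All values (in mW) are made up.
 */
static const unsigned int state_power_mw[] = { 4000, 3000, 2000, 1000 };
#define NUM_STATES (sizeof(state_power_mw) / sizeof(state_power_mw[0]))

/* Model of state2power(): look up the power drawn in @state. */
int model_state2power(unsigned long state, unsigned int *power)
{
    if (state >= NUM_STATES)
        return -EINVAL;

    *power = state_power_mw[state];
    return 0;
}

/*
 * Model of power2state(): pick the least-throttled state that consumes
 * at most @power mW, falling back to the deepest state if none fits.
 */
int model_power2state(unsigned int power, unsigned long *state)
{
    unsigned long s;

    for (s = 0; s < NUM_STATES; s++) {
        if (state_power_mw[s] <= power) {
            *state = s;
            return 0;
        }
    }

    *state = NUM_STATES - 1;
    return 0;
}
```

A budget of 2500 mW maps to state 2 (2000 mW) in this table, since state 1 would exceed the budget.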
|
||||
|
||||
Cooling device weights
|
||||
----------------------
|
||||
|
||||
Weights are a mechanism to bias the allocation among cooling
|
||||
devices. They express the relative power efficiency of different
|
||||
cooling devices. Higher weight can be used to express higher power
|
||||
efficiency. Weighting is relative such that if each cooling device
|
||||
has a weight of one they are considered equal. This is particularly
|
||||
useful in heterogeneous systems where two cooling devices may perform
|
||||
the same kind of compute, but with different efficiency. For example,
|
||||
a system with two different types of processors.
|
||||
|
||||
If the thermal zone is registered using
|
||||
`thermal_zone_device_register()` (i.e., platform code), then weights
|
||||
are passed as part of the thermal zone's `thermal_bind_parameters`.
|
||||
If the platform is registered using device tree, then they are passed
|
||||
as the `contribution` property of each map in the `cooling-maps` node.
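One simple way to picture relative weighting is a proportional split of a power budget. The sketch below is only an illustration of that idea and is not the governor's actual allocation algorithm; the function name is hypothetical, and treating all-zero weights as equal is an assumption of this sketch.

```c
#include <stddef.h>

/*
 * Split @budget_mw among @n cooling devices in proportion to their
 * weights. If every weight is 0, all devices are treated as equal.
 */
void split_by_weight(const unsigned int *weights, size_t n,
                     unsigned int budget_mw, unsigned int *granted_mw)
{
    unsigned long long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += weights[i];

    for (i = 0; i < n; i++) {
        if (total == 0)
            granted_mw[i] = budget_mw / (unsigned int)n;
        else
            granted_mw[i] = (unsigned int)
                ((unsigned long long)budget_mw * weights[i] / total);
    }
}
```

With weights of 3 and 1 and a 4000 mW budget, the first device is granted 3000 mW and the second 1000 mW.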
|
||||
|
||||
Limitations of the power allocator governor
|
||||
===========================================
|
||||
|
||||
The power allocator governor's PID controller works best if there is a
|
||||
periodic tick. If you have a driver that calls
|
||||
`thermal_zone_device_update()` (or anything that ends up calling the
|
||||
governor's `throttle()` function) repetitively, the governor response
|
||||
won't be very good. Note that this is not particular to this
|
||||
governor, step-wise will also misbehave if you call its throttle()
|
||||
faster than the normal thermal framework tick (due to interrupts for
|
||||
example) as it will overreact.
|
798
Documentation/driver-api/thermal/sysfs-api.rst
Normal file
|
@ -0,0 +1,798 @@
|
|||
===================================
|
||||
Generic Thermal Sysfs driver How To
|
||||
===================================
|
||||
|
||||
Written by Sujith Thomas <sujith.thomas@intel.com>, Zhang Rui <rui.zhang@intel.com>
|
||||
|
||||
Updated: 2 January 2008
|
||||
|
||||
Copyright (c) 2008 Intel Corporation
|
||||
|
||||
|
||||
0. Introduction
|
||||
===============
|
||||
|
||||
The generic thermal sysfs provides a set of interfaces for thermal zone
|
||||
devices (sensors) and thermal cooling devices (fan, processor...) to register
|
||||
with the thermal management solution and to be a part of it.
|
||||
|
||||
This how-to focuses on enabling new thermal zone and cooling devices to
|
||||
participate in thermal management.
|
||||
This solution is platform independent and any type of thermal zone device
and cooling device should be able to make use of the infrastructure.
|
||||
|
||||
The main task of the thermal sysfs driver is to expose thermal zone attributes
|
||||
as well as cooling device attributes to the user space.
|
||||
An intelligent thermal management application can make decisions based on
|
||||
inputs from thermal zone attributes (the current temperature and trip point
|
||||
temperature) and throttle appropriate devices.
|
||||
|
||||
- `[0-*]` denotes any non-negative integer (starting from 0)
- `[1-*]` denotes any positive integer (starting from 1)
|
||||
|
||||
1. thermal sysfs driver interface functions
|
||||
===========================================
|
||||
|
||||
1.1 thermal zone device interface
|
||||
---------------------------------
|
||||
|
||||
::
|
||||
|
||||
    struct thermal_zone_device
    *thermal_zone_device_register(char *type,
                                  int trips, int mask, void *devdata,
                                  struct thermal_zone_device_ops *ops,
                                  const struct thermal_zone_params *tzp,
                                  int passive_delay, int polling_delay)
|
||||
|
||||
This interface function adds a new thermal zone device (sensor) to
|
||||
/sys/class/thermal folder as `thermal_zone[0-*]`. It tries to bind all the
|
||||
thermal cooling devices registered at the same time.
|
||||
|
||||
type:
|
||||
the thermal zone type.
|
||||
trips:
|
||||
the total number of trip points this thermal zone supports.
|
||||
mask:
|
||||
Bit string: If 'n'th bit is set, then trip point 'n' is writeable.
|
||||
devdata:
|
||||
device private data
|
||||
ops:
|
||||
thermal zone device call-backs.
|
||||
|
||||
.bind:
|
||||
bind the thermal zone device with a thermal cooling device.
|
||||
.unbind:
|
||||
unbind the thermal zone device from a thermal cooling device.
|
||||
.get_temp:
|
||||
get the current temperature of the thermal zone.
|
||||
.set_trips:
|
||||
set the trip points window. Whenever the current temperature
|
||||
is updated, the trip points immediately below and above the
|
||||
current temperature are found.
|
||||
.get_mode:
|
||||
get the current mode (enabled/disabled) of the thermal zone.
|
||||
|
||||
- "enabled" means the kernel thermal management is
|
||||
enabled.
|
||||
- "disabled" will prevent kernel thermal driver action
|
||||
upon trip points so that user applications can take
|
||||
charge of thermal management.
|
||||
.set_mode:
|
||||
set the mode (enabled/disabled) of the thermal zone.
|
||||
.get_trip_type:
|
||||
get the type of a certain trip point.
|
||||
.get_trip_temp:
|
||||
get the temperature above which a certain trip point
|
||||
will be fired.
|
||||
.set_emul_temp:
|
||||
set the emulation temperature which helps in debugging
|
||||
different threshold temperature points.
|
||||
tzp:
|
||||
thermal zone platform parameters.
|
||||
passive_delay:
|
||||
number of milliseconds to wait between polls when
|
||||
performing passive cooling.
|
||||
polling_delay:
|
||||
number of milliseconds to wait between polls when checking
|
||||
whether trip points have been crossed (0 for interrupt driven systems).
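The `mask` argument is a plain bit string, so building and testing it is a matter of shifts. The helpers below are a hypothetical sketch, assuming trip indices between 0 and 30 so that they fit in an `int` mask.

```c
/* Returns nonzero if trip point @n is writeable according to @mask. */
int trip_is_writeable(int mask, int n)
{
    return (int)(((unsigned int)mask >> n) & 1u);
}

/* Build a mask from a list of writeable trip point indices. */
int make_trip_mask(const int *trips, int count)
{
    unsigned int mask = 0;

    for (int i = 0; i < count; i++)
        mask |= 1u << trips[i];

    return (int)mask;
}
```

Marking trip points 0 and 2 as writeable gives a mask of 0x5.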
|
||||
|
||||
::
|
||||
|
||||
void thermal_zone_device_unregister(struct thermal_zone_device *tz)
|
||||
|
||||
This interface function removes the thermal zone device.
|
||||
It deletes the corresponding entry from /sys/class/thermal folder and
|
||||
unbinds all the thermal cooling devices it uses.
|
||||
|
||||
::
|
||||
|
||||
struct thermal_zone_device
|
||||
*thermal_zone_of_sensor_register(struct device *dev, int sensor_id,
|
||||
void *data,
|
||||
const struct thermal_zone_of_device_ops *ops)
|
||||
|
||||
This interface adds a new sensor to a DT thermal zone.
|
||||
This function will search the list of thermal zones described in the
device tree and look for the zone that refers to the sensor device
pointed to by dev->of_node as its temperature provider. For the zone
pointing to the sensor node, the sensor will be added to the DT
thermal zone device.
|
||||
|
||||
The parameters for this interface are:
|
||||
|
||||
dev:
|
||||
Device node of sensor containing valid node pointer in
|
||||
dev->of_node.
|
||||
sensor_id:
|
||||
a sensor identifier, in case the sensor IP has more
than one sensor
|
||||
data:
|
||||
a private pointer (owned by the caller) that will be
|
||||
passed back, when a temperature reading is needed.
|
||||
ops:
|
||||
`struct thermal_zone_of_device_ops *`.
|
||||
|
||||
============== =======================================
get_temp       a pointer to a function that reads the
               sensor temperature. This is a mandatory
               callback provided by the sensor driver.
set_trips      a pointer to a function that sets a
               temperature window. When this window is
               left the driver must inform the thermal
               core via thermal_zone_device_update.
get_trend      a pointer to a function that reads the
               sensor temperature trend.
set_emul_temp  a pointer to a function that sets
               sensor emulated temperature.
============== =======================================
|
||||
|
||||
The thermal zone temperature is provided by the get_temp() function
|
||||
pointer of thermal_zone_of_device_ops. When called, it is passed
the private pointer @data.
|
||||
|
||||
It returns an error pointer on failure, otherwise a valid thermal zone device
handle. The caller should check the returned handle with IS_ERR() to determine
whether registration succeeded.
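The check-before-use pattern can be illustrated in user space with stand-ins for the kernel's error-pointer helpers. Everything below is an assumption made for the sketch: the `sketch_` helpers only imitate the idea of encoding a negative errno in a pointer, and `fake_register()` is a made-up replacement for the real registration call.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed size of the errno range reserved at the top of the address space. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *sketch_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the negative errno value from an error pointer. */
long sketch_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Nonzero if @ptr encodes an error rather than a valid address. */
int sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-SKETCH_MAX_ERRNO);
}

/* Made-up stand-in for a registration call that may fail. */
struct fake_zone { int id; };
static struct fake_zone the_zone = { 0 };

void *fake_register(int should_fail)
{
    return should_fail ? sketch_err_ptr(-ENODEV) : (void *)&the_zone;
}

/* Caller pattern: check the handle before using it. */
long try_register(int should_fail)
{
    void *tzd = fake_register(should_fail);

    if (sketch_is_err(tzd))
        return sketch_ptr_err(tzd);

    return 0;
}
```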
|
||||
|
||||
::
|
||||
|
||||
void thermal_zone_of_sensor_unregister(struct device *dev,
|
||||
struct thermal_zone_device *tzd)
|
||||
|
||||
This interface unregisters a sensor from a DT thermal zone which was
|
||||
successfully added by interface thermal_zone_of_sensor_register().
|
||||
This function removes the sensor callbacks and private data from the
|
||||
thermal zone device registered with thermal_zone_of_sensor_register()
|
||||
interface. It will also silence the zone by removing the .get_temp() and
.get_trend() thermal zone device callbacks.
|
||||
|
||||
::
|
||||
|
||||
struct thermal_zone_device
|
||||
*devm_thermal_zone_of_sensor_register(struct device *dev,
|
||||
int sensor_id,
|
||||
void *data,
|
||||
const struct thermal_zone_of_device_ops *ops)
|
||||
|
||||
This interface is the resource-managed version of
thermal_zone_of_sensor_register().
|
||||
|
||||
All details of thermal_zone_of_sensor_register() described above
apply here.
|
||||
|
||||
The benefit of using this interface to register a sensor is that it
is not required to explicitly call thermal_zone_of_sensor_unregister()
in the error path or during driver unbinding, as this is done by the driver
resource manager.
|
||||
|
||||
::
|
||||
|
||||
void devm_thermal_zone_of_sensor_unregister(struct device *dev,
|
||||
struct thermal_zone_device *tzd)
|
||||
|
||||
This interface is the resource-managed version of
thermal_zone_of_sensor_unregister().
All details of thermal_zone_of_sensor_unregister() described above
apply here.
|
||||
Normally this function will not need to be called and the resource
|
||||
management code will ensure that the resource is freed.
|
||||
|
||||
::
|
||||
|
||||
int thermal_zone_get_slope(struct thermal_zone_device *tz)
|
||||
|
||||
This interface is used to read the slope attribute value
|
||||
for the thermal zone device, which might be useful for platform
|
||||
drivers for temperature calculations.
|
||||
|
||||
::
|
||||
|
||||
int thermal_zone_get_offset(struct thermal_zone_device *tz)
|
||||
|
||||
This interface is used to read the offset attribute value
|
||||
for the thermal zone device, which might be useful for platform
|
||||
drivers for temperature calculations.
|
||||
|
||||
1.2 thermal cooling device interface
|
||||
------------------------------------
|
||||
|
||||
|
||||
::
|
||||
|
||||
    struct thermal_cooling_device
    *thermal_cooling_device_register(char *name,
                    void *devdata, struct thermal_cooling_device_ops *ops)
|
||||
|
||||
This interface function adds a new thermal cooling device (fan/processor/...)
|
||||
to /sys/class/thermal/ folder as `cooling_device[0-*]`. It tries to bind itself
|
||||
to all the thermal zone devices registered at the same time.
|
||||
|
||||
name:
|
||||
the cooling device name.
|
||||
devdata:
|
||||
device private data.
|
||||
ops:
|
||||
thermal cooling devices call-backs.
|
||||
|
||||
.get_max_state:
|
||||
get the Maximum throttle state of the cooling device.
|
||||
.get_cur_state:
|
||||
get the Currently requested throttle state of the
|
||||
cooling device.
|
||||
.set_cur_state:
|
||||
set the Current throttle state of the cooling device.
|
||||
|
||||
::
|
||||
|
||||
void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
|
||||
|
||||
This interface function removes the thermal cooling device.
|
||||
It deletes the corresponding entry from /sys/class/thermal folder and
|
||||
unbinds itself from all the thermal zone devices using it.
|
||||
|
||||
1.3 interface for binding a thermal zone device with a thermal cooling device
|
||||
-----------------------------------------------------------------------------
|
||||
|
||||
::
|
||||
|
||||
int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
|
||||
int trip, struct thermal_cooling_device *cdev,
|
||||
unsigned long upper, unsigned long lower, unsigned int weight);
|
||||
|
||||
This interface function binds a thermal cooling device to a particular trip
|
||||
point of a thermal zone device.
|
||||
|
||||
This function is usually called in the thermal zone device .bind callback.
|
||||
|
||||
tz:
|
||||
the thermal zone device
|
||||
cdev:
|
||||
thermal cooling device
|
||||
trip:
|
||||
indicates which trip point in this thermal zone the cooling device
|
||||
is associated with.
|
||||
upper:
|
||||
the Maximum cooling state for this trip point.
|
||||
THERMAL_NO_LIMIT means no upper limit,
|
||||
and the cooling device can be in max_state.
|
||||
lower:
|
||||
the Minimum cooling state that can be used for this trip point.
|
||||
THERMAL_NO_LIMIT means no lower limit,
|
||||
and the cooling device can be in cooling state 0.
|
||||
weight:
|
||||
the influence of this cooling device in this thermal
|
||||
zone. See the description of `struct thermal_bind_params` in section 1.4 below for more information.
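The meaning of THERMAL_NO_LIMIT for `upper` and `lower` can be sketched as a small resolution step. `SKETCH_NO_LIMIT` is a stand-in whose value is an assumption, the function name is hypothetical, and the range check at the end is an assumed sanity check rather than documented behaviour.

```c
#include <errno.h>
#include <limits.h>

/* Stand-in for THERMAL_NO_LIMIT; the value used here is an assumption. */
#define SKETCH_NO_LIMIT ULONG_MAX

/*
 * Replace "no limit" markers with concrete cooling states:
 * lower defaults to state 0, upper defaults to max_state.
 */
int resolve_limits(unsigned long max_state,
                   unsigned long *lower, unsigned long *upper)
{
    if (*lower == SKETCH_NO_LIMIT)
        *lower = 0;
    if (*upper == SKETCH_NO_LIMIT)
        *upper = max_state;

    /* Assumed check: the range must be ordered and within max_state. */
    if (*lower > *upper || *upper > max_state)
        return -EINVAL;

    return 0;
}
```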
|
||||
|
||||
::
|
||||
|
||||
int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
|
||||
int trip, struct thermal_cooling_device *cdev);
|
||||
|
||||
This interface function unbinds a thermal cooling device from a particular
|
||||
trip point of a thermal zone device. This function is usually called in
|
||||
the thermal zone device .unbind callback.
|
||||
|
||||
tz:
|
||||
the thermal zone device
|
||||
cdev:
|
||||
thermal cooling device
|
||||
trip:
|
||||
indicates which trip point in this thermal zone the cooling device
|
||||
is associated with.
|
||||
|
||||
1.4 Thermal Zone Parameters
|
||||
---------------------------
|
||||
|
||||
::
|
||||
|
||||
struct thermal_bind_params
|
||||
|
||||
This structure defines the following parameters that are used to bind
|
||||
a zone with a cooling device for a particular trip point.
|
||||
|
||||
.cdev:
|
||||
The cooling device pointer
|
||||
.weight:
|
||||
The 'influence' of a particular cooling device on this
|
||||
zone. This is relative to the rest of the cooling
|
||||
devices. For example, if all cooling devices have a
|
||||
weight of 1, then they all contribute the same. You can
|
||||
use percentages if you want, but it's not mandatory. A
|
||||
weight of 0 means that this cooling device doesn't
|
||||
contribute to the cooling of this zone unless all cooling
|
||||
devices have a weight of 0. If all weights are 0, then
|
||||
they all contribute the same.
|
||||
.trip_mask:
|
||||
This is a bit mask that gives the binding relation between
|
||||
this thermal zone and cdev, for a particular trip point.
|
||||
If nth bit is set, then the cdev and thermal zone are bound
|
||||
for trip point n.
|
||||
.binding_limits:
|
||||
This is an array of cooling state limits. Must have
|
||||
exactly 2 * thermal_zone.number_of_trip_points. It is an
|
||||
array consisting of tuples <lower-state upper-state> of
|
||||
state limits. Each trip will be associated with one state
|
||||
limit tuple when binding. A NULL pointer means
|
||||
<THERMAL_NO_LIMIT THERMAL_NO_LIMIT> on all trips.
|
||||
These limits are used when binding a cdev to a trip point.
|
||||
.match:
|
||||
This callback returns success (0) if the 'tz and cdev' need to
|
||||
be bound, as per platform data.
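The layout of `.trip_mask` and `.binding_limits` described above can be illustrated with two lookups. This is a user-space sketch with hypothetical function names; `SKETCH_NO_LIMIT` stands in for THERMAL_NO_LIMIT and its value is an assumption.

```c
#include <limits.h>
#include <stddef.h>

/* Stand-in for THERMAL_NO_LIMIT; the value used here is an assumption. */
#define SKETCH_NO_LIMIT ULONG_MAX

/* .trip_mask: bit n set means the cdev is bound for trip point n. */
int bound_for_trip(int trip_mask, int n)
{
    return (int)(((unsigned int)trip_mask >> n) & 1u);
}

/*
 * .binding_limits: one <lower upper> pair per trip point, stored
 * back to back. A NULL array means no limits on any trip.
 */
void limits_for_trip(const unsigned long *binding_limits, int trip,
                     unsigned long *lower, unsigned long *upper)
{
    if (binding_limits == NULL) {
        *lower = SKETCH_NO_LIMIT;
        *upper = SKETCH_NO_LIMIT;
        return;
    }

    *lower = binding_limits[2 * trip];
    *upper = binding_limits[2 * trip + 1];
}
```

For a zone with two trip points, an array of `{0, 3, 1, 5}` gives trip 0 the limits <0 3> and trip 1 the limits <1 5>.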
|
||||
|
||||
::
|
||||
|
||||
struct thermal_zone_params
|
||||
|
||||
This structure defines the platform level parameters for a thermal zone.
|
||||
This data, for each thermal zone should come from the platform layer.
|
||||
This is an optional feature where some platforms can choose not to
|
||||
provide this data.
|
||||
|
||||
.governor_name:
|
||||
Name of the thermal governor used for this zone
|
||||
.no_hwmon:
|
||||
a boolean to indicate if the thermal to hwmon sysfs interface
is required. When no_hwmon == false, a hwmon sysfs interface
will be created. When no_hwmon == true, nothing will be done.
In case the thermal_zone_params is NULL, the hwmon interface
will be created (for backward compatibility).
|
||||
.num_tbps:
|
||||
Number of thermal_bind_params entries for this zone
|
||||
.tbp:
|
||||
thermal_bind_params entries
|
||||
|
||||
2. sysfs attributes structure
|
||||
=============================
|
||||
|
||||
== ================
|
||||
RO read only value
|
||||
WO write only value
|
||||
RW read/write value
|
||||
== ================
|
||||
|
||||
Thermal sysfs attributes will be represented under /sys/class/thermal.
|
||||
Hwmon sysfs I/F extension is also available under /sys/class/hwmon
|
||||
if hwmon is compiled in or built as a module.
|
||||
|
||||
Thermal zone device sys I/F, created once it's registered::
|
||||
|
||||
/sys/class/thermal/thermal_zone[0-*]:
|
||||
|---type: Type of the thermal zone
|
||||
|---temp: Current temperature
|
||||
|---mode: Working mode of the thermal zone
|
||||
|---policy: Thermal governor used for this zone
|
||||
|---available_policies: Available thermal governors for this zone
|
||||
|---trip_point_[0-*]_temp: Trip point temperature
|
||||
|---trip_point_[0-*]_type: Trip point type
|
||||
|---trip_point_[0-*]_hyst: Hysteresis value for this trip point
|
||||
|---emul_temp: Emulated temperature set node
|
||||
|---sustainable_power: Sustainable dissipatable power
|
||||
|---k_po: Proportional term during temperature overshoot
|
||||
|---k_pu: Proportional term during temperature undershoot
|
||||
|---k_i: PID's integral term in the power allocator gov
|
||||
|---k_d: PID's derivative term in the power allocator
|
||||
|---integral_cutoff: Offset above which errors are accumulated
|
||||
|---slope: Slope constant applied as linear extrapolation
|
||||
|---offset: Offset constant applied as linear extrapolation
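A user-space reader of these attributes mostly needs to build the attribute path and parse a decimal value. The helpers below are a minimal sketch with hypothetical names; they only format and parse strings and leave the actual file I/O to the caller.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Build "/sys/class/thermal/thermal_zone<N>/<attr>" into @buf. */
int zone_attr_path(char *buf, size_t len, unsigned int zone,
                   const char *attr)
{
    int n = snprintf(buf, len, "/sys/class/thermal/thermal_zone%u/%s",
                     zone, attr);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Parse the text read from an attribute holding a decimal integer. */
int parse_attr_value(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;

    *out = value;
    return 0;
}
```

The text "45000" read from a `temp` attribute parses to 45000 in the attribute's own unit.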
|
||||
|
||||
Thermal cooling device sys I/F, created once it's registered::
|
||||
|
||||
/sys/class/thermal/cooling_device[0-*]:
|
||||
|---type: Type of the cooling device(processor/fan/...)
|
||||
|---max_state: Maximum cooling state of the cooling device
|
||||
|---cur_state: Current cooling state of the cooling device
|
||||
|---stats: Directory containing cooling device's statistics
|
||||
|---stats/reset: Writing any value resets the statistics
|
||||
|---stats/time_in_state_ms: Time (msec) spent in various cooling states
|
||||
|---stats/total_trans: Total number of times cooling state is changed
|
||||
|---stats/trans_table: Cooling state transition table
|
||||
|
||||
|
||||
The next two dynamic attributes are created/removed in pairs. They represent
|
||||
the relationship between a thermal zone and its associated cooling device.
|
||||
They are created/removed for each successful execution of
|
||||
thermal_zone_bind_cooling_device/thermal_zone_unbind_cooling_device.
|
||||
|
||||
::
|
||||
|
||||
/sys/class/thermal/thermal_zone[0-*]:
|
||||
|---cdev[0-*]: [0-*]th cooling device in current thermal zone
|
||||
|---cdev[0-*]_trip_point: Trip point that cdev[0-*] is associated with
|
||||
|---cdev[0-*]_weight: Influence of the cooling device in
|
||||
this thermal zone
|
||||
|
||||
Besides the thermal zone device sysfs I/F and cooling device sysfs I/F,
|
||||
the generic thermal driver also creates a hwmon sysfs I/F for each _type_
|
||||
of thermal zone device. E.g. the generic thermal driver registers one hwmon
|
||||
class device and builds the associated hwmon sysfs I/F for all the registered
|
||||
ACPI thermal zones.
|
||||
|
||||
::
|
||||
|
||||
/sys/class/hwmon/hwmon[0-*]:
|
||||
|---name: The type of the thermal zone devices
|
||||
|---temp[1-*]_input: The current temperature of thermal zone [1-*]
|
||||
|---temp[1-*]_critical: The critical trip point of thermal zone [1-*]
|
||||
|
||||
Please read Documentation/hwmon/sysfs-interface.rst for additional information.
|
||||
|
||||
Thermal zone attributes
|
||||
-----------------------
|
||||
|
||||
type
|
||||
Strings which represent the thermal zone type.
|
||||
This is given by thermal zone driver as part of registration.
|
||||
E.g: "acpitz" indicates it's an ACPI thermal device.
|
||||
In order to keep it consistent with the hwmon sys attribute, this should
be a short, lowercase string, containing neither spaces nor dashes.
|
||||
RO, Required
|
||||
|
||||
temp
|
||||
Current temperature as reported by thermal zone (sensor).
|
||||
Unit: millidegree Celsius
|
||||
RO, Required
|
||||
|
||||
mode
|
||||
One of the predefined values in [enabled, disabled].
|
||||
This file gives information about the algorithm that is currently
|
||||
managing the thermal zone. It can be either default kernel based
|
||||
algorithm or user space application.
|
||||
|
||||
enabled
|
||||
enable Kernel Thermal management.
|
||||
disabled
|
||||
Prevent kernel thermal zone driver actions upon
|
||||
trip points so that a user application can take full
|
||||
charge of the thermal management.
|
||||
|
||||
RW, Optional
|
||||
|
||||
policy
|
||||
One of the various thermal governors used for a particular zone.
|
||||
|
||||
RW, Required
|
||||
|
||||
available_policies
|
||||
Available thermal governors which can be used for a particular zone.
|
||||
|
||||
RO, Required
|
||||
|
||||
`trip_point_[0-*]_temp`
|
||||
The temperature above which the trip point will be fired.
|
||||
|
||||
Unit: millidegree Celsius
|
||||
|
||||
RO, Optional
|
||||
|
||||
`trip_point_[0-*]_type`
|
||||
Strings which indicate the type of the trip point.
|
||||
|
||||
E.g. it can be one of critical, hot, passive, `active[0-*]` for ACPI
|
||||
thermal zone.
|
||||
|
||||
RO, Optional
|
||||
|
||||
`trip_point_[0-*]_hyst`
|
||||
The hysteresis value for a trip point, represented as an integer
|
||||
Unit: Celsius
|
||||
RW, Optional
|
||||
|
||||
`cdev[0-*]`
|
||||
Sysfs link to the thermal cooling device node that provides the sysfs I/F
for cooling device throttling control.
|
||||
|
||||
RO, Optional
|
||||
|
||||
`cdev[0-*]_trip_point`
|
||||
The trip point in this thermal zone which `cdev[0-*]` is associated
|
||||
with; -1 means the cooling device is not associated with any trip
|
||||
point.
|
||||
|
||||
RO, Optional
|
||||
|
||||
`cdev[0-*]_weight`
|
||||
The influence of `cdev[0-*]` in this thermal zone. This value
|
||||
is relative to the rest of cooling devices in the thermal
|
||||
zone. For example, if a cooling device has a weight double
|
||||
that of another, it's twice as effective in cooling the
|
||||
thermal zone.
|
||||
|
||||
RW, Optional
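
To illustrate how relative weights compare, the sketch below prints each
weight as an integer percentage of the total. Reading the weights as shares of
a total is an assumption made for this example, and weight_shares is a
hypothetical helper name.

```shell
# Print each weight's share of the sum of all weights, one integer
# percentage per line (rounded down). Hypothetical helper for illustration.
weight_shares() {
    total=0
    for w in "$@"; do
        total=$(( total + w ))
    done
    # Avoid dividing by zero when no weights (or only zeros) are given.
    [ "$total" -gt 0 ] || return 1
    for w in "$@"; do
        echo "$(( w * 100 / total ))"
    done
}
```

With weights 2048 and 1024, the first device accounts for about two thirds of
the total, which matches the "twice as effective" reading above.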
|
||||
|
||||
passive
|
||||
Attribute is only present for zones in which the passive cooling
|
||||
policy is not supported by the native thermal driver. Default is zero
|
||||
and can be set to a temperature (in millidegrees) to enable a
|
||||
passive trip point for the zone. Activation is done by polling with
|
||||
an interval of 1 second.
|
||||
|
||||
Unit: millidegree Celsius
|
||||
|
||||
Valid values: 0 (disabled) or greater than 1000
|
||||
|
||||
RW, Optional
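
A small sketch of checking a value against the valid range stated above before
writing it to the passive attribute. The helper name valid_passive_value is
hypothetical; the kernel performs its own validation regardless.

```shell
# Succeed only for values documented as valid for "passive":
# 0 (disabled) or a millidegree value greater than 1000.
valid_passive_value() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -eq 0 ] || [ "$1" -gt 1000 ]
}
```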
|
||||
|
||||
emul_temp
|
||||
Interface to set the emulated temperature method in thermal zone
|
||||
(sensor). After setting this temperature, the thermal zone may pass
|
||||
this temperature to the platform emulation function, if registered, or
|
||||
cache it locally. This is useful in debugging different temperature
|
||||
thresholds and their associated cooling actions. This is a write-only node
|
||||
and writing 0 on this node should disable emulation.
|
||||
Unit: millidegree Celsius
|
||||
|
||||
WO, Optional
|
||||
|
||||
WARNING:
|
||||
Be careful while enabling this option on production systems,
|
||||
because userland can easily disable the thermal policy by simply
|
||||
flooding this sysfs node with low temperature values.
|
||||
|
||||
sustainable_power
|
||||
An estimate of the sustained power that can be dissipated by
|
||||
the thermal zone. Used by the power allocator governor. For
|
||||
more information see Documentation/driver-api/thermal/power_allocator.rst
|
||||
|
||||
Unit: milliwatts
|
||||
|
||||
RW, Optional
|
||||
|
||||
k_po
|
||||
The proportional term of the power allocator governor's PID
|
||||
controller during temperature overshoot. Temperature overshoot
|
||||
is when the current temperature is above the "desired
|
||||
temperature" trip point. For more information see
|
||||
Documentation/driver-api/thermal/power_allocator.rst
|
||||
|
||||
RW, Optional
|
||||
|
||||
k_pu
|
||||
The proportional term of the power allocator governor's PID
|
||||
controller during temperature undershoot. Temperature undershoot
|
||||
is when the current temperature is below the "desired
|
||||
temperature" trip point. For more information see
|
||||
Documentation/driver-api/thermal/power_allocator.rst
|
||||
|
||||
RW, Optional
|
||||
|
||||
k_i
|
||||
The integral term of the power allocator governor's PID
|
||||
controller. This term allows the PID controller to compensate
|
||||
for long term drift. For more information see
|
||||
Documentation/driver-api/thermal/power_allocator.rst
|
||||
|
||||
RW, Optional
|
||||
|
||||
k_d
|
||||
The derivative term of the power allocator governor's PID
|
||||
controller. For more information see
|
||||
Documentation/driver-api/thermal/power_allocator.rst
|
||||
|
||||
RW, Optional
|
||||
|
||||
integral_cutoff
|
||||
Temperature offset from the desired temperature trip point
|
||||
above which the integral term of the power allocator
|
||||
governor's PID controller starts accumulating errors. For
|
||||
example, if integral_cutoff is 0, then the integral term only
|
||||
accumulates error when temperature is above the desired
|
||||
temperature trip point. For more information see
|
||||
Documentation/driver-api/thermal/power_allocator.rst
|
||||
|
||||
Unit: millidegree Celsius
|
||||
|
||||
RW, Optional
|
||||
|
||||
slope
|
||||
The slope constant used in a linear extrapolation model
|
||||
to determine a hotspot temperature based off the sensor's
|
||||
raw readings. It is up to the device driver to determine
|
||||
the usage of these values.
|
||||
|
||||
RW, Optional
|
||||
|
||||
offset
|
||||
The offset constant used in a linear extrapolation model
|
||||
to determine a hotspot temperature based off the sensor's
|
||||
raw readings. It is up to the device driver to determine
|
||||
the usage of these values.
|
||||
|
||||
RW, Optional
|
||||
|
||||
Cooling device attributes
|
||||
-------------------------
|
||||
|
||||
type
|
||||
String which represents the type of device, e.g:
|
||||
|
||||
- for generic ACPI: should be "Fan", "Processor" or "LCD"
|
||||
- for memory controller device on intel_menlow platform:
|
||||
should be "Memory controller".
|
||||
|
||||
RO, Required
|
||||
|
||||
max_state
|
||||
The maximum permissible cooling state of this cooling device.
|
||||
|
||||
RO, Required
|
||||
|
||||
cur_state
|
||||
The current cooling state of this cooling device.
|
||||
The value can be any integer between 0 and max_state:
|
||||
|
||||
- cur_state == 0 means no cooling
|
||||
- cur_state == max_state means the maximum cooling.
|
||||
|
||||
RW, Required
|
||||
|
||||
stats/reset
|
||||
Writing any value resets the cooling device's statistics.
|
||||
WO, Required
|
||||
|
||||
stats/time_in_state_ms:
|
||||
The amount of time spent by the cooling device in various cooling
|
||||
states. The output will have a "<state> <time>" pair on each line, which
|
||||
means this cooling device spent <time> msec in <state>.
|
||||
Output will have one line for each of the supported states. usertime
|
||||
units here is 10mS (similar to other time exported in /proc).
|
||||
RO, Required
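
As a sketch of post-processing this file, the filter below reads
"<state> <time>" pairs on standard input and prints the integer percentage of
the total time spent in each state. The name time_in_state_percent is
hypothetical, and the filter only assumes two whitespace-separated fields per
line.

```shell
# Read "<state> <time>" lines on stdin and print "<state> <percent>%"
# for each state (percentages rounded down). Hypothetical helper.
time_in_state_percent() {
    awk '
        { state[NR] = $1; ms[NR] = $2; total += $2 }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s %d%%\n", state[i], (total ? int(ms[i] * 100 / total) : 0)
        }'
}
```

A possible invocation, assuming cooling_device0 exists and exposes statistics,
is time_in_state_percent < /sys/class/thermal/cooling_device0/stats/time_in_state_ms.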
|
||||
|
||||
|
||||
stats/total_trans:
|
||||
A single positive value showing the total number of times the state of a
|
||||
cooling device has changed.
|
||||
|
||||
RO, Required
|
||||
|
||||
stats/trans_table:
|
||||
This gives fine grained information about all the cooling state
|
||||
transitions. The cat output here is a two dimensional matrix, where an
|
||||
entry <i,j> (row i, column j) represents the number of transitions from
|
||||
State_i to State_j. If the transition table is bigger than PAGE_SIZE,
|
||||
reading this will return an -EFBIG error.
|
||||
RO, Required
|
||||
|
||||
3. A simple implementation
|
||||
==========================
|
||||
|
||||
An ACPI thermal zone may support multiple trip points like critical, hot,
|
||||
passive, active. If an ACPI thermal zone supports critical, passive,
|
||||
active[0] and active[1] at the same time, it may register itself as a
|
||||
thermal_zone_device (thermal_zone1) with 4 trip points in all.
|
||||
It has one processor and one fan, which are both registered as
|
||||
thermal_cooling_device. Both are considered to have the same
|
||||
effectiveness in cooling the thermal zone.
|
||||
|
||||
If the processor is listed in the _PSL method, and the fan is listed in the _AL0
|
||||
method, the sys I/F structure will be built like this::
|
||||
|
||||
/sys/class/thermal:
|
||||
|thermal_zone1:
|
||||
|---type: acpitz
|
||||
|---temp: 37000
|
||||
|---mode: enabled
|
||||
|---policy: step_wise
|
||||
|---available_policies: step_wise fair_share
|
||||
|---trip_point_0_temp: 100000
|
||||
|---trip_point_0_type: critical
|
||||
|---trip_point_1_temp: 80000
|
||||
|---trip_point_1_type: passive
|
||||
|---trip_point_2_temp: 70000
|
||||
|---trip_point_2_type: active0
|
||||
|---trip_point_3_temp: 60000
|
||||
|---trip_point_3_type: active1
|
||||
|---cdev0: --->/sys/class/thermal/cooling_device0
|
||||
|---cdev0_trip_point: 1 /* cdev0 can be used for passive */
|
||||
|---cdev0_weight: 1024
|
||||
|---cdev1: --->/sys/class/thermal/cooling_device3
|
||||
|---cdev1_trip_point: 2 /* cdev1 can be used for active[0]*/
|
||||
|---cdev1_weight: 1024
|
||||
|
||||
|cooling_device0:
|
||||
|---type: Processor
|
||||
|---max_state: 8
|
||||
|---cur_state: 0
|
||||
|
||||
|cooling_device3:
|
||||
|---type: Fan
|
||||
|---max_state: 2
|
||||
|---cur_state: 0
|
||||
|
||||
/sys/class/hwmon:
|
||||
|hwmon0:
|
||||
|---name: acpitz
|
||||
|---temp1_input: 37000
|
||||
|---temp1_crit: 100000
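
The layout above can be walked from user space. The following is a minimal
sketch, assuming trip points are numbered contiguously from 0 and that each has
both a _type and a _temp file; list_trip_points is a hypothetical helper name.

```shell
# Print "<index> <type> <temp>" for every trip point found in a thermal
# zone directory such as /sys/class/thermal/thermal_zone1.
list_trip_points() {
    zone=$1
    i=0
    # Stop at the first index without a readable type file.
    while [ -r "$zone/trip_point_${i}_type" ]; do
        printf '%s %s %s\n' "$i" \
            "$(cat "$zone/trip_point_${i}_type")" \
            "$(cat "$zone/trip_point_${i}_temp")"
        i=$(( i + 1 ))
    done
}
```

For the example zone shown above, list_trip_points /sys/class/thermal/thermal_zone1
would print one line per trip point, starting with the critical one.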
|
||||
|
||||
4. Event Notification
|
||||
=====================
|
||||
|
||||
The framework includes a simple notification mechanism, in the form of a
|
||||
netlink event. Netlink socket initialization is done during the _init_
|
||||
of the framework. Drivers which intend to use the notification mechanism
|
||||
just need to call thermal_generate_netlink_event() with two arguments, viz.
|
||||
(originator, event). The originator is a pointer to struct thermal_zone_device
|
||||
from which the event originated. An integer which represents the
|
||||
thermal zone device will be used in the message to identify the zone. The
|
||||
event will be one of: {THERMAL_AUX0, THERMAL_AUX1, THERMAL_CRITICAL,
|
||||
THERMAL_DEV_FAULT}. Notification can be sent when the current temperature
|
||||
crosses any of the configured thresholds.
|
||||
|
||||
5. Export Symbol APIs
|
||||
=====================
|
||||
|
||||
5.1. get_tz_trend
|
||||
-----------------
|
||||
|
||||
This function returns the trend of a thermal zone, i.e. the rate of change
|
||||
of temperature of the thermal zone. Ideally, the thermal sensor drivers
|
||||
are supposed to implement the callback. If they don't, the thermal
|
||||
framework calculates the trend by comparing the previous and the current
|
||||
temperature values.
|
||||
|
||||
5.2. get_thermal_instance
|
||||
-------------------------
|
||||
|
||||
This function returns the thermal_instance corresponding to a given
|
||||
{thermal_zone, cooling_device, trip_point} combination. Returns NULL
|
||||
if such an instance does not exist.
|
||||
|
||||
5.3. thermal_notify_framework
|
||||
-----------------------------
|
||||
|
||||
This function handles the trip events from sensor drivers. It starts
|
||||
throttling the cooling devices according to the policy configured.
|
||||
For CRITICAL and HOT trip points, this notifies the respective drivers,
|
||||
and does actual throttling for other trip points, i.e. ACTIVE and PASSIVE.
|
||||
The throttling policy is based on the configured platform data; if no
|
||||
platform data is provided, this uses the step_wise throttling policy.
|
||||
|
||||
5.4. thermal_cdev_update
|
||||
------------------------
|
||||
|
||||
This function serves as an arbitrator to set the state of a cooling
|
||||
device. It sets the cooling device to the deepest cooling state if
|
||||
possible.
|
||||
|
||||
6. thermal_emergency_poweroff
|
||||
=============================
|
||||
|
||||
On an event of critical trip temperature crossing, the thermal framework
|
||||
allows the system to shut down gracefully by calling orderly_poweroff().
|
||||
In the event of a failure of orderly_poweroff() to shut down the system
|
||||
we are in danger of keeping the system alive at undesirably high
|
||||
temperatures. To mitigate this high risk scenario we program a work
|
||||
queue to fire after a pre-determined number of seconds to start
|
||||
an emergency shutdown of the device using the kernel_power_off()
|
||||
function. In case kernel_power_off() fails then finally
|
||||
emergency_restart() is called in the worst case.
|
||||
|
||||
The delay should be carefully profiled so as to give adequate time for
|
||||
orderly_poweroff(). In case of failure of an orderly_poweroff() the
|
||||
emergency poweroff kicks in after the delay has elapsed and shuts down
|
||||
the system.
|
||||
|
||||
If the delay is set to 0, emergency poweroff will not be supported. So a carefully
|
||||
profiled non-zero positive value is a must for emergency poweroff to be
|
||||
triggered.
|
|
@ -0,0 +1,55 @@
|
|||
===================================
|
||||
Kernel driver: x86_pkg_temp_thermal
|
||||
===================================
|
||||
|
||||
Supported chips:
|
||||
|
||||
* x86: with package level thermal management
|
||||
|
||||
(Verify using: CPUID.06H:EAX[bit 6] = 1)
|
||||
|
||||
Authors: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
|
||||
|
||||
Reference
|
||||
---------
|
||||
|
||||
Intel® 64 and IA-32 Architectures Software Developer’s Manual (Jan, 2013):
|
||||
Chapter 14.6: PACKAGE LEVEL THERMAL MANAGEMENT
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
This driver registers the CPU digital temperature package level sensor as a thermal
|
||||
zone with a maximum of two user mode configurable trip points. The number of trip points
|
||||
depends on the capability of the package. Once the trip point is violated,
|
||||
user mode can receive notification via thermal notification mechanism and can
|
||||
take any action to control temperature.
|
||||
|
||||
|
||||
Threshold management
|
||||
--------------------
|
||||
Each package will register as a thermal zone under /sys/class/thermal.
|
||||
|
||||
Example::
|
||||
|
||||
/sys/class/thermal/thermal_zone1
|
||||
|
||||
This contains two trip points:
|
||||
|
||||
- trip_point_0_temp
|
||||
- trip_point_1_temp
|
||||
|
||||
The user can set any temperature between 0 and the TJ-Max temperature. Temperature units
|
||||
are in milli-degree Celsius. Refer to "Documentation/driver-api/thermal/sysfs-api.rst" for
|
||||
thermal sys-fs details.
|
||||
|
||||
Any value other than 0 in these trip points can trigger thermal notifications.
|
||||
Setting 0 stops sending thermal notifications.
|
||||
|
||||
Thermal notifications:
|
||||
To get kobject-uevent notifications, set the thermal zone
|
||||
policy to "user_space".
|
||||
|
||||
For example::
|
||||
|
||||
echo -n "user_space" > policy
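
A slightly more defensive variant is sketched below: it writes the policy only
if the zone lists it in available_policies. The helper name set_policy is
hypothetical, and the zone directory is passed in as an argument.

```shell
# Switch a thermal zone to the given policy, but only if the zone
# offers it in available_policies. Hypothetical helper for illustration.
set_policy() {
    zone=$1
    wanted=$2
    for p in $(cat "$zone/available_policies"); do
        if [ "$p" = "$wanted" ]; then
            printf '%s' "$wanted" > "$zone/policy"
            return 0
        fi
    done
    echo "policy '$wanted' not offered by $zone" >&2
    return 1
}
```

For example, set_policy /sys/class/thermal/thermal_zone1 user_space would
select the user_space governor when it is available, and fail otherwise.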
|